[bug] Ubuntu 24 → cgroups issue: app-level scrape via io.cadvisor.metric.prometheus-* label fails after container (re)starts #3739

@idanlodzki

Summary:

Since upgrading hosts from Ubuntu 20.04 to Ubuntu 24.04, cAdvisor intermittently fails to register external Prometheus collectors configured via the io.cadvisor.metric.prometheus-* container label. Core host/container metrics (CPU/RAM/etc.) keep working, but application-level scraping (HTTP /metrics) stops for all containers on the machine after any container starts or restarts. Restarting cAdvisor temporarily restores app-level scraping.

Using --cgroupns=host makes it work initially, but the failure reappears on the next container (re)start.

How we configure per-container scraping

We use cAdvisor’s feature to scrape app metrics from additional endpoints, keyed by a label on the target container:

  • Label on the target container:

io.cadvisor.metric.prometheus-myapp=/var/cadvisor/cadvisor_config.json

  • The JSON file cAdvisor reads:

{"endpoint": "http://169.244.0.1:19100/metrics"}
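To make the relationship between the label and the config file explicit, here is a minimal Python sketch. The helper names (collector_label, collector_config) are invented for illustration and are not part of cAdvisor; only the label prefix, the file path, and the endpoint are taken from the setup above.

```python
import json

# Label prefix used by cAdvisor for Prometheus collectors (from this issue).
LABEL_PREFIX = "io.cadvisor.metric.prometheus-"


def collector_label(name: str, config_path: str) -> str:
    """Return the key=value label to put on the target container (hypothetical helper)."""
    return f"{LABEL_PREFIX}{name}={config_path}"


def collector_config(endpoint: str) -> str:
    """Return the JSON document that the label's path points to (hypothetical helper)."""
    return json.dumps({"endpoint": endpoint})


if __name__ == "__main__":
    print(collector_label("myapp", "/var/cadvisor/cadvisor_config.json"))
    print(collector_config("http://169.244.0.1:19100/metrics"))
```

The part after the prefix ("myapp") is only the collector name; the label value is the path of the JSON file, not the endpoint itself.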

What happens

After a container (re)start, cAdvisor logs:

failed to register collectors for /system.slice/docker-.scope: failed to read config file "prometheus-myapp" for config /var/cadvisor/cadvisor_config.json container /system.slice/docker-.scope failed to execute "/usr/sbin/chroot" command exit status 1

From that point, app-level scraping stops for all containers on the host. Host/container resource metrics continue to be exported. Restarting the cAdvisor container restores app scraping until the next container (re)start.
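One way to detect this state early would be to scan the cAdvisor log for the message above. The following is a rough sketch, assuming the log wording stays exactly as quoted; the regex and the function name find_collector_failures are illustrative and may need adjusting for other cAdvisor versions.

```python
import re

# Pattern derived only from the log line quoted in this issue.
FAILURE_RE = re.compile(
    r'failed to register collectors for (?P<cgroup>\S+?): '
    r'.*failed to execute "(?P<command>[^"]+)" command'
)


def find_collector_failures(log_lines):
    """Return (cgroup, command) pairs for every line that matches the failure."""
    hits = []
    for line in log_lines:
        match = FAILURE_RE.search(line)
        if match:
            hits.append((match.group("cgroup"), match.group("command")))
    return hits
```

The input could be the lines of `docker logs <cadvisor-container> 2>&1`; a non-empty result means app-level scraping has probably stopped and cAdvisor needs a restart.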

Steps to reproduce

  1. Host on Ubuntu 24.04 (worked on 20.04).
  2. Run cAdvisor and mount the config path that the label references.
  3. Run a target container with label:
    io.cadvisor.metric.prometheus-myapp=/var/cadvisor/cadvisor_config.json
    Ensure /var/cadvisor/cadvisor_config.json inside the cAdvisor container contains:
    {"endpoint": "http://:/metrics"}
  4. Observe that app metrics are scraped.
  5. Restart the target container or start another container on the host.
  6. Observe the error above; app-level scraping stops for all containers.

Expected behavior

cAdvisor should continue to read the label-referenced config and scrape the configured HTTP endpoint for every labeled container, across container (re)starts.

Actual behavior

After a container (re)start, cAdvisor logs a failed to execute "/usr/sbin/chroot" error for the container's cgroup (/system.slice/docker-.scope) and stops app-level scraping for all containers until cAdvisor is restarted.

Workarounds tried

  • Running cAdvisor with --cgroupns=host: works initially, but the failure recurs after the next container (re)start.
  • Restarting cAdvisor: temporarily restores app-level scraping.

Environment

Host OS: Ubuntu 24.04 (upgraded from 20.04)
Container runtime: Docker
cAdvisor: v0.52.1
cgroups: v2 on Ubuntu 24.04
cAdvisor flags: --docker_only=true --max_procs=1 --application_metrics_count_limit=1000 --storage_duration=30s
