Description
Summary:
Since upgrading hosts from Ubuntu 20.04 to Ubuntu 24.04, cAdvisor intermittently fails to register external Prometheus collectors configured via the io.cadvisor.metric.prometheus- container label. Core host/container metrics (CPU/RAM/etc.) keep working, but application-level scraping (HTTP /metrics) stops for all containers on the machine after a container starts/restarts. Restarting cAdvisor temporarily restores app-level scraping.
Using --cgroupns=host makes it work initially, but the failure reappears on the next container (re)start.
How we configure per-container scraping
We use cAdvisor’s feature to scrape app metrics from additional endpoints, keyed by a label on the target container:
- Label on the target container:
io.cadvisor.metric.prometheus-myapp=/var/cadvisor/cadvisor_config.json
- The JSON file cAdvisor reads:
{"endpoint": "http://169.244.0.1:19100/metrics"}
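To rule out a malformed label or config file before suspecting cAdvisor itself, a small check along these lines may help. This is only a sketch: the helper names `parse_collector_label` and `load_endpoint` are hypothetical and not part of cAdvisor. The only things taken from this report are the label prefix and the `endpoint` key. Splitting the label into a short name and a path is an illustrative choice, not necessarily how cAdvisor names the collector internally.

```python
import json
from typing import Tuple
from urllib.parse import urlparse

# Label prefix as used in this report.
LABEL_PREFIX = "io.cadvisor.metric.prometheus-"


def parse_collector_label(label: str) -> Tuple[str, str]:
    """Split 'io.cadvisor.metric.prometheus-<name>=<path>' into (name, path)."""
    key, sep, path = label.partition("=")
    if not sep or not key.startswith(LABEL_PREFIX):
        raise ValueError(f"not a Prometheus collector label: {label!r}")
    name = key[len(LABEL_PREFIX):]
    if not name or not path:
        raise ValueError(f"label is missing a name or a path: {label!r}")
    return name, path


def load_endpoint(config_text: str) -> str:
    """Return the 'endpoint' URL from the JSON config text, or raise ValueError."""
    config = json.loads(config_text)  # JSONDecodeError is a ValueError subclass
    if not isinstance(config, dict) or not isinstance(config.get("endpoint"), str):
        raise ValueError("config has no string 'endpoint' key")
    endpoint = config["endpoint"]
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"endpoint is not an absolute HTTP URL: {endpoint!r}")
    return endpoint
```

Both checks pass for the label and JSON shown above, which suggests the configuration itself is well-formed and the failure lies in how the file is read.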
What happens
After a container (re)start, cAdvisor logs:
failed to register collectors for /system.slice/docker-.scope: failed to read config file "prometheus-myapp" for config /var/cadvisor/cadvisor_config.json container /system.slice/docker-.scope failed to execute "/usr/sbin/chroot" command exit status 1
From that point, app-level scraping stops for all containers on the host. Host/container resource metrics continue to be exported. Restarting the cAdvisor container restores app scraping until the next container (re)start.
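To see which containers trigger the failure and how often, the log line can be picked apart with a regular expression. The sketch below assumes the message format exactly as quoted in this report; the pattern and the function name `parse_collector_failure` are hypothetical, and the real cAdvisor message may differ in punctuation between versions.

```python
import re
from typing import Dict, Optional

# Pattern written against the log line quoted in this report (an assumption,
# not a format guaranteed by cAdvisor).
LOG_PATTERN = re.compile(
    r'failed to register collectors for (?P<cgroup>\S+): '
    r'failed to read config file "(?P<collector>[^"]+)" '
    r'for config (?P<config>\S+) container \S+ '
    r'(?P<cause>.*)'
)


def parse_collector_failure(line: str) -> Optional[Dict[str, str]]:
    """Return the cgroup, collector, config path and cause, or None if no match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None
```

Feeding the cAdvisor container log through this function line by line would give a list of affected cgroup paths that can be compared against the times containers were (re)started.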
Steps to reproduce
- Host on Ubuntu 24.04 (worked on 20.04).
- Run cAdvisor and mount the config path that the label references.
- Run a target container with label:
io.cadvisor.metric.prometheus-myapp=/var/cadvisor/cadvisor_config.json
Ensure /var/cadvisor/cadvisor_config.json inside cAdvisor contains:
{"endpoint": "http://:/metrics"}
- Observe that app metrics are scraped.
- Restart the target container or start another container on the host.
- Observe the error above; app-level scraping stops for all containers.
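The steps above can be sketched as a set of Docker command lines. This is an approximation, not the exact setup from this report: the cAdvisor image tag, the volume mounts (taken from the commonly documented cAdvisor invocation), the container names, and the target image `myapp:latest` are all assumptions. Only the label, the config path, and the cAdvisor flags come from the report. The script only prints the commands; it does not run them.

```python
import shlex
from typing import List

CONFIG_PATH = "/var/cadvisor/cadvisor_config.json"  # path from the report
LABEL = f"io.cadvisor.metric.prometheus-myapp={CONFIG_PATH}"  # label from the report

# Flags as listed under "Environment" in this report.
CADVISOR_FLAGS = [
    "--docker_only=true",
    "--max_procs=1",
    "--application_metrics_count_limit=1000",
    "--storage_duration=30s",
]


def cadvisor_command(image: str = "gcr.io/cadvisor/cadvisor:v0.52.1") -> List[str]:
    """Build a 'docker run' argv for cAdvisor (image and mounts are assumptions)."""
    return [
        "docker", "run", "--detach", "--name", "cadvisor", "--privileged",
        "--volume", "/:/rootfs:ro",
        "--volume", "/var/run:/var/run:ro",
        "--volume", "/sys:/sys:ro",
        "--volume", "/var/lib/docker/:/var/lib/docker:ro",
        "--volume", "/var/cadvisor:/var/cadvisor:ro",  # config path from the label
        image,
    ] + CADVISOR_FLAGS


def target_command(image: str = "myapp:latest", name: str = "myapp") -> List[str]:
    """Build a 'docker run' argv for a labeled target (image is hypothetical)."""
    return ["docker", "run", "--detach", "--name", name, "--label", LABEL, image]


def restart_command(name: str = "myapp") -> List[str]:
    """Build the 'docker restart' argv that triggers the failure in this report."""
    return ["docker", "restart", name]


if __name__ == "__main__":
    for argv in (cadvisor_command(), target_command(), restart_command()):
        print(shlex.join(argv))
```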
Expected behavior
cAdvisor should continue to read the label-referenced config and scrape the configured HTTP endpoint for every labeled container, across container (re)starts.
Actual behavior
After a container (re)start, cAdvisor logs a 'failed to execute "/usr/sbin/chroot"' error for the container's systemd scope (/system.slice/docker-.scope) and stops app-level scraping for every container on the host until cAdvisor is restarted.
Workarounds tried
- Running cAdvisor with --cgroupns=host: works initially, but the failure recurs after the next container (re)start.
- Restarting cAdvisor: temporarily restores app-level scraping.
Environment
Host OS: Ubuntu 24.04 (upgraded from 20.04)
Container runtime: Docker
cAdvisor: v0.52.1
cgroups: v2 on Ubuntu 24.04
cAdvisor flags: --docker_only=true --max_procs=1 --application_metrics_count_limit=1000 --storage_duration=30s
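For anyone comparing hosts, the cgroup version can be confirmed with a short check. On a cgroup v2 (unified hierarchy) host, a `cgroup.controllers` file exists at the root of the cgroup mount; the sketch below relies on that. The function name `cgroup_version` and the default mount point `/sys/fs/cgroup` are assumptions for illustration.

```python
from pathlib import Path


def cgroup_version(cgroup_root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if cgroup_root looks like a cgroup v2 mount, else 1."""
    # The unified (v2) hierarchy exposes cgroup.controllers at its root.
    return 2 if (Path(cgroup_root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"cgroup v{cgroup_version()}")
```

Running it on both a 20.04 and a 24.04 host would confirm whether the two machines actually differ in cgroup version.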