Cadvisor:latest will not start #2190
Huh, I don't see any errors that should cause it to exit. Those errors will just cause it not to have OOM events or the machine id. It looks like cadvisor was given an external exit signal: |
I see. Any idea why my server would keep killing cAdvisor while Docker Swarm keeps restarting it? Even a hint on where to look would be appreciated. |
can you try turning up the log verbosity? |
|
Will try turning up log verbosity, though I see that -logtostderr is set in the command: (I pulled this from a recipe for swarmprom). |
Silly question - where would I set log_verbosity via the docker image? Using an ENV variable with the same name? |
It's just a command-line argument, like |
Like this? |
yes, that looks correct. |
Same error:
|
We are facing the same issue. It turns out the health check is killing the container: "Log": [ The question is why it doesn't answer within the 3-second timeout. It is also interesting that the second call, which was still successful, already took 2.1 sec, whereas the one before took only 0.3 sec, so it was slowing down. The logs also show: |
Same here :( when I run: I get: |
It turns out this is related to the number of health checks running at the same time on a node (moby/moby#33933). After fixing the intervals, it started working again. But I would advise changing the health check config of cadvisor. |
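For anyone wanting to try the health check change described above, here is a minimal sketch of overriding the health check from the stack file. The interval, timeout, retries and start_period values are assumptions chosen for illustration, not values recommended by the cAdvisor maintainers; the probe command is the one quoted elsewhere in this thread, without a URL prefix.

```yaml
version: '3.7'
services:
  cadvisor:
    image: gcr.io/google-containers/cadvisor:latest
    command: ["-logtostderr", "-docker_only"]
    healthcheck:
      # Same wget probe as quoted in this thread, against the unprefixed endpoint.
      test: ["CMD-SHELL", "wget --quiet --tries=1 --spider http://localhost:8080/healthz || exit 1"]
      interval: 60s      # assumed value: probe less often so checks on a node overlap less
      timeout: 10s       # assumed value: more headroom than the 3 sec mentioned above
      retries: 3         # assumed value: tolerate a few slow responses before marking unhealthy
      start_period: 30s  # assumed value: grace period at startup (compose file format 3.4+)
```

If the health check is not needed at all, the compose file format also accepts `disable: true` under `healthcheck`, which switches off the check defined in the image.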
@Tovli it looks like your issue is with the number of file handles cadvisor wants to open; it probably opens more than the default 8124 and crashes. We also had to up the setting |
Thanks @g-bohncke |
Yes, see this issue |
Thanks but that didn't help |
This did the trick for me.
but the container is up |
Hi, I had the same info message about OOM detection, and afterwards the container crashed and swarm just restarted it.
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /:/rootfs:ro
- /var/run:/var/run
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
- /sys/fs/cgroup:/cgroup:ro |
Hi. I had the same issue with docker swarm and termination. Cadvisor version v0.34.0 |
Confirm, same problem as above when adding
version: '3.7'
services:
  cadvisor:
    command: ["-logtostderr", "-docker_only", "-v=4"]
    #command: ["-url_base_prefix='/cadvisor'", "-logtostderr", "-docker_only", "-v=4"]
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.cadvisor.rule=PathPrefix(`/cadvisor`)"
        - "traefik.http.services.cadvisor.loadbalancer.server.port=8080"
      mode: global
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 128M
    image: gcr.io/google-containers/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /dev/disk/:/dev/disk:ro
      - /sys:/sys:ro
      - /sys/fs/cgroup:/cgroup:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run:/var/run:ro |
same problem as above when adding |
when setting
version: '3.7'
services:
  cadvisor:
    command: ["-docker_only", "-v=4", "--url_base_prefix=/cadvisor"]
    healthcheck:
      test: ["CMD-SHELL", "wget --quiet --tries=1 --spider http://localhost:8080/cadvisor/healthz || exit 1"]
      interval: 30s
      timeout: 3s
      retries: 0 |
Throws the following error:
info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
manager.go:353] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
manager.go:353] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
manager.go:1246] Exiting thread watching subcontainers
manager.go:462] Exiting global housekeeping thread
cadvisor.go:223] Exiting given signal: terminated