Cadvisor:latest will not start #2190

Open
gkoerk opened this issue Mar 6, 2019 · 23 comments

Comments

@gkoerk

gkoerk commented Mar 6, 2019

Throws the following error:

info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
manager.go:353] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
manager.go:353] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
manager.go:1246] Exiting thread watching subcontainers
manager.go:462] Exiting global housekeeping thread
cadvisor.go:223] Exiting given signal: terminated

@dashpole
Collaborator

dashpole commented Mar 6, 2019

Huh, I don't see any errors that should cause it to exit. Those errors will just cause it not to have OOM events or the machine id. It looks like cadvisor was given an external exit signal: Exiting given signal: terminated

@gkoerk
Author

gkoerk commented Mar 6, 2019

I see - so any idea why my server would be killing cAdvisor repeatedly while Docker Swarm keeps restarting it? Even a hint on where to look would be appreciated.

@dashpole
Collaborator

dashpole commented Mar 6, 2019

Can you try turning up the log verbosity? --v=4 should provide more info. It is also really strange that you get each message twice. Mind posting your swarm config for cAdvisor?

@gkoerk
Author

gkoerk commented Mar 6, 2019

    image: google/cadvisor
    networks:
      - internal
    command: -logtostderr -docker_only
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M
...
networks:
  traefik_public:
    external: true
  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.29.0/24

@gkoerk
Author

gkoerk commented Mar 6, 2019

Will try turning up log verbosity, though I see that -logtostderr is set in the command: (I pulled this from a recipe for swarmprom).

@gkoerk
Author

gkoerk commented Mar 6, 2019

Silly question - where would I set log_verbosity via the docker image? Using an ENV variable with the same name?

@dashpole
Collaborator

dashpole commented Mar 7, 2019

It's just a command-line argument, like logtostderr, except it is -v.

@gkoerk
Author

gkoerk commented Mar 8, 2019

Like this?
cadvisor:
  image: google/cadvisor
  networks:
    - internal
  command: -logtostderr -v=4 -docker_only
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - /:/rootfs:ro
    - /var/run:/var/run
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
  deploy:
    mode: global
    resources:
      limits:
        memory: 128M
      reservations:
        memory: 64M

@dashpole
Collaborator

dashpole commented Mar 8, 2019

yes, that looks correct.

@gkoerk
Author

gkoerk commented Mar 8, 2019

Same error:

W0308 23:05:54.984173       1 info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
W0308 23:05:55.172163       1 manager.go:349] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory

@g-bohncke

g-bohncke commented Jun 18, 2019

We are facing the same issue; it turns out the health check is killing the container:

"Log": [
{
"Start": "2019-06-18T11:39:46.537308094+02:00",
"End": "2019-06-18T11:39:46.806361719+02:00",
"ExitCode": 0,
"Output": "ok % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 2 100 2 0 0 2000 0 --:--:-- --:--:-- --:--:-- 2000\n"
},
{
"Start": "2019-06-18T11:40:16.806662031+02:00",
"End": "2019-06-18T11:40:18.908108447+02:00",
"ExitCode": 0,
"Output": "ok % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 2 100 2 0 0 2000 0 --:--:-- --:--:-- --:--:-- 2000\n"
},
{
"Start": "2019-06-18T11:40:48.908466881+02:00",
"End": "2019-06-18T11:40:51.90882334+02:00",
"ExitCode": -1,
"Output": "Health check exceeded timeout (3s)"
},
{
"Start": "2019-06-18T11:41:23.783973073+02:00",
"End": "2019-06-18T11:41:26.784284629+02:00",
"ExitCode": -1,
"Output": "Health check exceeded timeout (3s)"
},
{
"Start": "2019-06-18T11:41:58.583837334+02:00",
"End": "2019-06-18T11:42:01.584022531+02:00",
"ExitCode": -1,
"Output": "Health check exceeded timeout (3s)"
}
]

But the question is why it doesn't respond within the 3-second timeout.

Also interesting to observe: on the second call that was still successful, the response time was already 2.1 s, whereas the one before was only 0.3 s, so it was slowing down.

The logs also show:
2019/06/18 08:55:31 http: multiple response.WriteHeader calls
2019/06/18 08:57:44 http: multiple response.WriteHeader calls
2019/06/18 09:00:01 http: multiple response.WriteHeader calls
which might be related
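
For reference, the health-check state and its recent probe log (like the "Log" array above) can be read with docker inspect; a minimal sketch, assuming the container is named cadvisor:

# Show the health status, failing streak and recent probe log for a running container
docker inspect --format '{{json .State.Health}}' cadvisor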

@Tovli

Tovli commented Jul 11, 2019

Same here :(

when I run:
sudo docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8081:8080 --name=cadvisor google/cadvisor:latest

I get:
W0711 04:46:22.735755 1 info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
W0711 04:46:22.748084 1 manager.go:349] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
F0711 04:46:22.748793 1 cadvisor.go:172] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct: no such file or directory

@g-bohncke

Turns out it's related to the number of health checks running at the same time on a node: moby/moby#33933, moby/moby#31487.

After fixing the intervals it started working again, but I would advise changing the health check config of cAdvisor.
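
For example, the image's built-in healthcheck can be overridden in the compose file with more generous timings; this is a sketch only, and the probe command and numbers below are assumptions to adapt to your own setup:

cadvisor:
  image: google/cadvisor
  healthcheck:
    # Probe cAdvisor's /healthz endpoint instead of relying on the image default
    test: ["CMD-SHELL", "wget --quiet --tries=1 --spider http://localhost:8080/healthz || exit 1"]
    # Longer interval/timeout so a briefly slow response is not treated as a failure
    interval: 60s
    timeout: 10s
    retries: 3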

@g-bohncke

@Tovli it looks like your issue is with the number of file handles cAdvisor wants to open; it probably opens more than the default 8124 and crashes. We also had to raise that setting.

@Tovli

Tovli commented Jul 11, 2019

Thanks @g-bohncke
You mean fs.inotify.max_user_watches?
Can you share the value that worked for you?

@g-bohncke

Yes, see this issue
#1581
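
For reference, that limit is a host-level sysctl; a minimal sketch of raising it (the value here is only illustrative):

# Raise the inotify watch limit on the host for the current boot
sudo sysctl fs.inotify.max_user_watches=1048576
# Persist the change across reboots
echo "fs.inotify.max_user_watches=1048576" | sudo tee /etc/sysctl.d/90-inotify.conf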

@Tovli

Tovli commented Jul 11, 2019

Thanks, but that didn't help.
I've set max_user_watches to 1048576 and still get the same logs.

@Tovli

Tovli commented Jul 11, 2019

This made it work for me. I still see the first two lines in the logs

W0711 04:46:22.735755 1 info.go:53] Couldn't collect info from any of the files in "/rootfs/etc/machine-id,/var/lib/dbus/machine-id"
W0711 04:46:22.748084 1 manager.go:349] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory

but the container is up

@bdoublet91

Hi,

I had the same OOM-detection message, and afterwards the container crashed and Swarm just restarted it.
Mounting /sys/fs/cgroup fixed the restart problem, but I still get the OOM-detection message. I read in other issues that this doesn't matter unless "you are relying on cAdvisor to report OOM events".

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
      - /sys/fs/cgroup:/cgroup:ro

@NikolayMurha

NikolayMurha commented Dec 5, 2019

Hi.

I had the same issue with Docker Swarm and termination.
When I add the -url_base_prefix argument, I get the container termination and restart problem.
The error "Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory" is always present.

Cadvisor version v0.34.0

@ct27stf

ct27stf commented Dec 12, 2019

Confirmed, same problem as above when adding -url_base_prefix='/cadvisor', but only on Swarm!

version: '3.7'

services:
  cadvisor:
    command: ["-logtostderr", "-docker_only", "-v=4"]
    #command: ["-url_base_prefix='/cadvisor'", "-logtostderr", "-docker_only", "-v=4"]
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.cadvisor.rule=PathPrefix(`/cadvisor`)"
        - "traefik.http.services.cadvisor.loadbalancer.server.port=8080"
      mode: global
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 128M
    image: gcr.io/google-containers/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /dev/disk/:/dev/disk:ro
      - /sys:/sys:ro
      - /sys/fs/cgroup:/cgroup:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run:/var/run:ro

@yiranzai

yiranzai commented Jan 9, 2020

Same problem as above when adding -url_base_prefix='/cadvisor'.

@ct27stf

ct27stf commented Jan 9, 2020

When setting -url_base_prefix=, the healthcheck should be overridden to reflect the prefix, e.g.:

version: '3.7'

services:
  cadvisor:
    command: ["-docker_only", "-v=4", "--url_base_prefix=/cadvisor"]
    healthcheck:
      test: ["CMD-SHELL", "wget --quiet --tries=1 --spider http://localhost:8080/cadvisor/healthz || exit 1"]
      interval: 30s
      timeout: 3s
      retries: 0
