cAdvisor CPU and memory stats are interleaved with zero values when using InfluxDB storage #1596

Open · kykc opened this issue on Feb 20, 2017 · 1 comment


kykc commented Feb 20, 2017

I have the following setup: two machines, each running a cAdvisor instance that monitors Docker containers, store their stats in a single InfluxDB instance. Values from one host are fine, but the CPU and memory values from the other host are interleaved with zero values (the network RX and TX values are fine).

Memory

> select value from memory_working_set where container_name = 'automatl-nginx' and machine='yes' and time > now() - 2m;
name: memory_working_set
time                           value
----                           -----
2017-02-20T09:59:44.35382052Z  5427200
2017-02-20T09:59:55.918964183Z 5427200
2017-02-20T09:59:56.939628781Z 0
2017-02-20T10:00:03.534807368Z 0
2017-02-20T10:00:05.134850329Z 5427200
2017-02-20T10:00:14.253268778Z 5427200
2017-02-20T10:00:17.224685359Z 0
2017-02-20T10:00:25.407112131Z 0
2017-02-20T10:00:28.393461725Z 5427200
2017-02-20T10:00:34.160039931Z 0
2017-02-20T10:00:35.914869051Z 5427200
2017-02-20T10:00:39.437089206Z 0
2017-02-20T10:00:42.557736203Z 5427200
2017-02-20T10:00:45.222014704Z 0
2017-02-20T10:00:49.447857437Z 5427200
2017-02-20T10:00:53.224905968Z 0
2017-02-20T10:00:55.875725626Z 5427200
2017-02-20T10:01:05.236738595Z 5427200
2017-02-20T10:01:07.842789489Z 0
2017-02-20T10:01:12.361159655Z 5427200
2017-02-20T10:01:16.072569402Z 0
2017-02-20T10:01:28.09744548Z  5427200
2017-02-20T10:01:33.974212241Z 0
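
To quantify how often the spurious zeros show up, the influx CLI output above can be parsed and the share of zero samples counted. This is a minimal sketch, assuming the two-column time/value layout shown here; parse_influx_rows and zero_ratio are hypothetical helper names, not part of cAdvisor or InfluxDB.

```python
def parse_influx_rows(text):
    """Extract (timestamp, value) pairs from influx CLI table output.

    Assumes the two-column `time value` layout used above; the `name:`,
    header and separator lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows: an RFC 3339 timestamp ending in 'Z' plus an integer value.
        if len(parts) == 2 and parts[0].endswith("Z") and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    return rows


def zero_ratio(rows):
    """Fraction of samples whose value is exactly zero."""
    if not rows:
        return 0.0
    return sum(1 for _, value in rows if value == 0) / len(rows)


sample = """name: memory_working_set
time                           value
----                           -----
2017-02-20T09:59:44.35382052Z  5427200
2017-02-20T09:59:55.918964183Z 5427200
2017-02-20T09:59:56.939628781Z 0
2017-02-20T10:00:03.534807368Z 0
"""

rows = parse_influx_rows(sample)
print(len(rows), zero_ratio(rows))  # 4 0.5
```

Running the same helpers over the rx_bytes output from the healthy series should give a ratio of 0.0, which makes the comparison between series easy to repeat.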

CPU

> select value from cpu_usage_total where container_name = 'automatl-nginx' and machine='yes' and time > now() - 2m;
name: cpu_usage_total
time                           value
----                           -----
2017-02-20T10:30:20.203098002Z 0
2017-02-20T10:30:23.89182392Z  5984427831
2017-02-20T10:30:27.364900331Z 0
2017-02-20T10:30:32.674973275Z 0
2017-02-20T10:30:33.74144536Z  5984427831
2017-02-20T10:30:40.40165672Z  0
2017-02-20T10:30:42.818223306Z 5988148102
2017-02-20T10:30:46.004102576Z 0
2017-02-20T10:30:51.845597263Z 5996921977
2017-02-20T10:30:55.271754316Z 0
2017-02-20T10:30:58.534550805Z 5996921977
2017-02-20T10:31:01.113449559Z 0
2017-02-20T10:31:04.693572706Z 5997105535
2017-02-20T10:31:06.85067426Z  0
2017-02-20T10:31:10.818531188Z 5997105535
2017-02-20T10:31:17.135278202Z 5997105535
2017-02-20T10:31:17.487825779Z 0
2017-02-20T10:31:26.299167961Z 5997105535
2017-02-20T10:31:36.748532793Z 5997105535
2017-02-20T10:31:51.179901934Z 0
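
Because cpu_usage_total is a cumulative counter (total CPU time, which cAdvisor reports in nanoseconds), it should only grow while the container is running. A zero that is followed by a value at least as large as the one before it therefore cannot be a genuine counter reset. The sketch below applies that heuristic to the first rows of the CPU output; the function name and the heuristic itself are assumptions for illustration, not cAdvisor behavior.

```python
def spurious_zero_timestamps(rows):
    """Flag zero readings of a cumulative counter that are probably bogus.

    Heuristic (an assumption): a zero whose next non-zero reading is at
    least the previous non-zero reading cannot be a real counter reset.
    Zeros without a non-zero neighbour on both sides are not flagged.
    """
    flagged = []
    for i, (ts, value) in enumerate(rows):
        if value != 0:
            continue
        before = [v for _, v in rows[:i] if v != 0]
        after = [v for _, v in rows[i + 1:] if v != 0]
        if before and after and after[0] >= before[-1]:
            flagged.append(ts)
    return flagged


cpu_rows = [
    ("2017-02-20T10:30:20.203098002Z", 0),
    ("2017-02-20T10:30:23.89182392Z", 5984427831),
    ("2017-02-20T10:30:27.364900331Z", 0),
    ("2017-02-20T10:30:32.674973275Z", 0),
    ("2017-02-20T10:30:33.74144536Z", 5984427831),
    ("2017-02-20T10:30:40.40165672Z", 0),
    ("2017-02-20T10:30:42.818223306Z", 5988148102),
]
# Flags the three zeros that sit between non-zero readings.
print(spurious_zero_timestamps(cpu_rows))
```

The leading zero is not flagged because there is no earlier reading to compare it with.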

RX bytes

> select value from rx_bytes where container_name = 'automatl-nginx' and machine='yes' and time > now() - 2m;
name: rx_bytes
time                           value
----                           -----
2017-02-20T10:32:12.911368038Z 17402280
2017-02-20T10:32:14.448737635Z 17402280
2017-02-20T10:32:19.741401432Z 17402335
2017-02-20T10:32:25.485651365Z 17402390
2017-02-20T10:32:26.882223754Z 17402390
2017-02-20T10:32:33.883747742Z 17402432
2017-02-20T10:32:35.098140642Z 17402487
2017-02-20T10:32:41.268285718Z 17402529
2017-02-20T10:32:42.587017456Z 17402529
2017-02-20T10:32:50.543289786Z 17402584
2017-02-20T10:32:50.945688242Z 17402584
2017-02-20T10:32:56.988594257Z 17402639
2017-02-20T10:32:58.868506473Z 17402639
2017-02-20T10:33:04.670031763Z 17402789
2017-02-20T10:33:05.673421709Z 17402789
2017-02-20T10:33:10.049249172Z 17402789
2017-02-20T10:33:10.857520763Z 17402789
2017-02-20T10:33:21.857757454Z 17402789
2017-02-20T10:33:22.291928346Z 17402789
2017-02-20T10:33:31.786741587Z 17402789
2017-02-20T10:33:41.138997526Z 17402789

cAdvisor configuration

automatl-cadvisor:
  image: google/cadvisor:v0.24.1
  command: -storage_driver=influxdb -storage_driver_db=cadvisor -storage_driver_host=influxsrv:8086 -storage_driver_secure -storage_driver_user=user -storage_driver_password=password -housekeeping_interval=5s -storage_driver_buffer_duration=30s -max_procs=4 -docker_only=true
  restart: always
  container_name: automatl-cadvisor
#  ports:
#    - "8099:8080"
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro

InfluxDB configuration

automatl-influxdb:
  restart: always
  container_name: automatl-influxdb
  build: .
  ports:
#   - "172.17.0.1:8083:8083"
   - "8086:8086"
   - "25826:25826/udp"
  volumes:
   - ./data:/var/lib/influxdb
   - ./influxdb.conf:/etc/influxdb/influxdb.conf
   - ./types.db:/usr/share/collectd/types.db:ro
   - ./certificates:/certs:ro
   - ./auth_file:/etc/collectd/auth_file:ro

Dockerfile

FROM influxdb:1.2

USER influxdb

ENTRYPOINT ["/entrypoint.sh"]
CMD ["influxd"]

The Dockerfile is used only to switch the user and drop the InfluxDB process's privileges.

Some details about both hosts

Both hosts are physical machines.

The host with the issue (yes)

kykc@yes:~$ uname -r
4.4.0-62-generic
kykc@yes:~$ docker --version
Docker version 1.13.1, build 092cba3
kykc@yes:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.2 LTS
Release:	16.04
Codename:	xenial
kykc@yes:~$ cat /proc/cpuinfo |grep 'model name'
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

The host that works fine (crimson)

kykc@crimson:~$ uname -r
3.16.0-4-amd64
kykc@crimson:~$ docker --version
Docker version 1.13.1, build 092cba3
kykc@crimson:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.7 (jessie)
Release:	8.7
Codename:	jessie
kykc@crimson:~$ cat /proc/cpuinfo |grep 'model name'
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz
model name	: Intel(R) Atom(TM) CPU D525   @ 1.80GHz

Other observations

  1. Sometimes the problem disappears after restarting the Docker engine with systemctl restart docker, but after a while the zero values start to appear again.
  2. Restarting the cAdvisor and InfluxDB containers has no effect.
  3. I've tried tweaking various settings, such as -max_procs and -housekeeping_interval, without any effect on the observed behavior.

Obvious workaround

In case someone else runs into the same issue: for the time being I've simply added AND value > 0 to the WHERE clause of my queries in Grafana dashboards. This seems fairly safe: the memory working set of a running container should (probably) never be zero, and the CPU values are cumulative, so once a single cycle has been used they should never drop back to zero.
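
The same filter can also be applied client-side, for example in a script that reads from InfluxDB's HTTP /query endpoint instead of going through Grafana. This is a minimal Python sketch, assuming the InfluxDB 1.x JSON response layout (results, then series, then columns/values); FILTERED_QUERY and drop_zero_points are hypothetical names. Like the WHERE-clause workaround, it also hides any genuine zero readings.

```python
import json

# The dashboard query with the workaround predicate (AND value > 0) added.
FILTERED_QUERY = (
    "SELECT value FROM memory_working_set "
    "WHERE container_name = 'automatl-nginx' AND machine = 'yes' "
    "AND value > 0 AND time > now() - 2m"
)


def drop_zero_points(response):
    """Remove zero-valued points from an InfluxDB 1.x /query JSON body.

    Assumes the 1.x response shape results -> series -> columns/values.
    The response dict is modified in place and also returned.
    """
    for result in response.get("results", []):
        for series in result.get("series", []):
            idx = series["columns"].index("value")
            series["values"] = [row for row in series["values"] if row[idx] != 0]
    return response


body = json.loads("""
{"results": [{"statement_id": 0, "series": [{
  "name": "memory_working_set",
  "columns": ["time", "value"],
  "values": [["2017-02-20T09:59:55.918964183Z", 5427200],
             ["2017-02-20T09:59:56.939628781Z", 0],
             ["2017-02-20T10:00:05.134850329Z", 5427200]]}]}]}
""")
print(drop_zero_points(body)["results"][0]["series"][0]["values"])
```

Filtering in the query itself is usually preferable, since the zero points are then never transferred; the client-side version is mainly useful when the query cannot be changed.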

Alger7w commented Jan 2, 2018

I ran into the same problem...
