Per-container stats with lxd #673
Comments
There is a cgroups section in netdata.conf (get it from your running server http://your.netdata:19999/netdata.conf). Could you please post it here? This is mine:
I didn't have access to |
It looks like this:
|
I see in your cgroupsfs there is a
Could you please post your |
However looking in more detail, it seems the files exist in the directories netdata already checks. Could you please set:
Start netdata, wait 30 seconds, stop it and give me
Remember to remove this flag before starting it again. It produces a lot of debugging output. |
Working on the same machine as @candlerb:
nsrc@brian:~$ cat /proc/self/mountinfo
18 23 0:17 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
19 23 0:4 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
20 23 0:6 / /dev rw,relatime - devtmpfs udev rw,size=8156856k,nr_inodes=2039214,mode=755
21 20 0:14 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
22 23 0:18 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=1633688k,mode=755
23 0 252:0 / / rw,relatime - ext4 /dev/dm-0 rw,errors=remount-ro,data=ordered
24 18 0:19 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
25 18 0:20 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
26 18 0:7 / /sys/kernel/debug rw,relatime - debugfs none rw
27 18 0:12 / /sys/kernel/security rw,relatime - securityfs none rw
29 24 0:22 / /sys/fs/cgroup/cpuset rw,relatime - cgroup cgroup rw,cpuset
30 24 0:23 / /sys/fs/cgroup/cpu rw,relatime - cgroup cgroup rw,cpu
31 24 0:24 / /sys/fs/cgroup/cpuacct rw,relatime - cgroup cgroup rw,cpuacct
32 24 0:25 / /sys/fs/cgroup/blkio rw,relatime - cgroup cgroup rw,blkio
33 24 0:26 / /sys/fs/cgroup/memory rw,relatime - cgroup cgroup rw,memory
34 24 0:27 / /sys/fs/cgroup/devices rw,relatime - cgroup cgroup rw,devices
35 24 0:28 / /sys/fs/cgroup/freezer rw,relatime - cgroup cgroup rw,freezer
36 24 0:29 / /sys/fs/cgroup/net_cls rw,relatime - cgroup cgroup rw,net_cls
37 24 0:30 / /sys/fs/cgroup/perf_event rw,relatime - cgroup cgroup rw,perf_event
38 24 0:31 / /sys/fs/cgroup/net_prio rw,relatime - cgroup cgroup rw,net_prio
39 24 0:32 / /sys/fs/cgroup/hugetlb rw,relatime - cgroup cgroup rw,hugetlb
40 24 0:33 / /sys/fs/cgroup/pids rw,relatime - cgroup cgroup rw,pids
28 18 0:21 / /sys/firmware/efi/efivars rw,relatime - efivarfs none rw
41 22 0:34 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
42 22 0:35 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
43 22 0:36 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
44 18 0:37 / /sys/fs/pstore rw,relatime - pstore none rw
45 23 8:2 / /boot rw,relatime - ext2 /dev/sda2 rw,block_validity,barrier,user_xattr,acl,stripe=4
46 23 252:3 / /data rw,noatime - ext4 /dev/mapper/nsrc-data rw,data=ordered
47 45 8:1 / /boot/efi rw,relatime - vfat /dev/sda1 rw,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
48 23 252:3 /lxd /var/lib/lxd rw,noatime - ext4 /dev/mapper/nsrc-data rw,data=ordered
49 24 0:38 / /sys/fs/cgroup/systemd rw,relatime - cgroup name=systemd rw,name=systemd
50 19 0:39 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime - binfmt_misc binfmt_misc rw
52 22 0:40 / /run/rpc_pipefs rw,relatime - rpc_pipefs rpc_pipefs rw
54 22 0:42 / /run/lxcfs/controllers rw,relatime - tmpfs tmpfs rw,size=100k,mode=700
55 54 0:38 / /run/lxcfs/controllers/name=systemd rw,relatime - cgroup name=systemd rw,name=systemd
56 54 0:33 / /run/lxcfs/controllers/pids rw,relatime - cgroup pids rw,pids
57 54 0:32 / /run/lxcfs/controllers/hugetlb rw,relatime - cgroup hugetlb rw,hugetlb
58 54 0:31 / /run/lxcfs/controllers/net_prio rw,relatime - cgroup net_prio rw,net_prio
59 54 0:30 / /run/lxcfs/controllers/perf_event rw,relatime - cgroup perf_event rw,perf_event
60 54 0:29 / /run/lxcfs/controllers/net_cls rw,relatime - cgroup net_cls rw,net_cls
61 54 0:28 / /run/lxcfs/controllers/freezer rw,relatime - cgroup freezer rw,freezer
62 54 0:27 / /run/lxcfs/controllers/devices rw,relatime - cgroup devices rw,devices
63 54 0:26 / /run/lxcfs/controllers/memory rw,relatime - cgroup memory rw,memory
64 54 0:25 / /run/lxcfs/controllers/blkio rw,relatime - cgroup blkio rw,blkio
65 54 0:24 / /run/lxcfs/controllers/cpuacct rw,relatime - cgroup cpuacct rw,cpuacct
66 54 0:23 / /run/lxcfs/controllers/cpu rw,relatime - cgroup cpu rw,cpu
67 54 0:22 / /run/lxcfs/controllers/cpuset rw,relatime - cgroup cpuset rw,cpuset
68 23 0:43 / /var/lib/lxcfs rw,nosuid,nodev,relatime - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
69 48 252:3 /lxd/shmounts /var/lib/lxd/shmounts rw,noatime shared:1 - ext4 /dev/mapper/nsrc-data rw,data=ordered
nsrc@brian:~$
I've also attached the debug. |
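The listing above contains two sets of cgroup mounts, one under /sys/fs/cgroup and one under /run/lxcfs/controllers. A minimal Python sketch for pulling only the cgroup mounts out of such a listing follows. The function name `cgroup_mounts` is hypothetical and this is not netdata's own code; it only assumes the standard /proc/self/mountinfo layout, where the filesystem type follows the `-` separator.

```python
def cgroup_mounts(mountinfo_text):
    """Return (mount_point, super_options) for every cgroup mount.

    Hypothetical helper for illustration, not part of netdata.
    """
    mounts = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # not a mountinfo record
        sep = fields.index("-")
        # Six fixed fields come before the separator; three follow it.
        if sep < 6 or len(fields) < sep + 4:
            continue
        mount_point = fields[4]
        fs_type = fields[sep + 1]
        super_options = fields[sep + 3]
        if fs_type == "cgroup":
            mounts.append((mount_point, super_options))
    return mounts
```

Feeding it the contents of /proc/self/mountinfo from this machine should list each controller twice, once per hierarchy, which makes the duplicate mount points easy to spot.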
Unfortunately it does not have the info I need. I have added cgroups devices and also the debug info I need in my private fork. Could you please clone and install https://github.com/ktsaou/netdata ? I need the debug info again. |
I think I know what is happening. User Can you check its permissions? |
Ah yes!
Also debug after building your private fork: |
Is there a configuration in lxd to create these directories with read permission to others? |
BTW, I added error logging to have such errors logged in error.log. |
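The mountinfo output earlier in the thread shows /run/lxcfs/controllers mounted with mode=700, so a process running as an unprivileged user cannot list it. A small Python sketch of that permission check, assuming the collector runs as a user that is neither the owner nor in the owning group; the helper name `others_can_traverse` is hypothetical:

```python
import os
import stat


def others_can_traverse(path):
    """True if 'other' users may both list and enter the directory.

    Hypothetical helper for illustration; it only inspects the mode bits
    and ignores ACLs and other access controls.
    """
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)
```

A directory with mode 700 fails this check, while one with mode 755 passes it.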
But I can't see how /run/lxcfs gets populated...
And the contents are different:
I will dig further for lxd config. |
Check your /proc/self/mountinfo. You have |
merged the error logging addition and the examination the |
I can't find the lxd magic which sets up this directory. I could raise an issue on lxc/lxcfs or lxc/lxd. However I don't understand: should netdata be looking at
Access to the former seems permitted to normal users, so I've tried configuring this explicitly:
but still there's no cgroups/containers section in the web UI. New debug: |
I openly admit to having no clue about how cgroups work.
|
Yay! it works with:
(Aside: for the first container I looked at, "CPU usage per core" is peaking above 250,000%, but that's a separate issue) |
cgroups are very simple (ridiculously simple is probably a better term), but the variations among container managers, kernel versions and inits (e.g. systemd) make it really chaotic. Every system is different.
netdata finds the cgroups mount points by examining
Now I see it parses the contents of the directories, but there is no
To understand how netdata works, this is the command netdata emulates:
find DIRECTORY -type f -a \( -name cpuacct.stat -o -name cpuacct.usage_percpu -o -name memory.stat -o -name blkio.io_service_bytes -o -name blkio.io_serviced -o -name blkio.throttle.io_service_bytes -o -name blkio.throttle.io_serviced -o -name blkio.io_merged -o -name blkio.io_queued \)
It does this in the 4 directories mentioned above. These 4 directories are detected from
For each file found, it tries to find the relative path in the cgroups hierarchy. Then it has some heuristics, based on my findings, about which cgroups are expected to be containers and which are not. These heuristics only enable or disable the given cgroup in netdata (you can override this decision in netdata.conf). |
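The search described above can be sketched in Python roughly as follows. This is an illustration of the `find` command and the relative-path step only, not netdata's actual implementation; the names `find_cgroup_files` and `cgroup_path` are hypothetical, and the container heuristics are not modelled.

```python
import os

# File names taken from the find command in this thread.
CGROUP_FILES = {
    "cpuacct.stat", "cpuacct.usage_percpu", "memory.stat",
    "blkio.io_service_bytes", "blkio.io_serviced",
    "blkio.throttle.io_service_bytes", "blkio.throttle.io_serviced",
    "blkio.io_merged", "blkio.io_queued",
}


def find_cgroup_files(directory):
    """Walk `directory` and return (matching files, unreadable dirs)."""
    found, unreadable = [], []

    def on_error(err):
        # os.walk reports directories it could not list (e.g. mode 700).
        unreadable.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(directory, onerror=on_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror `-type f`: regular files only, no symlinks.
            if name in CGROUP_FILES and os.path.isfile(path) \
                    and not os.path.islink(path):
                found.append(path)
    return sorted(found), unreadable


def cgroup_path(mount_point, file_path):
    """Relative path of the cgroup that owns `file_path` (an assumption
    about what 'relative path in the cgroups hierarchy' means here)."""
    return os.path.relpath(os.path.dirname(file_path), mount_point)
```

Collecting the unreadable directories separately makes a permissions problem visible instead of silently producing an empty result.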
Here's mountinfo:
Using your find command on /sys/fs/cgroup:
If I understand correctly, lxcfs is to give a simulated cgroup hierarchy inside a container, but on the host itself I presume talking directly to /sys/fs/cgroup is fine? |
it seems so... |
Have you tried it? |
Yes it is working, thank you. Maybe netdata's search could prefer /sys/fs/cgroup over the other alternatives? But otherwise, configuring it manually is not a major problem. Or I can raise the question over with lxd as to why /run/lxcfs is root-only. |
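The suggestion to prefer /sys/fs/cgroup could look something like the Python sketch below. It is only an illustration of the idea raised here, not how netdata selects its directories; `pick_cgroup_dir` and its parameters are hypothetical.

```python
import os


def pick_cgroup_dir(candidates, preferred="/sys/fs/cgroup", can_read=None):
    """Return the first readable candidate, preferring paths under
    `preferred`; return None if no candidate is readable.

    Hypothetical helper for illustration only.
    """
    if can_read is None:
        def can_read(path):
            return os.access(path, os.R_OK | os.X_OK)

    def is_preferred(path):
        return path == preferred or path.startswith(preferred + "/")

    readable = [p for p in candidates if can_read(p)]
    # Stable sort: preferred paths first, original order otherwise.
    readable.sort(key=lambda p: not is_preferred(p))
    return readable[0] if readable else None
```

With both mount points readable, the /sys/fs/cgroup path wins; if only the lxcfs path is readable, it is still used as a fallback.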
wow! a lot of containers! I guess the per core cpu should |
my bad. I had divided with |
merged |
Yes. We are running a training workshop, so there are lots of containers, one for each participant. Thanks for tracking the bug. |
ok. I am closing this. If you need more help, just post. |
Thank you so much for your help, and for writing such a fantastic tool. As it happens, the training workshop is on network monitoring and management. Netdata will certainly get a special mention 👍 |
nice! thanks! |
Platform: ubuntu 14.04, lxd from ppa, today's netdata from git
Problem: I am running a bunch of lxd containers but I don't see any of them in netdata output. I know there is support for lxc containers and docker.
I have tried uncommenting
# cgroups = yes
in /opt/netdata/etc/netdata/netdata.conf
but that didn't make a difference (in any case, the comment implies it's enabled by default).
Could it be that something was missing on my system when netdata was built, which stopped the plugin from being compiled?
Or is it that the cgroup hierarchy for lxd is different to lxc and docker? I attach the hierarchy info for one container called "pc36"
cgroup-pc36.txt
lxcfs-pc36.txt