FR memory and cpu statistic per systemd unit #1136

pastukhov · 2018-10-29T13:10:51Z

Host operating system: output of `uname -a`

Linux stage-br-31-node-1 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of `node_exporter --version`

node_exporter, version 0.15.2 (branch: HEAD, revision: 98bc64930d34878b84a0f87dfe6e1a6da61e532d) build user: root@d5c4792c921f build date: 20171205-14:50:53 go version: go1.9.2

node_exporter command line flags

opt/prometheus_exporters/node_exporter --web.listen-address=0.0.0.0:9100 --collector.conntrack --collector.diskstats --collector.entropy --collector.filefd --collector.filesystem --collector.loadavg --collector.mdadm --collector.meminfo --collector.netdev --collector.netstat --collector.sockstat --collector.stat --collector.textfile --collector.time --collector.uname --collector.vmstat --collector.ntp --collector.tcpstat --collector.systemd

Are you running node_exporter in Docker?

No

What did you do that produced an error?

There is no metric about memory and cpu usage per systemd unit

What did you expect to see?

Similar to https://github.com/cavaliercoder/zabbix-module-systemd

What did you see instead?

Nothing

discordianfish · 2018-11-05T15:31:41Z

So far we decided to not support cgroup metrics and leave that to cadvisor.

@SuperQ What do you think? Tbh, I often thought that maybe cgroup metrics would be a better fit for the node-exporter than some rather specific stuff we already include..
On the other hand, maybe a complex system like systemd warrants its own dedicated exporter (or even direct instrumentation).

SuperQ · 2018-11-07T16:28:45Z

I've been mostly fine with things that are systemd-specific in the systemd collector, but generic stuff like cgroup stats are probably best left to cAdvisor or some other stand-alone cgroup collector.

For now, I think the decision for this is still no cgroup metrics in node_exporter.

baryluk · 2019-03-22T12:46:10Z

I really miss memory and CPU (sys+user) totals per systemd unit in node_exporter. node exporter already exports various per systemd unit stats like number of tasks.

When I manually call `systemctl status <unit>`, it already reports Memory and CPU to me, so if there is an API to get this from systemd, node exporter doesn't even need to mess with cgroups directly at all.
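As a rough illustration of that idea, here is a minimal sketch that reads the same counters through `systemctl show` instead of touching cgroups. It assumes the installed systemd exposes the `MemoryCurrent` and `CPUUsageNSec` unit properties and that accounting is enabled for the unit; the function names are hypothetical and this is not how node_exporter does anything.

```python
import subprocess

# Unit properties requested from systemd. Their availability depends on the
# systemd version and on accounting being enabled (an assumption here).
PROPERTIES = ("MemoryCurrent", "CPUUsageNSec")

# Some systemd versions print the maximum uint64 value when a counter is unset.
UINT64_MAX = 2**64 - 1


def parse_show_output(text):
    """Parse KEY=VALUE lines from `systemctl show` into a dict.

    Values that are not usable integers (for example "[not set]") become None.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key not in PROPERTIES:
            continue  # ignore properties we did not ask for
        number = int(value) if value.isdigit() else None
        stats[key] = None if number == UINT64_MAX else number
    return stats


def unit_stats(unit):
    """Ask systemd for the memory and CPU counters of one unit."""
    cmd = ["systemctl", "show", unit, "--property=" + ",".join(PROPERTIES)]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_show_output(result.stdout)
```

Calling `unit_stats("postgresql.service")` on a systemd host would return a dict with bytes of memory and nanoseconds of CPU time, or `None` for counters that systemd does not track for that unit.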

cAdvisor is a rather heavy solution, and requires quite a bit of configuration to get everything working. It should be simpler than that.

SuperQ · 2019-03-22T13:07:53Z

While I understand you want something simpler, it's simply out of scope for the node_exporter. The node_exporter is for the node, not the applications running on the node.

@povilasv has started work on https://github.com/povilasv/systemd_exporter. This will hopefully be a replacement for the systemd functionality in the node_exporter.

baryluk · 2019-03-22T13:26:40Z

@SuperQ systemd_exporter actually looks excellent.

My comment was mainly because there is already systemd functionality in node_exporter and some per-unit metrics there, so it was a natural extension to have these extra metrics (that are already easy to obtain from systemd). The systemd per-unit CPU and memory are useful for monitoring and troubleshooting the node as a whole, i.e. to see which unit is consuming the most resources, or to know preemptively that the node is running out of some resource and needs resizing or moving some services to separate machines. It is not necessarily about monitoring applications. In fact, some of the functionality provided by systemd monitoring can't even be easily done from inside the application (i.e. if the application is multi-process, for example postgresql, and others).

Is there a plan to remove systemd support from node_exporter? I guess it makes sense to not duplicate functionality. Just curious.

SuperQ · 2019-03-23T14:15:16Z

The systemd support in the node_exporter is to expose things that are systemd-specific. Like unit states or the number of systemd restart actions. Monitoring process CPU and memory is not really systemd-specific, it's process-specific. Hence, out of scope and better done with the process itself. For example, I don't need to get CPU and memory stats from CoreDNS, as the process is already monitored by Prometheus and has those metrics.

For the PostgreSQL example, it would be interesting to have CPU and memory monitoring as part of the postgres_exporter. I've thought about doing this with the mysqld_exporter as an optional collector.

Otherwise, it would be better to get these metrics using the process-exporter, or cAdvisor.

For example, we use the process-exporter to monitor PostgreSQL resource use.

baryluk · 2019-03-23T18:29:27Z

@SuperQ Thanks for your input. I do not want to make this into long discussion, especially as this issue here is closed, but...

I am aware of postgres_exporter and process-exporter, and in fact I do use them.

The problem with process-exporter is that it monitors processes, but postgresql creates many processes, many of which are created and destroyed rapidly. Because process-exporter is based on sampling procfs, it will miss many of these processes completely (process creation/destruction notifications via inotify/netlink or whatever could help, but that has system-wide overhead). Additionally, the reported cpu_seconds_total will often be incorrect even if it doesn't miss these processes, because to correctly calculate a rate over this (cumulative) metric (i.e. to calculate cpu-seconds-per-second), you need two points in time that can be subtracted reliably (i.e. they need to be reasonable sums themselves, constructed from monotonic quantities). To do that, process-exporter does quite a bit of number and statistics juggling, i.e. it keeps stats for every matched process and thread, and on every collection it rereads per-thread stats, calculates differences, and creates a "fake" accumulated value, so that it can be used as a metric and have a rate taken of it. This still means short-lived threads and processes will be missed, and it is still not 100% immune to other issues like PID reuse. As of now, I do not trust the CPU usage stats from process-exporter at all. All of this also has quite a bit of overhead (it scales poorly with the number of processes and threads). It kind of works, but it is a bit silly, because the kernel, or more specifically the cgroups created by systemd, have this information cheaply available, and the values provided there are precise and strict.
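To make the rate argument concrete, here is a minimal sketch of computing cpu-seconds-per-second from two samples of a cgroup's cumulative CPU counter. It assumes the cgroup v2 `cpu.stat` format with a `usage_usec` line; on cgroup v1 hosts the equivalent counter is `cpuacct.usage`, in nanoseconds. The function names are hypothetical.

```python
def parse_cpu_stat(text):
    """Return total CPU time in seconds from cgroup v2 `cpu.stat` content."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "usage_usec":
            return int(fields[1]) / 1_000_000  # microseconds to seconds
    raise ValueError("usage_usec not found")


def cpu_rate(earlier, later):
    """CPU-seconds per second between two (timestamp, cpu_seconds) samples.

    Both samples must come from the same monotonic counter, which is what
    makes the subtraction meaningful.
    """
    (t0, c0), (t1, c1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be ordered in time")
    if c1 < c0:
        # A decreasing value means the counter was reset, e.g. the cgroup
        # was recreated when the unit restarted.
        raise ValueError("counter went backwards")
    return (c1 - c0) / (t1 - t0)
```

Because the kernel accumulates the counter for the whole cgroup, processes that start and exit between the two samples are still included in the difference, with no per-process bookkeeping needed.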

Additionally (right now at least, as I am going to open an issue in process-exporter to fix it), it is not possible to monitor all processes for a single uid (i.e. not specify a cmd or exec name, but instead use a specific uid / username), so if there is some postgresql extension that executes external UNIX processes that do not conform to a specific prefix, they will not be monitored at all, even with an elaborate regexp. The same goes for other use cases, i.e. monitoring a webserver with cgi scripts, or entire user sessions, etc. It is also important because if some other user on the machine runs an executable named just like my server, it will influence the metrics of unrelated servers and processes. Update: ncabatoff/process-exporter#86

This was totally offtopic tho.

I checked systemd_exporter, and unfortunately it does an even worse job than process-exporter of properly tracking CPU usage, at least right now. (It only monitors the main PID in the systemd service, and there is a comment in the sources to "add all process", which is also the wrong approach.) I will take this issue to the systemd_exporter authors - prometheus-community/systemd_exporter#1

SuperQ · 2019-03-23T18:57:44Z

@baryluk Yea, for things like that, I usually use cAdvisor to get the metrics from the cgroups. We have an API service that calls out to git commands. We use cAdvisor to watch these.

povilasv · 2019-03-25T09:24:47Z

@baryluk systemd exporter is really early stage, so don't write it off yet :)
The plan is to support cgroups, but not sure what systemd gives back from dbus.

Right now it exposes process metrics, because it was fastest way to go (Prometheus already has procfs lib for that).

pastukhov · 2019-03-25T14:07:20Z

@povilasv Here is some existing C code for it: https://github.com/cavaliercoder/zabbix-module-systemd/tree/master/src/modules/systemd

SuperQ closed this as completed Nov 7, 2018

SuperQ mentioned this issue Apr 8, 2019

collector/systemd: Add metrics for resource usage and limits #1311

Closed
