FR memory and cpu statistic per systemd unit #1136
Comments
So far we decided not to support cgroup metrics and to leave that to cAdvisor. @SuperQ What do you think? Tbh, I often thought that maybe cgroup metrics would be a better fit for the node-exporter than some rather specific stuff we already include.
I've been mostly fine with things that are systemd-specific in the systemd collector, but generic stuff like cgroup stats are probably best left to cAdvisor or some other stand-alone cgroup collector. For now, I think the decision for this is still no cgroup metrics in node_exporter.
I really miss memory and CPU (sys+user) totals per systemd unit in When I manually call cAdvisor is a rather heavy solution, and requires quite a bit of configuration to get everything working. It should be simpler than that.
While I understand you want something simpler, it's simply out of scope for the node_exporter. The node_exporter is for the node, not the applications running on the node. @povilasv has started work on https://github.com/povilasv/systemd_exporter. This will hopefully be a replacement for the systemd functionality in the node_exporter.
@SuperQ systemd_exporter actually looks excellent. My comment was mainly because there is already systemd functionality in node_exporter and some per-unit metrics there, so it was a natural extension to have these extra metrics (that are already easy to obtain from systemd). The systemd per-unit CPU and memory metrics are useful for monitoring and troubleshooting the node as a whole, i.e. to see which unit is consuming the most resources, or to know preemptively that the node is running out of some resource and needs resizing, or that some services need to move to separate machines. It is not necessarily about monitoring applications. In fact, some of the functionality provided by systemd monitoring can't even be easily done from inside the application (e.g. if the application is multi-process, like PostgreSQL and others). Is there a plan to remove systemd support from node_exporter? I guess it makes sense not to duplicate functionality. Just curious.
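To illustrate the point that these per-unit totals are easy to obtain from systemd, here is a minimal Python sketch that reads them with `systemctl show`. It assumes a systemd version that exposes the `CPUUsageNSec` and `MemoryCurrent` unit properties and that CPU and memory accounting are enabled for the unit; the helper names `parse_show_output` and `unit_usage` are hypothetical and not part of node_exporter or systemd_exporter.

```python
import subprocess

# Values systemd may print when accounting is disabled or unsupported
# (treated here as "no data"; this set is an assumption of the sketch).
UNSET_VALUES = {"[not set]", "18446744073709551615", ""}


def parse_show_output(text):
    """Parse KEY=VALUE lines from `systemctl show` into {key: int or None}."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not KEY=VALUE
        value = value.strip()
        if value in UNSET_VALUES:
            result[key] = None
            continue
        try:
            result[key] = int(value)
        except ValueError:
            result[key] = None  # non-numeric value: treat as unavailable
    return result


def unit_usage(unit):
    """Return cumulative CPU time (ns) and current memory (bytes) for a unit."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=CPUUsageNSec,MemoryCurrent"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_show_output(out)
```

On a host with accounting enabled, `unit_usage("postgresql.service")` would return a dict with the two properties; a `None` value means systemd reported no data for that property.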
The systemd support in the node_exporter is to expose things that are systemd-specific, like unit states or the number of systemd restart actions. Monitoring process CPU and memory is not really systemd-specific, it's process-specific. Hence, it is out of scope and better done with the process itself. For example, I don't need to get CPU and memory stats from CoreDNS, as the process is already monitored by Prometheus and has those metrics. For the PostgreSQL example, it would be interesting to have the CPU and memory functionality be part of the postgres_exporter. I've thought about doing this with the mysqld_exporter as an optional collector. Otherwise, it would be better to get these metrics using the process-exporter or cAdvisor. For example, we use the process-exporter to monitor PostgreSQL resource use.
@SuperQ Thanks for your input. I do not want to turn this into a long discussion, especially as this issue is closed, but... I am aware of postgres_exporter and process-exporter, and in fact I do use them. The problem with process-exporter is that it monitors processes, but PostgreSQL creates many processes, many of which are created and destroyed rapidly. Because process-exporter is based on sampling procfs, it will miss many of these processes completely (process creation/destruction notifications via inotify, netlink, or similar could help, but they have system-wide overhead). Additionally, the reported cpu_seconds_total will often be incorrect even if it doesn't miss these processes, because to correctly calculate a rate over this (cumulative) metric (i.e. to calculate CPU-seconds per second), you need two points in time that can be subtracted reliably (i.e. they need to be reasonable sums themselves, constructed from monotonic quantities). To do that, process-exporter does quite a bit of number and statistics juggling: it keeps stats for every matched process and thread, and on every collection it rereads the per-thread stats, calculates differences, and creates a "fake" accumulated value, so that it can be used as a metric and have a rate taken over it. This still means short-lived threads and processes will be missed, and it is still not 100% immune to other issues like PID reuse. As of now, I do not trust the CPU usage stats from process-exporter at all. All of this also has quite a bit of overhead (it scales poorly with the number of processes and threads). It kind of works, but it is a bit silly, because the kernel, or more specifically the cgroups created by systemd, have this information cheaply available, and the values provided there are precise and strict. Additionally (right now at least, as I am going to open an issue in process-exporter to fix it), it is not possible to monitor all processes for a single uid (i.e. not specify a cmd or exec name, but instead use a specific uid / username), so if there is some PostgreSQL extension that executes external UNIX processes that do not conform to a specific prefix, they will not be monitored at all, even with an elaborate regexp. The same goes for other use cases, e.g. monitoring a web server with CGI scripts, or entire user sessions. It is also important because if some other user on the machine runs an executable named just like my server, it will influence the metrics of unrelated servers and processes. Update: ncabatoff/process-exporter#86 This was totally off-topic, though. I checked systemd_exporter, and unfortunately it does an even worse job than process-exporter at properly tracking CPU usage, at least right now. (It only monitors the main PID in the systemd service, and there is a comment in the sources to "add all process", which is also the wrong approach.) I will take this issue to the systemd_exporter authors: prometheus-community/systemd_exporter#1
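The claim that the unit's cgroup already holds a cheap, monotonic CPU counter can be sketched as follows. This is a minimal illustration, not code from any exporter: it assumes the cgroup v2 unified hierarchy mounted at `/sys/fs/cgroup`, a unit placed directly under `system.slice`, and enabled CPU and memory controllers. On cgroup v1 hosts (likely including the 4.4 kernel in the report) the equivalent files are `cpuacct.usage` and `memory.usage_in_bytes` under per-controller mount points. The function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def read_unit_cgroup(unit, root="/sys/fs/cgroup", slice_name="system.slice"):
    """Read cumulative CPU seconds and current memory bytes for a unit.

    The counter covers every process that ever ran in the cgroup,
    including short-lived ones that have already exited.
    """
    cg = Path(root) / slice_name / unit
    cpu = parse_cpu_stat((cg / "cpu.stat").read_text())
    memory = int((cg / "memory.current").read_text())
    return {
        "cpu_seconds_total": cpu["usage_usec"] / 1e6,  # microseconds -> seconds
        "memory_bytes": memory,
    }


def cpu_rate(prev_seconds, cur_seconds, interval_seconds):
    """CPU-seconds per second between two samples of the cumulative counter."""
    return (cur_seconds - prev_seconds) / interval_seconds
```

Because the kernel maintains the counter per cgroup, two samples can be subtracted directly, with no per-process bookkeeping and no exposure to PID reuse.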
@baryluk Yea, for things like that, I usually use cAdvisor to get the metrics from the cgroups. We have an API service that calls out to
@baryluk systemd_exporter is at a really early stage, so don't write it off yet :) Right now it exposes process metrics, because that was the fastest way to go (Prometheus already has a procfs lib for that).
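For contrast, the procfs-based, main-PID-only approach discussed above roughly amounts to the following. This is a hedged Python sketch of the general technique, not systemd_exporter's actual code (which uses the Prometheus procfs library in Go); the function names are hypothetical. It reads `utime` and `stime` (fields 14 and 15 of `/proc/<pid>/stat`) for a single process, so CPU time spent by child or sibling processes in the same unit is not included.

```python
import os


def parse_proc_stat_cpu(stat_line, ticks_per_second):
    """Return user+system CPU seconds from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    utime_ticks = int(rest[11])
    stime_ticks = int(rest[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


def main_pid_cpu_seconds(pid):
    """Cumulative CPU seconds for a single PID (children not included)."""
    with open(f"/proc/{pid}/stat") as f:
        line = f.read()
    return parse_proc_stat_cpu(line, os.sysconf("SC_CLK_TCK"))
```

For a multi-process service, this value covers only the main process, which is why the thread points to cgroup accounting for per-unit totals.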
@povilasv Here is some existing C code that does this: https://github.com/cavaliercoder/zabbix-module-systemd/tree/master/src/modules/systemd
Host operating system: output of
uname -a
Linux stage-br-31-node-1 4.4.0-131-generic #157-Ubuntu SMP Thu Jul 12 15:51:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of
node_exporter --version
node_exporter, version 0.15.2 (branch: HEAD, revision: 98bc64930d34878b84a0f87dfe6e1a6da61e532d) build user: root@d5c4792c921f build date: 20171205-14:50:53 go version: go1.9.2
node_exporter command line flags
opt/prometheus_exporters/node_exporter --web.listen-address=0.0.0.0:9100 --collector.conntrack --collector.diskstats --collector.entropy --collector.filefd --collector.filesystem --collector.loadavg --collector.mdadm --collector.meminfo --collector.netdev --collector.netstat --collector.sockstat --collector.stat --collector.textfile --collector.time --collector.uname --collector.vmstat --collector.ntp --collector.tcpstat --collector.systemd
Are you running node_exporter in Docker?
No
What did you do that produced an error?
There is no metric about memory and cpu usage per systemd unit
What did you expect to see?
Similar to https://github.com/cavaliercoder/zabbix-module-systemd
What did you see instead?
Nothing