Description
Host operating system: output of uname -a
# uname -a
Linux s415vm1206 3.12.69-60.64.35-default #1 SMP Thu Mar 23 13:28:20 UTC 2017 (4cebac5) x86_64 x86_64 x86_64 GNU/Linux
# systemctl --version
systemd 210
+PAM +LIBWRAP +AUDIT +SELINUX -IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ +SECCOMP +APPARMOR
node_exporter version: output of node_exporter -version
# /opt/node_exporter/node_exporter --version
node_exporter, version 0.14.0 (branch: master, revision: 840ba5dcc71a084a3bc63cb6063003c1f94435a6)
build user: root@bb6d0678e7f3
build date: 20170321-12:12:54
go version: go1.7.5
Are you running node_exporter in Docker?
No, node_exporter is directly installed to host machine.
What did you do that produced an error?
This is something in between a bug report and a change request. The node_exporter behavior is not actually wrong, but it does not seem reasonable either. The systemd collector exports units that are not installed on the host but are referenced by other units, for example in After= dependencies. This results in
- unnecessary overhead in monitoring because of irrelevant metrics and
- issues with alerting (because of services that seem down, but aren't even installed) -- you cannot simply rely on "if a systemd unit metric exists, the service also should be available" semantics any more in rule files.
For example, etcd.service is not installed on our host, but systemd still lists it with load state not-found:
# systemctl list-units --all etcd.service
UNIT         LOAD      ACTIVE   SUB  DESCRIPTION
etcd.service not-found inactive dead etcd.service
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
because flanneld.service lists it as an After= dependency:
# systemctl show flanneld.service | grep etcd.service
After=network.target network-online.target etcd.service systemd-journald.socket basic.target system.slice
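The not-found row in the `systemctl list-units --all` output above can be picked out mechanically. The following is a minimal Python sketch, not part of node_exporter; the function name `not_found_units` and the flanneld row in the sample are made up for illustration, and the handling of a leading `●` marker assumes the format printed by newer systemctl versions.

```python
from typing import List


def not_found_units(listing: str) -> List[str]:
    """Return the names of units whose LOAD column is 'not-found'."""
    missing = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: newer systemctl versions prefix some rows with a bullet.
        if fields and fields[0] == "●":
            fields = fields[1:]
        # Unit rows carry at least UNIT, LOAD, ACTIVE and SUB columns;
        # header and legend lines never have 'not-found' in column two.
        if len(fields) >= 4 and fields[1] == "not-found":
            missing.append(fields[0])
    return missing


# Sample input: the etcd row is taken from the report, the flanneld row is hypothetical.
sample = """\
UNIT LOAD ACTIVE SUB DESCRIPTION
etcd.service not-found inactive dead etcd.service
flanneld.service loaded active running flanneld
LOAD = Reflects whether the unit definition was properly loaded.
"""

print(not_found_units(sample))
```

Run against real `systemctl list-units --all` output, this would list every unit that is only known to systemd because some other unit references it.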
What did you expect to see?
I do not expect to see metrics for a unit that is not even installed, i.e. one that systemd reports as not-found, especially as I cannot distinguish non-installed units from units that are inactive for other reasons.
Alternatively, I can imagine an additional state, "non-installed", alongside the existing ones, in case anybody wants to monitor whether a unit file is actually installed. This could also be implemented by exposing the load state as a metric with 0 = not found and 1 = loaded; it seems rather binary to me. I'm open to no solution at all or to a completely different one, since this is not a use case for us anyway.
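To make the two options concrete, here is a rough Python sketch of both: dropping not-found units, and exposing a separate binary load metric. This is only an illustration of the idea, not how node_exporter is implemented. The `Unit` type, both function names, and the metric name `node_systemd_unit_loaded` are hypothetical; only `node_systemd_unit_state` and its five state labels come from the actual output.

```python
from typing import Iterable, List, NamedTuple


class Unit(NamedTuple):
    name: str
    load_state: str    # e.g. "loaded" or "not-found"
    active_state: str  # e.g. "active", "inactive", "failed"


# The state labels that appear in the exported metrics.
STATES = ["activating", "active", "deactivating", "failed", "inactive"]


def unit_state_metrics(units: Iterable[Unit], skip_not_found: bool = True) -> List[str]:
    """Option 1: emit state metrics, optionally dropping not-found units."""
    lines = []
    for unit in units:
        if skip_not_found and unit.load_state == "not-found":
            continue
        for state in STATES:
            value = 1 if unit.active_state == state else 0
            lines.append(
                f'node_systemd_unit_state{{name="{unit.name}",state="{state}"}} {value}'
            )
    return lines


def unit_loaded_metrics(units: Iterable[Unit]) -> List[str]:
    """Option 2: a hypothetical binary metric, 1 = loaded, 0 = anything else."""
    return [
        f'node_systemd_unit_loaded{{name="{unit.name}"}} '
        f'{1 if unit.load_state == "loaded" else 0}'
        for unit in units
    ]


units = [
    Unit("etcd.service", "not-found", "inactive"),
    Unit("flanneld.service", "loaded", "active"),
]
print("\n".join(unit_state_metrics(units)))
print("\n".join(unit_loaded_metrics(units)))
```

With the first option the etcd.service series simply disappear; with the second they stay, but a rule file can join against the load metric to ignore units that are not installed.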
What did you see instead?
As a consequence, node_exporter reports the service as inactive:
# curl -s localhost:10255/metrics | grep etcd
node_systemd_unit_state{name="etcd.service",state="activating"} 0
node_systemd_unit_state{name="etcd.service",state="active"} 0
node_systemd_unit_state{name="etcd.service",state="deactivating"} 0
node_systemd_unit_state{name="etcd.service",state="failed"} 0
node_systemd_unit_state{name="etcd.service",state="inactive"} 1
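The effect on alerting can be sketched with a naive check over the output above. The Python below is an illustration only: the function name `units_not_active` is hypothetical, and it stands in for a rule that treats every unit whose `state="active"` sample is 0 as down.

```python
import re
from typing import List

# Matches one node_systemd_unit_state sample line as shown in the output above.
SAMPLE_RE = re.compile(
    r'^node_systemd_unit_state\{name="([^"]+)",state="([^"]+)"\} (\d+(?:\.\d+)?)$'
)


def units_not_active(exposition: str) -> List[str]:
    """Return units whose state="active" sample has the value 0."""
    down = set()
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match and match.group(2) == "active" and float(match.group(3)) == 0:
            down.add(match.group(1))
    return sorted(down)


metrics = """\
node_systemd_unit_state{name="etcd.service",state="activating"} 0
node_systemd_unit_state{name="etcd.service",state="active"} 0
node_systemd_unit_state{name="etcd.service",state="deactivating"} 0
node_systemd_unit_state{name="etcd.service",state="failed"} 0
node_systemd_unit_state{name="etcd.service",state="inactive"} 1
"""

print(units_not_active(metrics))
```

Such a check flags etcd.service as down even though it was never installed, because nothing in the exported series distinguishes a not-found unit from an inactive one.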