systemd collector exports unavailable units referenced by other units #567

@JensErat

Host operating system: output of uname -a

# uname -a
Linux s415vm1206 3.12.69-60.64.35-default #1 SMP Thu Mar 23 13:28:20 UTC 2017 (4cebac5) x86_64 x86_64 x86_64 GNU/Linux
# systemctl --version
systemd 210
+PAM +LIBWRAP +AUDIT +SELINUX -IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ +SECCOMP +APPARMOR

node_exporter version: output of node_exporter -version

# /opt/node_exporter/node_exporter --version
node_exporter, version 0.14.0 (branch: master, revision: 840ba5dcc71a084a3bc63cb6063003c1f94435a6)
  build user:       root@bb6d0678e7f3
  build date:       20170321-12:12:54
  go version:       go1.7.5

Are you running node_exporter in Docker?

No, node_exporter is directly installed to host machine.

What did you do that produced an error?

This is something between a bug report and a change request. The node_exporter behavior is not actually wrong, but it does not seem reasonable either. The systemd collector exports units that are not installed but are referenced by other units, for example in an After= directive. This results in

  • unnecessary overhead in monitoring because of irrelevant metrics and
  • issues with alerting (because of services that seem down, but aren't even installed) -- you cannot simply rely on "if a systemd unit metric exists, the service also should be available" semantics any more in rule files.

For example, we do not have etcd.service installed, but systemd still lists it, with load state not-found:

# systemctl list-units --all etcd.service
UNIT         LOAD      ACTIVE   SUB  DESCRIPTION
etcd.service not-found inactive dead etcd.service

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.

because flanneld.service lists it in its After= dependencies:

# systemctl show flanneld.service | grep etcd.service
After=network.target network-online.target etcd.service systemd-journald.socket basic.target system.slice

What did you expect to see?

I do not expect to see metrics for a unit that is not even installed, i.e. one that systemd reports as not-found, especially since I cannot distinguish units that are not installed from units that are otherwise inactive.
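To make the expectation concrete, here is a minimal sketch of the filtering I have in mind. It is Python for illustration only, not node_exporter code; the function names `parse_list_units` and `exportable_units` are made up, and the parser only handles the plain `systemctl list-units --all` layout shown above.

```python
# Illustrative sketch, not node_exporter code. Function names are hypothetical.
SAMPLE = """\
UNIT         LOAD      ACTIVE   SUB  DESCRIPTION
etcd.service not-found inactive dead etcd.service

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
"""


def parse_list_units(output):
    """Parse unit rows from `systemctl list-units --all` text output."""
    units = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        # Unit names contain a dot (etcd.service); this skips the header,
        # the legend lines, the summary line and blank lines.
        if len(fields) < 4 or "." not in fields[0]:
            continue
        name, load, active, sub = fields[:4]
        units.append({"name": name, "load": load, "active": active, "sub": sub})
    return units


def exportable_units(units):
    """Drop units that systemd only knows about through references."""
    return [u for u in units if u["load"] != "not-found"]


units = parse_list_units(SAMPLE)
print(units)
print(exportable_units(units))
```

With the output from above, etcd.service is parsed as a unit but dropped by the filter, so no metrics would be exported for it.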

Alternatively, I can imagine another state "non-installed", in addition to the existing ones, in case anybody wants to monitor whether a unit file is actually installed. This might also be implemented by exporting a loaded metric with 0 = not found and 1 = loaded; it seems rather binary to me. I'm open to no solution or a completely different one, as this is not a use case for us anyway.
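A rough sketch of what such a loaded metric could look like, again in Python purely for illustration. The metric name `node_systemd_unit_loaded` is hypothetical and does not exist in node_exporter 0.14.0. The input mapping is assumed example data, and treating every load state other than `loaded` (systemd also has states such as masked or error) as 0 is my simplification.

```python
# Illustrative sketch. The metric name node_systemd_unit_loaded is hypothetical.
def loaded_metric_lines(load_states):
    """Render one 0/1 sample per unit from a {unit name: load state} mapping."""
    lines = []
    for name, load_state in sorted(load_states.items()):
        # Assumption: anything other than "loaded" is reported as 0.
        value = 1 if load_state == "loaded" else 0
        lines.append('node_systemd_unit_loaded{name="%s"} %d' % (name, value))
    return lines


# Assumed example input; flanneld.service is taken to be loaded here.
example = {"etcd.service": "not-found", "flanneld.service": "loaded"}
for line in loaded_metric_lines(example):
    print(line)
```

A rule file could then combine this series with node_systemd_unit_state to ignore units that are not installed.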

What did you see instead?

As a consequence, node_exporter lists the service as inactive:

# curl -s localhost:10255/metrics | grep etcd
node_systemd_unit_state{name="etcd.service",state="activating"} 0
node_systemd_unit_state{name="etcd.service",state="active"} 0
node_systemd_unit_state{name="etcd.service",state="deactivating"} 0
node_systemd_unit_state{name="etcd.service",state="failed"} 0
node_systemd_unit_state{name="etcd.service",state="inactive"} 1
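The output above suggests that the collector emits one series per known state and sets the one matching the unit's ACTIVE column to 1. The following Python sketch reproduces that pattern. It is my reading of the output, not the actual collector code, and `unit_state_lines` is a made-up name.

```python
# Illustrative sketch of the pattern visible in the metrics above.
STATES = ["activating", "active", "deactivating", "failed", "inactive"]


def unit_state_lines(name, active_state):
    """Render one sample per state; only the current state gets value 1."""
    return [
        'node_systemd_unit_state{name="%s",state="%s"} %d'
        % (name, state, int(state == active_state))
        for state in STATES
    ]


for line in unit_state_lines("etcd.service", "inactive"):
    print(line)
```

Since systemd reports the not-found unit as inactive, the result is indistinguishable from an installed but stopped service.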
