Silent failure on inability to connect to systemd #112

Open
robryk opened this issue Nov 13, 2023 · 0 comments · May be fixed by #113

Comments

robryk commented Nov 13, 2023

Observations

I've managed to start systemd_exporter in such a way that it can't talk to systemd (I haven't figured out what's wrong yet, but that is not the topic of this bug report). It repeatedly prints the following on its stderr:

Nov 13 23:40:58 howl systemd_exporter[1542322]: ts=2023-11-13T22:40:58.025Z caller=systemd.go:225 level=error msg="error collecting metrics" err="couldn't get dbus connection: read unix @->/run/dbus/system_bus_socket: recvmsg: connection reset by peer"

However, there is no trace of those errors in the exported metrics. Apart from the promhttp_* metrics and systemd_exporter_build_info, it exports almost nothing:

# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 9
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
# HELP systemd_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which systemd_exporter was built, and the goos and goarch for the build.
# TYPE systemd_exporter_build_info gauge
systemd_exporter_build_info{branch="unknown",goarch="amd64",goos="linux",goversion="go1.21.3",revision="unknown",tags="unknown",version="0.6.0"} 1

My expectation

In cases where systemd_exporter can't fulfill any part of its function, I would expect it to either:

  • return HTTP 5xx error on /metrics, or
  • export a "health" metric and set it to false in such cases.

The reason for this is that currently it's easy to confuse the "all units are healthy and using reasonable amounts of resources" situation with this error: if one writes alert rules that look for evidence of problems, they will find none in this situation.

jpds added a commit to jpds/systemd_exporter that referenced this issue Nov 13, 2023
Fixes: prometheus-community#112

Signed-off-by: Jonathan Davies <jpds@protonmail.com>
@jpds jpds linked a pull request Nov 13, 2023 that will close this issue
jpds added a commit to jpds/systemd_exporter that referenced this issue Nov 17, 2023
Fixes: prometheus-community#112

Signed-off-by: Jonathan Davies <jpds@protonmail.com>