
Add new Network UPS Tools exporter to the exporters list #1749

Merged
brian-brazil merged 1 commit into prometheus:master from DRuggeri:master
Oct 1, 2020
Conversation

@DRuggeri
Contributor

Another COVID project as I work on the network and instrument more and more of it.

Feedback always welcome

Signed-off-by: Daniel Ruggeri <druggeri@primary.net>
@brian-brazil
Contributor

brian-brazil commented Sep 23, 2020

Could this supersede the apcupsd we currently list, or is it doing something different?

Some advice to help improve your exporter:
You're using direct instrumentation, which is generally racy and incorrect for exporters; you should be using MustNewConstMetric.
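A minimal sketch of that const-metric pattern with the Go client library, assuming a hypothetical upsCollector type; the metric name and value are illustrative, not taken from the exporter:

```go
package collectors

import "github.com/prometheus/client_golang/prometheus"

// upsCollector is a hypothetical collector used only for illustration.
type upsCollector struct {
	voltageDesc *prometheus.Desc
}

func NewUPSCollector() *upsCollector {
	return &upsCollector{
		voltageDesc: prometheus.NewDesc(
			"network_ups_tools_battery_voltage",
			"Battery voltage reported by the UPS",
			nil, nil,
		),
	}
}

func (c *upsCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.voltageDesc
}

// Collect reads fresh values on every scrape and emits them as const metrics,
// so no shared state is mutated across concurrent scrapes.
func (c *upsCollector) Collect(ch chan<- prometheus.Metric) {
	value := 13.2 // in a real collector, read this from the NUT daemon here
	ch <- prometheus.MustNewConstMetric(c.voltageDesc, prometheus.GaugeValue, value)
}
```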

variable should be part of the metric name, not a label.

The various last_scrape metrics aren't very useful, I'd suggest removing them. I'd also suggest not reporting an error with a gauge, but instead failing the scrape. You're also not catching when the 2nd RPC fails.

The _device metric is an info metric, so should end with _info. It also seems to be missing many of the labels I'd expect it to have given the NUT docs - I'd expect everything bar uptime.

Why not collect everything, rather than having the user provide a list?

@beorn7 added the "exporters and integrations" label (Requests for new entries in the list of exporters and integrations) on Sep 23, 2020
@DRuggeri
Contributor Author

  • Could this supersede the apcupsd we currently list, or is it doing something different?

Yes, though it may depend on the use case. That's why I created this exporter (I'm changing UPS vendors). The apcupsd_exporter does have access to data that is not visible to NUT - presumably due to proprietary or undocumented protocol data/levels of NUT support. Some examples I came across are time in seconds on UPS, number of transfers to battery, and remaining runtime on battery.

  • Some advice to help improve your exporter:
  • You're using direct instrumentation, which is generally racy and incorrect for exporters; you should be using MustNewConstMetric.

I know we discussed this in a separate PR and I guess I built this from an old template. I've switched to the Desc/MustNewConst pattern. I also made sure that all logic from an invocation of Collect is encapsulated to avoid race conditions (i.e. if there are concurrent scrapes, each results in its own session with the UPS daemon rather than stepping on a common session).

  • variable should be part of the metric name, not a label.

I suppose this could be done by gathering the requested variables during init and then building metrics dynamically from the list, but each UPS will have an unknown set of variables available depending on level of support, NUT driver, and UPS "openness". In the case of a hard-coded list of variables to export, that's not too hard. However, in the case that the administrator wants all variables by setting the nut.vars_enable value to an empty string, it would require during init to reach out to the UPS and find out what is available, right? That's doable, I guess, but can you say a bit more about the benefits of putting the variable name in the metric name rather than being a label dimension?

  • The various last_scrape metrics aren't very useful, I'd suggest removing them. I'd also suggest not reporting an error with a gauge, but instead failing the scrape.

OK - I've gone ahead and removed them. They are leftovers from the exporter that I built this one from. What is the idiomatic way to fail the scrape, though, and how does one detect the failure from the alertmanager side of things? The Collect method doesn't return an error, for example, and this target would still appear up in Prometheus, right?

  • You're also not catching when the 2nd RPC fails.

I think you're referring to the population of upsList. This should be OK since a failure there leaves an empty slice negating all the logic in the range loop. LMK if I'm not tracking to what you mean or if this was related to the scrape error metric which has since been deleted.

  • The _device metric is an info metric, so should end with _info. It also seems to be missing many of the labels I'd expect it to have given the NUT docs - I'd expect everything bar uptime.

I've renamed it in 3b8fe22e62d8b43afd40067db37d73cb6edbe273. Can you say more about the labels you expect, but are missing? These are held in the deviceLabels slice and are populated during each scrape. Currently the slice mirrors what is in the NUT doco ("model", "mfr", "serial", "type", "description", "contact", "location", "part", "macaddr", "uptime")

  • Why not collect everything, rather than having the user provide a list?

Oops - this is already possible, I just forgot to mention it in the README (fixed in 58135dbb3dce25790e1091a2772e8f2be2950048). This can be done by setting nut.vars_enable to an empty string. Generally speaking, I prefer to give the administrator the option of exporting only the time series data they care about (i.e. if they don't care about efficiency metrics, don't dump that into the TSDB).

@brian-brazil
Contributor

it would require during init to reach out to the UPS and find out what is available, right?

That's making it more complicated than it needs to be, you would usually build up a map as you go. However the Go client will do all of this for you when you're using MustNewConstMetric.
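A sketch of what that could look like inside Collect, assuming a hypothetical vars map of NUT variable names (e.g. "battery.charge") to values that has already been read from the daemon; no pre-registration of metrics is needed:

```go
package collectors

import (
	"strings"

	"github.com/prometheus/client_golang/prometheus"
)

// collectVars is illustrative only: it derives a metric name from each NUT
// variable name and emits a const metric for it on the fly.
func collectVars(ch chan<- prometheus.Metric, vars map[string]float64) {
	for name, value := range vars {
		metricName := "network_ups_tools_" + strings.ReplaceAll(name, ".", "_")
		desc := prometheus.NewDesc(metricName, "Value of NUT variable "+name, nil, nil)
		ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, value)
	}
}
```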

can you say a bit more about the benefits of putting the variable name in the metric name rather than being a label dimension?

You're making things much more difficult and error prone for users to query for no potential benefit.

What is the idiomatic way to fail the scrape, though, and how does one detect the failure from the alertmanager side of things?

https://www.robustperception.io/failing-a-scrape-with-the-prometheus-go-client
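The core of the approach described there, as a variant of the hypothetical upsCollector's Collect method sketched above (fetchFromNUT is a made-up stand-in for the real call to the NUT daemon):

```go
// Collect fails the whole scrape if data collection fails: sending an invalid
// metric makes the registry's Gather return an error, so the promhttp handler
// replies with HTTP 500 and Prometheus records up == 0 for the target.
func (c *upsCollector) Collect(ch chan<- prometheus.Metric) {
	vars, err := fetchFromNUT() // hypothetical helper that queries the NUT daemon
	if err != nil {
		ch <- prometheus.NewInvalidMetric(
			prometheus.NewDesc("network_ups_tools_scrape_error",
				"Error while talking to the NUT daemon", nil, nil),
			err,
		)
		return
	}
	collectVars(ch, vars) // emit the usual const metrics
}
```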

This should be OK since a failure there leaves an empty slice negating all the logic in the range loop.

Silently missing data is not good, it's simplest to fail the scrape if any part of the data collection fails.

Can you say more about the labels you expect, but are missing?

Everything bar uptime from that list, as uptime is both a number and changing over time.

Generally speaking, I prefer to give the administrator the option of exporting only the time series data they care about (i.e. if they don't care about efficiency metrics, don't dump that into the TSDB).

Personally, as there are only ~200 of them, which is small, I'd keep it simple and always expose everything. You're not avoiding any load on the target, you're not going to have that many UPSes, and if someone really wants to micro-optimise for some reason they can use metric relabelling.

You also should decide if this is a single-target or multi-target exporter, in particular why is Ups an option? If it should be configurable by the user then it should come from a URL parameter rather than a config file, and the target label device should not be added in, as service discovery is Prometheus's responsibility.

https://github.com/DRuggeri/nut_exporter/blob/master/collectors/nut_collector.go#L86

You should leave it as the empty string if it's missing, generally try to keep the data as raw as you can.

@DRuggeri
Contributor Author

it would require during init to reach out to the UPS and find out what is available, right?

That's making it more complicated than it needs to be, you would usually build up a map as you go. However the Go client will do all of this for you when you're using MustNewConstMetric.
Hah - yep - that makes sense. Thanks for connecting the dots. I've made this update.

can you say a bit more about the benefits of putting the variable name in the metric name rather than being a label dimension?

You're making things much more difficult and error prone for users to query for no potential benefit.

What is the idiomatic way to fail the scrape, though, and how does one detect the failure from the alertmanager side of things?

https://www.robustperception.io/failing-a-scrape-with-the-prometheus-go-client

This should be OK since a failure there leaves an empty slice negating all the logic in the range loop.

Silently missing data is not good, it's simplest to fail the scrape if any part of the data collection fails.
OK - I've updated the scrape to emit an invalid metric as described. How would this manifest on the Prometheus side? For example, I'd like to emit an alert if the scrape fails. Do I look for the invalid metric being set?

Can you say more about the labels you expect, but are missing?

Everything bar uptime from that list, as uptime is both a number and changing over time.
I guess I still don't understand. These ARE in place regardless of being set or not, and are set on every scrape. See example output:

# HELP network_ups_tools_ups_device_info UPS Device information
# TYPE network_ups_tools_ups_device_info gauge
network_ups_tools_ups_device_info{contact="",description="",location="",macaddr="",mfr="American Power Conversion",model="Back-UPS ES 750G",part="",serial="4B1538P29631",type="ups",ups="ups"} 1

What am I missing?

You also should decide if this is a single-target or multi-target exporter, in particular why is Ups an option? If it should be configurable by the user then it should come from a URL parameter rather than a config file, and the target label device should not be added in, as service discovery is Prometheus's responsibility.
I would say it makes sense as a multi-target exporter (I've only worked with single-target exporters before). I'm a bit unclear how I gain access to URL parameters during Collect, though. I was examining snmp_exporter's code and realized that the multi-target exporter consumes a config file rather than URL params. Any pointers on how to get access to query string parameters during a scrape?

https://github.com/DRuggeri/nut_exporter/blob/master/collectors/nut_collector.go#L86

You should leave it as the empty string if it's missing, generally try to keep the data as raw as you can.
OK, easy enough. This has been changed (see the example above).

@brian-brazil
Contributor

How would this manifest on the Prometheus side? For example, I'd like to emit an alert if the scrape fails. Do I look for the invalid metric being set?

Up will be 0, as for any failed scrape.

I guess I still don't understand. These ARE in place regardless of being set or not, and are set on every scrape.

That looks right, I misread the code.

I'm a bit unclear how I gain access to URL parameters during Collect, though. I was examining snmp_exporter's code and realized that the multi-target exporter consumes a config file rather than URL params. Any pointers on how to get access to query string parameters during a scrape?

By design, you don't. Instead do what the snmp exporter does, create a collector and registry per request.
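A sketch of that per-request pattern, assuming a hypothetical NewUPSCollector(ups string) constructor like the one above, here taking the UPS name; the paths and port are illustrative:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// upsHandler builds a fresh registry and collector for every scrape and takes
// the UPS name from the query string, e.g. /ups_metrics?ups=myups.
func upsHandler(w http.ResponseWriter, r *http.Request) {
	ups := r.URL.Query().Get("ups")

	registry := prometheus.NewRegistry()
	registry.MustRegister(NewUPSCollector(ups)) // hypothetical constructor taking the UPS name

	promhttp.HandlerFor(registry, promhttp.HandlerOpts{}).ServeHTTP(w, r)
}

func main() {
	http.HandleFunc("/ups_metrics", upsHandler)
	http.Handle("/metrics", promhttp.Handler()) // the exporter's own metrics
	log.Fatal(http.ListenAndServe(":9199", nil)) // port is illustrative
}
```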

@DRuggeri
Contributor Author

I'm a bit unclear how I gain access to URL parameters during Collect, though. I was examining snmp_exporter's code and realized that the multi-target exporter consumes a config file rather than URL params. Any pointers on how to get access to query string parameters during a scrape?

By design, you don't. Instead do what the snmp exporter does, create a collector and registry per request.

I see. OK - I've done this by instantiating a new collector during every scrape with a custom handler (all the base options set on command line are passed to the new collector, but the name of the ups is passed in the query string). This required a separate handler, so I've kept the UPS metrics at /metrics?ups=foo and left the usual promhttp.Handler() metrics at /exporter_metrics.

This may be a naive question, but is it an expensive operation to create a registry and handlers for each request or is that just splitting hairs?

@brian-brazil
Contributor

Generally /metrics is the metrics of the exporter itself, and some other path would have the ups metrics.

The "ups" label should be removed, that is the responsibility of Prometheus service discovery to provide. In general you should never have the same labelpair across all of a metrics page.

This may be a naive question, but is it an expensive operation to create a registry and handlers for each request or is that just splitting hairs?

It's going to be effectively nothing.

@DRuggeri
Contributor Author

Generally /metrics is the metrics of the exporter itself, and some other path would have the ups metrics.

OK, no worries. I've swapped them as proposed.

The "ups" label should be removed, that is the responsibility of Prometheus service discovery to provide. In general you should never have the same labelpair across all of a metrics page.

Sure - done - and I've added some notes in README as a helpful breadcrumb trail for those with multiple UPS devices configured in a single NUT server.

I think we should be good to go at this point - thanks for all the feedback!

@brian-brazil merged commit 0d74d5e into prometheus:master on Oct 1, 2020
@brian-brazil
Contributor

Thanks!
