Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: new meminfo_procfs collector, deprecate meminfo collector #3043

Closed

Conversation

tjhop
Copy link
Contributor

@tjhop tjhop commented Jun 8, 2024

Part of #2957

Adds a new collector named meminfo_procfs that exposes memory metrics in a format that attemps to be more inline with upstream conventions -- memory metrics are now exposed under a single metric named node_memory_bytes that has a single label called field, corresponding to the name of the field in /proc/meminfo that value represents. Label values for the field label are named according to the struct field in the procfs.Meminfo struct, and the values always use the byte-normalized counterpart fields in the procfs.Meminfo struct, resulting in a transition such as the following:
node_memory_Active_anon_bytes -> node_memory_bytes{field="ActiveAnon"}

Notes:
currently linux only, as that is the focus of the procfs lib. once consensus has been reached here on new metric name/labels/format, I can expand coverage for darwin/openbsd/netbsd and forward-port the existing meminfo collector for those platforms to use the updated format.

Part of prometheus#2957

Adds a new collector named `meminfo_procfs` that exposes memory metrics
in a format that attemps to be more inline with upstream conventions --
memory metrics are now exposed under a single metric named
`node_memory_bytes` that has a single label called `field`,
corresponding to the name of the field in `/proc/meminfo` that value
represents. Label values for the `field` label are named according to
the struct field in the procfs.Meminfo struct, and the values always use
the byte-normalized counterpart fields in the procfs.Meminfo struct,
resulting in a transition such as the following:
`node_memory_Active_anon_bytes -> node_memory_bytes{field="ActiveAnon"}`

Notes:
currently linux only, as that is the focus of the procfs lib. once
consensus has been reached here on new metric name/labels/format, I can
expand coverage for darwin/openbsd/netbsd and forward-port the existing
meminfo collector for those platforms to use the updated format.

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
@tjhop
Copy link
Contributor Author

tjhop commented Jun 8, 2024

@discordianfish @SuperQ @rexagod would you mind checking this out when time permits? Feedback welcome 👍

Comment on lines +122 to +131

// The meminfo collector is being deprecated in favor of the new
// meminfo_procfs collector. They are mutually exclusive and should not
// be enabled at the time same.
micEnabled, micExist := collectorState["meminfo"]
mipcEnabled, mipcExist := collectorState["meminfo_procfs"]
if (micExist && mipcExist) && (*micEnabled && *mipcEnabled) {
return nil, fmt.Errorf("'meminfo' and 'meminfo_procfs' collectors are mutually exclusive, please disable one")
}

Copy link
Contributor Author

@tjhop tjhop Jun 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this results in a panic:

~/go/src/github.com/prometheus/node_exporter (feat/procfslib-memory-collector [ U ]) -> ./node_exporter --collector.meminfo_procfs --log.level="debug" --web.listen-address ":9101"
ts=2024-06-08T18:12:51.917Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.1, branch=feat/procfslib-memory-collector, revision=1bab3570959f98b7e5b833d94c7024c7658c63aa)"
ts=2024-06-08T18:12:51.917Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.3, platform=linux/amd64, user=tjhop@contraband, date=20240608-18:05:10, tags=unknown)"
ts=2024-06-08T18:12:51.917Z caller=node_exporter.go:199 level=debug msg="Go MAXPROCS" procs=1
panic: Couldn't create metrics handler: couldn't create collector: 'meminfo' and 'meminfo_procfs' collectors are mutually exclusive, please disable one

goroutine 1 [running]:
main.newHandler(0x1, 0x28, {0xcf2740, 0xc0000d0440})
        /home/tjhop/go/src/github.com/prometheus/node_exporter/node_exporter.go:69 +0x2ab
main.main()
        /home/tjhop/go/src/github.com/prometheus/node_exporter/node_exporter.go:201 +0x128d

This definitely achieves the goal of "don't run the node exporter if both collectors are enabled", but is certainly not a graceful exit, either.

Is there a better place to check and ensure both collectors aren't enabled at the same time? I had considered doing so in registerCollector(), but it felt awkward trying to fatal out from there. I had also considered checking in the init() funcs of both collectors when they try to do their self registration to only register if $theOther isn't registered yet, but I wasn't sure about any ordering that may/may not exist in collector registration and didn't want to cause a race between the 2 to see which would successfully register.

@tjhop
Copy link
Contributor Author

tjhop commented Jun 8, 2024

Screenshot from 2024-06-08 15-26-10

Running the new build on 9101 and latest node_exporter on 9100 for some side-by-side, and thus far in my testing things seem to line up

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using labels like this is not what we want. We use separate metric names based on what information in the label can be aggregated.

Having to use on or ignoring for basic operations is a worse user experience.

Where labels are meant to be used is if operations like sum() are appropriate.

It doesn't make sense to sum(node_memory_bytes{field=~"MemTotal|MemAvailable"}), so this is a clear indicator that these should be separate metric names.

@tjhop
Copy link
Contributor Author

tjhop commented Jun 10, 2024

No worries, I also wasn't a huge fan of this approach but it felt like the most "direct" conversion to at least start the conversation based on some of the feedback in #2957.

Having to use on or ignoring for basic operations is a worse user experience.

I'll agree with that 👍

Where labels are meant to be used is if operations like sum() are appropriate.
It doesn't make sense to sum(node_memory_bytes{field=~"MemTotal|MemAvailable"}), so this is a clear indicator that these should be separate metric names.

Perhaps I was a bit too loose in my interpretation of "meaningful" when looking at the docs? 😉

As a rule of thumb, either the sum() or the avg() over all dimensions of a given metric should be meaningful (though not necessarily useful).

Don't worry -- I'm aware the next line of the quote uses an example of a queue with capacity/number of elements which is a decent corollary to our memory free/total example here, but to your point, it does raise questions about how to appropriately divide/label the metrics that haven't been fully fleshed out yet. For instance, should we have a separate metric for memory vs swap memory? And the current memory metrics are unlabeled, what other labels might be useful here?

@tjhop
Copy link
Contributor Author

tjhop commented Jun 12, 2024

Closing in favor of #3049

@tjhop tjhop closed this Jun 12, 2024
@tjhop tjhop deleted the feat/procfslib-memory-collector branch June 12, 2024 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants