Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Standardise prometheus metrics #1177

Merged
merged 15 commits into from Oct 21, 2016
Merged

Standardise prometheus metrics #1177

merged 15 commits into from Oct 21, 2016

Conversation

leonerd
Copy link
Contributor

@leonerd leonerd commented Oct 19, 2016

When I originally wrote the process-wide metric collection, I wasn't aware that there are standard names and semantics for the set of metrics you're supposed to export. Namely:
https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors

This PR attempts to address this by:

  • Providing currently-duplicate exporting of process-wide stats that already existed, now to standard names and units (process_cpu_*_seconds_total, process_open_fds)
  • Adding new standard process-wide metrics that didn't used to exist (process_*_memory_bytes, process_start_time_seconds, process_max_fds)

The intention is that once there's about a week of data in the new format logged by prometheus, I'll update grafana graphs to use those standard metric names instead, then make another commit to remove the "old" variable names.

@erikjohnston
Copy link
Member

Why are we using /proc/self/stat rather than using rusage?

@leonerd
Copy link
Contributor Author

leonerd commented Oct 19, 2016

Because /proc/self/stat has a bunch of other metrics we want that rusage doesn't give us anyway

@erikjohnston
Copy link
Member

I think at this point all the globals should be wrapped up into an actual class or split out into a separate module (or both)

with open("/proc/self/stat") as s:
line = s.read()
# line is PID (command) more stats go here ...
stats = line.split(") ", 1)[1].split(" ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we parse the stats here, rather than requiring the users of stat to know the magic numbers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 6453d03

@leonerd
Copy link
Contributor Author

leonerd commented Oct 19, 2016

I think at this point all the globals should be wrapped up into an actual class or split out into a separate module (or both)

Yes, perhaps I'll pull the lot out into a separate ProcessMetrics class - is here in this file OK or put it in a new one?

@erikjohnston
Copy link
Member

Depends on the size really, if it gets a bit big then moving to a separate file would probably be sensible, but if it small then leaving it there is probably fine.

@leonerd
Copy link
Contributor Author

leonerd commented Oct 19, 2016

Actually turned out that moving it into a new file was easier/neater than into a separate class within the same file.

@leonerd
Copy link
Contributor Author

leonerd commented Oct 20, 2016

Three Jenkins failures appear unrelated. A single flakey test looks like:

408 - Can add global push rule for override

@erikjohnston
Copy link
Member

LGTM

@leonerd leonerd merged commit a842fed into develop Oct 21, 2016
@richvdh richvdh deleted the paul/standard-metric-names branch December 1, 2016 14:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants