Standardise prometheus metrics #1177

leonerd · 2016-10-19T15:05:32Z

When I originally wrote the process-wide metric collection, I wasn't aware that there are standard names and semantics for the set of metrics you're supposed to export. Namely:
https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors

This PR attempts to address this by:

Exporting the process-wide stats that already existed a second time, for now, under the standard names and units (process_cpu_*_seconds_total, process_open_fds)
Adding new standard process-wide metrics that did not previously exist (process_*_memory_bytes, process_start_time_seconds, process_max_fds)
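
As a rough illustration of the two file-descriptor metrics named above, the sketch below counts the entries in /proc/self/fd and reads the soft RLIMIT_NOFILE limit. The helper names (`count_open_fds`, `max_fds`, `HAVE_PROC_SELF_FD`) are hypothetical, and this is not necessarily how the PR collects these values. The existence check only mirrors the idea, from the commits in this PR, of guarding registration on the requisite /proc entries.

```python
import os
import resource

# Hypothetical names for illustration only; not taken from the PR's code.
FD_DIR = "/proc/self/fd"

# Only attempt fd counting on platforms that actually provide the directory.
HAVE_PROC_SELF_FD = os.path.exists(FD_DIR)


def count_open_fds(fd_dir=FD_DIR):
    """Count the entries in an fd directory (one entry per open descriptor).

    When pointed at /proc/self/fd the count includes the descriptor that is
    briefly opened to list the directory itself.
    """
    return len(os.listdir(fd_dir))


def max_fds():
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    if HAVE_PROC_SELF_FD:
        print("process_open_fds", count_open_fds())
    print("process_max_fds", max_fds())
```

Whether the soft or the hard limit is the better value to report is a design choice; the soft limit is assumed here because it is the one the process actually hits.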

The intention is that once Prometheus has logged about a week of data in the new format, I'll update the Grafana graphs to use the standard metric names instead, then make another commit to remove the "old" variable names.

erikjohnston · 2016-10-19T15:26:14Z

Why are we using /proc/self/stat rather than using rusage?

leonerd · 2016-10-19T15:32:46Z

Because /proc/self/stat has a bunch of other metrics we want that rusage doesn't give us anyway
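
For comparison, a minimal sketch of the two sources of CPU time. `resource.getrusage` already reports user and system time in seconds, whereas the utime and stime fields of /proc/self/stat are in clock ticks and need dividing by `SC_CLK_TCK`. The stat line additionally carries starttime, vsize and rss, while rusage offers only a peak RSS figure. The function names are made up for this example.

```python
import os
import resource


def cpu_seconds_from_rusage():
    """Return (user, system) CPU seconds for this process via getrusage."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime


def ticks_to_seconds(ticks, ticks_per_sec=None):
    """Convert a tick count from /proc/self/stat (utime, stime) to seconds."""
    if ticks_per_sec is None:
        # Number of clock ticks per second on this system.
        ticks_per_sec = os.sysconf("SC_CLK_TCK")
    return float(ticks) / ticks_per_sec


if __name__ == "__main__":
    user, system = cpu_seconds_from_rusage()
    print("rusage user=%.2fs system=%.2fs" % (user, system))
```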

erikjohnston · 2016-10-19T15:41:50Z

I think at this point all the globals should be wrapped up into an actual class or split out into a separate module (or both)

erikjohnston · 2016-10-19T15:44:21Z

synapse/metrics/__init__.py

+        with open("/proc/self/stat") as s:
+            line = s.read()
+            # line is PID (command) more stats go here ...
+            stats = line.split(") ", 1)[1].split(" ")


Can we parse the stats here, rather than requiring the users of stat to know the magic numbers?

Done 6453d03
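
One possible shape for that change, as a sketch rather than the code in 6453d03: split the line once at collection time and hand callers a dict keyed by field name. The offsets follow the field order documented in proc(5), counted from the first field after "PID (command) ". `STAT_FIELDS` and `parse_stat_line` are hypothetical names, and `rpartition` is used instead of the `split` in the diff above so that a command name containing ") " does not shift the fields.

```python
# Offsets into the fields that follow "PID (command) "; index 0 is the
# process state, which is field 3 in proc(5).
STAT_FIELDS = {
    "utime": 11,      # user CPU time, in clock ticks
    "stime": 12,      # system CPU time, in clock ticks
    "starttime": 19,  # start time after boot, in clock ticks
    "vsize": 20,      # virtual memory size, in bytes
    "rss": 21,        # resident set size, in pages
}


def parse_stat_line(line):
    """Turn a raw /proc/<pid>/stat line into a dict of named integer fields."""
    # Everything after the last ") " is space-separated fields.
    _, _, rest = line.rpartition(") ")
    raw = rest.split(" ")
    return {name: int(raw[idx]) for name, idx in STAT_FIELDS.items()}


if __name__ == "__main__":
    with open("/proc/self/stat") as s:
        print(parse_stat_line(s.read()))
```

Callers can then write `stats["utime"]` instead of `stats[11]`, and unit conversion (ticks to seconds, pages to bytes) stays a separate step.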

leonerd · 2016-10-19T15:52:12Z

> I think at this point all the globals should be wrapped up into an actual class or split out into a separate module (or both)

Yes, perhaps I'll pull the lot out into a separate ProcessMetrics class. Is it OK here in this file, or should I put it in a new one?

erikjohnston · 2016-10-19T15:53:24Z

Depends on the size really. If it gets a bit big then moving it to a separate file would probably be sensible, but if it's small then leaving it there is probably fine.

leonerd · 2016-10-19T17:11:21Z

It actually turned out that moving it into a new file was easier/neater than making it a separate class within the same file.

leonerd · 2016-10-20T10:23:23Z

Three Jenkins failures appear unrelated. A single flakey test looks like:

408 - Can add global push rule for override

erikjohnston · 2016-10-20T11:51:48Z

LGTM

Paul "LeoNerd" Evans added 10 commits October 19, 2016 15:05

Callback metric values might not just be integers - allow floats

b21b9db

Export CPU usage metrics also under prometheus-standard metric name

03c2720

Use /proc/self/stat to generate the new process_cpu_*_seconds_total metrics

9b0316c

Add standard process_*_memory_bytes metrics

95fc702

Add standard process_open_fds metric

06f1ad1

Add standard process_max_fds metric

def6364

Add standard process_start_time_seconds metric

981f852

Guard registration of process-wide metrics by existence of the requisite /proc entries

1b17945

Also guard /proc/self/fds-related code with a suitable pseudoconstant

b202531

appease pep8

5663137

leonerd assigned erikjohnston Oct 19, 2016

erikjohnston reviewed Oct 19, 2016

Paul "LeoNerd" Evans added 2 commits October 19, 2016 17:54

A slightly neater way to manage metric collector functions

4cedd53

Move the process metrics collector code into its own file

3ae48a1

Paul "LeoNerd" Evans added 3 commits October 19, 2016 18:21

Cut the raw /proc/self/stat line up into named fields at collection time

6453d03

Adjust code for <100 char line limit

1071c7d

Split callback metric lambda functions down onto their own lines to keep line lengths under 90

b01aaad

leonerd merged commit a842fed into develop Oct 21, 2016

richvdh deleted the paul/standard-metric-names branch December 1, 2016 14:09
