[feature request] Support for more process and thread metrics #16

stefanodoni · 2017-09-21T15:51:39Z

Hi,

First of all, many thanks for this great exporter!

I've been using it a lot and have come to rely on it more and more for process-level visibility. Over time, I've found that I need a few more metrics that can be really handy in performance analysis. To cover these needs, I had to resort to other tools such as atop or pidstat, but that means losing the benefit of a centralized time series database like Prometheus.

Here is my current wishlist:

- Process CPU time
  - Total CPU user time
  - Total CPU system time
- Memory
  - Peak resident size
- Context switches
  - Number of voluntary context switches
  - Number of involuntary context switches
- Page faults
  - Number of minor faults
  - Number of major faults
- Process threads
  - Number of threads in state 'running' (R)
  - Number of threads in state 'interruptible sleeping' (S)
  - Number of threads in state 'uninterruptible sleeping' (D)
- Waiting channel
  - Number of threads waiting on a specific wchan

Most of them come from the usual /proc/PID/stat files, while others require visiting process threads via /proc/PID/task.
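To make the data sources concrete, here is a minimal Python sketch of where these values could be read from. It is only an illustration, not how process-exporter does it: the function names (`parse_stat`, `parse_ctxt_switches`, `thread_states`) and the dictionary keys are made up for this example, and the field positions follow the proc(5) man page. Note that on Linux the context switch counters are exposed in /proc/PID/status rather than /proc/PID/stat.

```python
import os
from collections import Counter


def parse_stat(line, ticks_per_sec=100):
    """Extract a few fields from one /proc/PID/stat (or task/TID/stat) line.

    ticks_per_sec is an assumption; on a real system use
    os.sysconf("SC_CLK_TCK") instead of the default.
    """
    # The comm field may contain spaces or parentheses, so split after
    # the LAST ')' rather than on whitespace from the start of the line.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state) in proc(5), so field n is rest[n - 3].
    return {
        "state": rest[0],
        "minor_faults": int(rest[7]),       # field 10: minflt
        "major_faults": int(rest[9]),       # field 12: majflt
        "cpu_user_seconds": int(rest[11]) / ticks_per_sec,    # field 14: utime
        "cpu_system_seconds": int(rest[12]) / ticks_per_sec,  # field 15: stime
        "num_threads": int(rest[17]),       # field 20: num_threads
    }


def parse_ctxt_switches(status_text):
    """Pull the context switch counters out of /proc/PID/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    out = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            out[key] = int(value)
    return out


def thread_states(pid):
    """Count threads per state (R, S, D, ...) by walking /proc/PID/task."""
    counts = Counter()
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                counts[parse_stat(f.read())["state"]] += 1
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    return counts
```

On a Linux host, `thread_states(os.getpid())` should return a small counter keyed by state letter; the two parsers work on plain strings, so they can be tried against saved copies of the files.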

Do you think they can be added to the process-exporter?

Thank you in advance

ncabatoff · 2017-10-15T00:08:32Z

These are good suggestions; the only one I think doesn't make sense is peak resident size, since you can get that by querying the history stored in Prometheus, over whatever interval you're interested in.

CPU user/system and major/minor page faults are easy.

As to context switches and per-thread metrics, they're not currently supported by prometheus/procfs, which is the library I'm using right now to fetch the stats. Not necessarily a dealbreaker, but let's make that a different issue (#17) since it'll be more involved and I'd rather knock off the easy ones first.

Finally, wchan: what did you have in mind here? A metric namegroup_num_threads_waiting{groupname, wchan}? That makes me a little nervous in terms of how many different wchan values there might be; this could be a very big metric cardinality-wise.

stefanodoni · 2017-10-16T07:45:52Z

Hi,

Thanks for the answer!

OK, I see the procfs issue; it makes sense to start with the process metrics first.

On the wchan: yes, I would like to have the count of waiting threads for a specific process, broken down by waiting channel. In my experience, the number of distinct wchans is not that large, and they are very useful in tracking down certain bottlenecks.

If you think the cardinality may be too high, this could perhaps be an option that is off by default.
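As a rough illustration of what I mean, here is a small Python sketch that counts threads per wchan and keeps the feature off unless explicitly enabled. The names `count_by_wchan`, `read_thread_wchans`, `wchan_counts` and the `enabled` flag are hypothetical and only for this example; it assumes each /proc/PID/task/TID/wchan file holds a kernel symbol name, or "0" when the thread is not blocked.

```python
import os
from collections import Counter


def count_by_wchan(wchans):
    """Count threads per waiting channel from raw wchan file contents."""
    counts = Counter()
    for raw in wchans:
        name = raw.strip()
        # An empty value or "0" means the thread is not blocked in the kernel.
        if name and name != "0":
            counts[name] += 1
    return counts


def read_thread_wchans(pid):
    """Yield the contents of /proc/PID/task/TID/wchan for every thread."""
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/wchan") as f:
                yield f.read()
        except OSError:
            continue  # thread exited, or the file is not readable


def wchan_counts(pid, enabled=False):
    """Return per-wchan thread counts, or nothing when the option is off."""
    if not enabled:
        return Counter()  # hypothetical opt-in: disabled by default
    return count_by_wchan(read_thread_wchans(pid))
```

Each key of the returned counter would become one value of the wchan label, so the number of distinct keys seen in practice gives a direct estimate of the cardinality.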

Do you plan to support the CPU and faults metrics anytime soon?

Thanks!

ncabatoff · 2017-10-17T00:03:23Z

CPU and faults metrics are added. I've created #18 for your wchan feature request.

ncabatoff mentioned this issue Oct 15, 2017

Report on thread states #17

Closed

ncabatoff closed this as completed in 485a7f4 Oct 17, 2017

ncabatoff mentioned this issue Oct 17, 2017

Report on per-thread wchan state #18

Closed

ncabatoff mentioned this issue Oct 27, 2017

Report on context switches #22

Closed
