[feature request] Support for more process and thread metrics #16

Closed
stefanodoni opened this issue Sep 21, 2017 · 3 comments

Comments

@stefanodoni

Hi,

First of all, many thanks for this great exporter!

I've been using it a lot and have come to rely on it more and more for process-level visibility. Over time, I've found that I need a few more metrics that are really handy in performance analysis. To cover those needs I had to resort to other tools such as atop or pidstat, but that means losing the benefit of a centralized time series database like Prometheus.

Here is my current wishlist:

  • Process CPU time:
    Total CPU user time
    Total CPU system time
  • Memory:
    Peak resident size
  • Context switches:
    Number of voluntary context switches
    Number of involuntary context switches
  • Page faults:
    Number of minor faults
    Number of major faults
  • Process threads:
    Number of threads in state 'running' (R)
    Number of threads in state 'interruptible sleeping' (S)
    Number of threads in state 'uninterruptible sleeping' (D)
  • Waiting channel:
    Number of threads waiting on a specific wchan

Most of them come from the usual /proc/PID/stat files, while others require visiting process threads via /proc/PID/task.
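
To make the /proc/PID/stat part concrete, here is a minimal sketch of pulling the CPU and page-fault counters out of a single stat line. The function name `parse_proc_stat` and the `clk_tck` parameter are made up for illustration and are not part of process-exporter; the default of 100 ticks per second is only a placeholder, and on a real system the value would come from `os.sysconf("SC_CLK_TCK")`. Field positions follow the numbering in proc(5).

```python
def parse_proc_stat(stat_text, clk_tck=100):
    """Extract CPU times and page-fault counters from one /proc/PID/stat line.

    clk_tck is an assumed ticks-per-second value; query
    os.sysconf("SC_CLK_TCK") on a real system instead of trusting the default.
    """
    # comm (field 2) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the last ')' instead of on whitespace alone.
    _, _, rest = stat_text.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field N of proc(5) is fields[N - 3].
    return {
        "minor_faults": int(fields[10 - 3]),                  # minflt
        "major_faults": int(fields[12 - 3]),                  # majflt
        "cpu_user_seconds": int(fields[14 - 3]) / clk_tck,    # utime
        "cpu_system_seconds": int(fields[15 - 3]) / clk_tck,  # stime
    }


# Hand-written sample line, truncated after the fields used here.
sample = "1234 (my (odd) proc) S 1 1234 1234 0 -1 4194304 150 0 3 0 250 75 0 0 20 0 4 0 100"
print(parse_proc_stat(sample))
```

All four values are cumulative counters, so in Prometheus they would normally be looked at through a rate over some window and not as absolute numbers.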

Do you think they can be added to the process-exporter?

Thank you in advance

@ncabatoff
Owner

These are good suggestions. The only one I think doesn't make sense is peak resident size, since you can get that by querying the history stored in Prometheus, over whatever interval you're interested in.

CPU user/system and major/minor page faults are easy.

As to context switches and per-thread metrics, they're not currently supported by prometheus/procfs, which is the library I'm using right now to fetch the stats. Not necessarily a dealbreaker, but let's make that a different issue (#17) since it'll be more involved and I'd rather knock off the easy ones first.
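
For a sense of what the per-thread part involves, here is a rough sketch that walks /proc/PID/task and reads each thread's status file, summing context switches and counting threads by state. It is not how process-exporter or prometheus/procfs does it; `summarize_threads` and its `proc_root` parameter are hypothetical names, with `proc_root` present only so the function can be pointed at a fake directory tree.

```python
import os
from collections import Counter


def summarize_threads(pid, proc_root="/proc"):
    """Sum context switches and count thread states across /proc/PID/task/*."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = Counter()
    voluntary = involuntary = 0
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                lines = f.read().splitlines()
        except OSError:
            continue  # the thread exited between listdir() and open()
        for line in lines:
            key, _, value = line.partition(":")
            value = value.strip()
            if key == "State" and value:
                # "S (sleeping)" -> "S"
                states[value.split()[0]] += 1
            elif key == "voluntary_ctxt_switches":
                voluntary += int(value)
            elif key == "nonvoluntary_ctxt_switches":
                involuntary += int(value)
    return {
        "states": dict(states),
        "voluntary_ctxt_switches": voluntary,
        "involuntary_ctxt_switches": involuntary,
    }
```

On a Linux host, `summarize_threads(os.getpid())` would report on the calling process. The cost is one file read per thread per scrape, which is part of why this is more involved than the per-process counters.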

Finally, wchan: what did you have in mind here? A metric namegroup_num_threads_waiting{groupname, wchan}? That makes me a little nervous given how many different wchan values there might be; this could be a very big metric cardinality-wise.

@stefanodoni
Author

Hi,

Thanks for the answer!

OK, I see the procfs issue; it makes sense to start with the process metrics first.

On the wchan: yes, I would like to have the count of waiting threads for a specific process, broken down by waiting channel. In my experience, the number of distinct wchans is not that large, and they are very useful in tracking down certain bottlenecks.

If you think the cardinality may be too high, this could perhaps be an option that is off by default.
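
As a sketch of what such an opt-in collection could look like, the function below counts threads per wchan by reading /proc/PID/task/TID/wchan, and returns nothing unless explicitly enabled. The name `wchan_thread_counts` and the `enabled` and `proc_root` parameters are invented for this example and do not correspond to any existing process-exporter option.

```python
import os
from collections import Counter


def wchan_thread_counts(pid, proc_root="/proc", enabled=False):
    """Count threads of one process per waiting channel (off by default)."""
    if not enabled:
        return {}
    task_dir = os.path.join(proc_root, str(pid), "task")
    counts = Counter()
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # thread vanished or the file is not readable
        # An empty value or "0" is treated here as "not waiting".
        if wchan and wchan != "0":
            counts[wchan] += 1
    return dict(counts)
```

Each key of the returned dict would become one value of the wchan label, so the size of the dict, summed over all groups, gives a direct read on how much cardinality the metric adds.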

Do you plan to support the CPU and faults metrics anytime soon?

Thanks!

@ncabatoff
Owner

The CPU and page fault metrics have been added. I've created #18 for your wchan feature request.
