[feature request] Support for more process and thread metrics #16
These are good suggestions. The only one I think doesn't make sense is peak resident size, since you can get that by querying the history stored in Prometheus, over whatever interval you're interested in. CPU user/system and major/minor page faults are easy. As for context switches and per-thread metrics, they're not currently supported by prometheus/procfs, which is the library I'm using right now to fetch the stats. That's not necessarily a dealbreaker, but let's make it a different issue (#17), since it'll be more involved and I'd rather knock off the easy ones first. Finally, wchan: what did you have in mind here? A metric like namegroup_num_threads_waiting{groupname, wchan}? That makes me a little nervous about how many different wchan values there might be; this could be a very big metric cardinality-wise.
Hi, thanks for the answer! OK, I see the procfs issue; it makes sense to start with the process metrics. On wchan: yes, I would like to have the count of waiting threads for a specific process, broken down by wait channel. In my experience, the number of distinct wchans is not that large, and they are very useful for tracking down certain bottlenecks. If you think the cardinality may be too high, this could perhaps be an option that is off by default. Do you plan to support the CPU and faults metrics anytime soon? Thanks!
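To make the wchan request concrete, here is a rough sketch of how the per-thread counts could be gathered by walking /proc/PID/task. It is written in Python purely for illustration and is not how the exporter does or would implement it. The function name `count_threads` and its `proc_root` parameter are made up for this example, and treating a wchan of "0" as "not waiting" is an assumption based on the usual kernel behaviour.

```python
import os
from collections import Counter


def _read(path):
    """Return the file contents, or None if the thread vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def count_threads(pid, proc_root="/proc"):
    """Count the threads of one process by scheduler state and by wait channel.

    Hypothetical helper: returns (states, wchans), two Counters keyed by the
    state letter (R, S, D, ...) and by the wchan symbol name.
    """
    states, wchans = Counter(), Counter()
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        stat = _read(os.path.join(task_dir, tid, "stat"))
        if stat is None:
            continue  # thread exited between listdir() and open()
        # The state letter is the first field after the parenthesised comm.
        states[stat[stat.rindex(")") + 1:].split()[0]] += 1
        wchan = (_read(os.path.join(task_dir, tid, "wchan")) or "").strip()
        # Assumption: "0" (or an empty file) means the thread is not waiting.
        if wchan and wchan != "0":
            wchans[wchan] += 1
    return states, wchans
```

Each distinct key in the second Counter would become one time series per process group, which is where the cardinality concern comes from.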
CPU and faults metrics are added. I've created #18 for your wchan feature request.
Hi,
First of all, many thanks for this great exporter!
I've been using it a lot and have come to rely on it more and more for process-level visibility. Over time, I've found that I need a few more metrics that can be really handy in performance analysis. To cover those needs I've had to resort to other tools such as atop or pidstat, but in doing so I lose the benefit of having a centralized time series database like Prometheus.
Here is my current wishlist:
Total CPU user time
Total CPU system time
Peak resident size
Number of voluntary context switches
Number of involuntary context switches
Number of minor faults
Number of major faults
Number of threads in state 'running' (R)
Number of threads in state 'interruptible sleeping' (S)
Number of threads in state 'uninterruptible sleeping' (D)
Number of threads waiting on a specific wchan
Most of them come from the usual /proc/PID/stat files, while others require visiting process threads via /proc/PID/task.
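As a rough illustration of where the process-level numbers live, the sketch below pulls them out of the text of /proc/PID/stat and /proc/PID/status. It is in Python for brevity and is not the exporter's code. The function names and the returned dictionary keys are invented for this example. The field positions (minflt = 10, majflt = 12, utime = 14, stime = 15) and the status keys (VmHWM, voluntary_ctxt_switches, nonvoluntary_ctxt_switches) follow the proc(5) man page.

```python
import os


def parse_stat_line(line, ticks_per_second=None):
    """Extract CPU and page-fault counters from the contents of /proc/PID/stat."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    # comm (field 2) is wrapped in parentheses and may contain spaces or ')',
    # so split after the last ')' instead of splitting the whole line.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N of proc(5) is rest[N - 3].
    return {
        "state": rest[0],
        "minor_faults": int(rest[10 - 3]),
        "major_faults": int(rest[12 - 3]),
        "cpu_user_seconds": int(rest[14 - 3]) / ticks_per_second,
        "cpu_system_seconds": int(rest[15 - 3]) / ticks_per_second,
    }


def parse_status_text(text):
    """Extract peak RSS and context-switch counters from /proc/PID/status."""
    out = {}
    for row in text.splitlines():
        key, _, value = row.partition(":")
        if key == "VmHWM":
            # Reported in kB, e.g. "VmHWM:     2048 kB".
            out["peak_resident_bytes"] = int(value.split()[0]) * 1024
        elif key == "voluntary_ctxt_switches":
            out["voluntary_ctxt_switches"] = int(value)
        elif key == "nonvoluntary_ctxt_switches":
            out["involuntary_ctxt_switches"] = int(value)
    return out
```

On a Linux machine, `parse_stat_line(open("/proc/self/stat").read())` should return the counters for the calling process. The same two parsers apply unchanged to the per-thread files under /proc/PID/task/TID.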
Do you think they can be added to the process-exporter?
Thank you in advance