Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report thread stats as well #19

Closed
flixr opened this issue Oct 18, 2017 · 6 comments
Closed

report thread stats as well #19

flixr opened this issue Oct 18, 2017 · 6 comments

Comments

@flixr
Copy link
Contributor

flixr commented Oct 18, 2017

The things I'm monitoring are often processes with quite a few threads and the most interesting thing is actually not how much resources (CPU) each process needs, but how much the (named) thread in the process need.

It would be great if the per thread cpu usage (/proc/<pid>/task/<tid>/stat) would be reported as well.
The format is the same, as the process stat file, so it seems that the procfs iterator could "just" be applied to the task dir per process pid again.

@ncabatoff
Copy link
Owner

I like the idea, though it'll probably be an opt-in behaviour based on a command-line argument. I see two ways these metrics could be exposed. (1) Add a new template variable ThreadName that can be used in the template that builds groupnames. Or (2) add a ThreadName label to the CPU and IO metrics and break down all metrics of threaded processes accordingly. Unthreaded processes would have ThreadName=groupname. I'm leaning towards (2) because it would allow doing sum() over groupname to get the existing aggregate metrics. What do you think?

@flixr
Copy link
Contributor Author

flixr commented Oct 18, 2017

Yes, I think this should be optional/configurable, either globally or per process...
For exposing, (2) basically sounds good to me. I would even say the aggregated process metrics should be kept in any case, as this is way cheaper than summing again to get the whole process metrics.
But just saying that having ThreadName=groupname equals the whole process IMHO won't work then...

@ncabatoff
Copy link
Owner

Maybe rather than adding a new dimension to the existing metrics, these per-threadname metrics should have new metric names altogether. Then we keep the aggregated process metrics as you suggest, and people can always use rewrite rules to selectively discard unneeded per-threadname metrics.

@ncabatoff
Copy link
Owner

Well, that was way more disruptive than I expected, so don't be surprised if I broke something. I hadn't done any major work on this project since last year, and coming back to it now I found it was hard to understand. The good news is that I was able to make some of the APIs cleaner, I rewrote the tests to be more understandable (or at least more readable I hope), and we now have per-thread CPU/IO/page fault metrics. The format isn't exactly the same as used for the groupname statistics, but it does reflect where I think they're headed (see #21). Example:

namedprocess_namegroup_thread_count{groupname="chromium-browse",threadname="SimpleCacheWork"} 5
namedprocess_namegroup_thread_cpu_seconds_total{cpumode="system",groupname="chromium-browse",threadname="SimpleCacheWork"} 0.16999999999999993
namedprocess_namegroup_thread_cpu_seconds_total{cpumode="user",groupname="chromium-browse",threadname="SimpleCacheWork"} 0.07000000000000006
namedprocess_namegroup_thread_io_bytes_total{groupname="chromium-browse",iomode="read",threadname="SimpleCacheWork"} 7.613094e+06
namedprocess_namegroup_thread_io_bytes_total{groupname="chromium-browse",iomode="write",threadname="SimpleCacheWork"} 1.871672e+06
namedprocess_namegroup_thread_major_page_faults_total{groupname="chromium-browse",threadname="SimpleCacheWork"} 0
namedprocess_namegroup_thread_minor_page_faults_total{groupname="chromium-browse",threadname="SimpleCacheWork"} 505

I left major/minor page faults unmerged because I don't think it's sensible to sum them, which is what that would imply.

@flixr
Copy link
Contributor Author

flixr commented Oct 23, 2017

Sweet!!
Just one minor comment: maybe rename cpumode to mode to have the same naming as the prometheus node_exporter? But that is not really that important....

@ncabatoff
Copy link
Owner

I thought about it, but it felt weird to qualify "iomode" but not "cpumode".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants