report thread stats as well #19

flixr · 2017-10-18T16:40:47Z

The things I'm monitoring are often processes with quite a few threads and the most interesting thing is actually not how much resources (CPU) each process needs, but how much the (named) thread in the process need.

It would be great if the per thread cpu usage (/proc/<pid>/task/<tid>/stat) would be reported as well.
The format is the same, as the process stat file, so it seems that the procfs iterator could "just" be applied to the task dir per process pid again.

The text was updated successfully, but these errors were encountered:

ncabatoff · 2017-10-18T21:39:58Z

I like the idea, though it'll probably be an opt-in behaviour based on a command-line argument. I see two ways these metrics could be exposed. (1) Add a new template variable ThreadName that can be used in the template that builds groupnames. Or (2) add a ThreadName label to the CPU and IO metrics and break down all metrics of threaded processes accordingly. Unthreaded processes would have ThreadName=groupname. I'm leaning towards (2) because it would allow doing sum() over groupname to get the existing aggregate metrics. What do you think?

flixr · 2017-10-18T21:55:45Z

Yes, I think this should be optional/configurable, either globally or per process...
For exposing, (2) basically sounds good to me. I would even say the aggregated process metrics should be kept in any case, as this is way cheaper than summing again to get the whole process metrics.
But just saying that having ThreadName=groupname equals the whole process IMHO won't work then...

ncabatoff · 2017-10-18T22:23:57Z

Maybe rather than adding a new dimension to the existing metrics, these per-threadname metrics should have new metric names altogether. Then we keep the aggregated process metrics as you suggest, and people can always use rewrite rules to selectively discard unneeded per-threadname metrics.

ncabatoff · 2017-10-23T18:44:11Z

Well, that was way more disruptive than I expected, so don't be surprised if I broke something. I hadn't done any major work on this project since last year, and coming back to it now I found it was hard to understand. The good news is that I was able to make some of the APIs cleaner, I rewrote the tests to be more understandable (or at least more readable I hope), and we now have per-thread CPU/IO/page fault metrics. The format isn't exactly the same as used for the groupname statistics, but it does reflect where I think they're headed (see #21). Example:

namedprocess_namegroup_thread_count{groupname="chromium-browse",threadname="SimpleCacheWork"} 5
namedprocess_namegroup_thread_cpu_seconds_total{cpumode="system",groupname="chromium-browse",threadname="SimpleCacheWork"} 0.16999999999999993
namedprocess_namegroup_thread_cpu_seconds_total{cpumode="user",groupname="chromium-browse",threadname="SimpleCacheWork"} 0.07000000000000006
namedprocess_namegroup_thread_io_bytes_total{groupname="chromium-browse",iomode="read",threadname="SimpleCacheWork"} 7.613094e+06
namedprocess_namegroup_thread_io_bytes_total{groupname="chromium-browse",iomode="write",threadname="SimpleCacheWork"} 1.871672e+06
namedprocess_namegroup_thread_major_page_faults_total{groupname="chromium-browse",threadname="SimpleCacheWork"} 0
namedprocess_namegroup_thread_minor_page_faults_total{groupname="chromium-browse",threadname="SimpleCacheWork"} 505

I left major/minor page faults unmerged because I don't think it's sensible to sum them, which is what that would imply.

flixr · 2017-10-23T19:23:58Z

Sweet!!
Just one minor comment: maybe rename cpumode to mode to have the same naming as the prometheus node_exporter? But that is not really that important....

ncabatoff · 2017-10-23T20:16:20Z

I thought about it, but it felt weird to qualify "iomode" but not "cpumode".

ncabatoff mentioned this issue Oct 18, 2017

[feature request] add number of threads #5

Closed

ncabatoff closed this as completed in 05e6b32 Oct 23, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

report thread stats as well #19

report thread stats as well #19

flixr commented Oct 18, 2017

ncabatoff commented Oct 18, 2017

flixr commented Oct 18, 2017

ncabatoff commented Oct 18, 2017

ncabatoff commented Oct 23, 2017

flixr commented Oct 23, 2017

ncabatoff commented Oct 23, 2017

report thread stats as well #19

report thread stats as well #19

Comments

flixr commented Oct 18, 2017

ncabatoff commented Oct 18, 2017

flixr commented Oct 18, 2017

ncabatoff commented Oct 18, 2017

ncabatoff commented Oct 23, 2017

flixr commented Oct 23, 2017

ncabatoff commented Oct 23, 2017