add support for monitoring thp, ballooning, zswap, ksm cow #15000
ktsaou merged 6 commits into netdata:master
lgtm, but a small note:
Usually having "events/s" as units is a red flag because it means we are aggregating (at chart level) non-aggregatable metrics: grouping by host/system without selecting a dimension (as in the Cloud UI) will give useless values. E.g.
thp_swpout is incremented every time a huge page is swapout in one piece without splitting.
thp_swpout_fallback is incremented if a huge page has to be split before swapout. Usually because failed to allocate some continuous swap space for the huge page.
The sum of these events doesn't seem useful/correct to me.
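To illustrate the concern, here is a minimal sketch in Python. The host names and rates are hypothetical sample values, and the helper functions are made up for illustration; this is not how netdata or the Cloud UI aggregates internally.

```python
# Hypothetical per-host rates (events/s) for two dimensions of one chart.
samples = {
    "host-a": {"thp_swpout": 4.0, "thp_swpout_fallback": 0.0},
    "host-b": {"thp_swpout": 0.0, "thp_swpout_fallback": 4.0},
}


def sum_by_dimension(samples):
    """Aggregate across hosts while keeping each dimension separate."""
    totals = {}
    for dims in samples.values():
        for name, rate in dims.items():
            totals[name] = totals.get(name, 0.0) + rate
    return totals


def sum_by_host(samples):
    """Aggregate across dimensions per host, mixing unrelated events."""
    return {host: sum(dims.values()) for host, dims in samples.items()}


by_dimension = sum_by_dimension(samples)
by_host = sum_by_host(samples)
print(by_dimension)
print(by_host)
```

Grouped by dimension, the totals stay interpretable. Grouped by host, both hosts report the same number even though one swaps huge pages out whole and the other always has to split them first.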
And this PR is the 15000th contribution to this repo 🎉
How do you suggest doing it?
I propose to go with the current implementation, which is why I approved the PR. We do such aggregations pretty often. By "such aggregations" I mean ones that are correct when grouped by dimension but wrong when grouped by anything else without selecting a dimension (some obvious examples: system load average, system pressure). cc @ralphm
I don't find them useful because the sum of:
doesn't seem to make sense to me; they look like 2 completely different metrics. But my understanding may be wrong because it is based only on the metric descriptions. I am not a specialist.
If |
This reverts commit 54b9464.
@cakrit this needs to go into the release notes. The description is in the document I shared with you about hugepages.
Modified the vmstat module of the proc module to monitor:
THP is especially important, since it seems to be enabled by default on most Linux distros.
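For readers unfamiliar with where these counters come from, here is a rough sketch in Python of turning two `/proc/vmstat` snapshots into per-second rates. The function names, the sample text, and the 2-second interval are assumptions for illustration; the actual collector in this PR is written in C and is not reproduced here.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def rates(prev, curr, interval_s, prefix="thp_"):
    """Per-second rate of each counter with the given prefix."""
    return {
        name: (curr[name] - prev[name]) / interval_s
        for name in curr
        if name.startswith(prefix) and name in prev
    }


# Hypothetical snapshots taken 2 seconds apart; on Linux the text would
# come from reading /proc/vmstat instead.
before = parse_vmstat("nr_free_pages 1000\nthp_swpout 10\nthp_swpout_fallback 2\n")
after = parse_vmstat("nr_free_pages 900\nthp_swpout 16\nthp_swpout_fallback 2\n")

thp_rates = rates(before, after, interval_s=2.0)
print(thp_rates)
```

Each counter keeps its own rate here, which matches the per-dimension view discussed in the review.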