Incorrect CPU statistics for mode "iowait" #2856
@ArmanArbab Are you still getting this issue? The first thing I would suspect here is a broken data source, like … Have you looked at the raw data exported by Node Exporter over time, by checking its metrics endpoints and seeing whether the iowait counters on the affected cores go down or otherwise behave weirdly? It would also be interesting to just see …
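The check suggested in this comment can be sketched in Python. This is a minimal sketch, assuming the Node Exporter exposes lines of the form node_cpu{cpu="cpu0",mode="iowait"} <seconds> (the metric name used in this issue) and that two scrapes of its metrics endpoint, taken some time apart, have been saved as text. The function names are hypothetical, and label values containing escaped quotes are not handled.

```python
import re

# Assumed exposition format (metric name as used in this issue):
#   node_cpu{cpu="cpu0",mode="iowait"} 12345.67
SAMPLE_RE = re.compile(r'^node_cpu\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def iowait_by_cpu(scrape_text):
    """Return {cpu: iowait_seconds} parsed from one scrape of /metrics."""
    result = {}
    for line in scrape_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skip HELP/TYPE comments and other metrics
        labels = dict(LABEL_RE.findall(match.group(1)))
        if labels.get("mode") == "iowait":
            result[labels["cpu"]] = float(match.group(2))
    return result


def decreasing_cpus(earlier_scrape, later_scrape):
    """List CPUs whose iowait counter is lower in the later scrape."""
    before = iowait_by_cpu(earlier_scrape)
    after = iowait_by_cpu(later_scrape)
    return sorted(cpu for cpu in before
                  if cpu in after and after[cpu] < before[cpu])


# Made-up example scrapes: cpu1's counter goes backwards.
first = '''# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="iowait"} 100.5
node_cpu{cpu="cpu1",mode="iowait"} 200.0
node_cpu{cpu="cpu0",mode="idle"} 5000
'''
second = '''node_cpu{cpu="cpu0",mode="iowait"} 101.0
node_cpu{cpu="cpu1",mode="iowait"} 199.9
'''
print(decreasing_cpus(first, second))  # ['cpu1']
```

A healthy counter should never appear in this list; any CPU that does is a candidate for the behaviour described in the issue.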
brian-brazil added the kind/question label Jul 14, 2017
Closing as not a problem with Prometheus itself.
brian-brazil closed this Jul 14, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
ArmanArbab commented Jun 16, 2017 (edited)
What did you do?
Attempted to show the percentage of time the CPU (or rather, the average across all CPUs) spends in a particular mode at a particular instant.
What did you expect to see?
Values for each mode under 100%, with the values of all modes summing to 100%.
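For a single CPU, this expectation is simple arithmetic: each mode's share is the increase of its cumulative seconds counter divided by the elapsed wall-clock time, and since the modes together account for the CPU's time, the shares should add up to roughly 100%. The sketch below uses made-up counter values and a hypothetical helper name, and it ignores the extrapolation that Prometheus' rate() performs at the edges of the range.

```python
def mode_percentages(before, after, interval_seconds):
    """Percent of wall-clock time one CPU spent in each mode.

    `before` and `after` map mode -> cumulative CPU seconds (as exported
    by node_cpu). This mirrors rate(node_cpu[...]) * 100 for one CPU,
    without Prometheus' extrapolation.
    """
    return {mode: (after[mode] - before[mode]) * 100 / interval_seconds
            for mode in before}


# Made-up counters sampled 10 seconds apart.
before = {"idle": 1000.0, "user": 300.0, "iowait": 50.0}
after = {"idle": 1007.0, "user": 302.0, "iowait": 51.0}
pct = mode_percentages(before, after, interval_seconds=10.0)
print(pct)                # {'idle': 70.0, 'user': 20.0, 'iowait': 10.0}
print(sum(pct.values()))  # 100.0
```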
What did you see instead? Under which circumstances?
Unreasonable values for the "iowait" mode. Below are a few queries I executed and their outputs. Note: whether I specify one of my two instances, "**:9100" or "***:9100", or neither (and thus take the average over both instances), the queries and their results/graphs are almost identical.
Time spent in "idle mode" looks reasonable:
"(avg by (mode) (rate(node_cpu{mode="idle"}[5m])) * 100)"
Screenshot 1
Time spent in "iowait" mode does not; the graphed values are almost exclusively above 100%, which should not be possible:
"(avg by (mode) (rate(node_cpu{mode="iowait"}[5m])) * 100)"
Screenshot 2
The time spent in all modes, except for "iowait", is reasonable/correct:
"(avg by (mode) (rate(node_cpu{}[5m])) * 100)"
Screenshot 3
I noticed that "iowait" was also the only mode with any counter resets. Perhaps this is related to the unreasonable values in the second screenshot (though my impression was that the "rate()" function automatically handles counter resets):
"avg by (mode) (resets (node_cpu{}[5m])) "
Screenshot 4
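The impression about rate() is correct, but its reset handling is also what can turn a small glitch into a huge value: any decrease between consecutive samples is treated as a counter reset, and the value just before the drop is added back to the increase. If the iowait counter merely dips slightly (the proc(5) man page notes that the iowait field in /proc/stat is not reliable and may decrease in certain conditions), rate() behaves as if the counter had restarted from zero. One possible explanation for the screenshots, then, is a minimal sketch like the following, with hypothetical function names, made-up samples, and no extrapolation:

```python
def counter_increase(samples):
    """Increase of a counter over `samples` (values in time order),
    treating any decrease as a reset, as rate()/increase() do.
    Extrapolation to the range boundaries is omitted in this sketch."""
    increase = samples[-1] - samples[0]
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # Assume the counter restarted from zero, so everything it
            # had accumulated before the drop is added back.
            increase += prev
    return increase


def simple_rate(samples, window_seconds):
    """Per-second rate over the window (no extrapolation)."""
    return counter_increase(samples) / window_seconds


# Made-up iowait seconds for one CPU, scraped every 20 s over 60 s.
healthy = [1000.0, 1001.0, 1002.0, 1003.0]
glitchy = [1000.0, 1001.0, 1000.5, 1001.5]  # dips by 0.5 s once
print(round(simple_rate(healthy, 60) * 100, 1))  # 5.0 (percent)
print(round(simple_rate(glitchy, 60) * 100, 1))  # 1670.8 (percent)
```

The resets() queries in this issue count exactly these decreases, which is why only the "iowait" series show non-zero resets.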
A closer inspection of the query in the second screenshot, broken down by CPU, reveals that most CPUs have reasonable (ostensibly correct) values, while a couple of CPUs are responsible for the excessive values:
"(avg by (mode, cpu) (rate(node_cpu{mode="iowait"}[5m])) * 100)"
Screenshots 5 to 9
Note that when changing the mode to "idle" (or any other mode for that matter), no CPUs have unreasonable values:
"(avg by (mode, cpu) (rate(node_cpu{mode="idle"}[5m])) * 100)"
Screenshots 10 to 12
A similar pattern emerges when inspecting the resets of the iowait counter for each CPU: almost all CPUs have zero resets, while a few (not necessarily the same CPUs that show unreasonably high values in the screenshots above) have a non-zero number of resets:
"avg by (mode, cpu) (resets (node_cpu{mode="iowait"}[5m])) "
Screenshots 13 to 15
Again, there are no counter resets for the "idle" mode (or for any other mode except "iowait"; refer to screenshot 4):
"avg by (mode, cpu) (resets(node_cpu{mode="idle"}[5m]))"
Screenshot 16
A quick side note: changing "rate" to "irate" does not produce reasonable values for the "iowait" mode either:
"(avg by (mode) (irate(node_cpu{}[5m])) * 100)"
Screenshot 17
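irate() applies the same reset rule, only to the last two samples in the range, so switching functions cannot help when the counter itself goes backwards. A minimal sketch, again with a hypothetical function name and made-up (timestamp in seconds, value) samples:

```python
def simple_irate(samples):
    """Per-second rate from the last two (timestamp, value) samples,
    as irate() does; a decrease is treated as a reset from zero."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    # On a decrease, assume the counter restarted at 0 and use v2 as-is.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


healthy = [(0, 1000.0), (20, 1001.0), (40, 1002.0)]
glitchy = [(0, 1000.0), (20, 1001.0), (40, 1000.5)]
print(round(simple_irate(healthy) * 100, 1))  # 5.0 (percent)
print(round(simple_irate(glitchy) * 100, 1))  # 5002.5 (percent)
```

Because irate() only looks at the most recent pair, the spike appears whenever that pair straddles a decrease, which typically makes the graph spikier than with rate() rather than more reasonable.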
Environment
A Prometheus server running on a CentOS 7 VM, collecting statistics from two instances, each a server running Node Exporter on CentOS 7.
System information:
Linux 3.10.0-514.10.2.el7.x86_64 x86_64
Prometheus version:
prometheus, version 1.7.0 (branch: master, revision: bfa37c8)
build user: root@7a6329cc02bb
build date: 20170607-09:43:48
go version: go1.8.3
Prometheus configuration file: