pcp-mpstat shows inconsistent values as compared to mpstat #1922

Open
orasagar opened this issue Mar 19, 2024 · 3 comments

@orasagar
Contributor

orasagar commented Mar 19, 2024

$ mpstat -P ALL ; pcp mpstat -P ALL -s4
Linux 5.15.0-203.146.5.1.el9uek.x86_64 (sagsagar-pcp-pmval-test) 03/19/2024 x86_64 (4 CPU)

10:38:38 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
10:38:38 AM all 0.19 0.01 0.12 0.00 0.00 0.00 0.01 0.00 0.00 99.67
10:38:38 AM 0 0.16 0.01 0.12 0.00 0.00 0.00 0.00 0.00 0.00 99.71
10:38:38 AM 1 0.20 0.01 0.14 0.00 0.00 0.00 0.01 0.00 0.00 99.64
10:38:38 AM 2 0.19 0.00 0.13 0.00 0.00 0.00 0.01 0.00 0.00 99.67
10:38:38 AM 3 0.20 0.01 0.11 0.00 0.00 0.00 0.01 0.00 0.00 99.67
Linux 5.15.0-203.146.5.1.el9uek.x86_64 (sagsagar-pcp-pmval-test) 03/19/24 x86_64 (4 CPU)

Timestamp CPU %usr %nice %sys %iowait %irq %soft %steal %guest %nice %idle
10:38:39 all 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.53
10:38:39 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.78
10:38:39 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.78
10:38:39 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.78
10:38:39 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 98.78

Timestamp CPU %usr %nice %sys %iowait %irq %soft %steal %guest %nice %idle
10:38:40 all 0.25 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.0 99.52
10:38:40 0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.52
10:38:40 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100.51
10:38:40 2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 98.52
10:38:40 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.52

Timestamp CPU %usr %nice %sys %iowait %irq %soft %steal %guest %nice %idle
10:38:41 all 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.58
10:38:41 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.58
10:38:41 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.58
10:38:41 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100.58
10:38:41 3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 98.59
$

Here the pcp-mpstat values are quite far from the mpstat values:

  • %steal and %nice were never reported as non-zero by pcp-mpstat
  • %usr and %sys values are way off

The above output was taken on a live system.

@kmcdonell
Member

For mpstat you have no "interval" parameter, so the reported values are the averaged values since boot time (different versions of mpstat over the past 50 years have had slightly different semantics, but for the Linux version you're using here I think this is correct).

For pcp-mpstat there is no way to report the averaged values since boot time, so the values in the output above are from 4 consecutive live samples.

This is one place where mpstat and pcp-mpstat are not the same, and perhaps the pcp-mpstat(1) man page should call this out.
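To make the difference concrete, here is a minimal Python sketch of the two calculations. It is a simplified model and not the actual mpstat or pcp-mpstat code: the counter names and tick values are invented for illustration, standing in for the cumulative per-state CPU counters the kernel exposes.

```python
# Hypothetical cumulative CPU tick counters, sampled twice.
# The values are invented for illustration only.
prev = {"user": 1000, "sys": 500, "idle": 998500}
curr = {"user": 1010, "sys": 500, "idle": 998590}


def percentages(ticks):
    """Convert a dict of tick counts into percentages of their total."""
    total = sum(ticks.values())
    return {name: 100.0 * value / total for name, value in ticks.items()}


def interval_percentages(prev, curr):
    """Percentages for the interval between two samples (counter deltas)."""
    delta = {name: curr[name] - prev[name] for name in curr}
    return percentages(delta)


# No interval given: the raw counters are averaged over the time since boot.
since_boot = percentages(curr)

# Interval sampling: only the ticks accumulated between two samples count.
last_interval = interval_percentages(prev, curr)

print(f"since boot:    %usr={since_boot['user']:.2f} %idle={since_boot['idle']:.2f}")
print(f"last interval: %usr={last_interval['user']:.2f} %idle={last_interval['idle']:.2f}")
```

With a long uptime the since-boot figures barely move, while the interval figures reflect only what happened between the two samples, so the two kinds of report are not expected to match.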

I've tried a more oranges-to-oranges comparison locally (only reporting the first two and last two CPUs). I ran these side-by-side, so they're not exactly the same sample interval, but close ...

$ pcp-mpstat -P 0,1,10,11 -t 10 -s 4
Linux  5.15.0-94-generic  (bozo.localdomain)  04/05/24  x86_64    (12 CPU)

Timestamp 	CPU	%usr 	%nice 	%sys 	%iowait 	%irq 	%soft 	%steal 	%guest 	%nice 	%idle 
07:02:55  	0  	25.13	0.0   	1.89 	3.49    	0.0  	0.0   	0.0    	22.63  	0.0   	67.5  
07:02:55  	1  	62.02	0.0   	1.2  	2.29    	0.0  	0.0   	0.0    	61.32  	0.0   	33.4  
07:02:55  	10 	2.39 	0.0   	2.79 	14.96   	0.0  	0.3   	0.0    	0.3    	0.0   	77.77 
07:02:55  	11 	4.49 	0.0   	4.59 	6.28    	0.0  	0.1   	0.0    	0.1    	0.0   	82.86 
07:03:05  	0  	2.79 	0.0   	3.29 	5.28    	0.0  	0.1   	0.0    	0.3    	0.0   	85.82 
07:03:05  	1  	5.58 	0.0   	3.69 	6.08    	0.0  	0.0   	0.0    	0.1    	0.0   	82.93 
07:03:05  	10 	2.69 	0.0   	3.09 	7.28    	0.0  	0.1   	0.0    	0.7    	0.0   	85.13 
07:03:05  	11 	72.17	0.0   	1.4  	0.3     	0.0  	0.0   	0.0    	70.77  	0.0   	26.12 
07:03:15  	0  	2.79 	0.0   	3.09 	2.59    	0.0  	0.0   	0.0    	0.4    	0.0   	90.77 
07:03:15  	1  	2.69 	0.0   	1.2  	1.79    	0.0  	0.0   	0.0    	1.0    	0.0   	92.47 
07:03:15  	10 	7.07 	0.0   	2.09 	5.28    	0.0  	0.1   	0.0    	5.98   	0.0   	84.9  
07:03:15  	11 	37.07	0.0   	0.7  	0.6     	0.0  	0.0   	0.0    	36.47  	0.0   	61.28 
$ mpstat -P 0,1,10,11 10 3
Linux 5.15.0-94-generic (bozo.localdomain) 	05/04/24 	_x86_64_	(12 CPU)

07:02:46     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:02:56       0    2.35    0.00    1.84    3.47    0.00    0.00    0.00   23.16    0.00   69.18
07:02:56       1    0.71    0.00    1.31    2.32    0.00    0.00    0.00   58.69    0.00   36.97
07:02:56      10    2.04    0.00    2.65   15.70    0.00    0.20    0.00    0.31    0.00   79.10
07:02:56      11    4.47    0.00    4.37    5.28    0.00    0.10    0.00    3.56    0.00   82.22

07:02:56     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:03:06       0    2.77    0.00    3.49    6.36    0.00    0.10    0.00    0.31    0.00   86.97
07:03:06       1    5.70    0.00    3.77    6.21    0.00    0.00    0.00    0.10    0.00   84.22
07:03:06      10    2.14    0.00    3.56    7.02    0.00    0.10    0.00    0.71    0.00   86.47
07:03:06      11    1.40    0.00    1.40    0.30    0.00    0.00    0.00   70.60    0.00   26.30

07:03:06     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:03:16       0    2.32    0.00    3.03    1.72    0.00    0.00    0.00    0.40    0.00   92.53
07:03:16       1    1.73    0.00    1.52    1.83    0.00    0.00    0.00    1.02    0.00   93.91
07:03:16      10    0.91    0.00    1.71    5.33    0.00    0.10    0.00    8.75    0.00   83.20
07:03:16      11    0.60    0.00    0.70    0.60    0.00    0.00    0.00   33.73    0.00   64.36

(I chopped the "Average" lines from mpstat ... another undocumented difference 😄)

Now these look pretty OK except they don't agree on %usr when %guest is significant ... I think this may be a difference of semantics, as the PCP metric that's behind the %usr number is

$ pminfo -t kernel.percpu.cpu.user
kernel.percpu.cpu.user [percpu user CPU time metric from /proc/stat, including guest CPU time]

and mpstat is clearly not including %guest in %usr.
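A rough way to check this against the numbers in this thread: if the PCP user metric includes guest time, then subtracting %guest from pcp-mpstat's %usr should land close to mpstat's %usr. The Python sketch below does that for the rows where %guest is large. The rows are copied by hand from the two outputs above, and the sample windows differ by about a second, so only approximate agreement is expected.

```python
# (cpu, pcp timestamp, pcp %usr, pcp %guest, mpstat %usr), copied by hand
# from the pcp-mpstat and mpstat outputs earlier in this thread.
rows = [
    (0, "07:02:55", 25.13, 22.63, 2.35),
    (1, "07:02:55", 62.02, 61.32, 0.71),
    (11, "07:03:05", 72.17, 70.77, 1.40),
    (11, "07:03:15", 37.07, 36.47, 0.60),
]


def usr_excluding_guest(pcp_usr, pcp_guest):
    """User time with guest time subtracted (the semantics mpstat appears to use)."""
    return pcp_usr - pcp_guest


results = []
for cpu, stamp, pcp_usr, pcp_guest, mpstat_usr in rows:
    adjusted = usr_excluding_guest(pcp_usr, pcp_guest)
    results.append((cpu, stamp, adjusted, mpstat_usr))
    print(f"cpu {cpu:>2} @ {stamp}: pcp %usr - %guest = {adjusted:.2f}, "
          f"mpstat %usr = {mpstat_usr:.2f}")
```

For these four rows the adjusted value stays within a few tenths of a percent of the mpstat figure, which is consistent with the guest-time explanation but does not prove it.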

Breaking the semantics of the PCP metric is probably a bad idea at this stage, so I think the options might be to (a) make pcp-mpstat the same as mpstat or (b) document the difference ... neither is ideal, so I'll solicit feedback.

@natoscott
Member

@kmcdonell I think this may be the kind of situation Mark added the vuser metric for? Might be as simple as just changing over to that metric here to match mpstat if we're lucky...

$ pminfo -t kernel.all.cpu.vuser
kernel.all.cpu.vuser [total user CPU time from /proc/stat for all CPUs, excluding guest CPU time]

@orasagar
Contributor Author

orasagar commented Apr 8, 2024

@kmcdonell that makes sense.
But one more thing I have observed is that in the PCP case the ALL value is not the average of the individual CPU values, whereas in mpstat it is.
I looked through the code, and it seems we use separate metrics with pre-calculated ALL values, such as "kernel.all.cpu.user".
Is there any specific reason we do it this way?
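
One way to test this on any given sample is to compare the reported "all" row with the plain mean of the per-CPU rows. The Python sketch below does so for the 10:38:40 pcp-mpstat sample from the first comment. The helper name and the 0.01 tolerance (to absorb two-decimal rounding) are assumptions made for this example, not anything taken from the pcp-mpstat source.

```python
def all_matches_mean(all_value, per_cpu_values, tolerance=0.01):
    """Return (matches, mean) comparing an 'all' value to the per-CPU mean."""
    mean = sum(per_cpu_values) / len(per_cpu_values)
    return abs(mean - all_value) <= tolerance, mean


# column -> (reported "all" value, [cpu0, cpu1, cpu2, cpu3]),
# copied from the 10:38:40 pcp-mpstat sample in the first comment.
sample = {
    "%usr": (0.25, [1.0, 0.0, 0.0, 0.0]),
    "%sys": (0.25, [0.0, 0.0, 1.0, 0.0]),
    "%idle": (99.52, [99.52, 100.51, 98.52, 99.52]),
}

checks = {}
for column, (all_value, per_cpu) in sample.items():
    matches, mean = all_matches_mean(all_value, per_cpu)
    checks[column] = matches
    print(f"{column}: all={all_value:.2f} mean={mean:.4f} matches={matches}")
```

For this one sample the check passes for all three columns; it says nothing about other runs, where values taken from a separate system-wide metric could still differ from the per-CPU mean.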
