[Linux] Provide access to Pressure Stall Information metrics #1932

MrPippin66 · 2021-04-10T14:59:14Z

OS: Linux (kernel 4.20 or later, unless the vendor has backported the feature)
Type: Performance metrics for CPU, Memory & IO

Summary:

https://www.kernel.org/doc/html/latest/accounting/psi.html

Though this is a relatively new feature, it will become a common source of information for diagnosing performance issues on systems.

I would request that this information be made available through psutil (when PSI and/or cgroup support is enabled in the OS), primarily so that tools built on top of psutil can readily use it for monitoring.

Ultimately, I'd suggest a new category (psi) to gather these values from.

psutil.psi_cpu()

psutil.psi_memory()

psutil.psi_io()

I'd request that both system-level and cgroup2-level data be provided for each category.
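A minimal sketch of what the proposed functions could look like, assuming the file format described in the kernel PSI documentation linked above. Each resource file has a `some` line and usually a `full` line of the form `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`, where the `avg*` fields are stall percentages and `total` is cumulative stall time in microseconds; the cpu file may lack the `full` line on older kernels, so the sketch returns `None` for it in that case. System-wide data is read from `/proc/pressure/<resource>`; cgroup2 data is read from `<resource>.pressure` inside the cgroup's directory, assumed here to be mounted at `/sys/fs/cgroup`. The names `psi_cpu()`, `psi_memory()`, `psi_io()`, `PsiLine` and `PsiStats` are hypothetical; psutil does not currently provide them.

```python
import os
from collections import namedtuple

# Hypothetical return types; not part of psutil.
PsiLine = namedtuple("PsiLine", ["avg10", "avg60", "avg300", "total"])
PsiStats = namedtuple("PsiStats", ["some", "full"])

PROC_PRESSURE = "/proc/pressure"  # system-wide PSI files
CGROUP2_ROOT = "/sys/fs/cgroup"   # assumed cgroup2 mount point


def _parse_pressure_file(path):
    """Parse a PSI file into PsiStats; a missing 'full' line becomes None."""
    parsed = {"some": None, "full": None}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            kind, *fields = line.split()
            values = dict(field.split("=", 1) for field in fields)
            parsed[kind] = PsiLine(
                avg10=float(values["avg10"]),    # stall %, ~10 s average
                avg60=float(values["avg60"]),    # stall %, ~60 s average
                avg300=float(values["avg300"]),  # stall %, ~300 s average
                total=int(values["total"]),      # cumulative stall time, microseconds
            )
    return PsiStats(**parsed)


def _pressure_path(resource, cgroup, proc_dir, cgroup_root):
    if cgroup is None:
        return os.path.join(proc_dir, resource)
    # cgroup2 exposes cpu.pressure, memory.pressure and io.pressure per group.
    return os.path.join(cgroup_root, cgroup.strip("/"), resource + ".pressure")


def _psi(resource, cgroup=None, proc_dir=PROC_PRESSURE, cgroup_root=CGROUP2_ROOT):
    return _parse_pressure_file(_pressure_path(resource, cgroup, proc_dir, cgroup_root))


def psi_cpu(cgroup=None, **paths):
    return _psi("cpu", cgroup, **paths)


def psi_memory(cgroup=None, **paths):
    return _psi("memory", cgroup, **paths)


def psi_io(cgroup=None, **paths):
    return _psi("io", cgroup, **paths)
```

For example, `psi_memory().some.avg10` would give the system-wide 10-second memory stall percentage, and `psi_io(cgroup="system.slice")` the io pressure of one cgroup. The `proc_dir`/`cgroup_root` keywords exist only so the paths can be redirected (e.g. in tests).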

giampaolo · 2021-04-11T17:45:37Z

Mmm... This is the first time I've heard about this. I'm trying to understand how it works (https://unixism.net/2019/08/linux-pressure-stall-information-psi-by-example/). The information in those 3 files is easy to extract. What's more difficult is understanding how to interpret that data and imagining an actual use case. For instance, the psutil doc shows an actual use case for psutil.getloadavg(), translating those raw numbers into a percentage of CPU usage/load over time:

>>> import psutil
>>> psutil.getloadavg()
(3.14, 3.89, 4.67)
>>> psutil.cpu_count()
10
>>> # percentage representation over the last 1, 5, 15 mins
>>> [x / psutil.cpu_count() * 100 for x in psutil.getloadavg()]
[31.4, 38.9, 46.7]

If we were to add this, I would like to see something similar in the doc: some actual code which does something useful with the raw numbers extracted from /proc/pressure. But in order to do that, I/we'd have to properly understand how this works first. =)
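One possible doc-style example, sketched under the assumption (per the kernel PSI documentation) that the `avg10`/`avg60`/`avg300` fields are already percentages of wall-clock time, so unlike `getloadavg()` they need no normalization by CPU count. The helper names `pressure_averages()` and `pressured_resources()` and the 10% default threshold are hypothetical choices for illustration, not an established guideline.

```python
import os


def pressure_averages(resource, kind="some", proc_dir="/proc/pressure"):
    """Return (avg10, avg60, avg300) for one line of a PSI file.

    The values are percentages: the share of recent wall-clock time in which
    at least one task ("some") or all non-idle tasks ("full") were stalled.
    """
    with open(os.path.join(proc_dir, resource)) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == kind:
                values = dict(item.split("=", 1) for item in fields[1:])
                return tuple(float(values[k]) for k in ("avg10", "avg60", "avg300"))
    raise ValueError(f"no {kind!r} line in {resource} pressure file")


def pressured_resources(threshold=10.0, proc_dir="/proc/pressure"):
    """List resources whose ~10 s "some" average exceeds threshold (percent)."""
    return [
        resource
        for resource in ("cpu", "memory", "io")
        if pressure_averages(resource, proc_dir=proc_dir)[0] > threshold
    ]
```

On a Linux 4.20+ host with PSI enabled, `pressure_averages("memory")` would return the three `some` averages for memory, and `pressured_resources(threshold=20.0)` would list which of cpu, memory and io are currently above 20%.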

MrPippin66 · 2021-04-14T13:51:37Z

PSI was developed by Facebook. They posted a decent explanation of how they use it, and the benefits it's given them.

https://lwn.net/Articles/759658/

And FYI, that article gives a detailed explanation of the problem with "getloadavg", which goes beyond the issues we've encountered (namely, that you can have several threads that are each active for only a small part of their allocated time slice; they show up as high load averages but low overall CPU utilization).

And swap thrashing isn't the only memory condition that results in low processor throughput (high reclaim rates, high fault rates, etc.); this facility would capture those too, without the need for complicated monitoring scripts.
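The `avg*` fields only cover fixed ~10/60/300-second windows. A monitoring script that wants the stall share over its own polling interval could instead diff the cumulative `total` counter (microseconds, per the kernel docs) between two reads. A minimal sketch, assuming the `full` line of `/proc/pressure/memory` (time in which all non-idle tasks were stalled on memory at once) as the signal for the throughput loss described above; the function names are hypothetical.

```python
import os
import time


def read_total(resource, kind="some", proc_dir="/proc/pressure"):
    """Return the cumulative stall time in microseconds from one PSI line."""
    with open(os.path.join(proc_dir, resource)) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == kind:
                values = dict(item.split("=", 1) for item in fields[1:])
                return int(values["total"])
    raise ValueError(f"no {kind!r} line in {resource} pressure file")


def stall_percent(total_before, total_after, interval_s):
    """Percentage of interval_s spent stalled, from two 'total' samples."""
    stalled_us = total_after - total_before
    return stalled_us * 100 / (interval_s * 1_000_000)


def sample_stall_percent(resource="memory", kind="full", interval_s=5.0,
                         proc_dir="/proc/pressure"):
    """Sample 'total' twice, interval_s apart, and return the stall percentage."""
    start = time.monotonic()
    before = read_total(resource, kind, proc_dir)
    time.sleep(interval_s)
    after = read_total(resource, kind, proc_dir)
    # Use the measured elapsed time rather than interval_s, since sleep may overshoot.
    elapsed = time.monotonic() - start
    return stall_percent(before, after, elapsed)
```

For example, `sample_stall_percent("memory", "full", 10.0)` would estimate what share of a 10-second window the system made no progress because of memory contention, whatever its cause (swapping, reclaim, or otherwise).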

I think having this available would simplify monitoring, and hopefully it would be adopted further up the stack in monitoring products such as "ncpa".

MrPippin66 · 2023-02-07T15:18:03Z

@giampaolo Is this still a feature candidate?

MrPippin66 · 2023-02-07T19:17:45Z

FYI, PSI data is reported in 'sar' output on all current Linux distributions. I think being able to report this data in 'psutil' merits attention.

MrPippin66 added the enhancement label Apr 10, 2021

github-actions bot added the linux label Apr 10, 2021

giampaolo added the api label Apr 11, 2021

giampaolo added new-api and removed api labels May 15, 2021
