Speed up cpu_frequency() on Linux systems (#1851). #1852

Merged
merged 1 commit into from
Jan 7, 2021


marxin
Contributor

@marxin marxin commented Oct 15, 2020

This change reads the current frequency from /proc/cpuinfo when available.
It provides cached frequency values, and the minimum and maximum
frequencies can be filled in from the
/sys/devices/system/cpu/cpufreq/policy/* subsystem (which is fast).

Fixes #1851.
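
For illustration only, here is a minimal sketch of pulling the cached "cpu MHz" values out of /proc/cpuinfo-style text. The function name `parse_cpuinfo_mhz` and the sample text are hypothetical and not taken from the patch; the sketch assumes each logical CPU contributes one line starting with "cpu MHz".

```python
def parse_cpuinfo_mhz(text):
    """Return the 'cpu MHz' values found in /proc/cpuinfo-style text."""
    freqs = []
    for line in text.splitlines():
        # Lines look like "cpu MHz\t\t: 1195.699"; match case-insensitively.
        if line.lower().startswith("cpu mhz"):
            _, _, value = line.partition(":")
            freqs.append(float(value.strip()))
    return freqs


# Hypothetical sample; on Linux the text would come from /proc/cpuinfo.
sample = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 1195.699\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 1200.000\n"
)
print(parse_cpuinfo_mhz(sample))
```

An empty list signals that the file has no "cpu MHz" entries, in which case a caller would have to fall back to another source.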

@marxin marxin force-pushed the speed-up-cpu_frequency branch 2 times, most recently from db8c08a to 5e7d22f Compare October 15, 2020 08:39
@marxin
Contributor Author

marxin commented Oct 20, 2020

Could you please review this, @giampaolo?

@DerDakon

Would this work?

castor ~ #  cat /proc/cpuinfo 
cpu             : UltraSparc T2 (Niagara2)
fpu             : UltraSparc T2 integrated FPU
pmu             : niagara2
prom            : OBP 4.33.6 2012/03/14 08:07
type            : sun4v
ncpus probed    : 64
ncpus active    : 64
D$ parity tl1   : 0
I$ parity tl1   : 0
cpucaps         : flush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,v8plus,popc,vis,vis2,ASIBlkInit
Cpu0ClkTck      : 000000005458c3a0
Cpu1ClkTck      : 000000005458c3a0
dakon@catbus ~ $  cat /proc/cpuinfo 
cpu             : UltraSparc T5 (Niagara5)
fpu             : UltraSparc T5 integrated FPU
pmu             : niagara5
prom            : OBP 4.38.4.a 2016/03/11 11:12
type            : sun4v
ncpus probed    : 256
ncpus active    : 256
D$ parity tl1   : 0
I$ parity tl1   : 0
cpucaps         : flush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc,ima,pause,cbcond,aes,des,kasumi,camellia,md5,sha1,sha256,sha512,mpmul,montmul,montsqr,crc32c
Cpu0ClkTck      : 00000000d6924470
Cpu1ClkTck      : 00000000d6924470

@marxin
Contributor Author

marxin commented Dec 29, 2020

Would this work?

Ahh, we need to verify that cpu MHz is present. I can fix that, but I haven't received any feedback so far from @giampaolo ...

@DerDakon

The CpuNClkTck values give the actual CPU frequency; they are just encoded in hex for whatever reason.
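
As a rough sketch of what decoding those values could look like: the helper `parse_clktck_hz` and its regular expression are hypothetical, and treating the decoded number as Hz is an assumption based on the magnitude of the values in the listings above.

```python
import re

# Matches lines such as "Cpu0ClkTck      : 000000005458c3a0".
CLKTCK_RE = re.compile(
    r"^Cpu(\d+)ClkTck[ \t]*:[ \t]*([0-9a-fA-F]+)[ \t]*$", re.MULTILINE
)


def parse_clktck_hz(text):
    """Map CPU index -> decoded ClkTck value (assumed to be Hz)."""
    return {int(cpu): int(value, 16) for cpu, value in CLKTCK_RE.findall(text)}


sample = (
    "Cpu0ClkTck      : 000000005458c3a0\n"
    "Cpu1ClkTck      : 000000005458c3a0\n"
)
for cpu, hz in parse_clktck_hz(sample).items():
    print(cpu, hz / 1e6, "MHz")
```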

@giampaolo
Owner

giampaolo commented Dec 29, 2020

Hello there. I re-read the original issue. As a quick recap (correct me if I'm wrong), the problem as per #1851 (comment) is this: if we have 128 CPUs, that means we'll have to read 128 * 3 = 384 files (current, min and max frequencies), whereas if we get the current frequency from /proc/cpuinfo we can get away with reading 128 * 2 + 1 = 257 files instead. Is this correct?

I tried this PR with 8 CPUs, and it's slower:

~/svn/psutil {master}$ python3 -m timeit -s "import psutil" "psutil.cpu_freq()"
1000 loops, best of 3: 355 usec per loop

~/svn/psutil {marxin-speed-up-cpu_frequency}$ python3 -m timeit -s "import psutil" "psutil.cpu_freq()"
1000 loops, best of 3: 476 usec per loop

What's the speedup with your patch + 128 CPUs?

@giampaolo
Owner

Also: is the lscpu command slow as well?

giampaolo added a commit that referenced this pull request Dec 29, 2020
Micro optimization in reference to #1852 and #1851.
Use glob.glob(), which internally relies on os.scandir()
in order to list /sys/devices/system/cpu/cpufreq files.
In doing so, we avoid os.path.exists() for each CPU, which
internally uses os.stat().

Signed-off-by: Giampaolo Rodola <g.rodola@gmail.com>
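
A minimal sketch of the glob-based listing described in the commit message above, not the committed code. The helper name `list_policy_dirs` and the numeric sort are assumptions made for illustration.

```python
import glob
import os


def list_policy_dirs(base="/sys/devices/system/cpu/cpufreq"):
    """Return the policyN directories under base, ordered by N."""
    numbered = []
    # One directory listing replaces a separate existence check per CPU.
    for path in glob.glob(os.path.join(base, "policy[0-9]*")):
        suffix = os.path.basename(path)[len("policy"):]
        if suffix.isdigit():
            numbered.append((int(suffix), path))
    # Sort numerically so that policy10 does not come before policy2.
    return [path for _, path in sorted(numbered)]


print(list_policy_dirs())
```

On a system without the cpufreq sysfs tree the function simply returns an empty list.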
@marxin
Contributor Author

marxin commented Jan 5, 2021

Hello.

Thanks for the reply.

Hello there. I re-read the original issue. As a quick recap (correct me if I'm wrong), the problem as per #1851 (comment) is this: if we have 128 CPUs, that means we'll have to read 128 * 3 = 384 files (current, min and max frequencies), whereas if we get the current frequency from /proc/cpuinfo we can get away with reading 128 * 2 + 1 = 257 files instead. Is this correct?

No, reading the min and max frequencies from /sys/devices/system/cpu/cpufreq/ is (almost) free. The problematic read is scaling_cur_freq, as it has to read a CPU counter:

time cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq && time cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
1194685

real	0m0.019s
user	0m0.002s
sys	0m0.001s
1194685

real	0m0.002s
user	0m0.002s
sys	0m0.001s

The benefit of /proc/cpuinfo is that it displays a cached current frequency.
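
The same measurement can be approximated from Python. This is only a sketch: `timed_read` is a hypothetical helper, and it times the read inside one process, so the numbers will not match the `time cat` output above (which also includes process startup).

```python
import time


def timed_read(path):
    """Read a small file and return (contents, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path) as f:
        data = f.read().strip()
    return data, time.perf_counter() - start


path = "/sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq"
try:
    # Read twice to compare a first read with an immediate repeat.
    for attempt in (1, 2):
        value, elapsed = timed_read(path)
        print(f"read {attempt}: {value} kHz in {elapsed * 1e3:.3f} ms")
except OSError as exc:
    # The file is absent on systems without cpufreq support.
    print(f"cannot read {path}: {exc}")
```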

I tried this PR with 8 CPUs, and it's slower:

~/svn/psutil {master}$ python3 -m timeit -s "import psutil" "psutil.cpu_freq()"
1000 loops, best of 3: 355 usec per loop

~/svn/psutil {marxin-speed-up-cpu_frequency}$ python3 -m timeit -s "import psutil" "psutil.cpu_freq()"
1000 loops, best of 3: 476 usec per loop

Heh :) Note that these values are quite close to each other (and very small); if I'm correct, it's 0.000355 s.

On my machine (model name: AMD EPYC 7601 32-Core Processor) I see:

BEFORE my patch:

time python3 -c "import psutil; print(psutil.cpu_freq())"
scpufreq(current=1237.2858906250003, min=1200.0, max=2200.0)

real	0m2.639s
user	0m0.084s
sys	0m0.015s

AFTER my patch:

time /tmp/venv/bin/python3 -c "import psutil; print(psutil.cpu_freq())"
scpufreq(current=1232.5320625, min=1200.0, max=2200.0)

real	0m0.118s
user	0m0.065s
sys	0m0.033s

That's over 20x faster (2.639 s vs. 0.118 s wall-clock) and I get the same speed for percpu=True.

What's the speedup with your patch + 128 CPUs?

About lscpu, it does not print min/max/current per CPU:

lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          128
On-line CPU(s) list:             0-127
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       2
NUMA node(s):                    8
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           1
Model name:                      AMD EPYC 7601 32-Core Processor
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         1195.699
CPU max MHz:                     2200.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4391.46
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       4 MiB
L2 cache:                        32 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-7,64-71
NUMA node1 CPU(s):               8-15,72-79
NUMA node2 CPU(s):               16-23,80-87
NUMA node3 CPU(s):               24-31,88-95
NUMA node4 CPU(s):               32-39,96-103
NUMA node5 CPU(s):               40-47,104-111
NUMA node6 CPU(s):               48-55,112-119
NUMA node7 CPU(s):               56-63,120-127
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall sev_es fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

@giampaolo
Owner

That's over 20x faster and I get the same speed for percpu=True.

OK, I'm convinced. =)
We should also cover what @DerDakon says here about SPARC: #1852 (comment). In summary, if I got everything right:

  • get current freq from /proc/cpuinfo (achieves speedup)
  • if not available (SPARC) get it from /sys fs (as we do now)
  • if not available, raise NotImplementedError (as we do now)
  • get min freq from /sys fs, if not available set it to 0.0 (as we do now)
  • get max freq from /sys fs, if not available set it to 0.0 (as we do now)
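
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was merged: all function names are hypothetical, the paths are parameters so the logic can be exercised against a fake tree, and the sysfs files used (scaling_cur_freq, scaling_min_freq, scaling_max_freq) are assumed to hold kHz values.

```python
import glob
import os


def read_khz_as_mhz(path):
    """Read a sysfs frequency file (kHz) and return MHz, or None on failure."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000
    except (OSError, ValueError):
        return None


def read_cpuinfo_mhz(cpuinfo):
    """Return the cached 'cpu MHz' values, or [] if none can be read."""
    try:
        with open(cpuinfo) as f:
            return [float(line.partition(":")[2])
                    for line in f if line.lower().startswith("cpu mhz")]
    except (OSError, ValueError):
        return []


def policy_dirs(sysfs):
    """Return the policyN directories under sysfs, ordered by N."""
    numbered = []
    for path in glob.glob(os.path.join(sysfs, "policy[0-9]*")):
        suffix = os.path.basename(path)[len("policy"):]
        if suffix.isdigit():
            numbered.append((int(suffix), path))
    return [path for _, path in sorted(numbered)]


def cpu_freq_sketch(cpuinfo="/proc/cpuinfo",
                    sysfs="/sys/devices/system/cpu/cpufreq"):
    """Return one (current, min, max) tuple in MHz per CPU or policy."""
    cached = read_cpuinfo_mhz(cpuinfo)
    policies = policy_dirs(sysfs)
    if not policies:
        if not cached:
            raise NotImplementedError("can't find current frequency")
        # No sysfs data at all: min and max default to 0.0.
        return [(cur, 0.0, 0.0) for cur in cached]

    result = []
    for i, policy in enumerate(policies):
        if len(cached) == len(policies):
            cur = cached[i]  # fast path: cached value from cpuinfo
        else:
            cur = read_khz_as_mhz(os.path.join(policy, "scaling_cur_freq"))
        if cur is None:
            raise NotImplementedError("can't find current frequency")
        low = read_khz_as_mhz(os.path.join(policy, "scaling_min_freq"))
        high = read_khz_as_mhz(os.path.join(policy, "scaling_max_freq"))
        result.append((cur, low or 0.0, high or 0.0))
    return result


try:
    print(cpu_freq_sketch())
except NotImplementedError as exc:
    print("not available here:", exc)
```

Using the cached values only when their count matches the number of policy directories is a design choice of this sketch; it avoids pairing a frequency with the wrong CPU when the two sources disagree.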

Other than that (minor change): /proc/cpuinfo is always available (this assumption already exists elsewhere in _pslinux.py), so this module level check should go:

if os.path.exists("/proc/cpuinfo"):

@DerDakon

DerDakon commented Jan 5, 2021

The current frequency (and AFAIK the only supported one) is available on SPARC; it's just in a different format.

@giampaolo
Owner

>>> int("00000000d6924470", 16)
3599910000

Is there a way to pre-emptively know if you're on SPARC?
Does SPARC have /sys/devices/system/cpu/cpufreq/policy0 or /sys/devices/system/cpu/cpu0/cpufreq?

@DerDakon

DerDakon commented Jan 6, 2021

The sysfs files do not exist, at least on my machines. You can either call uname -r and check for "sparc" in it, or match on one of the first lines of /proc/cpuinfo as shown above: either type containing "sun" or cpu containing "Sparc".
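
A small sketch of such a check. The helper `looks_like_sparc` is hypothetical, and testing the machine name reported by `platform.machine()` (rather than the kernel release string) is this sketch's own choice, not something stated in the thread.

```python
import platform


def looks_like_sparc(machine, cpuinfo_text=""):
    """Heuristically decide whether the system is a SPARC machine."""
    if "sparc" in machine.lower():
        return True
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip().lower()
        # "type : sun4v" or "cpu : UltraSparc T2 (Niagara2)"
        if key == "type" and "sun" in value:
            return True
        if key == "cpu" and "sparc" in value:
            return True
    return False


print(looks_like_sparc(platform.machine()))
```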

@marxin marxin force-pushed the speed-up-cpu_frequency branch 2 times, most recently from 59a0629 to b101d24 Compare January 6, 2021 08:58
@marxin
Contributor Author

marxin commented Jan 6, 2021

Other than that (minor change): /proc/cpuinfo is always available (this assumption already exists elsewhere in _pslinux.py), so this module level check should go:

All right, I assumed that. About the SPARC issue, I simply parse /proc/cpuinfo for MHz values first, and if none are present I fall back to the second approach.

@giampaolo
Owner

@DerDakon what's the output of os.uname() on SPARC?

@DerDakon

DerDakon commented Jan 6, 2021

>>> os.uname()
posix.uname_result(sysname='Linux', nodename='castor', release='5.10.3-gentoo-sparc64', version='#1 SMP Wed Dec 30 12:57:02 CET 2020', machine='sparc64')

@giampaolo
Owner

OK, this LGTM and fixes #1851. Let's handle SPARC separately.
@DerDakon could you run tests on SPARC and open a separate issue in case there's something failing?

@giampaolo giampaolo merged commit 6e494bd into giampaolo:master Jan 7, 2021
giampaolo added a commit that referenced this pull request Jan 7, 2021
Signed-off-by: Giampaolo Rodola <g.rodola@gmail.com>
Development

Successfully merging this pull request may close these issues.

[Linux] psutil.cpu_frequency() is slow
3 participants