Linux: incorrect CPU frequencies when some CPUs are offline #648
Comments
Seeing you mention |
Yes, this is on Linux, Fedora 34 to be precise. And sorry I didn't remember to mention the htop version. I'm seeing this both on 3.0.5 and git master. It doesn't necessarily look like a regression to me but rather just something that might not have occurred as a possibility to whoever wrote the frequency reading code. The logic of the bug/omission seems pretty clear once you've seen it and read the code, so I haven't even tried bisecting its history.
Fixes: htop-dev#648

If some CPUs have been turned offline, those CPUs will not appear in the active CPU count parsed from /proc/stat. Offline CPUs that are still present may still count in terms of sequential CPU IDs in both /proc/cpuinfo and the CPU entries in /sys/devices/system/cpu/.

Reading the frequencies for the N active CPUs from CPUFreq information in /sys/devices/system/cpu/cpu[0..N-1] may thus give incorrect results if some of the CPUs [0..N-1] are actually offline.

If e.g. CPUs 1 and 3 have been turned offline on a system with four CPU cores, the code would attempt to read the frequencies for the remaining two online CPUs from /sys/devices/system/cpu/cpu{0,1}, while the correct sysfs CPU entries from which to read the frequencies would be {0,2}.

The code path that reads the frequencies from /proc/cpuinfo instead of CPUFreq suffers from a similar problem.

This fixes the reporting of frequencies in case of offline CPUs being mixed within the range of online CPUs.
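To illustrate the index mismatch described in the commit message, here is a minimal Python sketch. It is not htop's code (htop is written in C) and not necessarily how the fix is implemented; the helper names `online_cpu_ids` and `cpufreq_paths` and the sample /proc/stat text are made up for illustration. It assumes that per-CPU lines in /proc/stat are named `cpuN` and that the current frequency is exposed in the `cpufreq/scaling_cur_freq` file under each sysfs CPU entry.

```python
import re

def online_cpu_ids(proc_stat_text):
    """Collect the CPU IDs that have a per-CPU 'cpuN' line in /proc/stat text.

    The aggregate 'cpu ' line has no number and is skipped.
    """
    ids = []
    for line in proc_stat_text.splitlines():
        match = re.match(r"cpu(\d+)\s", line)
        if match:
            ids.append(int(match.group(1)))
    return ids

def cpufreq_paths(cpu_ids):
    """Build the sysfs CPUFreq path for each given CPU ID."""
    return [
        f"/sys/devices/system/cpu/cpu{cpu_id}/cpufreq/scaling_cur_freq"
        for cpu_id in cpu_ids
    ]

# Hypothetical /proc/stat excerpt for a quad-core system
# with CPUs 1 and 3 taken offline.
sample_stat = """cpu  100 0 50 1000 0 0 0 0 0 0
cpu0 60 0 30 500 0 0 0 0 0 0
cpu2 40 0 20 500 0 0 0 0 0 0
intr 12345
"""

ids = online_cpu_ids(sample_stat)

# Counting the online CPUs and iterating 0..N-1 picks cpu1, which is offline.
naive_ids = list(range(len(ids)))

# Using the IDs taken from the 'cpuN' lines picks cpu0 and cpu2.
paths = cpufreq_paths(ids)
```

The point is only that the loop variable has to be the CPU ID taken from the `cpuN` line, not a position in the list of online CPUs.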
@mwahlroos has this been solved by #656?
@cgzones yes, that does resolve this. Thanks for your work on this!
The CPU frequency figures may show incorrect (or unnecessarily "N/A") values for CPU cores if some CPUs or cores have been taken logically offline.
The way I read it, the code currently sums the number of CPUs from /proc/stat, and then proceeds to read the frequencies for CPUs 0..N-1 (for N total CPUs as determined from the stat file) either from CPUfreq information in sysfs, or by parsing /proc/cpuinfo if frequency information from CPUfreq is unavailable or too slow to read.
This works correctly if all CPUs from 0 to N-1 are online.
However, suppose that on a quad-core system (CPUs 0..3), cores 1 and 3 have been taken logically offline. The total number of CPUs that appear in /proc/stat is then 2, and the code attempts to read the frequencies for CPUs 0..1 rather than 0 and 2, even though CPU 1 is offline with no valid frequency information available and the second online core is actually CPU 2.
On my system, the frequency read from CPUfreq for CPU 1 would then appear to be some fixed value that's incorrect for the actual second online core. If the frequencies are read from /proc/cpuinfo instead, the frequency reported in htop for the second core would appear to be "N/A" instead.
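For the /proc/cpuinfo path, the same idea can be sketched by keying the frequencies on the `processor` field instead of on the order in which entries appear. This is only an illustration in Python, not htop's actual parser; the function name `cpuinfo_mhz_by_id` and the sample text are hypothetical, and it assumes the x86-style `cpu MHz` field, which is not present on every architecture.

```python
def cpuinfo_mhz_by_id(cpuinfo_text):
    """Map each 'processor' ID in /proc/cpuinfo text to its 'cpu MHz' value."""
    freqs = {}
    current_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line or a line without a key
        key = key.strip()
        value = value.strip()
        if key == "processor" and value.isdigit():
            current_id = int(value)
        elif key == "cpu MHz" and current_id is not None:
            freqs[current_id] = float(value)
    return freqs

# Hypothetical /proc/cpuinfo excerpt for a quad-core system
# with CPUs 1 and 3 taken offline.
sample_cpuinfo = """processor : 0
model name : Example CPU
cpu MHz : 1800.000

processor : 2
model name : Example CPU
cpu MHz : 2400.500
"""

mhz = cpuinfo_mhz_by_id(sample_cpuinfo)
```

With a mapping like this, the second online core is looked up as CPU 2. Looking it up as index 1 finds nothing, which matches the "N/A" I described above.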
This may be a bit of a weird corner case, but I've seen it when taking every second core offline to soft-disable SMT, so I thought I'd report it. I don't know how the entries in /sys/devices/system/cpu/ change if CPUs are disabled in a system with actual CPU hotplug, but I suppose this might affect some such systems as well.
I've been working on this a bit but I don't have a fix yet.