Linux: incorrect CPU frequencies when some CPUs are offline #648

Closed
mwahlroos opened this issue Jun 6, 2021 · 4 comments
Labels: bug 🐛 (Something isn't working), Linux 🐧 (Linux related issues), question ❔ (Further information is requested)

mwahlroos commented Jun 6, 2021

The CPU frequency figures may show incorrect (or unnecessarily "N/A") values for CPU cores if some CPUs or cores have been taken logically offline.

The way I read it, the code currently sums the number of CPUs from /proc/stat, and then proceeds to read the frequencies for CPUs 0..N-1 (for N total CPUs as determined from the stat file) either from CPUfreq information in sysfs, or by parsing /proc/cpuinfo if frequency information from CPUfreq is unavailable or too slow to read.

This works correctly if all CPUs from 0 to N-1 are online.

However, suppose that on a quad-core system (CPUs 0..3), cores 1 and 3 have been taken logically offline. The number of CPUs that appear in /proc/stat is then 2, and the code attempts to read the frequencies for CPUs 0..1 rather than 0 and 2, even though CPU 1 is offline with no valid frequency information available and the second online core is actually CPU 2.
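To illustrate the mismatch, here is a minimal sketch (not htop's actual code) that collects the CPU ids named on the per-CPU lines of a /proc/stat-style buffer instead of only counting them. The helper name `collect_cpu_ids` is hypothetical, and the sketch assumes that each per-CPU line keeps its kernel-assigned number (e.g. `cpu0`, `cpu2`) when other CPUs are offline.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: store the ids found on "cpuN ..." lines of a
 * /proc/stat-style buffer into `ids` (at most `max` entries) and return
 * how many were stored. The aggregate "cpu ..." line has no digit right
 * after "cpu" and is therefore skipped.
 */
static size_t collect_cpu_ids(const char* stat_text, unsigned int* ids, size_t max) {
   assert(stat_text != NULL && ids != NULL);

   size_t count = 0;
   const char* line = stat_text;
   while (line != NULL && *line != '\0' && count < max) {
      if (strncmp(line, "cpu", 3) == 0 && isdigit((unsigned char)line[3]))
         ids[count++] = (unsigned int)strtoul(line + 3, NULL, 10);

      /* Advance to the start of the next line, if there is one */
      const char* nl = strchr(line, '\n');
      line = nl ? nl + 1 : NULL;
   }
   return count;
}
```

With cores 1 and 3 offline, a buffer containing only `cpu0` and `cpu2` lines would yield the ids 0 and 2, which are the ones whose frequencies should be looked up, rather than the range 0..1.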

On my system, the frequency read from CPUfreq for CPU 1 would then appear to be some fixed value that's incorrect for the actual second online core. If the frequencies are read from /proc/cpuinfo instead, the frequency reported in htop for the second core would appear to be "N/A" instead.
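For the /proc/cpuinfo path, one way to avoid the "N/A" result would be to look up the frequency by CPU id rather than by position. The following is only a sketch under assumptions: the function name `cpuinfo_mhz_for` is made up for illustration, and it assumes the x86-style layout in which each entry starts with a `processor : <id>` line and carries a `cpu MHz : <value>` line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the "cpu MHz" value belonging to the entry
 * whose "processor" field equals `wanted`, or a negative value if that
 * processor id does not appear in the buffer.
 */
static double cpuinfo_mhz_for(const char* text, unsigned int wanted) {
   assert(text != NULL);

   int have_current = 0;
   unsigned int current = 0;
   const char* line = text;
   while (line != NULL && *line != '\0') {
      unsigned int id;
      double mhz;
      if (sscanf(line, "processor : %u", &id) == 1) {
         /* A new entry starts; remember which CPU id it describes */
         current = id;
         have_current = 1;
      } else if (have_current && current == wanted &&
                 sscanf(line, "cpu MHz : %lf", &mhz) == 1) {
         return mhz;
      }

      /* Advance to the start of the next line, if there is one */
      const char* nl = strchr(line, '\n');
      line = nl ? nl + 1 : NULL;
   }
   return -1.0;
}
```

Asking for CPU 2 then finds the second listed entry even though it is not at index 2, while asking for the offline CPU 1 reports "not found" instead of silently picking another core's value.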

This may be a bit of a weird corner case, but I've seen it when taking every second core offline to soft-disable SMT, so I thought I'd report it. I don't know how the entries in /sys/devices/system/cpu/ change if CPUs are disabled in a system with actual CPU hotplug, but I suppose this might affect some such systems as well.
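Another possible source for the set of online CPUs is the kernel's cpulist file /sys/devices/system/cpu/online, which holds a string such as `0,2` or `0-3`. The sketch below expands such a string into individual ids. It is an illustration only, `parse_cpulist` is a hypothetical name, and nothing in this thread says the eventual fix takes this route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: expand a cpulist string such as "0,2" or "0-3,8"
 * into `ids` (at most `max` entries). Returns the number of ids stored,
 * or 0 if the string is malformed.
 */
static size_t parse_cpulist(const char* list, unsigned int* ids, size_t max) {
   assert(list != NULL && ids != NULL);

   size_t count = 0;
   const char* p = list;
   while (*p != '\0' && *p != '\n') {
      char* end;
      unsigned long first = strtoul(p, &end, 10);
      if (end == p)
         return 0; /* no number where one was expected */
      unsigned long last = first;
      p = end;

      if (*p == '-') {
         /* A range "first-last" */
         p++;
         last = strtoul(p, &end, 10);
         if (end == p || last < first)
            return 0;
         p = end;
      }

      for (unsigned long id = first; id <= last && count < max; id++)
         ids[count++] = (unsigned int)id;

      if (*p == ',')
         p++;
      else if (*p != '\0' && *p != '\n')
         return 0; /* unexpected separator */
   }
   return count;
}
```

Each resulting id could then be used to build the matching /sys/devices/system/cpu/cpuN path, so that offline CPUs in the middle of the range are skipped.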

I've been working on this a bit but I don't have a fix yet.

BenBE added the bug 🐛 and Linux 🐧 labels Jun 6, 2021
Member

BenBE commented Jun 6, 2021

Since you mention procfs, I'll assume you're seeing this issue on Linux. Also, given that this code hasn't been touched since the last release AFAIR, this is unlikely to be a very recent breakage; more likely it broke when the code that reads the sensor information was last touched.

@mwahlroos
Author

Yes, this is on Linux, Fedora 34 to be precise.

And sorry I didn't remember to mention the htop version. I'm seeing this both on 3.0.5 and git master. It doesn't necessarily look like a regression to me but rather just something that might not have occurred as a possibility to whoever wrote the frequency reading code. The logic of the bug/omission seems pretty clear once you've seen it and read the code, so I haven't even tried bisecting its history.

mwahlroos added a commit to mwahlroos/htop that referenced this issue Jun 9, 2021
Fixes: htop-dev#648

If some CPUs have been turned offline, those CPUs will not appear in the
active CPU count parsed from /proc/stat.

Offline CPUs that are still present may still count in terms of sequential
CPU IDs in both /proc/cpuinfo and the CPU entries in /sys/devices/system/cpu/.
Reading the frequencies for the N active CPUs from CPUFreq information in
/sys/devices/system/cpu/cpu[0..N-1] may thus give incorrect results if some
of the CPUs [0..N-1] are actually offline.

If e.g. CPUs 1 and 3 have been turned offline on a system with four CPU cores,
the code would attempt to read the frequencies for the remaining two online
CPUs from /sys/devices/system/cpu/cpu{0,1}, while the correct sysfs CPU
entries from which to read the frequencies would be {0,2}.

The code path that reads the frequencies from /proc/cpuinfo instead of CPUfreq
suffers from a similar problem.

This fixes the reporting of frequencies in case of offline CPUs being mixed
within the range of online CPUs.
Member

cgzones commented Aug 10, 2021

@mwahlroos has this been solved by #656?

BenBE added the question ❔ label Aug 10, 2021
BenBE added this to the 3.1.0 milestone Aug 10, 2021
@mwahlroos
Author

> @mwahlroos has this been solved by #656?

@cgzones yes, that does resolve this. Thanks for your work on this!
