incorrect topology on AMD Phenom II X4 with missing InitApicIdCpuIdLo MSR bit #183
Comments
Hello. Thank you for all the debugging. Unfortunately, I don't see any easy solution. If InitApicIdCpuIdLo is only visible in an MSR, it's not accessible from user-space, and I am not aware of any way to query the actual APIC ID either. If that helps, we could generate a fixed XML topology for your machine and tell hwloc to load from XML by default. In case somebody wants to look at this, it would be nice if you could run hwloc-gather-cpuid and attach a tarball of the output cpuid directory. This new tool is available in git master (nightly tarballs available from https://ci.inria.fr/hwloc/job/master-0-tarball/) under utils/hwloc. It will dump all the CPUID outputs that hwloc needs so that we can debug offline. Brice
Thank you for the explanation. I've worked around the problem by adding a fixup routine that sets
I gathered the CPUID information before applying the fixup.
Thanks.
First I tested the former (using the FreeBSD cpucontrol utility) and then I did the latter.
Summary: The Initial Local APIC ID is returned by CPUID function 1 (in EBX). On AMD Family 10h systems the way that ID is built is controlled by an MSR bit (InitApicIdCpuIdLo). BKDG instructs BIOS to set it in a certain way, but a BIOS can be buggy. In that case the ID can confuse tools that use it, e.g. hwloc. For example, on a system that I own real Local APIC IDs are configured as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0. See: open-mpi/hwloc#183 Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6060 git-svn-id: svn+ssh://svn.freebsd.org/base/head@298736 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
I am closing this issue since it's a BIOS bug and we cannot do much about it in hwloc. Thanks for all the debugging and for sending patches to FreeBSD.
I have a single-socket desktop system with an AMD Phenom II X4 955 processor in it.
My operating system is FreeBSD.
hwloc discovers the following topology which is clearly wrong:
The problem seems to be with the information reported in EBX by CPUID function 1:
So, the APIC IDs obtained in this fashion are: 0, 0x40, 0x80, 0xc0.
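To make the extraction step concrete, here is a minimal sketch. The initial local APIC ID occupies bits 31:24 of EBX as returned by CPUID function 1. The EBX values below are hypothetical placeholders: only their top byte is chosen to match the IDs listed above, the lower 24 bits are made up for illustration.

```python
def initial_apic_id(ebx: int) -> int:
    """Return the initial local APIC ID, i.e. bits 31:24 of CPUID.1 EBX."""
    return (ebx >> 24) & 0xFF


# Hypothetical EBX values, one per core; only the top byte is meaningful here.
sample_ebx = [0x00040800, 0x40040800, 0x80040800, 0xC0040800]

ids = [initial_apic_id(value) for value in sample_ebx]
print([hex(i) for i in ids])
```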
According to BKDG For AMD Family 10h Processors those APIC IDs are initial local APIC IDs. Those IDs depend on MSR C001 001F, Northbridge Configuration Register (NB_CFG), bit 54, InitApicIdCpuIdLo:
On my system this bit is zero despite what BKDG says and that is consistent with the observed initial APIC ID values (core IDs are placed into the upper bits, not the lowest ones). So, this is likely a BIOS bug, but I am using the latest version available for my system.
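As a rough sketch of the check described here, the helper below tests bit 54 of a 64-bit NB_CFG value. It assumes the raw MSR contents have already been read by some privileged means (MSRs are not readable from user-space directly); the function name and the sample values are hypothetical and not taken from a real machine.

```python
NB_CFG_MSR = 0xC001001F            # Northbridge Configuration Register (NB_CFG)
INIT_APIC_ID_CPU_ID_LO_BIT = 54    # InitApicIdCpuIdLo


def init_apic_id_cpu_id_lo(nb_cfg: int) -> bool:
    """Return True if the InitApicIdCpuIdLo bit is set in a raw NB_CFG value."""
    return bool((nb_cfg >> INIT_APIC_ID_CPU_ID_LO_BIT) & 1)


# Hypothetical raw values: one with the bit clear, one with it set.
bit_clear = 0x0000000000000000
bit_set = bit_clear | (1 << INIT_APIC_ID_CPU_ID_LO_BIT)

print(init_apic_id_cpu_id_lo(bit_clear), init_apic_id_cpu_id_lo(bit_set))
```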
APIC IDs programmed into the Local APIC registers are 0, 1, 2, 3, which is correct and allows the OS to see the correct topology. At least here BIOS is compliant with section 2.9.5.1 ApicId Enumeration Requirements.
So, I wonder if there is a way for hwloc to query actual APIC IDs instead of the initial IDs...
Or, perhaps, hwloc should be aware of InitApicIdCpuIdLo and interpret the initial IDs accordingly? Or is this mess not hwloc's problem?
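For illustration only, the second option could look something like the sketch below. It assumes a single-node system where the core number sits in the top two bits of the 8-bit initial ID, which is merely what the values observed on this machine suggest; the exact field layout should be checked against the BKDG, any node ID bits are ignored, and the function name is hypothetical rather than part of hwloc.

```python
def core_id_from_high_bits(initial_id: int, core_bits: int = 2) -> int:
    """Recover a core number stored in the top `core_bits` bits of an 8-bit ID.

    Assumption: single node, so the remaining low bits carry no information
    that needs to be preserved.
    """
    return (initial_id & 0xFF) >> (8 - core_bits)


reported = [0x00, 0x40, 0x80, 0xC0]   # initial IDs seen via CPUID.1
remapped = [core_id_from_high_bits(i) for i in reported]
print(remapped)
```

With the IDs from this report, the remapped values line up with the 0, 1, 2, 3 programmed into the Local APIC registers.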