incorrect topology on AMD Phenom II X4 with missing InitApicIdCpuIdLo MSR bit #183

Closed
avg-I opened this issue Apr 13, 2016 · 5 comments

@avg-I

avg-I commented Apr 13, 2016

I have a single-socket desktop system with an AMD Phenom II X4 955 processor in it.
My operating system is FreeBSD.
hwloc discovers the following topology which is clearly wrong:

$ lstopo --no-io                      
Machine (16GB)
  Package L#0 + L3 L#0 (6144KB) + L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
  Package L#1 + L3 L#1 (6144KB) + L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  Package L#2 + L3 L#2 (6144KB) + L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
  Package L#3 + L3 L#3 (6144KB) + L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)

$ lstopo --no-io -p
Machine (16GB)
  Package P#0 + L3 P#0 (6144KB) + L2 P#0 (512KB) + L1d P#0 (64KB) + L1i P#0 (64KB) + Core P#0 + PU P#0
  Package P#16 + L3 P#0 (6144KB) + L2 P#0 (512KB) + L1d P#0 (64KB) + L1i P#0 (64KB) + Core P#0 + PU P#1
  Package P#48 + L3 P#0 (6144KB) + L2 P#0 (512KB) + L1d P#0 (64KB) + L1i P#0 (64KB) + Core P#0 + PU P#2
  Package P#32 + L3 P#0 (6144KB) + L2 P#0 (512KB) + L1d P#0 (64KB) + L1i P#0 (64KB) + Core P#0 + PU P#3

The problem seems to be with the information reported in EBX by CPUID function 1:

$ for c in 0 1 2 3 ; do cpucontrol -i 1 /dev/cpuctl$c ; done
cpuid level 0x1: 0x00100f43 0x00040800 0x00802009 0x178bfbff
cpuid level 0x1: 0x00100f43 0x40040800 0x00802009 0x178bfbff
cpuid level 0x1: 0x00100f43 0xc0040800 0x00802009 0x178bfbff
cpuid level 0x1: 0x00100f43 0x80040800 0x00802009 0x178bfbff

So, the APIC IDs obtained in this fashion are: 0, 0x40, 0x80, 0xc0.
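For readers following along, here is a minimal Python sketch of how those IDs fall out of the EBX values. The list is copied from the cpucontrol output above; the helper names are illustrative and not part of hwloc or FreeBSD. The final step is only an assumption used for illustration: it treats the low bits implied by the logical processor count as the core number and the rest as the package number. That this is what hwloc actually does has not been verified here.

```python
# EBX values from the cpucontrol output above, for CPUs 0..3.
EBX_VALUES = [0x00040800, 0x40040800, 0xC0040800, 0x80040800]


def initial_apic_id(ebx: int) -> int:
    """CPUID function 1, EBX[31:24]: initial local APIC ID."""
    return (ebx >> 24) & 0xFF


def logical_processor_count(ebx: int) -> int:
    """CPUID function 1, EBX[23:16]: logical processor count."""
    return (ebx >> 16) & 0xFF


apic_ids = [initial_apic_id(ebx) for ebx in EBX_VALUES]
counts = {logical_processor_count(ebx) for ebx in EBX_VALUES}

# Assumption for illustration only: the low log2(count) bits hold the core
# number and the remaining upper bits hold the package number.
core_bits = (max(counts) - 1).bit_length()
package_ids = [apic_id >> core_bits for apic_id in apic_ids]

print([hex(i) for i in apic_ids])
print(package_ids)
```

Under that assumption the four CPUs land in packages 0, 16, 48 and 32, which are the Package P# values in the `lstopo --no-io -p` output above.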
According to BKDG For AMD Family 10h Processors those APIC IDs are initial local APIC IDs. Those IDs depend on MSR C001 001F, Northbridge Configuration Register (NB_CFG), bit 54, InitApicIdCpuIdLo:

Read-write.
Revision C and earlier:
0=Initial value of APIC20[ApicId[7:0]] is {CpuCoreNum[1:0], 000b, F0x60[NodeId[2:0]]}.
1=Initial value of APIC20[ApicId[7:0]] is {000b, F0x60[NodeId[2:0]], CpuCoreNum[1:0]}.
...
See 2.9.2 [CPU Cores and Downcoring] for information about CpuCoreNum.
This bit should always be set by BIOS; it should be set before F0x60[NodeId] is programmed.

On my system this bit is zero despite what BKDG says and that is consistent with the observed initial APIC ID values (core IDs are placed into the upper bits, not the lowest ones). So, this is likely a BIOS bug, but I am using the latest version available for my system.
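The two layouts quoted from the BKDG can be checked against the observed IDs with a short Python sketch. The function names are hypothetical; the bit fields follow the quoted text for revision C and earlier, and nothing here reads real hardware.

```python
from typing import Tuple


def decode_initial_apic_id(apic_id: int, cpuid_lo: bool) -> Tuple[int, int]:
    """Split an 8-bit initial APIC ID into (node_id, core_num)."""
    if cpuid_lo:
        # InitApicIdCpuIdLo=1: {000b, NodeId[2:0], CpuCoreNum[1:0]}
        return (apic_id >> 2) & 0x7, apic_id & 0x3
    # InitApicIdCpuIdLo=0: {CpuCoreNum[1:0], 000b, NodeId[2:0]}
    return apic_id & 0x7, (apic_id >> 6) & 0x3


def encode_initial_apic_id(node_id: int, core_num: int, cpuid_lo: bool) -> int:
    """Build the 8-bit initial APIC ID for the chosen layout."""
    if cpuid_lo:
        return ((node_id & 0x7) << 2) | (core_num & 0x3)
    return ((core_num & 0x3) << 6) | (node_id & 0x7)


# Initial APIC IDs of CPUs 0..3 as reported by CPUID function 1.
observed = [0x00, 0x40, 0xC0, 0x80]

# Interpret them with the bit clear, as found on this system.
decoded = [decode_initial_apic_id(i, cpuid_lo=False) for i in observed]

# What the same cores would report if the bit were set.
ids_if_set = [encode_initial_apic_id(n, c, cpuid_lo=True) for n, c in decoded]

print(decoded)
print(ids_if_set)
```

With the bit clear, all four IDs decode to node 0 with core numbers 0, 1, 3, 2. With the bit set, the same cores would report IDs 0, 1, 3, 2.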

APIC IDs programmed into the Local APIC registers are 0, 1, 2, 3, which is correct and allows the OS to see the correct topology. At least here BIOS is compliant with section 2.9.5.1 ApicId Enumeration Requirements.

So, I wonder if there is a way for hwloc to query actual APIC IDs instead of the initial IDs...
Or, perhaps, hwloc should be aware of InitApicIdCpuIdLo and interpret the initial IDs accordingly?
Or is this mess not hwloc's problem?

@bgoglin
Contributor

bgoglin commented Apr 13, 2016

Hello

Thank you for all the debugging. Unfortunately, I don't see any easy solution. If InitApicIdCpuIdLo is only visible in MSR, that's not accessible from user-space. And I am not aware of any way to query the actual APIC ID either.

If that helps, we could generate a fixed XML topology for your machine and tell hwloc to load from XML by default.

In case somebody wants to look at this, it would be nice if you could run hwloc-gather-cpuid and attach a tarball of the output cpuid directory. This new tool is available in git master (nightly tarballs available from https://ci.inria.fr/hwloc/job/master-0-tarball/) under utils/hwloc. It will dump all the CPUID outputs that hwloc needs so that we can debug offline.

Brice

@bgoglin bgoglin added the x86 label Apr 13, 2016
@bgoglin bgoglin changed the title incorrect CPU topology discovered on my AMD Phenom II X4 system incorrect topology on AMD Phenom II X4 with missing InitApicIdCpuIdLo MSR bit Apr 13, 2016
@avg-I
Author

avg-I commented Apr 21, 2016

Thank you for the explanation. I've worked around the problem by adding a fixup routine that sets the InitApicIdCpuIdLo bit. Apparently it is not too late to do that after the OS starts up.
Now I get:

$ lstopo --no-io -p
Machine (16GB) + Package P#0 + L3 P#0 (6144KB)
  L2 P#0 (512KB) + L1d P#0 (64KB) + L1i P#0 (64KB) + Core P#0 + PU P#0
  L2 P#1 (512KB) + L1d P#1 (64KB) + L1i P#1 (64KB) + Core P#1 + PU P#1
  L2 P#3 (512KB) + L1d P#3 (64KB) + L1i P#3 (64KB) + Core P#3 + PU P#2
  L2 P#2 (512KB) + L1d P#2 (64KB) + L1i P#2 (64KB) + Core P#2 + PU P#3

I gathered the CPUID information before applying the fixup.
issue-183-cpuid.zip

@bgoglin
Contributor

bgoglin commented Apr 21, 2016

Thanks.
Did you apply that fixup in the FreeBSD kernel, or by manually writing to the MSR somehow?

@avg-I
Author

avg-I commented Apr 21, 2016

First I tested the latter (using the FreeBSD cpucontrol utility) and then I did the former.
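The fixup discussed here amounts to a read-modify-write of NB_CFG (MSR C001 001F) that sets bit 54. Below is a minimal Python sketch of the bit arithmetic only. It does not touch hardware, the sample register value is made up for illustration, and the constant and function names are hypothetical. How the value is actually read and written (cpucontrol or kernel code) is deliberately left out.

```python
NB_CFG_MSR = 0xC001001F               # MSR C001 001F, NB_CFG
INIT_APIC_ID_CPUID_LO = 1 << 54       # bit 54, InitApicIdCpuIdLo
MASK_64 = 0xFFFFFFFFFFFFFFFF


def bit_is_set(nb_cfg: int) -> bool:
    """True if InitApicIdCpuIdLo is set in the given 64-bit MSR value."""
    return bool(nb_cfg & INIT_APIC_ID_CPUID_LO)


def with_bit_set(nb_cfg: int) -> int:
    """Return the MSR value with InitApicIdCpuIdLo set, other bits untouched."""
    return (nb_cfg | INIT_APIC_ID_CPUID_LO) & MASK_64


def split_hi_lo(value: int):
    """Split a 64-bit MSR value into its (high, low) 32-bit halves."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


# Hypothetical register contents, NOT read from the system in this issue.
sample = 0x0000000400000008

fixed = with_bit_set(sample)
print(bit_is_set(sample), bit_is_set(fixed))
print([hex(half) for half in split_hi_lo(fixed)])
```

Bit 54 of the 64-bit value is bit 22 of the high 32-bit half, so only the high half changes.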

uqs pushed a commit to freebsd/freebsd-src that referenced this issue Apr 28, 2016
Summary:
The Initial Local APIC ID is returned by CPUID function 1 (in EBX).
On AMD Family 10h systems the way that ID is built is controlled by
an MSR bit (InitApicIdCpuIdLo).  BKDG instructs BIOS to set it in a
certain way, but a BIOS can be buggy.  In that case the ID can confuse
tools that use it, e.g. hwloc.
For example, on a system that I own real Local APIC IDs are configured
as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0.
See: open-mpi/hwloc#183

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D6060


git-svn-id: svn+ssh://svn.freebsd.org/base/head@298736 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
bdrewery pushed a commit to bdrewery/freebsd that referenced this issue May 3, 2016
@bgoglin
Contributor

bgoglin commented Jul 11, 2017

I am closing this issue since it's a BIOS bug and we cannot do much about it in hwloc. Thanks for all the debugging and for sending patches to FreeBSD.

@bgoglin bgoglin closed this as completed Jul 11, 2017