Kernel 5.15 does not boot when built with CONFIG_MPILEDRIVER=y #80

Closed
ernsteiswuerfel opened this issue Nov 4, 2021 · 24 comments

Comments

@ernsteiswuerfel

Kernel 5.15 (gentoo-sources) builds fine with CONFIG_MPILEDRIVER=y on my AMD FX-8370 but it won't boot. I don't even get netconsole output... Does not matter whether I build the kernel with gcc (11.2.0) or clang (12.0.1).

But I found out when I build the kernel with CONFIG_GENERIC_CPU=y or CONFIG_MBULLDOZER=y it just runs fine. Kernel 5.14.x runs ok with CONFIG_MPILEDRIVER=y.

Some data about the cpu and the system in general:

 # lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               AuthenticAMD
  Model name:            AMD FX-8370 Eight-Core Processor
    CPU family:          21
    Model:               2
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         4000.0000
    CPU min MHz:         1400.0000
    BogoMIPS:            8040.42
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
                         mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc
                         rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor
                         ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm
                         extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt
                         lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd
                         ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
                         flushbyasid decodeassists pausefilter pfthreshold
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   128 KiB (8 instances)
  L1i:                   256 KiB (4 instances)
  L2:                    8 MiB (4 instances)
  L3:                    8 MiB (1 instance)
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
  Srbds:                 Not affected

 # lspci 
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD9x0/RX980 Host Bridge (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GFX port 0)
00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 0)
00:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 2)
00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 4)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c5)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (rev c5)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
04:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)
05:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. SM2263EN/SM2263XT-based OEM SSD (rev 03)
06:00.0 USB controller: ASMedia Technology Inc. ASM1143 USB 3.1 Host Controller
08:00.0 Ethernet controller: Qualcomm Atheros Killer E2400 Gigabit Ethernet Controller (rev 10)

This is the Gentoo downstream bug report: https://bugs.gentoo.org/821406
Please find kernel .config and dmesg (with CONFIG_MBULLDOZER=y) in the downstream bugreport.

@drfiemost

There are a lot of objtool warnings while compiling, looks similar to #30

@ernsteiswuerfel
Author

I did hit #30 at that time too. But I think it's not the same. #30 is about the kernel not building at all with CONFIG_MPILEDRIVER=y.

This issue here is about the kernel building ok, but it won't run (won't even boot to the point I could get a dmesg via netconsole). I read the linked Gentoo Forums thread about CONFIG_STACK_VALIDATION and wanted to try with CONFIG_STACK_VALIDATION=n but I can't deselect it in my config, even when not using ORC unwinder.

Btw issue is the same on my Opteron 6386 SE box.

@appashchenko

Does a 5.14.15 or 5.14.16 kernel built with CONFIG_MPILEDRIVER work for you?
gentoo-sources 5.15.0 was released after 5.14.14, so there is a chance you skipped them.

Why I am asking:
I usually compile the kernel with CONFIG_MSTEAMROLLER, and the last working version is 5.14.14.
With later kernels:
CONFIG_M*(yourcpuhere) compiles with objtool warnings and then does not boot;
CONFIG_GENERIC_CPU compiles without warnings and boots fine;
CONFIG_GENERIC_CPU[2-4] compiles without warnings but does not boot.

I am trying to figure out whether this is related or whether I should create a new issue.

@ernsteiswuerfel
Author

ernsteiswuerfel commented Nov 7, 2021

Does a 5.14.15 or 5.14.16 kernel built with CONFIG_MPILEDRIVER work for you? gentoo-sources 5.15.0 was released after 5.14.14, so there is a chance you skipped them.

Yes, 5.14.16 works on my FX-8370 with CONFIG_MPILEDRIVER, also with CONFIG_MSTEAMROLLER on my A10-8750B. So it's probably a different issue.

But I'll double-check on both machines with 5.14.17 and report back here should I be wrong.

@drfiemost

From my understanding, objtool isn't supposed to handle extended instructions like SIMD and whatever else, as those are not normally needed in kernel code.
I wonder if it wouldn't be wiser to use -mtune instead of -march.

@Pigpog

Pigpog commented Nov 14, 2021

So it is this patch causing it! I've been trying to boot 5.15.2 for the past couple days and have been getting a huge amount of objtool warnings and an unbootable kernel. I used my config from 5.14.14, did make oldconfig, and have always used CONFIG_MSTEAMROLLER.
The kernel panic message I get is jump_label: Fatal kernel bug, unexpected op at swap_writepage+0x17/0x70, if that's helpful at all

Just thought I'd add my experience here, thanks for the patch btw!

@graysky2
Owner

graysky2 commented Nov 14, 2021

@Pigpog - I am not sure what changed from 5.14.14 --> 5.15.2 that would cause this behavior. I am running 5.15.2 built with CONFIG_MZEN3 without any such errors. Something unique to steamroller and to piledriver (per @ernsteiswuerfel original report)? Is the steamroller able to use one of the newer x86-64 presets? What is the output of the following on it:

/lib/ld-linux-x86-64.so.2 --help | grep supported

@appashchenko

appashchenko commented Nov 14, 2021

/lib/ld-linux-x86-64.so.2 --help | grep supported

x86-64-v2 (supported, searched)
x86_64 (AT_PLATFORM; supported, searched)
tls (supported, searched)
x86_64 (supported, searched)

// AMD A8-7200P

@ernsteiswuerfel
Author

Something unique to steamroller and to piledriver (per @ernsteiswuerfel original report)? Is the steamroller able to use one of the newer x86-64 presets? What is the output of the following on it:

Definitely a Steamroller/Piledriver issue I think. My Ryzen 9 5950X is fine with CONFIG_MZEN3 and my Athlon X2 3250e with CONFIG_MK8SSE3 on 5.15.x.

 $ /lib64/ld-linux-x86-64.so.2 --help | grep supported
  x86-64-v2 (supported, searched)
  x86_64 (AT_PLATFORM; supported, searched)
  tls (supported, searched)
  x86_64 (supported, searched)

//FX-8370

 $ /lib64/ld-linux-x86-64.so.2 --help | grep supported
  x86-64-v2 (supported, searched)
  x86_64 (AT_PLATFORM; supported, searched)
  tls (supported, searched)
  x86_64 (supported, searched)

//A10-8750B
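
When comparing several machines, the supported level can be pulled out of that loader output mechanically. The following is only a minimal sketch that assumes the output format shown above; `supported_levels` is a hypothetical helper name, not something glibc provides.

```shell
# Hypothetical helper (not part of glibc): read `ld.so --help` output on stdin
# and print only the x86-64-vN levels that the loader marks as supported.
supported_levels() {
    awk '$1 ~ /^x86-64-v[0-9]+$/ && $2 ~ /^\(supported/ { print $1 }'
}

# Usage; the loader path differs between distributions (/lib vs /lib64):
# /lib64/ld-linux-x86-64.so.2 --help | supported_levels
```

Fed the FX-8370 output above, this prints only x86-64-v2, since the plain x86_64 and tls lines are filtered out.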

@graysky2
Owner

graysky2 commented Nov 14, 2021

  • Would you mind posting the output of gcc -c -Q -march=native --help=target | grep march from the FX-8370?
  • Since all the affected hardware is generic-v2 compatible, would one of you mind building 5.15.x with the generic-v2 target and testing it on a machine that fails to boot with the specific target?

@ernsteiswuerfel
Author

$ gcc -c -Q -march=native --help=target | grep march
  -march=                     		bdver2
  Known valid arguments for -march= option:

Built with CONFIG_GENERIC_CPU2, kernel 5.15.2 boots fine on my FX-8370!

@graysky2
Owner

I might be off base with this, but it seems as though the body of evidence is pointing to the flag/compiler interacting with the 5.15 kernel in some way...

@ernsteiswuerfel
Author

... and to some subtle differences between bdver1, generic-v2 and bdver2.

I don't know whether the compiler matters much, as bdver2 kernels fail regardless of whether they are built with clang-13 or gcc-12. I can try older versions as well if that would be helpful.

@graysky2
Owner

I honestly don't have a good plan to track this down.

@Pigpog

Pigpog commented Nov 14, 2021

I just compiled 5.15.2 for bulldozer instead of steamroller and it booted just fine, and compiled without warnings. Here are the relevant bits of lscpu:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
Vendor ID:               AuthenticAMD
  Model name:            AMD Athlon(tm) X4 860K Quad Core Processor
    CPU family:          21
    Model:               48
    Thread(s) per core:  2
    Core(s) per socket:  2
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
                         mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc
                         rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor
                         ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm
                         extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt
                         lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb
                         hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save
                         tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
                         overflow_recov

Perhaps I should try Piledriver optimizations for my Steamroller next? Let me know if there are any tests I should try; I have access to a few other architectures too (Coffee Lake, Sandy Bridge, Intel Atom, Caspian).

@drfiemost

@graysky2 it's the same issue as #55, it seems you've lost the -mno-tbm flag in the transition to 5.15

@Pigpog

Pigpog commented Nov 14, 2021

@graysky2 it's the same issue as #55, it seems you've lost the -mno-tbm flag in the transition to 5.15

I can confirm. Adding -mno-tbm where it used to be made it compile without warnings and boots just fine. Thank you!

@graysky2
Owner

graysky2 commented Nov 14, 2021

@drfiemost - Nice, thanks for pointing that out. Totally slipped my mind.

Can one of you guys please try af5c6eb from my test branch to verify functionality? I successfully built 5.15.2 (MPILEDRIVER) but cannot boot it without the hardware.

Further, I am unclear which syntax is right:

  1. As I included in the commit (mixing in the old $(call cc-option,-mno-tbm)), or
  2. Straight up += -mno-tbm on a new line, or
  3. Combined into a single line, i.e. cflags-$(CONFIG_MPILEDRIVER) += -march=bdver2 -mno-tbm, or
  4. The hybrid: cflags-$(CONFIG_MPILEDRIVER) += -march=bdver2 $(call cc-option,-mno-tbm)

EDIT: Actually, I see it could very well be option 3.

% grep '+=' arch/x86/Makefile
KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard

f141161

@graysky2
Owner

OK, so patching 5.15.2 with f141161 and applying MBULLDOZER compiled fine. @Pigpog, you mind test building/booting?

@drfiemost

From the docs:

cc-option is used to check if $(CC) supports a given option, and if not supported to use an optional second option.

So it may be useful if there's any known compiler or compiler version which doesn't support the -mno-tbm flag.
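
To make that concrete, here is a rough shell sketch of the idea behind cc-option. It is not Kbuild's actual implementation; the function name `cc_option` and its argument order are made up for illustration.

```shell
# Hypothetical helper mimicking the idea behind Kbuild's cc-option:
# print FLAG if the compiler accepts it, otherwise print the optional FALLBACK.
# Usage: cc_option COMPILER FLAG [FALLBACK]
cc_option() {
    cc=$1
    flag=$2
    fallback=${3:-}
    # Compile an empty C input; -Werror makes compilers that merely warn
    # about an unknown option fail instead of silently accepting it.
    if "$cc" -Werror "$flag" -c -x c /dev/null -o /dev/null >/dev/null 2>&1; then
        printf '%s\n' "$flag"
    else
        printf '%s\n' "$fallback"
    fi
}

# Example: echo "extra flags: $(cc_option gcc -mno-tbm)"
```

With a compiler that rejects -mno-tbm, the helper prints the fallback (or an empty line), so the build continues without the flag instead of failing outright.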

@Pigpog

Pigpog commented Nov 14, 2021

Oh, I was still compiling af5c6eb. Let me start again with f141161 and I'll let you know what happens.

@graysky2
Owner

@drfiemost - Thanks for the link. I agree with you in principle, but since this patch only applies to 5.15 with at least gcc11 or at least clang12, I think we're good without it, no?

@Pigpog

Pigpog commented Nov 14, 2021

Compiled using f141161 for Steamroller using Steamroller setting and it gave no warnings and booted fine

@ernsteiswuerfel
Author

f141161 made 5.15.2 build & boot with CONFIG_MPILEDRIVER=y on my FX8370 too. Thanks!
