udev: Excessive CPUFreq driver loading causes device timeouts and boot time delays #19439

Closed
k-meyer opened this issue Apr 27, 2021 · 1 comment


k-meyer commented Apr 27, 2021

systemd version the issue has been seen with

systemd 248

Used distribution

SUSE Linux Enterprise Server 15 SP2

Linux kernel version used (uname -a)

Linux h2-623 5.3.18-24.61-default #1 SMP Wed Apr 14 10:10:07 UTC 2021 (c41a65c) x86_64 x86_64 x86_64 GNU/Linux

CPU architecture issue was seen on

# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          1536
On-line CPU(s) list:             0-1535
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       32
NUMA node(s):                    32
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz
...

Expected behaviour you didn't see

A successful boot with all devices available after reaching the root login prompt.

Unexpected behaviour you saw

intel_pstate is mutually exclusive with acpi_cpufreq; however, systemd-udevd attempts to load
acpi_cpufreq multiple times while intel_pstate is enabled.

Logical CPUs x /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0007:XXX (acpi)
Logical CPUs x /devices/system/cpu/cpuXXX (cpu)

systemd-udev-trigger triggers one uevent for each device. The devices mentioned above cause
systemd-udevd to run 'kmod load acpi:ACPI0007:' (80-drivers.rules) once per corresponding event.

On a system with 1536 logical CPUs, systemd-udevd attempts to load acpi_cpufreq 3072 times.
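
The figure follows from the two device classes listed above: each logical CPU appears once as an ACPI0007 (acpi) device and once as a cpuXXX (cpu) device, and each device yields one uevent. A minimal sketch of the arithmetic, assuming exactly one load attempt per uevent as described in this report (the function name is hypothetical):

```python
def expected_load_attempts(logical_cpus: int, device_classes: int = 2) -> int:
    """Estimate module load attempts, assuming one attempt per uevent.

    device_classes=2 reflects the two sysfs entries per logical CPU named
    in this report: the ACPI0007 (acpi) device and the cpuXXX (cpu) device.
    """
    return logical_cpus * device_classes


print(expected_load_attempts(1536))  # 1536 * 2 = 3072
```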

The delay, caused by systemd-udevd attempting to load acpi_cpufreq, causes some devices such as the
serial console and Ethernet to be unavailable after reaching the root login prompt. The repeated
loading of acpi_cpufreq postpones the loading of other drivers.

Blacklisting acpi_cpufreq or disabling intel_pstate prevents the delay.
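
For reference, a modprobe.d blacklist entry has the form `blacklist acpi_cpufreq`. The sketch below is illustrative and not part of the original report: it checks whether modprobe.d-style text blacklists a given module. The function name and the sample file contents are hypothetical, and it assumes `-` and `_` are interchangeable in module names, as they are for modprobe.

```python
def blacklisted_modules(config_text: str) -> set:
    """Return module names listed on `blacklist` lines of modprobe.d-style text."""
    modules = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            # Normalize so acpi-cpufreq and acpi_cpufreq compare equal.
            modules.add(fields[1].replace("-", "_"))
    return modules


# Hypothetical contents of a file under /etc/modprobe.d/
sample = """
# keep acpi_cpufreq from being loaded via its aliases
blacklist acpi-cpufreq
"""
print("acpi_cpufreq" in blacklisted_modules(sample))
```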

# systemd-analyze
Startup finished in 37.939s (kernel) + 10.909s (initrd) + 3min 55.004s (userspace) = 4min 43.852s
# systemd-analyze
Startup finished in 38.307s (kernel) + 10.205s (initrd) + 38.312s (userspace) = 1min 26.826s
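
To compare the two runs above, the userspace phase can be extracted from each `systemd-analyze` line and converted to seconds. This is a rough sketch that only handles the `min` and `s` units appearing in this output; the helper names are hypothetical.

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a duration such as '3min 55.004s' or '38.312s' to seconds."""
    minutes = re.search(r"(\d+)min", duration)
    seconds = re.search(r"([\d.]+)s", duration)
    total = 0.0
    if minutes:
        total += 60 * int(minutes.group(1))
    if seconds:
        total += float(seconds.group(1))
    return total


def userspace_seconds(line: str) -> float:
    """Extract the userspace duration from a 'Startup finished in ...' line."""
    match = re.search(r"([^()+=]+)\(userspace\)", line)
    if match is None:
        raise ValueError("no userspace phase found")
    return to_seconds(match.group(1))


slow = "Startup finished in 37.939s (kernel) + 10.909s (initrd) + 3min 55.004s (userspace) = 4min 43.852s"
fast = "Startup finished in 38.307s (kernel) + 10.205s (initrd) + 38.312s (userspace) = 1min 26.826s"

delta = userspace_seconds(slow) - userspace_seconds(fast)
print(f"userspace difference: {delta:.1f}s")  # roughly 196.7s
```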

Steps to reproduce the problem

  1. Verify that CONFIG_X86_INTEL_PSTATE=y and CONFIG_X86_ACPI_CPUFREQ=m.
  2. Add initcall_debug to the kernel command line and boot.
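
Step 1 can be checked with a small script. The sketch below parses kernel config text and reports the two options; the function name is hypothetical, and the config location is an assumption (many distributions ship it as /boot/config-<release>, others expose /proc/config.gz).

```python
def config_values(config_text: str, options: list) -> dict:
    """Return {option: value} for the requested options found in config text."""
    values = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments such as '# CONFIG_FOO is not set'
        key, value = line.split("=", 1)
        if key in options:
            values[key] = value
    return values


# Hypothetical excerpt of a kernel config file
sample = """\
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=m
"""

wanted = {"CONFIG_X86_INTEL_PSTATE": "y", "CONFIG_X86_ACPI_CPUFREQ": "m"}
found = config_values(sample, list(wanted))
print(found == wanted)
```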

Additional program output to the terminal or log subsystem illustrating the issue

# dmesg | grep intel_pstate
[   26.672504] calling  intel_pstate_init+0x0/0x43c @ 1
[   26.672506] intel_pstate: Intel P-state driver initializing
[   26.875721] intel_pstate: HWP enabled
[   26.884560] initcall intel_pstate_init+0x0/0x43c returned 0 after 207083 usecs
# dmesg | grep acpi_cpufreq
[   32.890970] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5380
[   32.890974] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   32.935210] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5415
[   32.935214] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   32.980610] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5380
[   32.980613] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   33.020656] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5390
[   33.020659] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   33.048462] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5386
[   33.048465] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   33.089678] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5433
[   33.089682] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   33.133258] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5380
[   33.133261] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   33.170690] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5386
[   33.170694] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
...
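
The return value -17 in the log above corresponds to -EEXIST. To tally how often an initcall failed this way, the `initcall ... returned N` lines produced by initcall_debug can be counted. This is a minimal sketch: the regular expression is an assumption based only on the line format visible in this log, and the function name is hypothetical.

```python
import errno
import re
from collections import Counter

# Matches e.g. "initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs"
INITCALL_RE = re.compile(r"initcall (\w+)\+\S+ (?:\[\w+\] )?returned (-?\d+)")


def initcall_results(dmesg_text: str, function: str) -> Counter:
    """Count return codes logged for one initcall function."""
    results = Counter()
    for match in INITCALL_RE.finditer(dmesg_text):
        if match.group(1) == function:
            results[int(match.group(2))] += 1
    return results


sample = """\
[   32.890970] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5380
[   32.890974] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
[   32.935210] calling  acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] @ 5415
[   32.935214] initcall acpi_cpufreq_init+0x0/0x1000 [acpi_cpufreq] returned -17 after 0 usecs
"""

counts = initcall_results(sample, "acpi_cpufreq_init")
for code, n in counts.items():
    name = errno.errorcode.get(-code, "OK" if code == 0 else "unknown")
    print(f"returned {code} ({name}): {n} times")
```

Run against the full `dmesg` output, the count for -17 gives the number of rejected acpi_cpufreq load attempts.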

poettering (Member) commented

Well, what do you expect udev to do? The kernel is just fucked there. It tells us to load the driver, by exporting the same modalias many times, and hence we act on it.

This really needs to be addressed by the kernel devs or your packagers. There's nothing actionable we can do here on our side. We can't specially handle two drivers fighting for the same device.

Please contact your distro for help, they'll work with you to fix this in the kernel.
