systemd-boot fails after updating systemd to 252.3 #25737

Closed
crondrift opened this issue Dec 14, 2022 · 12 comments · Fixed by #25848
Labels
bug 🐛 Programming errors, that need preferential fixing regression ⚠️ A bug in something that used to work correctly and broke through some recent commit sd-boot/sd-stub/bootctl
Milestone

Comments

@crondrift

crondrift commented Dec 14, 2022

systemd version the issue has been seen with

252.3-1

Used distribution

Arch Linux

Linux kernel version used

6.0.12-arch1-1 and 5.15.82-1-lts

CPU architectures issue was seen on

x86_64

Component

bootctl, systemd-boot

Expected behaviour you didn't see

System booting correctly with systemd-boot.

Unexpected behaviour you saw

The system hangs immediately after choosing the boot entry, showing only a black screen with the following message in red letters in the top-left corner:

Failed to open random seed file: Media changed
Error opening root path: Invalid Parameter

Steps to reproduce the problem

  • Coming from systemd 251.7-4 (which worked flawlessly) and updating to 252.3-1.
  • Running 'bootctl update'
  • Reboot

Updating to 252.3-1 WITHOUT running 'bootctl update' afterwards works fine. The system boots without any problems.

So the problem really seems to be systemd-boot related. Maybe something in 'systemd-bootx64.efi' has changed and my Lenovo E15 Gen 2 doesn't like that.

Additional program output to the terminal or log subsystem illustrating the issue

No response

@crondrift crondrift added the bug 🐛 Programming errors, that need preferential fixing label Dec 14, 2022
@crondrift crondrift changed the title Systemd-boot fails after updating systemd to 252.3 systemd-boot fails after updating systemd to 252.3 Dec 14, 2022
@yuwata yuwata added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Dec 14, 2022
@yuwata yuwata added this to the v253 milestone Dec 14, 2022
@medhefgo
Contributor

medhefgo commented Dec 14, 2022

Does sd-boot from v252, v252.1 or v252.2 work? If possible, can you do a regression test (git bisect between master and v251)?

@poettering poettering added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Dec 14, 2022
@crondrift
Author

crondrift commented Dec 15, 2022

@medhefgo @poettering

  • v252.1 works
  • v252.2 works
  • v252.3 fails

@bluca bluca removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Dec 15, 2022
@medhefgo
Contributor

I don't see how any commits in between could cause this. Which version of v252.2 did you check? There is v252.2-1 to v252.2-4.

For the non-working sd-boot, do you see a device path for that entry in the listing when you press p at the boot menu, and is it the same as with a working sd-boot? Is the loader path listed there also correct?

Also, the output of bootctl would be helpful.

@yuwata yuwata added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Dec 15, 2022
@crondrift
Author

@medhefgo
I've tested with v252.1-2 and with v252.2-4 - the device path is showing and is the same in every version (even v252.3-1).

bootctl

System:
      Firmware: UEFI 2.70 (Lenovo 0.4960)
 Firmware Arch: x64
   Secure Boot: disabled (setup)
  TPM2 Support: yes
  Boot into FW: supported

Current Boot Loader:
      Product: systemd-boot 252.2-4-arch
     Features: ✓ Boot counting
               ✓ Menu timeout control
               ✓ One-shot menu timeout control
               ✓ Default entry control
               ✓ One-shot entry control
               ✓ Support for XBOOTLDR partition
               ✓ Support for passing random seed to OS
               ✓ Load drop-in drivers
               ✓ Support Type #1 sort-key field
               ✓ Support @saved pseudo-entry
               ✓ Support Type #1 devicetree field
               ✓ Boot loader sets ESP information
          ESP: /dev/disk/by-partuuid/fba529a3-073a-504d-8d7b-767868e8a5e0
         File: └─/EFI/systemd/systemd-bootx64.efi

Random Seed:
 Passed to OS: yes
 System Token: set
       Exists: yes

Available Boot Loaders on ESP:
          ESP: /boot (/dev/disk/by-partuuid/fba529a3-073a-504d-8d7b-767868e8a5e0)
         File: ├─/EFI/systemd/systemd-bootx64.efi (systemd-boot 252.2-4-arch)
               └─/EFI/BOOT/BOOTX64.EFI (systemd-boot 252.2-4-arch)

Boot Loaders Listed in EFI Variables:
        Title: Linux Boot Manager
           ID: 0x0001
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/fba529a3-073a-504d-8d7b-767868e8a5e0
         File: └─/EFI/systemd/systemd-bootx64.efi

Boot Loader Entries:
        $BOOT: /boot (/dev/disk/by-partuuid/fba529a3-073a-504d-8d7b-767868e8a5e0)

Default Boot Loader Entry:
         type: Boot Loader Specification Type #1 (.conf)
        title: Arch Linux
           id: 04-arch.conf
       source: /boot/loader/entries/04-arch.conf
        linux: /vmlinuz-linux
       initrd: /intel-ucode.img
               /initramfs-linux.img
      options: root=/dev/nvme0n1p2 rw net.ifnames=0 nowatchdog loglevel=3
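
Editorial note: when comparing a working and a failing setup, the version strings in the "Available Boot Loaders on ESP" section show which sd-boot binary is actually installed on the ESP. The following is a minimal sketch that pulls those versions out of pasted bootctl output. The regular expression and the helper name `installed_loaders` are assumptions based only on the output shown above; bootctl does not promise this text layout as a stable interface.

```python
import re

# Excerpt from the bootctl output above (leading whitespace is not significant here).
SAMPLE = """\
Available Boot Loaders on ESP:
ESP: /boot (/dev/disk/by-partuuid/fba529a3-073a-504d-8d7b-767868e8a5e0)
File: ├─/EFI/systemd/systemd-bootx64.efi (systemd-boot 252.2-4-arch)
└─/EFI/BOOT/BOOTX64.EFI (systemd-boot 252.2-4-arch)
"""

# Assumed pattern: "<path>.efi (systemd-boot <version>)"; the extension may be upper case.
LOADER_RE = re.compile(r"(/\S+\.efi) \(systemd-boot ([^)]+)\)", re.IGNORECASE)


def installed_loaders(text: str) -> dict[str, str]:
    """Map each boot loader path found in the text to its reported version."""
    return {path: version for path, version in LOADER_RE.findall(text)}


loaders = installed_loaders(SAMPLE)
for path, version in loaders.items():
    print(f"{path}: {version}")
```

Running the same function on the output captured before and after 'bootctl update' would show whether both copies of the loader were actually replaced.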

@yuwata yuwata removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Dec 15, 2022
@medhefgo
Contributor

The device path showing in v252.3 is just weird, as the device handle changing is the only thing I can think of that could cause the error.

The culprit should be one of the December 8 commits on stable: https://github.com/systemd/systemd-stable/commits/v252-stable/src/boot/efi (probably either 87add68 or 1c9e7fc). But really, neither of these should be causing this…

Can you check whether the latest git main branch works? (Ideally with a regression test, so I don't have to guess which commit is the culprit.)

@crondrift
Author

I did a regression test, starting with v251 defined as the good version:

git bisect bad

b99bf58 is the first bad commit
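
Editorial note: the regression test requested in this thread relies on git bisect, which performs a binary search between a known-good and a known-bad revision. The sketch below models that search in plain Python. It assumes a linear history ordered oldest to newest; the intermediate commit labels (`c1`, `c2`, ...) and the `boots_badly` predicate are hypothetical stand-ins for "build sd-boot at this commit, install it, reboot, and check whether it hangs". Only `v251`, `main` and `b99bf58` come from the discussion.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical linear history; real systemd history is not this simple.
history = ["v251", "c1", "c2", "b99bf58", "c4", "c5", "main"]
broken_from = history.index("b99bf58")
tested = []  # records each commit that had to be built and booted


def boots_badly(commit: str) -> bool:
    tested.append(commit)
    return history.index(commit) >= broken_from


culprit = first_bad_commit(history, boots_badly)
print(culprit, "identified after testing", len(tested), "commits")
```

The point of the design is that each manual boot test halves the remaining range, which is why a bisect over many commits stays practical even when every step needs a reboot.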

@medhefgo
Contributor

b99bf58 is the first bad commit

Connecting console devices making the ESP suddenly inaccessible makes no sense to me. Are you sure this is the one?

Can you try this branch, hopefully that will fix it (or at least tell me which call exactly is failing): https://github.com/medhefgo/systemd/tree/boot-fixes

You plug one hole and another firmware gets their panties in a bunch. EFI vendors are gonna drive me insane.

@crondrift
Author

@medhefgo

This time, the error is slightly different. I tried my best, but I'm not sure if I did it right.

[Screenshot: error]

What's also curious: on every version that fails, the boot menu doesn't show up as normal. It's aligned to the left, and the entries only appear one after another as you select them with the arrow keys (sorry for the bad quality, but I think you get the idea).

[Screenshot: bootmenu2]

@medhefgo
Contributor

Please try the attached PR to see if it fixes your issue.

@crondrift
Author

@medhefgo

I've tested it. Now everything is working again as expected: bootloader menu is back to normal, system boots fine.

@eybisi

eybisi commented Dec 29, 2022

@medhefgo

I've tested it. Now everything is working again as expected: bootloader menu is back to normal, system boots fine.

Can you explain how you did it? I'm struggling with updating systemd.

@crondrift
Author

I've built the bootloader binary from this branch: https://github.com/medhefgo/systemd/tree/boot-fixes

poettering pushed a commit that referenced this issue Jan 3, 2023
… devices"

This reverts commit b99bf58.

It seems that using this protocol on some firmwares to forcibly
initialize console devices may break handles (already opened file
handles and the device handle itself) that we rely on to access the
boot filesystem, making it impossible to load the selected entry.

It might be possible to get a new handle by querying for the device
handle by using its device path after calling into this protocol, but
this is untested. The firmware might also be so buggy that accessing
devices after using this protocol is impossible.

It seems prudent to revert this for now until some reliable way is found
to initialize console devices without introducing huge boot delays. Any
users on firmware where console devices cannot be accessed would have to
rely on disabling fastboot.

Fixes: #25737, #25846
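
Editorial note: the commit message above describes a hypothesis (handles obtained before console initialization become invalid afterwards) and an untested idea (look the device up again by its device path). The toy model below only illustrates that reasoning. It is not UEFI code; `ToyFirmware` and all of its methods are hypothetical, and real firmware may behave differently or, as the message warns, may not allow recovery at all.

```python
class ToyFirmware:
    """Toy stand-in for a firmware handle database; not real UEFI behaviour."""

    def __init__(self, device_paths):
        self._next_id = 1
        self._handles = {}  # handle id -> device path
        for path in device_paths:
            self._register(path)

    def _register(self, path):
        self._handles[self._next_id] = path
        self._next_id += 1

    def locate_handle(self, device_path):
        """Return the current handle for a device path (the stable identifier)."""
        for handle, path in self._handles.items():
            if path == device_path:
                return handle
        raise LookupError(device_path)

    def open_volume(self, handle):
        """Fail for handles the firmware no longer knows about."""
        if handle not in self._handles:
            raise ValueError("Invalid Parameter")
        return f"root of {self._handles[handle]}"

    def reconnect_all(self):
        """Model the suspected side effect: every device gets a fresh handle."""
        paths = list(self._handles.values())
        self._handles.clear()
        for path in paths:
            self._register(path)


fw = ToyFirmware(["ESP", "console"])
esp = fw.locate_handle("ESP")  # handle cached at startup
fw.reconnect_all()             # forced console initialization invalidates it

try:
    fw.open_volume(esp)
    stale_failed = False
except ValueError:
    stale_failed = True        # mirrors "Error opening root path"

fresh = fw.locate_handle("ESP")    # the untested idea: look it up again by path
reopened = fw.open_volume(fresh)
print(stale_failed, reopened)
```

In this model the re-lookup succeeds because the device path stays valid across the reconnect; whether that holds on the affected firmware is exactly what the commit message leaves open.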
eworm-de pushed a commit to eworm-de/systemd that referenced this issue Feb 4, 2023
(cherry picked from commit f151abb)
d-hatayama pushed a commit to d-hatayama/systemd that referenced this issue Feb 15, 2023