systemd-boot: Error preparing initrd: Bad Buffer Size #25911
Comments
Does one of the initrds referenced by the entry happen to have a size of 0 by any chance? (Ideally checked via EFI shell to make sure kernel and EFI agree) Also, if you remove the initrd lines in the .conf file and instead append
I'd like to hear about that as well, would be nice if you could create a separate issue about that. |
Let's avoid calling Read() with zero-sized buffer, to avoid needless firmware quirkiness. See: systemd#25911
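A minimal sketch of the guard that commit message describes, in plain C rather than against the real EFI headers. `struct mock_file`, `mock_read`, and `read_guarded` are hypothetical stand-ins, not systemd or UEFI APIs, and the mock models a firmware that rejects zero-sized reads purely as an assumption for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an open firmware file handle. */
struct mock_file {
    const unsigned char *data;
    size_t size;
    size_t pos;
    unsigned read_calls; /* how often the "firmware" was asked to read */
};

/* Hypothetical quirky firmware read: fails on a zero-sized buffer.
 * Returns 0 on success, -1 on error; *size is updated to the bytes read. */
static int mock_read(struct mock_file *f, size_t *size, void *buf) {
    f->read_calls++;
    if (*size == 0)
        return -1;
    size_t left = f->size - f->pos;
    if (*size > left)
        *size = left;
    memcpy(buf, f->data + f->pos, *size);
    f->pos += *size;
    return 0;
}

/* Skip the firmware call entirely when there is nothing to read. */
static int read_guarded(struct mock_file *f, size_t *size, void *buf) {
    if (*size == 0)
        return 0;
    return mock_read(f, size, buf);
}
```

With this wrapper an empty file never reaches the firmware at all, so a quirk in its zero-length handling cannot surface as a boot error.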
Or alternatively: does the issue go away if you apply #25922? |
Looks OK
Will try that Edit: WTF, that worked. I am double checking if I changed nothing else. Edit 2: Works indeed. My configs: Does not work:
Does work:
Sure, I'm just not sure where to place it, as it doesn't seem to be a systemd problem per-se. The error I get is the same regardless if booted directly or via systemd-boot. |
Well, an empty file was a wild shot. https://github.com/medhefgo/systemd/tree/boot-bad-buffer-size contains a potential fix along with some debug logging in case it doesn't work (this should be with initrd config options instead of in the cmdline).
This is expected. This just leaves the work of fetching the initrd to the kernel instead of doing it ourselves. Now we just need to figure out what the kernel does better…
Well, telling us the error message would be a starter. :D |
I don't see any changes unfortunately? Give me a ping if you want me to test something 💪 |
Would've helped if I had actually committed my changes. Please try again.
@medhefgo there you go Edit: Note that this test was done with the fallback initrd, but it fails for the non fallback as well. |
Well, that firmware is dented. The read size we give it is valid and the buffer suitably allocated. There is a slight chance #25848 is causing this, I pulled it into the branch, just in case you wanna test this. But more likely the firmware is one of those that cannot read large buffers, considering that the (small) ucode initrd was read without issues. You could try booting with |
Also, regarding the UKI not loading: could be the same issue at hand (when we discover it we only read small chunks from it instead of the whole file at once). You said booting it directly without a bootloader in between also fails? Can you try starting it from the EFI shell to see if it gives the same error? (If it says nothing, you can get an error code with |
Hmm, maybe we should load these files with EFI_LOAD_FILE_PROTOCOL or so? |
No dice
Spot on
Maybe we should try to load the thing in one go and, if that fails, revert to chunked reads?
That would require the firmware to provide that protocol on the device (it likely doesn't). And it would have to go through the broken file system code anyways.
Always so impatient… @kernle32dll Please give the PR a try. |
Welp, coming back with some unexpected results... I double checked
First of all,
Still not working, same |
@medhefgo Could you provide another commit with additional debug output? I would love to help debugging this further |
I haven't forgotten about you. I am focusing on other areas right now and also still thinking on what to do next here. |
Not sure if you are comfortable with changing the c code:
Let's maybe also rule out some other issues:
Why is it that the quirky firmware always has to be on remote machines I can't put my hands on... :(
No pressure :) Got a working workaround with the kernel options. Just eager to understand the issue.
Sure, will give that a try.
Will do
Tbh, I got myself a PiKVM for exactly THAT issue 😄 |
Please try this new branch: https://github.com/medhefgo/systemd/tree/boot-bad-buffer-size-test It will automatically perform a chunk size bisection for any initrds that are loaded for a given entry. It has two phases and I hope at least one of them will converge. |
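As a rough illustration of what a chunk size bisection can look like, here is a sketch in plain C. It is not the code from that branch: `probe_fn`, `bisect_max_chunk`, and `limit_probe` are hypothetical names, and the search assumes the firmware behaves monotonically (if a read size fails, every larger size fails too), which real firmware may not honor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: true if the firmware accepts a read of `size` bytes. */
typedef bool (*probe_fn)(size_t size, void *ctx);

/* Binary search for the largest size in [1, max] that the probe accepts.
 * Assumes monotonic behaviour: once a size fails, all larger sizes fail.
 * Returns 0 if even a 1-byte read is rejected. */
static size_t bisect_max_chunk(size_t max, probe_fn probe, void *ctx) {
    size_t lo = 0;   /* largest size known to work (0 = none yet) */
    size_t hi = max; /* largest size still worth trying */
    while (lo < hi) {
        /* Round up so the loop always makes progress. */
        size_t mid = lo + (hi - lo + 1) / 2;
        if (probe(mid, ctx))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Hypothetical firmware model: rejects any size above the limit in ctx. */
static bool limit_probe(size_t size, void *ctx) {
    return size <= *(const size_t *)ctx;
}
```

In a real test the probe would attempt an actual read of that size from the initrd; the model above only stands in for a firmware with a fixed upper limit.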
Let's avoid calling Read() with zero-sized buffer, to avoid needless firmware quirkiness. See: systemd#25911 (cherry picked from commit fd1fec5)
This makes no sense. Are you sure you built and tested the correct commit (d2ab1a4)? There's a chance you're testing a stale build artifact too. We now need python-pyelftools instead of gnu-efi, so unless it was already installed for you, you would've had to install it or no bootloader would've been built. |
Yeah, I checked twice. I know it built correctly, since I did not see the bisect test again. If you put in some debug code, we at least might know where it fails? |
@medhefgo I did some testing myself. It seems to fail to re-open the file, although I can't say why.
The only thing I can think of is that the file handles are reused. The re-opening worked for the bisection test and we did not have the handle open twice there. I updated the PR, please give it another try. |
Unfortunately, trips this assert: |
I also double checked that line again (on your previous version): https://github.com/medhefgo/systemd/blob/d2ab1a4f332166c743f7160dcc575e882cb7e192/src/boot/efi/util.c#L348 It's indeed returning the same BAD_BUFFER_SIZE error, which is odd for the open call. So something is up with the handle alright.
We're making progress. I've updated the branch to print out the size we expect vs get along with a sha256 sum of the initrd. Please give it a try (and tell me if |
Build is failing :( Same as CI:
You need to pass |
Not experienced with meson - where do I put these? The |
You must have called meson at some point when compiling. Either manually or as part of your PKGBUILD. Just append it to the cmdline there. And if you have, don't use the configure script, just call |
Delightful. The re-opened handle will silently truncate the file. I guess we have to always do chunked-reads just like the kernel does. 😿 Please try the PR again. This should hopefully work now. Also, is there any chance you're missing a firmware upgrade (that happens to fix this)? |
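The chunked-read approach mentioned here can be sketched as follows. This is plain C against a mock, not the actual fix: `struct mock_file`, `mock_read`, `chunked_read`, and the tiny `MOCK_MAX_READ` limit are all hypothetical, chosen only to imitate a firmware that rejects large reads.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an open firmware file handle. */
struct mock_file {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

/* Hypothetical firmware limit, kept tiny so the effect is easy to see. */
#define MOCK_MAX_READ 8u

/* Mock read that rejects oversized buffers.
 * Returns 0 on success, -1 on error; *size is updated to the bytes read. */
static int mock_read(struct mock_file *f, size_t *size, void *buf) {
    if (*size > MOCK_MAX_READ)
        return -1;
    size_t left = f->size - f->pos;
    if (*size > left)
        *size = left;
    memcpy(buf, f->data + f->pos, *size);
    f->pos += *size;
    return 0;
}

/* Read `total` bytes in pieces of at most `chunk` bytes.
 * Returns 0 on success, -1 on a read error or if the file ends early. */
static int chunked_read(struct mock_file *f, void *buf, size_t total, size_t chunk) {
    unsigned char *p = buf;
    size_t done = 0;
    if (chunk == 0)
        return -1;
    while (done < total) {
        size_t want = total - done < chunk ? total - done : chunk;
        size_t got = want;
        if (mock_read(f, &got, p + done) != 0)
            return -1;
        if (got == 0)
            return -1; /* file is shorter than expected */
        done += got;
    }
    return 0;
}
```

Checking for a zero-length result inside the loop matters for the truncation case described above: a handle that silently delivers fewer bytes than the known file size is reported as an error instead of producing a short initrd.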
Nice 😞
Will do. I will come back with results.
Unfortunately not. The server is already fully updated. However, the server is a Dell R420 series, which is almost 10 years old by now. So I have little hope there 😢 |
For anyone who wants to test this in VMs, here's a little patch to OVMF that causes it to exhibit (part of) this bug:
diff --git a/FatPkg/EnhancedFatDxe/ReadWrite.c b/FatPkg/EnhancedFatDxe/ReadWrite.c
index 8f525044d1..1fed0fecce 100644
--- a/FatPkg/EnhancedFatDxe/ReadWrite.c
+++ b/FatPkg/EnhancedFatDxe/ReadWrite.c
@@ -216,6 +216,10 @@ FatIFileAccess (
   Volume = OFile->Volume;
   Task = NULL;
+  if (*BufferSize > (10U * 1024U * 1024U)) {
+    return EFI_BAD_BUFFER_SIZE;
+  }
+
   //
   // Write to a directory is unsupported
   //
It just makes the FAT driver return I did not bother trying to replicate the truncate-on-reopen behavior because that sounded like a much bigger patch.
@medhefgo Hey, sorry for coming back so late to you. Tested your MR, works like a charm! |
Fixes: systemd#25911 (cherry picked from commit f70f992)
Could the commit fixing this (3ed1d96) please also be backported to the systemd-stable v252 branch, on which, e.g., Debian 12 Bookworm is based? We (Proxmox, based on Debian) will do the backport ourselves in the meantime anyway, but as we have already received quite a few reports, I think Debian users, and users of other distros relying on the systemd-stable project, would benefit from this. Thanks for your consideration!
I figured I'll write this here since multiple forum posts link here. I was able to circumvent this issue by turning on fast boot in BIOS. Oddly my boot time actually takes longer but an OROM driver gets loaded for my raid card (hardware RAID is disabled, but the controller still exists) and the error goes away. Also a Proxmox user if that matters. |
I have the same issue described here with the latest official Proxmox ISO on two separate systems, both of which are Dell PowerEdge servers with PERC RAID cards flashed to IT mode for ZFS. A third server that does not boot from ZFS did not have the issue. Proxmox 7 works flawlessly with ZFS booting.
@JackPala ZFS isn't relevant to this issue. This issue is solely about systemd-boot failing to read the initrd from the FAT32 ESP. ZFS doesn't come into play until a significantly later stage during boot. |
Fixes: systemd#25911 (cherry picked from commit f70f992) (cherry picked from commit 1a0f2c5)
I'm bumping into this with UKI files, and it is not easily reproducible. Some of the UKI files we generate occasionally fail with this error, and sometimes they don't. The interesting aspect is that I'm testing this with qemu/edk2, not real hardware at all.
systemd version the issue has been seen with
252
Used distribution
Arch Linux
Linux kernel version used
6.1.1-arch1-1
CPU architectures issue was seen on
x86_64
Component
systemd-boot, other
Expected behaviour you didn't see
A successful boot
Unexpected behaviour you saw
Unsuccessful boot after selecting the entry in systemd-boot, with a cryptic error message:
Steps to reproduce the problem
This happened on a Dell R420 server.
I run a fairly default setup: a fat32 mounted as /boot, containing the built initramfs, etc. I don't use any modules or anything like that. I do use https://github.com/random-archer/mkinitcpio-systemd-tool for a cryptsetup, but I believe the problem occurs much earlier.
This setup worked for about two years without any issue, but has been flaky for a few weeks (might be months) now. After chrooting into the installation and randomly reinstalling stuff, and rebuilding the boot components, it did work again briefly, but has been broken again since. I really have no idea what influences the problem.
I do have a hunch that this might be related to the server's nvram, or efi vars. Problems started occurring when I was tinkering around with unified kernel images (which the server won't boot either directly or via systemd-boot, but that is a different topic). In any case, I did briefly run out of space while tinkering around with uefi boot entries using efibootmgr.
Additional program output to the terminal or log subsystem illustrating the issue
No response