systemd-boot: Error preparing initrd: Bad Buffer Size #25911

kernle32dll · 2023-01-02T09:04:28Z

systemd version the issue has been seen with

252

Used distribution

Arch Linux

Linux kernel version used

6.1.1-arch1-1

CPU architectures issue was seen on

x86_64

Component

systemd-boot, other

Expected behaviour you didn't see

A successful boot

Unexpected behaviour you saw

Unsuccessful boot after selecting the entry in systemd-boot, with a cryptic error message:

Steps to reproduce the problem

This happened on a Dell R420 server.

I run a fairly default setup: a FAT32 partition mounted as /boot, containing the built initramfs, etc. I don't use any modules or anything like that. I do use https://github.com/random-archer/mkinitcpio-systemd-tool for cryptsetup, but I believe the problem occurs much earlier.

This setup worked for about two years without any issue, but has been flaky for a few weeks (might be months) now. After chrooting into the installation and randomly reinstalling stuff, and rebuilding the boot components, it did work again briefly, but has been broken again since. I really have no idea what influences the problem.

I have a hunch that this might be related to the server's NVRAM or EFI vars. Problems started occurring when I was tinkering around with unified kernel images (which the server won't boot either directly or via systemd-boot, but that is a different topic). In any case, I did briefly run out of space while tinkering around with UEFI boot entries using efibootmgr.

Additional program output to the terminal or log subsystem illustrating the issue

No response

medhefgo · 2023-01-03T10:26:43Z

Does one of the initrds referenced by the entry happen to have a size of 0 by any chance? (Ideally checked via EFI shell to make sure kernel and EFI agree)

Also, if you remove the initrd lines in the .conf file and instead append initrd=\path-to-initrd-relative-to-ESP-root to the cmdline for each of them, does it boot then?

> Problems started occurring when I was tinkering around with unified kernel images (which the server won't boot either directly or via systemd-boot, but that is a different topic).

I'd like to hear about that as well; it would be nice if you could create a separate issue for it.

poettering · 2023-01-03T15:07:13Z

> Does one of the initrds referenced by the entry happen to have a size of 0 by any chance? (Ideally checked via EFI shell to make sure kernel and EFI agree)

Or alternatively: does the issue go away if you apply #25922?

kernle32dll · 2023-01-03T15:28:45Z

> Does one of the initrds referenced by the entry happen to have a size of 0 by any chance? (Ideally checked via EFI shell to make sure kernel and EFI agree)

Looks OK

> Also, if you remove the initrd lines in the .conf file and instead append initrd=\path-to-initrd-relative-to-ESP-root to the cmdline for each of them, does it boot then?

Will try that

Edit: WTF, that worked. I am double checking if I changed nothing else.

Edit 2: Works indeed. My configs:

Does not work:

title Arch Linux
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux.img
options root=/dev/mapper/root1 rootflags=subvol=@ rw

Does work:

title Arch Linux
linux /vmlinuz-linux
options initrd=/intel-ucode.img initrd=/initramfs-linux.img root=/dev/mapper/root1 rootflags=subvol=@ rw

> Problems started occurring when I was tinkering around with unified kernel images (which the server won't boot either directly or via systemd-boot, but that is a different topic).

> I'd like to hear about that as well; it would be nice if you could create a separate issue for it.

Sure, I'm just not sure where to put it, as it doesn't seem to be a systemd problem per se. The error I get is the same regardless of whether it's booted directly or via systemd-boot.

medhefgo · 2023-01-03T16:26:07Z

Well, an empty file was a wild shot. https://github.com/medhefgo/systemd/tree/boot-bad-buffer-size contains a potential fix along with some debug logging in case it doesn't work (this should be with initrd config options instead of in the cmdline).

> Edit: WTF, that worked. I am double checking if I changed nothing else.

This is expected. This just leaves the work of fetching the initrd to the kernel instead of doing it ourselves. Now we just need to figure out what the kernel does better…

> Sure, I'm just not sure where to put it, as it doesn't seem to be a systemd problem per se. The error I get is the same regardless of whether it's booted directly or via systemd-boot.

Well, telling us the error message would be a starter. :D

kernle32dll · 2023-01-03T16:59:16Z

> Well, telling us the error message would be a starter. :D

Obviously, the file is there, as it's picked up by systemd-boot without any loader config.

kernle32dll · 2023-01-03T22:04:54Z

> Well, an empty file was a wild shot. https://github.com/medhefgo/systemd/tree/boot-bad-buffer-size contains a potential fix along with some debug logging in case it doesn't work (this should be with initrd config options instead of in the cmdline).

I don't see any changes unfortunately? Give me a ping if you want me to test something 💪

medhefgo · 2023-01-04T14:05:46Z

> I don't see any changes unfortunately? Give me a ping if you want me to test something 💪

Would've helped if I had actually committed my changes. Please try again.

kernle32dll · 2023-01-05T13:02:45Z

@medhefgo there you go

Edit: Note that this test was done with the fallback initrd, but it fails for the non-fallback one as well.

medhefgo · 2023-01-05T15:21:08Z

Well, that firmware is dented. The read size we give it is valid and the buffer suitably allocated.

There is a slight chance #25848 is causing this, I pulled it into the branch, just in case you wanna test this.

But more likely the firmware is one of those that cannot read large buffers, considering that the (small) ucode initrd was read without issues. You could try booting with efi=nochunk to see if the kernel would hit the same issue then too.
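A minimal sketch of the chunked-read idea in C, for readers following along. The UEFI types are replaced by hypothetical stand-ins (`sim_status`, `sim_file`, `sim_read`) so it compiles on its own. `sim_read` only imitates the contract of `EFI_FILE_PROTOCOL.Read()` (the size argument is in/out) and rejects any request above an assumed `max_read` limit, which is a guess at how this firmware behaves, not a measured value. The loop never asks for more than `chunk` bytes at once, so a whole-file read the firmware would refuse becomes a series of reads it accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for UEFI status codes; not the real EFI headers. */
typedef enum { SIM_SUCCESS = 0, SIM_BAD_BUFFER_SIZE } sim_status;

/* Simulated file handle on a quirky FAT driver: any single read larger than
 * max_read bytes is rejected (assumed behaviour, for illustration only). */
typedef struct {
    const unsigned char *data;
    size_t size;     /* file size */
    size_t pos;      /* current file position */
    size_t max_read; /* largest single read the simulated firmware accepts */
} sim_file;

/* Imitates Read(): on success *len becomes the number of bytes actually read,
 * which is smaller than requested only at end of file. */
static sim_status sim_read(sim_file *f, size_t *len, void *buf) {
    if (*len > f->max_read)
        return SIM_BAD_BUFFER_SIZE;
    size_t avail = f->size - f->pos;
    if (*len > avail)
        *len = avail;
    memcpy(buf, f->data + f->pos, *len);
    f->pos += *len;
    return SIM_SUCCESS;
}

/* Read up to `size` bytes into buf using reads of at most `chunk` bytes.
 * Returns the first error, or SIM_SUCCESS with *out_read set to the total. */
static sim_status read_chunked(sim_file *f, void *buf, size_t size,
                               size_t chunk, size_t *out_read) {
    assert(chunk > 0);
    unsigned char *p = buf;
    size_t total = 0;
    while (total < size) {
        size_t len = size - total < chunk ? size - total : chunk;
        sim_status st = sim_read(f, &len, p + total);
        if (st != SIM_SUCCESS)
            return st;
        if (len == 0)
            break; /* hit end of file earlier than expected */
        total += len;
    }
    *out_read = total;
    return SIM_SUCCESS;
}
```

With `max_read` set to 16, a single 100-byte read fails, while `read_chunked` with a 16-byte chunk succeeds in seven calls.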

medhefgo · 2023-01-05T15:25:53Z

Also, regarding the UKI not loading: could be the same issue at hand (when we discover it we only read small chunks from it instead of the whole file at once).

You said booting it directly without a bootloader in between also fails? Can you try starting it from the EFI shell to see if it gives the same error? (If it says nothing, you can get an error code with echo %lasterror%.)

poettering · 2023-01-05T15:39:02Z

Hmm, maybe we should load these files with EFI_LOAD_FILE_PROTOCOL or so?

kernle32dll · 2023-01-05T16:07:03Z

> There is a slight chance #25848 is causing this, I pulled it into the branch, just in case you wanna test this.

No dice

> But more likely the firmware is one of those that cannot read large buffers, considering that the (small) ucode initrd was read without issues. You could try booting with efi=nochunk to see if the kernel would hit the same issue then too.

Spot on

> You said booting it directly without a bootloader in between also fails? Can you try starting it from the EFI shell to see if it gives the same error? (If it says nothing, you can get an error code with echo %lasterror%.)

poettering · 2023-01-05T16:41:43Z

Maybe we should try to load the thing in one go, and if that fails revert to chunked reads?
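A sketch of this try-once-then-fall-back idea, again using a hypothetical simulated handle (`sim_file`, `sim_read`) instead of the real UEFI file protocol, and assuming reads start at offset 0. It only helps when the firmware fails loudly with a BAD_BUFFER_SIZE-style error; it does nothing about a read that "succeeds" but returns less data than requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for UEFI status codes and a file handle. */
typedef enum { SIM_SUCCESS = 0, SIM_BAD_BUFFER_SIZE } sim_status;

typedef struct {
    const unsigned char *data;
    size_t size, pos, max_read; /* max_read: assumed firmware limit per call */
} sim_file;

static sim_status sim_read(sim_file *f, size_t *len, void *buf) {
    if (*len > f->max_read)
        return SIM_BAD_BUFFER_SIZE;
    size_t avail = f->size - f->pos;
    if (*len > avail)
        *len = avail;
    memcpy(buf, f->data + f->pos, *len);
    f->pos += *len;
    return SIM_SUCCESS;
}

/* Try one big read first; if the firmware rejects it with BAD_BUFFER_SIZE,
 * rewind and retry in chunk-sized pieces. *used_fallback reports which path ran. */
static sim_status read_with_fallback(sim_file *f, void *buf, size_t size,
                                     size_t chunk, int *used_fallback) {
    assert(chunk > 0);
    *used_fallback = 0;
    size_t len = size;
    sim_status st = sim_read(f, &len, buf);
    if (st != SIM_BAD_BUFFER_SIZE)
        return st; /* success, or an error chunking would not fix */

    *used_fallback = 1;
    f->pos = 0; /* stand-in for SetPosition(0) */
    unsigned char *p = buf;
    for (size_t done = 0; done < size;) {
        len = size - done < chunk ? size - done : chunk;
        st = sim_read(f, &len, p + done);
        if (st != SIM_SUCCESS)
            return st;
        if (len == 0)
            break; /* end of file */
        done += len;
    }
    return SIM_SUCCESS;
}
```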

medhefgo · 2023-01-05T17:59:48Z

> Hmm, maybe we should load these files with EFI_LOAD_FILE_PROTOCOL or so?

That would require the firmware to provide that protocol on the device (it likely doesn't). And it would have to go through the broken file system code anyway.

> Maybe we should try to load the thing in one go, and if that fails revert to chunked reads?

Always so impatient…

@kernle32dll Please give the PR a try.

kernle32dll · 2023-01-05T22:28:22Z

Welp, coming back with some unexpected results... I double-checked.

title Arch Linux (nochunk)
linux /vmlinuz-linux
options efi=nochunk initrd=/intel-ucode.img initrd=/initramfs-linux.img root=/dev/mapper/root1 rootflags=subvol=@ rw

First of all, efi=nochunk suddenly started working (even without the PR changes). I have no idea why - I made no modifications to the system, besides rebuilding systemd.

title Arch Linux (bad buffer size)
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux-fallback.img
options root=/dev/mapper/root1 rootflags=subvol=@ rw

Still not working, same Bad Buffer Size error.

kernle32dll · 2023-01-10T14:23:34Z

@medhefgo Could you provide another commit with additional debug output? I would love to help debugging this further

medhefgo · 2023-01-10T14:26:39Z

I haven't forgotten about you. I am focusing on other areas right now and also still thinking on what to do next here.

medhefgo · 2023-01-11T08:57:48Z

Not sure if you are comfortable with changing the C code:

* Maybe you can play with the chunk size (make it 1M, very small, or even larger than the buffer/file size).
* Maybe the firmware will refuse to use the handle once a too-large read was attempted. Maybe removing the GetPosition()/Read()/SetPosition() calls in front of the loop helps.

Let's maybe also rule out some other issues:

* memtest
* fsck
* S.M.A.R.T. self-test
* firmware update?

Why is it that quirky firmware always has to be on remote machines I can't put my hands on... :(

kernle32dll · 2023-01-11T22:51:56Z

> I haven't forgotten about you. I am focusing on other areas right now and also still thinking on what to do next here.

No pressure :) I've got a working workaround with the kernel options. Just eager to understand the issue.

> Not sure if you are comfortable with changing the C code:
>
> * Maybe you can play with the chunk size (make it 1M, very small, or even larger than the buffer/file size).
> * Maybe the firmware will refuse to use the handle once a too-large read was attempted. Maybe removing the GetPosition()/Read()/SetPosition() calls in front of the loop helps.

Sure, will give that a try.

> Let's maybe also rule out some other issues:
>
> * memtest
> * fsck
> * S.M.A.R.T. self-test
> * firmware update?

Will do.

> Why is it that quirky firmware always has to be on remote machines I can't put my hands on... :(

Tbh, I got myself a PiKVM for exactly THAT issue 😄

medhefgo · 2023-01-19T17:05:57Z

Please try this new branch: https://github.com/medhefgo/systemd/tree/boot-bad-buffer-size-test

It will automatically perform a chunk size bisection for any initrds that are loaded for a given entry. It has two phases and I hope at least one of them will converge.
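The branch's actual two-phase logic isn't reproduced here. As a generic illustration only, this is a plain binary search for the largest single read a firmware accepts, assuming success is monotonic in size (if n bytes work, anything smaller works too). `sim_fw` and `sim_probe_read` are hypothetical stand-ins for attempting a real read of a given size.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SIM_SUCCESS = 0, SIM_BAD_BUFFER_SIZE } sim_status;

/* Hypothetical firmware model: a fixed per-read limit. */
typedef struct { size_t max_read; } sim_fw;

/* Hypothetical probe: would a single read of `len` bytes be accepted? */
static sim_status sim_probe_read(const sim_fw *fw, size_t len) {
    return len > fw->max_read ? SIM_BAD_BUFFER_SIZE : SIM_SUCCESS;
}

/* Largest read size in [1, hi] that succeeds, or 0 if even 1 byte fails.
 * Invariant: `lo` is known to work (0 trivially) and every size above `hi`
 * is known to fail or is out of range. */
static size_t bisect_max_chunk(const sim_fw *fw, size_t hi) {
    assert(hi > 0);
    size_t lo = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2; /* round up so the loop terminates */
        if (sim_probe_read(fw, mid) == SIM_SUCCESS)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

On real firmware each probe would be an actual Read() on a handle, which is exactly where handle state can get in the way, as medhefgo speculates earlier in the thread.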

medhefgo · 2023-03-25T19:27:00Z

This makes no sense. Are you sure you built and tested the correct commit (d2ab1a4)?

There's a chance you're testing a stale build artifact too. We now need python-pyelftools instead of gnu-efi, so unless it was already installed for you, you would've had to install it or no bootloader would've been built.

kernle32dll · 2023-03-25T20:03:02Z

Yeah, I checked twice. I know it built correctly, since I no longer saw the bisect test. If you add some debug code, we might at least learn where it fails.

kernle32dll · 2023-03-26T15:28:21Z

@medhefgo I did some testing myself: it seems to fail to re-open the file, although I can't say why.
https://github.com/medhefgo/systemd/blob/d2ab1a4f332166c743f7160dcc575e882cb7e192/src/boot/efi/util.c#L348

medhefgo · 2023-03-29T12:44:29Z

The only thing I can think of is that the file handles are reused. The re-opening worked for the bisection test and we did not have the handle open twice there.

I updated the PR, please give it another try.

kernle32dll · 2023-03-31T15:17:41Z

Unfortunately, it trips this assert:
https://github.com/medhefgo/systemd/blob/ba54d7305590bbd5f72c799815ca177e664b1dcc/src/boot/efi/boot.c#L2314

kernle32dll · 2023-03-31T15:43:05Z

I also double checked that line again (on your previous version) https://github.com/medhefgo/systemd/blob/d2ab1a4f332166c743f7160dcc575e882cb7e192/src/boot/efi/util.c#L348

It's indeed returning the same BAD_BUFFER_SIZE error, which is odd for the open call. So something is up with the handle alright.

medhefgo · 2023-03-31T15:48:22Z

We're making progress. I've updated the branch to print out the size we expect vs get along with a sha256 sum of the initrd. Please give it a try (and tell me if sha256sum disagrees).

kernle32dll · 2023-03-31T17:47:35Z

Build is failing :( Same as CI:

../src/boot/efi/boot.c: In function "initrd_prepare":
../src/boot/efi/boot.c:2334:17: error: implicit declaration of function "hexdump" [-Werror=implicit-function-declaration]
 2334 |                 hexdump(u"sha256", sha256, SHA256_DIGEST_SIZE);
      |                 ^~~~~~~
../src/boot/efi/boot.c:2334:17: error: nested extern declaration of "hexdump" [-Werror=nested-externs]

medhefgo · 2023-03-31T17:54:58Z

You need to pass --debug (and -Dmode=developer) to meson.

kernle32dll · 2023-03-31T18:01:46Z

Not experienced with meson - where do I put these? The configure file?

medhefgo · 2023-04-01T07:39:10Z

You must have called meson at some point when compiling. Either manually or as part of your PKGBUILD. Just append it to the cmdline there. And if you have, don't use the configure script, just call meson setup $builddir directly.

kernle32dll · 2023-04-02T12:30:21Z

Followed by:

medhefgo · 2023-04-03T13:28:53Z

Delightful. The re-opened handle will silently truncate the file. I guess we have to always do chunked-reads just like the kernel does. 😿

Please try the PR again. This should hopefully work now.

Also, is there any chance you're missing a firmware upgrade (that happens to fix this)?
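Chunking avoids the oversized request, but a handle that silently returns fewer bytes than the file holds still has to be caught. A minimal sketch, assuming the expected size comes from the file's metadata (in UEFI that would be `EFI_FILE_INFO.FileSize`): compare what was read against it and fail rather than boot with a truncated initrd. `sim_file` and its `visible` field are hypothetical stand-ins for a handle that truncates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SIM_SUCCESS = 0, SIM_BAD_BUFFER_SIZE, SIM_TRUNCATED } sim_status;

/* Simulated handle whose reads silently stop after `visible` bytes even though
 * the file metadata reports `size` bytes (mimicking the truncation described above). */
typedef struct {
    const unsigned char *data;
    size_t size;    /* size as reported by file info */
    size_t visible; /* bytes the handle actually returns */
    size_t pos;
} sim_file;

static sim_status sim_read(sim_file *f, size_t *len, void *buf) {
    size_t avail = f->visible > f->pos ? f->visible - f->pos : 0;
    if (*len > avail)
        *len = avail; /* no error, just fewer bytes */
    if (*len)
        memcpy(buf, f->data + f->pos, *len);
    f->pos += *len;
    return SIM_SUCCESS;
}

/* Read exactly `expected` bytes in chunks; a short total is reported as an
 * error instead of being passed on as if it were the whole file. */
static sim_status read_exact_chunked(sim_file *f, void *buf, size_t expected,
                                     size_t chunk) {
    assert(chunk > 0);
    unsigned char *p = buf;
    size_t total = 0;
    while (total < expected) {
        size_t len = expected - total < chunk ? expected - total : chunk;
        sim_status st = sim_read(f, &len, p + total);
        if (st != SIM_SUCCESS)
            return st;
        if (len == 0)
            break;
        total += len;
    }
    return total == expected ? SIM_SUCCESS : SIM_TRUNCATED;
}
```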

kernle32dll · 2023-04-03T13:47:44Z

> Delightful. The re-opened handle will silently truncate the file. I guess we have to always do chunked-reads just like the kernel does. 😿

Nice 😞

> Please try the PR again. This should hopefully work now.

Will do. I will come back with results.

> Also, is there any chance you're missing a firmware upgrade (that happens to fix this)?

Unfortunately not. The server is already fully updated. However, the server is a Dell R420 series, which is almost 10 years old by now. So I have little hope there 😢

ElvishJerricco · 2023-04-21T21:46:51Z

For anyone who wants to test this in VMs, here's a little patch to OVMF that causes it to exhibit (part of) this bug:

diff --git a/FatPkg/EnhancedFatDxe/ReadWrite.c b/FatPkg/EnhancedFatDxe/ReadWrite.c
index 8f525044d1..1fed0fecce 100644
--- a/FatPkg/EnhancedFatDxe/ReadWrite.c
+++ b/FatPkg/EnhancedFatDxe/ReadWrite.c
@@ -216,6 +216,10 @@ FatIFileAccess (
   Volume = OFile->Volume;
   Task   = NULL;
 
+  if (*BufferSize > (10U * 1024U * 1024U)) {
+    return EFI_BAD_BUFFER_SIZE;
+  }
+
   //
   // Write to a directory is unsupported
   //

It just makes the FAT driver return EFI_BAD_BUFFER_SIZE if you try to read or write more than 10M. Why 10M? Because if it's much smaller, LoadImage breaks on the 8.3M kernel I was testing with, and a 16M limit would have been too large to trigger the failure with the 13M initrd I was testing with. I suspect that if your kernel exceeds the limit on these real-world buggy firmwares, LoadImage will probably fail too. So keep your kernels small, I guess.

I did not bother trying to replicate the truncate-on-reopen behavior because that sounded like a much bigger patch.
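The size reasoning in the comment above can be checked with a bit of arithmetic. A small sketch with hypothetical helper names, treating "M" as MiB (an assumption; the comment doesn't specify): a single whole-file read trips the 10 MiB cap for the 13 MiB initrd but not for the ~8.3 MiB kernel, while a chunked loader only ever requests `min(size, chunk)` bytes per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIB ((size_t)1024 * 1024)
/* Same threshold as the OVMF patch above. */
#define PATCH_LIMIT (10 * MIB)

/* Would a single read of `bytes` be rejected by a driver capped at `limit`? */
static bool rejected(size_t bytes, size_t limit) {
    return bytes > limit;
}

/* Number of Read() calls needed to load `size` bytes in `chunk`-byte pieces. */
static size_t reads_needed(size_t size, size_t chunk) {
    assert(chunk > 0);
    return (size + chunk - 1) / chunk;
}

/* A chunked loader's largest single request is min(size, chunk). */
static bool chunked_load_ok(size_t size, size_t chunk, size_t limit) {
    size_t largest = size < chunk ? size : chunk;
    return !rejected(largest, limit);
}
```

For example, a 13 MiB initrd read in 1 MiB chunks takes 13 calls, none of which comes near the cap.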

kernle32dll · 2023-05-02T20:35:11Z

@medhefgo Hey, sorry for getting back to you so late.

Tested your MR, works like a charm!

ThomasLamprecht · 2023-06-28T06:52:19Z

Could the commit fixing this (3ed1d96) please also be backported to the systemd-stable v252 branch, which e.g. Debian 12 Bookworm is based on?
That way, it would automatically ship in a future 12.x point release, as a maintainer of the systemd package in Debian wrote.

We (Proxmox, based on Debian) will do the backport ourselves in the meantime anyway, but since we've already received quite a few reports, I think Debian users, and users of other distros relying on the systemd-stable project, would benefit from this.

Thanks for your consideration!

Skinner927 · 2023-07-07T23:21:16Z

I figured I'd write this here since multiple forum posts link here. I was able to circumvent this issue by turning on fast boot in the BIOS. Oddly, booting actually takes longer, but an OROM driver gets loaded for my RAID card (hardware RAID is disabled, but the controller still exists) and the error goes away. Also a Proxmox user, if that matters.

JackPala · 2023-10-22T20:13:13Z

I have the same issue described here with the latest official Proxmox ISO on two separate systems. Both were Dell PowerEdges with PERC RAID cards flashed to IT mode for ZFS. A third server that did not boot off ZFS did not have the issue. Proxmox 7 works flawlessly with ZFS booting.

ElvishJerricco · 2023-10-23T01:57:00Z

@JackPala ZFS isn't relevant to this issue. This issue is solely about systemd-boot failing to read the initrd from the FAT32 ESP. ZFS doesn't come into play until a significantly later stage during boot.

mudler · 2024-02-22T09:57:05Z

Bumping into this with UKI files; it happens in a way that's not easy to reproduce. Some of the UKI files we generate occasionally fail with this error, sometimes they don't.

The interesting aspect is that I'm testing this with QEMU/edk2, not real hardware at all.

kernle32dll added the bug 🐛 label Jan 2, 2023

github-actions bot added the sd-boot/sd-stub/bootctl label Jan 2, 2023

kernle32dll changed the title ~~systemd-boot: Error preparing initrd: Bad Buffer Size.~~ systemd-boot: Error preparing initrd: Bad Buffer Size Jan 2, 2023

poettering added a commit to poettering/systemd that referenced this issue Jan 3, 2023

efi: skip Read() calls with zero sizes

2fa0e53

Let's avoid calling Read() with zero-sized buffer, to avoid needless firmware quirkiness. See: systemd#25911
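A sketch of what skipping zero-sized reads looks like as a wrapper. `quirky_fw_read` is a hypothetical firmware that, purely for illustration, rejects zero-length requests; the commit message only says such calls invite "needless firmware quirkiness," so the specific failure mode is assumed. This is not the actual systemd code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SIM_SUCCESS = 0, SIM_BAD_BUFFER_SIZE } sim_status;

/* Counts how often the simulated firmware is actually called. */
static int fw_calls;

/* Hypothetical quirky firmware read: errors out on a zero-byte request. */
static sim_status quirky_fw_read(size_t *len, void *buf) {
    (void)buf; /* data transfer is not simulated here */
    fw_calls++;
    if (*len == 0)
        return SIM_BAD_BUFFER_SIZE;
    return SIM_SUCCESS;
}

/* Wrapper in the spirit of "skip Read() calls with zero sizes": a zero-length
 * request is trivially satisfied without ever reaching the firmware. */
static sim_status read_skip_zero(size_t *len, void *buf) {
    assert(len != NULL);
    if (*len == 0)
        return SIM_SUCCESS;
    return quirky_fw_read(len, buf);
}
```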

poettering mentioned this issue Jan 3, 2023

efi: skip Read() calls with zero sizes #25922

Merged

bluca pushed a commit that referenced this issue Jan 3, 2023

efi: skip Read() calls with zero sizes

fd1fec5

Let's avoid calling Read() with zero-sized buffer, to avoid needless firmware quirkiness. See: #25911

medhefgo mentioned this issue Jan 5, 2023

boot: Read files in small chunks on broken firmware #25948

Merged

bluca added the needs-reporter-feedback ❓ label Jan 19, 2023

eworm-de pushed a commit to eworm-de/systemd that referenced this issue Feb 4, 2023

efi: skip Read() calls with zero sizes

c2deca7

Let's avoid calling Read() with zero-sized buffer, to avoid needless firmware quirkiness. See: systemd#25911 (cherry picked from commit fd1fec5)

d-hatayama pushed a commit to d-hatayama/systemd that referenced this issue Feb 15, 2023

efi: skip Read() calls with zero sizes

6e394c9

Let's avoid calling Read() with zero-sized buffer, to avoid needless firmware quirkiness. See: systemd#25911

ethorsoe mentioned this issue Apr 21, 2023

systemd-boot-252 loading initrd renders some systems unbootable with "Error preparing initrd: Bad Buffer Size" (suspected broken EFI implementation) NixOS/nixpkgs#227431

Closed

bluca closed this as completed in #25948 May 22, 2023

bluca pushed a commit that referenced this issue May 22, 2023

boot: Read files in small chunks on broken firmware

f70f992

Fixes: #25911

RaitoBezarius mentioned this issue May 23, 2023

Some UEFI implementations don't handle large file reads correctly rust-osdev/uefi-rs#825

Closed

peckato1 pushed a commit to peckato1/systemd that referenced this issue Jun 12, 2023

boot: Read files in small chunks on broken firmware

1a0f2c5

Fixes: systemd#25911 (cherry picked from commit f70f992)

mbiebl removed the needs-reporter-feedback ❓ label Aug 22, 2023

nmeyerhans pushed a commit to nmeyerhans/systemd that referenced this issue Jan 21, 2024

boot: Read files in small chunks on broken firmware

b5ac1d4

Fixes: systemd#25911 (cherry picked from commit f70f992) (cherry picked from commit 1a0f2c5)

This was referenced Feb 22, 2024

UKI: Reduce Ubuntu base image sizes kairos-io/kairos#2283

Closed

Reduce binary size of kairos components kairos-io/kairos#2285

Closed