Booting a large initrd image (> 4 GB) #526

Closed
griznog opened this issue Nov 29, 2021 · 18 comments

griznog commented Nov 29, 2021

Hi,

I've been searching for the limit on how large an initrd image can be, but haven't had much luck yet. By testing with different sized images I've narrowed it down to around 4 GB. The only thing I can find that looks like a limit for this is in src/arch/x86/prefix/lkrnprefix.S:

initrd_addr_max:
    .long   0xffffffff

Naively changing that to

initrd_addr_max:
    .quad   0xffffffffff

doesn't seem to break anything, but it also doesn't solve my issue with booting larger images. For context, I'd like to take a chroot in which /init is a symlink to /usr/lib/systemd/systemd and pack it with

find . | cpio --quiet -o -H newc | /usr/bin/pigz -c > "mycontainer.img.gz"

Then boot that container over UEFI with this ipxe script:

kernel --name kernel mycontainerkernel        || goto reboot
initrd --name container mycontainer   || goto reboot
boot kernel initrd=container quiet crashkernel=no vga=791 root=tmpfs rootfstype=tmpfs

For images below the implied 4 GB threshold, this works great. Images above that produce this error when the kernel tries to unpack them:

Initramfs unpacking failed: read error

Is there any way to get ipxe to work with a larger initrd image?

The systems I am booting with this have 2 TB of RAM, so there should not be a hardware memory limit here.

Member

mcb30 commented Nov 29, 2021

A few preliminary points:

  • This will never be able to work in the BIOS builds (either a 32-bit build such as bin/ipxe.pxe, or a 64-bit BIOS build such as bin-x86_64-pcbios/ipxe.pxe) since those are restricted to using the low 4GB of addressable RAM.
  • The file src/arch/x86/prefix/lkrnprefix.S is relevant only for BIOS builds, which is why changing it made no difference in your tests.

It should work to download an image of larger than 4GB in the 64-bit UEFI build (e.g. bin-x86_64-efi/ipxe.efi), but it will be an untested code path. I would suggest trying a few basic tests first:

  • Use imgstat to verify the size of the downloaded mycontainer file.
  • Use md5sum (see https://ipxe.org/dev/drvtest/md5sum) to verify the content of the downloaded mycontainer file.
  • Boot into the UEFI shell (rather than the Linux kernel) and examine the contents of the virtual filesystem provided by iPXE (probably fs0: on a diskless system), and check the size of the virtual container file.

That should get a first indication of where the problem might lie.
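
To compare against what imgstat and md5sum report inside iPXE, it helps to have reference values computed on the server that hosts the image. A minimal sketch, assuming a POSIX shell with the coreutils md5sum tool available; the helper name initrd_reference is illustrative and not from this thread:

```shell
#!/bin/sh
# Print the byte size and MD5 digest of an image file, one per line,
# for comparison with the output of iPXE's imgstat and md5sum commands.
# initrd_reference is a hypothetical helper name.
initrd_reference() {
    file=$1
    # wc -c counts bytes; the arithmetic expansion strips any padding
    # that some wc implementations add around the number.
    echo "$(( $(wc -c < "$file") ))"
    # md5sum prints "<digest>  <name>"; keep only the digest.
    md5sum "$file" | cut -d' ' -f1
}
```

Running `initrd_reference mycontainer.img.gz` on the server prints the two values to check against the iPXE side. A size mismatch would point at the download, and a matching size with a different digest would point at corruption in transit.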

Michael

Author

griznog commented Nov 30, 2021 via email

Contributor

NiKiZe commented Nov 30, 2021

You should be able to use any standard "shell.efi"; I know that the edk2 one is often linked to. A copy is available at http://boot.ipxe.org/Shell.efi

Author

griznog commented Nov 30, 2021 via email

Member

mcb30 commented Nov 30, 2021

@griznog Have you tried using a >4GB initrd loaded through some means other than iPXE, to verify that the kernel itself has no 4GB limitation?

Member

mcb30 commented Nov 30, 2021

One minor point: there is a limit of 4GB for any single file within a CPIO archive (since the filesize is encoded using an 8-digit hexadecimal ASCII field), but this is unlikely to be the cause of your issue since it's unlikely that your initrd image contains just a single large file.
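
The arithmetic behind that per-file limit: eight hexadecimal digits can encode at most 0xFFFFFFFF, which is 4294967295 bytes (4 GiB minus one byte). A minimal sketch for checking a chroot against it, assuming a POSIX shell and find; the helper name oversized_files and its optional limit argument are illustrative only:

```shell
#!/bin/sh
# Largest file size a newc cpio header can record: 8 hex digits.
CPIO_NEWC_MAX=$(( 0xFFFFFFFF ))   # 4294967295 bytes, i.e. 4 GiB - 1

# List regular files under a directory whose size exceeds a limit
# (default: the newc maximum). oversized_files is a hypothetical name.
oversized_files() {
    dir=$1
    limit=${2:-$CPIO_NEWC_MAX}
    # "+Nc" matches files strictly larger than N bytes.
    find "$dir" -type f -size "+${limit}c"
}
```

Running `oversized_files /path/to/chroot` prints nothing when every file fits in the header field, which would rule this limit out as the cause.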

Author

griznog commented Nov 30, 2021 via email

Author

griznog commented Nov 30, 2021 via email

Author

griznog commented Nov 30, 2021 via email

Member

mcb30 commented Nov 30, 2021

> I'm in way beyond my level of understanding, but is it fair to say that in the kernel where this gets unpacked (init/initramfs.c):
>
> static char * __init unpack_to_rootfs(char *buf, unsigned long len)
>
> That unsigned long len there is my problem? My C is pretty rusty but it looks like len gets decremented as the initramfs is unpacked, which implies it will never work with an image larger than the size of an unsigned long, which nicely fits the pattern I see of failing for images larger than 4 GB.

In a 64-bit kernel, unsigned long will be a 64-bit quantity, so that shouldn't be the root cause.

Member

mcb30 commented Nov 30, 2021

> I'm in the process of figuring out how I'd go about doing that, both to test this theory and in search of any other options I could use to boot larger images. But I'm not entirely sure how to go about it. I'm guessing I need to stage my large initrd on local media of some sort, then point initrd= at that somehow? Nothing in my searches implies that I can have the kernel set up networking and use initrd=http://... so that the kernel pulls the image down itself. Maybe I need dracut to coordinate all this?

Assuming that you start with a system that has a functional GRUB bootloader (e.g. a standard Fedora installation), then you should be able to place your kernel and initrd files in the /boot directory, press Escape during boot to get to the GRUB boot menu, then edit the boot entry to point to your kernel and initrd. (This kind of edit is a temporary change: it will not overwrite your normal GRUB configuration.)

Author

griznog commented Nov 30, 2021 via email

Author

griznog commented Dec 1, 2021 via email

Author

griznog commented Dec 1, 2021 via email

Contributor

NiKiZe commented Dec 2, 2021

Does your initrd have some huge files?
You might be able to split it into a compressed part and an uncompressed part. The kernel just reads the data as concatenated cpio archives, decompressing any part it finds along the way.

See some examples at https://github.com/NiKiZe/Gentoo-iPXE#different-types-of-combine

So if you have many small, highly compressible files, you could compress those and then concatenate any large files (with added cpio headers).
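
A minimal sketch of that kind of split, assuming a POSIX shell with cpio and gzip installed. The helper name build_split_initrd and the choice of which subtree to leave uncompressed are illustrative, and the result has not been verified here against a kernel or a >4 GB image:

```shell
#!/bin/sh
# Build an initrd from two concatenated newc cpio archives:
# a gzip-compressed part holding everything except one subtree,
# followed by an uncompressed part holding that subtree.
# build_split_initrd is a hypothetical helper; the output file must
# live outside the root directory being archived.
build_split_initrd() {
    root=$1      # chroot directory to pack
    subtree=$2   # path relative to root to leave uncompressed
    out=$3       # output image path
    (
        cd "$root" || exit 1
        # Part 1: all entries except the subtree, compressed.
        find . -path "./$subtree" -prune -o -print \
            | cpio --quiet -o -H newc | gzip -c
        # Part 2: the subtree itself, as a plain cpio archive.
        find "./$subtree" | cpio --quiet -o -H newc
    ) > "$out"
}
```

For example, `build_split_initrd /path/to/chroot usr/share /tmp/mycontainer.img` would compress everything except usr/share. The parent directories of the subtree are created by the first archive, so the second archive only needs the subtree's own entries.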

Some things I would like to investigate are the best order of these cpio parts, in terms of both speed and memory usage, and the memory impact: how much memory is available in the different scenarios, what memory can be freed, and how.

Since this is not an issue with iPXE maybe this issue should be closed. If needed we can convert it to discussion instead?

Author

griznog commented Dec 2, 2021

> Does your initrd have some huge files?

No large files. My approach to splitting was to put everything except /usr/share into one image and /usr/share into a second, then load them both. I'm not sure why that didn't work; no amount of kernel debugging that I enabled offered any clues, and the only thing I can think of to dig deeper is to start adding printf to the parts of the kernel that do the decompressing/unpacking. Given that just dropping compression of the initrd images works, I'm content to say this is solved. Although I don't know what the actual problem is, it doesn't seem to be an issue with ipxe (at least not up to a 16 GB uncompressed initrd delivered by http or tftp).

Adding some relevant error messages for anyone searching for this in the future:

Initramfs unpacking failed: read error

Kernel panic - not syncing: VFS: Unable to mount root fs

Kernel panic - not syncing: No working init found.

griznog closed this as completed Dec 2, 2021

Member

mcb30 commented Dec 2, 2021

@griznog It may be worth mentioning that iPXE can itself decompress a gzip image using its imgextract command. This might allow you to keep the download size small without triggering whatever kernel quirk is causing it to fail to decompress large images. It would be interesting to know if this works (or if iPXE's decompressor also fails for some reason).
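
A minimal sketch of how the boot script from the original report might change to use imgextract, written as a shell helper that emits the iPXE script. It assumes an iPXE build that includes the imgextract command and gzip support; the helper name write_imgextract_script, the base URL argument, the file names and the :reboot label body are placeholders rather than details confirmed in this thread:

```shell
#!/bin/sh
# Emit an iPXE script that downloads a gzip-compressed initrd and has
# iPXE decompress it (imgextract) before handing it to the kernel.
# write_imgextract_script and the file names are placeholders.
write_imgextract_script() {
    base_url=$1
    cat <<EOF
#!ipxe
kernel --name kernel ${base_url}/mycontainerkernel || goto reboot
imgextract --name container ${base_url}/mycontainer.img.gz || goto reboot
boot kernel initrd=container quiet crashkernel=no vga=791 root=tmpfs rootfstype=tmpfs
:reboot
reboot
EOF
}
```

For example, `write_imgextract_script http://example.test/boot > boot.ipxe` writes the script to a file. The only change from the original script is that the initrd line is replaced by imgextract, so the kernel receives an already uncompressed cpio archive under the same image name.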

Author

griznog commented Dec 2, 2021

@mcb30 you are brilliant. That not only works, it saves me a ton of effort changing the upstream parts of my provisioning to support uncompressed images. Thanks for pointing this option out!
