Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firmware handling of upstream kernel and Device Tree files #3237

Open
pelwell opened this issue Sep 17, 2019 · 16 comments
Open

Firmware handling of upstream kernel and Device Tree files #3237

pelwell opened this issue Sep 17, 2019 · 16 comments

Comments

@pelwell
Copy link
Contributor

@pelwell pelwell commented Sep 17, 2019

Although most users are unaware of it, the RPi firmware has special support for "upstream" (compiled from source code found on kernel.org) Linux kernels and Device Tree files. Specifically, the upstream_kernel=1 flag can be used to request that the relevant upstream DTB files are loaded, falling back to the downstream variants as necessary and applying the upstream overlay to help bridge the gap between the two worlds.

This mechanism has relied on the fact that the upstream developers have used a different naming convention for their DT files, basing them on the package name (BCM283x) as opposed to the die name (BCM27xx) (*). The Pi 4B SoC chip is called BCM2711 - there is no BCM2838, although the name is guaranteed free for use - and against expectations the upstream devs have chosen to also use bcm2711 as their SoC identifier. This will break the select-by-filename logic used up to now, so an alternative is needed.

Another little-known firmware feature is the ability to request that overlays are loaded from a different subdirectory of the boot partition (on SD card or network share). This is controlled by the overlay_prefix setting, the default for which is overlays/.

A third item for consideration is that it would be useful to be able to store multiple independent operating systems on a single image and select between them at boot time based on a config.txt setting. This would allow (for example) a "recovery" OS for the case when a bad kernel build prevents the device from booting (almost a daily occurrence for me, sometimes). I've recently implemented a physical "64-bit switch" on my daily driver Pi4 that pulls GPIO5 to ground, activating the [gpio5=0] section that sets arm_64bit=1. It would be nice to be able to extend that to multiple alternate builds of the same kernel type, etc.

Pulling these strands together brings me to suggest a new config.txt setting - os_prefix. The default value would be the empty string, but if set it would be prepended to the names of "OS" files. Booting with os_prefix=backup- might load backup-kernel.img, whereas os_prefix=backup/ would cause the firmware to look in the backup directory.

What constitutes an "OS" file? The kernel, .dtbs and cmdline.txt definitely fit the description, and I'm declaring that the firmware files (bootcode.bin, start*.elf, fixup*.dat) don't. Overlays fall into a grey area - making them OS-specific is conceptually cleanest and most flexible, making them common saves a small amount of space. I'm leaning towards a hybrid scheme whereby the firmware looks for ${os_prefix}${overlay_prefix}README(**) and, if found, sets overlay_prefix to ${os_prefix}${overlay_prefix}, otherwise leaving it unchanged. This allows shared and unshared overlays, but prevents a pick-and-mix approach.

The proposal for the handling of upstream files is that setting upstream_kernel=1 has an implicit side-effect of setting os_prefix=upstream/ (unless os_prefix is explicitly set). Putting all upstream kernel files into a subdirectory allows upstream and downstream to coexist, regardless of the names of the individual files.

N.B. os_prefix and upstream_kernel only affect automatic file selection - they have no effect on explicit cmdline=, kernel=, device_tree= and ramfsfile= settings which are always relative to the root of the boot partition (or the network equivalent).

Does anybody have any improvements to suggest or concerns about this approach?

(*) This could be the wrong way round - all that matters is that that each chip effectively has two names.
(**) The network and USB boot mechanisms don't have a way to test for the existence of a directory, only a file (and in the case of a non-directory prefix it wouldn't make sense anyway) so test for the existence of the README instead.

@pelwell

This comment has been minimized.

Copy link
Contributor Author

@pelwell pelwell commented Sep 17, 2019

@MilhouseVH Am I right in thinking that LibreELEC uses a custom overlay_prefix? Does this scheme work for you?

@MilhouseVH

This comment has been minimized.

Copy link

@MilhouseVH MilhouseVH commented Sep 17, 2019

@pelwell No, LibreELEC does not use overlay_prefix, and I don't see any issues with your proposed scheme at this stage. @HiassofT do you have any concerns?

@HiassofT

This comment has been minimized.

Copy link
Contributor

@HiassofT HiassofT commented Sep 17, 2019

No concerns from me and I think keeping upstream kernels, device trees etc in a separate directory is a good idea.

That's also the setup I chose quite a while ago when testing upstream kernels on a Raspbian SD card

@lategoodbye

This comment has been minimized.

Copy link
Contributor

@lategoodbye lategoodbye commented Sep 18, 2019

Thanks for making a suggestion to handle these conflicts. The final goal would be use one DTB for up- and downstream kernel. Without the dwc_otg / dwc2 conflict on the BCM2711 this is much easier to achieve.

Until we reach this point, this sounds like a great improvement.

Btw i plan to submit the first PR for 4.19 to "push" the downstream kernel into the upstream direction. This would contain mostly the upstream DT changes until Linux 5.4.

@lategoodbye

This comment has been minimized.

Copy link
Contributor

@lategoodbye lategoodbye commented Sep 18, 2019

@mbgg @vianpl Any comments?

@mbgg

This comment has been minimized.

Copy link
Contributor

@mbgg mbgg commented Sep 18, 2019

From the distro perspective (at least talking for openSUSE) we are booting U-Boot, which uses distro_boot. Latter looks for broadcom/bcm2711-rpi-4-b.dtb on the FAT partition. If present it uses this to boot the kernel, otherwise it falls back to the DTB provided by the FW.

So I don't have a need for upstream_kernel=1. My understanding is that os_prefix gets ignored for the kernel if you provide kernel=u-boot.bin explicitly. That's the only thing that could break the way we use config.txt for now.

In general, at openSUSE we use FW -> U-Boot -> Grub boot path, so we handle different kernels through the Grub menu.

@pelwell

This comment has been minimized.

Copy link
Contributor Author

@pelwell pelwell commented Sep 18, 2019

My understanding is that os_prefix gets ignored for the kernel if you provide kernel=u-boot.bin explicitly.

That's correct - I've amended the original post to make that clear.

@lategoodbye

This comment has been minimized.

Copy link
Contributor

@lategoodbye lategoodbye commented Sep 19, 2019

The promised PR #3244 is available now.

popcornmix added a commit to raspberrypi/firmware that referenced this issue Oct 21, 2019
kernel: irs1125 infineon tof camera support
See: raspberrypi/linux#3261

kernel: unicam fixes for YUYV one to many mappings
See: raspberrypi/linux#3290

kernel: Add w5500 support (for #3276)
See: raspberrypi/linux#3278

firmware: arm_loader: Add os_prefix option
See: raspberrypi/linux#3237

firmware: Add support for arbitrary memory specification
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Oct 21, 2019
kernel: irs1125 infineon tof camera support
See: raspberrypi/linux#3261

kernel: unicam fixes for YUYV one to many mappings
See: raspberrypi/linux#3290

kernel: Add w5500 support (for #3276)
See: raspberrypi/linux#3278

firmware: arm_loader: Add os_prefix option
See: raspberrypi/linux#3237

firmware: Add support for arbitrary memory specification
@pelwell

This comment has been minimized.

Copy link
Contributor Author

@pelwell pelwell commented Oct 29, 2019

After some internal discussion I changed the file location logic to make it simpler (i.e. more consistent) and more flexible. In the new scheme, os_prefix is applied to all OS files (kernels, cmdline.txt, dtbs, initramfs). It is also applied to overlay_prefix, provided ${os_prefix}${overlay_prefix}README exists.

If upstream_kernel=1, the firmware sets os_prefix=upstream/ provided os_prefix hasn't already been set and it can locate the kernel or the dtb there (remember that we can't check for the existence of a directory).

To allow the OS prefix to be skipped, all user-specified filenames may include a leading / character, making the paths absolute w.r.t. the boot directory/FAT partition.

This revised logic is in the most recent firmware release. Please report any issues you find.

@pelwell pelwell changed the title RFC: Firmware handling of upstream kernel and Device Tree files Firmware handling of upstream kernel and Device Tree files Oct 29, 2019
@stiltr

This comment has been minimized.

Copy link

@stiltr stiltr commented Nov 5, 2019

While not quite the original purpose of this issue, I noticed that this lays a lot of the ground work for an A/B rootfs. I am currently using Mender and it requires u-boot to handle booting to the different root partitions / kernels. It looks like with a couple additions to the firmware, u-boot might not be necessary. Is this something you'd be willing to incorporate into the firmware? I know one important feature currently missing is a boot count. At first glance, the rest of the features could be handled through config.txt or possibly a new special file with setting specific to this sort of setup. Is something that might be possible or am I just dreaming?

@pelwell

This comment has been minimized.

Copy link
Contributor Author

@pelwell pelwell commented Nov 6, 2019

The main problem with implementing some kind of boot count is the extremely limited storage that survives a reset - we have 6 bits (yes, bits) to play with. Currently 0x3f means HALT, otherwise it is a partition number for the boot partition. 63 is a huge number of partitions, so there is some scope for, errm... partitioning it, but I think two bits would be the limit.

Please outline how you think such an A/B rootfs ought to work, and we'll see whether it sounds feasible without a huge amount of effort.

@stiltr

This comment has been minimized.

Copy link

@stiltr stiltr commented Dec 6, 2019

Hi @pelwell! Sorry for the delayed response!

I think two bits would work perfectly for the boot count.

The general idea with the A/B setup is to provide atomic image updates than can be rolled back if unsuccessful. New images are written to the secondary root partition and then re-booted into. If successful, it becomes the primary partition. Wash, rinse, repeat.

I've been discussing this with mirzak over on the Mender forum and I think we've got it mostly sorted out. I've done my best to keep this generic. The neat thing is that this would essentially be dual function and would add a nice (optional) recovery feature for people not utilizing an A/B setup. I'll do my best to outline the proposed additions below.

  • A 2-bit counter (bootcount) that would persist across power cycles.
  • A userspace tool that would allow bootcount to be read and reset.
  • 3 new optional flags for config.txt
    • upgrade_available Signals that a new image has been installed and should be booted. Unused in non-A/B scenarios. Defaults to 0 if not specified.
    • bootlimit Sets the threshold for failed boots (set to 1 for A/B setup, 4 if unspecified.)
    • recovery_os_prefix Same idea as os_prefix, but for the "recovery" os.
  • New boot logic approximated below.
if(bootcount<bootlimit && upgrade_available==0)
  //Normal boot
  boot();
elseif(bootcount<bootlimit && upgrade_available==1)
  //Upgrade is pending, boot to it
  os_prefix=recovery_os_prefix;
  boot();
elseif(bootcount>=bootlimit && upgrade_available==1)
  //Upgrade failed, boot normal os
  boot();
elseif(bootcount>=bootlimit && upgrade_available==0)
  //Normal boot failed, fall back to recovery
  /*Note: This may not be the desired behavior in all situations.
  An additional flag was proposed to select this behavior, but it
  could also be handled by modifying the recovery_os_prefix or
  otherwise disabling the "recovery os" in userspace. This would
  keep the code here simple and stop forced "insecure" boots. */
  boot();

Root partitions would be specified in separate cmdline.txt's for each os_prefix. Userspace tools would handle updating the os_prefix and upgrade_available flags before/after updates.

Does this sound like something that might be feasible to incorporate? I'd be thrilled to get rid of my custom version of u-boot in favor of the stock bootloader. If I failed to explain something satisfactorily, just let me know and I'll give it another go. Also, if you'd rather this was submitted as it's own issue, I'm happy to do that as well.

Cheers!

PS
It just occurred to me that rather than having the boot logic in the FW, an alternative might be sections in config.txt like [bootcount=1] or similar. I'm not sure if that's feasible or even a good idea, but I thought I'd mention it as it popped into by head as I was about to hit submit.

netbsd-srcmastr pushed a commit to NetBSD/src that referenced this issue Dec 16, 2019
commit 0c01dbefba45a08c47f8538d5a071a0fba6b7e83
Author: popcornmix <popcornmix@gmail.com>
Date:   Wed Dec 11 15:30:08 2019 +0000

and include firmware for RPI4

Firmware has bee updated to support mainline linux kernels as described in
raspberrypi/linux#3237
ryo pushed a commit to IIJ-NetBSD/netbsd-src that referenced this issue Dec 16, 2019
commit 0c01dbefba45a08c47f8538d5a071a0fba6b7e83
Author: popcornmix <popcornmix@gmail.com>
Date:   Wed Dec 11 15:30:08 2019 +0000

and include firmware for RPI4

Firmware has bee updated to support mainline linux kernels as described in
raspberrypi/linux#3237
@stiltr

This comment has been minimized.

Copy link

@stiltr stiltr commented Jan 15, 2020

Just checking in to see if there was any movement on this, one way or the other. Thanks!

@pelwell

This comment has been minimized.

Copy link
Contributor Author

@pelwell pelwell commented Jan 16, 2020

Your reply came in at a busy time, and not being at all what I expected it was put to one side for later review. And now a month later I'm reminded why I couldn't just say "yes, sure!".

My original answer was based on the idea of using another few bits of the very scarce reset-proof state, but that doesn't survive a power cycle so could only be used with a user-accessible reset signal (removing the need to pull the power cable in the event of a boot failure). The current firmware never writes to persistent storage, and making it do so is a big step that we are reluctant to take. A boot count, by definition, relies on changing state at every boot (twice, in the event of a successful boot). One can think of journal-like schemes where a pool of sectors is used to spread the load of maintaining the count, but it's still write operations, and that's a road we don't want to go down.

@dividuum

This comment has been minimized.

Copy link

@dividuum dividuum commented Jan 16, 2020

The current firmware never writes to persistent storage [...]

The recovery.bin from the eeprom firmware upgrade process renames itself, so at least in theory it seems possible. Maybe keeping the state within a magical filename itself is easier than writing to their content, as that requires fewer indirection and checks? That said: I totally understand and agree with the reluctance to add any more state changing code to the firmware as the seems fragile and probably adds a ton of new error paths and A/B booting is already possible without these changes.

@pelwell

This comment has been minimized.

Copy link
Contributor Author

@pelwell pelwell commented Jan 16, 2020

Ah - recovery.bin is a special case, and doesn't really form part of what I think of as the firmware (start*.elf), although clearly it is. You are right that such a change is possible, but it's not feasibility that's preventing us proceeding - see Ian Malcolm's views on could and should.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.