64-bit boot stub? #579

Closed
swarren opened this Issue Mar 26, 2016 · 119 comments

Projects

None yet

10 participants

@swarren
swarren commented Mar 26, 2016

If I write a "boot stub" for the VC FW to place into memory at address 0 in 64-bit mode, will you accept it into the VC FW? If so:

a) What license do you want it under? I'd be happy to make it MIT/BSD/X11 or assign copyright to the Pi Foundation; whatever is easiest for you.

b) Which address(es) does the VC FW write to? I assume at least 0x14 for the ATAGS/DTB pointer, but I'm not sure whether the initialization values for cntfrq or the Linux machine ID are written too. I'm discounting address 0x100 and on since that's just data and not part of the boot stub itself.

(As an aside, if this stub could be loaded from a separate .bin file, that might be helpful since it'd allow easy experimentation, but that would then get into a discussion of how to name the files since we'd need different files for bcm2835, bcm2836, bcm2837 32-bit, bcm2837 64-bit).

@swarren
swarren commented Mar 26, 2016

Oh, and it might turn out useful if the boot stub code could be included in e.g. Qemu for RPi emulation - both the existing 32-bit version and any new 64-bit version. Do you have any problems with my doing that, even for the existing 32-bit binary that you wrote? I wonder if you could publish the source under an open license to help the contribution chain to Qemu.

@popcornmix
Contributor

No problem with releasing the boot stub code. See: http://pastebin.com/raw/stxjiVVD
If you are happy with a similar licence for a 64-bit boot stub, we'd be happy to include it in the firmware.

Just let us know what addresses to modify. Currently we do:

         ((unsigned *)mem)[BCM2836 ? 5:13] = device_tree_address;

But we could do something different in 64-bit mode.
The machine ID and ctrfreq are not overwritten.

@capnm
capnm commented Mar 27, 2016

Hmm, it seems to work without issues, but at least according to the A53 TRM (p.328) the order of enabling the caches and SMP is wrong:

Data coherency is enabled only when the CPUECTLR.SMPEN bit is set. You must set the
SMPEN bit before enabling the data cache. If you do not, then the cache is not coherent
with other cores and data corruption could occur.

    mrc p15, 0, r0, c1, c0, 0 @ Read System Control Register
    orr r0, r0, #(1<<2)       @ cache enable
    orr r0, r0, #(1<<12)      @ icache enable
    mcr p15, 0, r0, c1, c0, 0 @ Write System Control Register
.if !BCM2710
    mrc p15, 0, r0, c1, c0, 1 @ Read Auxiliary Control Register
    orr r0, r0, #(1<<6)       @ SMP
    mcr p15, 0, r0, c1, c0, 1 @ Write Auxiliary Control Register
@swarren
swarren commented Mar 27, 2016

Writing to the SCR doesn't actually enable the icache/dache. To enable the cache, the MMU must also be configured/enabled, and the "ARM boot stub" doesn't do that (Linux/U-Boot/... does it during early boot).

@capnm
capnm commented Mar 28, 2016

Great, thanks! Do you know why is the SCR code in the stub at all?

I guess the next stop is the kernel entry point here
http://lxr.free-electrons.com/source/arch/arm64/kernel/head.S#L54
MMU=off is clear, but they actually mean the state of the SCR register bits, right?

 58  * The requirements are:
 59  *   MMU = off, D-cache = off, I-cache = on or off,

What about the IIRC VC/ARM shared L2 cache? Is it always enabled? I couldn't find anything special in the kernel code.

@swarren
swarren commented Mar 28, 2016

Correction: System Control Register is SCTLR, whereas SCR is Secure Configuration Register.

I think the write to the SCTLR in the stub is not necessary; At least the arch/arm64 kernel repeats this and I'm sure the arch/arm kernel must too; see setup of x0 here:
http://lxr.free-electrons.com/source/arch/arm64/mm/proc.S#L178
based on data here:
http://lxr.free-electrons.com/source/arch/arm64/mm/proc.S#L229
and x0 is written to SCTLR here:
http://lxr.free-electrons.com/source/arch/arm64/kernel/head.S#L644

I suspect the comment in the kernel code refers to the logical state of the caches being disabled. The fact this is implemented by writing bits in the SCTLR is true but simply not explicitly mentioned.

I don't know if the L2 cache can be enabled/disabled, but IIUC all control over that cache is from the VideoCore FW. I believe its state is irrelevant to ARM-based SW, except for interaction with peripherals doing DMA, since it's beyond the point of coherency/unification for the ARM cores themselves. As such I would not expect to find any code that controls it.

@swarren
swarren commented Mar 30, 2016

A first cut is available at https://github.com/swarren/rpi-3-aarch64-demo in armstub64.S. The most-recent-but-one includes a TODO list describing the outstanding changes.

@swarren
swarren commented Apr 2, 2016

I have pushed what I hope is a final version of the 64-bit stub to that github repo. The source is in armstub64.S.

I have tested this with U-Boot, and the test app in the same repo. I have not tested it with any Linux kernel yet, nor have I attempted to boot any Linux kernel from U-Boot. I believe @anholt will test it some time.

The DTB address should be written as a 64-bit value to address 0xf8.

The stub implements the standard RAM-based "spin table" method of SMP secondary CPU boot. This is a mechanism already supported by the mainline Linux kernel's arch/arm64 port, so should ease upstreaming kernel support; the arch/arm64 maintainers are unlikely to welcome a new SoC-specific SMP mechanism (besides spin table or PSCI) that uses the bcm283x mailbox registers.

I'd like to suggest creating a new git repo that contains the ARM stub source for all the Pis. I'd be happy to contribute patches to such a repo to fix the issues I filed with the 32-bit stubs if there was somewhere I could send a regular git patch against. I'd also be happy to create the initial content of such a repo if you want; just let me know (assuming the pastebin you linked above is all the versions; otherwise it might be best if you filled in the existing 32-bit stub code first).

@sukantoghosh

Do I need any custom version of boot-firmware for this or is the HEAD of github.com/raspberrypi/firmware master branch supposed to work? Somehow this is not working for me, I did try to put prints at the beginning of armstub64.S as well, but it didn't help.

This might be worth documenting in the README as well as the firmwares shipped with 2016-03-18-raspbian-jessie don't respond to 'enable_uart=250'.

BTW, I really hope that when this stub is integrated to VC FWs it is picked from a bin file as Stephen suggested

@popcornmix
Contributor

I'm happy to add support to load:
armstub.bin (when on Pi1)
armstub7.bin (when on Pi2 or Pi3 in 32-bit mode)
armstub8.bin (when booting in 64-bit mode)

at address zero (up to 0x100 bytes). If armstub*.bin is not present it will use default ones (including current armstub64.S from @swarren's repo).

Also the presence of kernel8.img will enable 64-bit mode.
I can add existing armstub*.S files to a Pi repo (possibly in tools/mkimage).
Is that okay?

@swarren
swarren commented Apr 3, 2016

@sukantoghosh I tested with 046effa "firmware: arm_loader: emmc clock depends on core clock See: #572", which was HEAD of github.com/raspberrypi/firmware master branch when I last fetched a few days ago.

@swarren
swarren commented Apr 3, 2016

@popcornmix that all sounds good to me, thanks!

Just one query though: The stub code you posted had an ifdef for BCM2810-vs-not, so I think you'd end needing 4 FW files: ARMv6, ARMv7, ARMv8 32-bit mode, and ARMv8 64-bit mode, or am I misinterpreting something?

@swarren
swarren commented Apr 3, 2016

@sukantoghosh that should be enable_uart=1 not =250. I'll fix my repo in a minute.

@popcornmix
Contributor

Yes, you are right, the stub code is slightly different between Pi2 and Pi3 so we should have an extra stub file.

@sukantoghosh

thanks @swarren it works for me

@anholt
anholt commented Apr 4, 2016

With @swarren's boot stub I have SMP enabled on 64-bit Linux, and I'm making it to the point of trying to NFS root mount before it fails because USB device probing isn't working.

On 2836, the boot stub was setting a 19.2Mhz CNTFREQ but not LOCAL_CONTROL/PRESCALER, so we set the LOCAL_* regs early in Linux boot. For this boot stub, I've disabled the LOCAL_* setup so that we run with the 1Mhz it sets in CNTFREQ. I'm wondering, though: Don't we want to run the timer at 19.2Mhz? If so, should the firmware/boot stub take over that responsibility, if it was already setting CNTFREQ previously?

@swarren
swarren commented Apr 4, 2016

BTW, there's some discussion re: DWC2 and broken DMA support in the kernel on page 3 (and later) of https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=137963&start=50. I'm not sure if that's talking about the RPi Foundation kernel or mainline.

Where (which module) are the LOCAL_CONTROL/PRESCALER registers; I'm not familiar with those.

@capnm
capnm commented Apr 4, 2016

Other difference that I spotted during my experiments (and I guess the
fw should take care of) is the another arm64 kernel load-adress, currently
hardcoded in the makefile

http://lxr.free-electrons.com/source/arch/arm64/Makefile#L53

TEXT_OFFSET := 0x0008_0000 # 512K
was 0x8000 / 32k

The image should be idealy loaded at adress taken from the image
header (if magic == 0x644d5241 ?), not sure what to do with the
u-boot or another images or how to pass that to the boot blob.
So probably just simply load the arm64 image at 0x00080000?

The decompressed kernel image contains a 64-byte header as follows:
  u32 code0;            /* Executable code */
  u32 code1;            /* Executable code */
  u64 text_offset;      /* Image load offset, little endian */
  u64 image_size;       /* Effective Image size, little endian */
  u64 flags;            /* kernel flags, little endian */
  u64 res2  = 0;        /* reserved */
  u64 res3  = 0;        /* reserved */
  u64 res4  = 0;        /* reserved */
  u32 magic = 0x644d5241;   /* Magic number, little endian, "ARM\x64" */
  u32 res5;
@popcornmix
Contributor

I've added the arm stubs here: raspberrypi/tools@920c7ed
Pull requests for fixes are welcome.

Here is a (not hugely tested) firmware:
https://dl.dropboxusercontent.com/u/3669512/temp/firmware_armstub.zip
that supports, e.g.
armstub=armstub8-32.bin
to load a stub file from a file. The file should be 0x100 bytes (or less).

If you have a kernel8.img (and optionally an armstub8.bin) they should be used and enable 64-bit mode automatically. Supported kernel names are:
kernel8.img, kernel8-32.img, kernel7.img, kernel.img
If one of those is loaded it will by default look for:
armstub8.bin, armstub8-32.bin, armstub7.bin armstub.bin
although they can be overridden with kernel/armstub in config.txt.

@anholt
anholt commented Apr 5, 2016

drivers/irqchip/irq-bcm2836.c

@swarren
swarren commented Apr 6, 2016

@popcornmix I've tested firmware_armstub.zip and it works great, thanks! I tested:

  1. Unified SD card containing kernel.img (U-Boot rpi build), kernel7.img (U-Boot rpi_2 build), kernel8.img (U-Boot rpi_3 64-bit build). I plugged the card into a B+, 2 B, and 3 B, and all worked. I didn't test if the 2 B board picked kernel.img or kernel7.img; both would have worked. I'm pretty sure the B+ correctly picked kernel.img not kernel7.img or it would have failed to execute the ARMv7 instructions in it.

  2. An SD card containing just kernel8-32.img (U-Boot rpi_3_32 build) booted on the Pi 3. I also accidentally validated that fallback to kernel7.img works on the 3 B in 32-bit mode.

I didn't use mkknlimg in any of my testing. I should probably switch to that soon:-)

@swarren
swarren commented Apr 6, 2016

@capnm yes it looks like the FW and 64-bit ARM stub can't just hard-code the kernel address; it must be read from the kernel image header. I'll prepare a patch for the stub that:

  • Defines a 64-bit location that the FW can over-write with the address of the kernel, to which the stub will jump.
  • The default value at that address will still be 0x8000 for compatibility. When loading a Linux kernel, the FW will always have to read the value out of the kernel image header, so the default value isn't at all relevant for Linux, just custom other FW.
  • I'll also specifically reserve a location for the CPU0 spin-table release address. This will never actually be used, but according to the DT binding documentation for ARM stream tables all CPUs need a release address defined.
@swarren
swarren commented Apr 6, 2016

@popcornmix I've also tested using custom ARM stubs (only on RPi3 in 64-bit mode). If I use config.txt option armstub=armstub8.bin, it works great. However, if I don't put any armstub= option in config.txt, the FW doesn't seem to automatically check for and load armstub8.bin (even though I put the "kernel" in kernel8.img). I thought from your description it would?

@popcornmix
Contributor

Yes, your expectations are correct. I'll push an updated firmware later today that fixes a few issues. Hopefully that will work.

Currently, not specifying armstub= and armstub8.bin being found enables 64-bit mode automatically.
Obviously armstub=mystub.bin shouldn't enable 64-bit mode.
What about armstub=armstub8.bin?
What about armstub=mystub8.bin?
Currently they don't, but I'm wondering if one or both should (*8.bin being the wildcard to enable 64-bit mode). Thoughts?

@swarren
swarren commented Apr 7, 2016

@popcornmix my inclination is to keep things simple. Have the FW decide whether to boot in 32- or 64-bit mode solely based on the kernel image filename, and don't take the stub filename into account. That's because the kernel is the primary piece of SW, and the stub is somewhat only there to support the kernel. Once that bit-size determination is made, use that to select the default stub filename. This way, the FW doesn't have to have rules to decide whether the bit size implied by the kernel filename overrides the bit size implied by the stub filename, or the other way around.

If someone wants to do something really unsusual and run a 32-bit OS (kernel) under a 64-bit secure monitor/OS, I assume they'd make that work by putting the secure monitor into kernel8.img or setting arm_control=0x200 (thus ensuring 64-bit booting), using a custom stub file (ths avoiding the built-in switch to HYP/EL2), and having the secure OS load the real kernel image (or perhaps concatenating it with the secure OS image). Hence, even with this simple scheme, doing something unusual is still possible.

I envisage something like:

bits=32
if arm_control 64-bit flag is set:
    bits=64
if kernel= not in config.txt and kernel8.img exists:
    bits=64

if bits == 64:
    armstub_fn=armstub8.bin
    kernel_fn=kernel8.img
else:
    armstub_fn=armstub8.bin
    kernel_fn=kernel8-32.img

if armstub= in config.txt:
    armstub_fn=value_from_config

if kernel= in config.txt:
    kernel_fn=value_from_config

Does that make sense?
@swarren
swarren commented Apr 7, 2016

@anholt the timer frequency setup is implemented in raspberrypi/tools#54, and tested w/ U-Boot's sleep command.

@popcornmix
Contributor

@swarren there are a number of "bare metal" users who just use a single block of arm code.
Currently they use a kernel.img with kernel_old=1.
It would probably make more sense for them to just use armstub.bin and no kernel file. We could then deprecate kernel_old=1. (The size limit on armstub.bin would be removed).
In that scenario determining the bits from armstub makes sense.
In theory it would be possible for armstub.bin to be 64-bit code and kernel.img to be 32-bit (with armstub switching back to 32-bit mode), which again would mean deriving the bits from the armstub would make more sense.

@popcornmix
Contributor

Here is latest test firmware. It includes latest armstub8.S from github:
https://dl.dropboxusercontent.com/u/3669512/temp/firmware_armstub2.zip

It is useful for the firmware to provide some information. Currently:
device_tree_address (dtb_ptr) and kernel_address (kernel_entry)
Possibly we'll also need device_tree_end.

Currently device_tree_address is (uint32_t *)0x34 for Pi1 and (uint32_t *)0x14 for Pi2/Pi3.
We are not currently writing it, but it looks like armstub8.S is using (uint64_t *)0xf8.

armstub8.S also uses (uint64_t *)0xd0 for kernel_entry

As armstub7.S is quite cramped in 0x100 bytes currently (we may be able to increase the size, but that could break some bare metal assumptions), using 0xd0 might be awkward.

Can we define an API for where to put these words? It seems 32-bit values will be enough as bus addresses are still 32-bits even in 64-bit mode. Using unused vectors (like 0x14) is convenient for 32-bit mode. Can we find suitable values that work for both 32-bit and 64-bit stubs?

We could always use different values for 32-bit and 64-bit, but if making them the same is possible, it simplifies things.

@pelwell
Contributor
pelwell commented Apr 7, 2016

device_tree_end shouldn't be needed, since the dtb contains a length and any serious OS will unflatten it into a more convenient form ASAP.

@swarren
swarren commented Apr 7, 2016

I was going to suggest unifying the addresses too. That would also be very useful since you plan to make armstub a "first class citizen" and the primary mechanism for bare-metal. Having consistent addresses would make life easier for people implementing their own stub. I assume that when loading just a stub, all the DT loading/processing would still happen, and the DTB pointer would still be written into the stub memory. Hopefully there will also be a simple specification (beyond just the labels in the code) indicating what addresses the FW will write to in the stub. BTW, I like the idea of allowing just a stub, since it'll resolve the issue where bare metal code needs to do odd tricks to get back into SVC from HYP mode. Removing the size limit if only loading armstub was the part I hadn't though about in my previous answer. I guess given that I'd suggest:

bits=32
if arm_control 64-bit flag is set:
    bits=64
if (armstub= not in config.txt) and (armstub8.bin exists):
    bits=64
if (not only loading armstub) and (kernel= not in config.txt) and (kernel8.img exists):
    bits=64
# Question: Is setting arm_control a good mechanism for people who want custom filenames,
# yet also want 64-bit boot?

if bits == 64:
    armstub_fn=armstub8.bin
    kernel_fn=kernel8.img
else:
    armstub_fn=armstub8.bin
    kernel_fn=kernel8-32.img

if armstub= in config.txt:
    armstub_fn=value_from_config

if kernel= in config.txt:
    kernel_fn=value_from_config

I chose 64-bit addresses for the 64-bit version for potential forwards-compatibility, just in case we ever get an RPi4/5/6 with a much larger amount of (or different location of) RAM. I suppose we could use 32-bit values instead since that's all that's necessary in practice. I obviously don't have visibility into whether this future-proofing is likely to be useful; either based on potential new HW or whether the new HW would require a completely different stub for other reasons.

Re: code size: I suspect armstub7.S can be optimized a bit, or at least the order of code and constants re-organized so we could e.g. always put the kernel address at 0x100 - sizeof(pointer). In particular, I notice it sets up MVBAR and does SVC in order to get to SVC mode to set some secure registers and/or return to HYP mode. Presumably the CPU is already in SVC mode so there's no need to invoke SVC to do that, and hence no need to set up MVBAR. Equally, getting into HYP mode should be as simple as an eret (or AArch32 equivalent). I expect that would save quite a few instructions.

@popcornmix
Contributor

I suspect armstub7.S can be optimized a bit

I found I couldn't write to CNTVOFF from reset state and needed the smc to _secure_monitor to do that. I got the code from uboot, but there may be simpler ways of achieving this.

Pull requests are welcome for a simplified armstub7. It would also be nice to get the same sort of spinning code. Currently armstub7 busy spins rather than using wfe. Not a big issue for a working kernel, but becomes a bit of a heat generator for bare metal users who only use one core.

@SylvainGarrigues

I have one question here. I am no expert, but I am under the impression that the 0x8000 jump address is hardcoded in the bootstub, right?

Is the kernel_address= config.txt parameter deprecated then? How does one select the kernel load address? FreeBSD kernels require to be loaded on a 1MB boundary, this 0x8000 value is not convenient.

@swarren
swarren commented Apr 8, 2016

@popcornmix Ah yes, the SVC is needed to write CNTVOFF. The CPU boots in supervisor mode. CNTVOFF can only be written in monitor mode with SCR.NS set or from HYP mode. So the the CNTVOFF write can't be before switching out of supervisor mode somehow.

I'm not sure that CNTVOFF needs to be initialized though. I'd expect that to be the job of whatever code runs in HYP mode. The Linux kernel appears to set CNTVOFF to 0 if it's started in HYP mode, and I assume any actual hypervisor would also initialize it.

It looks like the SMC is also required due to the need to write SCR. I think the code that sets SCR can be simplified though. The value can be constructed from scratch, so no need for the mrc instruction. I think the only bits that actually need to be set are NS and maybe SCE; I don't think AW/FW have any effect on regular use-cases. I think NS/SCE can be set up with a single mov. That should save about 3 instructions.

_secure_monitor could be moved to address 0x8 to avoid the jump, and so save 3 more instructions.

I'm not sure that it's necessary to write VBAR in hyp mode, and not by copying the value from system mode since I don't think it points at anything in particular. That might save 2 instructions.

So, there seems to be opportunity for optimization:-) Testing is probably the largest issue; a time-sink.

I think you wanted to save instructions to add a wfe to the SMP spin loop for the A7? That might be problematic; existing kernels don't issue a sev in bcm2836_smp_boot_secondary() at the moment, and that would be needed to wake up any wfe. The Pi Foundation could probably be updated in sync with new FW pretty easily, but not mainline or other OSs. I'm not 100% sure wfe can be added in a backwards-compatible way.

I think it should be easy to rework all stubs to read 32-bit DTB pointer from 0xf8 and 32-bit kernel address from 0xfc.

@pelwell
Contributor
pelwell commented Apr 8, 2016

@SylvainGarrigues Yes, the current stubs hard-code 0x8000 as the kernel entry point. The kernel_address parameter is the address to load the kernel to, so you could imagine creating a kernel where the entry point isn't the start of the file, but really kernel_address is meant for monolithic images that include stub and kernelm, specified using kernel_old=1. As you'll have noticed the new stubs will have a parameter indicating the entry point, so this won't be a problem in the future.

@SylvainGarrigues

@pelwell If I understand well, currently, the kernel file (kernel7.img or filename specified in kernel=) is loaded at 0x8000 and kernel_address specifies the location of the entry point, is that right? I always thought kernel_address was the address the kernel file (kernel7.img) was loaded in memory (to override the 0x8000), and so I was wrong.

@pelwell
Contributor
pelwell commented Apr 8, 2016

No, it's the other way around:

the current stubs hard-code 0x8000 as the kernel entry point. The kernel_address parameter is the address to load the kernel to

@popcornmix
Contributor

The Linux kernel appears to set CNTVOFF to 0 if it's started in HYP mode

That is believable. Changing the stub code to set HYP mode is quite new, so previously VCNTVOFF wasn't being set by kernel. I'll check if the initialisation of CNTVOFF can be removed now.

I think the only bits that actually need to be set are NS and maybe SCE; I don't think AW/FW have any effect on regular use-cases. I think NS/SCE can be set up with a single mov. That should save about 3 instructions.

You mean NS and HCE? That would be 0x101 which doesn't fit in a mov. We can still save two instructions by not preserving SCR (as the reset state is known).

@swarren
swarren commented Apr 8, 2016

@popcornmix i realized later that the entire existing SCR value can be set using a movw instruction (which supports a 16 bit constant field) so I guess there's no need to modify the value at all.

@popcornmix
Contributor

What do you think about putting a magic value and a version number in two words of the stub.
We'll only overwrite with device tree address and kernel address if the magic word is present.
The version may allow us to set additional words in the future.
We could perhaps use 0xf0 and 0xf4 and the firmware could set them to zero if recognised (so your spin table still works).

@popcornmix
Contributor

32-bit armstubs have been updated in github. This is the firmware that I will be pushing later:
https://dl.dropboxusercontent.com/u/3669512/temp/firmware_armstub3.zip

The stub should contain:

.org 0xf0
.word 0x5afe570b      @ magic value to indicate firmware should overwrite atags and kernel
.word 0               @ version
atags:  .word 0x100   @ device tree address
kernel: .word 0x8000  @ kernel start address
@popcornmix popcornmix added a commit that referenced this issue Apr 8, 2016
@popcornmix popcornmix pwm_sdm: fix ring buffer UINT_MAX wraparound bug
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: IL ISP: Correct RGB to YUV matrices, and ignore code side info

firmware: MJPEG encode: Handle stereoscopic images
See: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=138325&p=918041

firmware: IL Camera: Change unspecified colour space to being JFIF
See: raspberrypi/userland#78

firmware: OV5647: Option to configure auto lens shading to use potential fix

firmware: arm_loader: Factor out DT support into arm_dt
See: raspberrypi/linux#1394

firmware: arm_ldconfig: Switch to using arm stubs generated from tools/mkimage
firmware: arm_ldconfig: Support loading arm stubs from file
See: #579
e968a4e
@popcornmix popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 8, 2016
@popcornmix popcornmix pwm_sdm: fix ring buffer UINT_MAX wraparound bug
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: IL ISP: Correct RGB to YUV matrices, and ignore code side info

firmware: MJPEG encode: Handle stereoscopic images
See: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=138325&p=918041

firmware: IL Camera: Change unspecified colour space to being JFIF
See: raspberrypi/userland#78

firmware: OV5647: Option to configure auto lens shading to use potential fix

firmware: arm_loader: Factor out DT support into arm_dt
See: raspberrypi/linux#1394

firmware: arm_ldconfig: Switch to using arm stubs generated from tools/mkimage
firmware: arm_ldconfig: Support loading arm stubs from file
See: raspberrypi/firmware#579
4bc6b67
@popcornmix
Contributor

Current firmware has been pushed and can be got through rpi-udpate.
The stubs match current github ones, but the 64-bit could do with updating to the new offsets.

@swarren
swarren commented Apr 9, 2016

The latest firmware.git content doesn't seem to be initializing the UART correctly on 64-bit Pi 3 (or at least something is going wrong); I see corrupted output. This is true even once I've updated the 64-bit stub for the new layout, copy that to the SD card, and reference it from config.txt using armstub=armstub8.bin.

With firmware_armstub.zip my test app worked fine. That's true with either the 64-bit stub built into the FW, or with config.txt armstub=armstub8.bin and the stub built from the latest firmware.git. It's also true whether or not I use mkknlimg on the app ("kernel") binary.

With firmware_armstub2.zip or firmware_armstub3.zip I see the same issue with the UART. I didn't test all the options here (e.g. external stub or not, mkknlimg use or not etc.) Sorry, I guess I forgot to pick up and test firmware_armstub2.git, and didn't get a chance to test firmware_armstub3.git.

@swarren
swarren commented Apr 9, 2016

BTW, I've pushed the updated armstub8.S to git://github.com/swarren/rpi-tools.git branch armstub8_magic_layout, but haven't issued a pull request since I haven't successfully tested it. My test app is at git://github.com/swarren/rpi-3-aarch64-demo.git branch master directory 64/ if you want to try it.

@capnm
capnm commented Apr 9, 2016

@popcornmix Does it still make a sense to check DTOK?
As far as I can see, the arm64 kernel wouldn't boot without the DT anyway.

@pelwell
Contributor
pelwell commented Apr 9, 2016

But Linux isn't the only "kernel" - is there no 64-bit OS or loader that wants ATAGs?

I would consider changng the defaults, though.

@SylvainGarrigues

I can speak for NetBSD and FreeBSD on RPI 0/1/2/3: neither use ATAGS.

  • FreeBSD is fully configured by the DTB u-boot provides its bootloader, or by the DTB pointed to by the r2 register if directly booting the kernel.
  • NetBSD use VC properties to get command line and configuration info.

I wouldn’t make ATAGS the default anymore.

Le 9 avr. 2016 à 18:47, Phil Elwell notifications@github.com a écrit :

But Linux isn't the only "kernel" - is there no 64-bit OS or loader that wants ATAGs?

I would consider changng the defaults, though.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub #579 (comment)

@SylvainGarrigues

If I may add: for non-Linux kernel, not having to rely on a 3rd-party tool (mkknlimg) to tag our DTB-aware kernel binaries to make them bootable looks a cleaner approach.

@swarren
swarren commented Apr 10, 2016

DT only (or at least by default) on AArch64 makes sense to me too.

@SylvainGarrigues For Linux, r2 is used on 32-bit ARM, but x0 is used on 64-bit ARM. I coded the Pi "ARM stub" to follow the same. I assume that works for FreeBSD too?

@zxombie
zxombie commented Apr 10, 2016

FreeBSD has it's own boot loader. It will most likely also use U-Boot (or UEFI if someone ports it) to boot.

@warped-rudi warped-rudi referenced this issue in OpenBricks/openbricks Apr 11, 2016
Closed

Raspberry pi 3 #57

@pelwell
Contributor
pelwell commented Apr 11, 2016

@swarren The Friday firmware release did have "issues" that may have been the cause of your problem. We've done a quick respin to fix it, but I'm still open to changing the defaults.

@swarren
swarren commented Apr 12, 2016

@popcornmix @pelwell The latest in firmware.git (e484afe "kernel: Bump to 4.4.6") seems to work well. Thanks. I validated:

  • Single SD card image with U-Boot boots on Pi B+, 2 B, 3 B (64-bit).
  • Built in 64-bit ARM stub works.
  • The new 64-bit stub I created that uses the new layout works.
  • Loading of separate armstub8.bin, with or without armstub= in config.txt.

I think the only remaining issues for 64-bit are:

  • Parse the kernel load address from the Linux image header (typically yielding 0x80000, whereas the default 0x8000 is good for U-Boot).
  • Use DTB by default on 64-bit.
@capnm
capnm commented Apr 12, 2016

I additionally validated:

  • The size limit on armstub8.bin is removed (I successfully executed a 64MB stub)

I think the default for the 'armstub8.bin only' case could be (not tested):

if the 0xf0: magic is not present, don't overwrite anything,
otherwise:

  • write the DT at the address found in
    • the device_tree_address=xy option (+ overwrite the atags: location)
    • at the address found in the atags: stub location
  • switch to ATAGS if available:
    • already documented /boot/overlays/README
      device_tree= option
    • invent a new force_atags: location in the stub?
@pelwell
Contributor
pelwell commented Apr 12, 2016

@swarren Thanks for the feedback. Tell me about this kernel header? Is it readable before the kernel is decompressed? (One of the main reasons for the existence of the kernel trailer is the inability to see inside a compressed kernel).

@capnm
capnm commented Apr 12, 2016

@pelwell the header is described here
https://www.kernel.org/doc/Documentation/arm64/booting.txt

The arm64 kernel supports only uncompressed or gzip compressed image that must
be decompressed by the firmware or the bootloader - before it's loaded into memory).

PS: the commit e484afe broke loading of DT overlays, reverting to a32be14
fixes the issue.

overlays/pi3-miniuart-bt.dtbo
001619.848: Failed to load overlay 'pi3-miniuart-bt'
@pelwell
Contributor
pelwell commented Apr 12, 2016

I'd like more clarity on the information flow between the kernel, stub and firmware. Initially it was simple:

config.txt -> firmware -> stub -> kernel

We then added a back channel from the kernel trailer, but it can be thought of as a separate file and conveniently doesn't affect the placement of the kernel:

config.txt + trailer flags -> firmware -> stub -> kernel

Now the kernel name is another input. The stub is being read, but only to determine where and whether to patch the device tree and kernel addresses.

config.txt + trailer flags + kernel name -> firmware <-> stub -> kernel

@swarren suggests the load address of the 64-bit kernel can be read from the kernel header, which will only work for uncompressed kernels. This is awkward because it would require the kernel to be read in two pieces or potentially moved, but that isn't enough reason in itself to discount it. An easier, more portable option would be to read the load address from the stub.

@capnm suggests the device tree address may be read from the stub. This in itself wouldn't be sufficient; the firmware uses device_tree_address and device_tree end - the end is needed because DTB can grow as a result of overlays being applied, and libfdt needs to know how large it can grow. We could include the end address in the stub as another firmware input - the kernel doesn't require it since the DTB includes a length - which could be overwritten later so as not to waste space. We'd probably want the version number to indicate its presence. In the event that the DTB was too large for the slot, should the firmware load it high instead?

For stub fields that can work in either direction (kernel_address, device_tree_address), the presence of a zero can indicate an input and non-zero an output.

Any thoughts?

@SylvainGarrigues

I am not in favor of reading a "kernel" loading address from the "kernel" image as "kernel" can be anything, not necessarily an ELF, and not necessarily Linux code.

I believe we should make as few assumptions as possible as to the content / format of the "kernel" binary image.

@capnm
capnm commented Apr 12, 2016

The arm64 linux kernel header includes a magic number. For an image without that, you can default to something else.

The idea behind that all is: you could put the 'make Image' linux kernel as kernel8 in the /boot folder and it would work without any additional configuration, boot-loader or a special stub ...

AFAIK the decompression cannot be done by the linux kernel itself any more, the gzip compression is optional and doesn't need to be supported. The RPi3 booted into my 64MB test stub in a few ms. The firmware could assume a compressed image as a non-linux image.

stub fields that can work in either direction

I like that idea. Currently there is no way for the stub-only based image to get the DTB.

@pelwell
Contributor
pelwell commented Apr 12, 2016

I'll be surprised if the 64-bit Linux entry point ever changes from 0x80000, but for the cost of some implementation effort I don't mind supporting the possibility that it might, as long as everybody is aware that this will only work with an uncompressed kernel.

I like that idea. Currently there is no way for the stub-only based image to get the DTB.

Can you explain this? In the latest firmware a compatible stub will be patched with the DTB address - is that not sufficient?

@popcornmix
Contributor

As far as I remember a compressed kernel boots significantly faster than an uncompressed kernel, so I question the value in supporting features for the uncompressed kernel that I doubt any popular distribution would make use of.

It also introduces an unexpected behaviour when switching from compressed to uncompressed kernel. It sounds like assuming 0x80000 as the default kernel address for 64-bit may be a more useful behaviour, which would still allow a kernel8.img to "just work" in most cases.

@TheSin-
TheSin- commented Apr 12, 2016

I have a quick question about the new firmware code to determine 64bit. If I've read everything right I assume that i need a file named kernel8.img (which I don't use this name in my dist) OR I need to set the stub, otherwise it has no idea I want 64bit right? If this is true, could I request having a config.txt param to turn on 64bit in stead of using the stub or filename? I use kernel=[customname] and I'd like to stay bout from needed a stub file and just use the firmware.

OR is the stub file not a real local file, just a switch to tell it which stub in the firmware to use, in which case that is a switch? like being able to do bits=64 in the config.txt would be more useful IMHO.

@pelwell
Contributor
pelwell commented Apr 12, 2016

Given that config.txt settings should ideally be left for users to edit, I can see the value in reading kernel_address and device_tree_address (and _end) from the active stub. Therefore I think version 1 of the stub standard should add device_tree_end and set any fields that aren't instructions to the firmware to zero, i.e. don't provide default values since the firmware will either ignore and overwrite them or read them and use them.

@popcornmix
Contributor

@TheSin-
You can name your kernel kernel8.img and 64-bit mode will be enabled automatically using a default 64-bit stub (recommended)
You can provide a armstub8.bin file and 64-bit mode will be enabled automatically
You can add arm_control=0x200 to enable 64-bit mode manually, and kernel.img (or kernel7.img) will be loaded. (We may add a friendlier name for this such as arm_64bit=1)

@TheSin-
TheSin- commented Apr 12, 2016

@popcornmix okay perfect so I did understand it, I just missed that arm_control switch. That is perfect I'm just about done my 64bit kernel so I'll give it a shot, thanks.

@capnm
capnm commented Apr 12, 2016

Can you explain this? In the latest firmware a compatible stub will be patched with the DTB address - is that not sufficient?

Nice, I don't know how I missed that.
If I write the stub code, I already need to know where the DTB
will be loaded to make a place for it. I think it would be more useful
if i could let a hint for the VC where to load it, e.g. behind the stub.

I really wish my jtag device supported the A53. A many years ago
I worked on a SoC with a radio core. There was an incredibly
useful option that told the radio to take the uart over and allowed
a simple (vcdbg-like) memory and arm core access.
How about to make something like that for the VC? :-)

sounds like assuming 0x80000 as the default kernel address for 64-bit may be a more useful behaviour
I'll be surprised if the 64-bit Linux entry point ever changes from 0x80000

I agree with everything you said. I believe it's in place for some future plans.
Currently it is possible to enable a address randomization (no idea for what's
that supposed to be good):

 # The byte offset of the kernel image in RAM from the start of RAM.
 ifeq ($(CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET), y)
 TEXT_OFFSET := $(shell awk 'BEGIN {srand(); printf "0x%03x000\n", int(512 * rand())}')
 else
 TEXT_OFFSET := .
 endif

If supporting the arm64 uncompressed image + header isn't complicated,
I would do it, otherwise just load it at 0x00080000 as a default.

@popcornmix popcornmix added a commit that referenced this issue Apr 12, 2016
@popcornmix popcornmix kernel: Bump to 4.4.7
firmware: config: Add arm_64bit setting
firmware: arm_ldconfig: Set kernel_address for 64-bit boot
See: #579
efdcf16
@popcornmix popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 12, 2016
@popcornmix popcornmix kernel: Bump to 4.4.7
firmware: config: Add arm_64bit setting
firmware: arm_ldconfig: Set kernel_address for 64-bit boot
See: raspberrypi/firmware#579
1e84c28
@popcornmix
Contributor

Latest update supports config.txt option arm_64bit=1 as a cleaner alternative to arm_control=0x200
It also defaults the kernel_address to 0x80000 when in 64-bit arm mode (0x8000 in 32-bit mode).

@swarren
swarren commented Apr 12, 2016

Let's slow down a little here; I haven't had a chance to read through all these points yet, but I think everyone affected by load addresses should get a chance to weigh in before making any decisions. Hopefully I'll get a chance to respond to things tonight/tomorrow night. Switching the default address is going to break current users, such as U-Boot. The 0x80000 needn't be hard-coded since the value must be parsed from the kernel headers, so no change is necessary. More later.

@swarren
swarren commented Apr 13, 2016

This tries to answer many posts from the last day or so. Hopefully I didn't miss anything.

The AArch64 kernel does not support any compression itself. To boot it, an uncompressed copy must be placed into RAM at a specific location and jumped to. This doesn't preclude compressing the image on-disk, it's just that any compression must be undone entirely outside the kernel and before it's entered.

The Linux kernel Image contains a header that indicates the required load address (well, it's actually an offset from any 2MB aligned address near the start of system memory). While that offset is currently 0x80000, the only standards-compliant/future-proof method of loading the Linux kernel is to parse that value out of the header and honor it. Anything else can break, e.g. if the user enables CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET; an option that is already present in the kernel code. Since the kernel Image is "always" uncompressed, the header can "always" be read easily.

It seem perfectly reasonable for me for the VC FW to honor that requirement of the Linux kernel image; the kernel Image has a magic number that identifies it. If that magic is present, the image load address can be determined from the Linux header. If the magic is not present, some other mechanism can be used, such as a hard-coded default, a config.txt option, or identifying other image formats. I would suggest the following algorithm:

load_addr = 0x8000 # backwards-compatible default
if config.txt contains a kernel address value:
    load_addr = load address from config.txt
if kernel header present:
    load_addr = value from kernel Image 

I could see an argument to swap those two if statements so that config.txt can override any Image parsing. That might be useful for testing or if there are false positives in the VC FW's identification of the kernel image as a Linux kernel image.

I suggest the default load address be 0x8000 to be backwards compatible. If not, I'll have to go and modify U-Boot and my rpi-3-aarch64-demo repo.

I don't see how reading the kernel load address from the stub would work. Each kernel image may have a different load address requirement, whereas the stub typically would be static and not regenerated with data from the kernel Image.

Assuming that a typical location for the DTB is top-of-ARM-RAM, I don't think reading the DTB address from the stub makes sense. top-of-ARM-RAM is a dynamic value based on GPU memory size configuration, whereas the stub is a static binary. Having VC FW calculate the location and write it into the stub, as is currently implemented, seems to make most sense. I believe non-default requirements can be handled by config.txt. Of course, this does mean the VC FW needs to be aware of any restrictions the OS may place on DTB placement, e.g. size of "lowmem" for Linux. Still this is true no matter what; either the VC FW must know this, or the value encoded into the static default ARM stub in the VC FW must be chosen to be compatible with any such restrictions. config.txt feels like a better way to address unusual configurations; I don't think custom stubs should be required to change simple values like this, only if radically different boot behaviour is required.

I don't follow why device_tree_end matters outside the VC FW. Isn't top-of-RAM a reasonable value in all cases? Perhaps someone wants to specify that value in config.txt, but again that should again only affect VC FW. Once the DT is loaded, the stub and the kernel should only need to know where the start is.

I think the DTB/ATAGS/kernel address should always be written into the stub by the VC FW and never read from the stub. config.txt can be used when the user wants to configure something unusual. Otherwise, we have both config.txt and stub content affecting what the VC FW does. Having just config.txt affect the VC FW seems simpler and less likely to cause problems/conflicts.

@swarren
swarren commented Apr 13, 2016

A thought on ATAGS-vs-DT booting. I think it can be made automatic without requiring any kind of header/trailer on the kernel image for 32-bit:

if the DTB file exists on disk:
load the DTB
set flags so the DTB is passed to the kernel
else:
set flags so ATAGs are passed to the kernel

I assume that on the Pi 3 in 32-bit mode, even if booting in ATAGs mode, the default UART would still be the mini UART, and it would still be configured (pinmuxed, baud divider set).

However, that's probably a topic for another bug if anyone wants to pursue it.

@pelwell
Contributor
pelwell commented Apr 13, 2016

@swarren Thanks for your thorough input. I think I can address some of your points:

I think config.txt should always be able to override any value in the kernel or stub because it is the easiest for a human to see and edit. I am suggesting reading some values from the stub under some circumstances because it would allow distributions, including more consumer-oriented applications such as media players and emulators, to be able to override the defaults without having to edit config.txt. The standard RPi update mechanisms - apt-get update/dist-upgrade and rpi-update - don't touch config.txt because it is considered too risky. Overwriting an entire stub file is much safer.

If we decide that zeroed fields are ignored then the standard stubs - the ones built-in and the ones used by Linux - wouldn't have to override the kernel values, but the order of precedence should be config.txt > stub > kernel, where the stub may choose not to express a preference.

device_tree_end is definitely only of interest to the firmware (I believe that's what I said), but its value should be chosen to match the kernel, which is why it could be read from the stub for the reasons I've stated above.

@capnm
capnm commented Apr 13, 2016

@swarren (Sorry for possible misunderstandings, it should be obvious,
that English isn't my native language;)

I propose that there is no compatibility issue for all arm64 related parts,
until this bug is closed. Then I think, it's ok to accept changes in our code.

I assume that anybody involved had already read the arm64 linux
kernel document. Yes, the support of the linux kernel header is marked
MANDATORY. You are right, it would be future-proof,
but assuming from the @popcornmix and @pelwell comments,
the VC had to perform additional copy gymnastics, actually without any reason.
Also enabling the randomization option, doesn't seem to have any other
purpose, than to test the feature itself.

I think defaulting to the hard-coded value is acceptable.
The support can be added later, the day it becomes something useful.
I don't see any compatibility issues, just set now in stone 0x0008_0000
as a pragmatical default for an image running in arm64 mode.

Maybe is the decompressing + copying actually slower than
looking at few header bytes + loading an uncompressed image,
also the problem evaporates completely. I don't know and I don't
have to implement that myself ;-) Again, I like the linux header
feature, if it's doable without essential drawbacks, I am entirely
behind you.

Now imagine you will use the stub-only, aka bar-metal feature
envisioned by @popcornmix . Your bare-metal stub were
very vulnerable and would break every time the fw decides
to write the DTB to some unexpected location.

That's why I like the @pelwell "zeroed field" idea. It should be invisible for
the standard (u-boot) image, the fw replaces the zeros with
the correct values and the stub author can still directly tell the fw
where he left a free memory location for the DTB.

@Ferroin
Ferroin commented Apr 13, 2016

CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET improves security, for the exact same reasons that usage of position independent executables with random load addresses does. Because you can't know where the kernel is in memory without extra work, you can't easily exploit it by injecting arbitrary code or return addresses. If you aren't building your own kernel and setting a couple of specific sysctl values correctly, it provides almost zero benefit, but for those who do build their own kernel and set said values, it makes it such that an attacker would need to brute force the address space to find anything, thus making it a lot less likely that they will successfully compromise the system before the attack is noticed.

It would be somewhat sad if this wasn't supported. (Yes, I know the Pi is first and foremost an educational tool, but part of learning about computers is learning how to keep them secure).

@capnm
capnm commented Apr 13, 2016

@Ferroin Good point. That's a compile time option and doesn't work (I mean is useless) for the distributed images.
I believe you could still set up that scenario with config.txt or your own stub manually.

Well, I told the kids to disable sudo without password and then immediately hear crying and yammering because the scratch had ceased to work ...

@Ferroin
Ferroin commented Apr 13, 2016

While it could be supported like that, that doesn't mean that's a good option for people who actually want secure systems. While it won't be enabled by default in the distribution kernel (it can be useful there, just not as much as in a custom kernel), people shouldn't need custom firmware just to use a basic security feature.

@capnm
capnm commented Apr 13, 2016

@pelwell

swarren suggests the load address of the 64-bit kernel can be read from the kernel header, which will only work for uncompressed kernels. This is awkward because it would require the kernel to be read in two pieces or potentially moved, but that isn't enough reason in itself to discount it. An easier, more portable option would be to read the load address from the stub.

Just shooting into the darkness, perhaps this could work:

  1. assume the kernel location will be 0x0008_0000

  2. load, resp. uncompress the image there.

  3. check the linux header magic && location, if someone wants a non-standard
    load address, e.g. enables the linux obscurity config, he has to wait (and
    have a kernel that doesn’t break with that;), piecewise relocate the image.

  4. write the right values into the stub, (close the eyes) run the arm core.

@popcornmix
Contributor
  1. load, resp. uncompress the image there.

That is one obstacle. No gunzip support in firmware.
The standard gzip code is GPL which is unsuitable for firmware.
I've found reference to public domain gzip code by Mark Adler (but not actually found the code yet).
However we are reluctant to include a large block of code in the firmware which is of no use to most users. Also the GPU is slower than the ARM (for scalar code) so wouldn't be the best place to handle the unzipping.

@pelwell
Contributor
pelwell commented Apr 14, 2016

@capnm That isn't a solution, that's describing the problem.

@capnm
capnm commented Apr 14, 2016

If you don't fear a short trip into a robust programming language
you could easily translate this (BSD)
https://golang.org/src/compress/gzip/gunzip.go
to C. I think the issue isn't a trivial and deserves its own ticket.
Let's move on?

I think it's brave to honour the effort invested
to writing the arm64 linux kernel document and support it.
I'll try to remember what we could do so far: (in order of precedence)

  1. relevant options in config.txt
    arm_64bit= defaults to 0
    kernel= arm64 defaults to kernel8.img (sorry, I don't care, hence don't remember the rest)
    kernel_address= defaults to 0x80000 when in 64-bit arm mode (0x8000 in 32-bit mode)
    device_tree_address= defaults to 0x100
    device_tree= disables DTB /enables ATAGS?

  2. kernel image
    the OPTIONAL gzip compression is currently unsupported, see bug:xy
    the firmware additionally supports the xy compression standard
    the kernel header magic marks a arm64 linux compatible header and valid kernel_address

  3. armstub8.bin
    .org 0xf0
    // magic value to indicate firmware should overwrite or read DTB/atags and kernel
    // in that case the firmware clears the magic and the version word
    .word 0x5afe570b
    version
    .word 0
    atags: .word 0x0 // default 0x100, device tree address
    kernel: .word 0x0 // default 0x00080000, kernel start address

@swarren what do you think? Help. Did I miss something important?
@pelwell I belive, I understood the GPL and GPU/ARM problems about @popcornmix wrote.
Sorry, could you elaborate more, what exactly is the problem? Are you not able to relocate
or reload the image?

@swarren
swarren commented Apr 14, 2016

@pelwell wrote:

I think config.txt should always be able to override any value in the kernel
or stub because it is the easiest for a human to see and edit

Sure, sounds fine.

I am suggesting reading some values from the stub under some circumstances
because it would allow distributions, including more consumer-oriented
applications such as media players and emulators, to be able to override
the defaults without having to edit config.txt.

Well, that is true, but ...

The standard RPi update mechanisms - apt-get update/dist-upgrade and
rpi-update - don't touch config.txt because it is considered too risky.

I guess editing the config file might be risky. What if the VC FW adopted a "config.d" style set of directories instead. That is: the FW first parses each of config.d/*.cfg in the same way as config.txt, then finally parses config.txt, with each parsing operation over-writing any of the previously parsed values. Each person/distro/package/... could create config.d/${some-unique-name}, thus allowing that file to be replaced entirely rather than edited. Nothing could be simpler from the perspective of adding/removing those files. Of course, the work in the VC FW is probably a fair bit more.

Overwriting an entire stub file is much safer.

That's true, but I honestly don't think forcing people to obtain, understand, modify, compile, manage updates to, and package a Pi-specific boot binary file is going to be greeted particularly well by those having to do it. There's a huge push in the ARM community in general to move away from device-specific boot special cases like this. A text file is unique to the Pi so a bit annoying, but at least can be created/edited using practically any tool and very little knowledge. Having to edit and compile the ARM stub just to change one or two 32-bit constants feels like massive overkill to me.

device_tree_end is definitely only of interest to the firmware (I believe
that's what I said), but its value should be chosen to match the kernel,
which is why it could be read from the stub for the reasons I've stated above.

That implies that the stub is specific to a particular OS or OS build. I'd rather not have that implied, to the extent possible. The stub currently does what Linux needs for Linux to boot, and so is related to Linux. However, the setup it performs is perfectly valid for other code such as U-Boot and could work just fine for others too. I don't know anything about BSD, but I don't see why it wouldn't work for BSD too, provided either config.txt contained a kernel load address override or the VC FW understood how to parse the BSD image format for load address too.

@capnm wrote:

I propose that there is no compatibility issue for all arm64 related parts,
until this bug is closed. Then I think, it's ok to accept changes in our code.

I guess that's true in general, but let's not make changes without good justfication, and before people can weigh in on any negative side-effects. U-Boot mainline already has support for the Pi 3 in 64-bit mode, and now it doesn't boot:-( @anholt already got caught out by this.

Re: your other comments about load address:

Interpreting the kernel Image's text_offset is mandatory. It's useful as has been pointed out. "We" should just do it somehow. The kernel image is not compressed by default so there's no complication reading the text_offset value from the header. I like your algorithm, "load to default address, parse header, memmove() the data if text_offset doesn't happen to match the default". Not 100% as optimal as simply always loading to text_offset directly, but simple and should be quick. However, if the VC FW can simply read the first sector/block of the image, derive the kernel load address from it, and then load the whole thing there, that'd be even better:-)

I suppose it's possible for the 64-bit ARM stub to do the 0x80000 -> text_offset memmove(), espeically now ARM stubs can be >256 bytes now. That would simplify the VC FW.

I suppose gzip decompression code could also be put into the stub, where GPL code likely isn't such a problem. We could add this feature later, since 64-bit Images aren't compressed by default.

@pelwell, I still don't like putting dtb_end into the stub. IIUC the current value of dtb_end is that it's at/near the end of ARM RAM, and DTB "grows down" from there. If so, the legal set of values can't be predicted when the stub is created; gpu_mem= can limit the maximum legal value. Attempting to put any high-valued dtb_end in the stub feels fragile.

If we assume that we want to put the DTB lowish in RAM rather than at the end (just not at 0x100 since there's no space there), this objection goes away; we can be pretty sure that e.g. 128MB into RAM is always available to the ARM. I'd argue that the stub should specify dtb_start not dtb_end in this case though. Given the default stub should support Linux by default I think putting a default override of e.g. 128M into the shipped/embedded stub would make sense, or should we just rely on the VC FW to choose a fixed lowish address that leaves plenty of room for the kernel (perhaps with decompression required)?

@swarren
swarren commented Apr 14, 2016

With the latest firmware (efdcf16 "kernel: Bump to 4.4.7"), with no armstub8.bin present, with kernel8.img present, I still need to specify arm_64bit=1 (or arm_control=0x200) or the system doesn't boot (into U-Boot at least).

The previous commit works fine.

Note: I tested the latest commit with U-Boot linked to 0x80000 (4 zeros) and the previous commit linked to 0x8000 (3 zeros), which I believe matches what the FW does.

@TheSin-
TheSin- commented Apr 14, 2016

I guess editing the config file might be risky. What if the VC FW adopted a "config.d" style set of directories instead. That is: the FW first parses each of config.d/*.cfg in the same way as config.txt, then finally parses config.txt, with each parsing operation over-writing any of the previously parsed values. Each person/distro/package/... could create config.d/${some-unique-name}, thus allowing that file to be replaced entirely rather than edited. Nothing could be simpler from the perspective of adding/removing those files. Of course, the work in the VC FW is probably a fair bit more.

In my dist, my boot loader package does this already, I have /etc/rpi/config-available and /etc/rpi/cmdline-available, and I have 2 helper scripts rpi[en|dis][conf|cmd] and update-rpiboot. the first one lines things into config-enabled and cmdline-enabled and the second reading the -enabled dirs to build and make config.txt and cmdline.txt

That way other packages can add to available or enable certain things, like a specific overlay, or options etc etc. It's very useful, though I likely keep mine the same and not use it on the fat partition but it's not a terrible idea so long as it doesn't add too much overhead in the fw

@capnm
capnm commented Apr 14, 2016

@TheSin- please no. We already have defaults, one config file and magicians.
That's already enough to drive myself crazy.

@swarren I can at least validate, that my armstub8.bin
works (partially) as expected. Without the config.txt and kernel8.img :

without stub magic: no memory change, the arm core is in the aarch64 reset state

with stub magic: kernel: = 0x0008_0000, atags: = 0x100 , the arm core is in the aarch64 reset state
BUG: there appears to be no change at 0x100, I still believe it is expected to load the DTB here.
I have to expect at 0x100 this header magic, right?
https://wiki.freebsd.org/FlattenedDeviceTree

Flattened device tree header (0x162f92c):
 magic                   = 0xd00dfeed
 size                    = 7858
 off_dt_struct           = 0x00000038
 off_dt_strings          = 0x000018ac
 off_mem_rsvmap          = 0x00000028
 version                 = 17
 last compatible version = 16
 boot_cpuid              = 0
 size_dt_strings         = 518
 size_dt_struct          = 6260

the <=kernel7 works with the new boot/*.btbo

@capnm
capnm commented Apr 14, 2016

In light of the fxl6408 disaster, I created the following dead-simple
(7 colour) test circuit:

//       .---------.
//      /           \
//      |  L  E  D  |
//      |           |
//      .-----------.
//      /   |    \  \
//    B     G     R   \ common anode (+)
//   /      |      |    \
//   -      -      -     |
//  |R|    |R|    |R|    |  220-300 Ohm (max 50mA/3)
//   -      _      _     |  resistor
//   |      |      |     |
//   |      |      |     |
//   |      |      |     |
// GPIO4  GPIO3  GPIO2  3.3V -+
//   U      U      U     U    | <-- Pin 1
//   o      o      o     o    |
// ---------------------------+

I'm aiming to make a set of /boot repositories for testing the compliance
for some sane combinations of otions, kernel images and boot stubs.
What do you think, is it worth the effort?

@Ferroin
Ferroin commented Apr 14, 2016

Regarding the decompression, other than the GZ header, it's just DEFLATE compression. The file format is documented openly here https://tools.ietf.org/html/rfc1952 and should be pretty simple to write a parser for without having to worry about licensing. For the decompression you just need a decompressor, not a full implementation, ,and there are a lot more options for that. I don't know if zlib's license is compatible, but there are all kinds of other options as well. The Wikipedia page on DEFLATE also mentions this http://code.google.com/p/miniz/ which might work (not 100% certain about the license, but I think it's public domain).

@capnm
capnm commented Apr 14, 2016

@Ferroin https://golang.org/src/compress/gzip/gunzip.go
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Package gzip implements reading and writing of gzip format compressed files,
// as specified in RFC 1952.

@Ferroin
Ferroin commented Apr 14, 2016

I'm not saying that that wouldn't work, the point was that if zlib would work (or even that miniz thing), it's probably a easier to integrate code that's already written in C than it is something written in Go.

@capnm
capnm commented Apr 14, 2016

I doubt it, take the Go Tour, start coding ;) The main problem appears to me,
to agree how to stuff and call that code in the arm stub. That's why I proposed
to take the conversation to an another bug.

@TheSin-
TheSin- commented Apr 14, 2016

@TheSin- please no. We already have defaults, one config file and magicians.
That's already enough to drive myself crazy.

Naw, it's amazing how much help it is for packaging, and keeping the dist up to date with new options and such. For example my arch64 kernel can enable 64bit mode on config.txt by adding rpienconf 64bit in the postinst maintainer script. And my custom ui software can enable it's own config parts to hide console booting etc etc, and as I change things and there are 10s of thousands in the field, their config.txt and cmdline.txt will update and stay in sync.

I agree with whom ever said in this thread trying to deal with config.txt directly by parsing and changing things is not a good idea. by doing it this way a user can have a custom part that will never get messed with and always included. Though I admit since this is in my boot loader package it really has nothing to do with this topic, I was just commenting on how nice it is to have a .d style directory for configs to be able to control it via packaging.

@capnm
capnm commented Apr 14, 2016

You can't reduce the complexity by adding more complexity. Think about it.
The scripts should actually use the device tree monster.

@TheSin-
TheSin- commented Apr 14, 2016

again I disagree, but lets just agree to disagree.

That being said I do have one question. And I'm a package maintainer thus I do not pretend to understand the code or logic that the FW vs stub vs kernel have. But on x86_64 machines I can boot with a 32bit kernel or 64bit kernel and I don't need to change anything, nor do I need any special stub or kernel name. Just curious why that isn't possible here? Again this is just a question, just seems odd that we need a switch at all to me, it's not a huge deal I'm just more curious then anything.

@Ferroin
Ferroin commented Apr 14, 2016

On 64-bit x86, the kernel still has a 32-bit entry point, and switches to 64-bit mode itself (because of all the legacy crap on x86). On 64-bit ARM, that isn't the case, the kernel just has a 64-bit entry point, so the boot-loader has to switch to 64-bit mode.

@TheSin-
TheSin- commented Apr 14, 2016

ahh, okay thanks, what are the benefits from moving from that type of setup? Extra boot time? extra code? Just seems there is lots of work here to try and figure out things that x86 has been doing for years without and config feels or custom stubs etc etc

@Ferroin
Ferroin commented Apr 14, 2016

The choice is largely with the architecture designers. On x86, even the 64-bit processors start in 16-bit real mode. This was a design choice made to provide backwards comparability with existing 32-bit boot-loaders, which already had code to switch from real to protected mode prior to booting the OS (which in turn was because 32-bit x86 processors started in 16-bit mode so that you could still boot DOS directly on them). ARM however has none of that legacy stuff to deal with, and in fact, 64-bit ARM processors don't even have to support running 32-bit code (it's unlikely that such a processor will exist any time in the near future, but that's beside the point). As far as I understand it (and I might be wrong), the difference in the ARM boot stubs is to handle the different memory and addressing requirements between AArch32 mode and AArch64 mode (and then the differences between ARMv7 and ARMv6 on the 32-bit side).

@capnm
capnm commented Apr 15, 2016

I've started to write a documentation, everyone is invited to participate.
Raspberry Pi 3 ARM64 Support

@swarren
swarren commented Apr 15, 2016

@pelwell, is the switch to 0x80000 as the default 64-bit kernel load address permanent and not likely to change again? If so, I'll send a patch to fix U-Boot to use that address. If not, I'll hold off until things are stable.

BUG: there appears to be no change at 0x100, I still believe it is expected
to load the DTB here.

@capnm, I believe the VC FW currently loads the DTB somewhere high in memory, not at 0x100. So, it's not surprising you don't see anything at 0x100. The reason is that DTBs don't always fit between 0x100 and 0x8000, so the FW moved where it put the DTB so it would always fit.

A firmware tester sounds interesting. I'm very slowly setting up a new server upon which I'll run Jenkins to automatically test U-Boot. I did idly wonder whether I should have it test firmware.git/boot/ too. Still, even if I did do that, I'd likely test only the single boot configuration I typically use, and it'll be quite some time before I get it going even if I do. You should go for it!

Regarding the decompression, other than the GZ header, it's just DEFLATE
compression

@Ferroin and everyone: So the kernel itself doesn't contain any compression support. As such, there's no particular requirement to choose one compression algorithm over another. These days, gzip is slow and innefficient in comparison to other algorithms. Put another way, other algorithms generate smaller compressed images and can be decompressed faster. I would suggest picking one of those instead. lz4 might be a good choice.

You can't reduce the complexity by adding more complexity. Think about it.
The scripts should actually use the device tree monster.

@capnm I think increasing the complexity of some components (especially those written once like VC FW) can reduce the complexity of other components (e.g. every one distro that wants to manipulate config.txt, of which there could be many), and hence reduce overall complexity, or at least effort.

DT isn't an alternative to config.txt. DT is intended to represent purely the HW that is present. config.txt addresses SW policy/configuration.

@TheSin the primary difference between x86_64 and the Pi is that by default there's no ARM-based bootloader on the Pi to set things up before the kernel enters. A lot of what the stub is doing, and indeed some/all of what the VC FW is doing, is usually implemented by a BIOS/bootloader resident on the main CPU in other ARM or x86 systems.

@capnm
capnm commented Apr 15, 2016 edited

the switch to 0x80000 as the default 64-bit kernel permanent

Looks good to me.

I believe the VC FW currently loads the DTB somewhere high in memory, not at 0x100. So, it's not surprising you don't see anything at 0x100. The reason is that DTBs don't always fit between 0x100 and 0x8000, so the FW moved where it put the DTB so it would always fit.

I checked that the firmware writes the 0x100 value in the atags: field and there is no DTB magic at 0x100. Also one of that must be broken. @popcornmix @pelwell could you please take a look? (Just to be clear: I deleted config.txt, kernel8* and have the good blob armstub8.bin
and all the efdcf16 DTB files in the /boot folder)

I think increasing the complexity of some components ...

Agree, if it's done in the right place. Wasn't it great as the X11 config file(s) vanished into thin air? ;-)

So the kernel itself doesn't contain any compression support.

The upstream arm64 kernel make-files actually support only the gzip compressing. Good to hear that nobody is against using an another algorithm. As @popcornmix said, the VC is probably too slow for this task and for ARM you would need a sort of mini u-boot to setup the cache, etc.
Is somewhere a signal that triggers during the reboot-reset, usable for speed measurements?

@capnm
capnm commented Apr 15, 2016 edited

For stub fields that can work in either direction (kernel_address, device_tree_address), the presence of a zero can indicate an input and non-zero an output.

Are there any arguments against implementing that?

I belive this should work:

  1. currently for the stub author, writing any value in the atags: and kernel: stub field has no effect on firmware.

  2. in the case without an external stub, nothing changes, the firmware decides where to load DTB and kernel and passes those values to the internal stub fields.

  3. if the stub author lets the atags: and kernel: fields zero, the same as in 2) happens, except the stub author can additionally expect that the firmware doesn't writes the DTB inside the stub code because that would be for the stub author unpredictable and dangerous.

  4. if the stub author writes a value to the atags: field and lets the kernel: field zero, an available kernel image will be not loaded by the firmware, because the kernel position would be for the stub author unpredictable and dangerous.

  5. if the stub author writes a non-zero values in both atags: and kernel: fields, the firmware can assume that he knows what he is doing and follows the order.

I am not sure (and personally don't care) if the values coming from the stub fields should take a preference for non-default values (config.txt, kernel header, ...) or not.

@swarren Would that be fine for the u-boot?

@pelwell
Contributor
pelwell commented Apr 15, 2016 edited

Let me respond to a number of points that have accumulated:

  1. The failure to select 64-bit mode for a kernel8.img with no armstub8.bin is a bug and will be fixed in the next release.

  2. The firmware doesn't yet read device_tree_address from the stub, but that will change in the next release. See also 4.

  3. As of the next release, all kernels will be assumed to be DT-capable unless there is a trailer that says otherwise or the config.txt includes device_tree=.

  4. I'm withdrawing the request for a device_tree_end field in the stub. Instead, I'm going to make use of the bottom 8 bits of device_tree_address (as read from the stub) to derive the start and (exclusive) end addresses as follows, where all addresses are 256-byte aligned and high_address is chosen to be low-mem safe:

0x00000000: (high_address - dt_length) .. high_address  (populated by the firmware)
0x00000100: 0x00000100 .. 0x00004000 (for backwards compatibility)
0xuuuuuv00: 0xuuuuuv00 .. high_address
0xuuuuuvww: 0xuuuuuv00 .. (0xuuuuu000 + 0xww000)

Some examples:

0x00000108 -> 0x00000100 .. 0x00008000
0x00000180 -> 0x00000100 .. 0x00080000
0x10000080 -> 0x10000000 .. 0x10080000

The scheme is chosen for simplicity (as opposed to maximum flexibility) while still allowing all memory from 0x100 to any sensible upper bound to be described, and supports a maximum DTB size of just under 1MB (which should be sufficient).

  1. 32-bit kernels are loaded to 0x8000 by default, and 64-bit kernels to 0x80000. The default can be overridden by the stub, or by setting kernel_address=....

  2. 64-bit Linux kernels must be uncompressed initially - there is a lot of useful development work that can be done with an uncompressed kernel while we evaluate the feasibility and merits of firmware- and stub-based decompression.

@capnm
capnm commented Apr 15, 2016

@pelwell
LGTM, thanks.

@pelwell
Contributor
pelwell commented Apr 15, 2016
  1. Even non-zero fields in the stub may be modified by the firmware, either because the values have been overridden by config.txt settings or, in the case of the device tree address at 0xf8, because the lower order bits have been zeroed to form the real DT load address.
@swarren
swarren commented Apr 16, 2016

Is somewhere a signal that triggers during the reboot-reset, usable for speed measure

The ARM architected timers should work for that.

Re: #579 (comment)

@capnm I'm not sure that item 4 describes what the FW does; I believe it loads a kernel if one is present on disk, otherwise not. I don't think the FW needs the feature of skipping kernel load if kernel address is zero in the stub; just delete the kernel image instead.

@pelwell note that the Linux kernel at least allows up to a 2MB DTB, whereas your device_tree encoding allows only 1MB. That is probably fine. If you wante,d perhaps "w" could take up 1 more nibble in the encoded value, giving 2048-byte alignment and up to 16MB alignment? It's probably not a big deal either way. We can always change that later with a version bump if we find people using very large DTBs.

Overall, the points in your comment LGTM.

Once a FW is published the implements the points in @pelwell's last two posts, I'll close this bug. We can always open separate bugs focused on any specific remaining issues, such as long-term investigation of compressed kernels and parsing text_offset from the kernel header.

@swarren
swarren commented Apr 16, 2016

@pelwell armstub8.S currently contains:

.globl kernel_entry32
kernel_entry32:
        .word 0x8000

Do you think it's best if that's switched to 0x80000 to match the FW default, or to 0 so it's not explicitly requesting anything?

@capnm
capnm commented Apr 16, 2016 edited
  1. Even non-zero fields in the stub may be modified by the firmware, either because the values have been overridden by config.txt settings or, in the case of the device tree address at 0xf8, because the lower order bits have been zeroed to form the real DT load address.

LGTM

Do you think it's best if that's switched to 0x80000 to match the FW default

maybe just 0x0 (at some point the firmware may support linux kernel header address)
for both LGTM

I'm not sure that item 4 describes what the FW does; I believe it loads a kernel if one is present on disk, otherwise not. I don't think the FW needs the feature of skipping kernel load if kernel address is zero in the stub; just delete the kernel image instead.

Good point. If the kernel image is a part of distribution, the people
would hesitate to delete it. It's also easy to forget to delete the kernel image
and for several hours be amazed what's going on. I think it is also
a valuable information for the stub code, to be able to alert the user.
I think the firmware "kernel loading" state could be even more explicit.

RFC:
8) the presence of the magic value 0x5afe570b in the stub kernel field indicates the firmware must not load or didn't loaded the kernel image

@capnm
capnm commented Apr 16, 2016 edited

@swarren

Is somewhere a signal that triggers during the reboot-reset, usable for speed measure

The ARM architected timers should work for that.

No, I will exactly with a logic analyser measure the time
SoC reset by reboot <> ARM reset, i.e. my gpio pin activated in the stub or kernel image

It's hart to argue in the compression discussion without having an evidence.

edit: Oh, I found it in the DT :-)

Name:   gpio-poweroff
Info:   Drives a GPIO high or low on reboot
Load:   dtoverlay=gpio-poweroff,<param>=<val>
Params: gpiopin                 GPIO for signalling (default 26)
@pelwell
Contributor
pelwell commented Apr 17, 2016

@swarren Now that the firmware is writing them, all stubs should have zeroes in the 0xf8 (dtb) and 0xfc (kernel) slots. This is especially true of the dtb field, where 0x100 is likely to cause a failure because the gap (which is interpreted as being 0x100-0x4000) probably isn't large enough.

@capnm I like your idea, but I want to change the magic value and extend it to include the dtb field as well, so:

  1. If the kernel field (0xfc) is 0xffffffff then the firmware should not load a kernel. If a kernel is not loaded, either by request or through error, the firmware writes 0 as the kernel address.

  2. If the dtb/atags field (0xfc) is 0xffffffff then the stub/kernel does not want a DTB. If the DTB is not loaded, either by request or through error, the firmware writes 0 as the DTB address. N.B. The fact that the firmware will try to load the DTB anyway for its own purposes is an implementation detail and doesn't affect this logic.

@popcornmix popcornmix added a commit that referenced this issue Apr 17, 2016
@popcornmix popcornmix kernel: BCM270X_DT: Add dpi24 overlay
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: #449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: #579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
9628ed8
@popcornmix popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 17, 2016
@popcornmix popcornmix kernel: BCM270X_DT: Add dpi24 overlay
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: raspberrypi/firmware#449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: raspberrypi/firmware#579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
f063c24
@popcornmix
Contributor

Updated firmware has been pushed with the changes @pelwell has described.

@capnm
capnm commented Apr 18, 2016 edited

My first tests show that a stub code located behind the 0x0008_0000 does not executes. Either it is overwritten or doesn't get copied. It is independent of the stub magic or the value of the kernel field. It works again if I revert to efdcf16... kernel: Bump to 4.4.7

  1. kernel8.img with no armstub8.bin works.
@popcornmix
Contributor

In the not-working case, can you show your config.txt settings, and whether you have armstub or kernel files?

@capnm
capnm commented Apr 18, 2016

armstub8.bin only

@popcornmix
Contributor

So there are no kernel*.img files on sdcard?

@capnm
capnm commented Apr 18, 2016 edited

I've had some stale files, sorry for the false alarm. A fresh checkout and suddenly it worked again.

I tested the essential ARM64 relevant cases, everything LGTM. Now it's time for useful development work. Thanks @swarren @popcornmix @pelwell !

tested (armstub8.bin, rm kernel8* config.txt)

      1. read device_tree_address from the stub:

stub without magic no change, ok

0x00000108 -> 0x00000100 .. 0x00008000
0x00000180 -> 0x00000100 .. 0x00080000
0x10000080 -> 0x10000000 .. 0x10080000
dtb_magic present, ok

0xffffffff -> 0x0, ok
0x00000100 -> 0x0, ok

  1. tested (kernel8.img, config.txt, rm armstub8*)
    dtb_magic present, ok
    device_tree=, x0 == 0x0, ok

  2. tested (kernel8.img, config.txt, rm armstub8*)
    kernel8.img default@0x0008_0000, ok
    kernel_address=0x0080_0000, dtb_magic present, ok

  3. tested (kernel8.img, armstub8.bin, rm config.txt)
    kernel: 0xffffffff -> 0x0, ok
    kernel: 0x0080_0000, atags: 0x108 -> 0x0080_0000, dtb_magic present, ok
    kernel: 0x0, atags: 0x108 -> 0x0008_0000, dtb_magic present, ok

@swarren
swarren commented Apr 19, 2016

@capnm

Good point. If the kernel image is a part of distribution, the people
would hesitate to delete it. It's also easy to forget to delete the kernel image
and for several hours be amazed what's going on.

Well, nothing is going to help if you delete the kernel accidentally. If you delete it deliberately, then having to update the stub as well is going to be a bit annoying.

I think it is also
a valuable information for the stub code, to be able to alert the user.
I think the firmware "kernel loading" state could be even more explicit.

Well, it could perhaps flash the LED on the board, but that's about all it could do. I wouldn't expect noticing that the kernel file is missing to take particularly long, but perhaps that's simply because I have scripts that copy everything to the SD card, so I'd probably just re-run them, find it works, be puzzled and move on.

@pelwell the 0xffffffff changes make sense to me.

I tested the latest FW with both rpi-3-aarch64-demo and U-Boot with just a kernel8.img and no armstub8.bin and no config.txt options to force 64-bit and everything worked as expected.

Since everything is basically working fine w.r.t. armstub8.bin, I'm closing this bug. If problems are found, we can open new bugs for those specific changes. Thanks!

@swarren swarren closed this Apr 19, 2016
@SylvainGarrigues

@pelwell Setting kernel_address=0x100000 in config.txt doesn't work, is it intended? How do I modify the load address these days? Shall I provide my own stubs?

Right now the only solution I have is to have a jump to 0x100000 as the first instruction of my kernel, which I have padded with zeros between 8004 and 100000:

/usr/bin/printf "\001\366\240\343" > first_commands
dd of=kernel7.img bs=1m count=4 if=/dev/zero
dd of=kernel7.img bs=1 conv=notrunc if=first_commands
dd of=kernel7.img bs=32k oseek=31 conv=notrunc if=kernel.bin
@pelwell
Contributor
pelwell commented Apr 19, 2016

@SylvainGarrigues No, that is not what is intended, and it isn't immediately obvious how that could be happening. Are you using an external stub? Is the stub being patched, and if so with what?

@SylvainGarrigues

@pelwell I am not using any stub, and my config.txt just reads:

device_tree=rpi2.dtb
gpu_mem=128
disable_overscan=1
max_usb_current=1
audio_pwm_mode=2

Padding the kernel with 0 like I mentioned earlier works with this config.txt.

If I add the line kernel_address=0x100000 in config.txt and do not pad with 0, nothing happens.

@pelwell
Contributor
pelwell commented Apr 19, 2016

That's not my experience. With this config.txt:

enable_uart=1
dtparam=audio=on
kernel_address=0x100000
device_tree=bcm2710-rpi-3-b.dtb

and after zeroing the first 2MB of memory (not a requirement, just to confirm that it isn't picking up anything from the previous attempt), it boots. This is with a standard 32-bit compressed kernel built to run at 0x8000, so the decompressor must be decompressing from 0x100000 back to 0x8000 (the memory at 0x8000 once the kernel is running looks like the start of a normal kernel).

@pelwell
Contributor
pelwell commented Apr 19, 2016

And to remove any doubt, I see the kernel being loaded to 0x100000 as expected. All modifications to the internal kernel_address variable, apart from the initialisation from config.txt, are conditional on it being non-zero, so I don't see how this can be failing for you.

@swarren
swarren commented Apr 19, 2016

FWIW, yes the 32-bit ARM zImage decompressor is position-independent and so will always decompress to 0x8000 (or wherever the Image is linked, I suppose) no matter where it's loaded. Well, IIRC the decompressor must be loaded within the first 128MiB of RAM for AUTO_ZRELADDR to work correctly. As an aside, when using zImage, 0x8000 (or thereabouts) is actually about the worst place the kernel could be loaded, since the decompressor must first copy itself somewhere else and then perform decompression, so that the decompressed data doesn't over-write the compressed data during decompression. If the zImage was loaded elsewhere (say, 32MB into RAM) that initial memcpy wouldn't be needed. Still, too late to change that default I suspect.

@SylvainGarrigues
SylvainGarrigues commented Apr 19, 2016 edited

I am sorry, I was mistaken, I had a serial problem.

My kernel is a raw binary (therefore neither an ELF nor a zImage) and it appears my kernel is loaded at the address specified by kernel_address and it is run from there without any stub, just with the kernel_address line in config.txt. Which is just what I wanted.

Now I can boot FreeBSD with just two lines in config.txt:

device_tree=rpi2.dtb
kernel_address=0x100000

So cool! No need for mkknlimg anymore, no need for u-boot either, no need for a stub, no need for kernel_old.

Thanks!

@popcornmix popcornmix added a commit that referenced this issue Apr 19, 2016
@popcornmix popcornmix kernel: scripts/mkknlimg: Append a trailer for all input
kernel: bcm2835_thermal: Don't report unsupported trip type

kernel: scripts/dtc: Only emit local fixups for overlays

kernel: bcm2835: do not require substream for accessing chmap ctl
kernel: bcm2835: add fallback channel layouts if channel map API is not used
kernel: bcm2835: log which channel map is set
See: raspberrypi/linux#1257

firmware: armstubs: Zero kernel and DTB addresses to match external stubs
firmware: arm_dt: If the trailer exists, ignore device_tree=
firmware: arm_loader: Load DTB high if insufficient space
See: #579

firmware: dtoverlay: Refactor applying overlays to permit snooping
firmware: dtoverlay: Add dtparam command
firmware: dtmerge: Don't crash if the overlay fails to load
firmware: dtoverlay: Integer overrides can create and extend properties
See: https://www.raspberrypi.org/forums/viewtopic.php?f=107&t=139732

firmware: platform: Don't overwrite disable_pvt=2

firmware: board_info: CM3 has no Bluetooth

firmware: dispmanx: Remove ifdefs for obsolete platforms
firmware: vc_image: Remove ifdefs for obsolete platforms
firmware: vc_audio: Remove ifdefs for obsolete platforms
firmware: vcfw: Remove ifdefs for obsolete platforms

firmware: scalerlib: testing: treat HVS dst_x parameter as 11-bit unsigned
c5e1319
@popcornmix popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 19, 2016
@popcornmix popcornmix kernel: scripts/mkknlimg: Append a trailer for all input
kernel: bcm2835_thermal: Don't report unsupported trip type

kernel: scripts/dtc: Only emit local fixups for overlays

kernel: bcm2835: do not require substream for accessing chmap ctl
kernel: bcm2835: add fallback channel layouts if channel map API is not used
kernel: bcm2835: log which channel map is set
See: raspberrypi/linux#1257

firmware: armstubs: Zero kernel and DTB addresses to match external stubs
firmware: arm_dt: If the trailer exists, ignore device_tree=
firmware: arm_loader: Load DTB high if insufficient space
See: raspberrypi/firmware#579

firmware: dtoverlay: Refactor applying overlays to permit snooping
firmware: dtoverlay: Add dtparam command
firmware: dtmerge: Don't crash if the overlay fails to load
firmware: dtoverlay: Integer overrides can create and extend properties
See: https://www.raspberrypi.org/forums/viewtopic.php?f=107&t=139732

firmware: platform: Don't overwrite disable_pvt=2

firmware: board_info: CM3 has no Bluetooth

firmware: dispmanx: Remove ifdefs for obsolete platforms
firmware: vc_image: Remove ifdefs for obsolete platforms
firmware: vc_audio: Remove ifdefs for obsolete platforms
firmware: vcfw: Remove ifdefs for obsolete platforms

firmware: scalerlib: testing: treat HVS dst_x parameter as 11-bit unsigned
f325bb8
@capnm
capnm commented Apr 20, 2016

@SylvainGarrigues
And I can boot linux with just one line in config.txt ;)

device_tree=bcm2837-rpi-3-b.dtb

scp arch/arm64/boot/Image $rpi3/kernel8.img
scp arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b.dtb $rpi3/
...
uname -a
Linux rpi3 4.6.0-rc3-next-20160413+ #1 SMP PREEMPT Wed Apr 20 13:50:53 CEST 2016 aarch64 GNU/Linux

@XECDesign XECDesign added a commit to RPi-Distro/firmware that referenced this issue May 4, 2016
@popcornmix @XECDesign popcornmix + XECDesign pwm_sdm: fix ring buffer UINT_MAX wraparound bug
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: IL ISP: Correct RGB to YUV matrices, and ignore code side info

firmware: MJPEG encode: Handle stereoscopic images
See: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=138325&p=918041

firmware: IL Camera: Change unspecified colour space to being JFIF
See: raspberrypi/userland#78

firmware: OV5647: Option to configure auto lens shading to use potential fix

firmware: arm_loader: Factor out DT support into arm_dt
See: raspberrypi/linux#1394

firmware: arm_ldconfig: Switch to using arm stubs generated from tools/mkimage
firmware: arm_ldconfig: Support loading arm stubs from file
See: raspberrypi#579
b062323
@XECDesign XECDesign added a commit to RPi-Distro/firmware that referenced this issue May 4, 2016
@popcornmix @XECDesign popcornmix + XECDesign kernel: Bump to 4.4.7
firmware: config: Add arm_64bit setting
firmware: arm_ldconfig: Set kernel_address for 64-bit boot
See: raspberrypi#579
6192786
@XECDesign XECDesign added a commit to RPi-Distro/firmware that referenced this issue May 4, 2016
@popcornmix @XECDesign popcornmix + XECDesign kernel: BCM270X_DT: Add dpi24 overlay
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: raspberrypi#449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: raspberrypi#579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
f2bfe22
@XECDesign XECDesign added a commit to RPi-Distro/firmware that referenced this issue May 4, 2016
@popcornmix @XECDesign popcornmix + XECDesign kernel: scripts/mkknlimg: Append a trailer for all input
kernel: bcm2835_thermal: Don't report unsupported trip type

kernel: scripts/dtc: Only emit local fixups for overlays

kernel: bcm2835: do not require substream for accessing chmap ctl
kernel: bcm2835: add fallback channel layouts if channel map API is not used
kernel: bcm2835: log which channel map is set
See: raspberrypi/linux#1257

firmware: armstubs: Zero kernel and DTB addresses to match external stubs
firmware: arm_dt: If the trailer exists, ignore device_tree=
firmware: arm_loader: Load DTB high if insufficient space
See: raspberrypi#579

firmware: dtoverlay: Refactor applying overlays to permit snooping
firmware: dtoverlay: Add dtparam command
firmware: dtmerge: Don't crash if the overlay fails to load
firmware: dtoverlay: Integer overrides can create and extend properties
See: https://www.raspberrypi.org/forums/viewtopic.php?f=107&t=139732

firmware: platform: Don't overwrite disable_pvt=2

firmware: board_info: CM3 has no Bluetooth

firmware: dispmanx: Remove ifdefs for obsolete platforms
firmware: vc_image: Remove ifdefs for obsolete platforms
firmware: vc_audio: Remove ifdefs for obsolete platforms
firmware: vcfw: Remove ifdefs for obsolete platforms

firmware: scalerlib: testing: treat HVS dst_x parameter as 11-bit unsigned
0bc7cde
@stapelberg stapelberg added a commit to stapelberg/documentation that referenced this issue Nov 23, 2016
@stapelberg stapelberg Document default kernel_address values
The values come from @pelwell, who works on the boot loader:
raspberrypi/firmware#579 (comment)
0f42ec1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment