Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does second stage bootloader not set VTOR and SP? #6

Closed
thejpster opened this issue Jan 23, 2021 · 28 comments · Fixed by #10
Closed

Why does second stage bootloader not set VTOR and SP? #6

thejpster opened this issue Jan 23, 2021 · 28 comments · Fixed by #10
Labels
enhancement New feature or request

Comments

@thejpster
Copy link

Having to make 0x100 in flash a function as opposed to a vector table complicates the use of other programming languages and SDKs, which assume flash starts with a vector table.

Is there a reason why the second stage bootloader doesn't set VTOR to 0x1000_0100, SP to [0x1000_0100] and then jump to [0x1000_0104]?

@kilograham
Copy link
Contributor

Having to make 0x100 in flash a function as opposed to a vector table complicates the use of other programming languages and SDKs, which assume flash starts with a vector table.

I'm not sure that is an assumption they should make (unless flash is at 0x00000000), Still if they don't reference the stack pointer you could replace it with something like

0x47184b00, // ldr r3, [pc, #0]; bx r3

to jump to the reset vector, and actually put a stack pointer in there (though all your bootstrap code need to be in the first 256 bytes of flash somewhere)

@Wren6991
Copy link
Contributor

complicates the use of other programming languages and SDKs, which assume flash starts with a vector table.

Seems like the second stage bootloader already breaks this assumption, so I guess the other issue is the alignment requirement of VTOR, and not wanting to hoist the reset handler over the vector table to fill the alignment hole?

@kilograham
Copy link
Contributor

P.S. yay, rust i'm guessing - i wanted to give it a try myself, but i had zero time, and my Rust is not very idiomatic yet!

@Wren6991
Copy link
Contributor

Is there a reason why the second stage bootloader doesn't set VTOR to 0x1000_0100, SP to [0x1000_0100] and then jump to [0x1000_0104]?

Purely a size consideration, in case we needed to squeeze more in there at a later time (as you can see it is already quite tight), and want to keep the interface stable. No strong objections to that mechanism but it looks like it would add a few bytes. Then again I guess now would be the time to change it, if ever.

@Wren6991
Copy link
Contributor

(I'm guessing one of the unstated priors here is that you would rather not ship your own second stages with those SDKs)

@Wren6991
Copy link
Contributor

Wren6991 commented Jan 23, 2021

So I understand, you're suggesting a patch like this? (edit: and also setting VTOR as well as vectoring through the table)

--- a/src/rp2_common/boot_stage2/boot2_w25q080.S
+++ b/src/rp2_common/boot_stage2/boot2_w25q080.S
@@ -277,7 +277,10 @@ soft_reset:
     pop {r0}
     cmp r0, #0
     bne 1f
-    ldr r0, =(XIP_BASE + 0x101)
+    ldr r0, =(XIP_BASE + 0x100)
+    ldr r1, [r0]
+    msr msp, r1
+    ldr r0, [r0, #4]
 1:
     bx r0

@kilograham
Copy link
Contributor

oh i had entirely misread the original topic - ignore everything i said!

@Wren6991
Copy link
Contributor

Thought about this a bit longer, and it makes sense to put vector table at the start in flash images, but we also have a NO_FLASH build flag, where binaries are built to be loaded into RAM. The issue is that when loaded over UF2->USB, the bootrom enters these images in the same way that boot2 currently enters flash crt0.

This means we'll have vector table first on flash builds (which by default we then copy out and relocate into RAM anyway), and second on RAM-only builds. I can't see this causing any particular trouble other than making people's eyes bulge, so I'll raise a PR to make these changes.

@thejpster
Copy link
Author

Thanks!

https://GitHub.com/thejpster/pico-blink-rs contains my terrible hacks.

@thejpster
Copy link
Author

And to be clear, yes I would prefer not to ship either ASM or blobs with a Rust application. ASM is preferable, but it involves pulling in big dependencies to kick off gcc / as automatically.

@jonas-schievink
Copy link

Shipping ASM shims is much easier if you take the approach that I've used in the cortex-m crate, it works without any external dependencies: https://github.com/rust-embedded/cortex-m/blob/master/asm/lib.rs

@Wren6991
Copy link
Contributor

Ok, so the ideal situation from a dependency point of view is to reimplement boot2 in rust, if you are able? In other words, you would be free to change the handover between that second stage and the reset handler of the rust program?

I think it still makes sense to make this change to the boot2 files shipped with the SDK.

@thejpster
Copy link
Author

I think the change would help with supporting other OSes too.

We'll look at doing boot2 in Rust. It'll be an interesting challenge!

@kilograham
Copy link
Contributor

Note the other thing we have in the first 256 bytes is the root of our machine walkable "binary info" tree (used by picotool)
This is a bunch of (semi) self-describing information embedded in the binary; we have a 20 byte header that is only looked for in 10000100 to 10000200 in flash binaries (though we can obviously change that if it is going to be an issue for other languages, so goot to get your opinion)

Our hope is to have other languages include other information that might be useful to users (you can run this against a BIN/UF2/ELF file or against a device to see what is on it!)

The information is designed to be compact, but if nothing else included the end address of the binary is good because it allows you to do picotool save to save your program off your RP2040 device into a UF2/BIN

e.g.

$ picotool info -a firmware.elf
File firmware.elf:

Program Information
 name:            MicroPython
 version:         v1.13-289-g1196871a0
 features:        USB REPL
                  thread support
 frozen modules:  _boot, rp2, ds18x20, onewire, uasyncio, uasyncio/core, uasyncio/event,
                  uasyncio/funcs, uasyncio/lock, uasyncio/stream
 binary start:    0x10000000
 binary end:      0x10038ac0
 embedded drive:  0x100a0000-0x10200000 (1408K): MicroPython

Fixed Pin Information
 none

Build Information
 sdk version:       1.0.0
 pico_board:        pico
 build date:        Jan 20 2021
 build attributes:  MinSizeRel

or

$ picotool info -a ./scanvideo/sprite_demo/sprite_demo.elf
File ./scanvideo/sprite_demo/sprite_demo.elf:

Program Information
 name:          sprite_demo
 features:      UART stdin / stdout
 binary start:  0x10000000
 binary end:    0x100179ec

Fixed Pin Information
 0-4:    Red 0-4
 6-10:   Green 0-4
 11-15:  Blue 0-4
 16:     HSync
 17:     VSync
 18:     Display Enable
 19:     Pixel Clock
 20:     UART1 TX
 21:     UART1 RX

Build Information
 sdk version:       1.0.0
 pico_board:        vgaboard
 build date:        Jan 19 2021
 build attributes:  Release

@thejpster
Copy link
Author

Off the top of my head, 256 bytes minus 20 bytes leaves room for 57 interrupt vectors, which is more than enough I think. Configuring an SDK to put a special section after the vector table is usually much easier than trying to put it in front of the vector table.

@Dirbaio
Copy link

Dirbaio commented Jan 24, 2021

I'm not that sure moving the vectors to 0x10000100 is that advantageous. If we're going to copy the vectors to RAM anyway, they can be anywhere in flash really. They could even be first thing in .data, so they'll be initialized by the .data copy.

@Wren6991
Copy link
Contributor

I'm not that sure moving the vectors to 0x10000100 is that advantageous

I'm not aware of the build system or language context here. It makes sense to me that existing projects might assume their image starts with a vector table, as they may expect the hardware to be doing the vectoring. It seems a bit cleaner for boot2 (whose contents ought to be language-agnostic, and whose size is fixed) to be the only thing bumped on the front of that.

@kilograham
Copy link
Contributor

Yes FYI the original reason for this was that we support the very non standard idea of RAM UF2s, and the bootrom has no idea what the entry point is in the binary, so we go with the lowest address. Also RAM binaries means that you can't just expect to always download code in debugger, then reboot, you should also support "file" and "run".

Anyway I think we support all that with the new layout (which I agree is more sensible) without needing to change the bootrom :-)

@kilograham
Copy link
Contributor

kilograham commented Jan 24, 2021

I'm not that sure moving the vectors to 0x10000100 is that advantageous. If we're going to copy the vectors to RAM anyway, they can be anywhere in flash really. They could even be first thing in .data, so they'll be initialized by the .data copy.

Yeah definitely worth considering the RAM binary use case. (RP2040 with no flash, running something on the device without disturbing the flash) use cases, since you do need code at the beginning for that (although clearly you could just trade out the boot stage 2 for a 256 byte indirect thru boot vector at 0x100)

@kilograham
Copy link
Contributor

Off the top of my head, 256 bytes minus 20 bytes leaves room for 57 interrupt vectors, which is more than enough I think. Configuring an SDK to put a special section after the vector table is usually much easier than trying to put it in front of the vector table.

Agreed; more of an FYI than a concern over space

@conkerkh
Copy link

Traditionally the main app will relocate VTOR if necessary, which is why it doesn't make sense to do it in boot stage2. You can have a look at SystemInit function in system_XXX.c files in CMSIS libraries. This concept makes sense, you can have multiple implementations of system_XXXX.c, one which will copy VTOR to RAM and relocate address there, another while will use FLASH for this purpose. I don't think that that it's necessary to do it in boot_stage2.

On the other hand, I think that you could use C directly writing boot_stage2 code. It's more readable for most people and the size of the output is pretty much the same, or very close (it fits 252bytes). I've verified this by rewriting this in C. I'm porting this ASM to C and the results look ok. I think I should get my pico tomorrow and will check if everything works. If it's ok then I will post a like to my repo.

@kilograham
Copy link
Contributor

Yeah, the question here is not about installing the VTOR, but whether to allow the real binary to have its vector table at the beginning of the binary (which we currently don't as we jump there). The change would just be to pull initial PC,SP from 0x100-0x108 instead of jumping to 0x100

@conkerkh
Copy link

Oh damn, yeah missed the point :D I've looked at the linker file again, I get what you mean. That looks really weird indeed... What was the initial idea behind placing reset section in front of vector?

@Wren6991
Copy link
Contributor

That looks really weird indeed... What was the initial idea behind placing reset section in front of vector

Keeping the handover between boot2 and crt0 absolutely minimal in case there was any critical hardware setup we needed to use that space for. We've released now, things seem to be basically working, so it makes sense to spend those bytes to bring our binary layout in line with other codebases' assumptions

I've verified this by rewriting this in C. I'm porting this ASM to C and the results look ok. I think I should get my pico tomorrow and will check if everything works. If it's ok then I will post a like to my repo.

Please do share -- right now I'm keen on just getting the binary layout change merged before people go too much further with bare metal language bringup stuff, but it would be interesting to look at rewriting in C in the future (and might let us have a single, more configurable boot2)

@kilograham
Copy link
Contributor

I added a comment to the PR, but I don't think we should actually install the VTOR. we should just redirect thru the first 8 bytes, and leave it to the language runtime to decide how they setup vector table. Thoughts?

@Wren6991
Copy link
Contributor

I think the issue with not installing VTOR is that a lot of existing runtime will just turn on IRQs without modifying VTOR (as VTOR is not modifiable on some systems anyway), under the assumption that VTOR points at the vector table that you came into the image through, which is true on almost every system apart from ours.

If we didn't install VTOR, this would send all such code through the bootrom vector table.

This doesn't affect our runtime anyway because we relocate VTOR before enabling IRQs, so it wouldn't affect our code, and would give other crt0s an experience that's more consistent with other Cortex-M platforms.

@kilograham
Copy link
Contributor

Yes, you are right, my brain was out to lunch

@kilograham kilograham linked a pull request Jan 27, 2021 that will close this issue
@kilograham kilograham added bug Something isn't working enhancement New feature or request and removed bug Something isn't working discussion labels Jan 27, 2021
@Wren6991
Copy link
Contributor

The answer to the question in this thread's title is now "it does" :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants