Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite implementation ideas #29

Closed
Knight-Ops opened this issue Oct 5, 2019 · 24 comments
Closed

Rewrite implementation ideas #29

Knight-Ops opened this issue Oct 5, 2019 · 24 comments

Comments

@Knight-Ops
Copy link

Knight-Ops commented Oct 5, 2019

First of all these tutorials are excellent. They are an excellent resource overall for Rust embedded work so thank you for that!

I have been working through some of these examples and I wanted to provide my 2 cents for implementation details for v2.

There are two main details that I found confusing or inefficient that I thought would be good to at least have documented somewhere so people understand trade offs.

  1. Mailbox API - The documentation for this interface is absolutely awful (not at all your fault). One of the things I would suggest would be abstracting raw Mailbox calls into separate functions (ex. https://github.com/r0ck3tAKATrashPanda/raspi-os/blob/master/src/bsp/rpi3/mbox.rs#L623) This will significantly reduce the chance of botching one of these mailbox calls which are a nightmare to debug. Since this is a RPi3 OS it is less of an issue, but it may even be worth pushing the mailbox creation, usage, and destruction into this function if you ever wanted to add a BSP that didn't support something similar (Again that is probably just something that people may find as food for thought in the tutorials). The cons of this method is that we bloat code size fairly significantly. We can use LTO in order to throw out unused functions at link time, which should help remove that, but it is just a trade-off to mention.

I was curious on the design choice of using modules and set values instead of enums that held all the variants for things such as Tags within the mailbox api. (ex. https://github.com/r0ck3tAKATrashPanda/raspi-os/blob/master/src/bsp/rpi3/mbox.rs#L77)

Additionally, not that it affects anything, but mbox.buffer[4] (5th byte of the buffer) doesn't need to be filled out, apparently that is used by the VideoCore to send back the size of the response buffer. It should be parsed in order to determine if the size changed when the mail was responded too. Again this isn't an issue by any means, just hoping to make more information available to people about the mailbox API.

  1. println! macro and console() - Currently in v2 you are using the magical QEMU structure to print. In the current implementation we have a console() function that is building the magical structure each time it is called, which is every time print! or println! is called. Obviously this isn't an issue in QEMU and it is an empty struct, but this design has major performance implications when you move to hardware and start to use UART, the amount of overhead for re-initializing and the risk of having multiple consumers attempting to access the same peripheral is quite high. Instead this should be moved to a single static ref (maybe even for the QEMU struct for good practice early on in the process of learning)

If we implement UART something like

lazy_static! {
    pub static ref CONSOLE: Mutex<Uart> = {
        let mut uart = Uart::new();
        let mut mbox = mbox::Mbox::new();
        {
            uart.init(&mut mbox, 4_000_000);
        }

        Mutex::new(uart)
    };
}

We A) don't initialize UART until we need it B) We have a mutexed (spin) UART access so we can println! from anywhere without worry (sorta).

Again, these tutorials are excellent and I am not trying to nit pick, but these are some pain points I ran into, that would be great if we could get covered in one way or another. It would be great to hear your thoughts and let me know if there is anything I can do to contribute!

@andre-richter
Copy link
Member

andre-richter commented Oct 5, 2019

Hi,

thanks for the feedback, appreciate it :)

This is a good opportunity to write down the story behind the rewrite. I have been asked by other people offline as well.

How did the v1 tutorials came to be ?

I started the v1 tutorials by rewriting existing C bare-metal tutorials in Rust. And rewriting was quite literally the case. At the time I started there were almost no aarch64 Rust resources available, so rewriting something existing to see if it would work in Rust already was a personal proof-of-concept challenge for me.

Over time, my personal focus shifted towards making those tutorials Operating System tutorials, and I drifted away from the bare-metal only origins of the C tutorials. This caused three major issues:

  1. The problem with the new focus on being OS tutorials was that the original lessons didn't fit from a didactic perspective anymore. For example, the v1 tutorials introduce a lot of the RPi3 HW very early. Two UARTs, the Mailbox, RNG, Timers, and so on.
    For OS tutorials, it makes sense to introduce some more OS groundwork and abstractions first, e.g. Driver traits, and then implement board-specific drivers on top of those.
  2. Originating from being a quite literate rewrite of C tutorials, features of the Rust language were used only sparsely.
  3. A whole lot more abstraction and modularization is needed for an OS. RPi specific driver code and Kernel code is intermingled too much in the v1 version.

Hence I decided to start a rewrite.

What is the plan for v2

With the v2 tutorials, I have set myself four main requirements:

  1. They should be suitable for readers with no OS dev background, and introduce OS concepts from the ground up.
  2. At the same time, they should be suitable for readers with very little Rust background and clearly pinpoint areas where Rust needs a whole lot more and different attention than C/C++ (for example, working with mutable globals (statics).
  3. They should tell a nice didactic story that introduces stuff in small and easily digestible chunks.
  4. Make the RPi3 the main hardware (because it is one of the most accessible multi-core boards out there that is also affordable), but have the code in a state where it's modularity and abstractions would allow a quite easy implementation on top of other HW (e.g. RPi 4 or RISCV boards).

The current outline

I have an outline in my head, but I am a bit reluctant to put it out there wholly, because several times already I realized that I had to shift it around quite a lot when I started actually implementing a lesson. The whole thing is a big moving target and I still learn a lot and have to adapt when I am actually coding it.

But the general idea is this:

I wanted to be able to arrive at a globally usable print!() macro (that works with QEMU) at the earliest. This enables people to annotate arbitrary sections of the code and they can do "printf" debugging very early.
For this reason, I introduced the "magic" QEMU struct. Not much magic under the hood, it merely abuses the fact that QEMU's UART emulation only really cares about data being written to the UART TX MMIO register, and doesn't need anything else being set up, for example the GPIO pins.
So the magic QEMU output will only work with QEMU until I introduce the GPIO and UART device drivers at a later point.

The story I will tell from there on will go something like this:

  1. "Look, we already have a globally usable print function that calls an implementation of fmt::Write, which takes an &mut self, but we cheated and this is not usable in the real world. Every time we print, a completely new QEMUOutput struct is created, that is why it works with the &mut self from any context.
    What if this QEMUOutput struct would have some state that we want to preserve? We would need to save it in a global so that code will always access the same struct when printing and not a newly generated one every time.

    • BAM. First big "Rust is different than C" lesson: We can only have globals using &mut self functions with static mut, but that requires unsafe, and we don't want that.
      • Introducing mutual exclusion concepts for OSes (Rust basically demands us to have those, whereas C wouldn't care at all, how nice is that? :) ) and how they are implemented idiomatically in Rust (as a first shot again by the means of the NullLock from the v1 tutorials)."
  2. From there on, I will most probably introduce a DeviceDriver trait and write the UART (the one where we don't need the Mailbox yet) and GPIO drivers against them, and put them in a global static behind a NullLock. Now we have a single struct that is globally accessible in a safe way. First big goal achieved 🎉.

  3. As soon as we have the physical UART working, having the raspbootin lesson to enable easy testing on the real HW.

From there on, in an order I have yet to decide:

  • Lowering to EL1

  • JTAG

  • Bump allocation

  • Virtual memory

  • Exceptions

  • Interrupts

  • Timers

  • and so on

    Regarding your questions, @r0ck3tAKATrashPanda:

Mailbox API

Completely agree. The reason it is implemented like it is right now: Did it very early and basically just translating C to Rust. I will definitely revisit your suggestions when introduction of the Mailbox is due, and implement it more idiomatically wrt Rust.

Currently in v2 you are using the magical QEMU structure

I hope the intention is answered already by my previous statements. It is this way for both didactic and practical reasons and will be corrected by the tutorials that follow.

lazy_static!

I want to build and explain synchronization primitives all by ourselves. Just using lazy_static leaves the reader in the dark about a very important OS concept, which is not beneficial for people wanting to learn OS dev and also in violation of my requirements :)

Hope that helps.
Best,
Andre

@mkroehnert
Copy link

@andre-richter thanks for providing a detailed answer and also thanks for the tutorial(s) in general.
I found this tutorial because I explicitly want to run Rust on RPI3 on bare-metal.
Will this still be an option, after the v2 rewrite?

@andre-richter
Copy link
Member

Yes, RPi3 will stay the main target platform.

Anyways, don't wait for the rewrite to finish anytime soon. My freetime is sparse at the moment...

@Knight-Ops
Copy link
Author

Knight-Ops commented Oct 9, 2019

Looks like the DevieDriver went in yesterday and it looks good. I got a assembly-free version of raspbootin here : https://github.com/r0ck3tAKATrashPanda/raspbootin-rs Obviously this could use some work, and I had a lot of issues with the rebasing of the image since the linker script thinks it is in the wrong location.

I am probably missing something to make this first memcpy easier, but without assembly it is a hassle (especially with the linker thinking the binary is loaded somewhere other than 0x80_000). But it was a major hassle.

Changes that should probably be made :

  • No reason for .got to not be it's own section in my linker script
  • Clean up of _start. I had major issues with the compiler trying to do too much in terms of optimizations. I had some issues with stack usage prior to stack initialization, as well as not letting me use a GOT entry for my jump to init() without optimizing it out (hence the weird if statement)
  • rebase_image probably doesn't need to be extern "C"
  • This is all based on utilizing UART0

I am not sure if this implementation is helpful at all for saving time for your implementation or not, but just wanted to leave that here.

Additionally, have you considered something like a Discord channel (or the embedded-rust channel) people that are going through this/interested in this type of work? It may make collaboration easier or features other people are working on that could be contributed easier to discuss and collaborate on.

@andre-richter
Copy link
Member

Hi @r0ck3tAKATrashPanda,

thanks for the raspbootin link, that is just in time, because this is one of the very next ones I will tackle. Thanks!

Regarding Discord:
There is the embedded WG's Matrix channel at #rust-embedded:matrix.org which would be the logical channel to get in contact. Let me know if that works for you.
I usually cannot respond during my daytime though.

One issue where I'd be happy to get input is the current folder structure. I am not happy with the current state. It is way better than the organically grown v1, bit it still features rather poor composability. I am tackling this early now because I really want to encourage people to contribute other BSPs in the future and it should be without overhead. Raspberry Pi 4 (I don't have one yet) is the logical first addition.

Right now, architecture code and drivers hide in the rpi3 bsp folder. Adding an rpi4 folder given the current structure would need to copy most of the code. Doesn't make sense.
I just uploaded a first shot a restructuring that I had laying around on my harddrive for some days now at: https://github.com/rust-embedded/rust-raspi3-OS-tutorials/tree/folder_restruct/06_drivers_gpio_uart

  • Architecture code is separated into its own folder now.
  • There is an rpi_common folder now which aims to include commonalities between rpi3 and rpi4, which are pulled in by the respective BSPs then, e.g. the drivers.
    • I am still pondering if device drivers should be outside the BSP folders as well though. I tend to say yes. Would need to find a folder structure then where BSPs can easily pull them in and instantiate them. The actual BSP folders would be rather minimalistic afterwards. Pulling in drivers that are featured on the board, instantiating them with the correct base MMIO addresses and/or any other BSP-specific parameters and exporting some bsp calls. Question is if rpi_common is needed anymore when drivers live outside the BSP folder.

I'll continue to experiment with it, but feel free to give your thoughts.

@Knight-Ops
Copy link
Author

Knight-Ops commented Oct 10, 2019 via email

@andre-richter
Copy link
Member

The trait for drivers is something that will definitely happen. I envision it something like follows:

For example, there is a trait NetworkDriver that would be implemented by the respective driver. The trait brings with it facilities to register itself with a global NetworkManager that is provided by the kernel, from where on it will be managed.

My primary concern is really duplication of code. I would very much like to avoid that. How would you tackle that rpi3 and rpi4 BSPs would have LOTS of identical driver code?
Also, I guess there is possibly a bunch of common peripherals that one might find one various boards. Ethernet or wifi chips for example (not that we will have drivers for those anytime soon, hehe...).

So my idea was to have driver folder featuring, for example:

  • BCM2xxxGPIO
  • BCM2xxxMiniUart

And the rpi3 and rpi4 bsp top level file would pull them in, and instantiate them with the BSP-specific parameters (e.g. MMIO address).
Not sure if would be practical, though. I'll try it locally and see if it is nice to use. I don't like to raspi_common approach very much, it doesn't look very scalable. It would break as soon as there would be a common peripheral on a non-raspi-family board.

@andre-richter
Copy link
Member

Update:
I carved out device driver code into bsp/driver and now just instantiate compatible drivers in the BSP. I liked it enough to go for the change. I think it reduces duplication and encourages code reuse between BSPs.

@Knight-Ops
Copy link
Author

I haven't gotten a chance to mess with the changes, but it looks good, my only suggestion would be maybe to add subfolders on BSP/drivers so that all the bcm drivers are together and you branched into another board/CPU you don't have a ton of files just flat on that driver dir.

@andre-richter
Copy link
Member

Good point. I added that 👍

@andre-richter
Copy link
Member

andre-richter commented Oct 13, 2019

@r0ck3tAKATrashPanda Spent some time today on a pure Rust raspbootin. Ran into the same problem as you with the compiler generating a relative jump (your weird if).

I think I found a really neat solution that doesn't even require transmute. One can take advantage of the fact that vtables always store the absolute addresses (linktime address), so by indirecting through a trait object, you can elegantly jump to the relocated code.

https://github.com/rust-embedded/rust-raspi3-OS-tutorials/blob/chainloader_wip/07_uart_chainloader/src/runtime_init.rs#L7

https://github.com/rust-embedded/rust-raspi3-OS-tutorials/blob/chainloader_wip/07_uart_chainloader/src/bsp/rpi3.rs#L31

I'll upload it on the rewrite branch as soon as I've cleaned up stuff and finished the code.

@Knight-Ops
Copy link
Author

Knight-Ops commented Oct 13, 2019 via email

@andre-richter
Copy link
Member

I agree its a bit heavy in contrast to just pulling in a tiny piece of assembly, but we're here for the challenge, aren't we? 😎
But there's a huge plus to this as well: It is completely type-safe since the transmute is out of the picture. And portable. I will check if its feasible to even put it outside of BSP code.

Thanks for the mailbox code. Will definitely look at it when Mailbox is due. I haven't yet decided when to bring it into the picture. Most likely alongside introducing virtual memory / MMU and heap allocation. It can then serve as a lesson for setting the correct page table attributes (non-cacheable) for mailbox memory.

@andre-richter
Copy link
Member

Struggling a bit to get the chainloader working with the MiniUart. Apparently I get RX overruns on. Bummer... 😩

@Knight-Ops
Copy link
Author

Knight-Ops commented Oct 16, 2019 via email

@andre-richter
Copy link
Member

So it turns out that I was fooled by a spurious RX byte...
It appears somewhere between UART init and requesting the payload, then gets consumed when reading the size on the RPi, leading to the RPi expecting way more bytes then there are. It would then not kick itself out of the receive loop, what at first sight looked like bytes get lost due to overrun.

Adding a FIFO clear just before starting with the raspbootin protocol does the trick (its anyways more robust this way, one should always do it). It works fine now with the MiniUart.

@andre-richter
Copy link
Member

And by the way, I really really like the idea of having the host program in an interpreted language and not the heavy C++ that is currently used. Ideally in Ruby for the tuts though, don't too many different languages mixed in there.
It would heavily depend on having functionality for switching into a terminal-mode after sending the kernel, like the current raspbootcom that is used does.
So that the user can seamlessly start interacting with the chainloaded binary.

@Knight-Ops
Copy link
Author

Knight-Ops commented Oct 16, 2019 via email

@Knight-Ops
Copy link
Author

So I worked on the python script a little : https://github.com/r0ck3tAKATrashPanda/raspi-os/blob/master/raspbootcomm.py now it actually does like you suggested where it drops you into a miniterm immediately after uploading the kernel, it is super nice. If you want to port it to Ruby that would make the loading/interacting much simpler I think.

Additionally, I started the very very beginning of the DWC USB 2.0 driver: https://github.com/r0ck3tAKATrashPanda/raspi-os/blob/master/src/bsp/driver/dwc_usb_2_0_hs_otg.rs I am not sure how far I will get without dynamic memory, so I may end up stalling in the middle in order to add an allocator before going back to it, but this should have most of the mindless registers done. There is so much code required for this, I don't even know if it has any place in the tutorials at all, no time soon, that is for sure.

@andre-richter
Copy link
Member

Will definitely have a look at your python script, thanks!
I just uploaded the chainloader chapter, if you find some time, kindly let me know if it works for you.

re drivers: Yeah, probably stuff for the later tutorials. But it is good to have you laying the groundwork and making first experiences. Once the tuts get there in the distant future, we can surely reuse and learn from your experience.

@andre-richter
Copy link
Member

@r0ck3tAKATrashPanda, I added support for the Raspberry Pi 4.

The MiniUart just wouldn't function. Some dynamic frequency change stuff was going on that I just could not get rid of. Googling indicated it might be an issue with firmware in early batches, but I wasn't happy to boot Raspbian on it to update it (it is on-chip now, not fetched from SD card anymore).

Threw it out for good now. The Videocore sets the PL011 UART frequency now before kernel load thanks to config.txt, so we can use it without Mailbox.

@Knight-Ops
Copy link
Author

The only downside of not utilizing MiniUart that I could find is that it will make using bluetooth harder. I don't know if that is ever in the plan for the tutorial, but my understanding is that the PL011 UART is connected to the Bluetooth chip and is the only method to communicate with it on the board. This was just from piecing together a few sources on how the UARTs were split, so feel free to correct that if it is wrong.

@andre-richter
Copy link
Member

True, I also read that you can attach the MiniUart to BT though, which then only yields reduced bandwidth compared to PL011.
It's okay for what I have in mind. MiniUart showed too many quirks, and the tutorials will anyways be very CPU and memory centric for quite a while.

@andre-richter
Copy link
Member

Closing this for now. Feel free to reopen whenever new questions arise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants