The [RP2040 datasheet](https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf) is an excellent source for understanding the power-on reset sequence and the boot sequence (see the respective sections). You can read the notes below and then check out the datasheet. 

# Basic firmware execution
 + When power is turned on, the main voltage rail and main PLL turn on. The power-on-reset hardware typically waits until the voltage ramps up to a stable operating level, then supplies voltage and clock to the components required for boot (such as the AXI fabric, ROM, and internal SRAM) and resets them.
     - Note that, for *any* C code to run, it needs a system stack. This means that some kind of RAM needs to have its voltage and clock supplied and be reset.
     - It is possible that, if the boot code is written in assembly language, it may work without a stack 
 + On ARM Cortex-M cores, the hardware state machine, right after reset, loads the 32-bit word at address 0x0 into the stack pointer and the word at 0x4 into the PC.
     - The fabric is up at this point and its address map tells it which slave has addresses 0x0 and 0x4
     - The MMU, caches, and NVIC (basically all interrupts) are typically disabled at this point.
     - In some processors, depending on the values of the boot pins, different blocks of memory are aliased to a small block starting at 0x0. This enables booting from ROM, internal flash, or RAM.
     - Typically there is a VTOR (Vector Table Offset Register) that the hardware uses to locate the interrupt vector table. The POR value of VTOR is 0x0.
 + The address of the function Reset_Handler() is what is stored at 0x4, so the core starts executing Reset_Handler(), which in turn calls SystemInit().
 + SystemInit() configures and enables interrupts, configures and enables the MMU, invalidates and enables the caches, configures the PLLs, configures the registers of peripheral controllers, etc.
 + Eventually Reset_Handler() checks the integrity of the user firmware in flash, changes VTOR to point to where the IVT of the user code is located, and jumps to main_ without the possibility of returning (this is the main() function in the flash).
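
The reset fetch and the later VTOR change described above can be mimicked on a host machine. The sketch below is only a simulation, not real startup code: `sim_core_t`, `sim_read32`, `sim_reset`, and `sim_jump_to_app` are hypothetical names made up for illustration, "memory" is a plain array of words, and it assumes a Cortex-M style table where word 0 holds the initial stack pointer and word 1 holds the reset handler address. Real hardware also imposes alignment rules on VTOR that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers touched during reset. */
typedef struct {
    uint32_t sp;   /* stack pointer */
    uint32_t pc;   /* program counter */
    uint32_t vtor; /* vector table offset, in bytes */
} sim_core_t;

/* Read one 32-bit word from simulated memory at a byte address. */
static uint32_t sim_read32(const uint32_t *mem, size_t mem_words, uint32_t addr)
{
    assert(addr % 4u == 0u);        /* vector entries are word aligned */
    assert(addr / 4u < mem_words);  /* stay inside the simulated memory */
    return mem[addr / 4u];
}

/* Mimic the reset sequence: VTOR = 0, SP <- [0x0], PC <- [0x4]. */
static void sim_reset(sim_core_t *core, const uint32_t *mem, size_t mem_words)
{
    core->vtor = 0u;
    core->sp = sim_read32(mem, mem_words, core->vtor + 0u);
    core->pc = sim_read32(mem, mem_words, core->vtor + 4u);
}

/* Mimic the hand-over to user firmware: point VTOR at the application's
 * vector table, then load SP and PC from that table instead. */
static void sim_jump_to_app(sim_core_t *core, const uint32_t *mem,
                            size_t mem_words, uint32_t app_table_addr)
{
    core->vtor = app_table_addr;
    core->sp = sim_read32(mem, mem_words, app_table_addr + 0u);
    core->pc = sim_read32(mem, mem_words, app_table_addr + 4u);
}
```

Calling `sim_reset()` and then `sim_jump_to_app()` on an array that holds two small vector tables shows the same two-step hand-over: first the boot code's SP/PC pair is loaded, then the application's.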
 
One can think of many variations to this sequence. For instance, the boot code may have two stages, where the job of the first stage is simply to copy the second stage into SRAM or TCM, change the VTOR to that memory, and call the second stage there. This is a way of speeding up booting, as reading from ROM may be slower than reading from SRAM. The first stage may even verify the integrity of the second stage before executing it.
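
A minimal host-side sketch of the "verify, then copy to SRAM" idea follows. The names `crc32_calc` and `stage1_load` are hypothetical, the "SRAM" is an ordinary buffer, and CRC-32 is chosen here only as an example integrity check; a real boot ROM may well use a different checksum or a cryptographic signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF). Slow but small, which suits a sketch. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return ~crc;
}

/* Hypothetical first stage: check the second-stage image, then copy it
 * into (simulated) SRAM. Returns true only if the copy was made. */
static bool stage1_load(const uint8_t *image, size_t len, uint32_t expected_crc,
                        uint8_t *sram, size_t sram_size)
{
    assert(image != NULL && sram != NULL);
    if (len > sram_size) {
        return false;                       /* image does not fit */
    }
    if (crc32_calc(image, len) != expected_crc) {
        return false;                       /* integrity check failed */
    }
    memcpy(sram, image, len);
    return true;
}
```

On a real part, a successful `stage1_load()` would be followed by updating VTOR and branching into the copied image; that step is left out because it cannot be run on a host.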

The boot code may also be divided into two parts: a first stage supplied by the chip manufacturer that is universal to that IC, and a second stage supplied by the OEM that makes boards with that IC and is applicable only to that board. The second stage code may or may not be in the ROM. For convenience, the second stage could be in an external EEPROM, flash, SD card, etc. The first stage would then configure and enable chip functionality (hardware registers of modules, NVIC, MMU, etc.), configure and enable the interface to the external flash, compute a checksum over the second stage, and then start the second stage. In PCs, the second stage might be in an EEPROM; it reads board-specific settings to learn how much DRAM is on that board, what peripherals are available, etc., and does the necessary initialization. This second stage is called the BIOS in computers. UEFI is the replacement for the legacy PC BIOS.

Typically, the boot code first checks some GPIO pins to see whether the user wants to download new firmware or whether it should proceed to the firmware already present in flash. If it needs to download the firmware, it will have to enable the interface used for downloading (say, UART), perform checksum checks, and then proceed with the regular boot sequence. 

Before switching execution to the firmware, the boot code will set up the MMU/MPU with a page table such that different regions in the firmware have appropriate read/write access controls (and more). Below are the various sections in the firmware for reference.

<img src = "fwcodememlayout.png" width=200 />

 + .text contains code
 + .rodata (sometimes written .ro) contains constants and strings (read only). Sometimes it is counted as part of .text
 + .data contains globals and static variables that have an initial value
 + .bss contains uninitialised globals and static variables
 + Stack grows downwards from the top
 + Free RAM area is used for heap and grows upwards from the end of the .bss section

Besides the stack, the .data and .bss sections need to be in RAM. If the firmware is a bin file, the firmware has to know the address map of the system and include init code that initializes the stack pointer register in the CPU, copies .data to its RAM location, initializes .bss in its RAM location, and then runs the main code. If the firmware is an ELF file, the ELF file carries data about where in the file the .data and .bss sections are and where in RAM they need to be located (the linker needs to know the address map, usually from the linker script). The boot code can then use that information to decide what to do with .data and .bss, and once that is done, it calls the main function.
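
The init code for the bin-file case can be sketched as below. On a real target the five addresses would come from symbols defined in the linker script; here they are ordinary pointer parameters with hypothetical names, so that the routine can be exercised on a host with plain arrays standing in for flash and RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from their load address (flash) to
 * their run address (RAM), then zero-fill .bss. The end pointers are
 * exclusive, i.e. one past the last word of each section. */
static void init_ram_sections(const uint32_t *data_load,
                              uint32_t *data_start, uint32_t *data_end,
                              uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);

    const uint32_t *src = data_load;
    for (uint32_t *dst = data_start; dst < data_end; dst++) {
        *dst = *src++;   /* .data: copy initial values */
    }
    for (uint32_t *dst = bss_start; dst < bss_end; dst++) {
        *dst = 0u;       /* .bss: C requires these to start as zero */
    }
}
```

A startup routine would call this once, before `main()`, because C code assumes initialised globals already hold their values and uninitialised ones are zero.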

But this means the MPU/MMU has to be set up so that the virtual addresses corresponding to these areas are translated to physical addresses in RAM (translation applies to an MMU; an MPU only enforces access permissions on physical addresses). The boot code might initialise .bss with zeros. The .rodata may or may not be copied to RAM. If it is copied to RAM (for the sake of faster access), the MPU/MMU has to be told not to allow write access to this region. 

Some information about how to display these sections at the end of compilation, along with other interesting material pertaining to the Cortex-M0, can be found [here](https://mcuoneclipse.com/2013/04/14/text-data-and-bss-code-and-data-size-explained/)

## Power on self test (POST)
POST is performed by the BIOS to check hardware integrity. It detects whether RAM and certain peripherals are present and what the RAM size is (the user may add or remove RAM, daughter cards, hard disks, etc. between startups), and provides a UI for changing boot priority, etc. It may also check that the processor fan is working and that the temperature isn't too high. It may check for a collection of errors and signal them with what are called beep codes. Typically these result from the failure of tests, such as a flash bootloader (third stage of boot) parity error, RAM corruption in the first 64KB, etc. 

POST shouldn't be confused with BIST (Built-In Self-Test), which is a testing facility built into ICs. BIST is mainly used by machines to check whether a given part is good or bad. 

# Program execution with OS
This is similar to what was described in the previous section, except that the boot code switches execution to the OS instead of to the firmware. The OS will start executing in privileged mode or handler mode. 

If the OS itself is an RTOS, the tasks themselves wouldn't have 

If the OS is a general purpose OS (GPOS) like Linux or Windows, then it will most likely run on a CPU that has an MMU. It will also likely load the entire firmware into RAM (.text included) and use the MMU to enforce where .bss and .data are located, read/write permissions, etc. The compiler and linker of the firmware can assume that it will be executing from RAM with a contiguous 4GB address space (if the PC and SP are 32-bit registers) all to itself, and simply locate .bss and .data along with .text in that contiguous space. Things actually become quite simple in a GPOS due to the MMU. 
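
To make the MMU's role concrete, here is a toy single-level page table written as ordinary C. It is only a sketch: the 4 KiB page size, the `pte_t` layout, and the function name `mmu_translate` are assumptions made for illustration, and real MMUs use multi-level tables and hardware walkers rather than a function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* 4 KiB pages (an assumption) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical page table entry. */
typedef struct {
    uint32_t frame;     /* physical frame number */
    bool     valid;     /* is this page mapped at all? */
    bool     writable;  /* false for .text/.rodata, true for .data/.bss */
} pte_t;

/* Translate a virtual address to a physical one.
 * Returns false (a "fault") if the page is unmapped or the access is a
 * write to a read-only page. */
static bool mmu_translate(const pte_t *table, size_t entries,
                          uint32_t vaddr, bool is_write, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);

    uint32_t vpn    = vaddr >> PAGE_SHIFT;        /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* offset inside page */

    if (vpn >= entries || !table[vpn].valid) {
        return false;
    }
    if (is_write && !table[vpn].writable) {
        return false;
    }
    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return true;
}
```

With one read-only entry for code and one writable entry for data, the program sees contiguous addresses starting at 0 while the OS is free to place the frames anywhere in physical RAM.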

## Context switching
The root of an OS's operation is interrupts. The OS sets up a counter (on ARM Cortex-M, the SysTick timer) that periodically triggers an ISR (let us call it the tick ISR). The CPU runs in handler mode whenever this interrupt is triggered. When a return from interrupt is executed (*reti* on AVR; on Cortex-M, an exception return such as `bx lr` with an EXC_RETURN value in LR), the CPU switches back to thread mode, which is typically unprivileged for user tasks. 

In ARM processors, the SysTick counter is used for the tick ISR. Let us say there are two tasks, A and B, and A is currently running. When the tick ISR is triggered, the hardware automatically pushes task A's PC onto task A's stack (Cortex-M hardware also stacks R0-R3, R12, LR, and xPSR), and then enters the ISR. In GCC for AVR, which the FreeRTOS example in the references uses, the ISR has the attributes *signal* and *naked*: *signal* tells the compiler that it is an ISR (and not a regular function), and *naked* tells it to generate no entry or exit code, because the ISR will push the processor registers itself (see next paragraph). If there is only the *signal* attribute, the compiler generates the context-saving code itself. It looks like this:

    void tick_ISR( void ) __attribute__ ( ( signal, naked ) );


In the ISR, the OS pushes A's context, i.e., the processor registers with the exception of the stack pointer, onto A's stack. The OS then copies A's stack pointer into a struct (or basically a memory block) managed by the OS (called the Process Control Block or Thread Control Block). Then the OS copies task B's stack pointer value (which it saved the last time it switched out of B) into the stack pointer register, pops the processor context from B's stack into the processor's registers, and executes a return from interrupt (*reti* on AVR; an exception return on ARM Cortex-M), which pops the PC from B's stack into the PC register and starts executing task B.
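
The save/restore steps can be simulated in plain C. This is a sketch under loud assumptions: the "CPU" is a struct with a four-entry register file, the PC is left out (the hardware and the return-from-interrupt handle it), stacks are full-descending arrays, and all names (`sim_cpu_t`, `sim_tcb_t`, `task_init`, `context_switch`) are hypothetical. A real port does this in assembly inside the tick ISR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS    4u   /* tiny register file, for illustration only */
#define STACK_WORDS 16u

typedef struct {
    uint32_t  regs[NUM_REGS];
    uint32_t *sp;              /* points at the last pushed word */
} sim_cpu_t;

typedef struct {
    uint32_t  stack[STACK_WORDS];
    uint32_t *saved_sp;        /* SP stored while the task is switched out */
} sim_tcb_t;

static void push(sim_cpu_t *cpu, uint32_t value) { *--cpu->sp = value; }
static uint32_t pop(sim_cpu_t *cpu) { return *cpu->sp++; }

/* Give a new task an initial saved context so that the first switch
 * into it has something to pop. */
static void task_init(sim_tcb_t *task, uint32_t initial_reg_value)
{
    assert(task != NULL);
    uint32_t *sp = &task->stack[STACK_WORDS];
    for (unsigned i = 0; i < NUM_REGS; i++) {
        *--sp = initial_reg_value;
    }
    task->saved_sp = sp;
}

/* Body of the simulated tick ISR: save "from", restore "to". */
static void context_switch(sim_cpu_t *cpu, sim_tcb_t *from, sim_tcb_t *to)
{
    assert(cpu != NULL && from != NULL && to != NULL);
    for (unsigned i = 0; i < NUM_REGS; i++) {
        push(cpu, cpu->regs[i]);          /* save registers on from's stack */
    }
    from->saved_sp = cpu->sp;             /* remember where they are */
    cpu->sp = to->saved_sp;               /* switch to the other stack */
    for (unsigned i = NUM_REGS; i-- > 0; ) {
        cpu->regs[i] = pop(cpu);          /* restore in reverse push order */
    }
}
```

Switching A to B and later B to A leaves A's registers exactly as they were when A was interrupted, which is all a task needs in order to be unaware that it was ever paused.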

Apart from the tick ISR, there are other interrupts used to get out of user mode into the OS handler mode. For instance, a user program may be blocked due to the non-availability of a resource and may want to invoke the OS (so it can switch to another task). Or a user program may need to use a service that the OS provides (like opening/closing files, writing to hardware registers using kernel drivers, etc.). In these cases, the user code triggers a software interrupt/exception (SWI, named SVC on newer ARM cores), which is called "triggering a system call". This executes the corresponding ISR (and the CPU automatically goes into handler mode), where the OS code will do what is needed. 
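
Inside the software-interrupt ISR, the OS usually has to work out which service was requested and route the call. A minimal sketch of such a dispatcher is shown below; the call numbers, the two example services, and the names `syscall_table` and `svc_dispatch` are all hypothetical, and the step of extracting the call number from the trapping instruction or a register is hardware specific and omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every service takes one argument and returns a status or value. */
typedef int32_t (*syscall_fn)(uint32_t arg);

enum { SYS_GET_TICKS = 0, SYS_ADD_ONE = 1, SYS_COUNT };

static uint32_t g_ticks = 1234u;   /* stand-in for a kernel tick counter */

static int32_t sys_get_ticks(uint32_t arg)
{
    (void)arg;                     /* this service ignores its argument */
    return (int32_t)g_ticks;
}

static int32_t sys_add_one(uint32_t arg)
{
    return (int32_t)(arg + 1u);
}

/* Table indexed by system call number. */
static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_GET_TICKS] = sys_get_ticks,
    [SYS_ADD_ONE]   = sys_add_one,
};

/* What the ISR would call once it knows the call number and argument.
 * Unknown numbers are rejected with -1 instead of being trusted. */
static int32_t svc_dispatch(uint32_t number, uint32_t arg)
{
    if (number >= (uint32_t)SYS_COUNT) {
        return -1;
    }
    assert(syscall_table[number] != NULL);
    return syscall_table[number](arg);
}
```

Validating the number before indexing the table matters because the value comes from unprivileged code, while the dispatcher itself runs with full privileges.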

Typically, the OS keeps a struct for each task that contains information about the task including (but not limited to) its current state (running, blocked, ready, etc.), the semaphores, queues, etc. that it might be using to communicate with other tasks, and its priority. Whenever a task is created (in FreeRTOS, with the xTaskCreate... call), the OS code dynamically allocates memory to be used as that task's stack (its size is one of xTaskCreate's parameters). 
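
A per-task struct and its creation routine might look roughly like the sketch below. This is not FreeRTOS's actual TCB or the implementation of xTaskCreate; `tcb_t`, `task_create`, and `task_delete` are hypothetical names, the struct holds only a few of the fields mentioned above, and the stack is assumed to grow downwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state_t;

/* Hypothetical, heavily reduced task control block. */
typedef struct {
    task_state_t state;
    uint8_t      priority;
    uint32_t    *stack_base;   /* lowest address of the allocated stack */
    uint32_t    *stack_ptr;    /* current top of stack (grows downwards) */
    size_t       stack_words;
} tcb_t;

/* Allocate a TCB and a stack of the requested size.
 * Returns NULL if the size is zero or memory runs out. */
static tcb_t *task_create(uint8_t priority, size_t stack_words)
{
    if (stack_words == 0u) {
        return NULL;
    }
    tcb_t *task = malloc(sizeof *task);
    if (task == NULL) {
        return NULL;
    }
    task->stack_base = malloc(stack_words * sizeof *task->stack_base);
    if (task->stack_base == NULL) {
        free(task);
        return NULL;
    }
    task->state       = TASK_READY;
    task->priority    = priority;
    task->stack_words = stack_words;
    task->stack_ptr   = task->stack_base + stack_words;  /* empty stack */
    return task;
}

/* Release the stack and the TCB. */
static void task_delete(tcb_t *task)
{
    if (task != NULL) {
        free(task->stack_base);
        free(task);
    }
}
```

The `stack_ptr` field is the slot where a context switch would store the task's stack pointer while the task is not running.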

# References
1. [How to boot a Cortex M7 system](https://developer.arm.com/documentation/ka001193/latest)
2. [Booting a bare metal system](https://developer.arm.com/documentation/den0013/d/Boot-Code/Booting-a-bare-metal-system)
3. [C Compiler](https://www.ele.uva.es/~jesus/hardware_empotrado/Compiler.pdf) 
4. [Bare metal C part 1](https://interrupt.memfault.com/blog/zero-to-main-1)
5. [Bare metal C part 2](https://interrupt.memfault.com/blog/how-to-write-linker-scripts-for-firmware)
6. [Multi-threading vs. Multi-processing. Aka General purpose OS vs. RTOS](https://www.digikey.in/en/maker/projects/what-is-a-realtime-operating-system-rtos/28d8087f53844decafa5000d89608016)
7. [GPOS and MMU](https://blogs.sw.siemens.com/embedded-software/2019/09/16/do-you-need-a-memory-management-unit/)
8. [Context switching in RTOS (freeRTOS)](https://www.freertos.org/implementation/a00018.html)
9. [Linker scripts](https://www.ele.uva.es/~jesus/hardware_empotrado/Compiler.pdf)

