Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MiniLoad fails on latest PI 4 8G #79

Closed
mikegil opened this issue Oct 28, 2020 · 18 comments
Closed

MiniLoad fails on latest PI 4 8G #79

mikegil opened this issue Oct 28, 2020 · 18 comments

Comments

@mikegil
Copy link

mikegil commented Oct 28, 2020

I have thoroughly enjoyed your tutorials, but I couldn't get the MiniLoad to work over the UART correctly. I spent a while debugging (and learned a ton in the process), and tracked the issue down to this section:

unsafe {
    // Read the kernel byte by byte.
    for i in 0..size {
        *kernel_addr.offset(i as isize) = console().read_char() as u8;
        // console().read_char() as u8;
    }
}

(from main.rs)

If I comment out the memory assignment and just throw away each byte as I read it (the commented line) then all is well, except that obviously it doesn't actually boot the new kernel. If I leave the code as shown however, the PI 4 will hang (it will never complete this loop) and I get no information back on why. I am assuming that it is some kind of memory protection issue and perhaps it is new to the latest PI 4 version? Anyway, I don't know how to get around this and I'm hoping some brighter spark out there might. If it matters, I am cross-compiling from a Mac running Mac OS X 10.15.7 and using rustc 1.49.0-nightly (25c8c53dd 2020-10-03)

@andre-richter
Copy link
Member

Hmm, interesting.

We‘ve had it a couple of times already that newer PI versions and/or their firmware versions require some adaptations somewhere, e.g. config.txt. I think you’re the first one reporting issues with an 8 GiB version. I have a 4 GiB.

Did you double check the config.txt already?

Does it work to put one of the later tutorials directly on the SD card and boot it? Is it correctly echoing back the characters as you type them, or can you see some corruption?

@mikegil
Copy link
Author

mikegil commented Oct 28, 2020

Yep, checked config.txt and it looked good:

arm_64bit=1
init_uart_clock=48000000

I tried disabling the splash to see if I'd get an informative message (I didn't), and explicitly setting the initial baud rate to 230400 which didn't help (I didn't expect it to). I've tried copying each of the kernels from exercise 08 and 09 over via the SD card and they run as expected. That's when I added some code to Ex. 07's kernel to reflect back the file size it's expecting (it was correct), read each of the chars throwing them away and then send back 'OK' before continuing...all of that works and the OK is received back on my host computer.

I tried reflecting a '.' for each char it reads and letting it execute:

*kernel_addr.offset(i as isize) = console().read_char() as u8;

as per the original code and I do get a number of them back on the host before it all hangs, but nowhere near the 12,144 I'm expecting so I am assuming we have some weird memory violation going on, but in fact that may not be the case as the PI may be sending all of the right info an crashing before the UART has actually sent it. When I get back to it this evening I will see if I can put in some kind of delay between the memory loading loop and the attempt to call the loaded kernel to see if I get the [ML] Loaded! message in that case. Any debugging ideas you have will be warmly accepted ;-)

@andre-richter
Copy link
Member

Hmm, strange.

For starters, can you try with the following:

diff --git a/07_uart_chainloader/src/main.rs b/07_uart_chainloader/src/main.rs
index 808e5c7..416af27 100644
--- a/07_uart_chainloader/src/main.rs
+++ b/07_uart_chainloader/src/main.rs
@@ -122,6 +122,8 @@ mod relocate;
 mod runtime_init;
 mod synchronization;
 
+use cortex_a::barrier;
+
 /// Early init code.
 ///
 /// # Safety
@@ -191,6 +193,11 @@ fn kernel_main() -> ! {
     // Use black magic to get a function pointer.
     let kernel: extern "C" fn() -> ! = unsafe { core::mem::transmute(kernel_addr as *const ()) };
 
+    unsafe {
+        barrier::dsb(barrier::SY);
+        barrier::isb(barrier::SY);
+    }
+
     // Jump to loaded kernel!
     kernel()
 }

If it doesn't help, you could try to incorporate the additions to Tutorial 12 (exceptions) into tutorial 7, so that you would see an exception dump when a CPU exception happens.
It would be important that you only port the exception stuff, none of the earlier additions like virtual memory. We would want no active MMU. But that exercise would be quite a stretch...

@mikegil
Copy link
Author

mikegil commented Oct 28, 2020

Quick update...I will try the above when I get a moment, but I don't think that will help...adding a delay after the load loop did not help (even if the delay was effectively infinite - waiting on input) so I tried loading smaller kernels and bingo, the problem is related to the size of the kernel. Kernel's from 01 and 02 work (8 bytes and 320 bytes), but the Kernel from 03 (5,431 bytes) fails before the memory load completes. I have changed the packet size from the 512 byte slices to 16, 1024, 4096 and even everything in one go without success. I am certain that the actual TX works, because if I comment out the memory load and just read the chars I get the right number of characters, so the only thing I can think of next is that it is violating some memory access restriction when it hits a particular address (seems weird but hey?). Alas I am no Embedded expert. I will try your hint above anyway and see what happens and then I will try adding the exception stuff - I might learn something either way ;-).

To be clear: above I am referring to using Kernel 07 to transmit Kernel 01, 02, and 03 to the target...01 & 02 work as you would expect...load happens then they do 'nothing' as they are supposed to, 03 never finishes the load

@mikegil
Copy link
Author

mikegil commented Oct 28, 2020

I did get 5 minutes to try the barrier concept...alas no joy. I believe that it is crashing during the memory load loop so at least according to my theory it would never hit these statements anyway. I'll update if I get any further in my investigations.

@mikegil
Copy link
Author

mikegil commented Oct 28, 2020

Random theory: I believe this code loads at the standard Kernel load address and then relocates itself elsewhere so we can load the new kernel in the same spot. It's behaving like it hasn't jumped the execution to the relocated code and hence is overwriting its own code as it loads the new kernel. Is there an easy way to print the current execution address to the console (before we overwrite with the new kernel)? If so I can prove/disprove this idea.

@andre-richter
Copy link
Member

There is some "good news":

It actually reproduces on my Raspberrys as well. It must have been one of the recent maintenance changes I did, or some code size bloat due to compiler version bump.

Anyways, I can debug this now as well. But it might take a bit of time.

@andre-richter
Copy link
Member

andre-richter commented Oct 28, 2020

You were right, the jump to the relocated binary did not happen.
It seems the compiler got smarter and did not fall for my trait-object mumbo jumbo anymore when jumping to the relocated binary, and I failed to notice.
That would also explain why it still worked for very small kernels. They would not overwrite deep enough to clash with the currently executing code.

I uploaded a fix that hopefully unblocks you for now, together with some other clean-ups.
Using discrete pointer arithmetic now will make it a bit more obvious what's going on anyways. I might clean it up a bit more in the future, but this should do for now.

Thanks again for reporting this!

@mikegil
Copy link
Author

mikegil commented Oct 28, 2020

That works....thanks!

@abdes
Copy link

abdes commented Oct 28, 2020

The fix works

@abdes
Copy link

abdes commented Oct 29, 2020

...but the latest commit eb69b6f(More chainloader cleanup) broke it again for rpi4.

@andre-richter
Copy link
Member

That's odd. I tested it on my Pi4 and it works.

Can you give me your rustc -V, and maybe you can attach an objdump for me to look at, but please change the objdump arguments from --disassemble to --disassemble-all beforehand.

Thanks.

@abdes
Copy link

abdes commented Oct 29, 2020

[] ~/Projects/rust-raspberrypi-OS-tutorials/07_uart_chainloader <master> ✗ rustc -V
rustc 1.49.0-nightly (ffa2e7ae8 2020-10-24)

objdump.txt

Built from master.

[] ~/Projects/rust-raspberrypi-OS-tutorials/07_uart_chainloader <master> ✗ BSP=rpi4 make                      
[] ~/Projects/rust-raspberrypi-OS-tutorials/07_uart_chainloader <master> ✗ BSP=rpi4 make objdump              
[] ~/Projects/rust-raspberrypi-OS-tutorials/07_uart_chainloader <master> ✗ cp kernel8.img /media/abdessattar/RPI4BOOT
[] ~/Projects/rust-raspberrypi-OS-tutorials/07_uart_chainloader <master> ✗ BSP=rpi4 make chainboot 
Minipush 1.0


[MP] ✅ Connected

[MP] ⚡ Protocol Error: Remove and insert the USB serial again

Unplug -> Replug

[MP] ⏳ Waiting for /dev/ttyUSB0
[MP] ✅ Connected

[MP] ⚡ Protocol Error: Remove and insert the USB serial again

The board:
IMG_1441

@mikegil
Copy link
Author

mikegil commented Oct 29, 2020 via email

@andre-richter
Copy link
Member

@abdes, two things:

  1. From your objdump, I see you are using aarch64-unknown-none and not aarch64-unknown-none-softfloat. That is a ticking time bomb, because it could hang your kernel anytime the compiler stumbles over something that it can auto-vectorize. The current kernel does not enable the features needed to run the emitted vectorized code, which would cause a synchronous exception that manifests as a silent hang of the kernel.
  2. Can you check if 19763f8 fixed your problem?

@abdes
Copy link

abdes commented Oct 29, 2020

Works now, from master, with aarch64-unknown-none-softfloat.

$ BSP=rpi4 make chainboot
Minipush 1.0


[MP] ✅ Connected

[MP] ⚡ Connection Error: Reinsert the USB serial again
[MP] ⏳ Waiting for /dev/ttyUSB0
[MP] ✅ Connected
 __  __ _      _ _                 _ 
|  \/  (_)_ _ (_) |   ___  __ _ __| |
| |\/| | | ' \| | |__/ _ \/ _` / _` |
|_|  |_|_|_||_|_|____\___/\__,_\__,_|

           Raspberry Pi 4            

[ML] Requesting binary
[MP] ⏩ Pushing 6 KiB ==========================================🦀 100% 0 KiB/s Time: 00:00:00
[ML] Loaded! Executing the payload now

[0] Booting on: Raspberry Pi 4
[1] Drivers loaded:
      1. BCM GPIO
      2. BCM PL011 UART
[2] Chars written: 93
[3] Echoing input now

Thanks for your help. I was not really aware of the full impact of not using -softfloat.

@abdes
Copy link

abdes commented Oct 29, 2020

Btw, the main problem was softfloat but I found the changes you made in 19763f8 are also good; make things much simpler for the linker layout.

@andre-richter
Copy link
Member

@abdes:

Btw, the main problem was softfloat

Right, upon further looking into the objdump, I see a couple of lines that use floating-point registers, like here in memcpy (q0):

 2001a5c:      	ldr	q0, [sp, #0x10]

I haven't checked, but I'm, quite sure that this will cause a trap, because the processor is in EL2 state and most likely, floating point has not been enabled in CPTR_EL2.FPEN by the previous firmware.

memcpy is used by the printing functions, so this is why the kernel will hang before seeing any character printed.
The memcpy provided by softfloat won't utilize the FP registers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants