Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about aarch64 memory access #10

Closed
binaryfields opened this issue Jan 21, 2019 · 4 comments
Closed

Question about aarch64 memory access #10

binaryfields opened this issue Jan 21, 2019 · 4 comments

Comments

@binaryfields
Copy link

binaryfields commented Jan 21, 2019

I am running into an interesting issue where a slightly modified version of 05_uart0 example keeps hanging on my rpi3 hardware.

I modified the example and for clarify moved code to get the board serial number into its own function. The function takes a slice (&mut [u32]) to fill in the part of mbox buffer that represents request data and uses it to collect response output.

The issue I'm running into is around how my slice gets copied into the mbox buffer.

If I simply do a loop and iterate over slice values, it works fine.

    for i in 0..data.len() {
       mbox.buffer[5 + i] = data[i];
    }

However, if I use rust's copy_from_slice function, the program hangs

mbox.buffer[5..7].copy_from_slice(&data[0..2]);

This feels to me like some unaligned access issue but I'm not sure how to further debug it. Is there anything I'm doing wrong when trying to use copy_from_slice?

Thanks
-Seb

Here is the complete modified source code for main.rs.

#![no_std]
#![no_main]
#![feature(asm)]

const MMIO_BASE: u32 = 0x3F00_0000;

mod gpio;
mod mbox;
mod uart;

use core::sync::atomic::{compiler_fence, Ordering};

fn kernel_entry() -> ! {
    let mut mbox = mbox::Mbox::new();
    let uart = uart::Uart::new();

    // set up serial console
    match uart.init(&mut mbox) {
        Ok(_) => uart.puts("\n[0] UART is live!\n"),
        Err(_) => loop {
            unsafe { asm!("wfe" :::: "volatile") }; // If UART fails, abort early
        },
    }

    uart.puts("[1] Press a key to continue booting... ");
    uart.getc();
    uart.puts("Greetings fellow Rustacean!\n");

    let mut data = [0, 0];
    let serial_avail = get_serial(&mut mbox, &mut data[0..2]);
    if serial_avail {
        uart.puts("[i] My serial number is: 0x");
        uart.hex(data[1]);
        uart.hex(data[0]);
        uart.puts("\n");
    } else {
        uart.puts("[i] Unable to query serial!\n");
    }

    // echo everything back
    loop {
        uart.send(uart.getc());
    }
}

#[inline(never)]
fn get_serial(mbox: &mut mbox::Mbox, data: &mut [u32]) -> bool {
    // get the board's unique serial number with a mailbox call
    mbox.buffer[0] = 8 * 4; // length of the message
    mbox.buffer[1] = mbox::REQUEST; // this is a request message
    mbox.buffer[2] = mbox::tag::GETSERIAL; // get serial number command
    mbox.buffer[3] = 8; // buffer size
    mbox.buffer[4] = 8;
    mbox.buffer[5..7].copy_from_slice(&data[0..2]);
    //for i in 0..data.len() {
      //  mbox.buffer[5 + i] = data[i];
    //}
    //mbox.buffer[5] = data[0]; // clear output buffer
    //mbox.buffer[6] = data[1];
    mbox.buffer[7] = mbox::tag::LAST;

    // Insert a compiler fence that ensures that all stores to the
    // mbox buffer are finished before the GPU is signaled (which is
    // done by a store operation as well).
    compiler_fence(Ordering::Release);

    // send the message to the GPU and receive answer
    match mbox.call(mbox::channel::PROP) {
        Err(_) => false,
        Ok(()) => {
            //data[0..2].copy_from_slice(&mbox.buffer[5..7]);
            for i in 0..data.len() {
                data[i] = mbox.buffer[5 + i];
            }
            //data[0] = mbox.buffer[5];
            //data[1] = mbox.buffer[6];
            true
        },
    }
}
raspi3_boot::entry!(kernel_entry);
@binaryfields
Copy link
Author

The asm output for the 1st variant (loop over values) generates 32-bit load/stores

   801d8:   ldr   w8, [x1]
   801dc:   movk  w21, #16128, lsl #16
   801e0:   str   w8, [x0, #20]
   801e4:   ldr   w8, [x1, #4]
   801e8:   stp   w8, wzr, [x0, #24]

The second variant (copy_from_slice) seems to use one 64-bit load/store

   801d8:   ldr   x8, [x1]
   801dc:   movk  w21, #16128, lsl #16
   801e0:   str   wzr, [x0, #28]
   801e4:   stur  x8, [x0, #20]

I'm not sure why one would work and not the other.

@andre-richter
Copy link
Member

andre-richter commented Jan 21, 2019

Hi,

the cluprit is 801e4: stur x8, [x0, #20], resulting from mbox.buffer[5..7].copy_from_slice(&data[0..2]);, which is indeed an unaligned data access. It tries to store a 64bit word to a non-64bit-aligned address (20%8 != 0).

This causes a synchronous exception in AArch64, that's why your RPi3 hardware just halts.
If you port your code to chapter 0E and insert it after vector tables have been set up, you would get prints from the exception handler pointing you directly to this instruction address. You need to disable the MMU setup though (I opened #11 to remind myself to investigate this).

@binaryfields
Copy link
Author

Ah, thank you for the pointer and the explanation. That makes perfect sense now. Exceptions are next on my list so I guess now will be as good of a time as ever to jump into them.

Big thanks for the work on this repo and other embedded crates. They made the experience of porting my C64 emulator to bare-metal RPI3 quite enjoyable!

Cheers,

@andre-richter
Copy link
Member

Glad to hear, thanks 👍

I'll go ahead and close this, and try to get +strict-align upstream so this doesn't happen anymore.

bors added a commit to rust-lang/rust that referenced this issue Feb 2, 2019
targets: aarch64-unknown-none: Add +strict-align

On AArch64, an unaligned access causes a synchronous exception. In the current
state of the target, the compiler might generate unaligned accesses, see
rust-embedded/rust-raspberrypi-OS-tutorials#10.

Since this is a bare-metal target, it is possible that there is no exception
handling in place (yet) to recover from this case, causing a binary to just
silently fail.

Add `+strict-align` to avoid this case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants