Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(SHA256): Add initial hardware acceleration impl for SHA256 #19

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AnthonyGrondin
Copy link
Contributor

Description

This adds an initial support to use the hardware accelerated drivers for sha256 operations.
It is currently in a PoC phase.

It is my first time writing a Rust implementation for FFI bindings, so feel free to provide as much feedback as needed.

Current issue.

I am currently stuck on whether or not this implementation is safe and sound, because it successfully passes the self_tests, but it will fail to run the examples in a real_case scenario, triggering Store/AMO access fault errors and such.

Building / Bindings

You can re-compile mbedtls and generate new bindings using the xtask to work with the sha256_alt..

It's not obligatory since I've edited mbedtls and included the compiled libraries in this PR to provide a temporary fix for #13, where it would not print if a test has passed or failed

Testing

  1. You first need to patch esp-wifi to be compatible with the breaking changes from Implement embedded_hal_async::delay::DelayUs trait for SYSTIMER alarms esp-hal#812
    Here's a patch that works for me, for a quick copy paste with the current origin:
diff --git a/esp-wifi/src/timer_esp32c3.rs b/esp-wifi/src/timer_esp32c3.rs
index bd05624..3efb8b0 100644
--- a/esp-wifi/src/timer_esp32c3.rs
+++ b/esp-wifi/src/timer_esp32c3.rs
@@ -20,9 +20,9 @@ static ALARM0: Mutex<RefCell<Option<Alarm<Periodic, 0>>>> = Mutex::new(RefCell::
 
 pub fn setup_timer_isr(systimer: Alarm<Target, 0>) {
     let alarm0 = systimer.into_periodic();
-    alarm0.set_period(TIMER_DELAY.into());
-    alarm0.clear_interrupt();
-    alarm0.interrupt_enable(true);
+    alarm0.set_period(TIMER_DELAY.into_duration());
+    alarm0.interrupt_clear();
+    alarm0.enable_interrupt(true);
 
     critical_section::with(|cs| ALARM0.borrow_ref_mut(cs).replace(alarm0));
 
@@ -160,7 +160,7 @@ fn BT_BB(_trap_frame: &mut TrapFrame) {
 fn SYSTIMER_TARGET0(trap_frame: &mut TrapFrame) {
     // clear the systimer intr
     critical_section::with(|cs| {
-        unwrap!(ALARM0.borrow_ref_mut(cs).as_mut()).clear_interrupt();
+        unwrap!(ALARM0.borrow_ref_mut(cs).as_mut()).interrupt_clear();
     });
 
     task_switch(trap_frame);
@@ -179,8 +179,8 @@ fn FROM_CPU_INTR3(trap_frame: &mut TrapFrame) {
         let mut alarm0 = ALARM0.borrow_ref_mut(cs);
         let alarm0 = unwrap!(alarm0.as_mut());
 
-        alarm0.set_period(TIMER_DELAY.into());
-        alarm0.clear_interrupt();
+        alarm0.set_period(TIMER_DELAY.into_duration());
+        alarm0.interrupt_clear();
     });
 
     task_switch(trap_frame);
  1. I've added the crypto_self_test example to test current and future crypto acceleration implementations
    How to run (esp32c3):
    cargo run --release --example crypto_self_test --target riscv32imc-unknown-none-elf -F esp32c3,esp-wifi/wifi-logs

- Currently, esp-wifi needs to be patched to work with this hal revision
@bjoernQ
Copy link
Collaborator

bjoernQ commented Oct 9, 2023

Are the exceptions raised in copy_nonoverlapping? Maybe the alignment doesn't match when running the example but they happen to fit in the test

@AnthonyGrondin
Copy link
Contributor Author

This is a chain of unsafe operations, and I think many of them are causing the issue.
It first panic at:

let hasher = Sha::new(&mut (*ctx).peripheral, mode);
when it borrows the peripheral from the context to create a sha.

If I replace this by a static variable that contains SHA from peripheral, instead of stealing it in _init(), it doesn't overflow there anymore.

It then overflows where I try to save the hasher in the context, at this line:

(*ctx).hasher = hasher;

@bjoernQ
Copy link
Collaborator

bjoernQ commented Oct 9, 2023

I had a look and I think I partly know what is going wrong:

typedef struct mbedtls_sha256_context {
  void* peripheral;
  void* hasher;
} mbedtls_sha256_context;

is not the same as

#[repr(C)]
pub struct mbedtls_sha256_context<'a> {
    peripheral: SHA,
    hasher: Sha<'a>,
}

On the C side there are just two pointers while on the Rust side there is a ZST and something which is definitely not ZST.

I tried something and came up with

use crate::hal::peripherals::SHA;
use crate::hal::sha::Sha;
use crate::hal::sha::ShaMode;
use esp_mbedtls_sys::c_types::*;
use nb::block;

#[repr(C)]
pub struct mbedtls_sha256_context<'a> {
    peripheral: *mut SHA,
    hasher: *mut Sha<'a>,
}

#[no_mangle]
pub unsafe extern "C" fn mbedtls_sha256_init(ctx: *mut mbedtls_sha256_context) {
    let hasher_mem = crate::calloc(1, core::mem::size_of::<Sha<'_>>() as u32) as *mut Sha<'_>;
    (*ctx).hasher = hasher_mem;
}

#[no_mangle]
pub unsafe extern "C" fn mbedtls_sha256_free(ctx: *mut mbedtls_sha256_context) {
    if !ctx.is_null() && !(*ctx).hasher.is_null() {
        crate::free((*ctx).hasher as *const u8);
        (*ctx).hasher = core::ptr::null_mut();
    }
}

#[no_mangle]
pub unsafe extern "C" fn mbedtls_sha256_clone<'a>(
    dst: *mut mbedtls_sha256_context<'a>,
    src: *const mbedtls_sha256_context<'a>,
) {
    core::ptr::copy_nonoverlapping(src, dst, 1);
    mbedtls_sha256_init(dst);
    core::ptr::copy((*src).hasher, (*dst).hasher, 1);
}

#[allow(unused_variables)]
#[no_mangle]
pub unsafe extern "C" fn mbedtls_sha256_starts(
    ctx: *mut mbedtls_sha256_context,
    is224: c_int,
) -> c_int {
    #[cfg(not(feature = "esp32"))]
    let mode = if is224 == 1 {
        ShaMode::SHA224
    } else {
        ShaMode::SHA256
    };

    #[cfg(feature = "esp32")]
    let mode = ShaMode::SHA256;

    let hasher = Sha::new(SHA::steal(), mode);
    core::ptr::copy(&hasher as *const _, (*ctx).hasher, 1);
    0
}

#[no_mangle]
pub unsafe extern "C" fn mbedtls_sha256_update(
    ctx: *mut mbedtls_sha256_context,
    input: *const c_uchar,
    ilen: usize,
) -> c_int {
    let slice = core::ptr::slice_from_raw_parts(input as *const u8, ilen as usize);
    let mut remaining = &*slice;

    while remaining.len() > 0 {
        remaining = block!((*ctx).hasher.as_mut().unwrap().update(remaining)).unwrap();
    }
    0
}

#[no_mangle]
pub unsafe extern "C" fn mbedtls_sha256_finish(
    ctx: *mut mbedtls_sha256_context,
    output: *mut c_uchar,
) -> c_int {
    let mut data = [0u8; 32];
    block!((*ctx).hasher.as_mut().unwrap().finish(&mut data)).unwrap();
    core::ptr::copy_nonoverlapping(data.as_ptr(), output, data.len());
    0
}

This is not ideal but I tried to avoid changing the C side. You can just ignore peripheral: SHA since it's ZST and stealing it is a no-op. In the above code I dynamically allocate memory for the Sha struct and use that (that is the part which is not that good - would be better to let mbedtls allocate a suitable context and just use that)

With the above code the tests still work and the example runs into -29056 (MBEDTLS_ERR_SSL_INVALID_MAC) - I guess there is something wrong with the cryptographic functionality.

Especially I wonder about the clone function - if mbedtls clones a context and later wants to resume working with that, the hardware is in a different state than what is needed to resume working (hope that makes sense), Even without cloning it's a problem since mbedtls might create a few contexts and use them in any order (without finishing any of them).

e.g. esp-idf does that here: https://github.com/espressif/esp-idf/blob/master/components/mbedtls/port/sha/block/esp_sha256.c#L106 - especially in esp_sha_write_digest_state

@AnthonyGrondin
Copy link
Contributor Author

Hi,

Thanks for the feedback and help. We can always change the C side headers, as we are providing the alternative context. If there are better ways to represent the context in C, we should do it right that way.

Yeah, I'm not sure about the handling of the clone either.

I've also looked at what we write in the output in _finish() to see if we put in the right data, and the values are the same as the data slice. Which is expected, since the self-tests pass.

I think the main difference between self_tests and a real operation, is that self_tests don't use clone, while doing a request clones the context a few times.

I'm seeing that the C implementation in esp-idf linked above is quite different than the current one, I don't know how close we should be to that implementation.

I've ran sync_client_mTLS using the changes above, and I'm getting the following output. It looks like it runs out of memory.

We are connected!
Making HTTP request
Start tls connect
WARN - no Tx token available
WARN - no Tx token available
WARN - no Tx token available
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 2312 bytes
WARN - Unable to allocate 524 bytes
WARN - Unable to allocate 1036 bytes
WARN - Unable to allocate 1036 bytes
 
 
!! A panic occured in 'examples/sync_client_mTLS.rs', at line 144, column 33
 
PanicInfo {
    payload: Any { .. },
    message: Some(
        called `Result::unwrap()` on an `Err` value: MbedTlsError(-28160),
    ),
    location: Location {
        file: "examples/sync_client_mTLS.rs",
        line: 144,
        col: 33,
    },
    can_unwind: true,
}

@bjoernQ
Copy link
Collaborator

bjoernQ commented Oct 10, 2023

That's definitely an improvement then. I guess the general idea of the code here is fine so no need to worry about esp-idf looking quite different. I think the real remaining problem is that the driver needs a way to "resume" an operation and switch between them. Technically I think we just need to restore some state but the way the driver works currently (IIRC) might require some API changes (i.e. there is currently no safe way to create multiple instances of it)

We can then make mbedtls_sha256_context in the C header fit the hasher so that we don't need additional allocations (and let mbedtls manage that for us)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants