async rand_core trait #1155

newAM · 2021-08-04T16:21:31Z

Background

The embedded-hal crate provides traits for embedded hardware. In #291 embedded-hal removed their RNG trait in favor of rand_core. embedded-hal is now working on adding async traits using GATs in #285.

This issue is really more of a question than a feature request: Is rand_core the appropriate place to add async RNG traits? If there are use cases for async RNG traits outside of embedded Rust, I think it would be a good idea to include this in rand_core; otherwise it would probably be more appropriate for embedded-hal.

What is your motivation?

Async development on embedded targets has been progressing nicely, and with GAT stabilization on the way it will soon be possible to write #![no_std]-friendly async traits on stable Rust.

HW RNG peripherals are fast, but can still benefit from async methods that allow the CPU to yield to other tasks while waiting for the hardware to generate entropy. On the STM32WL (ARM Cortex-M4), generating a [u32; 4] block of entropy takes ~2662 CPU cycles on a cold boot, and ~412 cycles once a steady state is reached. An async context switch takes a minimum of 62 cycles using the embassy executor.

What type of application is this?

Hardware entropy generation for embedded development on #![no_std] targets.

Feature request

Add an async RNG trait.

This would look something like this (prior art from embassy-rs):

use core::future::Future;

/// Random-number Generator
pub trait Rng {
    type Error;

    type RngFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    /// Completely fill the provided buffer with random bytes.
    ///
    /// May result in delays if entropy is exhausted prior to completely
    /// filling the buffer. Upon completion, the buffer will be completely
    /// filled or an error will have been reported.
    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a>;
}
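
A minimal sketch of how a driver might implement the proposed trait, assuming a toolchain where generic associated types are stable. `MockRng`, `Fill`, and the counter-based output are hypothetical stand-ins for a real peripheral; they are not part of embassy-rs or rand_core.

```rust
use core::convert::Infallible;
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll};

/// The trait from the proposal, repeated so the sketch compiles on its own.
pub trait Rng {
    type Error;

    type RngFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a>;
}

/// Hypothetical stand-in for a hardware RNG peripheral.
pub struct MockRng {
    /// Number of polls before the "hardware" reports that data is ready.
    pub polls_until_ready: u32,
    /// Counter used to produce placeholder bytes (NOT random).
    pub next: u8,
}

/// Hand-written future that borrows the peripheral and the destination buffer.
pub struct Fill<'a> {
    rng: &'a mut MockRng,
    dest: &'a mut [u8],
}

impl<'a> Future for Fill<'a> {
    type Output = Result<(), Infallible>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Fill` only holds references, so it is `Unpin` and `get_mut` is allowed.
        let this = self.get_mut();
        if this.rng.polls_until_ready > 0 {
            this.rng.polls_until_ready -= 1;
            // A real driver would be woken by a data-ready interrupt; the mock
            // simply asks to be polled again.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        for byte in this.dest.iter_mut() {
            *byte = this.rng.next;
            this.rng.next = this.rng.next.wrapping_add(1);
        }
        Poll::Ready(Ok(()))
    }
}

impl Rng for MockRng {
    type Error = Infallible;
    type RngFuture<'a> = Fill<'a> where Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a> {
        Fill { rng: self, dest }
    }
}
```

While the future is pending, an executor is free to run other tasks, which is the benefit the cycle counts above are meant to motivate.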

vks · 2021-08-04T21:17:19Z

I think it makes sense to add such a trait to rand_core, just to make interoperability easier.

There are some open questions about how this would interact with the existing traits. For instance: Can we implement Rng for all types that implement AsyncRng?

josephlr · 2021-08-05T01:47:51Z

One other question is whether rand_core or rand should have any types that implement RngAsync. The only ones I can think of would be our various Rng adapters like:

- BlockRng, BlockRng64 (if the underlying R implements RngAsync)
- ReseedingRng (if the underlying Rsdr implements RngAsync)

It might actually make more sense to have an async version of BlockRngCore. This abstraction would seem to better match the underlying hardware. It would also discourage using the (presumably slow) hardware RNG directly, and instead incentivize using it through a SeedableRng or ReseedingRng.
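
One possible shape for such a trait, as a sketch only: `AsyncBlockRngCore` and `MockBlockSource` are hypothetical names, and the associated types simply mirror those of `rand_core::block::BlockRngCore` with a future-returning `generate`. The `[u32; 4]` block size is borrowed from the STM32WL figures earlier in the thread.

```rust
use core::convert::Infallible;
use core::future::{ready, Future, Ready};

/// Hypothetical async counterpart to `rand_core::block::BlockRngCore`.
pub trait AsyncBlockRngCore {
    type Item;
    type Results: AsRef<[Self::Item]> + AsMut<[Self::Item]> + Default;
    type Error;

    type GenerateFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    /// Refill `results` with one new block of output.
    fn generate<'a>(&'a mut self, results: &'a mut Self::Results) -> Self::GenerateFuture<'a>;
}

/// Placeholder block source that emits a counter (NOT random).
pub struct MockBlockSource {
    pub counter: u32,
}

impl AsyncBlockRngCore for MockBlockSource {
    type Item = u32;
    type Results = [u32; 4];
    type Error = Infallible;
    // Completes immediately here; a real driver would return a future that
    // waits for the peripheral's data-ready signal.
    type GenerateFuture<'a> = Ready<Result<(), Infallible>> where Self: 'a;

    fn generate<'a>(&'a mut self, results: &'a mut Self::Results) -> Self::GenerateFuture<'a> {
        for word in results.iter_mut() {
            *word = self.counter;
            self.counter = self.counter.wrapping_add(1);
        }
        ready(Ok(()))
    }
}
```

A buffering adapter could then await one block at a time and hand out bytes from it, or use a block as the seed for a software generator.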

dhardy · 2021-08-05T09:09:58Z

> This issue is really more of a question than a feature request: Is rand_core the appropriate place to add async RNG traits?

So, will there be direct interoperability between async and synchronous RNGs? If so this may make sense; if not a dedicated crate may be preferable(?).

> Can we implement Rng for all types that implement AsyncRng?

How? Technically, yes, by spinning until poll is ready, but futures are usually waited on by an executor. But there is no standard executor and this is not a good place to be opinionated.
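
For reference, "spinning until poll is ready" could look roughly like the sketch below. `spin_on` and `ReadyAfter` are hypothetical names introduced here, and the no-op waker is an assumption that only holds because the loop re-polls unconditionally. It assumes a toolchain recent enough to provide `core::pin::pin!`.

```rust
use core::future::Future;
use core::pin::{pin, Pin};
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Build a waker that does nothing; the spin loop polls again regardless.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(core::ptr::null(), &VTABLE)
}

/// Drive a future to completion by polling it in a busy loop.
pub fn spin_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: every vtable function ignores the data pointer, so the
    // `RawWaker` contract is trivially upheld.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        core::hint::spin_loop();
    }
}

/// Hypothetical future standing in for slow hardware: pending for `remaining`
/// polls, then ready with the total number of polls it received.
pub struct ReadyAfter {
    pub remaining: u32,
    pub polls: u32,
}

impl Future for ReadyAfter {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut();
        this.polls += 1;
        if this.remaining == 0 {
            return Poll::Ready(this.polls);
        }
        this.remaining -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}
```

This burns CPU for the whole wait, which is exactly what an executor would avoid, so it illustrates why baking such an adapter into a shared trait crate is a questionable default.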

The reverse, implementing AsyncRng for every RngCore, would be easy, and perhaps makes more sense: users of RngCore will block until a result is yielded; users of AsyncRng can use their executor for concurrency.

Note that if we do this, a type cannot directly support both async and sync usage. But if we don't, an adapter is required to use a sync RNG in an async function; this is probably fine, thus it may be better not to have any auto impl.
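
A sketch of the blanket impl being weighed here, under stated assumptions: `SyncRng` is a hypothetical simplified stand-in for `RngCore` (with an associated error type), `AsyncRng` follows the shape proposed at the top of the thread, and `Counter` is a placeholder generator.

```rust
use core::convert::Infallible;
use core::future::{ready, Future, Ready};

/// Hypothetical blocking trait, standing in for `rand_core::RngCore`.
pub trait SyncRng {
    type Error;
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Self::Error>;
}

/// Hypothetical async trait, shaped like the proposal in this thread.
pub trait AsyncRng {
    type Error;

    type RngFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a>;
}

// Blanket impl: the work is done eagerly, so the returned future is already
// complete. Because of coherence, a type covered by this impl cannot also
// provide its own, genuinely asynchronous `AsyncRng` impl.
impl<R: SyncRng> AsyncRng for R {
    type Error = <R as SyncRng>::Error;
    type RngFuture<'a> = Ready<Result<(), <R as SyncRng>::Error>> where Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a> {
        ready(self.try_fill_bytes(dest))
    }
}

/// Placeholder generator that writes a counter (NOT random).
pub struct Counter(pub u8);

impl SyncRng for Counter {
    type Error = Infallible;

    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Infallible> {
        for byte in dest.iter_mut() {
            *byte = self.0;
            self.0 = self.0.wrapping_add(1);
        }
        Ok(())
    }
}
```

Dropping the blanket impl and wrapping sync generators in an explicit adapter type would keep both traits implementable by the same type, at the cost of one extra wrapper at the call site.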

> It might actually make more sense to have an async version of BlockRngCore

Is your point that ReseedingRng could still implement sync behaviour by requesting a fresh seed in a future which is polled on each request for bytes, only doing the actual reseeding once poll returns Ready? That might work (perhaps with some limit before it blocks, for security reasons).

Or is it simply that derived RNGs might implement both RngCore and AsyncRngCore depending on what their underlying RNG implements? Sure.

Another question: should getrandom support async usage? If so we can have AsyncOsRng (or OsRngAsync). But this doesn't need to be answered now.

newAM · 2021-08-05T19:20:15Z

> Can we implement Rng for all types that implement AsyncRng?

> The reverse, implementing AsyncRng for every RngCore, would be easy, and perhaps makes more sense: users of RngCore will block until a result is yielded; users of AsyncRng can use their executor for concurrency.

> Note that if we do this, a type cannot directly support both async and sync usage. But if we don't, an adapter is required to use a sync RNG in an async function; this is probably fine, thus it may be better not to have any auto impl.

This is a design decision, but personally I would keep these separate. The choice to impl an async trait vs a sync trait should be representative of what the underlying hardware/code is doing.

> It might actually make more sense to have an async version of BlockRngCore. This abstraction would seem to better match the underlying hardware.

This would be a good fit for the hardware RNG I am currently working with. Hopefully other embedded users can comment on what the ideal trait would be.

> It would also discourage using the (presumably slow) hardware RNG directly, and instead incentivize using it through a SeedableRng or ReseedingRng.

I should explain the cycle counts a bit more: the question asked in the embedded Rust Matrix chat was whether async RNG traits make sense at all. Busy-polling is faster than async whenever a context switch takes longer than waiting for the hardware to finish.
Based on the numbers I have available, I do think there are valid use cases for async RNG traits, but they are definitely more specialized.
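
The break-even reasoning can be written down as a small calculation. This is a rough sketch using the cycle counts quoted in this thread; it assumes the only cost of yielding is a single minimum-cost context switch, which understates the real overhead (switching back, interrupt latency, executor bookkeeping). The function and constant names are made up for illustration.

```rust
/// Cycle counts quoted earlier in the thread (STM32WL, embassy executor).
pub const STEADY_STATE_WAIT: u32 = 412;
pub const COLD_BOOT_WAIT: u32 = 2662;
pub const MIN_CONTEXT_SWITCH: u32 = 62;

/// Cycles left over for other tasks if the CPU yields during one hardware wait.
pub fn reclaimable_cycles(wait_cycles: u32, switch_overhead_cycles: u32) -> u32 {
    wait_cycles.saturating_sub(switch_overhead_cycles)
}

/// Yielding only pays off when the wait is longer than the switching overhead.
pub fn async_pays_off(wait_cycles: u32, switch_overhead_cycles: u32) -> bool {
    wait_cycles > switch_overhead_cycles
}
```

Under that assumption, yielding frees at most 350 cycles per block in steady state and 2600 on a cold boot; a peripheral that answers in fewer cycles than the switch costs is better served by busy-polling.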

Sidetracking for a moment: the STM32WL hardware is quite fast compared to software algorithms.

| Source | Cycles per `[u32; 4]` |
| --- | --- |
| `ChaCha20Rng` | 2,875 |
| `ChaCha12Rng` | 1,764 |
| `ChaCha8Rng` | 1,216 |
| STM32WL HW RNG | 412 |

That being said there are still valid reasons to use software random number generation when hardware acceleration is available:

- Hardware RNGs can fail, whereas most(?) software RNGs are infallible once successfully seeded from hardware.
- Code portability
- Concurrent random number generation

dhardy · 2021-08-06T07:05:41Z

So, my current understanding from the above:

- there is a demand for an async RNG trait somewhere common such as this library
- the only desired output type is [u8] with some Result<(), Error> indicator
- there is no desire for direct interoperability with rand distributions, shuffling, etc.
- there is no need to make the trait object-safe
- BlockRngCore would be a useful batching system, but presumably with async output
