Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore DMA-based APIs built on top of &'static mut references #37

Open
japaric opened this issue Feb 9, 2018 · 26 comments
Open

Explore DMA-based APIs built on top of &'static mut references #37

japaric opened this issue Feb 9, 2018 · 26 comments

Comments

@japaric
Copy link
Member

@japaric japaric commented Feb 9, 2018

Summary of this thread as of 2018-04-02

Current status of the DMA API design:

I think we have a clear idea of what the API should provide:

It MUST

  • be memory safe, obviously
  • expose a device agnostic API, so we can use it to build DMA based HAL traits
  • support one-shot and circular transfers
  • support mem to peripheral and peripheral to mem transfers
  • not require dynamic memory allocation, i.e. allocator support should be optional

It SHOULD support these features:

  • Canceling an ongoing transfer
  • Memory to memory transfers
  • Peeking into the part of the buffer that the DMA has already written to
  • Data widths other than 8-bit
  • Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario

This is out of scope:

  • Configuration stuff like enabling burst mode or constraining usage of a channel to certain
    peripherals. The API for this functionality is up to the HAL implementer to provide / design.

What we know so far from existing implementations:

  • Ownership and stable addresses are a must for memory safety
    • (remember that mem::forget is safe in Rust)
    • The ongoing transfer value must take ownership of the buffer and the DMA channel
    • storing [T; N] in the Transfer struct doesn't work because the address is not stable -- it
      changes when the Transfer struct is moved
    • &'static mut fulfills these two requirements but so do Box, Vec and other heap allocated
      collections. See owning_ref::StableAddress for a more complete list.
  • These concepts need to be abstracted over (using traits or structs):
    • An ongoing DMA transfer
    • A DMA channel, singletons make sense here
    • Buffers compatible with DMA transfers. cf. StableAddress. Also the element type must be
      restricted -- e.g. a transfer on &'static mut [Socket] doesn't make sense.

Unresolved questions:

  • Alignment requirements. Does a transfer on a [u16] buffer require the buffer to be 16-bit
    aligned? The answer probably depends on the target device.

Attempts at the problem:

  • Memory safe DMA transfers, a blog post that explores using
    &'static mut references to achieve memory safe DMA transfers

  • stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains
    implementations of one-shot and circular DMA transfers


What the title says. This blog post describes the approach to building such APIs. The last part of
the post covers platform agnostic traits that could be suitable for inclusion in embedded-hal.

This issue is for collecting feedback on the proposed approach and deciding on what should land in
embedded-hal.

@therealprof
Copy link
Contributor

@therealprof therealprof commented Feb 9, 2018

Great stuff. I like how you're tackling DMA. However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.

You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point. Whether that is better or not depends on the application, after all you still might do other processing in interrupt handlers while busy waiting for a single transmission to complete.

Loading

@therealprof
Copy link
Contributor

@therealprof therealprof commented Feb 9, 2018

I should probably add that I view interrupt support as a much more important topic for async IO and DMA mostly as a method to reduce interrupt frequency and contention. DMA without interrupts is not actually that useful.

Loading

@jamesmunns
Copy link
Member

@jamesmunns jamesmunns commented Feb 9, 2018

I still need to do some reading (my DMA knowledge is extremely limited), but it would also be interesting how to model peripheral to peripheral transfers.

For example, if you (for some reason) wanted to have a DMA driven UART passthrough (data comes in on one, goes out on the other), how would this be modelled? The transaction should take ownership of the serial ports and DMA channel, but wouldn't require a buf, as far as I know.

Loading

@protomors
Copy link

@protomors protomors commented Feb 9, 2018

@jamesmunns As far as I know STM32 does not support peripheral to peripheral DMA transfers. Do other microcontrollers have such feature?

Loading

@jamesmunns
Copy link
Member

@jamesmunns jamesmunns commented Feb 9, 2018

@protomors based on reading this thread, I thought it might be possible, though that might be more by accident than by design. AN4031 section 1.1.4 though seems to exclude p-to-p DMA.

I can find some mentions of peripheral to peripheral DMA being used on other architectures, such as the PIC32, as well as the PSoC3 (8051), but can't really find any compelling examples for Cortex-M based devices.

I don't have a use case (or experience with this), but it looks like this is "not a problem" at the moment.

Loading

@jamesmunns
Copy link
Member

@jamesmunns jamesmunns commented Feb 9, 2018

Actually, I just found the PSoC5LP (Cortex-M3 based) does support this, judging by this app note from Cypress, but this still looks like a pretty rare feature (might not be worth defining interfaces/traits for the edge cases).

Loading

@therealprof
Copy link
Contributor

@therealprof therealprof commented Feb 12, 2018

@jamesmunns Plenty of Cortex-M chips have a peripheral to peripheral communication but it's often not called DMA (because it isn't necessarily just that), e.g. the Nordic NRF have it and it's called Programmable Peripheral Interconnect (PPI) there, the ATSAMD call it EVSYS. As I mentioned before an important aspect of DMA is being able to process interrupts generated by the DMA engine and "p-to-p" is about letting one peripheral know that some other peripheral has finished processing, via DMA or not is not of importance. If you have to poll for DMA completion it's pretty much just a waste of resources.

Loading

@japaric
Copy link
Member Author

@japaric japaric commented Feb 13, 2018

@therealprof

However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.

Interrupts are a topic orthogonal to the DMA that deserve their own issue / discussion thread. I have my thoughts on the topic but I won't voice them in this issue.

Also you can already use HAL abstractions with interrupts using RTFM at zero cost.

You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point.

You don't need interrupt handlers / callbacks to avoid busy waiting for the DMA transfer to complete either (if that's what you are implicitly suggesting). If you use cooperative multitasking (i.e. generators) you can suspend (yield) the task while the transfer is in progress, go do other tasks and/or sleep, and resume the first task only when the transfer is done -- no busy waiting or interrupt handlers involved.

@jamesmunns

I don't know about peripheral to peripheral transfer but the reference implementation is missing DMA functionality like memory to memory transfers, priority levels and burst mode. However, I think that functionality could be provided as methods on the channel types; it just needs to be implemented. Mem2mem transfers sound like they may be useful to have as a trait but I'm not sure if priority and burst mode should have traits.

Loading

@japaric
Copy link
Member Author

@japaric japaric commented Feb 13, 2018

You don't need interrupt handlers / callbacks to avoid busy waiting

What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)

Loading

@therealprof
Copy link
Contributor

@therealprof therealprof commented Feb 13, 2018

@japaric

Also you can already use HAL abstractions with interrupts using RTFM at zero cost.

Well, yes, you can. But it's a completely manual and static setup and fully separate from the hal drivers.

no busy waiting or interrupt handlers involved.

That's not what I meant. Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.

What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)

Fair enough. I still stand by my point that DMA was introduced to reduce interrupt overhead and contention and is somewhat pointless without being able to use interrupts.

Loading

@japaric
Copy link
Member Author

@japaric japaric commented Feb 13, 2018

Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.

With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention. I already have an implementation in my head but haven't had time to test it; it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.

Loading

@therealprof
Copy link
Contributor

@therealprof therealprof commented Feb 13, 2018

With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention.

So how would a task know it can make progress without polling or being notified by an interrupt handler?

it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.

That's yet another aspect why you want to be able to use interrupts. ;)

I'll be happy to look at your ideas when you have something. Just wanted to throw in that for me personally DMA is way down on the implementation list without a better way to use interrupts with drivers, it's just a ton of work to implement with very little benefit over a stupid/simple register based implementation.

Loading

@japaric
Copy link
Member Author

@japaric japaric commented Apr 1, 2018

Current status of the DMA API design:

I think we have a clear idea of what the API should provide:

It MUST

  • be memory safe, obviously
  • expose a device agnostic API, so we can use it to build DMA based HAL traits
  • support one-shot and circular transfers
  • support mem to peripheral and peripheral to mem transfers
  • not require dynamic memory allocation, i.e. allocator support should be optional

It SHOULD support these features:

  • Canceling an ongoing transfer
  • Memory to memory transfers
  • Peeking into the part of the buffer that the DMA has already written to
  • Data widths other than 8-bit
  • Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario

This is out of scope:

  • Configuration stuff like enabling burst mode or constraining usage of a channel to certain
    peripherals. The API for this functionality is up to the HAL implementer to provide / design.

What we know so far from existing implementations:

  • Ownership and stable addresses are a must for memory safety
    • (remember that mem::forget is safe in Rust)
    • The ongoing transfer value must take ownership of the buffer and the DMA channel
    • storing [T; N] in the Transfer struct doesn't work because the address is not stable -- it
      changes when the Transfer struct is moved
    • &'static mut fulfills these two requirements but so do Box, Vec and other heap allocated
      collections. See owning_ref::StableAddress for a more complete list.
  • These concepts need to be abstracted over (using traits or structs):
    • An ongoing DMA transfer
    • A DMA channel, singletons make sense here
    • Buffers compatible with DMA transfers. cf. StableAddress. Also the element type must be
      restricted -- e.g. a transfer on &'static mut [Socket] doesn't make sense.

Unresolved questions:

  • Alignment requirements. Does a transfer on a [u16] buffer require the buffer to be 16-bit
    aligned? The answer probably depends on the target device.

Attempts at the problem:

  • Memory safe DMA transfers, a blog post that explores using
    &'static mut references to achieve memory safe DMA transfers

  • stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains
    implementations of one-shot and circular DMA transfers

Loading

@ryankurte
Copy link
Contributor

@ryankurte ryankurte commented Apr 2, 2018

@japaric IIRC (at least with DMA USB drivers) regardless of word size the buffer has to be 32-bit aligned on the m3's I've used. Almost definitely varies with platfom though.

Loading

@dcarosone
Copy link

@dcarosone dcarosone commented Apr 17, 2018

I was a bit stuck trying to figure out how to use this for a while with RTFM: japaric/ws2812b#5

I found examples/mpu9250-log.rs useful, especially this trick for what to pass around in resources rather than lots of individual items:

type TX = Option<Either<(TX_BUF, dma1::C4, Tx<USART1>), Transfer<R, TX_BUF, dma1::C4, Tx<USART1>>>>;

I was wishing for a way to reuse a Transfer repeatably (update buffer and 'go again') or get back all the channels and parts to make a new one over the same resources. This does that, but I wonder if there's a nicer way to put more of this in the generic hal rather than user code?

However, this (in the SYS_TICK handler of that example) surprised me:

    let (buf, c, tx) = match r.TX.take().unwrap() {
        Either::Left((buf, c, tx)) => (buf, c, tx),
        Either::Right(trans) => trans.wait(),
    };

wait() is a blocking call, and I had just assumed I couldn't use it! Perhaps it would be fine in a handler for a DMA completion interrupt, where we're already 'sure' it won't block, but this is in a timer and seems like that guarantee doesn't hold. I guess it's just being glossed over for the sake of a simple example, and that the ticks are assumed to be far enough apart that the dma will have completed?

So:

  • have I misunderstood, and wait() is somehow ok to use generally?
  • is there a non-blocking is_done() or some similar check so the tick handler could return early and come back next time? (say we have larger buffers or serial flow control) Edit: yes there is, duh.
  • is wait() always ok to use in the corresponding dma completion interrrupt handler?
  • should proper code for this log example listen for the dma completion, call .wait() there to update the TX, and have the tick handler exit early or otherwise handle an incomplete transfer when the timer expires?

Loading

@thenewwazoo
Copy link
Contributor

@thenewwazoo thenewwazoo commented May 5, 2018

I spent some time adding DMA support to the stm32l4x6_hal crate, and wanted to add some notes. I attempted to use the Transfer and CircBuffer abstractions with serial ports and RTFM.

Because the Transfer owns its Tx or Rx pin and the buffer, and wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do. I ended up adding a couple of functions: restart for Transfer and partial_peek for CircBuffer which return slices of the buffer. They work well enough for receiving data, but I haven't found a good solution for transmitting because of the inability to pass Transfers around.

Loading

@Ravenslofty
Copy link

@Ravenslofty Ravenslofty commented Nov 6, 2018

So, the PS2, which I've been developing on, has a hardware requirement for 16-byte aligned DMA (it ignores the lower four bits of the specified memory address).

For the sake of throwing ideas out, perhaps the implementation could define a zero-sized Alignment type which is #[repr(align(T))] for T-byte minimum alignment, and then the user can either borrow from the aligned crate, or use their own.

I'd rather not reinvent the DMA interface wheel if somebody's going to change its shape.

Loading

@nraynaud
Copy link

@nraynaud nraynaud commented Feb 10, 2019

Because the Transfer owns its Tx or Rx pin and the buffer, and wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do.

I think one way to do it is to declare your resource Option<Tx<...>> then in your task you .take() the resource, that gets it, and other tasks will just see a none.
the comment right above yours is actually a complicated way to do it.

The Option part is for taking the object, and the Either part is because either there is a transfer running and you need to wait() it to get the bit necessary to start a new transfer, or there is currently no transfer running and it just stores the bits and pieces to start a transfer.

Loading

@nraynaud
Copy link

@nraynaud nraynaud commented Feb 10, 2019

On the topic of this RFC, I see a need for non-static DMA buffer for transactions that will be using the DMA but are blocking.

It might seem counterintuitive, but the byte-by-byte APIs of various MCU devices are sometimes impossible to use at the speed of the physical interface (I suspect it's because of the bus traffic in between the bytes), but there is still no interest in going full async on the API (in particular by blocking we can keep the buffer on the stack, then release it).

Here is a mention of the problem: https://javakys.wordpress.com/2014/09/04/how-to-implement-full-duplex-spi-communication-using-spi-dma-mode-on-stm32f2xx-or-stm32f4xx/

Here is some code for your consideration:
https://github.com/nraynaud/stm32f103xx-hal/blob/153b97b682790bc16169868a02c7df97c8c601c6/src/serial.rs#L508

pub trait WriteDmaImmediately<A>: DmaChannel
    where
        A: AsRef<[u8]> + ?Sized,
        Self: core::marker::Sized,
{
    fn write_immediately(self, chan: Self::Dma, buffer: &A) -> (&A, Self::Dma, Self);
}

On a slightly different subject, I have rendered the buffer + ?Sized in my DMA APIs, so that I can pass slices to the DMA transfer.

Loading

@japaric
Copy link
Member Author

@japaric japaric commented Feb 11, 2019

@nraynaud if the operation is blocking I think there's no need for taking the receiver, or the buffer, by value. Something like fn write_all(&mut self, buffer: &[u8]) should work fine. Note that you also need the compiler fences in blocking code.

Also, the blog post linked in the issue description is a bit dated by now. The embedonomicon has the latest information on DMA; there have been a few updates: Pin instead of &'static mut (which allows Box, Rc, etc) and the compiler fences have been softened (while preserving correctness).

Loading

@nraynaud
Copy link

@nraynaud nraynaud commented Feb 11, 2019

Thanks, I will read all that, and try to use references. It didn't occur to me.

Do you have a documentation/blog post on the compiler fences and when to use them? What about volatile?

Loading

@japaric
Copy link
Member Author

@japaric japaric commented Feb 11, 2019

@nraynaud the embedonomicon explains why the compiler fences are needed. volatile ops can't be reordered wrt to each other but non-volatile opss (like operations on buffers) can be reordered wrt to volatile ops; the compiler fences are there to prevent the latter, which can result in compiler misoptimizations.

Loading

@burrbull
Copy link
Contributor

@burrbull burrbull commented Feb 11, 2019

I've tried to rewrite dma transfer according embedonomicon here but haven't tested it yet.

Loading

@justacec
Copy link

@justacec justacec commented Jul 24, 2019

How are these efforts going? I noticed that the last posting on the thread is from Feb 11 2019. I am new to embedded rust and would like to play with some DMA stuff. While I could just manipulate the chip registers directly and cobble something together, there has obviously been much more thought put into this here. I would like to play with what you guys are suggesting. :)

Loading

@hannobraun
Copy link
Member

@hannobraun hannobraun commented Jul 24, 2019

@justacec More DMA implementations keep appearing. I've personally worked on two (stm32f7xx-hal, stm32l0xx-hal) over the last weeks.

I'm not aware of any efforts for creating a platform-independent API that could be included into embedded-hal. That would be great to have, but also a significant effort, I believe.

Loading

@rudihorn
Copy link

@rudihorn rudihorn commented Jul 16, 2020

I submitted a draft PR for the stm32f1xx-hal crate (stm32-rs/stm32f1xx-hal#244) which allows circular DMA buffers to support reading arbitrary lengths of data as it comes in.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet