New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore DMA-based APIs built on top of `&'static mut` references #37

Open
japaric opened this Issue Feb 9, 2018 · 17 comments

Comments

Projects
None yet
8 participants
@japaric
Copy link
Member

japaric commented Feb 9, 2018

Summary of this thread as of 2018-04-02

Current status of the DMA API design:

I think we have a clear idea of what the API should provide:

It MUST

  • be memory safe, obviously
  • expose a device agnostic API, so we can use it to build DMA based HAL traits
  • support one-shot and circular transfers
  • support mem to peripheral and peripheral to mem transfers
  • not require dynamic memory allocation, i.e. allocator support should be optional

It SHOULD support these features:

  • Canceling an ongoing transfer
  • Memory to memory transfers
  • Peeking into the part of the buffer that the DMA has already written to
  • Data widths other than 8-bit
  • Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario

This is out of scope:

  • Configuration stuff like enabling burst mode or constraining usage of a channel to certain
    peripherals. The API for this functionality is up to the HAL implementer to provide / design.

What we know so far from existing implementations:

  • Ownership and stable addresses are a must for memory safety
    • (remember that mem::forget is safe in Rust)
    • The ongoing transfer value must take ownership of the buffer and the DMA channel
    • storing [T; N] in the Transfer struct doesn't work because the address is not stable -- it
      changes when the Transfer struct is moved
    • &'static mut fulfills these two requirements but so do Box, Vec and other heap allocated
      collections. See owning_ref::StableAddress for a more complete list.
  • These concepts need to be abstracted over (using traits or structs):
    • An ongoing DMA transfer
    • A DMA channel, singletons make sense here
    • Buffers compatible with DMA transfers. cf. StableAddress. Also the element type must be
      restricted -- e.g. a transfer on &'static mut [Socket] doesn't make sense.

Unresolved questions:

  • Alignment requirements. Does a transfer on a [u16] buffer require the buffer to be 16-bit
    aligned? The answer probably depends on the target device.

Attempts at the problem:

  • Memory safe DMA transfers, a blog post that explores using
    &'static mut references to achieve memory safe DMA transfers

  • stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains
    implementations of one-shot and circular DMA transfers


What the title says. This blog post describes the approach to building such APIs. The last part of
the post covers platform agnostic traits that could be suitable for inclusion in embedded-hal.

This issue is for collecting feedback on the proposed approach and deciding on what should land in
embedded-hal.

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Feb 9, 2018

Great stuff. I like how you're tackling DMA. However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.

You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point. Whether that is better or not depends on the application, after all you still might do other processing in interrupt handlers while busy waiting for a single transmission to complete.

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Feb 9, 2018

I should probably add that I view interrupt support as a much more important topic for async IO and DMA mostly as a method to reduce interrupt frequency and contention. DMA without interrupts is not actually that useful.

@jamesmunns

This comment has been minimized.

Copy link
Member

jamesmunns commented Feb 9, 2018

I still need to do some reading (my DMA knowledge is extremely limited), but it would also be interesting how to model peripheral to peripheral transfers.

For example, if you (for some reason) wanted to have a DMA driven UART passthrough (data comes in on one, goes out on the other), how would this be modelled? The transaction should take ownership of the serial ports and DMA channel, but wouldn't require a buf, as far as I know.

@protomors

This comment has been minimized.

Copy link

protomors commented Feb 9, 2018

@jamesmunns As far as I know STM32 does not support peripheral to peripheral DMA transfers. Do other microcontrollers have such feature?

@jamesmunns

This comment has been minimized.

Copy link
Member

jamesmunns commented Feb 9, 2018

@protomors based on reading this thread, I thought it might be possible, though that might be more by accident than by design. AN4031 section 1.1.4 though seems to exclude p-to-p DMA.

I can find some mentions of peripheral to peripheral DMA being used on other architectures, such as the PIC32, as well as the PSoC3 (8051), but can't really find any compelling examples for Cortex-M based devices.

I don't have a use case (or experience with this), but it looks like this is "not a problem" at the moment.

@jamesmunns

This comment has been minimized.

Copy link
Member

jamesmunns commented Feb 9, 2018

Actually, I just found the PSoC5LP (Cortex-M3 based) does support this, judging by this app note from Cypress, but this still looks like a pretty rare feature (might not be worth defining interfaces/traits for the edge cases).

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Feb 12, 2018

@jamesmunns Plenty of Cortex-M chips have a peripheral to peripheral communication but it's often not called DMA (because it isn't necessarily just that), e.g. the Nordic NRF have it and it's called Programmable Peripheral Interconnect (PPI) there, the ATSAMD call it EVSYS. As I mentioned before an important aspect of DMA is being able to process interrupts generated by the DMA engine and "p-to-p" is about letting one peripheral know that some other peripheral has finished processing, via DMA or not is not of importance. If you have to poll for DMA completion it's pretty much just a waste of resources.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Feb 13, 2018

@therealprof

However I feel there's one very important piece missing in the embedded-hal world: The ability to tie interrupts to HAL periperals.

Interrupts are a topic orthogonal to the DMA that deserve their own issue / discussion thread. I have my thoughts on the topic but I won't voice them in this issue.

Also you can already use HAL abstractions with interrupts using RTFM at zero cost.

You nicely described how DMA gets rid of the need to wait for the transfer to complete on the spot but you still need to spend (potentially) a lot of cycles to check whether your transfer has completed at some point.

You don't need interrupt handlers / callbacks to avoid busy waiting for the DMA transfer to complete either (if that's what you are implicitly suggesting). If you use cooperative multitasking (i.e. generators) you can suspend (yield) the task while the transfer is in progress, go do other tasks and/or sleep, and resume the first task only when the transfer is done -- no busy waiting or interrupt handlers involved.

@jamesmunns

I don't know about peripheral to peripheral transfer but the reference implementation is missing DMA functionality like memory to memory transfers, priority levels and burst mode. However, I think that functionality could be provided as methods on the channel types; it just needs to be implemented. Mem2mem transfers sound like they may be useful to have as a trait but I'm not sure if priority and burst mode should have traits.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Feb 13, 2018

You don't need interrupt handlers / callbacks to avoid busy waiting

What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Feb 13, 2018

@japaric

Also you can already use HAL abstractions with interrupts using RTFM at zero cost.

Well, yes, you can. But it's a completely manual and static setup and fully separate from the hal drivers.

no busy waiting or interrupt handlers involved.

That's not what I meant. Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.

What I'm trying to get at here is that there are a few ways to avoid the busy waiting and they have more to do with the task model (reactive / callback-y / interrupt-driven, cooperative, serial / blocking, etc.) than the DMA abstraction. I'd like the DMA abstraction to be general / flexible enough to support all the task models out there with minimal code duplication (I don't know if that's possible at this point though)

Fair enough. I still stand by my point that DMA was introduced to reduce interrupt overhead and contention and is somewhat pointless without being able to use interrupts.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Feb 13, 2018

Even with cooperative approaches you need to periodically check in whether the transfer has finished and depending on how slow the transfer is you might still waste lots of MCU cycles doing that.

With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention. I already have an implementation in my head but haven't had time to test it; it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.

@therealprof

This comment has been minimized.

Copy link
Contributor

therealprof commented Feb 13, 2018

With a naive implementation, yes. Above, I was refering to a Tokio (epoll / kqueue) like implementation; there a task is only resumed when it can make progress -- this eliminates the continuous polling you mention.

So how would a task know it can make progress without polling or being notified by an interrupt handler?

it uses the interrupt mechanism (NVIC), because otherwise you can't go to sleep, but doesn't involve any (device specific) interrupt handler.

That's yet another aspect why you want to be able to use interrupts. ;)

I'll be happy to look at your ideas when you have something. Just wanted to throw in that for me personally DMA is way down on the implementation list without a better way to use interrupts with drivers, it's just a ton of work to implement with very little benefit over a stupid/simple register based implementation.

@japaric

This comment has been minimized.

Copy link
Member

japaric commented Apr 1, 2018

Current status of the DMA API design:

I think we have a clear idea of what the API should provide:

It MUST

  • be memory safe, obviously
  • expose a device agnostic API, so we can use it to build DMA based HAL traits
  • support one-shot and circular transfers
  • support mem to peripheral and peripheral to mem transfers
  • not require dynamic memory allocation, i.e. allocator support should be optional

It SHOULD support these features:

  • Canceling an ongoing transfer
  • Memory to memory transfers
  • Peeking into the part of the buffer that the DMA has already written to
  • Data widths other than 8-bit
  • Using a part / range of the buffer; the API still needs to take ownership of the whole buffer in this scenario

This is out of scope:

  • Configuration stuff like enabling burst mode or constraining usage of a channel to certain
    peripherals. The API for this functionality is up to the HAL implementer to provide / design.

What we know so far from existing implementations:

  • Ownership and stable addresses are a must for memory safety
    • (remember that mem::forget is safe in Rust)
    • The ongoing transfer value must take ownership of the buffer and the DMA channel
    • storing [T; N] in the Transfer struct doesn't work because the address is not stable -- it
      changes when the Transfer struct is moved
    • &'static mut fulfills these two requirements but so do Box, Vec and other heap allocated
      collections. See owning_ref::StableAddress for a more complete list.
  • These concepts need to be abstracted over (using traits or structs):
    • An ongoing DMA transfer
    • A DMA channel, singletons make sense here
    • Buffers compatible with DMA transfers. cf. StableAddress. Also the element type must be
      restricted -- e.g. a transfer on &'static mut [Socket] doesn't make sense.

Unresolved questions:

  • Alignment requirements. Does a transfer on a [u16] buffer require the buffer to be 16-bit
    aligned? The answer probably depends on the target device.

Attempts at the problem:

  • Memory safe DMA transfers, a blog post that explores using
    &'static mut references to achieve memory safe DMA transfers

  • stm32f103xx-hal, PoC implementations of the idea present in the blog post. It contains
    implementations of one-shot and circular DMA transfers

@ryankurte

This comment has been minimized.

Copy link
Contributor

ryankurte commented Apr 2, 2018

@japaric IIRC (at least with DMA USB drivers) regardless of word size the buffer has to be 32-bit aligned on the m3's I've used. Almost definitely varies with platfom though.

@et1975 et1975 referenced this issue Apr 13, 2018

Open

dma hal? #20

@dcarosone

This comment has been minimized.

Copy link

dcarosone commented Apr 17, 2018

I was a bit stuck trying to figure out how to use this for a while with RTFM: japaric/ws2812b#5

I found examples/mpu9250-log.rs useful, especially this trick for what to pass around in resources rather than lots of individual items:

type TX = Option<Either<(TX_BUF, dma1::C4, Tx<USART1>), Transfer<R, TX_BUF, dma1::C4, Tx<USART1>>>>;

I was wishing for a way to reuse a Transfer repeatably (update buffer and 'go again') or get back all the channels and parts to make a new one over the same resources. This does that, but I wonder if there's a nicer way to put more of this in the generic hal rather than user code?

However, this (in the SYS_TICK handler of that example) surprised me:

    let (buf, c, tx) = match r.TX.take().unwrap() {
        Either::Left((buf, c, tx)) => (buf, c, tx),
        Either::Right(trans) => trans.wait(),
    };

wait() is a blocking call, and I had just assumed I couldn't use it! Perhaps it would be fine in a handler for a DMA completion interrupt, where we're already 'sure' it won't block, but this is in a timer and seems like that guarantee doesn't hold. I guess it's just being glossed over for the sake of a simple example, and that the ticks are assumed to be far enough apart that the dma will have completed?

So:

  • have I misunderstood, and wait() is somehow ok to use generally?
  • is there a non-blocking is_done() or some similar check so the tick handler could return early and come back next time? (say we have larger buffers or serial flow control) Edit: yes there is, duh.
  • is wait() always ok to use in the corresponding dma completion interrrupt handler?
  • should proper code for this log example listen for the dma completion, call .wait() there to update the TX, and have the tick handler exit early or otherwise handle an incomplete transfer when the timer expires?
@thenewwazoo

This comment has been minimized.

Copy link
Contributor

thenewwazoo commented May 5, 2018

I spent some time adding DMA support to the stm32l4x6_hal crate, and wanted to add some notes. I attempted to use the Transfer and CircBuffer abstractions with serial ports and RTFM.

Because the Transfer owns its Tx or Rx pin and the buffer, and wait()ing on the Transfer consumes the Transfer (returning the buffer and pin), it's not possible to use them in a task because whatever you pass to the task must be consumed to start or end the transfer. For example, a DMA1_CHANNEL5 task must move the Transfer resource to operate on the buffer contents, which it cannot do. I ended up adding a couple of functions: restart for Transfer and partial_peek for CircBuffer which return slices of the buffer. They work well enough for receiving data, but I haven't found a good solution for transmitting because of the inability to pass Transfers around.

@ZirconiumX

This comment has been minimized.

Copy link

ZirconiumX commented Nov 6, 2018

So, the PS2, which I've been developing on, has a hardware requirement for 16-byte aligned DMA (it ignores the lower four bits of the specified memory address).

For the sake of throwing ideas out, perhaps the implementation could define a zero-sized Alignment type which is #[repr(align(T))] for T-byte minimum alignment, and then the user can either borrow from the aligned crate, or use their own.

I'd rather not reinvent the DMA interface wheel if somebody's going to change its shape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment