Fast double-buffered DMA #226
Currently there is a 2 PCLK delay (equivalent to up to 8 CPU cycles) after clearing a transfer complete flag, so that the update can trickle through the bus matrices, the peripheral, and the IRQ synchronizer. In many cases the delay is not required because enough instructions follow before the ISR exits to fully absorb it. This change adds a method to clear the flag without the additional delay.
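To make the "2 PCLK, up to 8 CPU cycles" figure concrete, here is a minimal sketch of the conversion. The function name and the clock frequencies in the usage note are illustrative assumptions, not values taken from this PR; the sketch only shows that a CPU clock running at four times the peripheral clock turns 2 PCLK cycles into 8 CPU cycles.

```rust
/// Number of CPU cycles spanned by `pclk_cycles` peripheral-clock cycles,
/// rounded up. Hypothetical helper for illustration only; `pclk_hz` is
/// assumed to be non-zero.
pub fn pclk_delay_in_cpu_cycles(pclk_cycles: u32, cpu_hz: u32, pclk_hz: u32) -> u32 {
    // Compute ceil(pclk_cycles * cpu_hz / pclk_hz) in 64 bits so the
    // intermediate product cannot overflow.
    let numerator = pclk_cycles as u64 * cpu_hz as u64;
    let divisor = pclk_hz as u64;
    ((numerator + divisor - 1) / divisor) as u32
}
```

For example, with an assumed 400 MHz CPU clock and a 100 MHz PCLK, `pclk_delay_in_cpu_cycles(2, 400_000_000, 100_000_000)` evaluates to 8; with PCLK at half the CPU clock the same 2 PCLK delay would cost only 4 CPU cycles.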
This adds a new unsafe method to the DMA API for significantly faster transfer handling. In many cases the safety and universality offered by `next_transfer_with` are not necessary and are costly in terms of cycles. This is especially the case when multiple double-buffered DMA streams are handled together and when the risk of losing the race to the inactive buffer against the running DMA transfer is either acceptable or excluded by the design of the handler.
It's really interesting that such a dramatic speedup is possible. I guess that's with release mode / optimisations enabled in both cases? The PR looks good to me.
Even without address poisoning and buffer swapping we can detect the case where the user processing is too slow and DMA wins the race to the inactive buffer. This returns `Err(DMAError::Overflow)` in that case at the small expense of a single additional peripheral read.
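The overflow detection described above can be sketched on the host with a mock stream. Everything here except the `DMAError::Overflow` name is a hypothetical model: `MockRegs`, `MockStream`, and `with_inactive_buffer` are not part of the HAL, and the choice to detect the lost race by re-reading a transfer-complete flag after the closure is an assumption about how a single extra peripheral read could be used, not a description of the actual implementation.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Error type; only the `Overflow` variant name comes from the discussion.
#[derive(Debug, PartialEq, Eq)]
pub enum DMAError {
    Overflow,
}

/// Hypothetical stand-in for the status bits of a double-buffered stream.
pub struct MockRegs {
    /// Index (0 or 1) of the buffer the DMA is currently writing to.
    current_target: AtomicUsize,
    /// Set by the "hardware" whenever a transfer completes.
    transfer_complete: AtomicBool,
}

impl MockRegs {
    pub fn new() -> Self {
        Self {
            current_target: AtomicUsize::new(0),
            transfer_complete: AtomicBool::new(false),
        }
    }

    /// Models the DMA finishing a buffer: swap targets and raise the flag.
    pub fn hardware_completes_transfer(&self) {
        self.current_target.fetch_xor(1, Ordering::SeqCst);
        self.transfer_complete.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical stream owning both halves of the double buffer.
pub struct MockStream {
    regs: Arc<MockRegs>,
    buffers: [[u16; 4]; 2],
}

impl MockStream {
    pub fn new(regs: Arc<MockRegs>) -> Self {
        Self { regs, buffers: [[0; 4]; 2] }
    }

    /// Run `f` on the inactive buffer, then report `Overflow` if a further
    /// transfer completed while `f` was running (assumed detection scheme).
    pub fn with_inactive_buffer<R>(
        &mut self,
        f: impl FnOnce(&mut [u16; 4]) -> R,
    ) -> Result<R, DMAError> {
        // Acknowledge the completion that triggered this handler.
        self.regs.transfer_complete.store(false, Ordering::SeqCst);
        let inactive = 1 - self.regs.current_target.load(Ordering::SeqCst);
        let result = f(&mut self.buffers[inactive]);
        // The one additional read: a set flag means the DMA has already
        // moved on to the buffer the closure was working on.
        if self.regs.transfer_complete.load(Ordering::SeqCst) {
            Err(DMAError::Overflow)
        } else {
            Ok(result)
        }
    }
}
```

In this model, a closure that returns before the mock hardware completes another transfer yields `Ok(..)`, while a closure during which `hardware_completes_transfer()` is called yields `Err(DMAError::Overflow)`. The data the closure saw may already be partly overwritten in that case, so the caller has to decide whether to discard it.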
Yes. That's always in |
Thanks! bors r+
When handling multiple double-buffered DMA streams the current safe-and-universal API becomes slower than necessary.
This adds `next_dbm_transfer_with()`, a closure-based API to the inactive buffer, without address poisoning, compiler fences, data barriers, or buffer swapping. For some applications this reduces the overhead significantly (speedup of 5).