Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LPSPI: embedded-hal 1.0 rework #145

Draft
wants to merge 74 commits into
base: main
Choose a base branch
from

Conversation

Finomnis
Copy link
Contributor

No description provided.

@mciantyre mciantyre mentioned this pull request Nov 25, 2023
6 tasks
@mciantyre mciantyre self-requested a review November 27, 2023 13:22
Copy link
Member

@mciantyre mciantyre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through this, and sorry for the slow review. There's lots of work in this branch, including changes to CI tools and updated dependencies. If you think these are helpful to include separately, I can take a look and help with separation.


We're free to break APIs and explore new drivers. But I'll admit: I'm going to miss the 0.5 Lpspi driver. It wasn't perfect, but I appreciated that the driver exposed the lower-level details of the peripheral to users. I've used this to drive I/O from interrupts (without async conveniences). And although I hadn't tried, I thought it could be the foundation for a future async driver, either in this package or another package.

During your prototyping, did you hit limitations of the existing Lpspi driver that prevented it from being usable as this driver's foundation? (I may be in the minority about the "give users low-level control" design decisions, so I'm happy to scrap it.)


Our approach is a single Lpspi driver that supports blocking, interrupt-driven async, and DMA-driven async IO in one / both directions. Having the interrpt and DMA async IO in a single implementation lets the user decide if the want interrupts to send data and DMA to receive data, DMA to send and receive data, etc. I'm curious if there's a way to let the user compose these features themselves, without it all being in one larger driver. Here's a rough outline of what it might look like:

Outline, trade-offs
  • lpspi::InterruptTransmit for asynchronously sending data with interrupts. Includes its own InterruptHandler-style object, which is used in the LPSPI ISR. Doesn't care about embedded-hal-async (EHA).
  • lpspi::InterruptReceive for async receiving data with interrupts. Includes its own InterruptHandler-style object, used inside the LPSPI ISR. Doesn't care about EHA.
  • lpspi::DmaTransmit for async-sending data with DMA. Includes an InterruptHandler-style object, used inside a DMA ISR. Doesn't care about EHA.
  • lpspi::DmaReceive: async-receive, complementary handler used in DMA ISR, no EHA care.
  • lpspi::AsyncLpspi accepts a combo of the *Transmit and *Receive splits. The user is already juggling the InterruptHandlers, and that's how we realize the async IO. This combiner implements the EHA async traits. The user constructs this.
  • Some internal means of splitting the LPSPI instance to realize these APIs. Waves hands waves hands.
  • Keep blocking behaviors separate.

The user manages two InterruptHandler objects, instead of the single InterruptHandler shown today. AsyncLpspi supports u16 IO only when composed of the two Interrupt* halves. This approach might not let us realize some kinds of optimizations ("use interrupts, not DMA, when you can saturate the TX FIFO and quickly return to caller") without planning.

My goal with this design would be to build a solid interrupt-driven async foundation, then make it easy to eventually develop DMA-driven async without changing the large LPSPI driver's plumbing. It looks like we're approaching this idea with the internal read- and write-halves; one more cut, and we're nearly there.

Comment on lines +14 to +20
transfer_complete_waker: Option<Waker>,
error_caught: Option<LpspiError>,
error_caught_waker: Option<Waker>,
tx_fifo_watermark_busy: bool,
rx_fifo_watermark_busy: bool,
tx_fifo_watermark_waker: Option<Waker>,
rx_fifo_watermark_waker: Option<Waker>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're tracking four wakers in this shared state, each specific to a certain condition of the peripheral. I'm curious if this is necessary. What's the condition when at least two of these wakers will be associated with distinct futures produced by the LPSPI driver?

From what I can tell, the user can only produce one future from the LPSPI driver. This is because the embedded-hal-async traits all require exclusive references (&mut Lpspi<...>), and we're not exposing methods that produce futures from shared LPSPI references (&Lpspi<...>). So even when we internally compose futures with select_biased! and join!, the waker associated with all of those state machines is the same.

If that's correct, could we get away with one Waker? Once we wake that one waker for any condition, the executor polls our top-level LPSPI future, which in turn figures out if transfers completed, or if errors are caught, or if watermarks were crossed.

(My concern is that we're setting up execution where we could excessively follow function pointers and wake an executor inside a critical section. Heads up that I'm not measuring any of this, so this is just coming from code study. I've written code that does this too, so I'm generally looking for different approaches.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might actually be true. Haven't thought of it like this yet. It must be guaranteed though, otherwise this falls back to busy waiting.

@Finomnis
Copy link
Contributor Author

Finomnis commented Dec 17, 2023

I may be in the minority about the "give users low-level control" design decisions

I personally don't like this, because it introduces ways for the user to break the internal state of the driver. I kind of consider this a functional unsoundness. I found the existing driver very hard to use because it requires me to understand the peripheral, which I as a user don't want to have to understand. I felt I would have to write a driver for the driver to actually use it in a project.

I would personally rather go the other route - closing it off completely and then step by step adding features as required. But that's just my personal opinion.

@Finomnis
Copy link
Contributor Author

Finomnis commented Dec 17, 2023

Here's a rough outline of what it might look like:

Let me think about this. You might be onto something.

Be aware that:

  • not every u8 supports dma. It only supports full dma if both the read and write buffer are of the same u32 alignment. Otherwise it will always be the case that we have to decide whether to dma read or write, both aren't possible if they are misaligned.
  • there will always be a benefit from using interrupts, even in the dma case. For error handling and for the flush call.
  • the few unaligned bytes before and after a transfer also benefit from interrupts, and would still be required on the dma version.

@teburd
Copy link
Member

teburd commented Dec 17, 2023

I think it’s cool that you are adding support for eh 1.0. I’d note that interrupts are not free of cost and sometimes it really is faster to poll. I saw this in a few spi peripherals in Zephyr that I’ve modified/reviewed. At times interrupts can cause slowdowns. Especially true when doing XIP or on cores with longer pipelines with icache/dcache involved like the M7

@Finomnis
Copy link
Contributor Author

That might be true!

If we

  • drop dma support for u8/u16
  • assume busy waiting for hal-async's flush call is acceptable

Then we can go with @mciantyre's architecture proposal.

@Finomnis
Copy link
Contributor Author

The biggest open question I still have is error handling. But I have to clarify:

  • errors can only occur if NOSTALL is activated. Otherwise it's guaranteed that the overflow errors can never occur. So the entire error discussion is irrelevant if we don't allow this flag to be set. I would like to keep the possibility, though.
  • without interrupts, errors would have to be polled repeatedly. According to the RM, if an error occurs, the transmission would have to be aborted immediately. If we relax this requirement for the dma case, this might be possible.

@mciantyre
Copy link
Member

mciantyre commented Dec 17, 2023

I found the existing driver very hard to use because it requires me to understand the peripheral, which I as a user don't want to have to understand.

We support embedded-hal for hiding the peripheral's complexity. We still need to give the user APIs for driver configurations, since embedded-hal doesn't help us here. But after you configure your driver and pass ownership into your embedded-hal-using component, it should become difficult to break the driver's internal state.

I'm sure there's use-cases I'm not considering, but that's the thinking for giving users the lower-level control.

there will always be a benefit from using interrupts, even in the dma case. For error handling and for the flush call.

We could require the user to transform InterruptTransmit into DmaTransmit, for example. Once DmaTransmit takes ownership of InterruptTransmit, it could select the I/O behaviors, wait for flush, check errors.

I’d note that interrupts are not free of cost and sometimes it really is faster to poll.

Good point. We're free to spin in async code if we find it profitable. We block the executor and other tasks.

examples/rtic_spi.rs Outdated Show resolved Hide resolved
src/common/lpspi/bus/eh1_impl.rs Show resolved Hide resolved
@Finomnis
Copy link
Contributor Author

Finomnis commented Dec 19, 2023

But after you configure your driver and pass ownership into your embedded-hal-using component, it should become difficult to break the driver's internal state.

I agree. So we should maybe look into something like a builder pattern?

@Finomnis
Copy link
Contributor Author

Finomnis commented Dec 19, 2023

@mciantyre Might have to take a step back over the next couple of days/weeks. If someone has an idea, be free to use my code as a base/reference. There are a couple of things in there that took a while to figure out, like the proper settings of the timing registers and the clock multiplier calculation, which I think are improvements over the original version in a couple of details.

If I do continue (which I will if nobody else wants to take over), I might start incorporating a couple of improvements into the existing driver if you think it would be better to start with the existing one as a low-level backend. We could then add a high-level driver which takes ownership of the low-level one that then can do Embedded Hal Async stuff. Although @mciantyre I would need some guidance on the naming of things, because I feel like both the low level and the high level driver deserve the name Lpspi.

@Finomnis
Copy link
Contributor Author

Moved discussions over to #147.


/// Returns whether or not the busy flag is set.
fn busy(&self) -> bool {
ral::read_reg!(ral::lpspi, self.lpspi(), SR, MBF == MBF_1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is buggy on imxrt1062 (or all, maybe)? Writing to the FIFO does not set MBF immediately; if compiled with optimizations this has a chance to report not busy although a transfer is in progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants