Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementations of blocking embedded_hal traits should never return "Busy" #136

Open
Florob opened this issue May 19, 2023 · 8 comments
Open
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@Florob
Copy link
Contributor

Florob commented May 19, 2023

Implementations of blocking embedded_hal traits should never return Busy.
That's a pretty bold claim to make, considering documentation for these traits is virtually non-existent in embedded_hal 0.2, so let me elaborate:
Consumer crates of these traits generally assume that their associated functions can be called back-to-back. I.e. either

  1. Functions block until an operation has finished
  2. If a function can't execute, because an operation is in progress, the function blocks until it can execute

Since the error type is opaque to consumer crates, they can't tell if an error is critical or just needs a retry.

The 1.0 release of the embedded_hal crate is going to add documentation to that extent to the spi module. For the I2C case this is still pending.

@mciantyre
Copy link
Member

It's a correct claim. Signaling a driver-specific busy error through the blocking interface is an oversight. I see this affects LPSPI (example) and LPI2C (example); let me know if you've seen others.

I prefer approach 2 when possible. It should be straightforward to change today's "if busy, return error" behaviors to "while busy, wait for not busy."

@mciantyre mciantyre added bug Something isn't working help wanted Extra attention is needed labels May 21, 2023
@Florob
Copy link
Contributor Author

Florob commented May 21, 2023

It's a correct claim. Signaling a driver-specific busy error through the blocking interface is an oversight. I see this affects LPSPI (example) and LPI2C (example); let me know if you've seen others.

These are the only ones I've seen.

I prefer approach 2 when possible. It should be straightforward to change today's "if busy, return error" behaviors to "while busy, wait for not busy."

That is what I currently have locally to get LPSPI working. I was wondering if one could do even better, since the code already waits for available FIFO space. That way new operations could be enqueued even if the peripheral is busy. However, I don't really understand how different functions interact sufficiently to tell whether that's possible. E.g. I suppose there must be some reason that operations currently start with clearing the FIFOs.

@mciantyre
Copy link
Member

Clearing the FIFO, particularly the RX FIFO, is important for error recovery. Suppose we're using the LPSPI to transfer (send and receive) data. While we're filling up the TX FIFO in the exchange loop, we observe a transmit error1. Although we've exchanged some data, we return early with an error, leaving received data in the RX FIFO. If we didn't clear the RX FIFO, the next exchange call would return data from the previous, abandoned exchange call.

The implementation clears the TX FIFO just in case the user had been manually issuing commands / data with enqueue_transaction / enqueue_data. (The "if busy, return error" behavior should mean this is moot.) Considerations for the RX FIFO come into play here, too; if the user hasn't called read_data to empty the RX FIFO after those enqueue_* calls, that data remains in the FIFO when they use a higher-level transfer call.

There was an attempt to make the lower-level I/O interface (like enqueue_transaction, enqueue_data, read_data) play well with the embedded-hal interface, so that they could co-exist on the same type. We're free to change that, either by design or documentation. We might need that to seamlessly realize approach 2.

If we only support the embedded-hal interface, we should be able to wait for space in the TX FIFO without checking for busy. That's the happy path. If there's an error, maybe clearing the FIFOs before returning the error would be sufficient? I'll think about this some more...

Footnotes

  1. LPSPI transmit errors occur when the TX FIFO underruns. Let's pretend an underrun occurs because an urgent, CPU-bound interrupt takes over just before we put more data in the FIFO.

@Florob
Copy link
Contributor Author

Florob commented Jan 3, 2024

embedded-hal 1.0 is going to require variant 1 for I2C, but either variant for SPI. Relevant text quoted below.

I2C:

# Flushing

Implementations must flush the transfer, ensuring the bus has returned to an idle state before returning.
No pipelining is allowed. Users must be able to shut down the I2C peripheral immediately after a transfer
returns, without any risk of e.g. cutting short a stop condition.

(Implementations must wait until the last ACK bit to report it as an error anyway. Therefore pipelining would only
yield very small time savings, not worth the complexity)

SPI:

To improve performance, [`SpiBus`] implementations are allowed to return before the operation is finished, i.e. when the bus is still not
idle. This allows pipelining SPI transfers with CPU work.

When calling another method when a previous operation is still in progress, implementations can either wait for the previous operation
to finish, or enqueue the new one, but they must not return a "busy" error. Users must be able to do multiple method calls in a row
and have them executed "as if" they were done sequentially, without having to check for "busy" errors.

@djmdjm
Copy link
Contributor

djmdjm commented Apr 25, 2024

The implementation clears the TX FIFO just in case the user had been manually issuing commands / data with enqueue_transaction / enqueue_data. (The "if busy, return error" behavior should mean this is moot.) Considerations for the RX FIFO come into play here, too; if the user hasn't called read_data to empty the RX FIFO after those enqueue_* calls, that data remains in the FIFO when they use a higher-level transfer call.

FWIW this behaviour in a type that implements the embedded_hal blocking traits is quite unexpected and just cost me quite a bit of time in debugging. I do greatly appreciate the more granular/powerful interface above the embedded_hal basics that imxrt-hal provides, but IMO the basic trait implementations shouldn't have surprises like this.

@mciantyre
Copy link
Member

I'm sorry for the trouble. Could you share some pseudocode or a call sequence that demonstrates what you were debugging? I'd like to learn more about what went wrong.

I predominately use those granular interfaces in my projects. Through this thread and others, I accept that I'm not giving the embedded-hal traits the attention that folks need.

@djmdjm
Copy link
Contributor

djmdjm commented Apr 27, 2024

I was just doing a bunch of back-to-back spi.write(), something like:

loop {
  let mut d: [u8; 4] = [  0x12, 0x34, 0x56, 0x78 ];
  spi.write(&mut d).ok();
  let mut d: [u8; 4] = [  0x12, 0xde, 0xad, 0x00 ];
  spi.write(&mut d).ok();
}

only the 0x12 word was making it to the bus because the clear_fifos() call in write_no_read() would cancel the in-progress transfer after only a single word was sent. It was extra confusing for me because both attempted transfers had the same preamble...

@mciantyre
Copy link
Member

Thanks for sharing. I see that, no matter the busy indication, we don't need to clear the FIFOs.

#157 is my attempt at fixing the busy error for the LPSPI driver. The hardware example includes a test for overlapped writes. If folks have time to test that change, it should be backwards compatible. And if anyone wants to contribute their simpler patches to address this issue, I'm happy to prefer those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants