Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spi: A reSPIte from excessive error codes #1656

Merged

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Mar 13, 2024

Currently, the SPI API has a single big error enum which it returns from
all IPCs. This isn't all that great: for example, a NothingToRelease
error can be returned from the Spi::read IPC, which...doesn't make a
ton of sense. Additionally, many of these error variants are returned
only in the event of a programmer error in a client task, rather than
expected runtime errors, and probably should be reply-faults rather
than error codes.

Per @cbiffle on Matrix:

Re: the SPI errors, we've got

  • TaskRestarted - sure makes sense, also probably not appropriate to
    unwrap though
  • NothingToRelease - should be reply-fault, indicates that you're
    acting like you've locked the device when you haven't.
  • BadDevice - should be reply-fault, indicates that you've made up a
    device index, that's what config is for
  • BadTransferSize - this one's subtle. The max transfer is
    platform-specific, and there's currently no way to discover it. We
    could treat this as a precondition and reply-fault it, but code that
    wants to be diligent and check preconditions currently has no way to
    do so! We could also define a common max value in the API and insist
    upon it on all platforms, at which point this would be reply-fault.

Therefore, I've refactored the SpiError type to remove the
NothingToRelease and BadDevice variants, and changed the Spi IPC
interface so that the lock and release APIs only return runtime
errors on server death --- the SpiError type is now only returned by
Spi::read, Spi::write, and Spi::exchange. In situations where
BadDevice or NothingToRelease errors would previously have been
returned, the SPI server now sends a reply-fault instead.

The SPI API is a bit different from other driver APIs, as there's a
provision for running the SPI driver locally when a task has exclusive
ownership over a specific SPI bus, as well as running it as its own
driver task. In this case, the cases which would return reply-fault when
the driver is its own task now panic when the SPI interface's driver is
local to another task.

This reduces the total archive size for gimlet/rev-f.toml by 1%
(according to cargo xtask sizes), which is, honestly, not that much.
But, it still seems worthwhile.

@hawkw hawkw requested review from cbiffle and mkeeter March 13, 2024 17:53
@hawkw hawkw force-pushed the eliza/how-i-learned-to-stop-worrying-and-love-the-reply-fault branch from 7a37449 to 2cf1702 Compare March 13, 2024 18:05
Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty great @hawkw. Unfortunately, we plumbed up the GwSpiError to MGS and so changing the discriminant values may result in some weird errors.

@andrewjstone
Copy link
Contributor

This looks pretty great @hawkw. Unfortunately, we plumbed up the GwSpiError to MGS and so changing the discriminant values may result in some weird errors.

Hmm, Maybe it's fine, because of the match in the From.

@hawkw
Copy link
Member Author

hawkw commented Mar 13, 2024

This looks pretty great @hawkw. Unfortunately, we plumbed up the GwSpiError to MGS and so changing the discriminant values may result in some weird errors.

This PR doesn't actually change the definition of GwSpiError at all (it's not defined in Hubris, so it can't...); it just changes the discriminant values of the Hubris types that are converted into GwSpiErrors. The actual GwSpiErrors shouldn't have changed at all, so if I understand correctly, this is fine.

I can also change the discriminants for the Hubris error enums back, if you like, but (AFAICT) it shouldn't make a difference?

@jgallagher
Copy link
Contributor

This looks pretty great @hawkw. Unfortunately, we plumbed up the GwSpiError to MGS and so changing the discriminant values may result in some weird errors.

Hmm, Maybe it's fine, because of the match in the From.

FWIW (maybe less than nothing), this is why we landed on "duplicate the types and implement From". We should be able to make changes on the hubris side and find out in the From impl whether they're compatible without having to worry about MGS getting confused.

@andrewjstone
Copy link
Contributor

This looks pretty great @hawkw. Unfortunately, we plumbed up the GwSpiError to MGS and so changing the discriminant values may result in some weird errors.

This PR doesn't actually change the definition of GwSpiError at all (it's not defined in Hubris, so it can't...); it just changes the discriminant values of the Hubris types that are converted into GwSpiErrors. The actual GwSpiErrors shouldn't have changed at all, so if I understand correctly, this is fine.

I can also change the discriminants for the Hubris error enums back, if you like, but (AFAICT) it shouldn't make a difference?

Nope, it's all good! Leave it as is. As @jgallagher reminded me, we did this on purpose :D

@hawkw hawkw merged commit d226a73 into master Mar 13, 2024
83 checks passed
hawkw added a commit that referenced this pull request Mar 25, 2024
Currently, the `gimlet_seq` IPC API has a bunch of IPCs that return
errors that aren't really necessary. The `SeqError` enum is returned by
all the sequencer's IPCs, but most of its variants are only ever
relevant to the `Sequencer::set_state` IPC. The
`Sequencer::read_fpga_regs` only ever returns a
`SeqError::ReadRegsFailed` error if the SPI driver returns an error, and
the other IPCs don't return errors at all.

This branch cleans this up a bit by removing all the error return values
from the IPC operations that don't return errors. Now that PR #1656 has
been merged, the `Spi::read` IPC now only errors if the SPI driver is
restarted or if the transfer is too large. Since
`Sequencer::read_fpga_regs` only does small reads, we can now unwrap
this and make that IPC infallible as well. And, since the
`Sequencer::read_fpga_regs` and `Sequencer::get_state` just read a value
and return it, I figured that we could make them idempotent, as well.
This makes the IPC interface much simpler and allows several callers of
it to avoid handling errors that aren't ever actually returned.

Closes #1600, which this obsoletes.
hawkw added a commit that referenced this pull request Mar 29, 2024
Currently, the `gimlet_seq` IPC API has a bunch of IPCs that return
errors that aren't really necessary. The `SeqError` enum is returned by
all the sequencer's IPCs, but most of its variants are only ever
relevant to the `Sequencer::set_state` IPC. The
`Sequencer::read_fpga_regs` only ever returns a
`SeqError::ReadRegsFailed` error if the SPI driver returns an error, and
the other IPCs don't return errors at all.

This branch cleans this up a bit by removing all the error return values
from the IPC operations that don't return errors. Now that PR #1656 has
been merged, the `Spi::read` IPC now only errors if the SPI driver is
restarted or if the transfer is too large. Since
`Sequencer::read_fpga_regs` only does small reads, we can now unwrap
this and make that IPC infallible as well. And, since the
`Sequencer::read_fpga_regs` and `Sequencer::get_state` just read a value
and return it, I figured that we could make them idempotent, as well.
This makes the IPC interface much simpler and allows several callers of
it to avoid handling errors that aren't ever actually returned.

Closes #1600, which this obsoletes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants