-
Notifications
You must be signed in to change notification settings - Fork 28
NVMe admin commands should better mind their PRPs #466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| prp: cmds::PrpIter, | ||
| data: &T, | ||
| mem: &MemCtx, | ||
| ) -> Option<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not especially fond of using Option<()> here, but it allowed the logic to be more terse for now. I'm working on some improvements to PrpIter and will revisit some of the error handling there.
gjcolombo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I've tested this out with a PHD boot loop test and have gotten through about 425 iterations with no apparent boot timeouts or guest service segfaults. To put that in perspective, usually a 50-iteration test run would produce 2-3 guest boot timeouts; now we're at the point where the test appears to hit bugs in the PHD framework before it observes any guest failures.
| /// Write result data from an admin command into host memory | ||
| /// | ||
| /// Returns `Some(())` if successful, else None | ||
| fn write_admin_result<T: Copy>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be neat to enforce the packed-ness of T here, but the most promising looking crate I found to do so (repr-trait) doesn't quite look to me like it would drop into this code neatly (reading through its derives I'm not convinced it will handle the repr(C, packed(1)) repr that NVMe's structs use). Oh well--maybe another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, a write_from_ptr() interface would keep us clear of the potential UB from the slice creation. This should work for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the comment regarding the repr() expectations, so it's at least a bit more visible in the mean time.
The Identify and GetLogPage admin commands in NVMe should not assume that the output buffers provided to them in the PRPs consist of a single page-sized page-aligned entry. Guest (such as Linux) can and will issue those commands with a page offset in PRP1, splitting the output into another page. Fixes oxidecomputer#427
|
I also tested this with a reboot overnight and saw > 2100 successful test results before I turned it off. (Before, I would see about 5-10 segfaults every 300 boots). |
|
I have a PR out for updating Crucible/Propolis in Omicron. If you expect this to go back soon (and I believe you do), then I can hold that That way we won't have two PRs in Omicron to update Propolis |
Thanks for waiting. |
The Identify and GetLogPage admin commands in NVMe should not assume that the output buffers provided to them in the PRPs consist of a single page-sized page-aligned entry. Guest (such as Linux) can and will issue those commands with a page offset in PRP1, splitting the output into another page.
Fixes #427