-
Notifications
You must be signed in to change notification settings - Fork 63
Local storage 2/4: Add the new disk type #9409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add the ability to create LocalStorage type Disks. From the API this means that the DiskCreate struct has been changed into an enum, with variants for Crucible and LocalStorage. LocalStorage disks will be attached and detached just like Crucible disks, but will (currently) not support any of the other operations on a disk (snapshot, import, etc). Note that this is a breaking change to the Nexus external API - see nexus/types/src/external_api/params.rs for more on this. Note too that `content = "content"` was added to `InstanceDiskAttachment` because otherwise the type wouldn't pass the openapi lint step. A separate "local storage dataset allocation" row will be created during instance allocation in the next PR, slicing a portion of the zpool's local storage dataset. This will be delegated to the Propolis zone and used as file backed block storage for an NVMe device. Using a zvol this way was both measured to be the fastest in terms of 4k IOPs, and have the least amount of standard deviation in those IOPs measurements, according to fio runs.
the |
hawkw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this makes sense and is pretty straightforward. I had a bunch of obnoxious nitpicks, sorry! 😅
| let Some(instance) = instances.into_iter().next() else { | ||
| bail!("no instance: {} found", instance_uuid); | ||
| }; | ||
|
|
||
| let instance_name = instance.instance().name().to_string(); | ||
|
|
||
| if instance.vmm().is_some() { | ||
| let propolis_id = | ||
| instance.instance().runtime().propolis_id.unwrap(); | ||
| let my_sled_id = instance.sled_id().unwrap(); | ||
|
|
||
| let (_, my_sled) = LookupPath::new(opctx, datastore) | ||
| .sled_id(my_sled_id) | ||
| .fetch() | ||
| .await | ||
| .context("failed to look up sled")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the cases where the instance doesn't exist, or the sled doesn't exist, that's probably an invalid state. but since OMDB is a debugging tool, i think it would be nicer if we didn't bail out here and just printed that we couldn't find the instance/sled and keep going so that we can still get partial output. OMDB ought to be useable in cases where we have put nonsensical state in the database, if possible.
| // Don't unwrap this - the size of this disk is a parameter set by an | ||
| // API call, and we don't want to panic on out of range input. | ||
| let required_dataset_overhead = | ||
| external::ByteCount::try_from(overhead)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should probably add some kind of context to this error in the case that it bubbles up to the user. the TryFrom error will say "value is too small for a byte count" or "too large for a byte count", but we should tell the user that it's not the byte count they provided for the disk's size, it's something we computed based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too, but there's currently no way that this would get out to a user from a saga (without querying for specific node's outputs, which would be brittle to changes in the saga itself). The way we've structured things is that (after some validation of the parameters) the disk create saga takes in the params::DiskCreate from the user and does all the work of creating a disk in the saga, and we only return a 500 if saga execution fails.
I put an additional step in validate_disk_create_params that tries to create this record outside the saga, and can return a better error if there's a failure in 135f6a9.
| static PROJECT_NAME: &str = "springfield-squidport"; | ||
|
|
||
| fn get_project_selector() -> String { | ||
| format!("project={}", PROJECT_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turbo-nit:
| format!("project={}", PROJECT_NAME) | |
| format!("project={PROJECT_NAME}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hawkw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new error messages and such look great, thanks!
nexus/src/app/disk.rs
Outdated
| .lookup_node_output::<db::datastore::Disk>("created_disk") | ||
| .map_err(|e| Error::internal_error(&format!("{:#}", &e))) | ||
| .map_err(|e| Error::InteralError { | ||
| internal_message: format!("{:#}", &e), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| internal_message: format!("{:#}", &e), | |
| internal_message: format!("{e:#}",), |
(surprised clippy didn't whine about that, i guess the & kept it off your trail...)
nexus/src/app/disk.rs
Outdated
| internal_message: format!( | ||
| "error sending bulk write to pantry: {}", | ||
| e, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again i think clippy would likely prefer:
| internal_message: format!( | |
| "error sending bulk write to pantry: {}", | |
| e, | |
| internal_message: format!( | |
| "error sending bulk write to pantry: {e}", |
| } | ||
|
|
||
| /// Different sources for a disk | ||
| /// Different sources for a Crucible Disk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Different sources for a Crucible Disk | |
| /// Different sources for a Distributed Disk |
I don't think we want to educate customers about code names generally
| /// Describe the instance's disks at creation time | ||
| #[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)] | ||
| #[serde(tag = "type", rename_all = "snake_case")] | ||
| #[serde(tag = "type", content = "content", rename_all = "snake_case")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why make this adjacently tagged rather than internally tagged as it previously was?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the openapi linter fails if I don't!
thread 'main' panicked at /home/james/.cargo/git/checkouts/openapi-lint-d70c5ef8a437b633/ef442ee/src/lib.rs:142:48:
not yet implemented: complex 'any' schema not handled AnySchema {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you investigate that? Can you send me the full output please?
| #[serde(tag = "type", rename_all = "snake_case")] | ||
| pub enum DiskCreate { | ||
| Crucible { | ||
| /// The common identifying metadata for the disk | ||
| #[serde(flatten)] | ||
| identity: IdentityMetadataCreateParams, | ||
|
|
||
| /// The initial source for this disk | ||
| disk_source: DiskSource, | ||
|
|
||
| /// The total size of the Disk (in bytes) | ||
| size: ByteCount, | ||
| }, | ||
|
|
||
| LocalStorage { | ||
| /// The common identifying metadata for the disk | ||
| #[serde(flatten)] | ||
| identity: IdentityMetadataCreateParams, | ||
|
|
||
| /// The total size of the Disk (in bytes) | ||
| size: ByteCount, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggestion:
- let's make this back into a
struct - identity and size is going to be common on any kind of disk
- we could have a new enum like this that we `#[serde(flatten)]:
#[serde(tag = "disk_type")]
enum DiskDetails {
InstanceLocal {
// There are no local storage parameters... ... ... ... yet.
},
#[serde(untagged)]
Distributed {
disk_type: Option<TypeThatCanOnlyBeTheStringDistributed>,
disk_source: DiskSource,
}
}- having
disk_typein the untaggedDistributedvariant allows for backward compat (although we'll probably need to update drift) - One could imagine how 3rd-party storage might fit in
- Is there some way / does there need to be some way to attach an instance to an existing local disk? Presumably one would want to do this at instance-create time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum DiskDetails
I also tried this way originally (using the same name even!), but after talking, @david-crespo and I decided on using the higher level enum to push the logical difference to the highest level possible, ending up with the change to DiskCreate shown here.
having disk_type in the untagged Distributed variant allows for backward compat (although we'll probably need to update drift)
This is backwards compatible, but for versioning I'd rather either get #9430 merged or use the new DiskCreate enum with a separate /v2/ path instead - for the Disk model there was a backwards compatible change to add the disk_type field but I can see that changing in the future too (for example maybe we change Disk from a struct to an enum instead of having Optional fields for all the Crucible stuff?).
Is there some way / does there need to be some way to attach an instance to an existing local disk?
Attach and detach just work as they're based only on disk id, and the previous PR to this changed the idea of Disk to apply to both types. Most user code won't have to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there was discussion about changing types with respect to SDK generation for other programming languages (e.g., Go) so I wanted to add some color. These rich enums with struct variants don't map well into Go and that'll likely never change given Go's stance on, well, everything.
That being said, we're already dealing with these sorts of enums in other parts of the API (e.g., DiskSource). We do that with a pattern called "fat struct" where the struct used for serialization/deserialization contains fields for all the enums variants. This works well overall but has some issues. For example, no type safety for a given variant and this pattern doesn't work if variants have the same field name of differing types. We're actively discussing how to improve the Go SDK to more reliably generate Go types from the OpenAPI specification in RFD 614.
I appreciate that people are thinking about the effects of their changes in SDK generation for other languages. I would recommend using the types that make sense for the server API code if it's going to be more correct and use established patterns on the server side. The SDK generation is a consideration, sure, but we will build more established patterns for these sorts of edge cases that don't map well into other languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also tried this way originally (using the same name even!), but after talking, @david-crespo and I decided on using the higher level enum to push the logical difference to the highest level possible...
What was the rationale?
This is backwards compatible, but for versioning I'd rather either get #9430 merged or use the new DiskCreate enum with a separate /v2/ path instead
Why would you prefer to do that?
The v1 part of the path isn't really intended for that use, but we could add a v2 of the instance provision operation (instance_create_v2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you can make it work, I find it easier to think about a type that is cleanly A or B rather than some fields shared and some not. I also thought — it turns out wrongly — that that was more typical in our create params schemas. After seeing this comment I thought about it and realized that probably isn't true. I had Claude go through the OpenAPI schema and confirm it. So I am persuaded. Especially since it looks like the fields that go with Local are currently a subset of the ones on Distributed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALL HAIL CLAUDE!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't get the disk create struct with disk details enum to pass the openapi linter either, it's also failing with not yet implemented: complex 'any' schema not handled AnySchema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're actively working on better supporting pattern 2 in the Go SDK generator so I'm onboard with that pattern overall.
Add the ability to create LocalStorage type Disks.
From the API this means that the DiskCreate struct has been changed into an enum, with variants for Crucible and LocalStorage. LocalStorage disks will be attached and detached just like Crucible disks, but will (currently) not support any of the other operations on a disk (snapshot, import, etc).
Note that this is a breaking change to the Nexus external API - see nexus/types/src/external_api/params.rs for more on this. Note too that
content = "content"was added toInstanceDiskAttachmentbecause otherwise the type wouldn't pass the openapi lint step.A separate "local storage dataset allocation" row will be created during instance allocation in the next PR, slicing a portion of the zpool's local storage dataset. This will be delegated to the Propolis zone and used as file backed block storage for an NVMe device. Using a zvol this way was both measured to be the fastest in terms of 4k IOPs, and have the least amount of standard deviation in those IOPs measurements, according to fio runs.