Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI: Expand Volume (Controller only) #18359

Merged
merged 1 commit into from Sep 14, 2023
Merged

CSI: Expand Volume (Controller only) #18359

merged 1 commit into from Sep 14, 2023

Conversation

gulducat
Copy link
Member

@gulducat gulducat commented Aug 29, 2023

Addresses #10324 with declarative config (instead of a new imperative command/api route).

I still need to update docs, but to trigger a volume increase, modify the volume specification that was used initially to create/register the volume with a higher capacity_min (and optionally capacity_max), then issue create or register again.

This introduces a comparatively very slow process into the create/register flow (if expand is actually needed), and modifies the behavior of create to be more idempotent than it currently is (presently it always issues create to the controller plugin).

I hemmed and hawed a lot on where the validation logic should live, so I'd be happy to hear any suggestions to move any of it. I mainly want to avoid putting it right before writing to raft, because by that point reality (a real live disk in the world) may have already changed.

The commit history is messy right now; I intend to squash it before merging.

So far this covers ControllerExpandVolume to increase volumes outside of the instance (e.g. an AWS EBS volume), but does not yet complete the process (when required) with NodeExpandVolume on the node where the volume is mounted. I think I'll make that a separate PR.

client/csi_endpoint.go Outdated Show resolved Hide resolved
CSIControllerQuery
}

func (req *ClientCSIControllerExpandVolumeRequest) ToCSIRequest() *csi.ControllerExpandVolumeRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to perform a nil check on the req here, or is that always non-nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see some other nil checks in similar methods in this file, so maybe there's a reason for them? The only place this one is called (currently) already assumes non-nil to get req.PluginID, so I think possible-nil-ness would be bad there too, unless checked prior to that? I'm not sure how much our RPC boundaries increase the possibility of that kind of nil pointer...

nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/csi_endpoint.go Outdated Show resolved Hide resolved
if vol == nil || plugin == nil || capacity == nil {
return errors.New("unexpected nil value")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to expand (hehe) on this error? I wonder how these nil values can manifest, and if they are user related, whether we can provide a more actionable message.

Copy link
Member Author

@gulducat gulducat Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should never happen -- this is my fear of panics showing. I actually wonder if a panic would be preferable? because at least then it'd provide a full stack trace if there is an unexpected bug...

but server panics are especially scary to me, and extra-especially in a spot like this, which might not get tripped until some poor operator is trying to do a reasonable thing but ends up taking down a whole server, which might be concurrently doing other important things. 😬

nomad/csi_endpoint.go Outdated Show resolved Hide resolved
@gulducat gulducat requested a review from tgross September 6, 2023 22:45
nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/csi_endpoint.go Show resolved Hide resolved
@@ -228,7 +231,7 @@ func (v *CSIVolume) Get(args *structs.CSIVolumeGetRequest, reply *structs.CSIVol
return v.srv.blockingRPC(&opts)
}

func (v *CSIVolume) pluginValidateVolume(req *structs.CSIVolumeRegisterRequest, vol *structs.CSIVolume) (*structs.CSIPlugin, error) {
func (v *CSIVolume) pluginValidateVolume(vol *structs.CSIVolume) (*structs.CSIPlugin, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm looking at this signature, I'm realizing this is a very silly method name. I suspect at one time it did validate the volume somehow, but we never changed the name?

nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/csi_endpoint.go Outdated Show resolved Hide resolved
nomad/state/state_store.go Show resolved Hide resolved
nomad/structs/csi.go Outdated Show resolved Hide resolved
nomad/structs/csi.go Outdated Show resolved Hide resolved
plugins/csi/plugin.go Show resolved Hide resolved
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

nomad/state/state_store.go Outdated Show resolved Hide resolved
the first half of volume expansion,
this allows a user to update requested capacity
("capacity_min" and "capacity_max") in a volume
specification file, and re-issue either Register
or Create volume commands (or api calls).

the requested capacity will now be "reconciled"
with the current real capacity of the volume,
issuing a ControllerExpandVolume RPC call
to a running controller plugin, if requested
"capacity_min" is higher than the current
capacity on the volume in state.

csi spec:
https://github.com/container-storage-interface/spec/blob/c918b7f/spec.md#controllerexpandvolume

note: this does not yet cover NodeExpandVolume
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.6.x backport to 1.6.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants