
VolumeExpansion ONLINE/OFFLINE capability is not checked #62

Closed
xing-yang opened this issue Oct 26, 2019 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@xing-yang
Contributor

When a CSI driver reports PluginCapability_VolumeExpansion_OFFLINE only, external-resizer still calls ControllerExpandVolume when the PVC is used by a Pod.

Instead, the external-resizer should check the VolumeExpansion plugin capability and, while the PVC is in use by a Pod, only invoke ControllerExpandVolume when the CSI driver reports PluginCapability_VolumeExpansion_ONLINE.
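The capability gate described above can be sketched as a small decision function. This is an illustrative Python sketch, not the external-resizer's actual Go code; the `ExpansionCapability` enum and `should_call_controller_expand` name are assumptions made for the example.

```python
from enum import Enum


class ExpansionCapability(Enum):
    """Mirrors the CSI PluginCapability_VolumeExpansion values (illustrative)."""
    ONLINE = "ONLINE"
    OFFLINE = "OFFLINE"


def should_call_controller_expand(capabilities, volume_in_use):
    """Decide whether ControllerExpandVolume may be called.

    ONLINE drivers may be called whether or not the volume is published;
    OFFLINE-only drivers must not receive the call while the volume is
    in use by a Pod.
    """
    if ExpansionCapability.ONLINE in capabilities:
        return True
    if ExpansionCapability.OFFLINE in capabilities:
        return not volume_in_use
    # Driver reports no expansion capability at all.
    return False
```

With an OFFLINE-only driver, the call is deferred while the volume is in use; as the later discussion notes, this check alone cannot be relied on because it is racy.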

@xing-yang
Contributor Author

assign @gnufied

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2020
@xing-yang
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2020
@gnufied
Contributor

gnufied commented Feb 4, 2020

/assign @gnufied

@xing-yang
Contributor Author

/assign @gnufied

@gnufied
Contributor

gnufied commented Feb 4, 2020

This issue is still on our radar. The reason we have been hesitant to fix it is that any check external-resizer performs to determine whether a volume is in use before making the controller-expansion call will be racy.

We could perform a best-effort check in external-resizer, but since it would still be subject to races, plugins must be able to handle such expansion calls and return gRPC errors. Volume expansion requests will still be retried either way; the only difference is which component returns the error. If external-resizer checks for volume in-use before calling ControllerExpandVolume, it returns the error; but if the call still reaches the plugin while the volume is in use, the plugin is the one that has to return the error.
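The division of labor described here, a racy best-effort check in the resizer plus a mandatory plugin-side rejection, with retries on either path, can be sketched as follows. This is a simplified Python illustration; `expand_with_retry`, `VolumeInUseError`, and the callback signatures are hypothetical names invented for the example, not real external-resizer APIs.

```python
import time

FAILED_PRECONDITION = 9  # gRPC status code the CSI spec uses for "volume in use"


class VolumeInUseError(Exception):
    """Plugin-side rejection of an expansion call on an in-use volume."""
    code = FAILED_PRECONDITION


def expand_with_retry(check_in_use, controller_expand,
                      max_attempts=4, base_delay=0.01):
    """Best-effort in-use check followed by the expansion call.

    The check is inherently racy: a Pod may mount the volume between the
    check and the RPC, so the plugin must still be prepared to reject the
    call, and the caller retries with exponential backoff on either path.
    """
    delay = base_delay
    for _ in range(max_attempts):
        if check_in_use():
            pass  # resizer-side rejection: skip the RPC this round
        else:
            try:
                return controller_expand()
            except VolumeInUseError:
                pass  # plugin-side rejection: same retry path
        time.sleep(delay)
        delay *= 2
    raise TimeoutError("volume still in use after retries")
```

Note that the outcome is identical whichever component rejects the attempt: the request is simply retried until the volume is no longer published.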

@xing-yang
Contributor Author

That sounds good. If external-resizer checks whether the volume is in use before calling ControllerExpandVolume, that should cover most cases. In the corner case where the call still reaches the plugin, the plugin will return an error.

@gnufied
Contributor

gnufied commented Apr 3, 2020

Summary of discussion on April 2nd 2020:

We discussed the option of doing a best-effort check but agreed that it could worsen the problem: a driver author might falsely assume (after limited testing) that ControllerExpandVolume will never be called while the volume is in use. In the end, the driver does need to handle this case and return the following error as per the CSI spec:

| Condition | gRPC Code | Description | Recovery Behavior |
| --- | --- | --- | --- |
| Volume in use | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is currently published on a node but the plugin does not have ONLINE expansion capability. | Caller SHOULD ensure that volume is not published and retry with exponential back off. |

Elsewhere in the CSI spec, though, the words SHALL and MUST are used to describe OFFLINE expansion. I have filed container-storage-interface/spec#423 to discuss this in the CSI community meeting.
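The plugin-side behavior mandated by the spec table above can be sketched as follows. This is an illustrative Python sketch of an OFFLINE-only plugin's handler; `GrpcError`, `controller_expand_volume`, and its parameters are hypothetical names for the example, not a real CSI driver implementation.

```python
FAILED_PRECONDITION = 9  # gRPC code the CSI spec mandates for this case


class GrpcError(Exception):
    """Minimal stand-in for a gRPC status error."""
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def controller_expand_volume(volume_id, published_nodes, supports_online):
    """Reject in-use expansion when the plugin lacks ONLINE capability.

    Per the spec, the caller should then unpublish the volume and retry
    with exponential backoff.
    """
    if published_nodes and not supports_online:
        raise GrpcError(
            FAILED_PRECONDITION,
            f"volume {volume_id} is published on {sorted(published_nodes)} "
            "and plugin lacks ONLINE expansion capability",
        )
    return {"volume_id": volume_id, "expanded": True}
```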

@gnufied
Contributor

gnufied commented Apr 3, 2020

Another thing: even checking the resizing condition while doing mount or attach suffers from the race conditions mentioned in #62 (comment), so it isn't entirely foolproof either.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 1, 2020
@xing-yang
Contributor Author

This is fixed.

@niladrih

I think this behaviour is still present as of Kubernetes v1.27.3.


No branches or pull requests

5 participants