fix: corrupted mount point in csi driver node stage/publish #88569
Conversation
Force-pushed 2bdc7e9 to beab1ed
/milestone v1.18
add test, fix build failure and bazel, fix golint
Force-pushed beab1ed to 5a6435a
/test pull-kubernetes-integration
@@ -228,13 +228,19 @@ func (c *csiAttacher) MountDevice(spec *volume.Spec, devicePath string, deviceMo
		return errors.New(log("attacher.MountDevice failed, deviceMountPath is empty"))
	}

	corruptedDir := false
	mounted, err := isDirMounted(c.plugin, deviceMountPath)
yes, we don't even need to do os.MkdirAll(deviceMountPath, 0750); we can leave the CSI driver to handle that logic:
kubernetes/pkg/volume/csi/csi_attacher.go
Lines 290 to 293 in e4a5012
if err = os.MkdirAll(deviceMountPath, 0750); err != nil {
	return errors.New(log("attacher.MountDevice failed to create dir %#v: %v", deviceMountPath, err))
}
klog.V(4).Info(log("created target path successfully [%s]", deviceMountPath))
this looks like a big behavior change, is it ok to do this in this PR?
I am okay with removing the mounted check, but it requires the uncertain-mount fix to work reliably, so we won't be able to backport that change to versions without the uncertain fix.
I would like to backport this fix to old releases, so shall we go with this PR now? As for removing the mounted check, I could work on another PR that won't be backported; is that ok? @gnufied thanks.
/assign @jsafrane
@jsafrane could you take a look? thanks.
BTW, kubernetes-sigs/blob-csi-driver#117 is an example fix in a fuse-based driver showing how to handle a corrupted mount point. With these two PRs, even if the fuse driver daemonset is restarted, the driver can still work after the pod with the fuse volume mount restarts. Also, this PR does not only mitigate fuse driver issues: for other remote network file system based CSI drivers, if the remote server stops responding transiently, the mount point can become broken, and a new pod mounting on the same mount point (using the same PV) will fail. This PR fixes those issues as well.
@andyzhangx, @jsafrane, @msau42: |
@smourapina thanks, we are trying. |
I think we can merge this PR as it is, so it is safe to backport, and then, as a separate PR, remove the mount check for releases that have reliable handling of uncertain mount state.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, jsafrane

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
…8569-upstream-release-1.15 Automated cherry pick of #88569: fix: corrupted mount point in csi driver
…8569-upstream-release-1.16 Automated cherry pick of #88569: fix: corrupted mount point in csi driver
…8569-upstream-release-1.17 Automated cherry pick of #88569: fix: corrupted mount point in csi driver
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes the corrupted mount point issue in the CSI driver. A detailed description of the issue can be found here: No easy way how to update CSI driver that uses fuse
"
We recommend to use DaemonSet to run CSI drivers on node. If a driver runs fuse daemon, it's almost impossible to update it, as killing a pod with the driver kills the fuse daemons too and it will kill all mounts, possibly corrupting application data.
We need a documented and supported way how to update such CSI drivers. Note that the update process can be manual or the code can live somewhere else, we just need it to be documented and supported so people don't lose data.
"
With this PR, when a fuse-based CSI driver daemonset is restarted on the node and the original blobfuse mount is broken, the broken mount is handled in both
NodeStage
and NodePublish.
And I think this issue is not only related to fuse-based CSI drivers: there are lots of ways a stage or publish mount path can become broken. We should leave the CSI driver itself to handle a corrupted mount point, but the current behavior is to return an error directly, so there is no way to let the CSI driver handle it.
E.g. flexvolume leaves it to the flexvol driver to handle a corrupted mount point:
kubernetes/pkg/volume/flexvolume/detacher.go
Lines 60 to 61 in e4a5012
And as I recall, the original in-tree drivers could also handle a corrupted mount point. The CSI driver changed this behavior: if a mount point is already corrupted, there is currently no way for the CSI driver to handle it.
Which issue(s) this PR fixes:
Fixes #70013
Special notes for your reviewer:
/assign @msau42 @davidz627 @saad-ali @gnufied
/priority important-soon
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: