fix: stale smb mount issue when smb file share is deleted and then unmount #121851

Merged
merged 1 commit into from Nov 16, 2023

Conversation

Member

@andyzhangx andyzhangx commented Nov 13, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

fix: stale smb mount issue when smb file share is deleted and then unmount

There is a behavior change starting from Linux kernel 5.15.0-1051-azure (or an earlier version). When an smb file share is deleted, upstream mount-utils relies on the "original logic" shown below to check whether the smb mount is in a stale state. Starting from that kernel version, stat returns a "resource temporarily unavailable" error instead of ErrNotExist, so this PR adds a new check to the IsCorruptedMnt func in mount-utils. Here EWOULDBLOCK is the "resource temporarily unavailable" error:

{11, "EWOULDBLOCK", "resource temporarily unavailable"},
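The added check can be sketched roughly as follows. This is only an illustration and not the actual diff: `isCorruptedMntSketch` and `exampleStaleStatError` are hypothetical names, the mount path is made up, and the errno values other than EWOULDBLOCK are assumptions standing in for whatever IsCorruptedMnt already treats as corrupted.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isCorruptedMntSketch is a hypothetical stand-in for the IsCorruptedMnt
// check in mount-utils. It is not the upstream implementation.
func isCorruptedMntSketch(err error) bool {
	if err == nil {
		return false
	}
	// Unwrap wrappers such as *os.PathError down to the raw errno.
	var errno syscall.Errno
	if !errors.As(err, &errno) {
		return false
	}
	switch errno {
	case syscall.ESTALE, syscall.ENOTCONN, syscall.EIO:
		// Assumed pre-existing "corrupted mount" errnos (illustrative only).
		return true
	case syscall.EWOULDBLOCK:
		// The new case: "resource temporarily unavailable", as returned by
		// stat on newer kernels once the smb file share has been deleted.
		return true
	}
	return false
}

// exampleStaleStatError builds an error shaped like the stat failure in the
// error message of this PR. The path is hypothetical.
func exampleStaleStatError() error {
	return &os.PathError{Op: "stat", Path: "/hypothetical/volume/mount", Err: syscall.EWOULDBLOCK}
}

func main() {
	fmt.Println("corrupted:", isCorruptedMntSketch(exampleStaleStatError()))
}
```

With a check along these lines, the caller can treat the path as an existing but corrupted mount and proceed with the unmount instead of failing with "Error checking path".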
  • original logic (won't work now since the kernel now returns a resource temporarily unavailable error instead of ErrNotExist)

    } else if errors.Is(err, fs.ErrNotExist) {
        err = syscall.Access(path, syscall.F_OK)
        if err == nil {
            // The access syscall says the file exists, the stat syscall says it
            // doesn't. This was observed on CIFS when the path was removed at
            // the server somehow. POSIX calls this a stale file handle, let's fake
            // that error and treat the path as existing but corrupted.
            klog.Warningf("Potential stale file handle detected: %s", path)
            return true, syscall.ESTALE
        }
        return false, nil

  • the current error message looks like the following:

kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = failed to unmount target /var/lib/kubelet/pods/bf4554ca-de3d-43e3-8d93-f35f14406e4f/volumes/kubernetes.io~csi/volume1/mount: Error checking path: stat /var/lib/kubelet/pods/bf4554ca-de3d-43e3-8d93-f35f14406e4f/volumes/kubernetes.io~csi/volume1/mount: resource temporarily unavailable
  • how to repro this issue
  1. mount an smb file share using the csi driver or a subPath volume
  2. delete the remote smb file share
  3. delete the pod; the unmount gets stuck forever (the pod stays in terminating state)
  • workaround
    force delete the pod

  • impact

This issue does not only break CSI drivers; it also breaks pods with a subPath smb volume. The error message looks like the following. This requires a patch version fix of kubelet, since subPath unmount does not go through the CSI driver.

Error: error cleaning subPath mounts for volume "volume1" (UniqueName: "kubernetes.io/csi/644d3846-709a-4165-8b13-c1003409d588-volume1") pod "644d3846-709a-4165-8b13-c1003409d588" (UID: "644d3846-709a-4165-8b13-c1003409d588") : error processing /var/lib/kubelet/pods/644d3846-709a-4165-8b13-c1003409d588/volume-subpaths/volume1/helloworld-mount-subpath: error cleaning subpath mount /var/lib/kubelet/pods/644d3846-709a-4165-8b13-c1003409d588/volume-subpaths/volume1/helloworld-mount-subpath/0: Error checking path: stat /var/lib/kubelet/pods/644d3846-709a-4165-8b13-c1003409d588/volume-subpaths/volume1/helloworld-mount-subpath/0: resource temporarily unavailable
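Both error messages above end in "resource temporarily unavailable", i.e. the stat error carries EWOULDBLOCK. A minimal sketch of why the `errors.Is(err, fs.ErrNotExist)` branch of the original logic is no longer taken for that errno (the helper names and the path are hypothetical, used only for illustration):

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"syscall"
)

// fakeStatError builds an error shaped like the one os.Stat returns.
// The path is hypothetical.
func fakeStatError(errno syscall.Errno) error {
	return &os.PathError{Op: "stat", Path: "/hypothetical/volume/mount", Err: errno}
}

// takesNotExistBranch reports whether the fs.ErrNotExist branch of the
// original logic would be entered for the given error.
func takesNotExistBranch(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

// notExistBranchForENOENT models the older behavior: stat fails with ENOENT.
func notExistBranchForENOENT() bool {
	return takesNotExistBranch(fakeStatError(syscall.ENOENT))
}

// notExistBranchForEWOULDBLOCK models the newer behavior: stat fails with
// EWOULDBLOCK ("resource temporarily unavailable").
func notExistBranchForEWOULDBLOCK() bool {
	return takesNotExistBranch(fakeStatError(syscall.EWOULDBLOCK))
}

func main() {
	fmt.Println("ENOENT enters the branch:", notExistBranchForENOENT())
	fmt.Println("EWOULDBLOCK enters the branch:", notExistBranchForEWOULDBLOCK())
}
```

Since the EWOULDBLOCK error skips the stale-handle detection, it has to be recognized separately, which is what the new check in IsCorruptedMnt is meant to do.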

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

fix: stale smb mount issue when smb file share is deleted and then unmount

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

fix: stale smb mount issue when smb file share is deleted and then unmount

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 13, 2023
@andyzhangx
Member Author

/kind bug
/priority important-soon
/sig storage
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 13, 2023
@andyzhangx
Member Author

/assign @jingxu97 @msau42

@andyzhangx
Member Author

@jingxu97 @msau42 could you review this PR asap? I would like to cherrypick this fix to earlier versions, thanks.

@andyzhangx
Member Author

@jsafrane can you take a look? thanks.

@jsafrane
Member

/lgtm
/approve
/milestone v1.29
This is serious enough for some kernels and at the same time it should not break anything else.

@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Nov 16, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b211ee78f42277554b58605565bcd967226a3441

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2023
@k8s-ci-robot k8s-ci-robot merged commit 8509ab8 into kubernetes:master Nov 16, 2023
14 checks passed
k8s-ci-robot added a commit that referenced this pull request Dec 18, 2023
…121851-upstream-release-1.26

Automated cherry pick of #121851: fix: smb file share unavailable issue when it's deleted
k8s-ci-robot added a commit that referenced this pull request Dec 18, 2023
…121851-upstream-release-1.28

Automated cherry pick of #121851: fix: smb file share unavailable issue when it's deleted
k8s-ci-robot added a commit that referenced this pull request Dec 21, 2023
…121851-upstream-release-1.27

Automated cherry pick of #121851: fix: smb file share unavailable issue when it's deleted
@andyzhangx
Member Author

this issue is fixed in v1.26.13, v1.27.10, v1.28.6, v1.29.0


5 participants