New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix subpath issues with orphaned pod cleanup #72291

Merged
merged 2 commits into from Dec 27, 2018

Conversation

@msau42
Copy link
Member

msau42 commented Dec 22, 2018

What type of PR is this?
/kind bug

What this PR does / why we need it:
Check for existence of volume-subpaths directory before cleaning up the pod directory

Which issue(s) this PR fixes:

Fixes #72257

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixes issue where subpath volume content was deleted during orphaned pod cleanup for Local volumes that are directories (and not mount points) on the root filesystem.
@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Dec 22, 2018

@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Dec 22, 2018

/priority critical-urgent

@@ -55,6 +55,7 @@ const (
fsckErrorsUncorrected = 4

// place for subpath mounts
// TODO: pass in directory using kubelet_getters instead

This comment has been minimized.

@msau42

msau42 Dec 22, 2018

Member

I opened up #72290 for refactoring and fixing these inter-pkg dependency issues

@Shnatsel

This comment has been minimized.

Copy link

Shnatsel commented Dec 24, 2018

pull-kubernetes-verify is flaky, which is tracked as #71394

/retest

@@ -99,6 +99,13 @@ func (kl *Kubelet) getPodDir(podUID types.UID) string {
return filepath.Join(kl.getPodsDir(), string(podUID))
}

// getPodVolumesDir returns the full path to the per-pod data directory under

This comment has been minimized.

@jingxu97

jingxu97 Dec 24, 2018

Contributor

modify the comments?

This comment has been minimized.

@msau42

msau42 Dec 26, 2018

Member

fixed

@@ -128,6 +128,18 @@ func (kl *Kubelet) cleanupOrphanedPodDirs(pods []*v1.Pod, runningPods []*kubecon
orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("Orphaned pod %q found, but volume paths are still present on disk", uid))
continue
}

This comment has been minimized.

@jingxu97

jingxu97 Dec 24, 2018

Contributor

In the above podVolumesExist() checking, it calls getMountedVolumePathListFromDisk() to check whether there are mounted volumes exist. And then it calls getPodVolumePathListFromDisk to check whether there are still volume paths exist. I think there are some redundancy here.

This comment has been minimized.

@msau42

msau42 Dec 26, 2018

Member

it might be redundant, but I don't want to remove it as part of this bug fix, as this code is not very well tested.

This comment has been minimized.

@msau42

msau42 Dec 26, 2018

Member

I'll leave it comment that this may be able to be cleaned up

This comment has been minimized.

@msau42

msau42 Dec 26, 2018

Member

Opened up #72346

@msau42 msau42 force-pushed the msau42:fix-subpath-orphan branch from 6bfc578 to 639278a Dec 26, 2018

@@ -304,7 +311,7 @@ func (kl *Kubelet) getMountedVolumePathListFromDisk(podUID types.UID) ([]string,
return mountedVolumes, err
}
for _, volumePath := range volumePaths {
isNotMount, err := kl.mounter.IsLikelyNotMountPoint(volumePath)
isNotMount, err := kl.mounter.IsNotMountPoint(volumePath)

This comment has been minimized.

@saad-ali

saad-ali Dec 26, 2018

Member

Per documentation:

// IsNotMountPoint is more expensive than IsLikelyNotMountPoint.

How frequently is getPodVolumeSubpathsDir(...) called? Want to be careful we don't introduce a performance regression.

This comment has been minimized.

@msau42

msau42 Dec 26, 2018

Member

I'm going to remove this commit. Changing to IsNotMountPoint is not strictly necessary to fix this particular bug.

But we should investigate other callers of these functions to see if this may be wrong in other contexts.

This comment has been minimized.

@msau42

msau42 Dec 26, 2018

Member

Opened up #72347 to investigate all uses of IsLikelyNotMountPoint

This comment has been minimized.

@saad-ali

This comment has been minimized.

@timoha

timoha Dec 28, 2018

why is it necessary to call IsLikelyNotMountPoint function at all then? There's no protection that it won't end up removing data if mount is from the same device anyway (which was the case for the bug we hit), so if you are ok with that, it seems like calling the function just gives false sense of confidence.

This comment has been minimized.

@msau42

msau42 Dec 28, 2018

Member

IsLikelyNotMountPoint still works for most other volume types that are actually different devices.

The reason why I didn't want to change this function here is because it's being called in other paths as well, and we only wanted to do a targeted fix for this particular issue. We can do a more holistic investigation in #72346 and #72347 for further areas to cleanup.

@msau42 msau42 force-pushed the msau42:fix-subpath-orphan branch from 639278a to 3ebbbbd Dec 26, 2018

@saad-ali

This comment has been minimized.

Copy link
Member

saad-ali commented Dec 26, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 26, 2018

@saad-ali

This comment has been minimized.

Copy link
Member

saad-ali commented Dec 26, 2018

/test pull-kubernetes-local-e2e-containerized

@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Dec 26, 2018

/retest

@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Dec 26, 2018

I manually tested this by:

  1. Create a pod with a subpath volume
  2. Create a directory and random file under /var/lib/kubelet/pods/poduid/volume-subpaths
  3. Delete the pod
  4. Verify the directories and files under 2) were not deleted, and that this log message is printed:
kubelet_volumes.go:154] Orphaned pod "5799cca2-0956-11e9-b2f6-42010a800002" found, but volume subpaths are still present on disk
@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Dec 26, 2018

/retest

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 26, 2018

@msau42: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized 3ebbbbd link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Dec 26, 2018

/assign @Random-Liu
for kubelet review

@Random-Liu

This comment has been minimized.

Copy link
Member

Random-Liu commented Dec 27, 2018

/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Dec 27, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, Random-Liu, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -128,6 +130,18 @@ func (kl *Kubelet) cleanupOrphanedPodDirs(pods []*v1.Pod, runningPods []*kubecon
orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("Orphaned pod %q found, but volume paths are still present on disk", uid))
continue
}

// If there are any volume-subpaths, do not cleanup directories

This comment has been minimized.

@krmayankk

krmayankk Dec 27, 2018

Contributor

i do not fully follow why we should not cleanup directories if volume-subpaths exists ? May be add some additional comments to explain that

This comment has been minimized.

@msau42

msau42 Dec 27, 2018

Member

It means that all the volumes have not been cleanly unmounted yet

@k8s-ci-robot k8s-ci-robot merged commit 68451f3 into kubernetes:master Dec 27, 2018

18 of 19 checks passed

pull-kubernetes-local-e2e-containerized Job failed.
Details
cla/linuxfoundation msau42 authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details

k8s-ci-robot added a commit that referenced this pull request Jan 5, 2019

Merge pull request #72380 from msau42/automated-cherry-pick-of-#72291…
…-upstream-release-1.12

Automated cherry pick of #72291: Check for volume-subpaths directory in orpahaned pod

k8s-ci-robot added a commit that referenced this pull request Jan 5, 2019

Merge pull request #72379 from msau42/automated-cherry-pick-of-#72291…
…-upstream-release-1.13

Automated cherry pick of #72291: Check for volume-subpaths directory in orpahaned pod
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment