
check volume directories instead of mounts for cleanupOrphanedPodDirs #84206

Conversation

mucahitkurt
Contributor

What type of PR is this?
/kind cleanup
/sig storage

What this PR does / why we need it:
Related comment: #72346 (comment)

Change podVolumesExist to check volume directories instead of mounted volumes when it is called from Kubelet.cleanupOrphanedPodDirs, because getMountedVolumePathListFromDisk does not always produce correct results for mounted volumes and Kubelet.cleanupOrphanedPodDirs already uses getPodVolumePathListFromDisk.
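A minimal sketch of the shape of this change (the checkVolumePaths parameter name is taken from the review discussion below; the exact wiring inside kubelet may differ):

// Sketch only: podVolumesExist gains a flag so that cleanupOrphanedPodDirs can rely
// on the on-disk volume directories rather than the mount-based check.
func (kl *Kubelet) podVolumesExist(podUID types.UID, checkVolumePaths bool) bool {
	// Fast path: the volume manager's in-memory cache of mounted volumes.
	if mountedVolumes := kl.volumeManager.GetMountedVolumesForPod(volumetypes.UniquePodName(podUID)); len(mountedVolumes) > 0 {
		return true
	}
	if checkVolumePaths {
		// cleanupOrphanedPodDirs: consult only the volume directories on disk.
		paths, err := kl.getPodVolumePathListFromDisk(podUID)
		return err != nil || len(paths) > 0 // treat errors conservatively
	}
	// Other callers keep the previous mount-based behavior.
	paths, err := kl.getMountedVolumePathListFromDisk(podUID)
	return err != nil || len(paths) > 0
}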

Which issue(s) this PR fixes:
Related issue #72346

The related issue has evolved toward fixing every podVolumesExist usage, but this PR can be considered a fix for issue #72346; the remaining podVolumesExist usages can be addressed within the scope of issue #72347.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

cc @msau42 @cofyc

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Oct 22, 2019
@msau42
Member

msau42 commented Oct 23, 2019

/assign @msau42 @jsafrane

@mucahitkurt
Contributor Author

I'm looking into the build error; when I run ./hack/update-bazel.sh it doesn't update any build file...

@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from 3f0cd3c to c7dd1a9 on October 23, 2019 20:10
Contributor

@mattjmcnaughton mattjmcnaughton left a comment

A small readability comment from me - otherwise, I defer to @jsafrane and the folks who work with volumes more :)

@@ -917,7 +917,7 @@ func (kl *Kubelet) PodResourcesAreReclaimed(pod *v1.Pod, status v1.PodStatus) bo
klog.V(3).Infof("Pod %q is terminated, but some containers have not been cleaned up: %s", format.Pod(pod), statusStr)
return false
}
if kl.podVolumesExist(pod.UID) && !kl.keepTerminatedPodVolumes {
if kl.podVolumesExist(pod.UID, false) && !kl.keepTerminatedPodVolumes {
Contributor

Nit: Can we do

checkVolumePaths := false
kl.podVolumesExist(pod.UID, checkVolumePaths)

otherwise, it can be difficult to know what the false means here.

Contributor Author

I added the parameter name as a comment on the podVolumesExist call. Thanks!

Contributor

Thanks for working with me on this small nit :)

This may just be my opinion, but I feel like the comment offers fewer advantages than the variable name (i.e. it is less explicit). Are there advantages to using a comment over a variable that I'm missing?

Contributor Author

My IDE (GoLand) uses a similar approach to show the parameter names of function arguments, and I see similar usages in the Kubernetes code base; using an additional variable seems a little noisy to me, so I chose to add a comment.

But I agree, using a variable can be more explicit. I'm neutral, so I'll add an additional variable instead of a comment.
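For illustration, the two styles under discussion side by side (a sketch, not the actual diff):

// Style 1: inline comment naming the argument (relies on readers or IDE hints to surface it)
kl.podVolumesExist(pod.UID, false /* checkVolumePaths */)

// Style 2: a dedicated variable, explicit regardless of tooling
checkVolumePaths := false
kl.podVolumesExist(pod.UID, checkVolumePaths)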

Contributor Author

Fixed

Contributor

Ah gotcha - maybe part of this is that I'm using vim haha :) Anyway, thanks!

Contributor

Will let @jsafrane or someone else on sig-storage sign off :)

@mucahitkurt
Contributor Author

/test pull-kubernetes-node-e2e-containerd
/test pull-kubernetes-kubemark-e2e-gce-big

@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from c7dd1a9 to 2c1769b on October 24, 2019 19:40
@cofyc
Member

cofyc commented Oct 25, 2019

/cc @cofyc

@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from 2c1769b to 66edee8 on October 26, 2019 02:49
@mucahitkurt
Contributor Author

/test pull-kubernetes-verify

@jsafrane
Member

I am not sure I understand the PR. If we expect that all volume plugins remove their directories in their TearDown (that's what cleanupOrphanedPodDirs assumes now), why are we checking mounts in all `cleanupOrphanedPodDirs(..., false)` calls? Shouldn't the check be the same everywhere?

@mucahitkurt
Contributor Author

I am not sure I understand the PR. If we expect that all volume plugins remove their directories in their TearDown (that's what cleanupOrphanedPodDirs assumes now), why are we checking mounts in all `cleanupOrphanedPodDirs(..., false)` calls? Shouldn't the check be the same everywhere?

The other callers of podVolumesExist, cleanupOrphanedPodCgroups and PodResourcesAreReclaimed, assume that all plugins unmount their directories but don't assume that all plugins remove their directories. So we are not sure about changing this behavior, because it could cause a pod not to be deleted.

cleanupOrphanedPodDirs, however, was checking both mounts (via the podVolumesExist call) and directory existence (via getPodVolumePathListFromDisk) before this PR; now it only checks directory existence, so I removed a redundant check and the behavior of the method doesn't change.
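A simplified sketch of that redundancy (inside the loop over orphaned pod UIDs in cleanupOrphanedPodDirs; error handling omitted, not the literal diff):

// before this PR: a mount-based check followed by a directory check
if kl.podVolumesExist(uid) { // consults mount state via getMountedVolumePathListFromDisk
	continue // keep the pod directory
}
if paths, _ := kl.getPodVolumePathListFromDisk(uid); len(paths) > 0 {
	continue // keep the pod directory
}

// after this PR: the directory check alone decides
if paths, _ := kl.getPodVolumePathListFromDisk(uid); len(paths) > 0 {
	continue
}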

@jsafrane
Member

jsafrane commented Nov 1, 2019

cleanupOrphanedPodCgroups and PodResourcesAreReclaimed assume that all plugins unmount their directories but don't assume that all plugins remove their directories.

Now we assume (in cleanupOrphanedPodDirs) that volume plugins do remove their directories.

I see you want to be on the safe side, and it's probably correct at this stage of 1.17, but should we have the same assumption everywhere in the end?

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2019
@mucahitkurt
Contributor Author

Now we assume (in cleanupOrphanedPodDirs) that volume plugins do remove their directories.

I see you want to be on the safe side, and it's probably correct at this stage of 1.17, but should we have the same assumption everywhere in the end?

/lgtm
/approve

I think so; this was the first suggestion of @cofyc. I think we are not confident about depending on the cleanup behavior of plugins in terms of removing directories, but currently these methods already depend on the cleanup behavior of plugins in terms of unmounting directories. Maybe @msau42 or @cofyc can express it better than me.

@mucahitkurt
Contributor Author

/assign @vishh

@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from 66edee8 to 42fa439 on November 14, 2019 19:39
@mucahitkurt
Contributor Author

@mucahitkurt sorry for the delay. Could you pls rebase your code? I will review it now.

Hey @jingxu97 I have rebased this PR, PTAL.

@mucahitkurt
Contributor Author

/test pull-kubernetes-bazel-test

@jingxu97
Contributor

jingxu97 commented Oct 10, 2020

@mucahitkurt sorry for the delay. I think the main logic here is changing the ordering of the checks. There are three types of checks for pod volume existence: checking the in-memory cache, checking the disk directories, and checking whether each directory is a mount point.

  1. kl.volumeManager.GetMountedVolumesForPod(podUid)
    This checks the in-memory cache for whether volumes are still mounted. This check is fast, but it might not be completely reliable when kubelet restarts and loses its cache (there is a reconstruction phase that rebuilds the cache by scanning directories on disk, but I need to check whether there is a guarantee that the cache is already reconstructed by the time cleanupOrphanedPodDirs is called).

  2. kl.getPodVolumePathListFromDisk(podUid)
    This function directly scans the disk directory and returns a list of the volume paths for the given pod (e.g., /var/lib/kubelet/pods/pod_uid/volumes/...)

  3. kl.getMountedVolumePathListFromDisk(podUID)
    This function first calls getPodVolumePathListFromDisk to get the list of volume paths for the given pod, and then checks whether each path is a mount point using IsLikelyNotMountPoint.
    There are a few things worth noting about this function:

    • this check might not work in certain cases (e.g., when the mount point is in the root file system).
    • this check uses os.Stat, which might cause the process to hang on an NFS mount point when the server is down. (Please check this very old and long discussion: Hung volumes can wedge the kubelet #31272.)
    • there is a similar function, mount.IsNotMountPoint(); it also checks the mount table for further verification of the mount point, which addresses the issue with IsLikelyNotMountPoint, but it might take longer to go through the mount table.
  4. kl.podVolumesExist(podUID)
    This function is a combination of functions 1 and 3 listed here: it calls GetMountedVolumesForPod and then getMountedVolumePathListFromDisk to verify volume existence.

All of these checks are used in a number of different places. Here, cleanupOrphanedPodDirs is called by HandlePodCleanups, and its comment mentions:

// NOTE: This function is executed by the main sync loop, so it
// should not contain any blocking calls.

You might check issue #31272 to see how we removed check 3, getMountedVolumePathListFromDisk, because of the hang issue caused by NFS. However, I think I accidentally introduced this back again here by adding the check of mount points in podVolumesExist. So the fix I suggest here is, instead of calling podVolumesExist, to call the following checks:

  1. call check 1, kl.volumeManager.GetMountedVolumesForPod(podUid), to check the internal cache,
  2. call check 2, kl.getPodVolumePathListFromDisk(podUid), to check the volume directories on disk.

Please let me know if you have questions about the comments. Thanks!
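A minimal sketch of what that combination could look like, using the podVolumePathsExist name the PR adopts below; logging and exact error handling are illustrative:

// check 1 (in-memory cache) followed by check 2 (volume directories on disk);
// no mount-point probing, so no risk of hanging on an unreachable NFS server (#31272).
func (kl *Kubelet) podVolumePathsExist(podUID types.UID) bool {
	if mountedVolumes := kl.volumeManager.GetMountedVolumesForPod(volumetypes.UniquePodName(podUID)); len(mountedVolumes) > 0 {
		return true
	}
	volumePaths, err := kl.getPodVolumePathListFromDisk(podUID)
	if err != nil {
		// Be conservative on errors: report that volumes may still exist so the
		// pod directory is not removed.
		klog.Errorf("pod %q: error reading volume paths from disk: %v", podUID, err)
		return true
	}
	return len(volumePaths) > 0
}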

@mucahitkurt
Contributor Author

@jingxu97, @msau42 Sorry for the delay, I'll check the latest comments and try to make the fix in a few days (hopefully before Nov 12th).

@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from 0c7105d to 01deec5 on November 9, 2020 15:12
@mucahitkurt
Contributor Author

@mucahitkurt sorry for the delay. I think the main logic here is changing the ordering of the checks. […]

@jingxu97:

I kept the new function kl.podVolumePathsExist(podUID types.UID) because it uses check 1 and check 2.

I replaced kl.podVolumesExist(podUID types.UID) with kl.podVolumePathsExist(podUID types.UID) in cleanupOrphanedPodDirs and PodResourcesAreReclaimed.

Do you have any concerns about the new function?

kl.podVolumesExist(podUID types.UID) is still used in Kubelet.cleanupOrphanedPodCgroups, so I kept it as is.
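Illustrative call sites after that change (simplified from the diffs in this PR; the function is renamed later in the thread):

// PodResourcesAreReclaimed: pod resources are not reclaimed while volume paths remain
if kl.podVolumePathsExist(pod.UID) && !kl.keepTerminatedPodVolumes {
	return false
}

// cleanupOrphanedPodDirs: keep the orphaned pod directory while volume paths remain
if kl.podVolumePathsExist(uid) {
	continue
}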

@jingxu97
Contributor

@mucahitkurt Thanks for the update. The function looks good to me. The only thing is the naming: podVolumePathsExist and podVolumesExist are so close that it is a little hard to tell what the difference is. I am not good at naming either; just a suggestion, e.g.,

podVolumePathsExist --> podVolumePathsExistInCacheOrDisk
podVolumesExist --> podMountedVolumesExistInCacheOrDisk

@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from 01deec5 to d190c9c on November 10, 2020 05:22
@mucahitkurt
Contributor Author

@mucahitkurt Thanks for the update. The function looks good to me. The only thing is the naming: podVolumePathsExist and podVolumesExist are so close that it is a little hard to tell what the difference is. I am not good at naming either; just a suggestion, e.g.,

podVolumePathsExist --> podVolumePathsExistInCacheOrDisk
podVolumesExist --> podMountedVolumesExistInCacheOrDisk

@jingxu97 I renamed them per your suggestions; I don't have better alternatives.

@mucahitkurt
Contributor Author

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@@ -1962,7 +1962,7 @@ func (kl *Kubelet) cleanupOrphanedPodCgroups(pcm cm.PodContainerManager, cgroupP
// parent croup. If the volumes still exist, reduce the cpu shares for any
// process in the cgroup to the minimum value while we wait. if the kubelet
// is configured to keep terminated volumes, we will delete the cgroup and not block.
if podVolumesExist := kl.podVolumesExist(uid); podVolumesExist && !kl.keepTerminatedPodVolumes {
if podVolumesExist := kl.podMountedVolumesExistInCacheOrDisk(uid); podVolumesExist && !kl.keepTerminatedPodVolumes {
Contributor

Actually, here in cleanupOrphanedPodCgroups, I think it should be similar to cleanupOrphanedPodDirs, so use podVolumePathsExistInCacheOrDisk?

Contributor Author

I changed this one to kl.podVolumePathsExistInCacheOrDisk.

Right now no usages remain for the function kl.podVolumesExist (or, with the new name, kl.podMountedVolumesExistInCacheOrDisk). Should I delete this function and all related test code?

…pOrphanedPodDirs, cleanupOrphanedPodCgroups and PodResourcesAreReclaimed

check the in-memory cache for whether volumes are still mounted, and check the disk directory for the volume paths, instead of the mounted-volumes check

Signed-off-by: Mucahit Kurt <mucahitkurt@gmail.com>
@mucahitkurt mucahitkurt force-pushed the refactor/remove-mount-volume-check-orphaned-pod-cleanup branch from d190c9c to 6748570 on November 10, 2020 14:35
@mucahitkurt
Contributor Author

/test pull-kubernetes-e2e-gce-ubuntu-containerd

@jingxu97
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 11, 2020
@derekwaynecarr
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, jsafrane, mucahitkurt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2020
@msau42
Member

msau42 commented Nov 12, 2020

/retest

1 similar comment
@msau42
Member

msau42 commented Nov 12, 2020

/retest
