Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local storage teardown fix #48402

Merged

Conversation

ianchakeres
Copy link
Contributor

@ianchakeres ianchakeres commented Jul 2, 2017

What this PR does / why we need it: Local storage uses bindmounts and the method IsLikelyNotMountPoint does not detect these as mountpoints. Therefore, local PVs are not properly unmounted when they are deleted.

Which issue this PR fixes: fixes #48331

Special notes for your reviewer:

You can use these e2e tests to reproduce the issue and validate the fix works appropriately #47999

The existing method IsLikelyNotMountPoint purposely does not check mountpoints reliability (

// IsLikelyNotMountPoint determines if a directory is not a mountpoint.
), since the number of mountpoints can be large.
List() ([]MountPoint, error)

This implementation changes the behavior for local storage to detect mountpoints reliably, and avoids changing the behavior for any other callers to a UnmountPath.

Release note:

Fixes bind-mount teardown failure with non-mount point Local volumes (issue https://github.com/kubernetes/kubernetes/issues/48331).

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 2, 2017
@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 2, 2017
@ianchakeres
Copy link
Contributor Author

/sig storage

@k8s-github-robot k8s-github-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 2, 2017
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jul 2, 2017
@ianchakeres
Copy link
Contributor Author

@k8s-ci-robot ok to test

@ianchakeres
Copy link
Contributor Author

@CallMeFoxie @msau42 @saad-ali - Here is the PR to fix #48331. Please take a look and let me know if you have any questions or comments.

@CallMeFoxie
Copy link
Contributor

I0703 08:07:44.466131 22258 operation_generator.go:523] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/ae9f9214-5fb5-11e7-a549-180373f4ecc0-default-token-mhn97" (OuterVolumeSpecName: "default-token-mhn97") pod "ae9f9214-5fb5-11e7-a549-180373f4ecc0" (UID: "ae9f9214-5fb5-11e7-a549-180373f4ecc0"). InnerVolumeSpecName "default-token-mhn97". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0703 08:07:44.467018 22258 operation_generator.go:523] UnmountVolume.TearDown succeeded for volume "kubernetes.io/local-volume/ae9f9214-5fb5-11e7-a549-180373f4ecc0-test-affinity-test-1" (OuterVolumeSpecName: "test") pod "ae9f9214-5fb5-11e7-a549-180373f4ecc0" (UID: "ae9f9214-5fb5-11e7-a549-180373f4ecc0"). InnerVolumeSpecName "test-affinity-test-1". PluginName "kubernetes.io/local-volume", VolumeGidValue ""

looks good to me even with local self hosted storage, unmounts the volumes as it should back and forth.
cc @msau42 @saad-ali @jsafrane

if notMnt {
glog.Warningf("Warning: %q is not a mountpoint, deleting", mountPath)
return os.Remove(mountPath)
}

// Unmount the mount path
glog.Warningf("Warning: %q is a mountpoint, unmounting", mountPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that lower log level should be used, it's part of a normal operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane - would you like me to lower the log level for the previous statement L107 too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, at 107 something else happened - something unmounted a fs and left a stray directory there, e.g. kubelet crashed after unmount and before os.Remove. It's quite exceptional event.

Here on line 112 it's just normal operation, nothing suspicious.

@jsafrane jsafrane added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 3, 2017
@jsafrane
Copy link
Member

jsafrane commented Jul 3, 2017

lgtm to me, however I let @msau42 or @rootfs say the final word.


// IsNotMountPoint determines if a directory is a mountpoint.
// It should return ErrNotExist when the directory does not exist.
// This method uses the List() of all mountpoints
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment that in particular this is can detect bind mounts of directories.

if mountPointsErr != nil {
return notMnt, mountPointsErr
}
deletedFile := fmt.Sprintf("%s\\040(deleted)", file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parsing is implementation specific. It should be done in the the mounter implementation instead.

@@ -269,5 +269,5 @@ func (u *localVolumeUnmounter) TearDown() error {
// TearDownAt unmounts the bind mount
func (u *localVolumeUnmounter) TearDownAt(dir string) error {
glog.V(4).Infof("Unmounting volume %q at path %q\n", u.volName, dir)
return util.UnmountPath(dir, u.mounter)
return util.UnmountMountPoint(dir, u.mounter, true) /* extensiveMountPointCheck = true */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During mount we also call IsLikelyNotMountPoint. Need to also evaluate if that call needs to be updated too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified all references of IsLikelyNotMountPoint to use IsNotMountPoint inside local.go.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 5, 2017
@ianchakeres
Copy link
Contributor Author

/retest

@ianchakeres ianchakeres force-pushed the local-storage-teardown-fix branch 5 times, most recently from cd8f0b9 to 6a246da Compare July 5, 2017 16:38
// It should return ErrNotExist when the directory does not exist.
// This method uses the List() of all mountpoints
// It is more extensive than IsLikelyNotMountPoint
// and it detects bind mounts in linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a comment indicating that compared to IsLikelyNotMountPoint(...), this will have a higher perf impact if the machine has a lot of mounts (since it lists all the mounts), and should not be used in critical paths unless absolutely necessary.

// IsNotMountPoint is more extensive than IsLikelyNotMountPoint
// It uses IsMountPointMatch to detect matching dir in List()
// It detects bind mounts in linux
IsNotMountPoint(file string) (bool, error)
// IsLikelyNotMountPoint determines if a directory is a mountpoint.
// It should return ErrNotExist when the directory does not exist.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on the existing method sucks. Now that we're adding a similar method, let's update both to make it very crisp and clear 1) what each does, 2) what the differences between them are, and 3) when a caller should use one vs the other, and 4) what the pitfalls of each are.


// UnmountMountPoint is a common unmount routine that unmounts the given path and
// deletes the remaining directory if successful.
// if extensiveMountPointCheck is true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please add comment about perf implications about enabling this option.

@@ -75,23 +75,41 @@ func SetReady(dir string) {
// UnmountPath is a common unmount routine that unmounts the given path and
// deletes the remaining directory if successful.
func UnmountPath(mountPath string, mounter mount.Interface) error {
return UnmountMountPoint(mountPath, mounter, false) /* extensiveMountPointCheck = false */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit:

I prefer

UnmountMountPoint(mountPath, mounter, false /* extensiveMountPointCheck */)

since it's less verbose and doesn't require you to update the value and comment if you change the value.

@saad-ali
Copy link
Member

saad-ali commented Jul 5, 2017

Paging @pmorie since he's familiar with this some of the mounter code (nsenter_mount in particular).

@saad-ali saad-ali requested a review from pmorie July 5, 2017 17:11
@saad-ali saad-ali added this to the v1.7 milestone Jul 5, 2017
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2017
@msau42
Copy link
Member

msau42 commented Jul 10, 2017

/lgtm

@saad-ali saad-ali added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2017
@saad-ali
Copy link
Member

Also, please add more descriptive release note (explain what exact bug this is fixing).

Updated release note on behalf of @ianchakeres with link to bug being fixed.

@saad-ali
Copy link
Member

/test pull-kubernetes-e2e-kops-aws

@wojtek-t
Copy link
Member

@saad-ali - thanks! Will create a cherrypick PR once this one is merged (but I don't have any more concerns).

@ianchakeres
Copy link
Contributor Author

/test pull-kubernetes-e2e-kops-aws

@ianchakeres
Copy link
Contributor Author

@saad-ali - I'm hoping that #48403 will fix pull-kubernetes-e2e-kops-aws errors we're seeing.

@ianchakeres
Copy link
Contributor Author

This PR #48762 should help with docker timeout in e2e-kops-aws.

@fejta
Copy link
Contributor

fejta commented Jul 11, 2017

/test pull-kubernetes-e2e-kops-aws

Added IsNotMountPoint method to mount utils (pkg/util/mount/mount.go)
Added UnmountMountPoint method to volume utils (pkg/volume/util/util.go)
Call UnmountMountPoint method from local storage (pkg/volume/local/local.go)
IsLikelyNotMountPoint behavior was not modified, so the logic/behavior for UnmountPath is not modified
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2017
@ianchakeres
Copy link
Contributor Author

@saad-ali @msau42 - I just squashed the commits.

@msau42
Copy link
Member

msau42 commented Jul 11, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 11, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ianchakeres, msau42, saad-ali
We suggest the following additional approver: smarterclayton

Assign the PR to them by writing /assign @smarterclayton in a comment when ready.

Associated issue: 48331

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 03360d7 into kubernetes:master Jul 12, 2017
@wojtek-t wojtek-t added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 12, 2017
@wojtek-t
Copy link
Member

#48809 constains a cherrypick of it

k8s-github-robot pushed a commit that referenced this pull request Jul 18, 2017
#48524-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #48402 #48524 upstream release 1.7

Cherry pick of #48524 and #48402 on release-1.7.

#48524 : fix udp service blackhole problem when number of backends changes from 0 to non-0
#48402 : Local storage teardown fix
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kubernetes 1.7.0 local storage teardown failed