Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected deleting of contents of mount points due to symbolic link … #77507

Merged
merged 1 commit into from May 19, 2019

Conversation

@cuericlee
Copy link
Contributor

commented May 6, 2019

…of KubeletRunDirectory

symbolic link of KubeletRunDirectory cause unexpected deleting of contents of mount points

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change

/kind bug

/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #
In case KubeletRunDirectory holds a symbolic link, evaluate it

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: fix a bug related to volume unmount if the kubelet run directory is a symbolic link
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented May 6, 2019

Hi @cuericlee. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 6, 2019

/test

@neolit123
Copy link
Member

left a comment

thanks for the PR @cuericlee
added some comments.

@kubernetes/sig-cluster-lifecycle-pr-reviews
/priority awaiting-more-evidence

@@ -179,20 +179,27 @@ func (r *Reset) Run(out io.Writer, client clientset.Interface, cfg *kubeadmapi.I

// Try to unmount mounted directories under kubeadmconstants.KubeletRunDirectory in order to be able to remove the kubeadmconstants.KubeletRunDirectory directory later
fmt.Printf("[reset] Unmounting mounted directories in %q\n", kubeadmconstants.KubeletRunDirectory)
umountDirsCmd := fmt.Sprintf("awk '$2 ~ path {print $2}' path=%s/ /proc/mounts | xargs -r umount", kubeadmconstants.KubeletRunDirectory)

// In case KubeletRunDirectory is symbolic link, kubeadm failed to find the exact mount points, so it likely clean up content from mount points under KubeletRunDirectory

This comment has been minimized.

Copy link
@neolit123

neolit123 May 6, 2019

Member

please change this comment to:

// In case KubeletRunDirectory holds a symbolic link, evaluate it

This comment has been minimized.

Copy link
@cuericlee

cuericlee May 7, 2019

Author Contributor

sure

} else {
// Once success to unmount mounted directories then removeAll contents of /var/lib/kubelet avoid to remove moounted filesystem by accident
dirsToClean = append(dirsToClean, absoluteKubeletRunDirectory)
}

This comment has been minimized.

Copy link
@neolit123

neolit123 May 6, 2019

Member

this } seems out of place?

also please change your text editor to use TABs instead of spaces.

This comment has been minimized.

Copy link
@cuericlee

cuericlee May 7, 2019

Author Contributor

sure

}
} else {
// Once success to unmount mounted directories then removeAll contents of /var/lib/kubelet avoid to remove moounted filesystem by accident
dirsToClean = append(dirsToClean, absoluteKubeletRunDirectory)

This comment has been minimized.

Copy link
@neolit123

neolit123 May 6, 2019

Member

i guess this makes some sense, but then again why is KubeletRunDirectory a symbolic link on your system?

This comment has been minimized.

Copy link
@cuericlee

cuericlee May 7, 2019

Author Contributor

This is issue reported by user, who intend to replace the /var/lib/kubelet with additional disk space. It cause this issue while kubeadm reset remove the data from the mount points. It looks like risky operation, but we have to prevent such condition by any case.

This comment has been minimized.

Copy link
@neolit123

neolit123 May 7, 2019

Member

/ok-to-test

i'm not sure this is a setup we should support, but i personally don't mind the symbolic link evaluation....let's leave this for others to comment.


klog.V(1).Infof("[reset] Executing command %q", umountDirsCmd)
umountOutputBytes, err := exec.Command("sh", "-c", umountDirsCmd).Output()
if err != nil {
klog.Errorf("[reset] Failed to unmount mounted directories in %s: %s\n", kubeadmconstants.KubeletRunDirectory, string(umountOutputBytes))
}
} else {
// Once success to unmount mounted directories then removeAll contents of /var/lib/kubelet avoid to remove moounted filesystem by accident

This comment has been minimized.

Copy link
@neolit123

neolit123 May 6, 2019

Member
// Only clean absoluteKubeletRunDirectory if umountDirsCmd passed without error

This comment has been minimized.

Copy link
@cuericlee

cuericlee May 7, 2019

Author Contributor

sure

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 7, 2019

/test

@neolit123

This comment has been minimized.

Copy link
Member

commented May 7, 2019

pull-kubernetes-verify — Job failed.

this job is failing because you are using spaces, instead of tabs in your text editor.
see here: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/77507/pull-kubernetes-verify/1125770764637704193

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 8, 2019

fix gofmt
/retest

var absoluteKubeletRunDirectory string
absoluteKubeletRunDirectory, err = filepath.EvalSymlinks(kubeadmconstants.KubeletRunDirectory)
if err != nil {
klog.Errorf("[reset] Failed to evaluate KubeletRunDirectory directories: %v\n", err)

This comment has been minimized.

Copy link
@neolit123

neolit123 May 9, 2019

Member

directories -> directory

@neolit123

This comment has been minimized.

Copy link
Member

commented May 9, 2019

@cuericlee

please add a release note under Does this PR introduce a user-facing change?:

kubeadm: fix a bug related to volume unmount if the kubelet run directory is a symbolic link

waiting on more opinions by @kubernetes/sig-cluster-lifecycle-pr-reviews
i'm personally not a big fan of using /var/run/kubelet as a symbolic link. explanation from @cuericlee is here:
#77507 (comment)

@rosti
Copy link
Member

left a comment

Thanks for tackling this @cuericlee !

This can be avoided by replacing the symlink with a binding mount. However, since symlinks are popular and easy way to do this and not handling things properly is causing data loss, then it's best to merge the fix when it's ready.

if err != nil {
klog.Errorf("[reset] Failed to evaluate KubeletRunDirectory directories: %v\n", err)
}
umountDirsCmd := fmt.Sprintf("awk '$2 ~ path {print $2}' path=%s/ /proc/mounts | xargs -r umount", absoluteKubeletRunDirectory)

This comment has been minimized.

Copy link
@rosti

rosti May 9, 2019

Member

If absoluteKubeletRunDirectory is empty string (which will be if EvalSymlinks returns error), the awk command will return all mount paths in the system. With that in mind, and the fact that we are running as root, we'll get everything dismounted by accident.
Hence, the whole unmount dirs stuff in here needs to be performed in an else clause appended to the if that checks for error in EvalSymlinks.

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 12, 2019

/retest

@neolit123

This comment has been minimized.

Copy link
Member

commented May 12, 2019

@cuericlee
thanks for the update, please squash your commits to one and add a release note as i said here:
#77507 (comment)

thanks
/approve
/hold

@neolit123

This comment has been minimized.

Copy link
Member

commented May 12, 2019

/retest

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cuericlee, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 13, 2019

squash is done
/retest

if err != nil {
klog.Errorf("[reset] Failed to unmount mounted directories in %s: %s\n", kubeadmconstants.KubeletRunDirectory, string(umountOutputBytes))
klog.Errorf("[reset] Failed to evaluate KubeletRunDirectory directory and skip umount and clean: %v\n", err)

This comment has been minimized.

Copy link
@neolit123

neolit123 May 13, 2019

Member

please change this to:

klog.Errorf("[reset] Failed to evaluate the %q directory. Skipping its unmount and cleanup: %v", kubeadmconstants.KubeletRunDirectory, err)

This comment has been minimized.

Copy link
@cuericlee

cuericlee May 14, 2019

Author Contributor

kubeadmconstants.KubeletRunDirectory is print out on line 181. But it is fine, I could refine log error message for easy pd accordingly.
fmt.Printf("[reset] Unmounting mounted directories in %q\n", kubeadmconstants.KubeletRunDirectory)

kubeadm: fix a bug related to volume unmount if the kubelet run direc…
…tory is a symbolic link

unexpected deleting of contents of mount points due to symbolic link of KubeletRunDirectory

@cuericlee cuericlee force-pushed the cuericlee:patch-1 branch from 73bdae7 to 56ce743 May 14, 2019

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 14, 2019

/retest

// Only unmount mount points which start with "/var/lib/kubelet" or absolute path of symbolic link, and avoid using empty absoluteKubeletRunDirectory
umountDirsCmd := fmt.Sprintf("awk '$2 ~ path {print $2}' path=%s/ /proc/mounts | xargs -r umount", absoluteKubeletRunDirectory)
klog.V(1).Infof("[reset] Executing command %q", umountDirsCmd)
umountOutputBytes, err := exec.Command("sh", "-c", umountDirsCmd).Output()

This comment has been minimized.

Copy link
@neolit123

neolit123 May 14, 2019

Member

this might need a CombinedOutput instead, but since it was part of the old code let's leave it as is...

@neolit123

This comment has been minimized.

Copy link
Member

commented May 14, 2019

/lgtm

@cuericlee

This comment has been minimized.

Copy link
Contributor Author

commented May 19, 2019

@neo

/lgtm

@neolit123 anything is needed from me to make merge happen? It looks still hold.

@neolit123

This comment has been minimized.

Copy link
Member

commented May 19, 2019

/hold cancel

@fejta-bot

This comment has been minimized.

Copy link

commented May 19, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit c854f72 into kubernetes:master May 19, 2019

20 checks passed

cla/linuxfoundation cuericlee authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.