Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected deleting of contents of mount points due to symbolic link … #77507

Merged
merged 1 commit into from
May 19, 2019

Conversation

cuericlee
Copy link
Contributor

@cuericlee cuericlee commented May 6, 2019

…of KubeletRunDirectory

symbolic link of KubeletRunDirectory cause unexpected deleting of contents of mount points

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change

/kind bug

/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #
In case KubeletRunDirectory holds a symbolic link, evaluate it

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: fix a bug related to volume unmount if the kubelet run directory is a symbolic link

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 6, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @cuericlee. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 6, 2019
@k8s-ci-robot k8s-ci-robot requested review from kad and luxas May 6, 2019 12:56
@cuericlee
Copy link
Contributor Author

/test

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR @cuericlee
added some comments.

@kubernetes/sig-cluster-lifecycle-pr-reviews
/priority awaiting-more-evidence

@@ -179,20 +179,27 @@ func (r *Reset) Run(out io.Writer, client clientset.Interface, cfg *kubeadmapi.I

// Try to unmount mounted directories under kubeadmconstants.KubeletRunDirectory in order to be able to remove the kubeadmconstants.KubeletRunDirectory directory later
fmt.Printf("[reset] Unmounting mounted directories in %q\n", kubeadmconstants.KubeletRunDirectory)
umountDirsCmd := fmt.Sprintf("awk '$2 ~ path {print $2}' path=%s/ /proc/mounts | xargs -r umount", kubeadmconstants.KubeletRunDirectory)

// In case KubeletRunDirectory is symbolic link, kubeadm failed to find the exact mount points, so it likely clean up content from mount points under KubeletRunDirectory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change this comment to:

// In case KubeletRunDirectory holds a symbolic link, evaluate it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

} else {
// Once success to unmount mounted directories then removeAll contents of /var/lib/kubelet avoid to remove moounted filesystem by accident
dirsToClean = append(dirsToClean, absoluteKubeletRunDirectory)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this } seems out of place?

also please change your text editor to use TABs instead of spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

}
} else {
// Once success to unmount mounted directories then removeAll contents of /var/lib/kubelet avoid to remove moounted filesystem by accident
dirsToClean = append(dirsToClean, absoluteKubeletRunDirectory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this makes some sense, but then again why is KubeletRunDirectory a symbolic link on your system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is issue reported by user, who intend to replace the /var/lib/kubelet with additional disk space. It cause this issue while kubeadm reset remove the data from the mount points. It looks like risky operation, but we have to prevent such condition by any case.

Copy link
Member

@neolit123 neolit123 May 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

i'm not sure this is a setup we should support, but i personally don't mind the symbolic link evaluation....let's leave this for others to comment.

@k8s-ci-robot k8s-ci-robot added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 6, 2019

klog.V(1).Infof("[reset] Executing command %q", umountDirsCmd)
umountOutputBytes, err := exec.Command("sh", "-c", umountDirsCmd).Output()
if err != nil {
klog.Errorf("[reset] Failed to unmount mounted directories in %s: %s\n", kubeadmconstants.KubeletRunDirectory, string(umountOutputBytes))
}
} else {
// Once success to unmount mounted directories then removeAll contents of /var/lib/kubelet avoid to remove moounted filesystem by accident
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Only clean absoluteKubeletRunDirectory if umountDirsCmd passed without error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@cuericlee
Copy link
Contributor Author

/test

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 7, 2019
@neolit123
Copy link
Member

pull-kubernetes-verify — Job failed.

this job is failing because you are using spaces, instead of tabs in your text editor.
see here: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/77507/pull-kubernetes-verify/1125770764637704193

@cuericlee
Copy link
Contributor Author

cuericlee commented May 8, 2019

fix gofmt
/retest

var absoluteKubeletRunDirectory string
absoluteKubeletRunDirectory, err = filepath.EvalSymlinks(kubeadmconstants.KubeletRunDirectory)
if err != nil {
klog.Errorf("[reset] Failed to evaluate KubeletRunDirectory directories: %v\n", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directories -> directory

@neolit123
Copy link
Member

@cuericlee

please add a release note under Does this PR introduce a user-facing change?:

kubeadm: fix a bug related to volume unmount if the kubelet run directory is a symbolic link

waiting on more opinions by @kubernetes/sig-cluster-lifecycle-pr-reviews
i'm personally not a big fan of using /var/run/kubelet as a symbolic link. explanation from @cuericlee is here:
#77507 (comment)

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this @cuericlee !

This can be avoided by replacing the symlink with a binding mount. However, since symlinks are popular and easy way to do this and not handling things properly is causing data loss, then it's best to merge the fix when it's ready.

if err != nil {
klog.Errorf("[reset] Failed to evaluate KubeletRunDirectory directories: %v\n", err)
}
umountDirsCmd := fmt.Sprintf("awk '$2 ~ path {print $2}' path=%s/ /proc/mounts | xargs -r umount", absoluteKubeletRunDirectory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If absoluteKubeletRunDirectory is empty string (which will be if EvalSymlinks returns error), the awk command will return all mount paths in the system. With that in mind, and the fact that we are running as root, we'll get everything dismounted by accident.
Hence, the whole unmount dirs stuff in here needs to be performed in an else clause appended to the if that checks for error in EvalSymlinks.

@cuericlee
Copy link
Contributor Author

/retest

@neolit123
Copy link
Member

@cuericlee
thanks for the update, please squash your commits to one and add a release note as i said here:
#77507 (comment)

thanks
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2019
@neolit123
Copy link
Member

/retest

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cuericlee, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 12, 2019
@cuericlee
Copy link
Contributor Author

squash is done
/retest

if err != nil {
klog.Errorf("[reset] Failed to unmount mounted directories in %s: %s\n", kubeadmconstants.KubeletRunDirectory, string(umountOutputBytes))
klog.Errorf("[reset] Failed to evaluate KubeletRunDirectory directory and skip umount and clean: %v\n", err)
Copy link
Member

@neolit123 neolit123 May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change this to:

klog.Errorf("[reset] Failed to evaluate the %q directory. Skipping its unmount and cleanup: %v", kubeadmconstants.KubeletRunDirectory, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubeadmconstants.KubeletRunDirectory is print out on line 181. But it is fine, I could refine log error message for easy pd accordingly.
fmt.Printf("[reset] Unmounting mounted directories in %q\n", kubeadmconstants.KubeletRunDirectory)

…tory is a symbolic link

unexpected deleting of contents of mount points due to symbolic link of KubeletRunDirectory
@cuericlee
Copy link
Contributor Author

/retest

// Only unmount mount points which start with "/var/lib/kubelet" or absolute path of symbolic link, and avoid using empty absoluteKubeletRunDirectory
umountDirsCmd := fmt.Sprintf("awk '$2 ~ path {print $2}' path=%s/ /proc/mounts | xargs -r umount", absoluteKubeletRunDirectory)
klog.V(1).Infof("[reset] Executing command %q", umountDirsCmd)
umountOutputBytes, err := exec.Command("sh", "-c", umountDirsCmd).Output()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might need a CombinedOutput instead, but since it was part of the old code let's leave it as is...

@neolit123
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 14, 2019
@cuericlee
Copy link
Contributor Author

@neo

/lgtm

@neolit123 anything is needed from me to make merge happen? It looks still hold.

@neolit123
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2019
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit c854f72 into kubernetes:master May 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants