Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make /var/lib/kubelet as shared during startup #45724

Merged
merged 1 commit into from Sep 2, 2017

Conversation

jsafrane
Copy link
Member

@jsafrane jsafrane commented May 12, 2017

This is part of kubernetes/community#589 kubernetes/community#659

We'd like kubelet to be able to consume mounts from containers in the future, therefore kubelet should make sure that /var/lib/kubelet has shared mount propagation to be able to see these mounts.

On most distros, root directory is already mounted with shared mount propagation and this code will not do anything. On older distros such as Debian Wheezy, this code detects that /var/lib/kubelet is a directory on / which has private mount propagation and kubelet bind-mounts /var/lib/kubelet as rshared.

Both "regular" linux mounter and NsenterMounter are updated here.

@kubernetes/sig-storage-pr-reviews @kubernetes/sig-node-pr-reviews
@vishh

Release note:

Kubelet re-binds /var/lib/kubelet directory with rshared mount propagation during startup if it is not shared yet.

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 12, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 12, 2017
@@ -861,6 +861,12 @@ func RunKubelet(kubeFlags *options.KubeletFlags, kubeCfg *componentconfig.Kubele
}
}

// set up /var/lib/kubelet sharing
err = kubeDeps.Mounter.MakeShared(kubeCfg.RootDirectory)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope RunKubelet is the right place to do this startup initialization. Feel free to suggest a better one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering to myself, setupDataDirs looks like a better place.

@jsafrane jsafrane force-pushed the mount-propagation2 branch 5 times, most recently from defadd4 to 27389e0 Compare May 12, 2017 14:01
@jsafrane
Copy link
Member Author

tested with KUBE_NODE_OS_DISTRIBUTION=gci go run hack/e2e.go -v --up and KUBE_NODE_OS_DISTRIBUTION=debian. The patch does not do anything on GCI and it bind-mounts /var/lib/kubelet on Debian and makes it rshared.

Surprisingly, something already bind-mounts /var/lib/kubelet as shared on GCI even though it's not necessary, / is already shared.

@jsafrane
Copy link
Member Author

@k8s-bot pull-kubernetes-federation-e2e-gce test this

Copy link
Contributor

@vishh vishh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits.

@@ -55,6 +55,9 @@ type Interface interface {
// GetDeviceNameFromMount finds the device name by checking the mount path
// to get the global mount path which matches its plugin directory
GetDeviceNameFromMount(mountPath, pluginDir string) (string, error)
// MakeShared checks that given path is on a mount with 'shared' mount
// propagation. If not, it bind-mounts the path as shared.
MakeShared(path string) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Call it MakeRShared to reflect the fact that it shares recursively?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

if err != nil {
return []mountInfo{}, err
}
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we could make this a generic utility for reading many kernel APIs that return data more than a page in size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (with some refactoring of listProcMounts to use the same function).

}

glog.V(2).Infof("Bind-mounting %q with shared mount propagation", path)
// mount --bind /var/lib/kubelet /var/lib/kubelet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the syscall package? Why exec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? syscall.Exec will replace /usr/bin/kulebet in current process with /bin/mount and it won't ever return. We need fork()+exec() and that's exactly what os.Exec does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Reworked to use syscall.Mount.

// doMakeShared is common implementation of MakeShared on Linux. It checks if
// path is shared and bind-mounts it as shared if needed. mountCmd and mountArgs
// are expected to contain mount-like command, doMakeShared will add '--bind
// <path> <path>' and '--make-shared <path>' to mountArgs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make-rshared

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func TestIsSharedSuccess(t *testing.T) {
successMountInfo :=
`62 0 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/ssd-root rw,seclabel,data=ordered
76 62 8:1 / /boot rw,relatime shared:29 - ext4 /dev/sda1 rw,seclabel,data=ordered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case with /var/lib/kubelet explicitly marked as shared?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a test with /var/lib/foo that is a mount point and is shared.

@jsafrane
Copy link
Member Author

@vishh, thanks for the review, I'll update the PR tomorrow.

I played with this PR and with #41683 to make kubelet mount stuff from containers and it looks like /var/lib/kubelet is not enough and https://github.com/kubernetes/kubernetes/pulls/jsafrane is still needed on machines without systemd. See kubernetes/community#648 for details.

@ncdc
Copy link
Member

ncdc commented May 22, 2017

/unassign
/assign @derekwaynecarr @sjenning

@k8s-ci-robot k8s-ci-robot assigned derekwaynecarr and sjenning and unassigned ncdc May 22, 2017
@sjenning
Copy link
Contributor

Doesn't seem like this would work for nsenter mounter i.e. containerized kubelet. The container itself will have already bind mounted /var/lib/kubelet as private. nsentering the host and changing the mount propagation after container start isn't going to change the fact that the containerized kubelet has a private /var/lib/kubelet right?

}

type mountInfo struct {
root, mountPoint string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not seeing where root is ever used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@sjenning
Copy link
Contributor

I'm also not seeing the code for "if / is already rshared, do nothing". Really the rule should be:

for dir in /var/lib/kubelet /var/lib /var /; do
  if dir exists in mountpoints
    if dir is private; then
      remount /var/lib/kubelet as rshared
    fi
    return
  fi
done

If you get to the end without a remount, it means the /var/lib/kubelet is under / and is reshared already. You only need to remount if /var/lib/kubelet or one of its parents directories is 1) is a mountpoint and 2) is private. I think 😄

@jsafrane
Copy link
Member Author

/retest
#51692

}

// mount --make-rshared /var/lib/kubelet
if err := syscall.Mount(path, path, "" /*fstype*/, syscall.MS_SHARED|syscall.MS_REC, "" /*data*/); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a remount flag (MS_REMOUNT) required here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, remount is not required. mount --make-rshared uses only the fags above as implied by the man page and checked with strace.

@vishh
Copy link
Contributor

vishh commented Aug 31, 2017

I'd like to see a node e2e test that verifies this feature. This may prevent regressions. e2es can land even after code freeze I think.

@jsafrane
Copy link
Member Author

I have e2e test for whole propagation in my queue, waiting for #46444 to merge. And if it runs on wheezy we can be quite sure that this PR works as expected.

@vishh
Copy link
Contributor

vishh commented Aug 31, 2017

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2017
@jsafrane
Copy link
Member Author

jsafrane commented Sep 1, 2017

/assign @smarterclayton @thockin
for approval. PTAL.

@jsafrane
Copy link
Member Author

jsafrane commented Sep 1, 2017

/retest

@smarterclayton
Copy link
Contributor

/approve

@jsafrane
Copy link
Member Author

jsafrane commented Sep 1, 2017

/approve no-issue
(just checking... :-)

@smarterclayton
Copy link
Contributor

/approve no-issue

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, sjenning, smarterclayton, vishh

Associated issue requirement bypassed by: smarterclayton

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 1, 2017
@abgworrall abgworrall modified the milestone: v1.8 Sep 2, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 917f9f0 into kubernetes:master Sep 2, 2017
k8s-github-robot pushed a commit that referenced this pull request Sep 2, 2017
Automatic merge from submit-queue (batch tested with PRs 45724, 48051, 46444, 51056, 51605)

Mount propagation in kubelet

Together with #45724 it implements mount propagation as proposed in kubernetes/community#659

There is:

- New alpha annotation that allows user to explicitly set propagation mode for each `VolumeMount` in pod containers (to be replaced with real `VolumeMount.Propagation` field during beta) + validation + tests. "Private" is the default one (= no change to existing pods).

  I know about proposal for real API fields for alpha feature in https://docs.google.com/document/d/1wuoSqHkeT51mQQ7dIFhUKrdi3-1wbKrNWeIL4cKb9zU/edit, but it seems it's not implemented yet. It would save me quite lot of code and ugly annotation.

- Updated CRI API to transport chosen propagation to Docker.

- New `kubelet --experimental-mount-propagation` option to enable the previous bullet without modifying types.go (worked around with changing `KubeletDeps`... not nice, but it's better than adding a parameter to `NewMainKubelet` and removing it in the next release...)

```release-note
kubelet has alpha support for mount propagation. It is disabled by default and it is there for testing only. This feature may be redesigned or even removed in a future release.
```

@derekwaynecarr @dchen1107 @kubernetes/sig-node-pr-reviews
@k8s-ci-robot
Copy link
Contributor

@jsafrane: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws d950010 link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-github-robot pushed a commit that referenced this pull request Sep 6, 2017
…lement mount.Interface again.

This was broken in #45724
k8s-github-robot pushed a commit that referenced this pull request Sep 6, 2017
Automatic merge from submit-queue

Make *fakeMountInterface in container_manager_unsupported_test.go implement mount.Interface again.

This was broken in #45724

**Release note**:
```release-note
NONE
```
/sig storage
/sig node

/cc @jsafrane, @vishh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet