New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Containerized subpath #63143
Containerized subpath #63143
Conversation
/test pull-kubernetes-cross |
/assign |
pkg/util/mount/mount.go
Outdated
@@ -112,6 +112,11 @@ type Interface interface { | |||
// It returns the same path as it gets unless kubelet runs in a container. | |||
// Then it returns "/rootfs/" + <path> | |||
KubeletPath(path string) string | |||
// EvalSymlinks returns the path name after the evaluation of any symbolic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns the path name on the host
pkg/kubelet/kubelet_pods.go
Outdated
@@ -175,7 +174,8 @@ func makeMounts(pod *v1.Pod, podDir string, container *v1.Container, hostName, h | |||
return nil, cleanupAction, fmt.Errorf("unable to provision SubPath `%s`: %v", mount.SubPath, err) | |||
} | |||
|
|||
fileinfo, err := os.Lstat(hostPath) | |||
kubeletHostPath := mounter.KubeletPath(hostPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of exposing KubeletPath as an interface, would it be better to expose Lstat as an interface, and have EvalSymlinks return container path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the code here is wrong. It resolves any symlinks in the hostPath
in kubelet, which is not possible.
And that's Lstat is not easy - we need to resolve the symlinks on the host and then do os.Lstat("/rootfs/" + resolved_path)
in kubelet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by some refactoring.
7ee1f9c
to
7e7ed01
Compare
7e7ed01
to
a5d741f
Compare
Pushed 2nd version. kubelet_pods.go now passes all paths as paths on the host to mounter and mounter translates them by itself when it's needed. I changed semantics of @andyzhangx, I touched the Windows part a bit, please review. |
e683d06
to
7026920
Compare
@kubernetes/sig-storage-pr-reviews, any review is highly appreciated, we don't want more regressions in subpath! |
7026920
to
8f69f4b
Compare
pkg/util/mount/nsenter_mount.go
Outdated
@@ -33,7 +33,7 @@ import ( | |||
|
|||
const ( | |||
// hostProcMountsPath is the default mount path for rootfs | |||
hostProcMountsPath = "/rootfs/proc/1/mounts" | |||
HostProcMountsPath = "/rootfs/proc/1/mounts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not need to be public if we introduce Nsenter.HostPath()
as in https://github.com/kubernetes/kubernetes/pull/62903/files#diff-f6ffeca7a3b942208d644eaa4ec99642R129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only see it being used in this file.
pkg/util/mount/mount_windows.go
Outdated
func (mounter *Mounter) SafeMakeDir(pathname string, base string, perm os.FileMode) error { | ||
return doSafeMakeDir(pathname, base, perm) | ||
func (mounter *Mounter) SafeMakeDir(subdir string, base string, perm os.FileMode) error { | ||
fullPath := filepath.Join(base, subdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does windows not need to eval symlinks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, fixed
pkg/util/mount/nsenter_mount.go
Outdated
} | ||
defer syscall.Close(fd) | ||
|
||
glog.Infof("JSAF before prepare") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up
pkg/util/mount/nsenter_mount.go
Outdated
return false, err | ||
} | ||
kubeletPath := mounter.getKubeletPath(evaluatedPath) | ||
_, err = os.Lstat(kubeletPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what was wrong with mounter.Lstat()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked mounter.Lstat()
to return NotExist
when the target does not exist (it returned some indistinguishable error before).
hack/local-up-cluster.sh
Outdated
@@ -800,7 +800,7 @@ function start_kubelet { | |||
all_kubelet_flags+=(--containerized) | |||
|
|||
docker run --rm --name kubelet \ | |||
--volume=/:/rootfs:ro,rslave \ | |||
--volume=/:/rootfs:rw,rslave \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean existing deployments will break if they upgrade to this fix and don't change this setting?
Previously, safemkdir still worked for non-hostpath types because it uses /var/lib/kubelet/.... instead of /rootfs/var/lib/kubelet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, it will break.
I could pass podDir
into SafeMakeDir()
as a new parameter. NsenterMounter.SafeMakeDir
could then check if the directory to create is subdir of podDir
and add /rootfs
only when it's not.
Any better ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new argument to NewNsenterMounter
so the mounter knows what's location of /var/lib/kubelet
and can do a shortcut in SafeMakeDir
. That was the easiest way I could find out. PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I removed changes in hack/local-up-cluster.sh
pkg/util/mount/nsenter_mount.go
Outdated
} | ||
return strconv.Atoi(string(matches[1])) | ||
kubeletBase := mounter.getKubeletPath(evaluatedBase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the paths need to be cleaned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to doSafeMakeDir
, just to be sure.
|
||
// Check it's not already bind-mounted | ||
// prepareSubpathTarget creates target for bind-mount of subpath. It returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment this is also used by nsenter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added
pkg/util/mount/nsenter_mount.go
Outdated
defer syscall.Close(fd) | ||
|
||
glog.Infof("JSAF before prepare") | ||
alreadyMounted, bindPathTarget, err := prepareSubpathTarget(mounter, subpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be done first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it does not really matter if we do safeOpenSubPath
or prepareSubpathTarget
first, I wanted the defer
that cleans up volumes to affect as little code as possible.
} | ||
defer syscall.Close(fd) | ||
|
||
alreadyMounted, bindPathTarget, err := prepareSubpathTarget(mounter, subpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check this first?
pkg/util/mount/nsenter_mount.go
Outdated
// path: /mnt/volume/non/existing/directory. /mnt/volume exists and | ||
// non/existing/directory does not exist. It resolves symlinks in /mnt/volume | ||
// to say /mnt/foo and returns /mnt/foo/non/existing/directory. | ||
func (mounter *NsenterMounter) evalHostSymlinks(path string, mustExist bool) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added a similar function in https://github.com/kubernetes/kubernetes/pull/62903/files#diff-f6ffeca7a3b942208d644eaa4ec99642R127 as you and @msau42 suggested. How about merging #62903 first, then use Nsenter.HostPath
to get the path name on the host after evaluating symlinks on the host (and adding mustExist
parameter)?
@jsafrane if you rebase to master, we can try the new e2e job |
fdb50b5
to
1e65c09
Compare
if we support the containerized kubelet, it seems like we should exercise it in CI, especially given the complexity around how it has to be handled |
// that's under user control. User must not be able to use symlinks to | ||
// escape the volume to create directories somewhere else. | ||
SafeMakeDir(subdir string, base string, perm os.FileMode) error | ||
// Will operate in the host mount namespace if kubelet is running in a container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird that a seemingly generic interface in a generic lib (pkg/util) has any comprehension of "kubelet" at all. Can we decouple these layered ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kubelet is the only user of this API and it makes the whole picture more visible. It would be harder to go through all the layers and see how they fit together if they contain just generic comments.
We could move the big picture to a separate file / documentation, however, these tend to rot.
I filled #64603 for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SafeFormatAndMount is a very delicate and tricky operation, and kops uses it (and this package). Another vote from me for keeping the layers well-factored here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ongoing work to refactor the mount library to separate out k8s specific operations (like subpath processing), and general mounting/formatting utilities. #68513
I'm very grumpy that pkg/util contains the word "kubelet" at all. How can we fix this? I will approve the PR since it fixes a bug and doesn't make things worse, but that's a genericity problem. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane, msau42, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
/kind bug |
@jsafrane: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Review the full test history for this PR. Silence the bot with an |
/status approved-for-milestone |
[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process @brendandburns @dchen1107 @jsafrane @lavalamp @msau42 @smarterclayton @thockin Pull Request Labels
|
Automatic merge from submit-queue (batch tested with PRs 63348, 63839, 63143, 64447, 64567). If you want to cherry-pick this change to another branch, please follow the instructions here. |
@msau42 I'm running kubelet under rkt using the kubelet wrapper, with kube 1.11.0.
My systemd unit does not currently have rootfs mounted into the kubelet container, so I have attempted to add it in an attempt to take advantage of this patch. When doing so, kubelet then refuses to start with:
I'm wondering if I'm missing something obvious here? I am able to mount standard host path volumes fine, but when using the prometheus operator (which enforces a subPath be used), it will not work. Is there any more detailed info on how I should configure this? I'd imagine (hope!) that tectonic doesn't suffer from this same problem, however I'm unsure where to dig into to find this out, and I'd prefer not to have to spin up an entire tectonic cluster just to find a couple of arguments! 😄 |
@munnerz, list of directories that must be in the container with kubelet is here: kubernetes/hack/local-up-cluster.sh Lines 825 to 832 in 7786bd8
Please report if there is anything missing. |
@jsafrane
I switched to a non root user, and it got passed. |
This is part of upstream PRs kubernetes#62903 and kubernetes#63143. Origin-commit: dbee4f01db7eba2aff89e8cdde8b06de4bc159f0
What this PR does / why we need it:
Containerized kubelet needs a different implementation of
PrepareSafeSubpath
than kubelet running directly on the host.On the host we safely open the subpath and then bind-mount
/proc/<pidof kubelet>/fd/<descriptor of opened subpath>
.With kubelet running in a container,
/proc/xxx/fd/yy
on the host contains path that works only inside the container, i.e./rootfs/path/to/subpath
and thus any bind-mount on the host fails.Solution:
/var/lib/kubelet/pods/<uid>/volume-subpaths/<name of container>/<id of mount>
. This is potentially unsafe, because user can change the subpath source to a link to a bad place (say/run/docker.sock
) just before the bind-mount.Which issue(s) this PR fixes
Fixes #61456
Special notes for your reviewer:
The PR contains some refactoring of
doBindSubPath
to extract the common code. NewdoNsEnterBindSubPath
is added for the nsenter related parts.Release note: