New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

passthrough readOnly to subpath #63045

Merged
merged 1 commit into from May 8, 2018

Conversation

@msau42
Member

msau42 commented Apr 24, 2018

What this PR does / why we need it:
If a volume is mounted as readonly, or subpath volumeMount is configured as readonly, then the subpath bind mount should be readonly.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #62752

Special notes for your reviewer:

Release note:

Fixes issue where subpath readOnly mounts failed
@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

/assign @jsafrane

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

I am still working on e2e tests.

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

cc @andyzhangx
probably something similar needs to be done for windows

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

Hm this is still not working:

Apr 24 01:12:55 e2e-test-msau-minion-group-nchn kubelet[1327]: I0424 01:12:55.957265    1327 mount_linux.go:143] Mounting cmd (systemd-run) with arguments ([--description=Kubernetes transient mount for /var/lib/
kubelet/pods/f15795a3-475b-11e8-b4d0-42010a800002/volume-subpaths/test-volume/test-container-subpath-emptydir-k7t9/0 --scope -- mount -o remount,ro /proc/1327/fd/34 /var/lib/kubelet/pods/f15795a3-475b-11e8-b4d0-
42010a800002/volume-subpaths/test-volume/test-container-subpath-emptydir-k7t9/0])
Apr 24 01:12:55 e2e-test-msau-minion-group-nchn kubelet[1327]: E0424 01:12:55.964589    1327 mount_linux.go:148] Mount failed: exit status 32
Apr 24 01:12:55 e2e-test-msau-minion-group-nchn kubelet[1327]: Mounting command: systemd-run
Apr 24 01:12:55 e2e-test-msau-minion-group-nchn kubelet[1327]: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/f15795a3-475b-11e8-b4d0-42010a800002/volume-subpaths/test-vol
ume/test-container-subpath-emptydir-k7t9/0 --scope -- mount -o remount,ro /proc/1327/fd/34 /var/lib/kubelet/pods/f15795a3-475b-11e8-b4d0-42010a800002/volume-subpaths/test-volume/test-container-subpath-emptydir-k
7t9/0
Apr 24 01:12:55 e2e-test-msau-minion-group-nchn kubelet[1327]: Output: Running scope as unit: run-rcf0ee9fe5efe41e9a1bb0a8a752b6333.scope
Apr 24 01:12:55 e2e-test-msau-minion-group-nchn kubelet[1327]: mount: /var/lib/kubelet/pods/f15795a3-475b-11e8-b4d0-42010a800002/volume-subpaths/test-volume/test-container-subpath-emptydir-k7t9/0 is busy
@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

pushed some e2e tests, but they're all failing

looks like bind mounting read only is not working. will debug more tomorrow.

@andyzhangx

This comment has been minimized.

Member

andyzhangx commented Apr 24, 2018

@msau42 Actually readonly flag is not respected for windows dir mount, still in investigation, anyway I filed a issue: #63051

@@ -274,7 +276,7 @@ func IsNotMountPoint(mounter Interface, file string) (bool, error) {
// The list equals:
// options - 'bind' + 'remount' (no duplicate)
func isBind(options []string) (bool, []string) {
bindRemountOpts := []string{"remount"}
bindRemountOpts := []string{"bind", "remount"}

This comment has been minimized.

@msau42

msau42 Apr 24, 2018

Member

This is what I had to do to make it work for the case that the volume is writable, but the subpath is read only.

The original behavior of bind mount is: If bind mount is read only, and the underlying volume is writable, then this remount will also remount the original volume as read only. For this fix, if you also pass in the "bind" option to the remount, then it will make only the bind mount read only, and will not change the original volume mount.

We need to be sure this behavior change is ok for normal bind mounts that volume plugins do.

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

just to confirm, previously we were still bind-mounting right? just not explicitly passing in this option, which as far as we know only changes behavior in this specific case?

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

Yes, when we do a bind mount, we actually make two mount calls:

mount -o bind .....
mount -o remount,<more options like ro> ....

It's a limitation of the bind mount operation that it ignores the other options on the first bind mount. background

I think this is a safe change to make and should not break anyone. The difference with this change is now the underlying volume won't be changed to read only. It will actually allow pods to share the same volume, one with write permissions and one with read only. Before, I think this would have resulted in a mount error.

This comment has been minimized.

@davidz627

davidz627 Apr 26, 2018

Collaborator

So this change actually "sort of" deviates from the article you linked, because it looks like in theory we should be able to get a read-only bindmount by doing (according to this article linked):

mount -o bind /vital_data /untrusted_container/vital_data
mount -o remount,ro /untrusted_container/vital_data

which we had before. However, that actually has the behavior of modifying the original mount point to also be read only, so what you've done is changed the command to the following to fix that:

mount -o bind /vital_data /untrusted_container/vital_data
mount -o bind,remount,ro /untrusted_container/vital_data

Let me know if that's a flawed understanding.

This comment has been minimized.

@davidz627

davidz627 Apr 26, 2018

Collaborator

Interesting thing I found:
https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount

With Linux, the simple way:

mount --bind /some/where /mnt/readonly
mount -o remount,ro,bind /mnt/readonly
This leaves a short interval of time during which /mnt/readonly is read-write. If this is a security concern, first create the bind mount in a directory that only root can access, make it read-only, then move it to a public mount point. In the snippet below, note that it's important that /root/private (the directory above the mount point) is private; the original permissions on /root/private/mnt are irrelevant since they are hidden behind the mount point.

mkdir -p /root/private/mnt
chmod 700 /root/private
mount --bind /some/where /root/private/mnt
mount -o remount,ro,bind /root/private/mnt
mount --move /root/private/mnt /mnt/readonly

This comment has been minimized.

@msau42

msau42 Apr 26, 2018

Member

Ah ah yes you're right. I totally forgot to explain the issue I hit when it wasn't specifying the "bind" option in the remount.

Because subpath is a subdirectory of the volume, when you do the remount on the base volume as readonly, that remount will fail because of a "device is busy" error. And I think that is because the subpath is a nested mount inside the base volume.

Sorry no, that doesn't make sense. I think it's related to the fact that we have the file opened. The original error I got is complaining that the bind path target is busy, so it didn't even get to the underlying volume yet:

mount: /var/lib/kubelet/pods/f15795a3-475b-11e8-b4d0-42010a800002/volume-subpaths/test-volume/test-container-subpath-emptydir-k7t9/0 is busy

This comment has been minimized.

@davidz627

davidz627 Apr 26, 2018

Collaborator

Interesting, that's not the problem I expected, might be worth putting a comment above this line with a quick explanation or link to a description of why its necessary. Could be easily confused for a bug or misunderstood in the future.

This comment has been minimized.

@msau42

msau42 Apr 26, 2018

Member

Added comment

@@ -56,7 +56,7 @@ const (
// syscall.Openat flags used to traverse directories not following symlinks
nofollowFlags = unix.O_RDONLY | unix.O_NOFOLLOW
// flags for getting file descriptor without following the symlink
openFDFlags = unix.O_NOFOLLOW | unix.O_PATH
openFDFlags = unix.O_RDONLY | unix.O_NOFOLLOW | unix.O_PATH

This comment has been minimized.

@msau42

msau42 Apr 24, 2018

Member

This may not be needed. Haven't tested without it yet

This comment has been minimized.

@msau42

msau42 Apr 24, 2018

Member

yup removed

@@ -271,6 +271,18 @@ var _ = utils.SIGDescribe("Subpath", func() {
}
testSubpathReconstruction(f, pod, true)
})
It("should support readOnly specified in the volumeMount", func() {

This comment has been minimized.

@msau42

msau42 Apr 24, 2018

Member

Also need test cases for when the volume source is read only. For inline volumes and PV volumes.

This comment has been minimized.

@msau42

msau42 Apr 24, 2018

Member

added

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

I got it working, see comments above.

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Apr 24, 2018

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

/retest

@davidz627

This comment has been minimized.

Collaborator

davidz627 commented Apr 24, 2018

/assign

@msau42 msau42 changed the title from WIP: passthrough readOnly to subpath to passthrough readOnly to subpath Apr 24, 2018

@msau42

This comment has been minimized.

Member

msau42 commented Apr 24, 2018

containerized kubelet still needs to be tested
/hold

@msau42

This comment has been minimized.

Member

msau42 commented Apr 25, 2018

Opened #63156 for gce pd test failure.

/retest

@@ -66,7 +66,7 @@ func TestBindMount(t *testing.T) {
expectedArgs = []string{"-t", fsType, "-o", "bind", sourcePath, destinationPath}
case 2:
// mount -t fstype -o "remount,opts" source target
expectedArgs = []string{"-t", fsType, "-o", "remount," + strings.Join(mountOptions, ","), sourcePath, destinationPath}
expectedArgs = []string{"-t", fsType, "-o", "bind,remount," + strings.Join(mountOptions, ","), sourcePath, destinationPath}

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

is this related to the readOnly fix?

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

Yes, I had to add this option because: #63045 (comment)

@@ -274,7 +276,7 @@ func IsNotMountPoint(mounter Interface, file string) (bool, error) {
// The list equals:
// options - 'bind' + 'remount' (no duplicate)
func isBind(options []string) (bool, []string) {
bindRemountOpts := []string{"remount"}
bindRemountOpts := []string{"bind", "remount"}

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

just to confirm, previously we were still bind-mounting right? just not explicitly passing in this option, which as far as we know only changes behavior in this specific case?

{
name: "subpath-file-readonly",
prepare: func(base string) ([]string, string, string, error) {
volpath, _ := getTestPaths(base)

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

In these test cases the volpath is not readOnly right? Should we add two more tests here that test ReadOnly Subpath Mount for dir/file for when volPath is readOnly?

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

The bindSubpath function doesn't actually check the permissions on the volume if it's read only or not. It only uses determines what to do based on the Subpath.ReadOnly flag.

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

but could it have different behavior whether the permissions on the volume are read only or not? Almost like doing the same test but in a "different environment"

This comment has been minimized.

@msau42

msau42 Apr 26, 2018

Member

added

}
}
if expectedReadOnly != foundReadOnly {
return fmt.Errorf("expected readOnly %v for mount point %v", expectedReadOnly, bindPathTarget)

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

expected... "got..."? Doesn't seem strictly necessary in this case but it might make it slightly clearer

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

yeah I didn't put it in because it's just a bool. But it would make it more clear

// Set volume source to read only
pod.Spec.Volumes[0].VolumeSource = *roVol
clearSubpathPodCommands(pod)

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

not super clear on why we need to do this

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

I'm reusing the same pod spec as the first pod, so I need to reset the pod state to not do write commands.

setWriteCommand(filePathInVolume, &pod.Spec.Containers[1])
By("Running a pod to initialize the volume data")
testReadFile(f, filePathInSubpath, pod, 0)

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

why are we testReadFile-ing here?

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

I need a way to initialize the contents of the test file, so I'm launching the test pod first with volume writable. Then I launch the test pod again with volume read only.

I wanted to reuse the same read/write test program as the other test cases, but I could try to make this more explicit.

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

yea it's not super clear here, if the functions are being reused for different purposes maybe the right move here is to rename the function?

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

updated in new commit

@@ -718,6 +785,15 @@ func (s *gcepdSource) createVolume(f *framework.Framework) volInfo {
}
}
func (s *gcepdSource) readOnlyVolume() *v1.VolumeSource {
return &v1.VolumeSource{
PersistentVolumeClaim: &v1.PersistentVolumeClaimVolumeSource{

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

should this be GCEPDSource? Or maybe this should have a gcepdPVCSource type receiver. Or maybe the naming is just inconsistent

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

It's GCE PD represented through PVC. I can rename it to gcepdPVCSource

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

ack

@@ -718,6 +785,15 @@ func (s *gcepdSource) createVolume(f *framework.Framework) volInfo {
}
}
func (s *gcepdSource) readOnlyVolume() *v1.VolumeSource {

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

these functions are "creating" a readOnlyVolume right? The naming is a little ambiguous, at first I thought it was an "isReadOnlyVolume" type function but it doesn't seem like that.

This comment has been minimized.

@msau42

msau42 Apr 25, 2018

Member

They create the volumespec with readOnly set, but don't actually create any volumes.

This comment has been minimized.

@davidz627

davidz627 Apr 25, 2018

Collaborator

createReadOnlyVolumeSource?

@k8s-ci-robot k8s-ci-robot added the lgtm label Apr 26, 2018

@msau42

This comment has been minimized.

Member

msau42 commented Apr 27, 2018

I manually tested emptydir on a local containerized kubelet setup. I couldn't run the e2es because kubectl logs was not working properly in the containerized setup.

@msau42

This comment has been minimized.

Member

msau42 commented Apr 27, 2018

/hold cancel

@msau42

This comment has been minimized.

Member

msau42 commented May 1, 2018

This is ready for review. @jsafrane ptal

@jsafrane

This comment has been minimized.

Member

jsafrane commented May 4, 2018

/approve
Thanks!

@jsafrane

This comment has been minimized.

Member

jsafrane commented May 4, 2018

/retest

1 similar comment
@msau42

This comment has been minimized.

Member

msau42 commented May 7, 2018

/retest

@msau42

This comment has been minimized.

Member

msau42 commented May 7, 2018

/assign @Random-Liu

@Random-Liu

This comment has been minimized.

Member

Random-Liu commented May 7, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented May 7, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidz627, jsafrane, msau42, Random-Liu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

This comment has been minimized.

fejta-bot commented May 8, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

1 similar comment
@fejta-bot

This comment has been minimized.

fejta-bot commented May 8, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented May 8, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented May 8, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit e6b6e5c into kubernetes:master May 8, 2018

15 of 16 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-e2e-gce
Details
cla/linuxfoundation msau42 authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

k8s-merge-robot added a commit that referenced this pull request May 9, 2018

Merge pull request #63547 from msau42/automated-cherry-pick-of-#63045-…
…upstream-release-1.8

Automatic merge from submit-queue.

[1.8] Automated cherry pick of #63045: passthrough readOnly to subpath

cherry pick of #63045: passthrough readOnly to subpath

**Release note**:

```release-note
Fixes issue where subpath readOnly mounts failed
```

k8s-merge-robot added a commit that referenced this pull request May 15, 2018

Merge pull request #63545 from msau42/automated-cherry-pick-of-#63045-…
…upstream-release-1.10

Automatic merge from submit-queue.

[1.10] Automated cherry pick of #63045: passthrough readOnly to subpath

cherry pick of #63045: passthrough readOnly to subpath

**Release note**:

```release-note
Fixes issue where subpath readOnly mounts failed
```

k8s-merge-robot added a commit that referenced this pull request May 15, 2018

Merge pull request #63546 from msau42/automated-cherry-pick-of-#63045-…
…upstream-release-1.9

Automatic merge from submit-queue.

[1.9] Automated cherry pick of #63045: passthrough readOnly to subpath

cherry pick of #63045: passthrough readOnly to subpath

**Release note**:

```release-note
Fixes issue where subpath readOnly mounts failed
```

k8s-merge-robot added a commit that referenced this pull request Jun 5, 2018

Merge pull request #64351 from msau42/fix-readonly
Automatic merge from submit-queue (batch tested with PRs 62266, 64351, 64366, 64235, 64560). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bind mount subpath with same read/write settings as underlying volume

**What this PR does / why we need it**:
#63045 broke two scenarios:
* If volumeMount path already exists in container image, container runtime will try to chown the volume
* In SELinux system, we will try to set SELinux labels when starting the container

This fix makes it so that the subpath bind mount will inherit the read/write settings of the underlying volume mount. It does this by using the "bind,remount" mount options when doing the bind mount.

The underlying volume mount is ro when the volumeSource.readOnly flag is set. This is for persistent volume types like PVC, GCE PD, NFS, etc.  When this is set, we won't try to configure SELinux labels.  Also in this mode, subpaths have to already exist in the volume, we cannot make new directories on a read only volume.

When volumeMount.readOnly is set, the container runtime is in charge of making the volume in the container readOnly, but the underlying volume mount on the host can be writable. This can be set for any volume type, and is permanently set for atomic volume types like configmaps, secrets.  In this case, SELinux labels will be applied before the container runtime makes the volume readOnly.  And subpaths don't have to exist.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64120

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixes issue for readOnly subpath mounts for SELinux systems and when the volume mountPath already existed in the container image.
```

@pawelprazak pawelprazak referenced this pull request Jun 27, 2018

Closed

subPath volume mount umbrella issue #61563

6 of 8 tasks complete

@msau42 msau42 deleted the msau42:fix-subpath-readonly branch Nov 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment