Rootless mode doesn't work on Google Container-Optimized OS kernel (CONFIG_SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS?) #879

Closed
AkihiroSuda opened this issue Mar 15, 2019 · 25 comments · Fixed by #3097
Labels
area/gke Google Kubernetes Engine area/rootless rootless mode help wanted

Comments

@AkihiroSuda
Member

AkihiroSuda commented Mar 15, 2019

~ $ cat Dockerfile
FROM alpine
~ $ export BUILDKIT_HOST=tcp://127.0.0.1:1234
~ $ buildctl b --frontend dockerfile.v0 --local context=. --local dockerfile=.
[+] Building 0.0s (2/2) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                                       0.0s
 => => transferring dockerfile: 49B                                                                                                                                                                                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                            0.0s
error: failed to solve: rpc error: code = Unknown desc = failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount290620720: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 Options:[rbind ro]}]: operation not permitted

But unshare -rm mount works 🤔

~ $ unshare -mr
buildkitd-649b4db5d4-jskbq:/home/user# mount --rbind -o ro /home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 /home/user/.local/tmp/buildkit-mount710693070

$ kubectl get nodes -o wide
NAME                                        STATUS    ROLES     AGE       VERSION         EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
*****************************************   Ready     <none>    19m       v1.12.5-gke.5   **************   Container-Optimized OS from Google   4.14.89+         docker://17.3.2
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
      annotations:
        container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
        container.seccomp.security.alpha.kubernetes.io/buildkitd: unconfined
    spec:
      containers:
      - image: moby/buildkit:v0.4.0-rootless@sha256:3877d091e65429f59919ed5591aaeb863b1889a5314bdfdba5ff9c0dfb2f3ed0
        args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        name: buildkitd
        ports:
        - containerPort: 1234
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  ports:
  - port: 1234
    protocol: TCP
  selector:
    app: buildkitd
@AkihiroSuda
Member Author

AkihiroSuda commented Mar 15, 2019

Note: the same step (w/ --oci-worker-snapshotter=native) succeeds in the following environments:

  • Docker for Mac 2.0.3.0 (Build 31778, Kube 1.13.0, Docker 18.09.3)
  • minikube v0.35.0
  • AKS v1.12.6 (kernel 4.15.0-1037-azure, Ubuntu 16.04.5, MS-Moby 3.0.4)

@AkihiroSuda
Member Author

I wonder whether this is related to the ChromiumOS LSM, but I'm not sure: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/HEAD/security/chromiumos

@tonistiigi
Member

@AkihiroSuda just to be clear, it does not work without setting securityContext in GKE?

@AkihiroSuda
Member Author

No, even privileged: true does not work with the rootless image.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
    spec:
      containers:
      - image: moby/buildkit:v0.4.0-rootless@sha256:3877d091e65429f59919ed5591aaeb863b1889a5314bdfdba5ff9c0dfb2f3ed0
        args:
        - --addr
        - tcp://0.0.0.0:1234
        name: buildkitd
        ports:
        - containerPort: 1234
        securityContext:
          privileged: true

With the rootful image, it works. (Tested with both overlay and native for rootful.)

@tonistiigi
Member

@AkihiroSuda So is this a regression in v0.4 ?

@AkihiroSuda
Member Author

No, even v0.3.0-rootless w/ securityContext: privileged does not work now.

This is more likely a regression in GKE, although I have no evidence that v0.3.0-rootless ever worked on GKE.

@AkihiroSuda
Member Author

v0.4.0-rootless (both overlay and native; both w/ and w/o privileged) works with GKE Ubuntu nodes (kernel 4.15.0-1026-gcp #27-Ubuntu, kube v1.11.7-gke.4, Ubuntu 18.04.1, docker://17.3.2).

This seems to be an issue with Google COS.

@AkihiroSuda AkihiroSuda changed the title Rootless mode doesn't work on GKE Rootless mode doesn't work on GKE Container-Optimized OS kernel Mar 15, 2019
@AkihiroSuda AkihiroSuda changed the title Rootless mode doesn't work on GKE Container-Optimized OS kernel Rootless mode doesn't work on Google Container-Optimized OS kernel Mar 15, 2019
@AkihiroSuda
Member Author

AkihiroSuda commented Mar 15, 2019

strace:

buildkit (fails) (containerd/containerd#1373)

[pid 15561] mkdirat(AT_FDCWD, "/home/user/.local/tmp/buildkit-mount226977687", 0700) = 0
[pid 15561] mount("/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1", "/home/user/.local/tmp/buildkit-mount226977687", 0xc0001f2848, MS_RDONLY|MS_BIND|MS_REC, NULL) = 0
[pid 15561] mount("", "/home/user/.local/tmp/buildkit-mount226977687", 0xc0001f284e, MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, NULL) = -1 EPERM (Operation not permitted)

mount -o rbind,ro (succeeds)

[pid 17658] mount("/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1", "/home/user/.local/tmp/buildkit-mount226977687", NULL, MS_RDONLY|MS_BIND|MS_REC|MS_SILENT, NULL) = 0

This is likely related to SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/479f3ad5abb7fe6c95aee87a07fc2536ea6039ee/security/chromiumos/Kconfig#21
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/479f3ad5abb7fe6c95aee87a07fc2536ea6039ee/security/chromiumos/lsm.c#133
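
To make the difference between the strace excerpts above easier to see, here is a minimal Go sketch that decodes the flag words. The helper name flagNames and the lower-case constant names are illustrative and do not come from BuildKit or containerd; the numeric values are the ones defined in the Linux <linux/mount.h> header. It only illustrates that the single failing call is the one that carries MS_REMOUNT.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount flag values as defined in <linux/mount.h>.
const (
	msRdonly  = 0x1
	msRemount = 0x20
	msBind    = 0x1000
	msRec     = 0x4000
	msSilent  = 0x8000
)

// flagTable lists the flags in ascending bit order, which matches the
// order in which they appear in the strace output above.
var flagTable = []struct {
	bit  uintptr
	name string
}{
	{msRdonly, "MS_RDONLY"},
	{msRemount, "MS_REMOUNT"},
	{msBind, "MS_BIND"},
	{msRec, "MS_REC"},
	{msSilent, "MS_SILENT"},
}

// flagNames renders a flag word as a "|"-separated list of flag names.
func flagNames(flags uintptr) string {
	var names []string
	for _, f := range flagTable {
		if flags&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return strings.Join(names, "|")
}

func main() {
	bind := uintptr(msRdonly | msBind | msRec) // buildkit, first call: succeeds
	remount := bind | msRemount                // buildkit, second call: EPERM
	mountCLI := bind | msSilent                // mount -o rbind,ro: succeeds

	fmt.Println("buildkit bind:   ", flagNames(bind))
	fmt.Println("buildkit remount:", flagNames(remount))
	fmt.Println("mount(8) bind:   ", flagNames(mountCLI))
}
```

Note that mount(2) documents that flags other than MS_REC are ignored in the initial MS_BIND call, so the single call issued by mount -o rbind,ro probably did not make the mount read-only at all, which would explain why it never reaches the failing remount step.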

@AkihiroSuda AkihiroSuda changed the title Rootless mode doesn't work on Google Container-Optimized OS kernel Rootless mode doesn't work on Google Container-Optimized OS kernel (CONFIG_SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS?) Mar 15, 2019
@AkihiroSuda AkihiroSuda added the area/gke Google Kubernetes Engine label May 8, 2019
cirocosta pushed a commit to concourse/hush-house that referenced this issue Jul 14, 2019
From moby/buildkit#879, it seems like GKE's
container-optimized instances introduce trouble for running
rootless containers - adding an ubuntu pool to test it out.

Signed-off-by: Ciro S. Costa <cscosta@pivotal.io>
cirocosta pushed a commit to concourse/hush-house that referenced this issue Jul 14, 2019
Continuing with the explorations on how the use of rootless containers
might be affected by `gke`'s COS base images (see
moby/buildkit#879), now we have a worker that runs
on top of ubuntu and can be targeted via the `ubuntu` tag.

Signed-off-by: Ciro S. Costa <cscosta@pivotal.io>
@meysholdt

I just tried with the COS nodes of 1.15.4-gke.18 and the regression still seems to be there :(

@JesterOrNot

Any updates on this issue?

@AkihiroSuda
Member Author

Needs help from Google

@JesterOrNot

So can anything be done?

@AkihiroSuda
Member Author

Maybe https://github.com/AkihiroSuda/containerd-fuse-overlayfs can be a solution, but it is blocked by go mod hell:
#1297

@JesterOrNot

Can I do anything to help?

@AkihiroSuda
Member Author

Another way is to replace the failing mount flags with the ones that the "unshare -rm mount" example in the top comment of this issue uses.

This needs more investigation and help is appreciated, thanks.

@JesterOrNot

So you want to change the error? (sorry I'm new)

@AkihiroSuda
Member Author

"unshare -rm mount" example doesn't produce any error, and we want to avoid BuildKit error by using the same mount flags

@AkihiroSuda
Member Author

I assumed the fuse-overlayfs snapshotter might work, but it seems not 😢

$ buildctl --addr=kube-pod://buildkitd build --frontend dockerfile.v0 --local dockerfile=. --local context=.
[+] Building 0.2s (2/2) FINISHED                                                                                                
 => [internal] load build definition from Dockerfile                                                                       0.2s
 => => transferring dockerfile: 109B                                                                                       0.2s
 => [internal] load .dockerignore                                                                                          0.2s
 => => transferring context: 2B                                                                                            0.2s
error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount998042514: [{Type:bind Source:/home/user/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/1/fs Options:[rbind ro]}]: operation not permitted

@AkihiroSuda
Member Author

The issue is not limited to the snapshotter:

$ git diff
diff --git a/vendor/github.com/containerd/containerd/mount/mount_linux.go b/vendor/github.com/containerd/containerd/mount/mount_linux.go
index a7edd455..526640be 100644
--- a/vendor/github.com/containerd/containerd/mount/mount_linux.go
+++ b/vendor/github.com/containerd/containerd/mount/mount_linux.go
@@ -93,7 +93,10 @@ func (m *Mount) Mount(target string) error {
        const broflags = unix.MS_BIND | unix.MS_RDONLY
        if oflags&broflags == broflags {
                // Remount the bind to apply read only.
-               return unix.Mount("", target, "", uintptr(oflags|unix.MS_REMOUNT), "")
+               unix.Mount("", target, "", uintptr(oflags|unix.MS_REMOUNT), "")
+               // DO-NOT-MERGE:
+               // ignore err here to avoid hitting https://github.com/moby/buildkit/issues/879#issuecomment-473396544
+               // How can we ensure target to be read-only?
        }
        return nil
 }

$ buildctl --addr=kube-pod://buildkitd build --frontend dockerfile.v0 --local dockerfile=. --local context=.
[+] Building 6.1s (5/6)                                                                                                         
 => [internal] load build definition from Dockerfile                                                                       0.2s
 => => transferring dockerfile: 109B                                                                                       0.2s
 => [internal] load .dockerignore                                                                                          0.2s
 => => transferring context: 2B                                                                                            0.1s
 => [internal] load metadata for docker.io/library/alpine:latest                                                           3.3s
 => [1/3] FROM docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d            2.1s
 => => resolve docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d            0.0s
 => => sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d 1.64kB / 1.64kB                             0.0s
 => => sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45 528B / 528B                                 0.0s
 => => sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9 2.80MB / 2.80MB                             1.2s
 => => sha256:e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a 1.51kB / 1.51kB                             0.0s
 => => unpacking docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d          0.1s
 => ERROR [2/3] RUN apk add --no-cache figlet                                                                              0.1s
------
 > [2/3] RUN apk add --no-cache figlet:
#5 0.084 container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/home/user/.local/share/buildkit/runc-native/executor/resolv.conf\\\" to rootfs \\\"/home/user/.local/share/buildkit/runc-native/executor/c9qbj5rmvwnjixos72ek7k7ko/rootfs\\\" at \\\"/home/user/.local/share/buildkit/runc-native/executor/c9qbj5rmvwnjixos72ek7k7ko/rootfs/etc/resolv.conf\\\" caused \\\"operation not permitted\\\"\""
------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c apk add --no-cache figlet]: buildkit-runc did not terminate successfully
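
The patch above leaves open how to tell whether the target actually ended up read-only once the remount error is ignored. One possible check, sketched here under assumptions, is to look for the per-mount "ro" option in mountinfo-formatted text as described in proc(5). The function name isReadOnlyMount and the sample paths are hypothetical, and this only detects the state of a mount; it does not make the mount read-only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// isReadOnlyMount scans mountinfo-formatted text and reports whether the
// mount at target carries the per-mount "ro" option. found is false when
// no entry for target exists.
func isReadOnlyMount(mountinfo, target string) (ro, found bool) {
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// fields[4] is the mount point, fields[5] the per-mount options.
		if len(fields) < 6 || fields[4] != target {
			continue
		}
		found, ro = true, false
		for _, opt := range strings.Split(fields[5], ",") {
			if opt == "ro" {
				ro = true
			}
		}
		// Keep scanning: a later entry for the same path is stacked on top.
	}
	return ro, found
}

func main() {
	// Hypothetical sample entries; real input would be the contents of
	// /proc/self/mountinfo.
	sample := "100 90 8:1 /snapshots/1 /tmp/buildkit-mount1 ro,relatime - ext4 /dev/sda1 rw\n" +
		"101 90 8:1 /snapshots/2 /tmp/buildkit-mount2 rw,relatime - ext4 /dev/sda1 rw\n"

	for _, target := range []string{"/tmp/buildkit-mount1", "/tmp/buildkit-mount2", "/tmp/missing"} {
		ro, found := isReadOnlyMount(sample, target)
		fmt.Printf("%s: ro=%v found=%v\n", target, ro, found)
	}
}
```

In a real process the input would come from reading /proc/self/mountinfo (for example with os.ReadFile) right after the mount call.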

YoussB pushed a commit to concourse/infrastructure that referenced this issue May 13, 2020
From moby/buildkit#879, it seems like GKE's
container-optimized instances introduce trouble for running
rootless containers - adding an ubuntu pool to test it out.

Signed-off-by: Ciro S. Costa <cscosta@pivotal.io>
@dinvlad

dinvlad commented Aug 6, 2021

Any updates on this?

@ei-grad

ei-grad commented Sep 7, 2022

Using an idea from bottlerocket-os/bottlerocket#1934 I added an emptyDir volume to /home/user/.local/share/buildkit and it worked.

@AkihiroSuda
Member Author

@ei-grad
On GCOS kernel? 👀

@AkihiroSuda
Member Author

AkihiroSuda commented Sep 7, 2022

Isn't this VOLUME working by default? 🤔

VOLUME /home/user/.local/share/buildkit

@ei-grad

ei-grad commented Sep 8, 2022

> On GCOS kernel?

Yes, the latest GKE with the cos_containerd image. I got a fully functional rootless BuildKit using the resource definitions from examples/kubernetes, with only an added emptyDir/hostPath volumeMount for /home/user/.local/share/buildkit.

> Isn't this VOLUME working by default? 🤔

Yes, and that's the problem: default volumes are mounted with the nosuid,nodev flags, which causes a permission error when trying to remount the volume without these flags. See the details in the excellent investigation by @bcressey in the linked Bottlerocket issue.

@ei-grad

ei-grad commented Sep 9, 2022

> I got fully functional rootless buildkit

Just to clarify: this is with the native snapshotter. Using overlayfs/fuse-overlayfs on GCOS requires privileged: true, since the kernel is 5.10 😢.
