Docker build inside sysbox container results in "Error processing tar file(exit status 1): operation not permitted" #254

Closed
ctalledo opened this issue Apr 9, 2021 · 11 comments

@ctalledo
Member

ctalledo commented Apr 9, 2021

When running Docker inside a sysbox container, a docker build may fail as follows:

~ # docker build -t myimage .
Sending build context to Docker daemon  3.072kB
Step 1/2 : FROM ubuntu:18.04
18.04: Pulling from library/ubuntu
6e0aa5e7af40: Pull complete
d47239a868b3: Pull complete
49cbb10cca85: Extracting [==================================================>]     189B/189B
failed to register layer: Error processing tar file(exit status 1): operation not permitted

The sysbox container in this case was based on the docker:18.04-dind image (i.e., dockerd 18.04 runs inside the sysbox container).

In addition, the error occurred when building a Dockerfile that started with the ubuntu 18.04 base image (e.g., "FROM ubuntu:18.04"). The error was not seen when using other base images (e.g., alpine).

The error does not reproduce when using dockerd >= 19.03 inside the sysbox container.

Sysbox version was 0.3.0.

@ctalledo ctalledo added the bug Something isn't working label Apr 9, 2021
@ctalledo ctalledo self-assigned this Apr 9, 2021
@ctalledo ctalledo added this to To do in Sysbox Dev via automation Apr 9, 2021
@ctalledo
Member Author

ctalledo commented Apr 9, 2021

Debugging this, I can see that the inner Docker build fails due to a tar operation that gets an EPERM on the following syscall:

2917246 fchmodat(AT_FDCWD, "/run/systemd", 0755) = 0
2917246 utimensat(AT_FDCWD, "/run/systemd", [{tv_sec=1616711581, tv_nsec=0} /* 2021-03-25T22:33:01+0000 */, {tv_sec=1616711581, tv_nsec=0} /* 2021-03-25T22:33:01+0000 */], 0) = 0
2917246 read(0, "run/systemd/.wh..wh..opq\0\0\0\0\0\0\0\0"..., 512) = 512
2917246 lstat("/run/systemd", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
2917246 lstat("/run/systemd/.wh..wh..opq", 0xc420264108) = -1 ENOENT (No such file or directory)
2917246 setxattr("/run/systemd", "trusted.overlay.opaque", "y", 1, 0) = -1 EPERM (Operation not permitted)

In other words, the Linux kernel appears to reject setting the trusted.overlay.opaque attribute when the caller is not root in the initial user namespace, even if the process is root inside its own Linux user-namespace.
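
For readers unfamiliar with the trace above: the tar entry `run/systemd/.wh..wh..opq` is an opaque-whiteout marker. The trace suggests the extractor does not create that file but instead marks the parent directory opaque by setting `trusted.overlay.opaque` to `y`, which is the call that fails. Below is a minimal Python sketch of that mapping. The function name `opaque_xattr_target` and the `root` parameter are made up for illustration; this is not Docker's code.

```python
import posixpath

OPAQUE_MARKER = ".wh..wh..opq"            # entry name read from the tar stream in the trace
OPAQUE_XATTR = "trusted.overlay.opaque"   # attribute passed to setxattr in the trace


def opaque_xattr_target(tar_entry_name, root="/"):
    """Return (directory, xattr_name, value) for an opaque-whiteout entry, else None.

    Illustrative helper only: it maps a marker entry to the directory that
    would receive the xattr, without touching the filesystem.
    """
    name = posixpath.normpath(tar_entry_name)
    if posixpath.basename(name) != OPAQUE_MARKER:
        return None  # ordinary file entry, nothing special to do
    parent = posixpath.dirname(name)
    directory = posixpath.normpath(posixpath.join(root, parent))
    return (directory, OPAQUE_XATTR, b"y")
```

Called with `"run/systemd/.wh..wh..opq"`, the helper points at `/run/systemd`, matching the path in the failing `setxattr` line.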

For now, the work-around is to use an inner dockerd whose version is >= 19.03. However, a proper fix is needed in Sysbox, because I recently hit a similar error when running Podman inside a sysbox container (experimentally) and trying to run an inner container that was also based on an Ubuntu image.

@ctalledo
Member Author

@pwurbs reports that a similar error also occurs when the inner Docker is docker:19.03.15-dind-alpine3.13:

(the following text is copied from issue #380):

"I started successfully a pod with docker:dind image (docker:19.03.15-dind-alpine3.13)
Trying "docker pull nginx" in this container results in this error:
failed to register layer: Error processing tar file(exit status 1): replaceDirWithOverlayOpaque("/docker-entrypoint.d") failed: createDirWithOverlayOpaque("/rdwoo655593762") failed: failed to rmdir /rdwoo655593762/m/d: remove /rdwoo655593762/m/d: operation not permitted

This is the Docker version info from within the container:

Server: Docker Engine - Community
 Engine:
  Version: 19.03.15
  API version: 1.40 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: 99e3ed8
  Built: Sat Jan 30 03:18:13 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: v1.3.9
  GitCommit: ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version: 1.0.0-rc10
  GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683

These versions are a bit different from your ubuntu-bionic-systemd-docker image.
I am not sure, if this issue is K8S / RKE related. I only wanted to let you know..."

@ctalledo
Member Author

Hi @pwurbs,

Interesting that you hit this when the inner Docker uses docker:19.03.15-dind-alpine3.13, as my prior investigation found that this problem should be fixed in Docker 19.03.

Could you retry with a docker dind image using Docker 20+ please?

Thanks!

@pwurbs

pwurbs commented Aug 27, 2021

Hi @ctalledo,
I tried again with docker:20.10.8-dind to pull nginx image.
Now, there is a different error message:
failed to register layer: ApplyLayer exit status 1 stdout: stderr: unlinkat /etc/.pwd.lock: operation not permitted

@ctalledo
Member Author

Hi @pwurbs,

Thanks again.

I tried to reproduce this on my Ubuntu-Focal development machine and was not able to.

Here are the steps I used:

  1. Launch the Docker DinD container with image docker:19.03.15-dind-alpine3.13:
docker network create some-network

docker run --runtime=sysbox-runc \
 --name dind-syscont -d \
 --network some-network \
 --network-alias docker \
 -e DOCKER_TLS_CERTDIR=/certs \
 -v dind-syscont-certs-ca:/certs/ca \
 -v dind-syscont-certs-client:/certs/client \
 docker:19.03.15-dind-alpine3.13
  2. Launch the Docker CLI container:
docker run -it --rm \
   --network some-network \
   -e DOCKER_TLS_CERTDIR=/certs \
   -v dind-syscont-certs-client:/certs/client:ro \
   docker:latest sh
  3. From inside the Docker CLI container, pull the nginx image:
/ # docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

/ # docker image ls
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE

/ # docker pull nginx
Using default tag: latest
latest: Pulling from library/nginx
e1acddbe380c: Pull complete 
e21006f71c6f: Pull complete 
f3341cc17e58: Pull complete 
2a53fa598ee2: Pull complete 
12455f71a9b5: Pull complete 
b86f2ba62d17: Pull complete 
Digest: sha256:4d4d96ac750af48c6a551d757c1cbfc071692309b491b70b2b8976e102dd3fef
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest
/ #

I'll try now from within a sysbox-powered pod.

@ctalledo
Member Author

I was able to repro the problem when deploying the docker-dind and docker-cli containers in a Kubernetes cluster, as follows:

  1. docker-dind container spec (uses sysbox runtime):
apiVersion: v1
kind: Pod
metadata:
  name: dind-pod
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: dind
    image: docker:19.03.15-dind-alpine3.13
    command: ["sh", "-c", "dockerd -H tcp://0.0.0.0:2375 > /var/log/dockerd.log 2>&1"]
    ports:
    - containerPort: 2375
  restartPolicy: Never
  2. docker-cli container spec (uses runc runtime):
apiVersion: v1
kind: Pod
metadata:
  name: docker-cli-pod
spec:
  containers:
  - name: docker-cli
    image: docker:latest
    command: ["sh", "-c", "sleep infinity"]
  restartPolicy: Never
  3. Apply both of the YAML specs above.

  4. Exec into the docker-cli-pod and do the following (where <IP> is the IP address of dind-pod):

$ kubectl exec -it docker-cli-pod -- /bin/sh
/ # export DOCKER_HOST="tcp://<IP>:2375"
/ # docker pull nginx

Using default tag: latest
latest: Pulling from library/nginx
e1acddbe380c: Pull complete
e21006f71c6f: Extracting [==================================================>]   26.6MB/26.6MB
f3341cc17e58: Download complete
2a53fa598ee2: Download complete
12455f71a9b5: Download complete
b86f2ba62d17: Download complete
failed to register layer: Error processing tar file(exit status 1): replaceDirWithOverlayOpaque("/docker-entrypoint.d") failed: createDirWithOverlayOpaque("/rdwoo330054920") failed: failed to rmdir /rdwoo330054920/m/d: remove /rdwoo330054920/m/d: operation not permitted

A couple of interesting data points:

  • I don't see this when pulling the nginx image inside a pod based on the nestybox/ubuntu-bionic-systemd-docker image. That image also carries Docker 19.03, so it's strange that the problem does not show up there.

  • Similarly, I don't see it when pulling the nginx image inside a pod using nestybox/ubuntu-focal-systemd-docker. That image carries docker 20.10.7.

I'll dig a bit deeper to see what's going on ...

@ctalledo
Member Author

Stracing the docker pull nginx shows that this failure has the same root cause as the one described in my Apr 9 comment above: the setxattr call for trusted.overlay.opaque returns EPERM.

The strace shows:

29601 newfstatat(AT_FDCWD, "/docker-entrypoint.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
29601 newfstatat(AT_FDCWD, "/docker-entrypoint.d/.wh..wh..opq", 0xc0005e32e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
29601 setxattr("/docker-entrypoint.d", "trusted.overlay.opaque", "y", 1, 0) = -1 EPERM (Operation not permitted)

Process 29601 is docker-untar.

I need to understand why the kernel returns EPERM when a process inside the Sysbox container tries to do a setxattr of trusted.overlay.opaque on a file.
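
The failing call can probably be reproduced without Docker. The sketch below, which assumes a Linux host with Python 3, makes the same `setxattr` call against a scratch directory and reports the resulting errno name. The function name `probe_trusted_xattr` is hypothetical, and the outcome depends on the environment (privileges, user namespace, filesystem), so no particular result is claimed here.

```python
import errno
import os
import tempfile


def probe_trusted_xattr(name="trusted.overlay.opaque", value=b"y"):
    """Try to set a trusted.* xattr on a scratch directory and report the outcome.

    Returns "ok" on success, the symbolic errno name (e.g. "EPERM") on failure,
    or "unsupported-platform" where os.setxattr does not exist.
    """
    if not hasattr(os, "setxattr"):
        return "unsupported-platform"  # os.setxattr is only available on Linux
    with tempfile.TemporaryDirectory() as scratch:
        try:
            os.setxattr(scratch, name, value)
        except OSError as exc:
            # EPERM is what docker-untar hit in the trace; other errnos mean the
            # call was rejected for a different reason (e.g. by the filesystem).
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"
```

Running `print(probe_trusted_xattr())` once on the host as root and once inside the Sysbox container would show whether the two environments behave differently for this attribute.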

@ctalledo
Member Author

ctalledo commented Aug 30, 2021

"I need to understand why the kernel returns EPERM when a process inside the Sysbox container tries to do a setxattr of trusted.overlay.opaque on a file."

I investigated, and it looks like the trusted.overlay.opaque extended attribute can't be set or read by a process in a user-namespace other than the initial user-ns (even if the process has all capabilities set). It's a limitation of the kernel, likely done for security reasons.
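
To check which side of that boundary a process is on, one option is to look at `/proc/self/uid_map`. The sketch below is a heuristic, not a definitive test: it treats a single identity mapping over the full 32-bit ID range as "initial user namespace", although a nested namespace could in principle be configured with the same map. The function names are hypothetical.

```python
def is_initial_userns_map(uid_map_text):
    """True if the text is a single identity mapping over the full 32-bit ID range.

    Each uid_map row is "<id inside ns> <id outside ns> <length>".
    """
    rows = [line.split() for line in uid_map_text.splitlines() if line.strip()]
    return rows == [["0", "0", "4294967295"]]


def in_initial_userns(path="/proc/self/uid_map"):
    """Apply the heuristic to the current process; None if the map is unreadable."""
    try:
        with open(path) as f:
            return is_initial_userns_map(f.read())
    except OSError:
        return None  # no procfs (or no permission); cannot tell
```

A result of `False` inside the container, combined with an EPERM on the `setxattr` call, would be consistent with the explanation above.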

Thus, solving this issue will require that Sysbox perform syscall trapping of the *xattr() family of syscalls, at a minimum when these operate on the trusted.overlay.opaque attribute. Sysbox can do this securely because it knows the processes inside a sys container are in a file-system jail, so they will only be able to set this attribute on files inside this jail.
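
The decision described above can be pictured as a small policy function. The sketch below is purely illustrative and is not Sysbox's implementation: the names `XATTR_SYSCALLS`, `EMULATED_XATTRS`, and `xattr_action` are hypothetical, and the allow-list contains only the attribute named in this issue. It omits the path handling that a real handler would need to keep the operation inside the container's file-system jail.

```python
# Name-taking members of the *xattr() family (the list*xattr calls take no name).
XATTR_SYSCALLS = frozenset({
    "setxattr", "lsetxattr", "fsetxattr",
    "getxattr", "lgetxattr", "fgetxattr",
    "removexattr", "lremovexattr", "fremovexattr",
})

# Hypothetical allow-list: only the attribute discussed in this issue.
EMULATED_XATTRS = frozenset({"trusted.overlay.opaque"})


def xattr_action(syscall, attr_name):
    """Return "emulate" when a trap handler would act on the container's behalf,
    or "passthrough" when the call should be left to the kernel."""
    if syscall in XATTR_SYSCALLS and attr_name in EMULATED_XATTRS:
        return "emulate"
    return "passthrough"
```

Keeping the allow-list narrow means every other attribute still goes through the kernel's normal permission checks.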

I'm half-way through implementing this solution and expect to have it ready sometime this week.

@ctalledo
Member Author

ctalledo commented Sep 4, 2021

Update: I was able to fix the trusted.overlay.opaque problem seen with Docker 19.03, as well as the problem seen with Docker 20.10 ("unlinkat /etc/.pwd.lock: operation not permitted").

The fix is in my local branch; it took a bit longer to implement than expected. If no further hiccups are found, I expect to commit the fix early next week.

ctalledo added a commit that referenced this issue Sep 7, 2021
These tests verify that sysbox performs syscall interception
of the *xattr() syscalls correctly, particularly to enable
sys container processes to set the 'trusted.overlay.opaque'
extended attribute.

See Sysbox issue #254 for more info on why this is necessary.

Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>
@ctalledo
Member Author

ctalledo commented Sep 7, 2021

Quick update: I've uploaded the PRs for the fix, expect to commit Wednesday or Thursday this week.

ctalledo added a commit that referenced this issue Sep 12, 2021
ctalledo added a commit that referenced this issue Sep 13, 2021
ctalledo added a commit that referenced this issue Sep 13, 2021
@ctalledo
Member Author

PR was merged, issue is now fixed in Sysbox upstream. Will be present in the next Sysbox release (i.e., after v0.4.0).

We will also update the sysbox-deploy-k8s daemonset to include the fix shortly. I will post an update here once that is done.
