Mounting a volume with rootless will always assign ownership to root #45919

Closed
jorgenpt opened this issue Jul 10, 2023 · 10 comments

@jorgenpt

Description

When mounting a volume for a docker run against rootless Docker, the mount is always owned by root inside the container. This is independent of the USER in the Dockerfile, the --user passed to docker run, and the ownership of the directory on the host. When the same directory is mounted on a non-rootless Docker, the original UID ownership is preserved.

Reproduce

mkdir -p tmp && ls -ld tmp && docker run --user $(id -u):$(id -g) -it --rm -v ./tmp:/tmp/test alpine:latest stat -c "%u" /tmp/test

Expected behavior

The command should print $(id -u), e.g. 1001, but instead it prints 0 when run against rootless Docker.

docker version

Client: Docker Engine - Community
 Version:           24.0.4
 API version:       1.43
 Go version:        go1.20.5
 Git commit:        3713ee1
 Built:             Fri Jul  7 14:50:52 2023
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.4
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.5
  Git commit:       4ffc614
  Built:            Fri Jul  7 14:50:52 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
 rootlesskit:
  Version:          1.1.1
  ApiVersion:       1.1.1
  NetworkDriver:    slirp4netns
  PortDriver:       builtin
  StateDir:         /tmp/rootlesskit3945660597
 slirp4netns:
  Version:          1.0.1
  GitCommit:        6a7b16babc95b6a3056b33fb45b74a6f62262dd4

docker info

Client: Docker Engine - Community
 Version:    24.0.4
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.19.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 27
  Running: 9
  Paused: 0
  Stopped: 18
 Images: 21
 Server Version: 24.0.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  rootless
  cgroupns
 Kernel Version: 6.1.21-v8+
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 3.705GiB
 Name: ilum
 ID: c59b3fbc-1b13-49ab-a44a-bd044d4d486b
 Docker Root Dir: /home/jorgen/.local/share/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpu shares support
WARNING: No cpuset support
WARNING: No io.weight support
WARNING: No io.weight (per device) support
WARNING: No io.max (rbps) support
WARNING: No io.max (wbps) support
WARNING: No io.max (riops) support
WARNING: No io.max (wiops) support

Additional Info

If you pass in --user $(id -u):$(id -g), there's effectively no way to reliably mount a directory that the user can write to without marking the directory as a+w.

@jorgenpt added the kind/bug and status/0-triage labels Jul 10, 2023
@neersighted
Member

That's not a volume, but a bind mount. Can you please share the subuid/subgid maps (/etc/subuid, /etc/subgid), as well as the UID/GID that owns ./tmp outside the container? This most likely is a result of an unmapped UID/GID range.
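The requested ranges can be pulled out of the subordinate ID files with a small filter. This is a minimal sketch: `subid_ranges` is a hypothetical helper name, and it assumes the standard `name:start:count` layout of /etc/subuid and /etc/subgid.

```sh
# Print the "start:count" subordinate ID ranges assigned to a user.
# $1 is the user name; $2 is the file to read and defaults to /etc/subuid
# (pass /etc/subgid for the group ranges).
subid_ranges() {
    awk -F: -v user="$1" '$1 == user { print $2 ":" $3 }' "${2:-/etc/subuid}"
}
```

Running `subid_ranges "$(id -un)"`, `subid_ranges "$(id -un)" /etc/subgid`, and `stat -c '%u:%g' ./tmp` on the host should cover the three pieces of information asked for here.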

@neersighted added the status/more-info-needed label and removed the kind/bug label Jul 10, 2023
@jorgenpt
Author

@neersighted

subuid:

pi:100000:65536
jorgen:165536:65536

subgid:

pi:100000:65536
jorgen:165536:65536

./tmp is owned by jorgen:jorgen (1001:1001)

@neersighted
Member

neersighted commented Jul 10, 2023

Okay, so what's going on here is that UIDs do not match inside and outside the container. When you specify an /etc/subuid map, you're creating a range of UIDs that a certain user is allowed to map inside a userns. The issue here is that the user that owns ./tmp is 1001:1001 in the root userns. Your container is running as 166536:166536 in the root userns (165536 + 1001 - 1, because container UID 1 maps to the first subuid), which corresponds to 1001:1001 inside the container userns.

From the perspective of the container userns, ./tmp is owned by the host user that runs dockerd, which is not part of the subuid range. Rootless mode maps that user to 0 (root), so the directory shows up as owned by root.
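The lookup can be sketched as a small shell function. This is an illustration under assumptions, not RootlessKit's code: `host_to_ns_uid` is a hypothetical helper, and the map passed to it assumes the two-extent layout shown by the newuidmap log lines later in this thread (`0 <user UID> 1` followed by `1 <subuid start> <count>`), filled in with jorgen's UID and subuid range.

```sh
# Translate a host UID into the UID seen inside a user namespace.
# $1 is the host UID; the remaining arguments are "inside outside count"
# triples, the same layout as /proc/<pid>/uid_map.
host_to_ns_uid() {
    host_uid=$1
    shift
    while [ "$#" -ge 3 ]; do
        inside=$1 outside=$2 count=$3
        shift 3
        if [ "$host_uid" -ge "$outside" ] && [ "$host_uid" -lt $((outside + count)) ]; then
            echo $((inside + host_uid - outside))
            return 0
        fi
    done
    # No extent covers this UID: the kernel reports such owners as the
    # overflow UID, which is 65534 by default.
    echo 65534
}

# Assumed map for jorgen (UID 1001, subuid range 165536:65536).
host_to_ns_uid 1001 0 1001 1 1 165536 65536     # prints 0
host_to_ns_uid 165536 0 1001 1 1 165536 65536   # prints 1
```

Under that assumed map, the owner of ./tmp (host UID 1001) lands on 0 inside the namespace, which is the value the reproduction prints.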

In order to achieve what you intend here, you would need to align the UID/GID of a user expected to own the files outside the container with the effective UID/GID that will be observed in the host userns.

It's also worth considering if you need a bind mount, given the name (./tmp) it sounds like a tmpfs might be appropriate; likewise a volume could be used to encapsulate the files such that UID/GID mapping will be entirely contained.
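For comparison, the stat check from the reproduction can be rerun against a tmpfs mount and against a named volume. This is a hedged sketch: the two function names and the volume name `scratch-data` are illustrative and do not come from this thread, and no particular output is claimed since it depends on the daemon and image.

```sh
# Hypothetical helpers: print owner UID and octal mode of /tmp/test when it
# is backed by a tmpfs or by a named volume instead of a bind mount.

stat_on_tmpfs() {
    docker run --rm --tmpfs /tmp/test alpine:latest stat -c '%u %a' /tmp/test
}

stat_on_named_volume() {
    # "scratch-data" is an illustrative volume name.
    docker volume create scratch-data >/dev/null &&
        docker run --rm -v scratch-data:/tmp/test alpine:latest stat -c '%u %a' /tmp/test
}
```

Calling `stat_on_tmpfs` and `stat_on_named_volume` against the rootless daemon shows which owner and mode each kind of mount presents inside the container.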

We don't currently have anything along the lines of Podman's different UID/GID mapping modes (https://www.redhat.com/sysadmin/rootless-podman-user-namespace-modes; it generates the subuid/subgid maps on the fly) and I'm not an expert in how hard it would be to implement some/all of them.

@AkihiroSuda do we have a tracking issue for dynamic uid/gid mapping that's appropriate for features equivalent to those Podman options?

@jorgenpt
Author

jorgenpt commented Jul 10, 2023

@neersighted, in this case ./tmp was just a minimal repro case; the real use case is more complex. I could probably use a named volume instead of a bind mount though -- what governs the permissions on the volume's mount in a rootless scenario? Is the volume expected to adopt the permissions of the container directory it is mounted on, or is there some kind of automatic process for it?

My ideal case would be to not require -u to be specified at all, but simply have a persistent storage that can be written to by my non-root user in the container. :)

As an aside, why does the bind mount not translate the uid/gid from the host userns to the container userns?

@neersighted
Member

neersighted commented Jul 10, 2023

The volume should be created with ownership corresponding to USER; the backing directory in the host mountns/userns will be immaterial, as your subuid/subgid map will give you permissions to create those files with the high UID/GID anywhere your user running dockerd can write (i.e. {data-root}/volumes).

If you're okay with running with root inside the container and rootless outside the container, you could rather straightforwardly give your outer (dockerd) user a high UID and use it as the base of the subuid/subgid maps, such that root (0:0) inside the container is your user's UID/GID outside the container.

With regard to your last point, Linux Containers are a technology built on top of primitives provided by the Linux kernel. What you are asking after is ID-mapped mounts, a feature recently merged (https://lwn.net/Articles/837566/; https://lwn.net/Articles/896255/) and not yet supported widely in the ecosystem (opencontainers/runc#2821). I suggest you add your 👍 (but not a "me too" comment) and watch related issues.

@jorgenpt
Author

Thank you for responding & clarifying everything for me, I really appreciate it. It feels pretty unintuitive when I'm new to Docker, but at least I understand why and what a more robust path is. :)

@neersighted
Member

I appreciate your understanding! Moby (Docker) is not a monolith, but in fact is part of an ecosystem and stands on the shoulders of giants. While we do not have full control over every aspect of the technology in this repository, building on standard abstractions provided by the Linux kernel is part of the magic, and is indeed what allows us to work "everywhere" (most modern systems, including embedded), in a consistent and reliable fashion that has enabled the widespread adoption of containers.

Likewise, the increasing modularization and re-use of components/code, while confusing to those who are not yet used to working in/around the ecosystem, has enabled the diverse set of tooling, opinions, and higher level constructs that have grown up in the last decade, including Podman, Kubernetes, and alternate managed services like ECS, in competition with or as alternatives to solutions provided here, like the 'heavy' daemon or Docker Swarm.

I'm going to close this issue for now, but if you have further thoughts, please feel free to continue the discussion here for the benefit of those who might discover this later.

@neersighted closed this as not planned Jul 11, 2023
@ajschmidt8

@neersighted, I appreciate all of your input here. I'm experiencing a related issue and hoping you can help. I'm trying to get your suggestion below working:

If you're okay with running with root inside the container and rootless outside the container, you could rather straightforwardly give your outer (dockerd) user a high UID and use it as the base of the subuid/subgid maps, such that root (0:0) inside the container is your user's UID/GID outside the container.

I'm using Docker-in-Docker.

Control

My control test is running these two commands:

docker run --name=dind -e DOCKER_HOST="unix:///run/user/1000/docker.sock" --rm -it --privileged docker:dind-rootless
docker exec -it dind sh -c 'mkdir -p ~/xdir && ls -ld ~/xdir && docker run --user $(id -u):$(id -g) -it --rm -v ~/xdir:/tmp/test alpine:latest stat -c "%u" /tmp/test'

This results in the following output, which matches what was reported in the original comment:

drwxr-sr-x    2 rootless rootless      4096 Oct 18 21:00 /home/rootless/xdir
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
96526aa774ef: Pull complete 
Digest: sha256:eece025e432126ce23f223450a0326fbebde39cdf496a85d8c016293fc851978
Status: Downloaded newer image for alpine:latest
0

Attempt 1

The first attempt I made was to try building the official dind-rootless image (this Dockerfile) with the patch below to match the dockerd uid with the base of the subuid/subgid maps:

diff --git a/24/dind-rootless/Dockerfile b/24/dind-rootless/Dockerfile
index 766214d..a1237fb 100644
--- a/24/dind-rootless/Dockerfile
+++ b/24/dind-rootless/Dockerfile
@@ -15,7 +15,7 @@ RUN mkdir /run/user && chmod 1777 /run/user
 
 # create a default user preconfigured for running rootless dockerd
 RUN set -eux; \
-	adduser -h /home/rootless -g 'Rootless' -D -u 1000 rootless; \
+	adduser -h /home/rootless -g 'Rootless' -D -u 100000 rootless; \
 	echo 'rootless:100000:65536' >> /etc/subuid; \
 	echo 'rootless:100000:65536' >> /etc/subgid

The image builds successfully, but I get the following error message when I try to run it:

docker run --rm -it --privileged rootless-test
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
Device "ip_tables" does not exist.
ip_tables              36864  2 iptable_nat,iptable_filter
x_tables               65536  9 iptable_nat,iptable_filter,xt_nat,xt_tcpudp,xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat,ip_tables
modprobe: can't change directory to '/lib/modules': No such file or directory
[rootlesskit:parent] error: failed to setup UID/GID map: newuidmap 57 [0 100001 1 1 100000 65536] failed: newuidmap: write to uid_map failed: Invalid argument
: exit status 1

Attempt 2

The next thing I tried was extending the image to alter the subuid/subgid maps like this:

FROM docker:dind-rootless

# the subid file looks like this:
# dockremap:165536:65536
# rootless:1000:65536

COPY subid /etc/subuid
COPY subid /etc/subgid

Again, the image builds successfully, but I get a similar error message when I try to run it:

docker run --rm -it --privileged rootless-test2
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
Device "ip_tables" does not exist.
ip_tables              36864  2 iptable_nat,iptable_filter
x_tables               65536  9 iptable_nat,iptable_filter,xt_nat,xt_tcpudp,xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat,ip_tables
modprobe: can't change directory to '/lib/modules': No such file or directory
[rootlesskit:parent] error: failed to setup UID/GID map: newuidmap 58 [0 1000 1 1 1000 65536] failed: newuidmap: write to uid_map failed: Invalid argument

Do you have any thoughts? Is what I'm trying to achieve here possible? Appreciate all the context you've provided so far.

@neersighted
Member

Adjusting the maps inside the container is not enough; the dind container's UID mapping is bounded by the subuid/subgid of the root namespace (the "host").

You might want to review https://man7.org/linux/man-pages/man7/user_namespaces.7.html to understand which errors are returned when and what the semantics for uidmapping are.
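One rule from user_namespaces(7) can be checked against the two failing maps: the ID ranges on different lines of a map must not overlap, and a write that breaks this fails with EINVAL ("Invalid argument"). A minimal sketch, assuming the newuidmap arguments in the logs above are "inside outside count" triples. `map_has_host_overlap` is a hypothetical helper that tests only the host side of that one rule, so it may not be the only reason these writes were rejected.

```sh
# Succeed (exit status 0) if any two extents of a UID map overlap on the
# host side. Arguments are "inside outside count" triples.
map_has_host_overlap() {
    printf '%s %s %s\n' "$@" |
        sort -n -k2,2 |
        awk 'NR == 1 { end = $2 + $3 - 1; next }
             $2 <= end { overlap = 1 }
             $2 + $3 - 1 > end { end = $2 + $3 - 1 }
             END { exit (overlap ? 0 : 1) }'
}

# Map from Attempt 1: host UID 100001 also falls inside 100000..165535.
if map_has_host_overlap 0 100001 1 1 100000 65536; then
    echo "attempt 1: host ranges overlap"
fi

# Map from Attempt 2: host UID 1000 is both the single UID and the range start.
if map_has_host_overlap 0 1000 1 1 1000 65536; then
    echo "attempt 2: host ranges overlap"
fi
```

Both maps from the logs trip the check, whereas a layout like `0 1001 1 1 165536 65536`, where the user's own UID sits outside its subuid range, does not.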

@devnoname120

@neersighted opencontainers/runc#3717 is merged and is now available in 1.2.0-rc.1. Any chance you could revisit implementing this feature?
