Capabilities not honoured when image was built under userns #41723

Closed
EricMountain opened this issue Nov 27, 2020 · 0 comments · Fixed by #41724

Comments

@EricMountain
Contributor

Description

When images are built by a daemon running with user namespaces, file capabilities stored in extended attributes are saved in v3 format, which includes the UID of the user namespace's root user.

When such an image is run in another runtime, either without user namespacing or with a user-namespace setup that maps root to a different UID, execve(2) ignores the effective bit because the UID does not match.

For images built with the userns feature to be portable across runtimes, we need capabilities to be saved in v2 format in the image layer archives.
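The difference between the two formats can be sketched in a few lines of Python. This is only an illustration of the on-disk layout of the `security.capability` extended attribute, using the constants from the kernel's `<linux/capability.h>`; it is not the code from the fix, and the helper names `describe` and `to_v2` are hypothetical.

```python
import struct

# Constants from the Linux UAPI header <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

V2_SIZE = 20  # magic_etc + 2 x (permitted, inheritable), little-endian u32 each
V3_SIZE = 24  # the v2 layout followed by a little-endian u32 rootid


def describe(raw: bytes) -> dict:
    """Decode revision, effective flag, permitted mask and rootid (v3 only)."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    expected = {VFS_CAP_REVISION_2: V2_SIZE, VFS_CAP_REVISION_3: V3_SIZE}
    if revision not in expected or len(raw) != expected[revision]:
        raise ValueError("not a well-formed v2 or v3 capability attribute")
    perm_lo, _inh_lo, perm_hi, _inh_hi = struct.unpack_from("<4I", raw, 4)
    rootid = None
    if revision == VFS_CAP_REVISION_3:
        (rootid,) = struct.unpack_from("<I", raw, V2_SIZE)
    return {
        "revision": revision >> 24,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": perm_lo | (perm_hi << 32),
        "rootid": rootid,
    }


def to_v2(raw: bytes) -> bytes:
    """Rewrite a v3 attribute as v2 by swapping the revision and dropping rootid."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    if magic_etc & VFS_CAP_REVISION_MASK != VFS_CAP_REVISION_3:
        return raw  # already v2 (or something this sketch does not handle)
    if len(raw) != V3_SIZE:
        raise ValueError("truncated v3 capability attribute")
    new_magic = (magic_etc & ~VFS_CAP_REVISION_MASK) | VFS_CAP_REVISION_2
    return struct.pack("<I", new_magic) + raw[4:V2_SIZE]
```

On Linux, the raw bytes for a real file could be read with `os.getxattr(path, "security.capability")`. The sketch assumes that dropping the root UID is acceptable for the archive, which is what this issue asks for.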

Steps to reproduce the issue:

A reproducer can be found here.

Describe the results you received:

End of the reproducer output:

...
    default: + docker run --rm capabilities-built-with-no-userns:1.0 /bin/bash -c '(/usr/local/bin/sleep-test infinity & ); sleep 1; grep Cap /proc/$(pgrep sleep-test)/status'
    default: CapInh:	00000000a80425fb
    default: CapPrm:	0000000000000400
    default: CapEff:	0000000000000400
    default: CapBnd:	00000000a80425fb
    default: CapAmb:	0000000000000000
    default: + docker run --rm capabilities-built-with-userns:1.0 /bin/bash -c '(/usr/local/bin/sleep-test infinity & ); sleep 1; grep Cap /proc/$(pgrep sleep-test)/status'
    default: CapInh:	00000000a80425fb
    default: CapPrm:	0000000000000000
    default: CapEff:	0000000000000000
    default: CapBnd:	00000000a80425fb
    default: CapAmb:	0000000000000000

In the output above, the 2nd set of Cap* should match the first set. execve(2) has ignored the effective bit on sleep-test in the 2nd case because the root UID stored in the extended attribute does not match the runtime.
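To read the masks in the output, a small decoder such as the following may help. It is a sketch that assumes the capability numbering from `<linux/capability.h>`; the function name `decode_caps` is hypothetical.

```python
# Capability names indexed by bit number, as defined in <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask_hex: str) -> list[str]:
    """Turn a hex mask from /proc/<pid>/status into a list of capability names."""
    mask = int(mask_hex, 16)
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Fall back to a numeric label for bits newer than this table.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"CAP_{bit}")
    return names
```

Applied to the output above, the `CapPrm` and `CapEff` value `0000000000000400` from the first run decodes to `CAP_NET_BIND_SERVICE` alone, while the all-zero value from the second run decodes to an empty list.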

Describe the results you expected:

Images built in user-namespaced environments should run in non-user-namespaced environments (or with a different user-namespace owner UID) and have the effective bit for capabilities honoured by execve(2).

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:20 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

This is for the non-user-namespaced configuration. The user-namespaced configuration is identical, except that userns is added to the Security Options section:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-54-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 981.1MiB
 Name: ubuntu-focal
 ID: K6QA:QH7R:K6QB:MYYB:URWF:RO3V:E6A5:KCJ4:TSRS:5HPK:ZKAC:53UN
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

N/A

EricMountain added a commit to EricMountain/moby that referenced this issue Dec 7, 2020
Capabilities are serialised in VFS_CAP_REVISION_3 when an image is
built in a user-namespaced daemon, instead of VFS_CAP_REVISION_2.

This adds a test for this, though it's currently wired to fail if
the capabilities are serialised in VFS_CAP_REVISION_2 instead in this
situation, since this is unexpected.

Signed-off-by: Eric Mountain <eric.mountain@datadoghq.com>
EricMountain added a commit to EricMountain/moby that referenced this issue Dec 18, 2020