[1.10] --cap-add=SYS_ADMIN change of behavior? #20082

Closed
beetree opened this issue Feb 6, 2016 · 14 comments · Fixed by #20245
Labels
area/security/seccomp priority/P3 Best effort: those are nice to have / minor issues.

@beetree

beetree commented Feb 6, 2016

Hey,

I'm running an image with systemd. I pass --cap-add=SYS_ADMIN --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro to docker run. This worked without problems on 1.9.1, but after updating to 1.10 it throws the following error when I try to use systemctl:

Error getting authority: Error initializing authority: Could not connect: No such file or directory (g-io-error-quark, 1)
Failed to connect to bus: No such file or directory

Passing --privileged instead of --cap-add=SYS_ADMIN solves the problem in 1.10.

Here's the base info on the 1.10 that throws the errors:

# docker info
Containers: 202
 Running: 146
 Paused: 0
 Stopped: 56
Images: 181
Server Version: 1.10.0
Storage Driver: devicemapper
 Pool Name: docker-9:2-4982975-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 32.21 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 85.32 GB
 Data Space Total: 214.7 GB
 Data Space Available: 129.4 GB
 Metadata Space Used: 73.48 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.074 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.99 (2015-06-20)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.19.8-031908-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 125.9 GiB
Name: Ubuntu-1510-wily-64-minimal
ID: QX4T:NA5A:DJDM:LTE4:7KV3:SF2D:I4GW:6HTE:BEZT:DPT4:K4ZH:GYYH
Debug mode (server): true
 File Descriptors: 892
 Goroutines: 1208
 System Time: 2016-02-07T00:03:34.918834404+01:00
 EventsListeners: 0
 Init SHA1: 0fab8563cbfa5ba7c182919f38b1fac541d116d0
 Init Path: /usr/lib/docker/dockerinit
 Docker Root Dir: /var/lib/docker
WARNING: No swap limit support

# docker version
Client:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 18:41:30 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 18:41:30 2016
 OS/Arch:      linux/amd64

# uname -a
Linux Ubuntu-1510-wily-64-minimal 3.19.8-031908-generic #201505110938 SMP Mon May 11 13:39:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The systemd image Dockerfile is:

FROM ubuntu:16.04

ENTRYPOINT ["/lib/systemd/systemd"]

Here's a full session to recreate the issue (in 1.10):

root@Ubuntu-1510-wily-64-minimal ~/test # cat > Dockerfile
FROM ubuntu:16.04

ENTRYPOINT ["/lib/systemd/systemd"]
root@Ubuntu-1510-wily-64-minimal ~/test # docker build -t testimage .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
16.04: Pulling from library/ubuntu
8a2df099fc1a: Already exists
09aa8e119200: Already exists
21a4b8922479: Already exists
a3ed95caeb02: Already exists
Digest: sha256:c6e64f3be4e674287d36998e3f087c077ebc97c7ff4f335ea33f50240e091ee5
Status: Downloaded newer image for ubuntu:16.04
 ---> 71aa5f3f90dc
Step 2 : ENTRYPOINT /lib/systemd/systemd
 ---> Running in fcba35eff7e4
 ---> 1c988734e844
Removing intermediate container fcba35eff7e4
Successfully built 1c988734e844
root@Ubuntu-1510-wily-64-minimal ~/test # docker run -d --cap-add=SYS_ADMIN --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro testimage
de3c3e2f082b1b1d01503a5192e40478bcf2a74290f3783434fed61507550a70
root@Ubuntu-1510-wily-64-minimal ~/test # docker exec -it de3c3e2f082b1b1d01503a5192e40478bcf2a74290f3783434fed61507550a70 /bin/bash
root@de3c3e2f082b:/# systemctl
Failed to connect to bus: No such file or directory

And here is the same with --privileged:

root@Ubuntu-1510-wily-64-minimal ~/test # docker run -d --privileged --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro testimage
1d53e1c9bc24e8432454b92d431d6c7282c67d3b67d39225ee9c1f661047b677
root@Ubuntu-1510-wily-64-minimal ~/test # docker exec -it 1d53e1c9bc24e8432454b92d431d6c7282c67d3b67d39225ee9c1f661047b677 /bin/bash
root@1d53e1c9bc24:/# systemctl
UNIT                                                                                                     LOAD   ACTIVE     SUB       DESCRIPTION
proc-sys-fs-binfmt_misc.automount                                                                        loaded active     waiting   Arbitrary Executable File Formats File System Automount Point
...

And the same with --cap-add=SYS_ADMIN on 1.9.1:

root@ubuntu:~/test# docker run -d --cap-add=SYS_ADMIN --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro testimage
9dda9ff848660d6d7e0eacae26ff1c6a6555265019b1a4482e1523940ee1f056
root@ubuntu:~/test# docker exec -it 9dda9ff848660d6d7e0eacae26ff1c6a6555265019b1a4482e1523940ee1f056 /bin/bash
root@9dda9ff84866:/# systemctl
UNIT                              LOAD   ACTIVE SUB       DESCRIPTION
-.mount                           loaded active mounted   /
dev-hugepages.mount               loaded active mounted   Huge Pages File System
dev-mqueue.mount                  loaded active mounted   POSIX Message Queue File System
...

I realize that this may not be a bug in Docker, and I realize that running systemd in a docker container isn't really the ideal usage of containers. That said, I would really appreciate any help!

EDIT:
Adding full background information on the 1.9.1 setup:

root@ubuntu:~/test# docker info
Containers: 51
Images: 502
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-8:1-1053028-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 2.147 GB
 Backing Filesystem:
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 16.02 GB
 Data Space Total: 214.7 GB
 Data Space Available: 21.73 GB
 Metadata Space Used: 26.9 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.121 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.90 (2014-09-01)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.8-031908-generic
Operating System: Ubuntu 15.04
CPUs: 24
Total Memory: 7.795 GiB
Name: ubuntu
ID: O6JD:MGK4:2SWN:D2TC:SIRV:53OE:IKCM:2C37:YZDF:2XOO:HQBF:UXZY
Username: eleet
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
root@ubuntu:~/test# docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:16:54 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:16:54 UTC 2015
 OS/Arch:      linux/amd64
root@ubuntu:~/test# uname -a
Linux ubuntu 3.19.8-031908-generic #201505110938 SMP Mon May 11 13:39:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

EDIT2:
Same issue using ubuntu 15.10 as base image (on docker 1.10 with SYS_ADMIN).

/beetree

@cpuguy83
Member

cpuguy83 commented Feb 7, 2016

Docker 1.10 includes a default seccomp profile which is likely blocking the syscalls you are wanting to make.
What you'll have to do is find out which syscalls it is making and generate your own custom seccomp profile to allow them, or you can pass --security-opt seccomp:unconfined.
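
A rough sketch of the first option, under stated assumptions: the thread does not say how to find the syscalls, so this assumes the workload was traced with strace and the output saved as text. It also assumes the one-rule-per-syscall profile layout (`name`, `action`, `args`) that shipped with Docker 1.10; later releases changed the layout. The function names are hypothetical.

```python
import re

# Matches plain strace lines such as `mount("proc", ...) = 0`, optionally
# prefixed by a PID (`1234 mount(...)`) or `[pid 1234]` as printed by strace -f.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_in_trace(trace_text):
    """Collect the syscall names that appear at the start of trace lines."""
    names = set()
    for line in trace_text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_from_profile(trace_text, profile):
    """Return traced syscalls that have no SCMP_ACT_ALLOW rule in the profile.

    Assumes the 1.10-era layout where each rule carries a single "name".
    """
    allowed = {
        rule["name"]
        for rule in profile.get("syscalls", [])
        if rule.get("action") == "SCMP_ACT_ALLOW" and "name" in rule
    }
    return sorted(syscalls_in_trace(trace_text) - allowed)
```

Signal lines (`--- SIGCHLD ... ---`) and resumed calls (`<... mount resumed>`) are ignored on purpose; the unfinished half of a split call still contributes its name.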

@thaJeztah
Member

Perhaps we should document this; it is a change in behavior

@thaJeztah
Member

Ping @jfrazelle: I'll assign this one to you to decide whether the default profile or the documentation needs changes.

@jessfraz
Contributor

jessfraz commented Feb 8, 2016

This is behaving as expected


@thaJeztah
Member

@jfrazelle if it's indeed a change in behavior due to the seccomp profiles, I think we should add a note to the docs, explaining that besides adding capabilities, they may have to pass a custom seccomp profile?

@jessfraz
Contributor

jessfraz commented Feb 8, 2016

Yeah, agreed, will do.


@icecrime icecrime added the priority/P2 Normal priority: default priority applied. label Feb 8, 2016
@tiborvass tiborvass added priority/P3 Best effort: those are nice to have / minor issues. area/security/seccomp and removed priority/P2 Normal priority: default priority applied. labels Feb 10, 2016
@jessfraz
Contributor

The docs are updated in #20245.

@beetree
Author

beetree commented Feb 14, 2016

Confirmed working.

To wrap up, anyone previously using --cap-add=SYS_ADMIN --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro to run systemd in a container should now be using --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --cap-add=SYS_ADMIN --security-opt=seccomp:unconfined to achieve the same thing.

Unclear to me if this has any serious security implications, but my guess is that it is as secure as in 1.9.1.

/b3

@jessfraz
Contributor

It's the same as 1.9.1, but it would be better to make a custom seccomp profile
with mount added, based on our default profile.
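
A minimal sketch of that suggestion, assuming the default profile is available as JSON in the Docker 1.10 layout (a `defaultAction` plus a `syscalls` list of `name`/`action`/`args` rules). The helper `allow_syscalls` is hypothetical, the tiny profile below is only a stand-in for the real default, and systemd may well need more than `mount`.

```python
import copy
import json


def allow_syscalls(profile, names):
    """Return a copy of `profile` with an allow rule for each syscall in `names`.

    Assumes the 1.10-era layout where each rule carries a single "name".
    Syscalls that already have an allow rule are left alone.
    """
    updated = copy.deepcopy(profile)
    rules = updated.setdefault("syscalls", [])
    already_allowed = {
        rule.get("name") for rule in rules if rule.get("action") == "SCMP_ACT_ALLOW"
    }
    for name in names:
        if name not in already_allowed:
            rules.append({"name": name, "action": "SCMP_ACT_ALLOW", "args": []})
            already_allowed.add(name)
    return updated


if __name__ == "__main__":
    # Stand-in for the real default profile, which is much longer.
    default_profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"name": "accept", "action": "SCMP_ACT_ALLOW", "args": []}],
    }
    custom = allow_syscalls(default_profile, ["mount"])
    print(json.dumps(custom, indent=2))
```

The resulting JSON could be saved to a file and passed with `--security-opt seccomp:/path/to/profile.json` in place of `seccomp:unconfined`; the path here is a placeholder.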


@rhatdan
Contributor

rhatdan commented Feb 24, 2016

With SELinux, I tried to avoid controlling capability checks and to leave that to capabilities. I am not sure whether this is possible with seccomp. Do we have any idea which seccomp rule was blocking this access?

Turning off all of seccomp for any use of capabilities is a usability problem.

@eparis
Contributor

eparis commented Feb 24, 2016

Is the problem the capability, or the fact that systemd inside the container is trying to mount virtual filesystems and seccomp is blocking it? I'm guessing the capability is (largely) unrelated. If you need to call mount inside the container, the only option is to have seccomp allow the mount syscall.

@rhatdan
Contributor

rhatdan commented Feb 25, 2016

Right, I don't think the mount syscall should be blocked by default if it is blocked by a different security mechanism. I just think that if we have multiple security mechanisms blocking access, users are not going to easily figure out why, and they will start killing all security (--privileged).

Since we don't have a friendly EPERM, there is no easy way to know whether mount is blocked by SELinux, capabilities, or seccomp.
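
As a partial aid for that guessing game, here is a sketch that reads two of the three mechanisms from `/proc/<pid>/status`: whether CAP_SYS_ADMIN (bit 21) is set in `CapEff`, and which seccomp mode the `Seccomp` field reports. It says nothing about SELinux, and it cannot tell which mechanism denied a particular call. The function names are hypothetical.

```python
import os

CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in the capability masks
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text):
    """Parse the `Key:\\tvalue` lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_confinement(status_text):
    """Report CAP_SYS_ADMIN and the seccomp mode found in a status dump."""
    fields = parse_status(status_text)
    cap_eff = int(fields.get("CapEff", "0"), 16)
    raw_mode = fields.get("Seccomp")
    if raw_mode is None:
        seccomp = "not reported"  # the kernel did not expose the field
    else:
        seccomp = SECCOMP_MODES.get(int(raw_mode), "unknown")
    return {
        "has_sys_admin": bool((cap_eff >> CAP_SYS_ADMIN) & 1),
        "seccomp": seccomp,
    }


if __name__ == "__main__":
    path = "/proc/self/status"
    if os.path.exists(path):
        with open(path) as handle:
            print(describe_confinement(handle.read()))
```

Run inside the container, a result showing `has_sys_admin` as true together with seccomp mode `filter` would point toward the seccomp profile rather than the capability set as the thing to check next.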

@kaorukobo

@beetree I tried the following options on Docker 1.12.3, but systemctl still says "Failed to connect to bus...":

To wrap up,.... systemd in a container should now be using --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --cap-add=SYS_ADMIN --security-opt=seccomp:unconfined to achieve the same thing.

@rhatdan
Contributor

rhatdan commented Mar 3, 2017

@beetree Did you mount tmpfs on /run?

docker run -ti --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro INITCONTAINER

That should work. (You must have container=oci or something similar in the environment.)

Or just use projectatomic/docker with oci-systemd-hook and it will work with no options required.
