Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELinux options user/role/type break pod-wide SELinux level #41370

Open
jean-edouard opened this issue Aug 18, 2020 · 2 comments
Open

SELinux options user/role/type break pod-wide SELinux level #41370

jean-edouard opened this issue Aug 18, 2020 · 2 comments

Comments

@jean-edouard
Copy link

Description
This bug was noticed in k8s environments, but should also apply to anything that has a notion of pods.
A k8s issue was filed there:
kubernetes/kubernetes#90759
It was recently fixed in cri-o:
cri-o/cri-o#4071
In moby, I suspect the issue is in or around generateSecurityOpt()

More information can be found under the k8s issue, but here's a short summary:
On an SELinux-enabled setup, when creating a basic non-privileged pod (sandbox in moby?) with multiple containers and no SELinux option,
2 random categories X and Y are calculated, and all processes of all containers run as the same system_u:system_r:container_t:s0:cX,cY.
That's all good.
Now if an SELinux option is set on the pod, one would expect the same behavior as above, except with that option overridden.
For example, if type is set to my_custom_container_t, all containers should run as system_u:system_r:my_custom_container_t:s0:cX,cY.
And it almost works, except that every container now gets a different category pair! (level)

Steps to reproduce the issue:

  1. Create a pod/sandbox of containers without any SELinux options, notice all containers get the same SELinux level (s0:cX,cY)
  2. Create a pod/sandbox of containers with an SELinux option set, like type: "container_t", notice all containers get a different SELinux level

Describe the results you received:
Within the same pod, every container gets a different set of categories

Describe the results you expected:
All containers of a given pod should have the same categories

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:46:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:53 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 43
  Running: 40
  Paused: 0
  Stopped: 3
 Images: 36
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
  selinux
 Kernel Version: 5.7.11-100.fc31.x86_64
 Operating System: Fedora 31 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.51GiB
 Name: xxx
 ID: NE7A:HDUJ:42LA:NFW4:LFZT:Y5T3:DSOW:R4QE:HB5B:2OIP:XPO2:A752
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: xxx
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  172.30.0.0/16
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
Bare-metal

@thaJeztah
Copy link
Member

Hm, right, so the design for --security-opt label=xxx looks to be that when a user provided a custom label, that label is used verbatim, and no further processing is done;

moby/daemon/create.go

Lines 239 to 245 in 5c10ea6

for _, opt := range hostConfig.SecurityOpt {
con := strings.Split(opt, "=")
if con[0] == "label" {
// Caller overrode SecurityOpts
return nil, nil
}
}

At a glance, my concern would be that changing that behaviour would be a breaking change 🤔

That said, I'm not sure what is set as label in the kubernetes/pod case, for example, in the docker run case, the label is copied verbatim from what I provided:

docker create --name=test --security-opt label=foobar hello-world

docker inspect --format='{{json .HostConfig.SecurityOpt}}' test
["label=foobar"]

What does your situation show? Does the container's configuration have the "faulty" label set? (if so, it could be that the problem is higher up)

@jean-edouard
Copy link
Author

Hm, right, so the design for --security-opt label=xxx looks to be that when a user provided a custom label, that label is used verbatim, and no further processing is done;

That is true for this function. However, at a later time I believe, the missing label bits will be automatically added.
For example, a container started with --security-opt "label=type:custom_container_t" will still get a user, role and level added and its final label will look like:
system_u:system_r:custom_container_t:c1,c2

Forcing a type like that should not impact the other options, and it effectively doesn't impact user and role, but does make categories container-specific, which doesn't make much sense...

moby/daemon/create.go

Lines 239 to 245 in 5c10ea6

for _, opt := range hostConfig.SecurityOpt {
con := strings.Split(opt, "=")
if con[0] == "label" {
// Caller overrode SecurityOpts
return nil, nil
}
}

At a glance, my concern would be that changing that behaviour would be a breaking change

Could you please be more specific on what would break?

That said, I'm not sure what is set as label in the kubernetes/pod case, for example, in the docker run case, the label is copied verbatim from what I provided:

docker create --name=test --security-opt label=foobar hello-world

docker inspect --format='{{json .HostConfig.SecurityOpt}}' test
["label=foobar"]

Seem that "label=foobar" is not syntactically valid (https://docs.docker.com/engine/reference/run/#security-configuration)
I believe user/role/type/level have to be explicitly specified.
However, if it is, this should indeed be the final verbatim label of the container (i.e. ProcessLabel=foobar).

What does your situation show? Does the container's configuration have the "faulty" label set? (if so, it could be that the problem is higher up)

Creating a k8s pod with 2 containers shows 3 lines in docker ps. I assume one of them is the sandbox.

If I start a pod with no SELinux option, the docker inspect command above returns:

docker inspect --format='{{json .HostConfig.SecurityOpt}}' 52209dfb9a4a
["seccomp=unconfined"]

docker inspect --format='{{json .HostConfig.SecurityOpt}}' 70bd872231db
["seccomp=unconfined","label=user:system_u","label=role:system_r","label=type:container_t","label=level:s0:c494,c667"]

docker inspect --format='{{json .HostConfig.SecurityOpt}}' 9886b796e542
["seccomp=unconfined","label=user:system_u","label=role:system_r","label=type:container_t","label=level:s0:c494,c667"]

If I start a pod with "user" forced to "system_u" (the default value), I get:

docker inspect --format='{{json .HostConfig.SecurityOpt}}' 4b9af4f8bcfd
["label=user:system_u","seccomp=unconfined"]

docker inspect --format='{{json .HostConfig.SecurityOpt}}' fc156f05b021
["label=user:system_u","seccomp=unconfined"]

docker inspect --format='{{json .HostConfig.SecurityOpt}}' 6af446c0ea81
["label=user:system_u","seccomp=unconfined"]

However, the ProcessLabel (which was the same for all containers in the previous example) ends up being in this case:

docker inspect --format='{{json .ProcessLabel}}' 4b9af4f8bcfd
"system_u:system_r:container_t:s0:c413,c531"

docker inspect --format='{{json .ProcessLabel}}' fc156f05b021
"system_u:system_r:container_t:s0:c141,c1004"

docker inspect --format='{{json .ProcessLabel}}' 6af446c0ea81
"system_u:system_r:container_t:s0:c286,c362"

rhrazdil pushed a commit to rhrazdil/kubevirtci that referenced this issue Feb 9, 2021
Intent of this change is to switch from using docker container engine to crio.
The PR keeps the doors open for switching back to docker or any other CE if needed.

There are two reasons to switch to crio:
docker deprecation in k8s
docker issue with selinux categories moby/moby#41370

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
rhrazdil pushed a commit to rhrazdil/kubevirtci that referenced this issue Feb 9, 2021
Intent of this change is to switch from using docker container engine to crio.

There are two reasons to switch to crio:
docker deprecation in k8s
docker issue with selinux categories moby/moby#41370

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
rhrazdil pushed a commit to rhrazdil/kubevirtci that referenced this issue Feb 25, 2021
Intent of this change is to switch from using docker container engine to crio.

There are two reasons to switch to crio:
docker deprecation in k8s
docker issue with selinux categories moby/moby#41370

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
kubevirt-bot pushed a commit to kubevirt/kubevirtci that referenced this issue Feb 25, 2021
* Move k8s 1.20 to crio CE

Intent of this change is to switch from using docker container engine to crio.

There are two reasons to switch to crio:
docker deprecation in k8s
docker issue with selinux categories moby/moby#41370

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>

* Use custom repository mirror

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>

* Symlink docker to podman

Since docker is used outside kubevirtci, we still need to support it.
crictl however doesn't provide the same functiontality as docker.

In this commit, I'm linking docker to podman instead, which is designed
to have the same API.

Signed-off-by: Radim Hrazdil <rhrazdil@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 19, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
That workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 19, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 19, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 19, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 22, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 22, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Aug 31, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Sep 1, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
jean-edouard added a commit to jean-edouard/kubevirt that referenced this issue Sep 2, 2022
A bug in docker (moby/moby#41370) forces
us to implement a workaround in the SELinux options of virt-launcher.
This workaround (disabling categories on non-compute containers)
weakens the security of the project for all when it is only needed
for a very specific use-case: deployments on clusters
that have docker nodes with SELinux enforced on the system and
enabled in the docker configuration (it is disabled by default).

That bug was filed more than 2 years ago, so it is time to disable
the workaround by default, with a feature gate for those who need it.

Signed-off-by: Jed Lejosne <jed@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants