
Document kubeadm usage with SELinux #279

Open
luxas opened this issue May 29, 2017 · 75 comments


@luxas (Member) commented May 29, 2017

Is this a BUG REPORT or FEATURE REQUEST?

COMMUNITY REQUEST

Versions

All

We need e2e tests that ensure kubeadm works with SELinux on CentOS/Fedora (#215) and CoreOS (#269)

We might be able to add a job for it on kubernetes-anywhere, @pipejakob?

IIUC kubeadm is broken with SELinux enabled right now. The problem is that we don't have anyone (AFAIK) very experienced with SELinux on the kubeadm team (at least nobody has had time to look into it yet).

AFAIK, the problem is often when mounting hostPath volumes...

To get closer to production readiness, we should fix this and add a testing suite for it.
We should also work with CNI network providers to make sure they adopt the right SELinux policies as well.

Anyone want to take ownership here? I'm not very experienced with SELinux, so I'm probably gonna focus on other things.

@dgoodwin @aaronlevy @coeki @rhatdan @philips @bboreham @mikedanese @pipejakob

@luxas (Member Author) commented May 29, 2017

@pipejakob I added the kind/postmortem label as it's in the same theme, we broke SELinux users again without noticing it...

@rhatdan commented May 30, 2017

I don't work with kubeadm but would be very willing to help whoever takes this on.

@luxas (Member Author) commented May 30, 2017

@rhatdan Great! What I'm looking for is people who are familiar with SELinux and willing to help.
I might be able to coordinate the work though.

A rough todo list would look like:

  • Make kubeadm work with SELinux enabled in v1.7
  • Make an e2e suite of CentOS/Fedora nodes that will notify us if there is a regression.
  • Look into the CoreOS issue and how the SELinux setup between CentOS and CoreOS differs.

@rhatdan Let's first try and get it working in v1.7, can be done in #215

@roberthbailey (Member) commented May 30, 2017

@coeki commented May 31, 2017

I will take it for now, since I raised it. I'll have some updates soon. @rhatdan, please advise me ;)

@timothysc (Member) commented Jul 11, 2017

@luxas , @jasonbrooks - does this still exist in fedora?

I think folks have patched policies on other channels.

/cc @eparis

@jasonbrooks commented Jul 11, 2017

@timothysc I haven't tried w/ 1.7 yet, but w/ 1.6, CentOS worked w/ selinux but Fedora 25 didn't. I'll test w/ 1.7

@jasonbrooks commented Jul 11, 2017

For reference, I just ran kubeadm 1.7 on f26 in permissive mode, and these are the denials I got:

[root@fedora-1 ~]# ausearch -m avc -ts recent
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:321): avc:  denied  { read } for  pid=2885 comm="kube-apiserver" name="apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:03:50 2017
type=AVC msg=audit(1499792630.959:322): avc:  denied  { open } for  pid=2885 comm="kube-apiserver" path="/etc/kubernetes/pki/apiserver.crt" dev="dm-0" ino=16820634 scontext=system_u:system_r:container_t:s0:c171,c581 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:331): avc:  denied  { read } for  pid=2945 comm="kube-controller" name="sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1
----
time->Tue Jul 11 13:04:18 2017
type=AVC msg=audit(1499792658.917:332): avc:  denied  { open } for  pid=2945 comm="kube-controller" path="/etc/kubernetes/pki/sa.key" dev="dm-0" ino=16820637 scontext=system_u:system_r:container_t:s0:c755,c834 tcontext=unconfined_u:object_r:cert_t:s0 tclass=file permissive=1

On CentOS 7, same thing, no denials.

@rhatdan commented Jul 11, 2017

You are volume mounting in content from the host into a container. If you want an SELinux confined process inside the container to be able to read the content, it has to have an SELinux label that the container is allowed to read.

Mounting the object with :Z or :z would fix the issue. Note either of these would allow the container to write these objects. If you want to allow the container to read without writing then you could change the content on the host to something like container_share_t.
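
As an illustration of the read-only option described here, something like the following could relabel kubeadm's certificate directory (a sketch only: PKI_DIR assumes kubeadm's default certificatesDir, and the relabel needs root on an SELinux-enabled host):

```shell
# Sketch: relabel kubeadm's cert directory so confined containers can
# read it without gaining write access. PKI_DIR is an assumption;
# adjust it for a custom certificatesDir.
PKI_DIR="${PKI_DIR:-/etc/kubernetes/pki}"

if command -v selinuxenabled >/dev/null 2>&1 && selinuxenabled; then
  # container_share_t is readable, but not writable, by container_t
  chcon -R -t container_share_t "$PKI_DIR"
else
  echo "SELinux not active; skipping relabel of $PKI_DIR"
fi
```

Note that chcon labels do not survive a filesystem relabel; `semanage fcontext -a` plus `restorecon` would make the labeling persistent.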

@luxas (Member Author) commented Jul 12, 2017

kubernetes/kubernetes#48607 will also help here as it starts mounting everything but etcd read-only...

@timothysc (Member) commented Jul 12, 2017

@luxas @jasonbrooks - someone want to tinker with adjusting the manifests ( https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ ) ?

@luxas (Member Author) commented Jul 12, 2017

@jasonbrooks commented Jul 12, 2017

@rhatdan It looks like :Z is only used if the pod provides an selinux label. In my initial tests, container_runtime_t seems to work -- would that be an appropriate label? And then, I'm assuming in a system w/o selinux, this would just be ignored?

@rhatdan commented Jul 12, 2017

Yes, it will be ignored by non-SELinux systems. Running an app as container_runtime_t basically provides no SELinux confinement, since it is supposed to be the label of container runtimes like docker and CRI-O. If you are running the kubelet as this, that is probably fairly accurate.

@jasonbrooks commented Jul 12, 2017

Right now, we're running the etcd container as spc_t -- would it be better to run that one as container_runtime_t too?

@jasonbrooks commented Jul 12, 2017

It looks like this does it:

diff --git a/cmd/kubeadm/app/master/manifests.go b/cmd/kubeadm/app/master/manifests.go
index 55fe560c46..228f935cdd 100644
--- a/cmd/kubeadm/app/master/manifests.go
+++ b/cmd/kubeadm/app/master/manifests.go
@@ -96,6 +96,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
                        LivenessProbe: componentProbe(int(cfg.API.BindPort), "/healthz", api.URISchemeHTTPS),
                        Resources:     componentResources("250m"),
                        Env:           getProxyEnvVars(),
+                        SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
                }, volumes...),
                kubeControllerManager: componentPod(api.Container{
                        Name:          kubeControllerManager,
@@ -105,6 +106,7 @@ func WriteStaticPodManifests(cfg *kubeadmapi.MasterConfiguration) error {
                        LivenessProbe: componentProbe(10252, "/healthz", api.URISchemeHTTP),
                        Resources:     componentResources("200m"),
                        Env:           getProxyEnvVars(),
+                        SecurityContext: &api.SecurityContext{SELinuxOptions: &api.SELinuxOptions{Type: "container_runtime_t",}},
                }, volumes...),
                kubeScheduler: componentPod(api.Container{
                        Name:          kubeScheduler,

Would this be something to submit as PRs to the 1.7 branch and to master, or just to master? The source moved around a bit in master; the patch above is against the 1.7 branch.

@rhatdan commented Jul 13, 2017

I would actually prefer that it run as spc_t, or as a confined domain (container_t). etcd should easily be able to be confined by SELinux.

@jasonbrooks commented Jul 13, 2017

I think spc_t should work. I tried w/ container_t and that didn't work. audit2allow says it needs:

allow container_t cert_t:file { open read };
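
For reference, that rule could be packaged as a local policy module roughly like this (a sketch: the module name kubeadm_cert_read is made up for the example, and loading it needs root on an SELinux host):

```shell
# Write the rule audit2allow reported into a type enforcement file.
cat > kubeadm_cert_read.te <<'EOF'
module kubeadm_cert_read 1.0;

require {
    type container_t;
    type cert_t;
    class file { open read };
}

allow container_t cert_t:file { open read };
EOF

# Compile and package it with the standard policy toolchain, if present:
if command -v checkmodule >/dev/null 2>&1; then
  checkmodule -M -m -o kubeadm_cert_read.mod kubeadm_cert_read.te
  semodule_package -o kubeadm_cert_read.pp -m kubeadm_cert_read.mod
  # semodule -i kubeadm_cert_read.pp   # load it (root, SELinux host)
fi
```

In practice `ausearch -m avc | audit2allow -M kubeadm_cert_read` generates and compiles such a module in one step.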

@rhatdan commented Jul 13, 2017

Could we relabel the certs directory with container_file_t or container_share_t? Then it would work.

@jasonbrooks commented Jul 13, 2017

kubeadm creates an /etc/kubernetes/pki dir when you run kubeadm init, but when you run kubeadm reset, it only empties that dir. If we created the pki dir when the rpm is installed, we could do the labeling at that point by modding the spec file.

@jasonbrooks commented Jul 13, 2017

For etcd, the container would need allow container_t container_var_lib_t:file { create lock open read unlink write }; for /var/lib/etcd on the host.
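
As an alternative to granting container_var_lib_t access, the data directory could be relabeled so a confined etcd can use it, in the spirit of the relabeling suggested above (a sketch: ETCD_DIR assumes the default /var/lib/etcd, and the commands need root on an SELinux host):

```shell
# Sketch: persistently label etcd's data dir as container_file_t,
# which confined containers may read and write.
ETCD_DIR="${ETCD_DIR:-/var/lib/etcd}"

if command -v semanage >/dev/null 2>&1; then
  semanage fcontext -a -t container_file_t "${ETCD_DIR}(/.*)?"
  restorecon -Rv "$ETCD_DIR"
else
  echo "semanage not available; would label ${ETCD_DIR}(/.*)? as container_file_t"
fi
```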

@jasonbrooks commented Jul 14, 2017

I'm trying to figure out whether it's legitimate to chcon directories in the rpm spec file -- I see many instances of it on GitHub (https://github.com/search?l=&p=1&q=chcon+extension%3Aspec) but I can't tell whether that's considered good packaging practice or not. We could either change kubeadm to run the components as spc_t (unconfined), or we could leave kubeadm alone and chcon the pki dir.

@neolit123 (Member) commented Jul 4, 2019

but I suspect most CNI plugins will be eager to accept selinux patches ;)

yes, hopefully.

The selinux context is the same as if it were running as root; however, we should be careful because I suspect that this could change in the future. Right now the only difference is it's pid 1001 instead of 0. As a result, changing these containers to run as non-root shouldn't break this fix as spc_t's privileges are a strict superset of container_t's privileges.

ok, thanks. that's good to know.

@rcythr commented Jul 5, 2019

Here's the patch I cooked up so far. Seems to generate the right manifests. rcythr/kubernetes@fb6c024

I'll create a PR once I've tested a couple of other things and after I've signed the CLA -- I currently have a support ticket in with CNCF due to a mail issue.

The three things I'm currently wanting to test:

  1. I want to look into incorporating some of the changes by @randomvariable to more tightly confine some of the containers, where possible. I believe his change will allow us to tightly confine kube-scheduler and etcd; however, because we cannot relabel /etc/ssl/certs or /etc/pki without breaking the system, we cannot confine kube-apiserver or kube-controller-manager tighter than spc_t. We either need to create a custom type for them, stop using these two system directories, or just live with the spc_t type.
  2. Looking over the history of this ticket, I noticed some chatter about issues with spc_t on CoreOS. I want to do some testing to see if that's still a problem, or what was going on there.
  3. While I believe this is only tangentially related to this PR, I want to test flannel and weave net under selinux enforcing.

@randomvariable (Member) commented Jul 5, 2019

Thanks for taking this on @rcythr .

Could be wrong, but I don't think I've had issues with CNIs once they've had the privileged flag on.

Just as a dump of stuff I've had to do, at least for Fedora 28 -- maybe these have since been incorporated into container-selinux:

Create the following modules:

Allows Cilium and Weave Scope to work

module container_bpf 1.0;

require {
  type spc_t;
  type container_runtime_t;
  class bpf { map_create prog_load prog_run map_write map_read };
}

#============= spc_t ==============
allow container_runtime_t self:bpf { map_create prog_load prog_run map_write map_read };
allow container_runtime_t self:bpf map_create;
allow container_runtime_t spc_t:bpf { map_read map_write };
allow spc_t container_runtime_t:bpf { map_create prog_load prog_run map_write map_read };
allow spc_t self:bpf { map_create prog_load prog_run map_write map_read };

Allow containers to load certificates

module container_cert 1.0;

require {
  type container_t;
  type cert_t;
  class file { open read };
  class dir { read };
  class lnk_file { read };
}

allow container_t cert_t:file { open read };
allow container_t cert_t:lnk_file { read };
allow container_t cert_t:dir { read };

@rosti (Member) commented Jul 5, 2019

The thing that concerns me the most is the testing situation here. Currently we have no way to automatically perform a smoke test of SELinux support with kubeadm and k8s as a whole. kind/kinder base their node images on Ubuntu 19.04, which, of course, does not have SELinux enabled by default.
We need the ability to spin up a testing cluster with kinder on a Fedora/CentOS base node image with SELinux enabled. Once we can do that, we can maintain a working state of SELinux support through the k8s test-grid.

If we don't handle the testing situation properly, we might end up in a "works today, but might not work tomorrow" situation and angry users over claimed SELinux support in our documentation.

@randomvariable (Member) commented Jul 5, 2019

100% agreed.

I'm tempted to say we hold off until CentOS 8 goes GA so we have a consistent baseline wrt the kernel version across distros.

@neolit123 (Member) commented Jul 5, 2019

If we don't handle the testing situation properly, we might end up in a "works today, but might not work tomorrow" situation and angry users over claimed SELinux support in our documentation.

if selinux is something that strictly requires e2e tests, we cannot support it today.

problem is that the ecosystem has so many distros and flavors that we cannot test them all.
if we don't want to maintain code for selinux in kubeadm, we can still have a guide in our setup docs with a "may not work" disclaimer; this was my initial proposal.

@randomvariable (Member) commented Jul 5, 2019

If we sort out #1379, this would take us a long way towards enabling users who want to have a stricter SELinux setup.

I will gather instructions on how you can do SELinux in an unsupported fashion today - either as a blogpost or a doc that can go on docs.k8s.io.

Additionally, @TheFoxAtWork did raise SELinux at CNCF SIG Security this week, so wondering if there's broader interest in getting this working.

@randomvariable (Member) commented Jul 5, 2019

problem is that the ecosystem has so many distros and flavors that we cannot test them all.

One thing that helps here is that the container-selinux package is shared across all of the distros, so it may be OK to add only one. My suggestion is CentOS 8, because it'll be on Linux 4.18, which is slightly ahead of Ubuntu 18.04 but not by a margin large enough to start exhibiting other bugs. Amazon Linux 2 is also an option if we're testing on it elsewhere.

@neolit123 (Member) commented Jul 5, 2019

problem is that the ecosystem has so many distros and flavors that we cannot test them all.

One thing that helps here is that the container-selinux package is shared across all of the distros, so it may be OK to add only one. My suggestion is CentOS 8, because it'll be on Linux 4.18, which is slightly ahead of Ubuntu 18.04 but not by a margin large enough to start exhibiting other bugs. Amazon Linux 2 is also an option if we're testing on it elsewhere.

i guess my point was more about the fact that selinux is not something that is really supported on Ubuntu and AppArmor is the alternative for it.

https://security.stackexchange.com/a/141716

Now practically SElinux works better with Fedora and RHEL as it comes preshipped while AA works better on Ubuntu and SUSE which means it would be better to learn how to use SElinux on the former distros than going through the hassel of making AA work on them and vice versa.

this is the distro flavor mess that i don't want to get kubeadm into.

the kubeadm survey told us that 65% of our users use Ubuntu, so technically we should be prioritizing apparmor. has anyone tried kubeadm with apparmor?
xref https://kubernetes.io/docs/tutorials/clusters/apparmor/

If we sort out #1379, this would take us a long way towards enabling users who want to have a stricter SELinux setup.

yes, it feels to me we should just document some basic details and punt the rest to #1379.
but i'm also seeing demand for static pod configuration enhancements in the next v1betaX, because right now we cannot persist securitycontext modifications after upgrade.

@randomvariable (Member) commented Jul 5, 2019

has anyone tried kubeadm with apparmor?

AppArmor is much more lightweight than SELinux and has a different security model. It's pretty much on by default these days for Docker on Ubuntu.

this is the distro flavor mess that i don't want to get kubeadm into.

Given AppArmor can't be used on CentOS and equivalents (other than AL2 which supports both), we're already there in saying some percentage of users can't make use of a Linux Security Module in a supported fashion.

@rcythr commented Jul 5, 2019

this is the distro flavor mess that i don't want to get kubeadm into.

I definitely understand the desire not to get into the mess of distros and options -- it's a combinatorial explosion of test configurations. Personally, I believe that if kubeadm were at least compatible with selinux it would have a larger share of non-Ubuntu users, but I have no proof of that beyond the fact that I'm one of those people. However, if the only distro/cri combination that's tested is ubuntu with docker, then that's really the only supported distro/cri.

If you don't want to support other configurations that's your choice, but at least be clear about that in the documentation and close this issue now. Telling centos/rhel/fedora users to disable selinux for their entire system because figuring out (and testing) the policy for an application is annoying is equivalent to telling them to disable their firewall because figuring out (and testing) the rules is annoying.

@neolit123 (Member) commented Jul 5, 2019

@randomvariable

AppArmor is much more lightweight than SELinux and has a different security model. It's pretty much on by default these days for Docker on Ubuntu.

actually, i think it's already running in the prow/kubekins image.

@rcythr

If the only distro/cri combination that's tested is ubuntu with docker, then that's really the only supported distro/cri.

currently we are testing containerd and docker on Ubuntu.

Telling centos/rhel/fedora users to disable selinux for their entire system because figuring out (and testing) the policy for an application is annoying is the equivalent to telling them to disable their firewall because figuring out (and testing) the rules is annoying.

we need help with the selinux details. we already tell "CentOS, RHEL or Fedora" users to disable selinux completely:
https://github.com/kubernetes/website/blob/master/content/en/docs/setup/production-environment/tools/kubeadm/install-kubeadm.md#installing-kubeadm-kubelet-and-kubectl

this isn't desired and i still think we should have a document or a paragraph with some guiding steps.

@randomvariable (Member) commented Jul 5, 2019

Filed containerd/cri#1195 to log the fact that there's still work to be done to complete SELinux support in ContainerD.

@yann-soubeyrand commented Jul 6, 2019

  1. I want to look into incorporating some of the changes by @randomvariable to more
    tightly confine some of the containers, where possible. I believe his change will allow us
    to tightly confine kube-scheduler and etcd; however, because we cannot relabel
    /etc/ssl/certs or /etc/pki without breaking the system, we cannot confine kube-apiserver
    or kube-controller-manager tighter than spc_t. We either need to create a custom type
    for them, stop using these two system directories, or just live with the spc_t type.

Not using system directories or creating custom types for components needing access to these system directories would allow us to further tighten the various components and avoid using spc_t type completely. This is clearly the best solution IMHO.

@rcythr commented Jul 6, 2019

  1. I want to look into incorporating some of the changes by @randomvariable to more
    tightly confine some of the containers, where possible. I believe his change will allow us
    to tightly confine kube-scheduler and etcd; however, because we cannot relabel
    /etc/ssl/certs or /etc/pki without breaking the system, we cannot confine kube-apiserver
    or kube-controller-manager tighter than spc_t. We either need to create a custom type
    for them, stop using these two system directories, or just live with the spc_t type.

Not using system directories or creating custom types for components needing access to these system directories would allow us to further tighten the various components and avoid using spc_t type completely. This is clearly the best solution IMHO.

I agree. That's why I wanted to test it out and get it working.

Based on feedback from @neolit123, I doubt we'll see built-in changes to the code to automatically handle selinux compatibility anytime soon. Instead I'm going to make a doc page that describes how to use kubeadm on systems with selinux. It'll be a few steps longer than the usual kubeadm init process, but it should help anyone who wants to use kubeadm on selinux systems immediately.

On that page I'll present three options:

  1. Disable selinux: This is the least confined option, but the most supported. It may be necessary for some CNI plugins or user workloads.
  2. Keep using the system directories /etc/pki and /etc/ssl/certs and use spc_t to avoid problems.
  3. Custom directories, chcon relabeling, and pretty tight confinement.

@qpehedm commented Jul 10, 2019

Hi, just wanted to comment that we have been running with selinux enabled for almost a year together with kubeadm and kubernetes+docker+calico on Centos7. Most workloads have no issues; the only issues I can recall were with Concourse.

However, we would like to do stricter enforcement at some point. Currently we run 'chcon -Rt container_file_t' on the directories below and make sure they are created before running kubeadm init (including /etc/kubernetes/pki/etcd):

/var/lib/etcd (etcd datadir)
/etc/kubernetes/pki (certificatesDir in kubeadm.conf)
/etc/cni/net.d
/opt/cni/bin/
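
Scripted, the preparation described above looks roughly like this (a sketch of the setup described in this comment; run as root before kubeadm init, on an SELinux-enabled host):

```shell
# Create and relabel the directories before kubeadm init so etcd's data,
# the control-plane certs, and the CNI plugin paths are usable by
# confined containers.
DIRS="/var/lib/etcd /etc/kubernetes/pki /etc/kubernetes/pki/etcd /etc/cni/net.d /opt/cni/bin"

for d in $DIRS; do
  mkdir -p "$d"
  if command -v selinuxenabled >/dev/null 2>&1 && selinuxenabled; then
    chcon -Rt container_file_t "$d"
  fi
done
```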

@randomvariable (Member) commented Jul 10, 2019

@qpehedm If you could apply a JSONPatch to the static pod manifests, and also make sure kubeadm runs restorecon for each file/directory it writes, would that be sufficient to allow you to set stricter confinement?

@qpehedm commented Jul 11, 2019

@randomvariable Yes, adding a securityContext with spc_t for the k8s static pods as suggested is likely better than our current solution. I suppose the same needs to be done for CNI plugins or other infrastructure containers. Even better would be dedicated labels for this purpose, so that they only get permissions for the specific files needed?
