kubeadm doesn't configure kube-scheduler pods to mount k8s cert directories #80063

Open
vkhromov opened this issue Jul 11, 2019 · 16 comments

@vkhromov

commented Jul 11, 2019

What happened:
kube-scheduler failed to start with certificate/key files from the /etc/kubernetes/pki/ directory.

What you expected to happen:
kube-scheduler starts successfully and uses the provided certificate/key files from the /etc/kubernetes/pki/ directory.

How to reproduce it (as minimally and precisely as possible):

  • Generate /etc/kubernetes/pki/scheduler.crt and /etc/kubernetes/pki/scheduler.key files containing a certificate and key.
  • Add --tls-cert-file=/etc/kubernetes/pki/scheduler.crt and --tls-private-key-file=/etc/kubernetes/pki/scheduler.key args to the kube-scheduler pod (see the trimmed manifest below).
  • kube-scheduler fails to start, printing an error that the files above are not available.
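
For illustration, after the second step the kubeadm-generated /etc/kubernetes/manifests/kube-scheduler.yaml looks roughly like this (a trimmed sketch; image tag and defaults may differ). Only scheduler.conf is mounted, so the pki paths don't exist inside the container:

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.14.1
    command:
    - kube-scheduler
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --tls-cert-file=/etc/kubernetes/pki/scheduler.crt
    - --tls-private-key-file=/etc/kubernetes/pki/scheduler.key
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig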

Anything else we need to know?:
nope

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.1-dirty", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"dirty", BuildDate:"2019-07-01T19:30:04Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    AWS
  • OS (e.g: cat /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
  • Kernel (e.g. uname -a):
    Linux kubestage1-uswest1adevc 4.4.0-1085-aws #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    kubeadm
  • Network plugin and version (if this is a network-related bug):
    N/A
  • Others:

@kubernetes/sig-cluster-lifecycle-kubeadm

@vkhromov

Author

commented Jul 11, 2019

/sig cluster-lifecycle

@randomvariable

Member

commented Jul 12, 2019

Could you explain the use case here a bit more?

kubeadm generates a kubeconfig for kube-scheduler as /etc/kubernetes/scheduler.conf. This includes the CA authority data for the API server and kube-scheduler's client certificate and private key material.

What was the scenario in which you need a separately created TLS key-pair specifically for kube-scheduler? For kube-controller-manager, we mint an additional key pair as it acts as a CA for other services, but that is not the case for kube-scheduler.

@vkhromov

Author

commented Jul 12, 2019

@randomvariable

kubeadm generates a kubeconfig for kube-scheduler as /etc/kubernetes/scheduler.conf. This includes the CA authority data for the API server and kube-scheduler's client certificate and private key material.

Exactly.
But please note that /etc/kubernetes/scheduler.conf is for client credentials, and the --tls-cert-file and --tls-private-key-file args are for server credentials.
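
To make the distinction concrete (a trimmed sketch; the user name is approximate and the base64 blobs are elided): scheduler.conf carries the client identity the scheduler uses when talking to the API server, e.g.

users:
- name: system:kube-scheduler
  user:
    client-certificate-data: <base64 client cert>
    client-key-data: <base64 client key>

whereas --tls-cert-file/--tls-private-key-file configure the certificate the scheduler itself presents on its own secure (HTTPS) port.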

What was the scenario in which you need a separately created TLS key-pair specifically for kube-scheduler? For kube-controller-manager, we mint an additional key pair as it acts as a CA for other services, but that is not the case for kube-scheduler.

We use an external CA from which we issue very short-lived PKI credentials.
We store all PKI credentials in external files, since it's much easier to update them in such a setup.

At the moment, all control plane components besides kube-scheduler already have that k8s certificates directory mounted; kube-scheduler is the only one that doesn't. So, to use credentials from external files, we would rather mount the certificates directory the same way the rest of the control plane components do.

@neolit123

Member

commented Jul 12, 2019

But please note that /etc/kubernetes/scheduler.conf is for client credentials, and the --tls-cert-file and --tls-private-key-file args are for server credentials.

could you clarify the use case for serving the scheduler?

@vkhromov

Author

commented Jul 12, 2019

@neolit123

I'm not sure that I understand you.
kube-scheduler does listen on an address/port by default, please see
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/

--bind-address ip     Default: 0.0.0.0
    The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces).
...
--secure-port int     Default: 10259
    The port on which to serve HTTPS with authentication and authorization. If 0, don't serve HTTPS at all.

and the rest of that document.

Also,

$ sudo lsof -Pan -iTCP -sTCP:LISTEN -p $(pgrep scheduler)
COMMAND     PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
kube-sche 17275 root    3u  IPv6 254051105      0t0  TCP *:10251 (LISTEN)
kube-sche 17275 root    5u  IPv4 254051111      0t0  TCP 127.0.0.1:10259 (LISTEN)
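
(10251 here is the scheduler's insecure HTTP port serving metrics/healthz; 10259 is the secure HTTPS port that --tls-cert-file/--tls-private-key-file would apply to.)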

Besides that, we need that directory mounted for the scheduler to access the client credentials as well.

@vkhromov

Author

commented Jul 12, 2019

I probably need to add that we don't use kubeadm's kubeconfig generation phase; instead we generate kubeconfigs for all control plane components another way, so all of them end up using external credential files.
There are no embedded credentials in our kubeconfig files.
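
A kubeconfig of that shape looks roughly like this (a simplified sketch; the file names under /etc/kubernetes/pki/ are illustrative, not something kubeadm generates):

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://127.0.0.1:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:kube-scheduler
  name: system:kube-scheduler@kubernetes
current-context: system:kube-scheduler@kubernetes
users:
- name: system:kube-scheduler
  user:
    client-certificate: /etc/kubernetes/pki/scheduler-client.crt
    client-key: /etc/kubernetes/pki/scheduler-client.key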

@neolit123

Member

commented Jul 12, 2019

I probably need to add that we don't use kubeadm's kubeconfig generation phase; instead we generate kubeconfigs for all control plane components another way, so all of them end up using external credential files.

yes, that is an important detail.

We use an external CA from which we issue very short-lived PKI credentials.

so basically you need to rotate the cert and key for the scheduler faster and that is why you need to mount the directory in question? but right now the scheduler should be self-signing its TLS and that's the use case i don't understand - why feed user-signed certs to it?

are you also signing your kubelet serving certs?

Add --tls-cert-file=/etc/kubernetes/pki/scheduler.crt and --tls-private-key-file=/etc/kubernetes/pki/scheduler.key args to a kube-scheduler pod.

you already have to patch the pod manifest to add the extra flags, so why not include the extra volumes there as well? WRT the PR, what are the benefits of mounting the volumes for users that don't have the same use case?
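
for reference, a manual patch along these lines should be all that's needed (a sketch; the volume names and pathType are illustrative):

    volumeMounts:
    - mountPath: /etc/kubernetes/pki/scheduler.crt
      name: scheduler-cert
      readOnly: true
    - mountPath: /etc/kubernetes/pki/scheduler.key
      name: scheduler-key
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/scheduler.crt
      type: File
    name: scheduler-cert
  - hostPath:
      path: /etc/kubernetes/pki/scheduler.key
      type: File
    name: scheduler-key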

@randomvariable

Member

commented Jul 12, 2019

There's nothing being served by kube-scheduler other than the healthcheck and metrics, which is why we never set up a serving certificate. On the other hand, controller-manager is a certificate authority, so we mint certificates for it.

With regards to #80064, I have two concerns:

  • It adds the mount to /etc/kubernetes/pki, but there's no certificate generation
  • /etc/kubernetes/pki has the API server and etcd certificates in there, and they shouldn't be unnecessarily mounted into kube-scheduler (we may do that today for the other components, but we need to stop doing that).

You can leverage the extraArgs and extraVolumes parameters in a kubeadm configuration today to achieve what you want, without making this change for everyone, by using a configuration like this:

---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
scheduler:
  extraArgs:
    tls-cert-file: /etc/kubernetes/pki/kube-scheduler.crt
    tls-private-key-file: /etc/kubernetes/pki/kube-scheduler.key
  extraVolumes:
  - name: "tls-cert"
    hostPath: "/etc/kubernetes/pki/kube-scheduler.crt"
    mountPath: "/etc/kubernetes/pki/kube-scheduler.crt"
    readOnly: true
    pathType: FileOrCreate
  - name: "tls-private-key"
    hostPath: "/etc/kubernetes/pki/kube-scheduler.key"
    mountPath: "/etc/kubernetes/pki/kube-scheduler.key"
    readOnly: true
    pathType: FileOrCreate
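
You'd then pass that file to kubeadm (e.g. kubeadm init --config <your-config>.yaml), and the flags and mounts get rendered into the generated kube-scheduler static pod manifest.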
@neolit123

Member

commented Jul 12, 2019

+1 on extraVolumes:
the cluster configuration already supports this.
otherwise, if we enable the volumes by default it makes sense to add the flags too.

@vkhromov

Author

commented Jul 12, 2019

so basically you need to rotate the cert and key for the scheduler faster and that is why you need to mount the directory in question?

Not only to do that faster, but also to avoid regenerating the kubeconfig.

but right now the scheduler should be self-signing its TLS and that's the use case i don't understand - why feed user-signed certs to it?

Could you please describe how the scheduler could self-sign anything in the case of an external CA, and how other components could deal with those self-signed credentials?

you already have to patch the pod manifest to add the extra flags, so why not include the extra volumes there as well? WRT the PR, what are the benefits of mounting the volumes for users that don't have the same use case?

The rest of the control plane components already have that directory mounted.
Only the scheduler doesn't, and I suspect that's due to a bug in the original implementation.
This is very important for this issue.

The benefit for users is unification of all control plane component configurations.

Could you please try to apply your reasoning to the rest of the control plane components? By that logic, we should probably remove the mounting of that directory from them as well. :)

@vkhromov

Author

commented Jul 12, 2019

@randomvariable thanks for the example of how that could be done using the extraVolumes configuration. I definitely could work around the issue that way.
The question is why we need to make the scheduler a special case: we mount that certs directory into all other control plane components, but can't do the same for the scheduler.

@randomvariable

Member

commented Jul 12, 2019

Only the scheduler doesn't, and I suspect that's due to a bug in the original implementation.

No, this is by design:
The API server needs --tls-cert-file and --tls-private-key-file because it's the API everything talks to.

We currently mount /etc/kubernetes/pki in kube-controller-manager because one of its components is acting as a certificate authority. It therefore has its own CA certificates for doing that. These are not serving certificates, but used as part of:

- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --service-account-private-key-file=/etc/kubernetes/pki/sa.key
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt

kube-controller-manager still uses a generated kubeconfig, /etc/kubernetes/controller-manager.conf with CA data embedded.

So scheduler is on its own - it only talks to the API server, doesn't serve anything other than metrics and healthcheck, and is not a certificate authority, so we never gave it its own serving certificate.

We provide extraVolumes as that escape hatch for customisation. If we mounted /etc/kubernetes/pki and, say for example, kube-scheduler was compromised, the attacker would then get access to the certificates for etcd and the API server, and the cluster would be compromised. It's actually more secure not to give it that directory, and we believe the risk of an attacker getting hold of metrics or the healthcheck data to be irrelevant.

@vkhromov

Author

commented Jul 12, 2019

@randomvariable, just wondering, are there any plans to narrow control plane components' access to files in /etc/kubernetes/pki?
E.g. at the moment the world-listening apiserver has access to the ca.key file while it doesn't need it, which is much more dangerous than the localhost-listening scheduler.

@neolit123

Member

commented Jul 12, 2019

that is true.

we are currently doing an audit of the control-plane manifest mounts, and hopefully will eventually stop running them as root:
kubernetes/kubeadm#1367

@randomvariable

Member

commented Jul 15, 2019

just wondering, are there any plans to narrow access of control plane components to files in /etc/kubernetes/pki?

In addition to kubernetes/kubeadm#1367 there's some interest in this in relation to SELinux, as per kubernetes/kubeadm#279 and applying independent multi-level security categories to each file.

@randomvariable

Member

commented Jul 15, 2019

@vkhromov is this ticket ok to close given there's an escape hatch for you to use with your certificate generation processes?
