
Securing the cluster created by Juju #47835

Merged · 2 commits · Jun 29, 2017

Conversation

@ktsakalozos (Contributor):

What this PR does / why we need it: This PR secures the deployments done with Juju master. It works around certain security issues inherent to Kubernetes (see for example dashboard access).

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

Securing Juju kubernetes dashboard

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 21, 2017
@k8s-ci-robot (Contributor):

Hi @ktsakalozos. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 21, 2017
@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 21, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2017
@@ -21,7 +21,8 @@ and then relate the `kubernetes-e2e` charm.

```shell
juju deploy kubernetes-core
juju deploy cs:~containers/kubernetes-e2e
juju add-relation kubernetes-e2e kubernetes-master
juju add-relation kubernetes-my-worker:kube-api-endpoint kubernetes-master:kube-api-endpoint
juju add-relation kubernetes-my-worker:kube-control kubernetes-master:kube-control
```
Contributor:
kubernetes-my-worker -> kubernetes-worker? Pretty sure the worker in this example comes from `juju deploy kubernetes-core`, hence it would get the default name.

Contributor:

Err, backing up -- this should be relating e2e to master, not worker to master

@ktsakalozos (Contributor Author):

Yep, my mistake. Thank you.

    @@ -44,9 +44,11 @@ def messaging():

         missing_services = []
         if not is_state('kubernetes-master.available'):
    -        missing_services.append('kubernetes-master')
    +        missing_services.append('kubernetes-master(http)')
Contributor:
I think HTTP here is the interface name, not the relation name. That's unclear. Perhaps scope it properly in Juju terms:

    kubernetes-master:http

@ktsakalozos (Contributor Author):
Done

        if not is_state('certificates.available'):
            missing_services.append('certificates')
        if not is_state('kubeconfig.ready'):
            missing_services.append('kubernetes-master(kube-control)')
Contributor:
Same here, change the output to `kubernetes-master:kube-control`.

@ktsakalozos (Contributor Author):
Done

        set_state('authentication.setup')


    @when_not('leadership.is_leader')
    @when_not('authentication.setup')
    def setup_non_leader_authentication():
Contributor @Cynerva (Jun 21, 2017):

Could removing the @when_not('authentication.setup') decorator here result in a 5-minute delay?

  1. setup_non_leader_authentication handler runs
  2. password_changed handler runs, removes the authentication.setup state
  3. reactive does not dispatch the setup_non_leader_authentication handler again until the next hook

I'm having trouble reasoning through the specific outcome of that, but we've had similar problems with delays caused by using is_state in combination with state-based dispatch.

@ktsakalozos (Contributor Author):

That's a good catch! The password update flow should go like this:

  • User sets a new password
  • Leader grabs the password and propagates it to non-leaders
  • Leader restarts
  • Non-leaders get the new password and restart

You will notice I added the 'leadership.is_leader' guard on password_changed() so that only the leader acts on a password change.

To your question: I think the change in leadership data will trigger the reactive framework on the non-leaders in a timely fashion, so there will not be a 5-minute wait.
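The flow described above can be sketched as a plain-Python simulation. This is only an illustration, not the charm code: `leader_data` stands in for Juju's leadership storage, and the function names here are hypothetical; the real charm uses charms.reactive handlers dispatched on leadership-data changes.

```python
# Hypothetical sketch of the password-propagation flow described above.
# `leader_data` stands in for Juju leadership storage (leader_set/leader_get);
# only the leader unit may write to it, and a change dispatches handlers on
# the non-leaders without waiting for the next update-status hook.

leader_data = {}   # shared key/value store, writable only by the leader
restarted = []     # records which units restarted, for illustration

def restart_master(unit):
    restarted.append(unit)

def leader_handle_password_change(new_password):
    """Leader grabs the new password, publishes it, and restarts."""
    leader_data['basic-auth-password'] = new_password
    restart_master('leader')

def non_leader_on_leader_change():
    """Dispatched on non-leaders when leadership data changes."""
    if leader_data.get('basic-auth-password') is not None:
        restart_master('non-leader')

leader_handle_password_change('s3cret')
non_leader_on_leader_change()
# restarted == ['leader', 'non-leader']
```

The guard on password_changed() maps to the first function: only the leader reacts to the user-set config value, and non-leaders only ever react to the propagated leadership data.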

        controller_opts.add('service-account-private-key-file', service_key)

        remove_state('kubernetes-master.components.started')
Contributor:

Should setup_leader_authentication remove this state too?

@ktsakalozos (Contributor Author):

Yes, we can do that.

On the leader we have two triggers that cause a reconfiguration of the authentication: 1) charm upgrade, 2) password change. In both triggers we do

    remove_state('authentication.setup')
    remove_state('kubernetes-master.components.started')

On the non-leaders the reconfiguration of the authorisation is triggered after detecting a change on the leader. This is why I had the remove_state there.

To make it homogeneous, I added the remove_state to setup_leader_authentication and removed it from the upgrade and password-change triggers.

    if not db.get(save_salt):
        db.set(save_salt, token)
    else:
        return db.get(save_salt)
Contributor:
Given that we're storing passwords in unitdata, what happens if we lose the leader unit? Will a password/token change when we wouldn't want it to?

Contributor:

It would, this isn't getting syndicated.

Contributor:
I wonder if it's better form, then, to instead use leader data to handle these values: keep them as a mutable dict, mutated by the leader and then leader_set() after each update, so any future leaders get the benefit of having this state tracked for them.

@ktsakalozos (Contributor Author):
The last commit should address this. I got rid of the local db; everything is read from the token files after they have been synced with the leader.

        'cni.available', 'kubernetes-worker.restart-needed')
    -def start_worker(kube_api, kube_control, cni):
    +def start_worker(kube_api, kube_control, auth_control, cni):
Contributor:
This surprises me -- are we getting two copies of the kube-control interface here?

Contributor:

We are; each state that's set off of that interface passes the interface into the method. It got me too when I was prototyping.

    kubelet_opts.add('v', '0')
    kubelet_opts.add('address', '0.0.0.0')
    kubelet_opts.add('port', '10250')
    kubelet_opts.add('cluster-dns', dns['sdn-ip'])
    kubelet_opts.add('cluster-domain', dns['domain'])
    -kubelet_opts.add('anonymous-auth', 'false')
    +kubelet_opts.add('anonymous-auth=false', None)
Contributor:
What happened here? If `kubelet_opts.add('anonymous-auth', 'false')` didn't work, then we should probably fix that in FlagManager rather than work around it here.

@ktsakalozos (Contributor Author):
I misunderstood how the snap options work. `anonymous-auth=false` will be ignored because it is not part of the config params. However, if I set `anonymous-auth` to `false` it will not make it into the arguments list, and kubelet will take the default value of `true`. Opened this issue:
https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/314

I am going to remove the anonymous-auth option for now so as not to cause any confusion.
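The failure mode described above can be illustrated with a toy renderer. This is not the real FlagManager or snap code; both functions and the option names are hypothetical, written only to show why a flag explicitly set to 'false' must still be emitted rather than dropped as "unset".

```python
# Toy illustration of the bug discussed above: a renderer that treats
# 'false' as "not set" silently drops the flag, so the daemon falls
# back to its default (anonymous-auth defaults to true in kubelet).

def render_args_buggy(opts):
    # Hypothetical buggy renderer: skips values it considers unset.
    return ['--%s=%s' % (k, v) for k, v in opts.items() if v and v != 'false']

def render_args_fixed(opts):
    # Correct behavior: an explicit 'false' is still emitted; only
    # options with no value at all are omitted.
    return ['--%s=%s' % (k, v) for k, v in opts.items() if v is not None]

opts = {'port': '10250', 'anonymous-auth': 'false'}
render_args_buggy(opts)   # ['--port=10250']  -- the flag silently vanishes
render_args_fixed(opts)   # ['--port=10250', '--anonymous-auth=false']
```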

Contributor @lazypower left a comment:

Thanks for the submission, Konstantinos. Overall I think this is a great first draft of the fix, but there are some serious questions posed here relating to state distribution and upgrade paths for existing users who have certificate auth enabled (all of our existing deployments to date).

We'll need to sort those out satisfactorily before I'm comfortable acking this change.

The majority of what's here is excellent, though, and I look forward to giving this a second-round review once the requested modifications have been made.


    # Set permissions on the ubuntu user's kubeconfig to ensure a consistent UX
    cmd = ['chown', 'ubuntu:ubuntu', kubeconfig_path]
    check_call(cmd)

    messaging()
Contributor:

Don't do this. You're making an explicit invocation of a method that is controlled by a decorator.

What I suggest is to move the messaging method into a non-decorated space in the source file, and write a new decorated method that invokes messaging() when the kubernetes.e2e.installed state is present. You can then do this messaging() invocation inline like you have it (for a guarantee of it being invoked when you expect it to be).
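The suggested refactor can be sketched like this. The `when` stub below merely imitates the charms.reactive decorator for illustration; the wrapper name `update_status_messaging` is hypothetical, chosen only to show the shape of the fix: a plain helper that is safe to call inline, plus a thin decorated wrapper for reactive dispatch.

```python
# Sketch of the suggested refactor: keep messaging() undecorated so it
# can be called explicitly, and add a decorated wrapper for dispatch.

def when(*states):
    # Stand-in for charms.reactive's @when: the real framework registers
    # the function to run when all the named states are set.
    def wrap(f):
        f._states = states
        return f
    return wrap

def messaging():
    """Plain, undecorated helper: safe to call directly from other code."""
    return 'status updated'

@when('kubernetes-e2e.installed')
def update_status_messaging():
    # Thin decorated wrapper; reactive dispatch goes through here.
    return messaging()

# Elsewhere in the charm, an explicit inline call now hits the plain
# helper instead of a decorator-controlled handler:
messaging()
```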

@ktsakalozos (Contributor Author):

Done

    @@ -781,7 +830,6 @@ def configure_master_services():
        api_opts.add('service-cluster-ip-range', service_cidr())
        api_opts.add('min-request-timeout', '300')
        api_opts.add('v', '4')
    -   api_opts.add('client-ca-file', ca_cert_path)
Contributor:
I agree that we need to remove this, but I don't think it does anything for existing deployments. Does this need to be explicitly checked and removed from the file? (I didn't notice that anywhere else in the code.)

@ktsakalozos (Contributor Author):

We remove the certificate from the kubeconfig file by un-setting the users section. Is this what you would expect? https://github.com/kubernetes/kubernetes/pull/47835/files#diff-25f2686437984f527090205ba417c242R158 Or do you have something else in mind?

Contributor:

I think the concern is that the client-ca-file entry might not be cleared from api_opts when upgrading an older deployment.

Thankfully, all flags get cleared during an upgrade-charm, so we're good: https://github.com/juju-solutions/kubernetes/blob/55525773cec50fa7510a9801c5c763e6ec6ad2ac/cluster/juju/layers/kubernetes-master/reactive/kubernetes_master.py#L139-L141

Don't know why that's in migrate_from_pre_snaps; it really doesn't belong there. But hey, at least it's happening!


    @@ -463,13 +465,12 @@ def configure_worker_services(api_servers, dns, cluster_cidr):
        kubelet_opts.add('require-kubeconfig', 'true')
        kubelet_opts.add('kubeconfig', kubeconfig_path)
        kubelet_opts.add('network-plugin', 'cni')
    -   kubelet_opts.add('logtostderr', 'true')
Contributor:

I'm not sure why this was removed. was this intentional?

@ktsakalozos (Contributor Author):

The default value is already true:

    --logtostderr    log to standard error instead of files (default true)

Adding it back.

@Cynerva (Contributor) commented Jun 23, 2017:

@k8s-bot ok to test

I'm +1 to this, not seeing any problems after the recent changes. @chuckbutler can you give it another look too?

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 23, 2017
@lazypower (Contributor):

I've been following along and i'm +1 to this.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2017
@marcoceppi (Contributor):

/approve no-issue

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 23, 2017
@lazypower (Contributor):

Ah I see there's still some dust settling here with respect to relationship scoping. I'm going to wait until you've signalled this PR is ready for re-review @ktsakalozos

@ktsakalozos (Contributor Author):

@marcoceppi @chuckbutler I had to update this PR to align it with the changes done on the kube-control interface. Retested again and it looks strong. Thank you.

@ktsakalozos (Contributor Author):

Since we got all the comments addressed, I am squashing the commits.

@Cynerva (Contributor) commented Jun 29, 2017:

/assign @Cynerva

@k8s-ci-robot (Contributor):

@Cynerva: GitHub didn't allow me to assign the following users: cynerva.

Note that only kubernetes members can be assigned.

In response to this:

/assign @Cynerva

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Cynerva (Contributor) commented Jun 29, 2017:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 29, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cynerva, chuckbutler, ktsakalozos, marcoceppi

Associated issue requirement bypassed by: marcoceppi

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@Cynerva (Contributor) commented Jun 29, 2017:

/retest

@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)

@k8s-github-robot k8s-github-robot merged commit d19773d into kubernetes:master Jun 29, 2017