
Securing the cluster created by Juju #47835

Merged · 2 commits · Jun 29, 2017

Conversation

@ktsakalozos (Contributor):

What this PR does / why we need it: This PR secures the deployments done with Juju master. It works around certain security issues inherent to Kubernetes (see for example dashboard access).

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

Securing Juju kubernetes dashboard

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 21, 2017
@k8s-ci-robot (Contributor):

Hi @ktsakalozos. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 21, 2017
@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 21, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2017
@@ -21,7 +21,8 @@ and then relate the `kubernetes-e2e` charm.

```shell
juju deploy kubernetes-core
juju deploy cs:~containers/kubernetes-e2e
juju add-relation kubernetes-e2e kubernetes-master
juju add-relation kubernetes-my-worker:kube-api-endpoint kubernetes-master:kube-api-endpoint
juju add-relation kubernetes-my-worker:kube-control kubernetes-master:kube-control
```
Contributor:
kubernetes-my-worker -> kubernetes-worker? Pretty sure the worker in this example comes from `juju deploy kubernetes-core`, hence it would get the default name.

Contributor:

Err, backing up -- this should be relating e2e to master, not worker to master

@ktsakalozos (Contributor Author):

Yep, my mistake. Thank you.

    @@ -44,9 +44,11 @@ def messaging():

         missing_services = []
         if not is_state('kubernetes-master.available'):
    -        missing_services.append('kubernetes-master')
    +        missing_services.append('kubernetes-master(http)')
Contributor:
I think HTTP here is the interface name, not the relation name. That's unclear. Perhaps scope it properly in Juju terms:

    kubernetes-master:http

@ktsakalozos (Contributor Author):
Done

        if not is_state('certificates.available'):
            missing_services.append('certificates')
        if not is_state('kubeconfig.ready'):
            missing_services.append('kubernetes-master(kube-control)')
Contributor:
Same here, change the output to `kubernetes-master:kube-control`.

@ktsakalozos (Contributor Author):
Done

        set_state('authentication.setup')


    @when_not('leadership.is_leader')
    @when_not('authentication.setup')
    def setup_non_leader_authentication():
Contributor @Cynerva (Jun 21, 2017):

Could removing the @when_not('authentication.setup') decorator here result in a 5-minute delay?

  1. setup_non_leader_authentication handler runs
  2. password_changed handler runs, removes the authentication.setup state
  3. reactive does not dispatch the setup_non_leader_authentication handler again until the next hook

I'm having trouble reasoning through the specific outcome of that, but we've had similar problems with delays caused by using is_state in combination with state-based dispatch.

@ktsakalozos (Contributor Author):

That's a good catch! The password update flow should go like this:

  • User sets a new password
  • Leader grabs the password and propagates it to non-leaders
  • Leader restarts
  • Non-leaders get the new password and restart

You will notice I added the 'leadership.is_leader' guard on password_changed() so that only the leader acts on a password change.

To your question: I think the change in leadership data will trigger the reactive framework on the non-leaders in a timely fashion, so there will not be a 5-minute wait.
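The flow described above can be sketched as a plain-Python simulation. This is only an illustration, not the charm code: `leader_data` stands in for Juju's leadership storage, and the function names here are hypothetical; the real charm uses charms.reactive handlers dispatched on leadership-data changes.

```python
# Hypothetical sketch of the password-propagation flow described above.
# `leader_data` stands in for Juju leadership storage (leader_set/leader_get);
# only the leader unit may write to it, and a change dispatches handlers on
# the non-leaders without waiting for the next update-status hook.

leader_data = {}   # shared key/value store, writable only by the leader
restarted = []     # records which units restarted, for illustration

def restart_master(unit):
    restarted.append(unit)

def leader_handle_password_change(new_password):
    """Leader grabs the new password, publishes it, and restarts."""
    leader_data['basic-auth-password'] = new_password
    restart_master('leader')

def non_leader_on_leader_change():
    """Dispatched on non-leaders when leadership data changes."""
    if leader_data.get('basic-auth-password') is not None:
        restart_master('non-leader')

leader_handle_password_change('s3cret')
non_leader_on_leader_change()
# restarted == ['leader', 'non-leader']
```

The guard on password_changed() maps to the first function: only the leader reacts to the user-set config value, and non-leaders only ever react to the propagated leadership data.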

        controller_opts.add('service-account-private-key-file', service_key)

        remove_state('kubernetes-master.components.started')
Contributor:

Should setup_leader_authentication remove this state too?

@ktsakalozos (Contributor Author):

Yes, we can do that.

On the leader we have two triggers that cause a reconfiguration of the authentication: 1) charm upgrade, 2) password change. In both triggers we do

    remove_state('authentication.setup')
    remove_state('kubernetes-master.components.started')

On the non-leaders the reconfiguration of the authorisation is triggered after detecting a change on the leader. This is why I had the remove_state there.

To make it homogeneous, I added the remove_state to setup_leader_authentication and removed it from the upgrade and password-change triggers.

    if not db.get(save_salt):
        db.set(save_salt, token)
    else:
        return db.get(save_salt)
Contributor:
Given that we're storing passwords in unitdata, what happens if we lose the leader unit? Will a password/token change when we wouldn't want it to?

Contributor:

It would, this isn't getting syndicated.

Contributor:
I wonder if it's better form, then, to instead use leader data to handle these values: keep them as a mutable dict, mutated by the leader and then leader_set() after each update, so any future leaders get the benefit of having this state tracked for them.

@ktsakalozos (Contributor Author):
The last commit should address this. I got rid of the local db; everything is read from the token files after they have been synced with the leader.

        'cni.available', 'kubernetes-worker.restart-needed')
    -def start_worker(kube_api, kube_control, cni):
    +def start_worker(kube_api, kube_control, auth_control, cni):
Contributor:
This surprises me -- are we getting two copies of the kube-control interface here?

Contributor:

We are; each state that's set off of that interface passes the interface into the method. It got me too when I was prototyping.

    kubelet_opts.add('v', '0')
    kubelet_opts.add('address', '0.0.0.0')
    kubelet_opts.add('port', '10250')
    kubelet_opts.add('cluster-dns', dns['sdn-ip'])
    kubelet_opts.add('cluster-domain', dns['domain'])
    -kubelet_opts.add('anonymous-auth', 'false')
    +kubelet_opts.add('anonymous-auth=false', None)
Contributor:
What happened here? If `kubelet_opts.add('anonymous-auth', 'false')` didn't work, then we should probably fix that in FlagManager rather than work around it here.

@ktsakalozos (Contributor Author):
I misunderstood how the snap options work. `anonymous-auth=false` will be ignored because it is not part of the config params. However, if I set `anonymous-auth` to `false` it will not make it into the arguments list, and kubelet will take the default value of `true`. Opened this issue:
https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/314

I am going to remove the anonymous-auth option for now so as not to cause any confusion.
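The failure mode described above can be illustrated with a toy renderer. This is not the real FlagManager or snap code; both functions and the option names are hypothetical, written only to show why a flag explicitly set to 'false' must still be emitted rather than dropped as "unset".

```python
# Toy illustration of the bug discussed above: a renderer that treats
# 'false' as "not set" silently drops the flag, so the daemon falls
# back to its default (anonymous-auth defaults to true in kubelet).

def render_args_buggy(opts):
    # Hypothetical buggy renderer: skips values it considers unset.
    return ['--%s=%s' % (k, v) for k, v in opts.items() if v and v != 'false']

def render_args_fixed(opts):
    # Correct behavior: an explicit 'false' is still emitted; only
    # options with no value at all are omitted.
    return ['--%s=%s' % (k, v) for k, v in opts.items() if v is not None]

opts = {'port': '10250', 'anonymous-auth': 'false'}
render_args_buggy(opts)   # ['--port=10250']  -- the flag silently vanishes
render_args_fixed(opts)   # ['--port=10250', '--anonymous-auth=false']
```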

Contributor @lazypower left a comment:

Thanks for the submission, Konstantinos. Overall I think this is a great first draft of the fix, but there are some serious questions posed here relating to state distribution and upgrade paths for existing users who have certificate auth enabled (all of our existing deployments to date).

We'll need to sort those out satisfactorily before I'm comfortable acking this change.

The majority of what's here is excellent, though, and I look forward to giving this a second-round review once the requested modifications have been made.


    # Set permissions on the ubuntu user's kubeconfig to ensure a consistent UX
    cmd = ['chown', 'ubuntu:ubuntu', kubeconfig_path]
    check_call(cmd)

    messaging()
Contributor:

Don't do this. You're making an explicit invocation of a method that is controlled by a decorator.

What I suggest is to move the messaging method into a non-decorated space in the source file, and write a new decorated method that invokes messaging() when the kubernetes.e2e.installed state is present. You can then do this messaging() invocation inline like you have it (for a guarantee of it being invoked when you expect it to be).
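The suggested refactor can be sketched like this. The `when` stub below merely imitates the charms.reactive decorator for illustration; the wrapper name `update_status_messaging` is hypothetical, chosen only to show the shape of the fix: a plain helper that is safe to call inline, plus a thin decorated wrapper for reactive dispatch.

```python
# Sketch of the suggested refactor: keep messaging() undecorated so it
# can be called explicitly, and add a decorated wrapper for dispatch.

def when(*states):
    # Stand-in for charms.reactive's @when: the real framework registers
    # the function to run when all the named states are set.
    def wrap(f):
        f._states = states
        return f
    return wrap

def messaging():
    """Plain, undecorated helper: safe to call directly from other code."""
    return 'status updated'

@when('kubernetes-e2e.installed')
def update_status_messaging():
    # Thin decorated wrapper; reactive dispatch goes through here.
    return messaging()

# Elsewhere in the charm, an explicit inline call now hits the plain
# helper instead of a decorator-controlled handler:
messaging()
```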

@ktsakalozos (Contributor Author):

Done

    @@ -781,7 +830,6 @@ def configure_master_services():
        api_opts.add('service-cluster-ip-range', service_cidr())
        api_opts.add('min-request-timeout', '300')
        api_opts.add('v', '4')
    -   api_opts.add('client-ca-file', ca_cert_path)
Contributor:
I agree that we need to remove this, but I don't think it does anything for existing deployments. Does this need to be explicitly checked and removed from the file? (I didn't notice that anywhere else in the code.)

@ktsakalozos (Contributor Author):

We remove the certificate from the kubeconfig file by un-setting the users section. Is this what you would expect? https://github.com/kubernetes/kubernetes/pull/47835/files#diff-25f2686437984f527090205ba417c242R158 Or do you have something else in mind?

Contributor:

I think the concern is that the client-ca-file entry might not be cleared from api_opts when upgrading an older deployment.

Thankfully, all flags get cleared during an upgrade-charm, so we're good: https://github.com/juju-solutions/kubernetes/blob/55525773cec50fa7510a9801c5c763e6ec6ad2ac/cluster/juju/layers/kubernetes-master/reactive/kubernetes_master.py#L139-L141

Don't know why that's in migrate_from_pre_snaps; it really doesn't belong there. But hey, at least it's happening!


    @@ -463,13 +465,12 @@ def configure_worker_services(api_servers, dns, cluster_cidr):
        kubelet_opts.add('require-kubeconfig', 'true')
        kubelet_opts.add('kubeconfig', kubeconfig_path)
        kubelet_opts.add('network-plugin', 'cni')
    -   kubelet_opts.add('logtostderr', 'true')
Contributor:

I'm not sure why this was removed. was this intentional?

@ktsakalozos (Contributor Author):

The default value is already true:

    --logtostderr    log to standard error instead of files (default true)

Adding it back.

@Cynerva (Contributor) commented Jun 23, 2017:

@k8s-bot ok to test

I'm +1 to this, not seeing any problems after the recent changes. @chuckbutler can you give it another look too?

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 23, 2017
@lazypower (Contributor):

I've been following along and i'm +1 to this.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2017
@marcoceppi (Contributor):

/approve no-issue

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 23, 2017
@lazypower (Contributor):

Ah I see there's still some dust settling here with respect to relationship scoping. I'm going to wait until you've signalled this PR is ready for re-review @ktsakalozos

@ktsakalozos (Contributor Author):

@marcoceppi @chuckbutler I had to update this PR to align it with the changes done on the kube-control interface. Retested again and it looks strong. Thank you.

@ktsakalozos (Contributor Author):

Since we got all the comments addressed, I am squashing the commits.

@Cynerva (Contributor) commented Jun 29, 2017:

/assign @Cynerva

@k8s-ci-robot (Contributor):

@Cynerva: GitHub didn't allow me to assign the following users: cynerva.

Note that only kubernetes members can be assigned.

In response to this:

/assign @Cynerva

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Cynerva (Contributor) commented Jun 29, 2017:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 29, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cynerva, chuckbutler, ktsakalozos, marcoceppi

Associated issue requirement bypassed by: marcoceppi

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@Cynerva (Contributor) commented Jun 29, 2017:

/retest

@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)

@k8s-github-robot k8s-github-robot merged commit d19773d into kubernetes:master Jun 29, 2017