
Etcd TLS Options #3114

Merged: 3 commits merged into kubernetes:master from etcd_tls on Aug 6, 2017

Conversation

@gambol99 (Contributor) commented Aug 2, 2017

The current implementation does not put any transport security on the etcd cluster. The PR provides an optional flag to enable TLS on the etcd cluster.

  • cleaned up and fixed any formatting issues on the journey
  • added two new certificates (server/client) for etcd peers, and a client certificate for the kube-apiserver and perhaps others (calico?)
  • disabled the protokube service for nodes completely, as it is not required; note this was first raised in Disable Protokube Options - Nodes #3091, but figured it would be easier to place in here given the relation
  • updated the protokube codebase to reflect the changes, removing the master option as it's no longer required
  • added additional integration tests for the protokube manifests
  • note, still need to add documentation, but opening the PR to get feedback
  • one outstanding issue is the migration from http -> https for preexisting clusters; I'm going to ask on the CoreOS board for the best options

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@k8s-ci-robot (Contributor) commented:

Hi @gambol99. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 2, 2017
@gambol99 gambol99 force-pushed the etcd_tls branch 2 times, most recently from ff5ac91 to ad4846e on August 3, 2017
@gambol99 gambol99 force-pushed the etcd_tls branch 2 times, most recently from c04e5b3 to 8b89b74 on August 4, 2017
// Build is responsible for generating the options for protokube
func (t *ProtokubeBuilder) Build(c *fi.ModelBuilderContext) error {
// @check if protokube; we have decided to disable this by default (https://github.com/kubernetes/kops/pull/3091)
if !t.IsMaster {
Review comment (Member):

I think we want protokube if we're running with gossip DNS i.e. the same change we had to make there && !dns.IsGossipHostname(b.Cluster.Spec.MasterInternalName). I suspect this means you can't reflow the function as you've done here, but I could well be wrong...
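For illustration, a minimal sketch of the combined guard being suggested here (receiver and field names taken from the snippets in this thread; this is not the final kops code):

// Protokube is needed on masters, and with gossip DNS it is needed on
// nodes as well, so only skip the service when neither applies.
if !t.IsMaster && !dns.IsGossipHostname(t.Cluster.Spec.MasterInternalName) {
    // neither a master nor a gossip cluster: protokube is not required
    return nil
}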

// retrieve the etcd peer certificates and private keys from the keystore
if t.Cluster.Spec.EnableEtcdTLS {
for _, x := range []string{"etcd", "etcd-client"} {
if err = t.buildCeritificateTask(c, x, fmt.Sprintf("%s.pem", x)); err != nil {
Review comment (Member):

Nit: typo
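The typo is in the method name; presumably the intended spelling is along these lines (hypothetical, mirroring the diff above):

if err = t.buildCertificateTask(c, x, fmt.Sprintf("%s.pem", x)); err != nil {
    return err
}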

type ProtokubeFlags struct {
Master *bool `json:"master,omitempty" flag:"master"`
Review comment (Member):

The master flag is important, otherwise we risk mounting volumes on the nodes (although I guess this is because of gossip DNS). I do want to split up protokube more, i.e. gossip DNS really shouldn't be linked to protokube, but one change at a time I think...

func (t *ProtokubeBuilder) ProtokubeFlags(k8sVersion semver.Version) *ProtokubeFlags {
f := &ProtokubeFlags{}

master := t.IsMaster
Review comment (Member):

I think we'll have to put back the isMaster logic here.

@@ -110,6 +110,8 @@ type ClusterSpec struct {
UpdatePolicy *string `json:"updatePolicy,omitempty"`
// Additional policies to add for roles
AdditionalPolicies *map[string]string `json:"additionalPolicies,omitempty"`
// EnableEtcdTLS indicates the etcd service should use TLS between peers and clients
EnableEtcdTLS bool `json:"enableEtcdTLS,omitempty"`
Review comment (Member):

Q: Is it possible to put this into the EtcdClusterSpec? I think it's still fine to share keys, but it logically makes more sense (IMO) in there.

Reply from @gambol99 (author):

So originally it was placed in the EtcdClusterSpec; but the issue was that the setting affects both clusters regardless, since the kube-apiserver would have to be set up to use a client cert and TLS on the etcd endpoints, and the etcd override used by the events cluster reuses the same client cert anyhow. Hence why I threw it on the outside: there is no means to set them independently; both clusters are gated by the same option.

If it were placed inside the spec, should it just throw a validation error when one is set but the other is not?

Review comment (Member):

I agree that it's awkward having to set it twice... I would have guessed that setting the override to http (instead of https) would have worked, but I can easily see that this isn't a scenario that k8s would have tested.

The whole split of events is sort of vestigial anyway.

My 2c is that because it is more logically coherent we should put the TLS config on the EtcdClusterSpec, and raise errors in validation if the value is non-equal across clusters. I imagine we would do the same for etcd versions. I don't think there's a big use case for setting them separately, but I also don't think it's something that will be changed terribly often so I think it's a tolerable inconvenience.

I think in general we want to get people to TLS & etcd3. One way to do that is to have create cluster default new clusters to TLS with etcd3, as of a certain version of k8s.

When we have aggregated api servers, we might have more etcd servers. They likely won't be configured in the same Cluster object, but it would be nice here to have a consistent self-contained object.

But this is my opinion - if you disagree we can do it the other way!
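A minimal sketch of the validation being proposed (helper name is assumed; EtcdClusterSpec and EnableEtcdTLS come from the diffs in this PR), assuming the field moves onto the per-cluster spec:

// validateEtcdTLSConsistent is a hypothetical helper: with TLS configured
// per etcd cluster, every etcd cluster in the spec must agree on the setting.
func validateEtcdTLSConsistent(clusters []*EtcdClusterSpec) error {
    if len(clusters) == 0 {
        return nil
    }
    want := clusters[0].EnableEtcdTLS
    for _, c := range clusters[1:] {
        if c.EnableEtcdTLS != want {
            return fmt.Errorf("enableEtcdTLS must be set to the same value on every etcd cluster")
        }
    }
    return nil
}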

// ensure we have the correct probe schema
if c.isTLS() {
container.LivenessProbe.TCPSocket = &v1.TCPSocketAction{
Host: "127.0.0.1",
Review comment (Member):

Makes sense. I checked on how the GCE configuration does it, and AFAICT they don't use TLS for clients (which are always over localhost), just for peer-to-peer, and then they health-check on the client-port. Your approach is better IMO.

},
})
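For illustration, a sketch of the probe scheme switch under discussion (shape assumed; ClientPort and the non-TLS branch are guesses, since only the TLS branch appears in the diff):

// Only one probe handler may be set: with TLS on, an HTTP GET against the
// client port would need client certs, so use a plain TCP connect check.
if c.isTLS() {
    container.LivenessProbe.TCPSocket = &v1.TCPSocketAction{
        Host: "127.0.0.1",
        Port: intstr.FromInt(c.ClientPort),
    }
} else {
    container.LivenessProbe.HTTPGet = &v1.HTTPGetAction{
        Host: "127.0.0.1",
        Port: intstr.FromInt(c.ClientPort),
        Path: "/health",
    }
}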
// @check if tls is enabled and mount the directory - it might be worth considering
// whether we use our own directory in /srv, i.e. /srv/etcd
Review comment (Member):

That would make sense in my mind, in that we want to think about isolating all these keys from each other. I think also using something under /etc/kubernetes would work well.

But I also believe not too hard to change later.

return fmt.Errorf("error getting state of file %q: %v", p, err)
}

f, err := os.Create(p)
Review comment (Member):

Note to self: fix the race here :-) os.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0644).
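The race being noted: a stat-then-create sequence can lose to a concurrent writer. A sketch of the O_EXCL variant from the comment (how the caller should handle os.ErrExist is an assumption here):

// O_EXCL makes creation atomic: the call fails if the file already
// appeared between the earlier stat and this create.
f, err := os.OpenFile(p, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0644)
if err != nil {
    if os.IsExist(err) {
        return nil // another writer created the file first; nothing to do
    }
    return fmt.Errorf("error creating file %q: %v", p, err)
}
defer f.Close()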

@@ -86,52 +84,37 @@ func (k *KubeBoot) RunSyncLoop() {
}

func (k *KubeBoot) syncOnce() error {
if k.Master {
Review comment (Member):

We do only want to run this on the master (i.e. not on nodes that are running protokube for gossip DNS)

if k.Master && k.ApplyTaints {
if err := ApplyMasterTaints(k.Kubernetes); err != nil {
// apply the kubernetes taints?
if k.ApplyTaints {
Review comment (Member):

And definitely only want to taint the master :-)
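Taking these two comments together, a sketch of the shape syncOnce presumably needs to keep (field and method names are from the snippets above; everything else is assumed):

func (k *KubeBoot) syncOnce() error {
    if k.Master {
        // only masters carry the etcd volumes; never attempt this on nodes
        volumes, err := k.volumeMounter.mountMasterVolumes()
        if err != nil {
            return err
        }
        _ = volumes // ... bootstrap etcd from the mounted volumes
        // and taints only ever apply to masters
        if k.ApplyTaints {
            if err := ApplyMasterTaints(k.Kubernetes); err != nil {
                return err
            }
        }
    }
    return nil
}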

@justinsb (Member) commented Aug 6, 2017

Looks really good - thanks.

The only blockers are that we need protokube in the gossip-DNS case, so we can't strip the master logic out of protokube. And then I would prefer to have the TLS option on the EtcdClusterSpec rather than at the top level - it feels more logical, and in theory we can then carve off the etcd config if we have to in future.

Other than that I'm inclined to say we merge and iterate - it is effectively gated behind the field.

I know there was a concern in that if people changed the option for an existing cluster we would have to change the peer urls, but I put together #3147 (and #3148) which will let us block this "real soon now".

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 6, 2017
@justinsb justinsb self-assigned this Aug 6, 2017
@gambol99 (author) commented Aug 6, 2017

hi @justinsb ... thank you for the quick response!!! :-) .. I did the changes requested (see commit below), but just wanted to get feedback on the EtcdClusterSpec; comments left above. Also, should the e2e tests now support creating a TLS-enabled cluster? Admittedly I wouldn't have a clue how to add that though :-)

@justinsb (Member) commented Aug 6, 2017

Looks great: we just need to agree on whether to put TLS into the EtcdClusterSpec (my inclination is yes, but I'm open to other views), and then I think we might still be mounting the volumes on the nodes, although I could well have missed it! But then we can merge :-)

Re e2e testing, it's actually been very complicated and I'm working on making it simpler... kubernetes/test-infra#3883 will eventually help there. We would need a way to specify tls=true on the command line; I saw a cool proposal on slack to expose kops set cluster.spec.etcd.tls=true or a similar "generic" extension mechanism for things that don't have flags. But let's not block this PR on getting it tested (it is important, but this PR is fairly large and I also don't want it to get stuck in rebase cycles).

@@ -312,23 +282,23 @@ func run() error {
}

k := &protokube.KubeBoot{
Master: master,
Review comment (Member):

Would certainly be happy to see gossip dns split out (or protokube split out into an etcd piece, an addons piece and a gossip DNS piece), but my guess is that it isn't the most important thing on your plate :-) Protokube etcd also sets some gossip DNS hostnames, so it's not totally trivial, though I don't think it's too bad...

return err
}
// attempt to mount the volumes
volumes, err := k.volumeMounter.mountMasterVolumes()
Review comment (Member):

I may be missing it, but have we stopped calling this on non-masters?

Reply from @gambol99 (author):

- added the master option back to protokube, updating the nodeup model and protokube code
- removed any comments not related to the PR, as suggested
- reverted the ordering of the mutex in the AWSVolumes in protokube
@gambol99 (author) commented Aug 6, 2017

hi @justinsb ... so I went with your recommendation; you can check this commit, which reverts to using the EtcdClusterSpec rather than the ClusterSpec. I did another quick test in AWS to ensure it's cool.

core@ip-10-250-34-40 /etc/kubernetes/manifests $ docker exec -ti c9f8d4180d13 sh
/ # etcdctl -C https://127.0.0.1:4001 --cert-file /srv/kubernetes/etcd-client.pem --key-file /srv/kubernetes/etcd-client-key.pem --ca-file /srv/kubernetes/ca.crt cluster-health
member 30ce51b661909c54 is healthy: got healthy result from https://etcd-b-1.internal.testing.testing.domain.com:4001
member 8529efc60ffe70dd is healthy: got healthy result from https://etcd-a-1.internal.testing.testing.domain.com:4001
member 890f8dac75a542b1 is healthy: got healthy result from https://etcd-a-3.internal.testing.testing.domain.com:4001
member 8c1f371803810af2 is healthy: got healthy result from https://etcd-b-2.internal.testing.testing.domain.com:4001
member c2277fdad4c57074 is healthy: got healthy result from https://etcd-a-2.internal.testing.testing.domain.com:4001
cluster is healthy

And the etcd logs now appear in kube-system:

[jest@starfury kops]$ kubectl --namespace=kube-system logs etcd-server-ip-10-250-34-44.eu-west-2.compute.internal | tail -n 4
2017-08-06 19:23:11.581215 E | rafthttp: failed to dial 890f8dac75a542b1 on stream MsgApp v2 (dial tcp 10.250.34.22:2380: i/o timeout)
2017-08-06 19:23:11.706713 I | rafthttp: the connection with c2277fdad4c57074 became active
2017-08-06 19:23:11.706834 I | rafthttp: the connection with 890f8dac75a542b1 became active
2017-08-06 19:23:11.707051 I | rafthttp: the connection with 30ce51b661909c54 became active

And everything looks good :-)

[jest@starfury kops]$ kubectl get node
NAME                                          STATUS         AGE       VERSION
ip-10-250-34-16.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-26.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-40.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-44.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-6.eu-west-2.compute.internal     Ready,master   17m       v1.7.0
ip-10-250-35-162.eu-west-2.compute.internal   Ready,node     31m       v1.7.0
ip-10-250-36-176.eu-west-2.compute.internal   Ready,node     31m       v1.7.0

- changed the location of this variable to be in the etcd cluster spec rather than the kops cluster spec
- reflected the changes against the models
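
For reference, a sketch of where the field ends up after this commit (other fields elided; the exact shape is assumed from the diffs above):

type EtcdClusterSpec struct {
    // Name is the name of the etcd cluster, e.g. "main" or "events"
    Name string `json:"name,omitempty"`
    // EnableEtcdTLS indicates the etcd service should use TLS between peers and clients
    EnableEtcdTLS bool `json:"enableEtcdTLS,omitempty"`
    // ... remaining fields unchanged
}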
@justinsb (Member) commented Aug 6, 2017

/lgtm

Nice work, sir :-)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2017
@k8s-github-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gambol99, justinsb

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 6, 2017
@k8s-github-robot commented:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot commented:

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 5d5945c into kubernetes:master Aug 6, 2017
@chrislovecnm (Contributor) commented:

@gambol99 did we get docs??
