
Etcd TLS Options #3114

Merged: 3 commits merged into kubernetes:master from etcd_tls on Aug 6, 2017

Conversation

@gambol99 (Contributor) commented Aug 2, 2017

The current implementation does not put any transport security on the etcd cluster. The PR provides an optional flag to enable TLS on the etcd cluster.

  • cleaned up and fixed any formatting issues on the journey
  • added two new certificates (server/client) for etcd peers, and a client certificate for the kube-apiserver and perhaps others (calico?)
  • disabled the protokube service for nodes completely, as it is not required; note this was first raised in Disable Protokube Options - Nodes #3091, but figured it would be easier to place in here given the relation
  • updated the protokube codebase to reflect the changes, removing the master option as it's no longer required
  • added additional integration tests for the protokube manifests
  • note, still need to add documentation, but opening the PR to get feedback
  • one outstanding issue is the migration from http -> https for preexisting clusters; I'm going to ask on the CoreOS board for the best options

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2017
@k8s-ci-robot (Contributor) commented:

Hi @gambol99. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 2, 2017
@gambol99 gambol99 force-pushed the etcd_tls branch 2 times, most recently from ff5ac91 to ad4846e on August 3, 2017
@gambol99 gambol99 force-pushed the etcd_tls branch 2 times, most recently from c04e5b3 to 8b89b74 on August 4, 2017
// Build is responsible for generating the options for protokube
func (t *ProtokubeBuilder) Build(c *fi.ModelBuilderContext) error {
// @check if protokube; we have decided to disable this by default (https://github.com/kubernetes/kops/pull/3091)
if !t.IsMaster {
Review comment (Member):

I think we want protokube if we're running with gossip DNS i.e. the same change we had to make there && !dns.IsGossipHostname(b.Cluster.Spec.MasterInternalName). I suspect this means you can't reflow the function as you've done here, but I could well be wrong...
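For illustration, a minimal sketch of the combined guard being suggested here (receiver and field names taken from the snippets in this thread; this is not the final kops code):

// Protokube is needed on masters, and with gossip DNS it is needed on
// nodes as well, so only skip the service when neither applies.
if !t.IsMaster && !dns.IsGossipHostname(t.Cluster.Spec.MasterInternalName) {
    // neither a master nor a gossip cluster: protokube is not required
    return nil
}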

// retrieve the etcd peer certificates and private keys from the keystore
if t.Cluster.Spec.EnableEtcdTLS {
for _, x := range []string{"etcd", "etcd-client"} {
if err = t.buildCeritificateTask(c, x, fmt.Sprintf("%s.pem", x)); err != nil {
Review comment (Member):

Nit: typo
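The typo is in the method name; presumably the intended spelling is along these lines (hypothetical, mirroring the diff above):

if err = t.buildCertificateTask(c, x, fmt.Sprintf("%s.pem", x)); err != nil {
    return err
}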

type ProtokubeFlags struct {
Master *bool `json:"master,omitempty" flag:"master"`
Review comment (Member):

The master flag is important, otherwise we risk mounting volumes on the nodes (although I guess this is because of gossip DNS). I do want to split up protokube more, i.e. gossip DNS really shouldn't be linked to protokube, but one change at a time I think...

func (t *ProtokubeBuilder) ProtokubeFlags(k8sVersion semver.Version) *ProtokubeFlags {
f := &ProtokubeFlags{}

master := t.IsMaster
Review comment (Member):

I think we'll have to put back the isMaster logic here.

@@ -110,6 +110,8 @@ type ClusterSpec struct {
UpdatePolicy *string `json:"updatePolicy,omitempty"`
// Additional policies to add for roles
AdditionalPolicies *map[string]string `json:"additionalPolicies,omitempty"`
// EnableEtcdTLS indicates the etcd service should use TLS between peers and clients
EnableEtcdTLS bool `json:"enableEtcdTLS,omitempty"`
Review comment (Member):

Q: Is it possible to put this into the EtcdClusterSpec? I think it's still fine to share keys, but it logically makes more sense (IMO) in there.

Reply from @gambol99 (author):

So originally it was placed in the EtcdClusterSpec; but the issue was that the setting affects both clusters regardless, since the kube-apiserver would have to be set up to use a client cert and TLS on the etcd endpoints, and the etcd override used by the events cluster reuses the same client cert anyhow. Hence why I threw it on the outside: there is no means to set them independently; both clusters are gated by the same option.

If it were placed inside the spec, should it just throw a validation error when one is set but the other is not?

Review comment (Member):

I agree that it's awkward having to set it twice... I would have guessed that setting the override to http (instead of https) would have worked, but I can easily see that this isn't a scenario that k8s would have tested.

The whole split of events is sort of vestigial anyway.

My 2c is that because it is more logically coherent we should put the TLS config on the EtcdClusterSpec, and raise errors in validation if the value is non-equal across clusters. I imagine we would do the same for etcd versions. I don't think there's a big use case for setting them separately, but I also don't think it's something that will be changed terribly often so I think it's a tolerable inconvenience.

I think in general we want to get people to TLS & etcd3. One way to do that is to have create cluster default new clusters to TLS with etcd3, as of a certain version of k8s.

When we have aggregated api servers, we might have more etcd servers. They likely won't be configured in the same Cluster object, but it would be nice here to have a consistent self-contained object.

But this is my opinion - if you disagree we can do it the other way!
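A minimal sketch of the validation being proposed (helper name is assumed; EtcdClusterSpec and EnableEtcdTLS come from the diffs in this PR), assuming the field moves onto the per-cluster spec:

// validateEtcdTLSConsistent is a hypothetical helper: with TLS configured
// per etcd cluster, every etcd cluster in the spec must agree on the setting.
func validateEtcdTLSConsistent(clusters []*EtcdClusterSpec) error {
    if len(clusters) == 0 {
        return nil
    }
    want := clusters[0].EnableEtcdTLS
    for _, c := range clusters[1:] {
        if c.EnableEtcdTLS != want {
            return fmt.Errorf("enableEtcdTLS must be set to the same value on every etcd cluster")
        }
    }
    return nil
}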

// ensure we have the correct probe schema
if c.isTLS() {
container.LivenessProbe.TCPSocket = &v1.TCPSocketAction{
Host: "127.0.0.1",
Review comment (Member):

Makes sense. I checked on how the GCE configuration does it, and AFAICT they don't use TLS for clients (which are always over localhost), just for peer-to-peer, and then they health-check on the client-port. Your approach is better IMO.

},
})
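For illustration, a sketch of the probe scheme switch under discussion (shape assumed; ClientPort and the non-TLS branch are guesses, since only the TLS branch appears in the diff):

// Only one probe handler may be set: with TLS on, an HTTP GET against the
// client port would need client certs, so use a plain TCP connect check.
if c.isTLS() {
    container.LivenessProbe.TCPSocket = &v1.TCPSocketAction{
        Host: "127.0.0.1",
        Port: intstr.FromInt(c.ClientPort),
    }
} else {
    container.LivenessProbe.HTTPGet = &v1.HTTPGetAction{
        Host: "127.0.0.1",
        Port: intstr.FromInt(c.ClientPort),
        Path: "/health",
    }
}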
// @check if tls is enabled and mount the directory - it might be worth considering
// whether we use our own directory in /srv, i.e. /srv/etcd
Review comment (Member):

That would make sense in my mind, in that we want to think about isolating all these keys from each other. I think also using something under /etc/kubernetes would work well.

But I also believe not too hard to change later.

return fmt.Errorf("error getting state of file %q: %v", p, err)
}

f, err := os.Create(p)
Review comment (Member):

Note to self: fix the race here :-) os.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0644).
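The race being noted: a stat-then-create sequence can lose to a concurrent writer. A sketch of the O_EXCL variant from the comment (how the caller should handle os.ErrExist is an assumption here):

// O_EXCL makes creation atomic: the call fails if the file already
// appeared between the earlier stat and this create.
f, err := os.OpenFile(p, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0644)
if err != nil {
    if os.IsExist(err) {
        return nil // another writer created the file first; nothing to do
    }
    return fmt.Errorf("error creating file %q: %v", p, err)
}
defer f.Close()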

@@ -86,52 +84,37 @@ func (k *KubeBoot) RunSyncLoop() {
}

func (k *KubeBoot) syncOnce() error {
if k.Master {
Review comment (Member):

We do only want to run this on the master (i.e. not on nodes that are running protokube for gossip DNS)

if k.Master && k.ApplyTaints {
if err := ApplyMasterTaints(k.Kubernetes); err != nil {
// apply the kubernetes taints?
if k.ApplyTaints {
Review comment (Member):

And definitely only want to taint the master :-)
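Taking these two comments together, a sketch of the shape syncOnce presumably needs to keep (field and method names are from the snippets above; everything else is assumed):

func (k *KubeBoot) syncOnce() error {
    if k.Master {
        // only masters carry the etcd volumes; never attempt this on nodes
        volumes, err := k.volumeMounter.mountMasterVolumes()
        if err != nil {
            return err
        }
        _ = volumes // ... bootstrap etcd from the mounted volumes
        // and taints only ever apply to masters
        if k.ApplyTaints {
            if err := ApplyMasterTaints(k.Kubernetes); err != nil {
                return err
            }
        }
    }
    return nil
}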

@justinsb (Member) commented Aug 6, 2017

Looks really good - thanks.

The only blockers are that we need protokube in the gossip-DNS case, so we can't strip the master logic out of protokube. And then I would prefer to have the TLS option on the EtcdClusterSpec rather than at the top level - it feels more logical, and in theory we can then carve off the etcd config if we have to in future.

Other than that I'm inclined to say we merge and iterate - it is effectively gated behind the field.

I know there was a concern in that if people changed the option for an existing cluster we would have to change the peer urls, but I put together #3147 (and #3148) which will let us block this "real soon now".

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 6, 2017
@justinsb justinsb self-assigned this Aug 6, 2017
@gambol99 (author) commented Aug 6, 2017

hi @justinsb ... thank you for the quick response!!! :-) .. I did the changes requested (see commit below), but just wanted to get feedback on the EtcdClusterSpec; comments left above. Also, should the e2e tests now support creating a TLS-enabled cluster? Admittedly I wouldn't have a clue how to add that though :-)

@justinsb (Member) commented Aug 6, 2017

Looks great: we just need to agree on whether to put TLS into the EtcdClusterSpec (my inclination is yes, but I'm open to other views), and then I think we might still be mounting the volumes on the nodes, although I could well have missed it! But then we can merge :-)

Re e2e testing, it's actually been very complicated and I'm working on making it simpler... kubernetes/test-infra#3883 will eventually help there. We would need a way to specify tls=true on the command line; I saw a cool proposal on slack to expose kops set cluster.spec.etcd.tls=true or a similar "generic" extension mechanism for things that don't have flags. But let's not block this PR on getting it tested (it is important, but this PR is fairly large and I also don't want it to get stuck in rebase cycles).

@@ -312,23 +282,23 @@ func run() error {
}

k := &protokube.KubeBoot{
Master: master,
Review comment (Member):

Would certainly be happy to see gossip dns split out (or protokube split out into an etcd piece, an addons piece and a gossip DNS piece), but my guess is that it isn't the most important thing on your plate :-) Protokube etcd also sets some gossip DNS hostnames, so it's not totally trivial, though I don't think it's too bad...

return err
}
// attempt to mount the volumes
volumes, err := k.volumeMounter.mountMasterVolumes()
Review comment (Member):

I may be missing it, but have we stopped calling this on non-masters?

Reply from @gambol99 (author):

- added the master option back to protokube, updating the nodeup model and protokube code
- removed any comments not related to the PR, as suggested
- reverted the ordering of the mutex in the AWSVolumes in protokube
@gambol99 (author) commented Aug 6, 2017

hi @justinsb ... so I went with your recommendation; you can check this commit, which reverts to using the EtcdClusterSpec rather than the ClusterSpec. I did another quick test in AWS to ensure it's cool.

core@ip-10-250-34-40 /etc/kubernetes/manifests $ docker exec -ti c9f8d4180d13 sh
/ # etcdctl -C https://127.0.0.1:4001 --cert-file /srv/kubernetes/etcd-client.pem --key-file /srv/kubernetes/etcd-client-key.pem --ca-file /srv/kubernetes/ca.crt cluster-health
member 30ce51b661909c54 is healthy: got healthy result from https://etcd-b-1.internal.testing.testing.domain.com:4001
member 8529efc60ffe70dd is healthy: got healthy result from https://etcd-a-1.internal.testing.testing.domain.com:4001
member 890f8dac75a542b1 is healthy: got healthy result from https://etcd-a-3.internal.testing.testing.domain.com:4001
member 8c1f371803810af2 is healthy: got healthy result from https://etcd-b-2.internal.testing.testing.domain.com:4001
member c2277fdad4c57074 is healthy: got healthy result from https://etcd-a-2.internal.testing.testing.domain.com:4001
cluster is healthy

And the etcd logs now appear in kube-system:

[jest@starfury kops]$ kubectl --namespace=kube-system logs etcd-server-ip-10-250-34-44.eu-west-2.compute.internal | tail -n 4
2017-08-06 19:23:11.581215 E | rafthttp: failed to dial 890f8dac75a542b1 on stream MsgApp v2 (dial tcp 10.250.34.22:2380: i/o timeout)
2017-08-06 19:23:11.706713 I | rafthttp: the connection with c2277fdad4c57074 became active
2017-08-06 19:23:11.706834 I | rafthttp: the connection with 890f8dac75a542b1 became active
2017-08-06 19:23:11.707051 I | rafthttp: the connection with 30ce51b661909c54 became active

And everything looks good :-)

[jest@starfury kops]$ kubectl get node
NAME                                          STATUS         AGE       VERSION
ip-10-250-34-16.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-26.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-40.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-44.eu-west-2.compute.internal    Ready,master   17m       v1.7.0
ip-10-250-34-6.eu-west-2.compute.internal     Ready,master   17m       v1.7.0
ip-10-250-35-162.eu-west-2.compute.internal   Ready,node     31m       v1.7.0
ip-10-250-36-176.eu-west-2.compute.internal   Ready,node     31m       v1.7.0

- changed the location of this variable to be in the etcd cluster spec rather than the kops cluster spec
- reflected the changes against the models
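
For reference, a sketch of where the field ends up after this commit (other fields elided; the exact shape is assumed from the diffs above):

type EtcdClusterSpec struct {
    // Name is the name of the etcd cluster, e.g. "main" or "events"
    Name string `json:"name,omitempty"`
    // EnableEtcdTLS indicates the etcd service should use TLS between peers and clients
    EnableEtcdTLS bool `json:"enableEtcdTLS,omitempty"`
    // ... remaining fields unchanged
}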
@justinsb (Member) commented Aug 6, 2017

/lgtm

Nice work, sir :-)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2017
@k8s-github-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gambol99, justinsb

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 6, 2017
@k8s-github-robot commented:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot commented:

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 5d5945c into kubernetes:master Aug 6, 2017
@chrislovecnm (Contributor) commented:

@gambol99 did we get docs??
