Etcd TLS Options #3114
Conversation
Hi @gambol99. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from ff5ac91 to ad4846e
Force-pushed from c04e5b3 to 8b89b74
nodeup/pkg/model/protokube.go (Outdated)

// Build is responsible for generating the options for protokube
func (t *ProtokubeBuilder) Build(c *fi.ModelBuilderContext) error {
	// @check if protokube; we have decided to disable this by default (https://github.com/kubernetes/kops/pull/3091)
	if !t.IsMaster {
I think we want protokube if we're running with gossip DNS, i.e. the same change we had to make there: && !dns.IsGossipHostname(b.Cluster.Spec.MasterInternalName). I suspect this means you can't reflow the function as you've done here, but I could well be wrong...
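A minimal sketch of the guard being suggested, assuming the dns.IsGossipHostname helper and the IsMaster/MasterInternalName fields shown in the hunk above; the early-return shape is illustrative, not the merged code:

```go
// Sketch: skip building protokube only when this is neither a master nor a
// gossip-DNS cluster (gossip needs protokube running on every node).
func (t *ProtokubeBuilder) Build(c *fi.ModelBuilderContext) error {
	if !t.IsMaster && !dns.IsGossipHostname(t.Cluster.Spec.MasterInternalName) {
		return nil
	}
	// ... build the protokube service tasks as before ...
	return nil
}
```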
nodeup/pkg/model/protokube.go (Outdated)

// retrieve the etcd peer certificates and private keys from the keystore
if t.Cluster.Spec.EnableEtcdTLS {
	for _, x := range []string{"etcd", "etcd-client"} {
		if err = t.buildCeritificateTask(c, x, fmt.Sprintf("%s.pem", x)); err != nil {
Nit: typo (buildCeritificateTask should be buildCertificateTask).
type ProtokubeFlags struct {
	Master *bool `json:"master,omitempty" flag:"master"`
The master flag is important, otherwise we risk mounting volumes on the nodes (although I guess this is because of gossip DNS). I do want to split up protokube more, i.e. gossip DNS really shouldn't be linked to protokube, but one change at a time I think...
func (t *ProtokubeBuilder) ProtokubeFlags(k8sVersion semver.Version) *ProtokubeFlags {
	f := &ProtokubeFlags{}

	master := t.IsMaster
I think we'll have to put back the isMaster logic here.
pkg/apis/kops/cluster.go (Outdated)

@@ -110,6 +110,8 @@ type ClusterSpec struct {
	UpdatePolicy *string `json:"updatePolicy,omitempty"`
	// Additional policies to add for roles
	AdditionalPolicies *map[string]string `json:"additionalPolicies,omitempty"`
	// EnableEtcdTLS indicates the etcd service should use TLS between peers and clients
	EnableEtcdTLS bool `json:"enableEtcdTLS,omitempty"`
Q: Is it possible to put this into the EtcdClusterSpec? I think it's still fine to share keys, but it logically makes more sense (IMO) in there.
So originally it was placed in the EtcdClusterSpec, but the issue was that the setting affects both clusters regardless, since the kube-apiserver would have to be set up to use a client cert and TLS on the etcd endpoints, and the etcd override used for /events reuses the same client cert anyhow. Hence why I threw it on the outside: there is no means to set them independently; both clusters are gated by the same option.

If it were placed inside the spec, should it just throw a validation error if one is set but the other is not?
I agree that it's awkward having to set it twice... I would have guessed that setting the override to http (instead of https) would have worked, but I can easily see that this isn't a scenario that k8s would have tested.
The whole split of events is sort of vestigial anyway.
My 2c is that because it is more logically coherent we should put the TLS config on the EtcdClusterSpec, and raise errors in validation if the value is non-equal across clusters. I imagine we would do the same for etcd versions. I don't think there's a big use case for setting them separately, but I also don't think it's something that will be changed terribly often so I think it's a tolerable inconvenience.
I think in general we want to get people to TLS & etcd3. One way to do that is to have create cluster default new clusters to TLS with etcd3, as of a certain version of k8s.
When we have aggregated api servers, we might have more etcd servers. They likely won't be configured in the same Cluster object, but it would be nice here to have a consistent self-contained object.
But this is my opinion - if you disagree we can do it the other way!
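A rough sketch of the validation proposed here: if EnableEtcdTLS moves onto EtcdClusterSpec, require the value to agree across the main and events clusters. The function name and field placement are assumptions for illustration, not the merged code:

```go
// validateEtcdTLSConsistent raises an error if enableEtcdTLS differs
// between etcd clusters (e.g. main vs events).
func validateEtcdTLSConsistent(clusters []*kops.EtcdClusterSpec) error {
	if len(clusters) < 2 {
		return nil
	}
	want := clusters[0].EnableEtcdTLS
	for _, c := range clusters[1:] {
		if c.EnableEtcdTLS != want {
			return fmt.Errorf("enableEtcdTLS must be set identically across all etcd clusters")
		}
	}
	return nil
}
```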
// ensure we have the correct probe schema
if c.isTLS() {
	container.LivenessProbe.TCPSocket = &v1.TCPSocketAction{
		Host: "127.0.0.1",
Makes sense. I checked on how the GCE configuration does it, and AFAICT they don't use TLS for clients (which are always over localhost), just for peer-to-peer, and then they health-check on the client-port. Your approach is better IMO.
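For context, a hedged sketch of what this probe change amounts to: once the client port serves HTTPS, a plain HTTP GET liveness probe fails its handshake, so the manifest falls back to a TCP socket check. Port 4001 matches the client URL shown later in this thread; intstr is the usual Kubernetes int-or-string helper (k8s.io/apimachinery/pkg/util/intstr).

```go
// When TLS is enabled, drop any HTTP probe and health-check the client
// port with a bare TCP connect instead.
if c.isTLS() {
	container.LivenessProbe.HTTPGet = nil
	container.LivenessProbe.TCPSocket = &v1.TCPSocketAction{
		Host: "127.0.0.1",
		Port: intstr.FromInt(4001),
	}
}
```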
	},
})
// @check if tls is enabled and mount the directory - it might be worth considering
// if we use our own directory in /srv, i.e. /srv/etcd
That would make sense in my mind, in that we want to think about isolating all these keys from each other. I think also using something under /etc/kubernetes would work well. But I also believe it's not too hard to change later.
	return fmt.Errorf("error getting state of file %q: %v", p, err)
}

f, err := os.Create(p)
Note to self: fix the race here :-) os.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0644).
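A small sketch of the race-free variant the note points at, assuming p and the surrounding error flow from the hunk above: O_EXCL makes the open fail if the file appeared between the earlier stat and the create, turning check-then-create into an atomic operation.

```go
// Create the file only if it does not already exist; with O_EXCL the
// kernel rejects the open when another writer got there first.
f, err := os.OpenFile(p, os.O_RDWR|os.O_CREATE|os.O_EXCL, 0644)
if err != nil {
	if os.IsExist(err) {
		// Another writer beat us to it; treating that as done is an
		// assumption for this sketch.
		return nil
	}
	return fmt.Errorf("error creating file %q: %v", p, err)
}
defer f.Close()
```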
protokube/pkg/protokube/kube_boot.go (Outdated)

@@ -86,52 +84,37 @@ func (k *KubeBoot) RunSyncLoop() {
}

func (k *KubeBoot) syncOnce() error {
	if k.Master {
We do only want to run this on the master (i.e. not if gossip dns)
protokube/pkg/protokube/kube_boot.go (Outdated)

if k.Master && k.ApplyTaints {
	if err := ApplyMasterTaints(k.Kubernetes); err != nil {
// apply the kubernetes taints?
if k.ApplyTaints {
And definitely only want to taint the master :-)
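In other words, a sketch re-guarding the taint on the master flag, using the identifiers from the hunk above (the log-and-continue error handling is an assumption; glog was the logger kops used at the time):

```go
// Only ever taint masters; nodes must never receive the master taint.
if k.Master && k.ApplyTaints {
	if err := ApplyMasterTaints(k.Kubernetes); err != nil {
		// Assumed handling for this sketch; the real code may differ.
		glog.Warningf("error applying master taints: %v", err)
	}
}
```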
Looks really good - thanks. The only blockers are that we need protokube in the gossip-dns case, and so we can't strip out the isMaster logic. Other than that I'm inclined to say we merge and iterate - it is effectively gated behind the field. I know there was a concern that if people changed the option for an existing cluster we would have to change the peer URLs, but I put together #3147 (and #3148) which will let us block this "real soon now".

/ok-to-test
hi @justinsb ... thank you for the quick response!!! :-) .. I made the changes requested (see commit below), but just wanted to get feedback on the EtcdClusterSpec; comments left above. Also, should the e2e now support creating a TLS-enabled cluster? Admittedly I wouldn't have a clue how to add that though :-)
The current implementation does not put any transport security on the etcd cluster. This PR provides an optional flag to enable TLS on the etcd cluster:
- cleaned up and fixed any formatting issues on the journey
- added two new certificates (server/client) for etcd peers and a client certificate for kubeapi and others perhaps (perhaps calico?)
- disabled the protokube service for nodes completely, as it is not required; note this was first raised in kubernetes#3091, but figured it would be easier to place in here given the relation
- updated the protokube codebase to reflect the changes, removing the master option as it's no longer required
- added additional integration tests for the protokube manifests
- note: still need to add documentation, but opening the PR to get feedback
- one outstanding issue is the migration from http -> https for preexisting clusters; i'm gonna hit the coreos board to ask for the best options
Looks great: we just need to agree on whether to put TLS into the EtcdClusterSpec (my inclination is yes but I'm open to other views), and then I think we might still be mounting the volumes on the nodes, although I could well have missed it! But then we can merge :-) Re e2e testing, it's actually been very complicated and I'm working on making it simpler... kubernetes/test-infra#3883 will eventually make it simpler. We would need a way to specify tls=true on the command line; I saw a cool proposal on slack to expose ...
@@ -312,23 +282,23 @@ func run() error {
}

k := &protokube.KubeBoot{
	Master: master,
Would certainly be happy to see gossip dns split out (or protokube split out into an etcd piece, an addons piece and a gossip DNS piece), but my guess is that it isn't the most important thing on your plate :-) Protokube etcd also sets some gossip DNS hostnames, so it's not totally trivial, though I don't think it's too bad...
protokube/pkg/protokube/kube_boot.go (Outdated)

	return err
}
// attempt to mount the volumes
volumes, err := k.volumeMounter.mountMasterVolumes()
I may be missing it, but have we stopped calling this on non-masters?
Good spot, I must have messed it up in the revert ... https://github.com/kubernetes/kops/compare/master...gambol99:etcd_tls?diff=split&name=etcd_tls#diff-eed2b06d1a16a4f2b372370d5795f8de
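For reference, a sketch of the guard under discussion, assuming the volumeMounter field from the hunk above; the Volume type and error flow are illustrative:

```go
// Mount master volumes only on masters; on nodes there is nothing to mount.
var volumes []*Volume
if k.Master {
	var err error
	volumes, err = k.volumeMounter.mountMasterVolumes()
	if err != nil {
		return err
	}
}
```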
hi @justinsb ... so I went with your recommendation: you can check this commit, which reverts to using the EtcdClusterSpec rather than the Cluster spec. I did another quick test in AWS to ensure it's cool.

core@ip-10-250-34-40 /etc/kubernetes/manifests $ docker exec -ti c9f8d4180d13 sh
/ # etcdctl -C https://127.0.0.1:4001 --cert-file /srv/kubernetes/etcd-client.pem --key-file /srv/kubernetes/etcd-client-key.pem --ca-file /srv/kubernetes/ca.crt cluster-health
member 30ce51b661909c54 is healthy: got healthy result from https://etcd-b-1.internal.testing.testing.domain.com:4001
member 8529efc60ffe70dd is healthy: got healthy result from https://etcd-a-1.internal.testing.testing.domain.com:4001
member 890f8dac75a542b1 is healthy: got healthy result from https://etcd-a-3.internal.testing.testing.domain.com:4001
member 8c1f371803810af2 is healthy: got healthy result from https://etcd-b-2.internal.testing.testing.domain.com:4001
member c2277fdad4c57074 is healthy: got healthy result from https://etcd-a-2.internal.testing.testing.domain.com:4001
cluster is healthy

And the etcd logs now appear in the kube-system logs:

[jest@starfury kops]$ kubectl --namespace=kube-system logs etcd-server-ip-10-250-34-44.eu-west-2.compute.internal | tail -n 4
2017-08-06 19:23:11.581215 E | rafthttp: failed to dial 890f8dac75a542b1 on stream MsgApp v2 (dial tcp 10.250.34.22:2380: i/o timeout)
2017-08-06 19:23:11.706713 I | rafthttp: the connection with c2277fdad4c57074 became active
2017-08-06 19:23:11.706834 I | rafthttp: the connection with 890f8dac75a542b1 became active
2017-08-06 19:23:11.707051 I | rafthttp: the connection with 30ce51b661909c54 became active

And everything looks good :-)
- changed the location of this variable to be in the etcd cluster spec rather than the kops cluster spec
- reflected the changes against the models
/lgtm

Nice work, sir :-)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gambol99, justinsb. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS files. You can indicate your approval by writing /approve in a comment.
/test all

[submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
@gambol99 did we get docs??