kubeadm stacked etcd #69486

Merged
merged 2 commits into kubernetes:master on Oct 27, 2018

Conversation

@fabriziopandini
Contributor

fabriziopandini commented Oct 6, 2018

What this PR does / why we need it:
kubeadm now automatically creates a new stacked etcd member when joining a new control plane node (does not apply to external etcd)

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1123

Special notes for your reviewer:
IMO two points deserve more attention:

  • The fact that the local/stacked etcd client now uses the IP defined by the API server advertise address
  • How the joining node discovers the list of endpoints for the existing etcd members

I will keep this in WIP until there is agreement on the above points

Release note:

kubeadm now automatically creates a new stacked etcd member when joining a new control plane node (does not apply to external etcd)

/sig cluster-lifecycle
/kind feature

/cc @timothysc
/cc @chuckha
/cc @detiber
@kubernetes/sig-cluster-lifecycle-pr-reviews

@neolit123

thanks for the PR @fabriziopandini . added a couple of comments.

// advertise address
advertiseAddress := net.ParseIP(cfg.APIEndpoint.AdvertiseAddress)
if advertiseAddress == nil {
return nil, fmt.Errorf("error parsing APIEndpoint AdvertiseAddress %v: is not a valid textual representation of an IP address", cfg.APIEndpoint.AdvertiseAddress)

@neolit123

neolit123 Oct 7, 2018

Member

probably better %v: to be %q?
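
A minimal sketch of the suggestion above, assuming a hypothetical helper `parseAdvertiseAddress` (not a function from the PR). With `%q` the offending value is quoted, so an empty or whitespace-only address is still visible in the error message:

```go
package main

import (
	"fmt"
	"net"
)

// parseAdvertiseAddress is a hypothetical helper, not part of kubeadm.
// It parses the address and quotes the raw input with %q on failure.
func parseAdvertiseAddress(addr string) (net.IP, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return nil, fmt.Errorf("error parsing APIEndpoint AdvertiseAddress %q: not a valid textual representation of an IP address", addr)
	}
	return ip, nil
}

func main() {
	// The empty string shows why quoting helps: %v would print nothing at all.
	for _, addr := range []string{"10.0.0.1", "", "not an ip"} {
		if _, err := parseAdvertiseAddress(addr); err != nil {
			fmt.Println(err)
		}
	}
}
```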

}
// notifies the other members of the etcd cluster about the joining member
etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)

@neolit123

neolit123 Oct 7, 2018

Member

hm, did we investigate if this approach is safe?
what happens if the port is already taken on that endpoint?

@timothysc

timothysc Oct 10, 2018

Member

That's the default peer port and "shouldn't collide" but we should definitely add a check here.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

added check

// notifies the other members of the etcd cluster about the joining member
etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)
glog.V(1).Infof("Adding etcd Member %s", etcdPeerAddress)

@neolit123

neolit123 Oct 7, 2018

Member

Member -> member:

}
glog.V(1).Infof("Updated etcd member list %v", initialCluster)
glog.V(1).Infoln("creating local etcd static pod manifest file")

@neolit123

neolit123 Oct 7, 2018

Member

possibly creating should be uppercase for consistency?

@timothysc

timothysc Oct 10, 2018

Member

There was a recent issue about golang 1.11 as well needing to use Infof vs. Infoln.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

using Infoln

}
if len(initialCluster) == 0 {
defaultArguments["initial-cluster"] = fmt.Sprintf("%s=https://%s:2380", cfg.GetNodeName(), cfg.APIEndpoint.AdvertiseAddress)

@neolit123

neolit123 Oct 7, 2018

Member

should we make such ports into constants?

@timothysc

timothysc Oct 10, 2018

Member

Yes we should add them to the default consts file.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

done in a separate PR to keep the scope of this PR as small as possible
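
For illustration, the constants could look roughly like the sketch below. The names `etcdClientPort`, `etcdPeerPort`, `peerURL` and `initialClusterEntry` are assumptions; the thread does not show the names used in the follow-up PR. The values are the etcd default client and peer ports. `net.JoinHostPort` is used here instead of `fmt.Sprintf("https://%s:2380", ...)` so that an IPv6 host is bracketed correctly.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Hypothetical constant names; the values are the etcd defaults.
const (
	etcdClientPort = 2379
	etcdPeerPort   = 2380
)

// peerURL builds the peer URL for a host, bracketing IPv6 literals.
func peerURL(host string) string {
	return "https://" + net.JoinHostPort(host, strconv.Itoa(etcdPeerPort))
}

// initialClusterEntry renders one "name=peerURL" pair for --initial-cluster.
func initialClusterEntry(nodeName, host string) string {
	return fmt.Sprintf("%s=%s", nodeName, peerURL(host))
}

func main() {
	fmt.Println(initialClusterEntry("master-0", "10.0.0.1"))
	fmt.Println("client port:", etcdClientPort)
}
```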

ret := map[string]string{}
for _, m := range resp.Members {
// fixes the entry for of the joining member (that doesn't have a name set in the initialCluster returned by etcd)

@neolit123

neolit123 Oct 7, 2018

Member

for of -> for or of only.

func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {
cli, err := clientv3.New(clientv3.Config{
Endpoints: c.Endpoints,
DialTimeout: 5 * time.Second,

@neolit123

neolit123 Oct 7, 2018

Member

i wonder if the connection times increase with the number of etcd pods.

@timothysc

timothysc Oct 10, 2018

Member

I would up the timeout for new connection. Default client inside the api-server is 20 seconds.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

increased timeout to 20 seconds

@chuckha

Is the expectation that we've already copied over the etcd ca key/cert and generated the correct certificates?

@@ -36,10 +39,47 @@ const (
)
// CreateLocalEtcdStaticPodManifestFile will write local etcd static pod manifest file.
// This function is used by init (when there the etcd cluster is empty) or by kubeadm

@chuckha

chuckha Oct 8, 2018

Member

"(when there the etcd cluster is empty)" => "(when the etcd cluster is empty)"

}
// AddMember notifies an existing etcd cluster that a new member is joining
func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {

@chuckha

chuckha Oct 8, 2018

Member

I almost want this map[string]string to be a struct, something like etcdClusterConfiguration or similar. Do you think that would add anything here?

@timothysc

timothysc Oct 10, 2018

Member

I don't feel strongly here b/c we're re-wrapping an etcd member struct. I would change it to [string]IP at a minimum though.

@timothysc

timothysc Oct 10, 2018

Member

Also testability of this function will require using some of the etcd utils I built eons ago under apiserver/storage

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

done using a struct
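
The struct itself is not shown in this thread. A rough sketch of the idea, assuming a hypothetical `Member` type and `initialCluster` helper: the member list is kept as typed values and rendered into the `--initial-cluster` value only at the end. The code comment quoted earlier notes that the joining member has no name in the list returned by etcd, so the sketch fills it in by matching on the peer URL.

```go
package main

import (
	"fmt"
	"strings"
)

// Member is a hypothetical stand-in for the struct mentioned above.
type Member struct {
	Name    string
	PeerURL string
}

// initialCluster renders members as "name=peerURL" pairs joined by commas.
// An unnamed member whose peer URL matches joiningPeerURL gets joiningName.
func initialCluster(members []Member, joiningName, joiningPeerURL string) string {
	entries := make([]string, 0, len(members))
	for _, m := range members {
		name := m.Name
		if name == "" && m.PeerURL == joiningPeerURL {
			name = joiningName
		}
		entries = append(entries, fmt.Sprintf("%s=%s", name, m.PeerURL))
	}
	return strings.Join(entries, ",")
}

func main() {
	members := []Member{
		{Name: "master-0", PeerURL: "https://10.0.0.1:2380"},
		{Name: "", PeerURL: "https://10.0.0.2:2380"}, // the member that was just added
	}
	fmt.Println(initialCluster(members, "master-1", "https://10.0.0.2:2380"))
}
```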

return nil, err
}
// Note for reviewers: I'm not sure this is the best method for getting the endpoint

@chuckha

chuckha Oct 8, 2018

Member

I think this is a good approach with one minor concern: there is a race condition that the PodIP may change after the PodIP lookup and before our client access, but I'm ok living with that possibility.

I wonder if there is a clean way to solve this by managing etcd as a statefulset. The other thought I had was adding a Service object to the static manifest and managing that by hand (by kubeadm). The goal of these two ideas is to provide a stable DNS name instead of a PodIP lookup. Either way, those types of changes would be way out of scope for this PR.

@timothysc

timothysc Oct 10, 2018

Member

So the ClusterStatus should have the endpoints.
Or as discussed on the call put the meta-data in an annotation for the pod to make it easier to extract.

@detiber

detiber Oct 11, 2018

Member

+1 to using ClusterStatus or a pre-set annotation to use as a starting point.

After we have the initial endpoint list, we should use that to query the etcd cluster itself for the list of endpoints.

@detiber

detiber Oct 11, 2018

Member

Potentially, we should also raise an error if the queried endpoints do not match the expected endpoints, or if the cluster is not in a healthy state to start.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

done reading from cluster status + calling sync to get the real list of endpoints from etcd
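
The suggestion to raise an error when the queried endpoints differ from the expected ones could be sketched as below. This is an illustration only, not code from the PR: `compareEndpoints` is a hypothetical helper, and both lists are plain strings here rather than values read from ClusterStatus or an etcd client.

```go
package main

import (
	"fmt"
	"sort"
)

// compareEndpoints is a hypothetical helper. It returns the endpoints that
// were expected but not reported, and those reported but not expected.
func compareEndpoints(expected, actual []string) (missing, unexpected []string) {
	want := make(map[string]bool, len(expected))
	for _, e := range expected {
		want[e] = true
	}
	got := make(map[string]bool, len(actual))
	for _, e := range actual {
		got[e] = true
	}
	for e := range want {
		if !got[e] {
			missing = append(missing, e)
		}
	}
	for e := range got {
		if !want[e] {
			unexpected = append(unexpected, e)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	expected := []string{"https://10.0.0.1:2379", "https://10.0.0.2:2379"}
	actual := []string{"https://10.0.0.1:2379", "https://10.0.0.3:2379"}
	missing, unexpected := compareEndpoints(expected, actual)
	if len(missing) > 0 || len(unexpected) > 0 {
		fmt.Printf("endpoint mismatch: missing=%v unexpected=%v\n", missing, unexpected)
	}
}
```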

@timothysc

Bunch of comments, plus we should highlight that folks will need to do this on odd numbers for stacked deploys.
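
The reason for odd numbers follows from quorum arithmetic: a cluster of n etcd members needs a majority of n/2 + 1 (integer division) to make progress, so an even-sized cluster tolerates no more failures than the odd size below it. A small sketch (the function names are illustrative only):

```go
package main

import "fmt"

// quorum is the number of members that must agree: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance is how many members can fail while quorum is still reachable.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

Going from 3 to 4 members raises the quorum from 2 to 3 while the number of tolerated failures stays at 1.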

@@ -334,8 +338,8 @@ func GetEtcdPeerAltNames(cfg *kubeadmapi.InitConfiguration) (*certutil.AltNames,
// create AltNames with defaults DNSNames/IPs
altNames := &certutil.AltNames{
- DNSNames: []string{cfg.NodeRegistration.Name, "localhost"},
- IPs: []net.IP{advertiseAddress, net.IPv4(127, 0, 0, 1), net.IPv6loopback},
+ DNSNames: []string{cfg.NodeRegistration.Name},

@timothysc

timothysc Oct 10, 2018

Member

Why are you removing loopback from the SAN?

@timothysc

timothysc Oct 10, 2018

Member

I'd have to dig through history but I remember us adding it on purpose.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

sounds strange, but restored

if err != nil {
return err
}
glog.V(1).Infof("Updated etcd member list %v", initialCluster)

@timothysc

timothysc Oct 10, 2018

Member

We should do a local client member check, to verify it's correct, and to list the current members in the log line.

@timothysc

Member

timothysc commented Oct 10, 2018

etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)
glog.V(1).Infof("Adding etcd Member %s", etcdPeerAddress)
initialCluster, err := etcdClient.AddMember(cfg.NodeRegistration.Name, etcdPeerAddress)

@detiber

detiber Oct 11, 2018

Member

Is this method used during upgrade? If so, it seems odd to call 'AddMember()' for an instance that is already a member of the etcd cluster.

@fabriziopandini

fabriziopandini Oct 21, 2018

Contributor

No, this is used only when adding a new control plane instance

@fabriziopandini

Contributor

fabriziopandini commented Oct 21, 2018

@neolit123 @chuckha @timothysc @detiber Thanks for the valuable feedback!

Now

  1. The list of etcd members is initially discovered from cluster status, and then I query etcd to get the real list of members.
  2. Upgrades now also work with stacked etcd

The last open point to be addressed before removing WIP is the use of the API server advertise address, which can lead to problems if the user chooses an advertise address that doesn't correspond to any IP address on the machine

@timothysc

Member

timothysc commented Oct 24, 2018

@fabriziopandini Is this ready to go? If so can you remove the WIPs from this and other PRs.

@fabriziopandini

Contributor

fabriziopandini commented Oct 24, 2018

@timothysc the last pending point is the discussion about using the API server advertise address for etcd

@timothysc

Member

timothysc commented Oct 24, 2018

@fabriziopandini ^ want to update now?

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Oct 24, 2018

@k8s-ci-robot

Contributor

k8s-ci-robot commented Oct 24, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, timothysc

@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 26, 2018

@fabriziopandini fabriziopandini changed the title from [WIP] - kubeadm stacked etcd to kubeadm stacked etcd Oct 26, 2018

@timothysc timothysc added the lgtm label Oct 26, 2018

@fabriziopandini

Contributor

fabriziopandini commented Oct 26, 2018

@timothysc this is ready to go for me now
If we could get some help testing this before the end of the cycle, that would be great...

@fabriziopandini

Contributor

fabriziopandini commented Oct 26, 2018

/test pull-kubernetes-e2e-kops-aws

@fejta-bot

fejta-bot commented Oct 26, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fabriziopandini

Contributor

fabriziopandini commented Oct 26, 2018

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 26, 2018

@neolit123

Member

neolit123 commented Oct 27, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Oct 27, 2018

@k8s-ci-robot k8s-ci-robot merged commit 481fa19 into kubernetes:master Oct 27, 2018

18 checks passed

cla/linuxfoundation fabriziopandini authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
tide In merge pool.