kubeadm: fix flakes when performing etcd MemberAdd on slower setups
In slower setups it can take more time for the existing cluster
to be in a healthy state, so the existing backoff of ~50 seconds
is apparently not sufficient.

The client dial can also fail for similar reasons.

Improve kubeadm's join toleration of adding new etcd members.
Wrap both the client dial and member add in a longer backoff
(up to ~200 seconds).

This particular change should be backported to the support skew.
In a future change for master, all etcd client operations should be
made consistent so that the etcd logic is in a sane state.
neolit123 committed May 14, 2020
1 parent 2db6ec1 commit 74c1fde
Showing 1 changed file with 21 additions and 12 deletions.
33 changes: 21 additions & 12 deletions cmd/kubeadm/app/util/etcd/etcd.go
@@ -269,23 +269,32 @@ func (c *Client) AddMember(name string, peerAddrs string) ([]Member, error) {
 		return nil, errors.Wrapf(err, "error parsing peer address %s", peerAddrs)
 	}

-	cli, err := clientv3.New(clientv3.Config{
-		Endpoints:   c.Endpoints,
-		DialTimeout: dialTimeout,
-		DialOptions: []grpc.DialOption{
-			grpc.WithBlock(), // block until the underlying connection is up
-		},
-		TLS: c.TLS,
-	})
-	if err != nil {
-		return nil, err
+	// Exponential backoff for the MemberAdd operation (up to ~200 seconds)
+	etcdBackoffAdd := wait.Backoff{
+		Steps:    18,
+		Duration: 100 * time.Millisecond,
+		Factor:   1.5,
+		Jitter:   0.1,
 	}
-	defer cli.Close()

 	// Adds a new member to the cluster
 	var lastError error
 	var resp *clientv3.MemberAddResponse
-	err = wait.ExponentialBackoff(etcdBackoff, func() (bool, error) {
+	err = wait.ExponentialBackoff(etcdBackoffAdd, func() (bool, error) {
+		cli, err := clientv3.New(clientv3.Config{
+			Endpoints:   c.Endpoints,
+			DialTimeout: etcdTimeout,
+			DialOptions: []grpc.DialOption{
+				grpc.WithBlock(), // block until the underlying connection is up
+			},
+			TLS: c.TLS,
+		})
+		if err != nil {
+			lastError = err
+			return false, nil
+		}
+		defer cli.Close()
+
 		ctx, cancel := context.WithTimeout(context.Background(), etcdTimeout)
 		resp, err = cli.MemberAdd(ctx, []string{peerAddrs})
 		cancel()
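
The hunk above moves the client dial inside the retried closure, so a failed dial is recorded in `lastError` and retried instead of aborting `AddMember`. The following stdlib-only sketch illustrates that pattern in isolation. `retry`, `addWithRetry`, `closerFunc` and `flakyDial` are hypothetical names introduced for the example, `retry` is a simplified stand-in for `wait.ExponentialBackoff` (no jitter), and the handling after the add call is an assumption because the rest of the function is not shown in this hunk.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// retry is a hypothetical, simplified stand-in for wait.ExponentialBackoff:
// it stops when fn reports done or returns an error, and sleeps between attempts.
func retry(steps int, initial time.Duration, factor float64, fn func() (bool, error)) error {
	d := initial
	for i := 0; i < steps; i++ {
		done, err := fn()
		if err != nil || done {
			return err
		}
		if i < steps-1 {
			time.Sleep(d)
			d = time.Duration(float64(d) * factor)
		}
	}
	return errors.New("timed out waiting for the condition")
}

// addWithRetry dials and performs the add inside every attempt.
func addWithRetry(dial func() (io.Closer, error), add func() error) error {
	var lastError error
	err := retry(5, time.Millisecond, 1.5, func() (bool, error) {
		conn, err := dial()
		if err != nil {
			lastError = err
			return false, nil // not fatal: let the backoff try again
		}
		defer conn.Close() // runs when this attempt's closure returns

		// Assumption: a failed add is also recorded and retried.
		if err := add(); err != nil {
			lastError = err
			return false, nil
		}
		return true, nil
	})
	if err != nil && lastError != nil {
		return lastError // surface the last underlying failure
	}
	return err
}

// closerFunc adapts a function to io.Closer.
type closerFunc func() error

func (f closerFunc) Close() error { return f() }

// flakyDial fails the first `failures` calls, then returns a closer.
// It counts dial attempts and Close calls through the given pointers.
func flakyDial(failures int, dials, closes *int) func() (io.Closer, error) {
	return func() (io.Closer, error) {
		*dials++
		if *dials <= failures {
			return nil, errors.New("connection refused")
		}
		return closerFunc(func() error { *closes++; return nil }), nil
	}
}

func main() {
	dials, closes := 0, 0
	err := addWithRetry(flakyDial(2, &dials, &closes), func() error { return nil })
	fmt.Println(err, dials, closes)
}
```

Because `defer` sits inside the closure, each attempt that dials successfully closes its own connection before the next attempt starts, so retries do not leak clients.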