kubeadm: fix flakes when performing etcd MemberAdd on slower setups #90645

neolit123 · 2020-04-30T15:57:47Z

What this PR does / why we need it:

In slower setups it can take more time for the existing cluster
to be in a healthy state, so the existing backoff of ~50 seconds
is apparently not sufficient.

The client dial can also fail for similar reasons.

Make "kubeadm join" more tolerant of failures when adding new etcd members.
Wrap both the client dial and the member add in a longer backoff
(up to ~200 seconds).

This particular change should be backported to the releases in the support skew.
In a future change for master, all etcd client operations should be
made consistent, so that the etcd logic is in a sane state.

Which issue(s) this PR fixes:

refs kubernetes/kubeadm#2094

Special notes for your reviewer:
NONE

Does this PR introduce a user-facing change?:

kubeadm: increase robustness for "kubeadm join" when adding etcd members on slower setups

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

neolit123 · 2020-04-30T15:58:16Z

/kind bug

neolit123 · 2020-04-30T15:58:58Z

/priority important-soon

neolit123 · 2020-05-01T17:58:54Z

/approve cancel
/assign @fabriziopandini

aojea · 2020-05-02T23:09:26Z

/cc @ereslibre

fabriziopandini

@neolit123 thanks!
/approve
/lgtm

/hold
for @ereslibre to review the changes

k8s-ci-robot · 2020-05-04T07:40:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [fabriziopandini]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

neolit123 · 2020-05-10T23:20:44Z

applying lazy consensus until Tuesday next week.
@kubernetes/sig-cluster-lifecycle-pr-reviews

/retest

neolit123 · 2020-05-13T01:48:09Z

/hold cancel

fejta-bot · 2020-05-13T08:04:26Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot · 2020-05-13T12:16:16Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

k8s-ci-robot added the kind/bug label and removed the needs-kind label Apr 30, 2020

k8s-ci-robot added the approved label Apr 30, 2020

k8s-ci-robot requested review from detiber and yagonobre April 30, 2020 15:59

neolit123 changed the title ~~WIP: kubeadm: fix flakes when performing etcd MemberAdd on slower setups~~ kubeadm: fix flakes when performing etcd MemberAdd on slower setups May 1, 2020

k8s-ci-robot removed the do-not-merge/work-in-progress label May 1, 2020

k8s-ci-robot assigned fabriziopandini May 1, 2020

k8s-ci-robot removed the approved label May 1, 2020

k8s-ci-robot requested a review from ereslibre May 2, 2020 23:09

fabriziopandini approved these changes May 4, 2020

k8s-ci-robot added the do-not-merge/hold and lgtm labels May 4, 2020

k8s-ci-robot added the approved label May 4, 2020

k8s-ci-robot removed the do-not-merge/hold label May 13, 2020

k8s-ci-robot merged commit 3b02433 into kubernetes:master May 13, 2020

k8s-ci-robot added this to the v1.19 milestone May 13, 2020

This was referenced May 14, 2020

Automated cherry pick of #90645: kubeadm: fix flakes when performing etcd MemberAdd on slower #91079

Merged

Automated cherry pick of #90645: kubeadm: fix flakes when performing etcd MemberAdd on slower #91080

Merged

k8s-ci-robot added a commit that referenced this pull request May 18, 2020

Merge pull request #91080 from neolit123/automated-cherry-pick-of-#90…

e848e58

…645-origin-release-1.18 Automated cherry pick of #90645: kubeadm: fix flakes when performing etcd MemberAdd on slower

neolit123 mentioned this pull request May 19, 2020

Increase robustness for kubeadm join / add etcd kubernetes/kubeadm#2094

Closed

k8s-ci-robot added a commit that referenced this pull request May 30, 2020

Merge pull request #91079 from neolit123/automated-cherry-pick-of-#90…

9e7f164

…645-origin-release-1.17 Automated cherry pick of #90645: kubeadm: fix flakes when performing etcd MemberAdd on slower

neolit123 mentioned this pull request Jun 11, 2020

make the etcd client retry logic consistent kubernetes/kubeadm#2181

Closed
