kubeadm: increase robustness for kubeadm etcd operations #92131

Merged
merged 1 commit into from Jul 1, 2020

SataQiu
Member

@SataQiu SataQiu commented Jun 15, 2020

What type of PR is this?
/kind bug
/kind cleanup

What this PR does / why we need it:
Make the etcd client retry logic consistent and increase the robustness of kubeadm etcd operations.

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#2181

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. labels Jun 15, 2020
@k8s-ci-robot
Contributor

@SataQiu: The label(s) kind/cleanu cannot be applied, because the repository doesn't have them


@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 15, 2020
@SataQiu
Member Author

SataQiu commented Jun 15, 2020

/kind cleanup

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 15, 2020
@SataQiu
Member Author

SataQiu commented Jun 15, 2020

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 15, 2020
@SataQiu
Member Author

SataQiu commented Jun 15, 2020

/assign @neolit123

@SataQiu
Member Author

SataQiu commented Jun 15, 2020

/retest

var err error
cli, err = clientv3.New(clientv3.Config{
	Endpoints:   c.Endpoints,
	DialTimeout: dialTimeout,
Member

for MemberAdd we made this equal to etcdTimeout, which means that we can potentially remove the dialTimeout var.

the note at https://github.com/kubernetes/kubernetes/blob/64c70a7f0f4a55398b6bc7fc1be0fde3dcaa97bb/cmd/kubeadm/app/util/etcd/etcd.go#L213-L216 is concerning, but maybe our new retry scheme would help with the arm64 issue too.

Member Author

+1, let's do it.
If there's a problem, we can revert it.

Member

@neolit123 neolit123 left a comment

thanks @SataQiu.
added one comment about making the dial timeout consistent too.

/assign @fabriziopandini

Signed-off-by: SataQiu <1527062125@qq.com>
@neolit123
Member

/lgtm
deferring to @fabriziopandini for approval.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2020
@SataQiu
Member Author

SataQiu commented Jun 16, 2020

/retest

@SataQiu
Member Author

SataQiu commented Jun 17, 2020

/test pull-kubernetes-node-e2e

@SataQiu
Member Author

SataQiu commented Jun 19, 2020

/cc @fabriziopandini

@neolit123
Member

/lgtm

@neolit123
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, SataQiu

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2020
@dims
Member

dims commented Jun 29, 2020

/test pull-kubernetes-integration
/test pull-kubernetes-kubemark-e2e-gce-big

@SataQiu
Member Author

SataQiu commented Jun 30, 2020

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@neolit123
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit 4c523b1 into kubernetes:master Jul 1, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jul 1, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

make the etcd client retry logic consistent
6 participants