kubeadm: fix some retry logic in PatchNodeOnce #105343
Hi @jonyhy96. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/release-note-none
/priority awaiting-more-evidence
Thanks for the review! ping @neolit123
bfbc5eb to 8e5ca51
/retest
the kind CI failures seem somewhat related to the patch node changes here:
we should be careful with similar refactors to not break kubeadm, given the lack of unit tests.
This failure happened because the backoff did not leave enough time to retry.
/test pull-kubernetes-e2e-gce-ubuntu-containerd
/test pull-kubernetes-e2e-kind
	Factor: 2.0,
	Jitter: 0,
}
err := wait.ExponentialBackoff(backOff, PatchNodeOnce(client, nodeName, patchFn, &lastError))
something is not right here. why did wait.Poll(constants.APICallRetryInterval, constants.PatchNodeTimeout...
work before the change but not after it?
these interval / timeout constants have been used for a very long time in kubeadm.
wait.Poll's retry logic is different from wait.ExponentialBackoff's.
Failure log from https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/105343/pull-kubernetes-e2e-kind/1447957425293365248/build-log.txt:
I1012 16:23:28.581402 214 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///run/containerd/containerd.sock" to the Node API object "kind-worker2" as an annotation
I1012 16:23:28.599223 214 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 404 Not Found in 17 milliseconds
I1012 16:23:28.612455 214 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 404 Not Found in 1 milliseconds
I1012 16:23:28.666911 214 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 404 Not Found in 2 milliseconds
I1012 16:23:28.937311 214 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 404 Not Found in 2 milliseconds
Success log from https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/105343/pull-kubernetes-e2e-kind/1448583659891200000/build-log.txt:
I1014 09:54:30.367606 213 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///run/containerd/containerd.sock" to the Node API object "kind-worker2" as an annotation
I1014 09:54:30.390173 213 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 404 Not Found in 22 milliseconds
I1014 09:54:30.895376 213 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 404 Not Found in 4 milliseconds
I1014 09:54:31.900145 213 round_trippers.go:541] GET https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 200 OK in 3 milliseconds
I1014 09:54:31.912029 213 round_trippers.go:541] PATCH https://kind-control-plane:6443/api/v1/nodes/kind-worker2?timeout=10s 200 OK in 6 milliseconds
We can see that with DefaultBackoff in ExponentialBackoff, the retries step too fast to wait for kind-worker2 to register with the apiserver. wait.Poll retries every 500ms and times out after 2min, which gives it enough time to wait. This backOff is similar to the behavior of wait.Poll in that it times out at 127.5s.
I know that the functions are different. It seems out of scope of this PR to change the function. We should continue to use Poll with the old constants.
Done, use wait.Poll instead
thanks, this looks good now.
please squash the commits to 1 and i will LGTM.
/remove-priority awaiting-more-evidence
/priority backlog
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jonyhy96, neolit123
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: haoyun <yun.hao@daocloud.io>
50dc3bc to bd8f26c
/test pull-kubernetes-unit
/lgtm
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fix the retry logic in PatchNodeOnce.
Special notes for your reviewer:
None