Handling OptimisticLockError in kubelet node lease controller #79341
a1b3b13 to b6edc95
b6edc95 to 307e1f7
307e1f7 to cf67243
/test pull-kubernetes-e2e-gce-100-performance
	patchCalls chan pair
}

func newFakeLeaseClient() *fakeLeaseClient {
You don't want to do that.
You want to use this one:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/kubernetes/typed/coordination/v1/fake/fake_coordination_client.go
instead.
Then you register Reactor:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/testing/fake.go#L102
and I think you don't need much more...
I simplified the test. Reactor works perfectly.
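For readers unfamiliar with the reactor pattern mentioned above: the fake clientset lets a test prepend a callback that intercepts calls for a given verb and resource. The following is a stdlib-only model of that idea, not the client-go API itself. `fakeClient`, `reaction`, `prependReaction`, `invoke`, and `newFailingOnceClient` are hypothetical names, and the real callback also returns a `runtime.Object`.

```go
package main

import (
	"errors"
	"fmt"
)

// reaction mirrors the shape of a reactor callback: it reports whether it
// handled the call and, if so, which error the call should return.
// (Hypothetical type for this sketch.)
type reaction func(verb, resource string) (handled bool, err error)

// fakeClient is a hypothetical stand-in for a fake clientset.
type fakeClient struct {
	reactions []reaction
	calls     []string // recorded "verb resource" pairs
}

// prependReaction puts r in front so it is consulted before older reactions.
func (f *fakeClient) prependReaction(r reaction) {
	f.reactions = append([]reaction{r}, f.reactions...)
}

// invoke records the call and returns the result of the first reaction
// that handles it; unhandled calls succeed.
func (f *fakeClient) invoke(verb, resource string) error {
	f.calls = append(f.calls, verb+" "+resource)
	for _, r := range f.reactions {
		if handled, err := r(verb, resource); handled {
			return err
		}
	}
	return nil
}

var errConflict = errors.New("conflict")

// newFailingOnceClient returns a fake whose first lease update fails
// with errConflict and whose later calls fall through to the default.
func newFailingOnceClient() *fakeClient {
	c := &fakeClient{}
	failed := false
	c.prependReaction(func(verb, resource string) (bool, error) {
		if verb == "update" && resource == "leases" && !failed {
			failed = true
			return true, errConflict
		}
		return false, nil
	})
	return c
}

func main() {
	c := newFailingOnceClient()
	fmt.Println(c.invoke("update", "leases")) // intercepted by the reaction
	fmt.Println(c.invoke("update", "leases")) // falls through to the default
	fmt.Println(len(c.calls))
}
```

Because the fake records every call, a test can assert both on the returned errors and on how many requests were made, without hand-writing a channel-based fake.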
fdebc07 to d490d9e
I need to take a deeper look too and will have a couple more cosmetic comments.
pkg/kubelet/nodelease/controller.go
Outdated
for i := 0; i < maxUpdateRetries; i++ {
	_, err := c.leaseClient.Update(c.newLease(base))
	// OptimisticLockError requires getting the newer version of lease to proceed.
Move it to the bottom of the loop - then you don't need to define err outside the loop, and in the first iteration it will obviously always be nil.
Done.
I wanted to avoid an additional get call. I assume the code is more readable now?
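To make the loop shape under discussion concrete, here is a minimal stdlib-only sketch of an update loop that re-reads the lease only after a conflict. It also simulates the lost-acknowledgement case from the PR description. Everything here is an assumption for illustration: `leaseStore`, `lease`, `errConflict`, `errTimeout`, and the value of `maxUpdateRetries` are hypothetical and do not come from the kubelet code.

```go
package main

import (
	"errors"
	"fmt"
)

const maxUpdateRetries = 5 // assumed value for this sketch

var (
	errConflict = errors.New("conflict: object has been modified")
	errTimeout  = errors.New("timeout waiting for response")
)

// lease is a hypothetical, stripped-down lease object.
type lease struct {
	resourceVersion int
	renewCount      int
}

// leaseStore is a hypothetical in-memory stand-in for the apiserver.
type leaseStore struct {
	current  lease
	dropAcks int // number of successful writes whose response is "lost"
	gets     int // number of get calls served
}

func (s *leaseStore) get() lease {
	s.gets++
	return s.current
}

// update applies optimistic locking: the write succeeds only if the
// caller's resourceVersion matches the stored one.
func (s *leaseStore) update(l lease) (lease, error) {
	if l.resourceVersion != s.current.resourceVersion {
		return lease{}, errConflict
	}
	l.resourceVersion++
	s.current = l
	if s.dropAcks > 0 {
		s.dropAcks--
		return lease{}, errTimeout // the write landed, but the caller never learns it
	}
	return l, nil
}

// retryUpdate tries to renew base, fetching a fresh copy only after a conflict.
func retryUpdate(s *leaseStore, base lease) (lease, error) {
	for i := 0; i < maxUpdateRetries; i++ {
		next := base
		next.renewCount++
		updated, err := s.update(next)
		if err == nil {
			return updated, nil
		}
		// A conflict means base is stale; get the newer version before retrying.
		if errors.Is(err, errConflict) {
			base = s.get()
		}
	}
	return lease{}, fmt.Errorf("failed %d attempts to update lease", maxUpdateRetries)
}

func main() {
	s := &leaseStore{dropAcks: 1}
	l, err := retryUpdate(s, s.current)
	fmt.Println(l.resourceVersion, s.gets, err)
}
```

In this model the first write succeeds but its response is dropped, the second attempt hits a conflict, and a single extra get lets the third attempt succeed. Without the get, every remaining retry would reuse the stale version and fail.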
pkg/kubelet/nodelease/controller.go
Outdated
for i := 0; i < maxUpdateRetries; i++ {
	_, err := c.leaseClient.Update(c.newLease(base))
	// OptimisticLockError requires getting the newer version of lease to proceed.
	if err != nil && strings.Contains(err.Error(), registry.OptimisticLockErrorMsg) {
You don't want to leak OptimisticLockErrorMsg here. Instead, use the
apierrors.IsConflict() function.
Done.
/assign @wangzhen127
e25b569 to 4dd8892
		UID: types.UID("foo-uid"),
	},
}
noConnectionUpdateErr := apierrors.NewServerTimeout(schema.GroupResource{Group: "v1", Resource: "lease"}, "put", 1)
gr := schema.GroupResource{Group: "v1", Resource: "lease"}
and use it here and below (the lines will be shorter then).
optimistcLockUpdateErr := apierrors.NewConflict(schema.GroupResource{Group: "v1", Resource: "lease"}, "lease", fmt.Errorf("conflict"))
cases := []struct {
	desc string
	updateReactor func(action clienttesting.Action) (handled bool, ret runtime.Object, err error)
Remove the parameter names here and below.
Done.
}{
	{
		desc: "no errors",
		updateReactor: func(action clienttesting.Action) (handled bool, ret runtime.Object, err error) {
remove names of output params - they are not used anywhere
both here and in all following cases.
Done.
4dd8892 to 8a66376
/test pull-kubernetes-bazel-test
	leaseDurationSeconds: 10,
	onRepeatedHeartbeatFailure: tc.onRepeatedHeartbeatFailure,
}
if err := c.retryUpdateLease(nil); fmt.Sprintf("%v", err) != fmt.Sprintf("%v", tc.expectErr) {
Why do we need fmt.Sprintf?
Please fix this so that we can simply compare errors.
This is a hacky way to compare errors - normally you use Error(), however this will not work with nil. To avoid nil checks etc., Sprintf is used: it calls Error() if err != nil, and otherwise just returns "<nil>".
Assuming that we don't really need to compare errors, but rather check whether an error was returned or not, I can change expectErr to bool and verify whether err != nil. WDYT?
Assuming that we don't really need to compare errors, but rather check whether an error was returned or not, I can change expectErr to bool and verify whether err != nil. WDYT?
+1
Done.
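The approach agreed on in this thread, a boolean expectErr checked against err != nil, can be sketched as follows. This is an illustrative stdlib-only example, not the test from the PR. `renew` and `checkCase` are hypothetical helpers, and the "update fails" case name is made up.

```go
package main

import (
	"errors"
	"fmt"
)

// renew is a hypothetical function under test: it fails when fail is true.
func renew(fail bool) error {
	if fail {
		return errors.New("failed to update lease")
	}
	return nil
}

// checkCase returns a non-empty description when the outcome does not
// match expectErr, and an empty string when the case passes.
func checkCase(desc string, err error, expectErr bool) string {
	if (err != nil) != expectErr {
		return fmt.Sprintf("%s: got error %v, expectErr=%t", desc, err, expectErr)
	}
	return ""
}

func main() {
	cases := []struct {
		desc      string
		fail      bool
		expectErr bool
	}{
		{desc: "no errors", fail: false, expectErr: false},
		{desc: "update fails", fail: true, expectErr: true},
	}
	for _, tc := range cases {
		if msg := checkCase(tc.desc, renew(tc.fail), tc.expectErr); msg != "" {
			fmt.Println("FAIL:", msg)
		}
	}

	// The earlier string comparison relied on a nil error formatting as "<nil>".
	s := fmt.Sprintf("%v", error(nil))
	fmt.Println(s)
}
```

Comparing `(err != nil) != tc.expectErr` handles the nil case directly, so the test no longer depends on how errors happen to be formatted.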
8a66376 to d45197a
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: krzysied, wojtek-t
What type of PR is this?
/kind bug
What this PR does / why we need it:
Updating the node lease assumes that the lease has not changed between the get and update calls. It sometimes happens that the kubelet doesn't get an ack from the apiserver even though the node lease was updated correctly. The kubelet then tries to update the node lease again, which fails because the apiserver has a newer version of the lease. This PR adds an additional lease get when an optimistic lock error happens.
This bug was described by @mborsz in #79096 (comment)
Special notes for your reviewer:
Does this PR introduce a user-facing change?: