client-go: refactor retry logic for backoff, rate limiter and metric to be reused by Watch, Stream, and Do #108347
Conversation
```go
// we can merge these two sleeps:
// BackOffManager.Sleep(max(backoffManager.CalculateBackoff(), retryAfter))
// see https://github.com/kubernetes/kubernetes/issues/108302
request.backoff.Sleep(r.retryAfter.Wait)
```
should this sleep take the context into account? It seems the context can be cancelled while we are sleeping.
i think it should; maybe the `BackoffManager` interface predates `context`. We allow users to specify their own `BackoffManager` instance, so adding a context parameter to `Sleep` would be a breaking change.
yeah, but at least we should have some place here that checks the context, or we'll retry with a context that was already cancelled. Maybe at the end of the function we can return `ctx.Err()`, or check that it is not nil?
`func (r *withRetry) prepareForNextRetry(ctx context.Context, request *Request) error` is doing the check for the context too.
```go
defer func() {
	// we are done with this attempt, start with a clean slate
	r.retryAfter = nil
}()
```
question: why defer if there is only one exit path?
```go
// we always do a backoff sleep including the first try
request.backoff.Sleep(request.backoff.CalculateBackoff(url))
```
does it simplify anything if we sum the values? `request.backoff.CalculateBackoff(url) + r.retryAfter.Wait`
I think the suggestion from @wojtek-t was to use `max(backoffManager.CalculateBackoff(), retryAfter)`. This will be done in a follow-up PR; this PR is refactor-only.
```go
// before:
func (r *withRetry) BeforeNextRetry(ctx context.Context, backoff BackoffManager, retryAfter *RetryAfter, url string, body io.Reader) error {
// after:
func (r *withRetry) PrepareForNextRetry(ctx context.Context, request *Request) error {
```
the pattern seems to always be

```go
if r.retry.IsNextRetry(req, resp, err, neverRetryError) {
	err := r.retry.PrepareForNextRetry(ctx, r)
	if err == nil {
		return false, nil
```

and this `PrepareForNextRetry` does 3 additional checks; the only new one is checking the output of `IsNextRetry`, i.e. that `r.retryAfter != nil`. Should we merge these 2 methods? Is `PrepareForNextRetry` really needed now?
that's a good suggestion, i combined these two; now `Stream`, `Watch`, and `Do` look easier to follow.
```go
	// we are done with this attempt, start with a clean slate
	r.retryAfter = nil
}()
updateURLMetrics(ctx, request, resp, err)
```
I feel the metrics do not belong here, inside the retry logic.
yes, i moved the metric out to its original place.

68c3b8e to b7ae62f (Compare)
this looks really nice,
/retest
This is nice
```go
// if retry is set to true, retryAfter will contain the information
// regarding the next retry.
// IsNextRetry internally maintains the retry after
// parameters - retry reason, and wait duration associated
```
I don't understand this part of the comment - what parameters? Where are those?
yeah, it's confusing; i moved the comments below, where it's more localized.
```diff
@@ -918,8 +866,7 @@ func (r *Request) request(ctx context.Context, fn func(*http.Request, *http.Resp
 	fn(req, resp)
 }

-	var retry bool
-	retryAfter, retry = r.retry.NextRetry(req, resp, err, func(req *http.Request, err error) bool {
+	if retry := r.retry.IsNextRetry(ctx, r, req, resp, err, func(req *http.Request, err error) bool {
```
nit: can you define the input function explicitly (same as in line 605 for `Watch()`)?
ping (this comment and the other)
as soon as I define an explicit func, `TestDoRequestSuccess` keeps failing; it's very strange. I will pursue it in a follow-up PR if that is okay.
i had a typo, it's fixed now.
/triage accepted
edac4aa to 1553996 (Compare)
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: tkashem, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/test pull-kubernetes-e2e-gce-ubuntu-containerd
Kubernetes e2e suite: [sig-cli] Kubectl client Simple pod should contain last line of the log
```go
	return nil
}

func (r *withRetry) After(ctx context.Context, request *Request, resp *http.Response, err error) {
```
Why are we updating backoff here? Why can't this be in Before?
we update the backoff after we get an answer tuple (response, err) from the server. If we store the answer tuple then maybe we can update the backoff in `Before` and thus get rid of `After`. I can look into it when I do the follow-up refactor.
On the other hand, `After` can be a useful place to call metrics, update back-off and such. We can decide whether it makes sense to keep `After` when we do the follow-up refactor.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Refactors the retry logic for backoff, rate limiter and metrics so it can be reused by `Watch`, `Stream`, and `Do`.
Which issue(s) this PR fixes:
Fixes #108302
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: