client-go: add unit test to verify order of calls with retry #108262
Conversation
(force-pushed 396317f to 166c7d2)
(force-pushed 166c7d2 to 6f878de)
@aojea that's great, I will review the retry metric PR you are working on. In the meantime, this unit test PR can still merge; it asserts on the expected behavior with the rate limiter, backoff, and the existing metric call. Can you review this PR please? :)
I think we should test more permutations, something like (it can be simplified.) 😄

diff --git a/staging/src/k8s.io/client-go/rest/request_test.go b/staging/src/k8s.io/client-go/rest/request_test.go
index 49689119223..8ac86355183 100644
--- a/staging/src/k8s.io/client-go/rest/request_test.go
+++ b/staging/src/k8s.io/client-go/rest/request_test.go
@@ -3000,8 +3000,10 @@ func (lb *withRateLimiterBackoffManagerAndMetrics) Do() {
func testRetryWithRateLimiterBackoffAndMetrics(t *testing.T, key string, doFunc func(ctx context.Context, r *Request)) {
type expected struct {
- attempts int
- order []string
+ attempts int
+ order []string
+ sleeps []string
+ statusCodes []string
}
// we define the expected order of how the client invokes the
@@ -3010,47 +3012,6 @@ func testRetryWithRateLimiterBackoffAndMetrics(t *testing.T, key string, doFunc
// - A: original request fails with a retryable response: (500, 'Retry-After: 1')
// - B: retry 1: successful with a status code 200
// so we have a total of 2 attempts
- callOrderExpected := []string{
- // before we send the request to the server:
- // - we wait as dictated by the client rate lmiter
- // - we wait, as dictated by the backoff manager
- "RateLimiter.Wait",
- "BackoffManager.CalculateBackoff",
- "BackoffManager.Sleep",
-
- // A: first attempt for which the server sends a retryable response
- "Client.Do",
-
- // we got a response object, status code: 500, Retry-Afer: 1
- // - call metrics method with appropriate status code
- // - update backoff parameters with the status code returned
- // - sleep for N seconds from 'Retry-After: N' response header
- "RequestResult.Increment",
- "BackoffManager.UpdateBackoff",
- "BackoffManager.Sleep",
- // sleep for delay dictated by backoff parameters
- "BackoffManager.CalculateBackoff",
- "BackoffManager.Sleep",
- // wait as dictated by the client rate lmiter
- "RateLimiter.Wait",
-
- // B: 2nd attempt: retry, and this should return a status code=200
- "Client.Do",
-
- // it's a success, so do the following:
- // - call metrics and update backoff parameters
- "RequestResult.Increment",
- "BackoffManager.UpdateBackoff",
- }
- sleepExpected := []string{
- "0s", // initial backoff.Sleep before we send the request to the server for the first time
- (1 * time.Second).String(), // from 'Retry-After: 1' response header
- (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
- }
- statusCodesExpected := []string{
- "500",
- "200",
- }
tests := []struct {
name string
@@ -3060,7 +3021,76 @@ func testRetryWithRateLimiterBackoffAndMetrics(t *testing.T, key string, doFunc
expectations map[string]expected
}{
{
- name: "success after two retries",
+ name: "success",
+ maxRetries: 2,
+ serverReturns: []responseErr{
+ {response: &http.Response{StatusCode: http.StatusOK}, err: nil},
+ },
+ expectations: map[string]expected{
+ "Do": {
+ attempts: 1,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first and only attempt, for which the server sends a success response
+ "Client.Do",
+
+ // it's a success, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ },
+ statusCodes: []string{
+ "200",
+ },
+ },
+ "Watch": {
+ attempts: 1,
+ // Watch does not do 'RateLimiter.Wait' before initially sending the request to the server
+ order: []string{
+ // before we send the request to the server:
+ // - we wait, as dictated by the backoff manager
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // A: first and only attempt, for which the server sends a success response
+ "Client.Do",
+ // it's a success, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ },
+ "Stream": {
+ attempts: 1,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first and only attempt, for which the server sends a success response
+ "Client.Do",
+
+ // it's a success, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ },
+ },
+ },
+ {
+ name: "success after 2 retries",
maxRetries: 2,
serverReturns: []responseErr{
{response: retryAfterResponse(), err: nil},
@@ -3069,19 +3099,358 @@ func testRetryWithRateLimiterBackoffAndMetrics(t *testing.T, key string, doFunc
expectations: map[string]expected{
"Do": {
attempts: 2,
- order: callOrderExpected,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first attempt for which the server sends a retryable response
+ "Client.Do",
+
+ // we got a response object, status code: 500, Retry-After: 1
+ // - call metrics method with appropriate status code
+ // - update backoff parameters with the status code returned
+ // - sleep for N seconds from 'Retry-After: N' response header
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ // sleep for delay dictated by backoff parameters
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // wait as dictated by the client rate limiter
+ "RateLimiter.Wait",
+
+ // B: 2nd attempt: retry, and this should return a status code=200
+ "Client.Do",
+
+ // it's a success, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
+ },
+ statusCodes: []string{
+ "500",
+ "200",
+ },
},
"Watch": {
attempts: 2,
- // Watch does not do 'RateLimiter.Wait' before initially sending the request to the server
- order: callOrderExpected[1:],
+ order: []string{
+ // before we send the request to the server:
+ // - we wait, as dictated by the backoff manager
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first attempt for which the server sends a retryable response
+ "Client.Do",
+
+ // we got a response object, status code: 500, Retry-After: 1
+ // - call metrics method with appropriate status code
+ // - update backoff parameters with the status code returned
+ // - sleep for N seconds from 'Retry-After: N' response header
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ // sleep for delay dictated by backoff parameters
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // wait as dictated by the client rate limiter
+ "RateLimiter.Wait",
+
+ // B: 2nd attempt: retry, and this should return a status code=200
+ "Client.Do",
+
+ // it's a success, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
+ },
+ statusCodes: []string{
+ "500",
+ "200",
+ },
},
"Stream": {
attempts: 2,
- order: callOrderExpected,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first attempt for which the server sends a retryable response
+ "Client.Do",
+
+ // we got a response object, status code: 500, Retry-After: 1
+ // - call metrics method with appropriate status code
+ // - update backoff parameters with the status code returned
+ // - sleep for N seconds from 'Retry-After: N' response header
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ // sleep for delay dictated by backoff parameters
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // wait as dictated by the client rate limiter
+ "RateLimiter.Wait",
+
+ // B: 2nd attempt: retry, and this should return a status code=200
+ "Client.Do",
+
+ // it's a success, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
+ },
+ statusCodes: []string{
+ "500",
+ "200",
+ },
},
},
},
+ {
+ name: "failure after 2 retries",
+ maxRetries: 2,
+ serverReturns: []responseErr{
+ {response: retryAfterResponse(), err: nil},
+ {response: retryAfterResponse(), err: nil},
+ {response: retryAfterResponse(), err: nil},
+ },
+
+ expectations: map[string]expected{
+ "Do": {
+ attempts: 3,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ "Client.Do",
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ "RateLimiter.Wait",
+ // B: 2nd attempt, for which the server again sends a retryable response
+ "Client.Do",
+
+ // we got a response object, status code: 500, Retry-After: 1
+ // - call metrics method with appropriate status code
+ // - update backoff parameters with the status code returned
+ // - sleep for N seconds from 'Retry-After: N' response header
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ // sleep for delay dictated by backoff parameters
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // wait as dictated by the client rate limiter
+ "RateLimiter.Wait",
+
+ // C: 3rd attempt: retry, and this also returns a status code=500
+ "Client.Do",
+
+ // retries are exhausted, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (4 * time.Minute).String(), // backoff.Sleep before retry 2 (C)
+ },
+ statusCodes: []string{
+ "500",
+ "500",
+ "500",
+ },
+ },
+ "Watch": {
+ attempts: 3,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait, as dictated by the backoff manager
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first attempt for which the server sends a retryable response
+ "Client.Do",
+
+ // we got a response object, status code: 500, Retry-After: 1
+ // - call metrics method with appropriate status code
+ // - update backoff parameters with the status code returned
+ // - sleep for N seconds from 'Retry-After: N' response header
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ // sleep for delay dictated by backoff parameters
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // wait as dictated by the client rate limiter
+ "RateLimiter.Wait",
+
+ // B: 2nd attempt: retry, and this also returns a status code=500
+ "Client.Do",
+
+ // the response is retryable again, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
+ },
+ statusCodes: []string{
+ "500",
+ "500",
+ },
+ },
+ "Stream": {
+ attempts: 3,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first attempt for which the server sends a retryable response
+ "Client.Do",
+
+ // we got a response object, status code: 500, Retry-After: 1
+ // - call metrics method with appropriate status code
+ // - update backoff parameters with the status code returned
+ // - sleep for N seconds from 'Retry-After: N' response header
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ "BackoffManager.Sleep",
+ // sleep for delay dictated by backoff parameters
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+ // wait as dictated by the client rate limiter
+ "RateLimiter.Wait",
+
+ // B: 2nd attempt: retry, and this also returns a status code=500
+ "Client.Do",
+
+ // the response is retryable again, so do the following:
+ // - call metrics and update backoff parameters
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ (1 * time.Second).String(), // from 'Retry-After: 1' response header
+ (2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
+ },
+ statusCodes: []string{
+ "500",
+ "500",
+ },
+ },
+ },
+ },
+ {
+ name: "do not retry on network errors",
+ maxRetries: 2,
+ serverReturns: []responseErr{
+ {response: nil, err: fmt.Errorf("network error")},
+ },
+ expectations: map[string]expected{
+ "Do": {
+ attempts: 1,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first and only attempt, which fails with a network error
+ "Client.Do",
+
+ // ???
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ },
+ statusCodes: []string{"<error>"},
+ },
+ "Watch": {
+ attempts: 1,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait, as dictated by the backoff manager
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first and only attempt, which fails with a network error
+ "Client.Do",
+
+ // ???
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ },
+ statusCodes: []string{"<error>"}},
+ "Stream": {
+ attempts: 1,
+ order: []string{
+ // before we send the request to the server:
+ // - we wait as dictated by the client rate limiter
+ // - we wait, as dictated by the backoff manager
+ "RateLimiter.Wait",
+ "BackoffManager.CalculateBackoff",
+ "BackoffManager.Sleep",
+
+ // A: first and only attempt, which fails with a network error
+ "Client.Do",
+
+ // ???
+ "RequestResult.Increment",
+ "BackoffManager.UpdateBackoff",
+ },
+ sleeps: []string{
+ "0s", // initial backoff.Sleep before we send the request to the server for the first time
+ },
+ statusCodes: []string{"<error>"}},
+ },
+ },
}
for _, test := range tests {
@@ -3149,11 +3518,11 @@ func testRetryWithRateLimiterBackoffAndMetrics(t *testing.T, key string, doFunc
if !cmp.Equal(expected.order, interceptor.order) {
t.Errorf("%s: Expected order of calls to match, diff: %s", key, cmp.Diff(expected.order, interceptor.order))
}
- if !cmp.Equal(sleepExpected, interceptor.sleeps) {
- t.Errorf("%s: Expected order of calls to match, diff: %s", key, cmp.Diff(sleepExpected, interceptor.sleeps))
+ if !cmp.Equal(expected.sleeps, interceptor.sleeps) {
+ t.Errorf("%s: Expected order of calls to match, diff: %s", key, cmp.Diff(expected.sleeps, interceptor.sleeps))
}
- if !cmp.Equal(statusCodesExpected, interceptor.statusCodes) {
- t.Errorf("%s: Expected status codes to match, diff: %s", key, cmp.Diff(statusCodesExpected, interceptor.statusCodes))
+ if !cmp.Equal(expected.statusCodes, interceptor.statusCodes) {
+ t.Errorf("%s: Expected status codes to match, diff: %s", key, cmp.Diff(expected.statusCodes, interceptor.statusCodes))
}
})
}

is the behavior of "failure after exhausting the retries" correct?
@aojea we have dedicated tests for the scenarios above; this PR adds a new test only to validate the backoff, rate limiter, and metric call invocation.
/triage accepted
Ah ok, I thought it would be good to validate that combo on the different scenarios. If there is no need, it LGTM then.
(force-pushed 6f878de to f6a66bb)
/test pull-kubernetes-e2e-kind
/lgtm
/lgtm
/approve
/hold - for reacting to my comments
"BackoffManager.Sleep",
// sleep for delay dictated by backoff parameters
"BackoffManager.CalculateBackoff",
"BackoffManager.Sleep",
I was surprised too, but I think that one sleep is the "retry-after" sleep and the other the actual backoff:

(1 * time.Second).String(), // from 'Retry-After: 1' response header
(2 * time.Minute).String(), // backoff.Sleep before retry 1 (B)
Line 3045
ok, I've found it
kubernetes/staging/src/k8s.io/client-go/rest/with_retry.go
Lines 171 to 175 in 296bf4f
klog.V(4).Infof("Got a Retry-After %s response for attempt %d to %v", retryAfter.Wait, retryAfter.Attempt, url)
if backoff != nil {
	backoff.Sleep(retryAfter.Wait)
}
return nil
why do we have to sleep on the retryAfter only if backoff exists?
are not both Sleeps independent?
@wojtek-t today, with a default client-go configuration of a noBackoff, it's always BackoffManager.Sleep(0). I think when #106272 merges we can remove the sleep from the Retry-After response.

I am working on a refactor PR where I am trying to put the backoff, rate limiter, and metric calls in a common site so Do, Watch, and Stream can re-use them. I have added a TODO to remove it once #106272 merges.
> I think when #106272 merges we can remove the sleep from the Retry-After response.

I don't think we can just simply remove it - we should respect what kube-apiserver is returning to us (it's always returning 1 now, but we should address that too at some point).
So I think it's not about removing - it's rather about unifying them and making something like:
BackOffManager.Sleep(max(backoffManager.CalculateBackoff(), retryAfter))
Yes, this is more forward-looking; I will bake that into my next PR.
},
"Watch": {
attempts: 2,
// Watch does not do 'RateLimiter.Wait' before initially sending the request to the server
This is another thing that seems strange to me - if it's not throttling before the first request, why is it throttling before retries?
@tkashem - can you please maybe open an issue with those inconsistencies so that we can discuss (and fix, assuming I'm not missing something subtle) those things there?
yes, I will open an issue with this, thanks!
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tkashem, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold cancel
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This unit test verifies that Do, DoRaw, Stream, and Watch invoke the flowcontrol.RateLimiter, BackoffManager, and metrics.ResultMetric appropriately and in the right order for retry.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
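The verification approach the PR describes - wrapping the rate limiter, backoff manager, and metrics in fakes that record each call - can be sketched as below. This is a simplified, hypothetical stand-in for the test's interceptor; the type and method names are illustrative, not the actual test code.

```go
package main

import "fmt"

// callRecorder is a hypothetical stand-in for the interceptor used in the
// test: every fake dependency appends its method name to a shared slice so
// the test can assert on the exact order of invocations afterwards.
type callRecorder struct {
	order []string
}

func (r *callRecorder) record(name string) {
	r.order = append(r.order, name)
}

// fakeRateLimiter and fakeBackoff mimic the wrappers in the test, which
// record the call before (optionally) delegating to a real implementation.
type fakeRateLimiter struct{ rec *callRecorder }

func (f *fakeRateLimiter) Wait() { f.rec.record("RateLimiter.Wait") }

type fakeBackoff struct{ rec *callRecorder }

func (f *fakeBackoff) CalculateBackoff() { f.rec.record("BackoffManager.CalculateBackoff") }
func (f *fakeBackoff) Sleep()            { f.rec.record("BackoffManager.Sleep") }

func main() {
	rec := &callRecorder{}
	rl := &fakeRateLimiter{rec: rec}
	bo := &fakeBackoff{rec: rec}

	// simulate the pre-request sequence that the test asserts on
	rl.Wait()
	bo.CalculateBackoff()
	bo.Sleep()

	fmt.Println(rec.order) // [RateLimiter.Wait BackoffManager.CalculateBackoff BackoffManager.Sleep]
}
```

Comparing the recorded slice against an expected slice (e.g. with cmp.Diff, as the diff above does) then pins down the exact call order without any timing dependence.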