Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - POC - Read and close response body before seeking to retry #109028

Closed
wants to merge 1 commit into from

Conversation

liggitt
Copy link
Member

@liggitt liggitt commented Mar 25, 2022

What type of PR is this?

/kind bug
/kind failing-test

Which issue(s) this PR fixes:

xref #108906

Fixes a bug in our client retry logic that was not draining/closing the body prior to trying to reset the request body for a retry. According to go upstream, this is required to avoid races.

WIP because one place we call IsNextRetry currently depends on being able to read from the response body after trying the Seek(0,0) call. Not sure what to do about that.

NONE

/assign @smarterclayton

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 25, 2022
@liggitt liggitt added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. release-blocker and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 25, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 25, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2022
@liggitt liggitt added this to the v1.24 milestone Mar 25, 2022

// TODO: we have to do this *before* IsNextRetry or the response body could be drained/closed when we return ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the place I don't see a simple way to 1) drain the response body before Seeking to retry, 2) only call fn once, and 3) still retry if the seek fails

@liggitt
Copy link
Member Author

liggitt commented Mar 25, 2022

/approve cancel

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from smarterclayton after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2022
@@ -157,7 +158,7 @@ func (r *withRetry) IsNextRetry(ctx context.Context, restReq *Request, httpReq *
r.retryAfter.Wait = time.Duration(seconds) * time.Second
r.retryAfter.Reason = getRetryReason(r.attempts, seconds, resp, err)

if err := r.prepareForNextRetry(ctx, restReq); err != nil {
if err := r.prepareForNextRetry(ctx, restReq, closer); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have to close it always that we call r.prepareForNextRetry?

Copy link
Member

@aojea aojea Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can unconditionally close here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are several paths in prepareForNextRetry that don't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure? I see that at this point we don't have to do anything anymore with the response

	// if we are here, we have either a or b:
	//  a: we have a retryable error, for which we already
	//     have an artificial "Retry-After" response.
	//  b: we have a response from the server for which we
	//     need to check if it is retryable
	seconds, wait := checkWait(resp)
	if !wait {
		return false
	}

we are after the wait

return
}
type responseCloser struct {
once sync.Once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we can close it multiple times, we are not doing this in parallel, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can do that , is a matter of Close() it on the right places, like in

func (r *Request) transformResponse(resp *http.Response, req *http.Request) Result {

that will solve https://github.com/kubernetes/kubernetes/pull/109028/files#r835508597

 func (r *Request) transformResponse(resp *http.Response, req *http.Request) Result {
        var body []byte
        if resp.Body != nil {
+               defer resp.Body.Close()
                data, err := ioutil.ReadAll(resp.Body)
                switch err.(type) {
                case nil:
diff --git a/staging/src/k8s.io/client-go/rest/with_retry.go b/staging/src/k8s.io/client-go/rest/with_retry.go
index 11b9b5225b1..4469f5a1b9b 100644
--- a/staging/src/k8s.io/client-go/rest/with_retry.go
+++ b/staging/src/k8s.io/client-go/rest/with_retry.go
@@ -157,7 +157,7 @@ func (r *withRetry) IsNextRetry(ctx context.Context, restReq *Request, httpReq *
        r.retryAfter.Wait = time.Duration(seconds) * time.Second
        r.retryAfter.Reason = getRetryReason(r.attempts, seconds, resp, err)
 
-       if err := r.prepareForNextRetry(ctx, restReq); err != nil {
+       if err := r.prepareForNextRetry(ctx, restReq, resp); err != nil {
                klog.V(4).Infof("Could not retry request - %v", err)
                return false
        }
@@ -172,7 +172,7 @@ func (r *withRetry) IsNextRetry(ctx context.Context, restReq *Request, httpReq *
 // - we need to seek to the beginning of the request body before we
 //   initiate the next retry, the function should return an error if
 //   it fails to do so.
-func (r *withRetry) prepareForNextRetry(ctx context.Context, request *Request) error {
+func (r *withRetry) prepareForNextRetry(ctx context.Context, request *Request, response *http.Response) error {
        if ctx.Err() != nil {
                return ctx.Err()
        }
@@ -180,6 +180,7 @@ func (r *withRetry) prepareForNextRetry(ctx context.Context, request *Request) e
        // Ensure the response body is fully read and closed before
        // we reconnect, so that we reuse the same TCP connection.
        if seeker, ok := request.body.(io.Seeker); ok && request.body != nil {
+               readAndCloseResponseBody(response)
                if _, err := seeker.Seek(0, 0); err != nil {
                        return fmt.Errorf("can't Seek() back to beginning of body for %T", request)
                }

@liggitt liggitt changed the title WIP - Read and close response body before seeking to retry WIP - POC - Read and close response body before seeking to retry Mar 25, 2022
@liggitt
Copy link
Member Author

liggitt commented Mar 25, 2022

I actually don't have bandwidth to finish this fix out if someone else does and can pick it up (or come up with a different approach)

@liggitt liggitt removed the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Mar 25, 2022
@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 25, 2022
@liggitt liggitt removed this from the v1.24 milestone Mar 25, 2022
@liggitt liggitt marked this pull request as draft March 25, 2022 19:17
@liggitt liggitt closed this Mar 25, 2022
@dims
Copy link
Member

dims commented Mar 25, 2022

cc @MadhavJivrajani

@cici37
Copy link
Contributor

cici37 commented Mar 29, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants