Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transport.RoundTrip should return a non-nil for failure to obtain a response. #92217

Merged

Conversation

p0lyn0mial
Copy link
Contributor

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind bug

What this PR does / why we need it: In general RoundTrip should not attempt to interpret the response. In particular, RoundTrip must return err == nil if it obtained a response, regardless of the response's HTTP status code. A non-nil err should be reserved for failure to obtain a response.

This allows higher-level layers to interpret the err and construct the final response to the client.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 17, 2020
@p0lyn0mial
Copy link
Contributor Author

/assign @sttts

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 17, 2020
@p0lyn0mial p0lyn0mial force-pushed the proxy-transport-pass-err branch 2 times, most recently from b018046 to 212f3cd Compare June 17, 2020 12:57
@sttts sttts added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 18, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jun 18, 2020
@p0lyn0mial
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 23, 2020
@sttts
Copy link
Contributor

sttts commented Jun 23, 2020

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2020
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@@ -261,7 +261,7 @@ func TestProxyHandler(t *testing.T) {
},
},
},
expectedStatusCode: http.StatusServiceUnavailable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still expect this to surface a 503 error to the caller

/hold

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular test case fails due to incorrect APIService configuration, that leads to error trying to reach service: x509: certificate is valid for test-service.test-ns.svc, not ..svc. It sounds like a configuration error because at this point we don't know if the server we are trying to reach is available.

Copy link
Member

@liggitt liggitt Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a consistency perspective, if the periodic healthcheck for the APIService encountered this cert error, it would flip the APIService to unavailable, and this code would return a 503:

if !handlingInfo.serviceAvailable {
proxyError(w, req, "service unavailable", http.StatusServiceUnavailable)
return
}

It's unclear to me why the same error encountered by the proxy synchronously would surface a different error code to the requesting user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to preserve the old behaviour we could change the default responder https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go#L240 to return http.StatusServiceUnavailable, are you okay with that?

it seems that it's only used by the proxy - https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go#L178

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Now the default responder will return StatusServiceUnavailable, this aligns with the old behavior of the Transport. It looks like it safe to do so because the default responder wasn't wired before (https://github.com/kubernetes/kubernetes/pull/92217/files#diff-16a26bb0d342318c6eb3d03e6c5756e9R239)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems better to me, thanks. will let @sttts ack before merge

/hold cancel

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2020
@p0lyn0mial
Copy link
Contributor Author

/test pull-kubernetes-integration

// if an optional error interceptor/responder was provided wire it
// the custom responder might be used for providing a unified error reporting
// or supporting retry mechanisms by not sending non-fatal errors to the clients
proxy.ErrorHandler = h.Responder.Error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt FYI I'm going to use it in #92291 which introduces a retry mechanism for the aggregator. PTAL if you are interested.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 29, 2020
@sttts
Copy link
Contributor

sttts commented Jul 1, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 1, 2020
@k8s-ci-robot k8s-ci-robot merged commit c53ce48 into kubernetes:master Jul 1, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jul 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants