New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy request in timeout handler #108455
Copy request in timeout handler #108455
Conversation
Welcome @Argh4k! |
Hi @Argh4k. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
@@ -93,6 +93,10 @@ func (t *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
resultCh := make(chan interface{}) | |||
var tw timeoutWriter | |||
tw, w = newTimeoutWriter(w) | |||
|
|||
//Make a copy of request and work on it in new goroutine | |||
//to avoid race condition when accessing/modifying request headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are request headers the only problematic case?
If so, maybe we can shallow copy request and deep clone headers only to save few bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now we believe only headers are problematic, but I can imagine (e.g. in future) some handler modifying any other field in http.Request.
I think it's reasonable to clone entire object, but I would change the comment to say something like "to avoid races when accessing/modifying http.Request (e.g. request headers)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would rather make deep copy of all request metadata. This way we ensure that further changes to filter that precede timeout handler probably won't cause race condition. I think there is very low probability that any filter will touch body of the request, which is not deep copied here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit.
Otherwise LGTM.
The potential issue with this solution is that it blocks propagation of request's headers changes to filters that are above WithTimeoutForNonLongRunningRequests. It was already possible that the request timed out before the changes were made (which generated this race), but now it happens for all requests.
I tried to skim the code around and I wasn't able to find any practical problem with this now. This may lead to some potential issues in future if e.g. someone implements a filter A that changes request and expect this change to be propagated to another filter B (with WithTimeoutForNonLongRunningRequests in the middle), but:
- It's not clear to me if we will ever need to implement this kind of filters
- It will be easy to catch during testing (no request will have that request change propagated)
- Even without this change, it was possible that we never reach filter B (e.g. due to timeout), so any implementation of filter A that expected request change of B was simply invalid in general case. Now we block such changes which makes this more obvious.
So for me it looks good, but I would like to hear opinion of more senior folks about this change: @wojtek-t @liggitt
staging/src/k8s.io/apiserver/pkg/server/filters/timeout_test.go
Outdated
Show resolved
Hide resolved
what if instead of passing the clone to the handlers we use the clone for the timeout handler instead, and we pass the original request? |
/assign @tkashem |
The most common way in golang I have seen in other libraries is to simply use a single goroutine and use context cancellation for timeout. There is some discussion around that in #105884 but overall it looks like e.g. Write operation doesn't use context which makes it hard to switch to this approach now: #105884 (comment) Long term it looks like a way to go, but we will need to either to have a golang's support for Write with context support or workaround it somehow. |
It seems that all options were explored and this is the least bad option :) |
Added a release-note - we should cherrypick it once this is merged. /lgtm |
/approve |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Argh4k, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
…55-upstream-release-1.21 Automated cherry pick of #108455: Copy request in timeout handler
…55-upstream-release-1.22 Automated cherry pick of #108455: Copy request in timeout handler
…55-upstream-release-1.23 Automated cherry pick of #108455: Copy request in timeout handler
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes race condition:
withLogging
filter where respLogger hold pointer to original Requestkubernetes/staging/src/k8s.io/apiserver/pkg/server/httplog/httplog.go
Lines 124 to 127 in 88f9728
CleanVerb
that can access header mapkubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go
Line 572 in 88f9728
kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/filters/authentication.go
Line 77 in 88f9728
We can also get to the
CleanVerb
function bypostTimeoutFn()
called by timeout handler in case of timeout:kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go
Line 141 in 88f9728
This function looks like this:
kubernetes/staging/src/k8s.io/apiserver/pkg/server/filters/timeout.go
Lines 55 to 57 in 88f9728
And calls
cleanVerb
(that internally callsCleanVerb
):kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go
Line 420 in 88f9728
Those are spots that I've found, but there might be even more of those.
Test cases added in this pull request when run without fix and with
-race
flag, crash with race condition detected. In case ofTestTimeoutWithLogging
passing--args -v=3
is also necessary to force logging.Which issue(s) this PR fixes:
Fixes #