x/net/http2: window updates on randomWriteScheduler after stream closed cause memory leaks #33812
I've run the gRPC server through a full traffic cycle with the …

Here's a suggested fix to the leak:
Sorry to harp on this issue, but quick ping on this @bradfitz (now that you're back from leave - congrats btw!)

@jared2501, thanks for the great bug report & diagnosis! @gopherbot, please backport to Go 1.13. /cc @tombergan

Backport issue(s) opened: #34636 (for 1.13). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Change https://golang.org/cl/198462 mentions this issue:

@jared2501, can you try out https://golang.org/cl/198462 and report back about whether it helps or not? And any performance numbers if that's easy? Also, code reviews welcome from anybody.

Change https://golang.org/cl/198617 mentions this issue:
In certain shutdown cases (from the client and/or server), the http2 Server can Push stream-specific frames on closed streams. This caused memory leaks in the random write scheduler. As a conservative fix for backporting, just clear the map element whenever its queue value is empty. The map entry is re-created as needed anyway.

This isn't perfectly ideal (it adds a map+delete and free queue put+get) in the case where a stream is open & actively writing, but it's an easy fix for now. A future CL can optimize all this code. It looks like there are some other good optimization opportunities in related code anyway. But I'd rather that happen on master and not be done in a backported change.

Updates golang/go#34636 (needs bundle to std before fixed)
Updates golang/go#33812

Change-Id: I21508ba2ebc361e8b8532d0d1cebf882e82c473c
Reviewed-on: https://go-review.googlesource.com/c/net/+/198462
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit d98b1b4)
Reviewed-on: https://go-review.googlesource.com/c/net/+/198617
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Run-TryBot: Andrew Bonventre <andybons@golang.org>
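The commit above describes the conservative fix: once a stream's queue drains in Pop, drop its map entry and recycle the queue, so entries created for already-closed streams cannot pile up. Below is a minimal, self-contained sketch of that idea; the types and names (`frame`, `queue`, `scheduler`, `getEmptyQueue`) are invented for illustration and are not the actual x/net/http2 code.

```go
package main

import "fmt"

// frame and queue are stand-ins for http2's FrameWriteRequest and writeQueue.
type frame struct{ streamID uint32 }

type queue struct{ frames []frame }

func (q *queue) empty() bool  { return len(q.frames) == 0 }
func (q *queue) push(f frame) { q.frames = append(q.frames, f) }
func (q *queue) shift() frame {
	f := q.frames[0]
	q.frames = q.frames[1:]
	return f
}

// scheduler models the random write scheduler's per-stream map.
type scheduler struct {
	sq   map[uint32]*queue
	pool []*queue // free queues, analogous to the scheduler's queue pool
}

func (s *scheduler) Push(f frame) {
	q, ok := s.sq[f.streamID]
	if !ok {
		q = s.getEmptyQueue()
		s.sq[f.streamID] = q // entry created even if the stream is already closed
	}
	q.push(f)
}

func (s *scheduler) Pop() (frame, bool) {
	for id, q := range s.sq {
		if !q.empty() {
			f := q.shift()
			// The conservative fix: drop the map entry as soon as its queue
			// is empty, instead of waiting for CloseStream (which never
			// comes for frames pushed after the stream was closed).
			if q.empty() {
				delete(s.sq, id)
				s.pool = append(s.pool, q)
			}
			return f, true
		}
	}
	return frame{}, false
}

func (s *scheduler) getEmptyQueue() *queue {
	if n := len(s.pool); n > 0 {
		q := s.pool[n-1]
		s.pool = s.pool[:n-1]
		return q
	}
	return &queue{}
}

func main() {
	s := &scheduler{sq: make(map[uint32]*queue)}
	// A RST_STREAM pushed for a stream that was already closed.
	s.Push(frame{streamID: 7})
	s.Pop()
	fmt.Println("entries left:", len(s.sq)) // 0 with the delete; 1 without it
}
```

Without the delete in Pop, the entry for stream 7 would stay in the map until CloseStream ran, which never happens for frames pushed after the stream was closed; that is the unbounded growth reported in this issue.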
Change https://golang.org/cl/198897 mentions this issue:

…k fix

This re-runs go generate with x/net checked out at CL 198617 on the release-branch.go1.13 branch for:

[release-branch.go1.13] http2: fix memory leak in random write scheduler

Fixes #34636
Updates #33812

Change-Id: Ibce630c6c7ffe43ff760d2ad40b44731c07ba870
Reviewed-on: https://go-review.googlesource.com/c/go/+/198897
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@bradfitz change looks like a reasonable fix to me. Thanks! Will try and find a chance to run it some time next week. I'll see what I can do re perf numbers, but I suspect we don't have good enough instrumentation to capture any difference here. Will include them if there is :)

Change https://golang.org/cl/200102 mentions this issue:
Updates x/net/http2 to git rev d66e71096ffb9f08f36d9aefcae80ce319de6d68

http2: end stream eagerly after sending the request body
https://golang.org/cl/181157 (fixes #32254)

all: fix typos
https://golang.org/cl/193799

http2: fix memory leak in random write scheduler
https://golang.org/cl/198462 (fixes #33812)

http2: do not sniff body if Content-Encoding is set
https://golang.org/cl/199841 (updates #31753)

Also unskips tests from CL 199799.

Change-Id: I241c0b1cd18cad5041485be92809137a973e33bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/200102
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In certain shutdown cases (from the client and/or server), the http2 Server can Push stream-specific frames on closed streams. This caused memory leaks in the random write scheduler. As a conservative fix for backporting, just clear the map element whenever its queue value is empty. The map entry is re-created as needed anyway.

This isn't perfectly ideal (it adds a map+delete and free queue put+get) in the case where a stream is open & actively writing, but it's an easy fix for now. A future CL can optimize all this code. It looks like there are some other good optimization opportunities in related code anyway. But I'd rather that happen on master and not be done in a backported change.

Fixes golang/go#33812

Change-Id: I21508ba2ebc361e8b8532d0d1cebf882e82c473c
Reviewed-on: https://go-review.googlesource.com/c/net/+/198462
Reviewed-by: Bryan C. Mills <bcmills@google.com>
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes

What did you do? What did you expect to see? What did you see instead?
I've been tracking down a creeping increase in response times in a gRPC server that uses HTTP/2. The big downward jump is when I restart the server (purple = median, blue = p95, yellow = max response times).

After much poking and logging, I've traced it down to the entries in the `randomWriteScheduler.sq` map growing and not being garbage collected. What I believe is happening is that RST_STREAM frames are being sent on closed streams, which causes the `randomWriteScheduler` to allocate new entries in the `randomWriteScheduler.sq` map (see code here). The RST_STREAM frames will get queued and emitted by `WriteScheduler.Pop()` eventually; however, since the stream has already been closed, they will never be deleted from the `randomWriteScheduler.sq` map, because `randomWriteScheduler.CloseStream` will never be called, and so the map grows without bound.

The `priorityWriteScheduler` handles this case by not allocating new nodes, but instead attaching RST_STREAM frames for closed streams to its root node (in fact, it has a nice comment explaining this logic here).

I can look at providing a minimal repro of the issue if required, but it's a bit tricky to isolate since it requires delay in the stream to reproduce. Hopefully, the difference in behaviour between the `randomWriteScheduler` and the `priorityWriteScheduler`, and reading the code, is enough to identify & fix the issue here.
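To make the behavioural difference concrete, here is a hedged, self-contained sketch that contrasts the two Push strategies described above: a random-scheduler-style Push allocates a fresh map entry for every unseen stream ID (including streams that have already been closed), while a priority-scheduler-style Push routes frames for unknown streams to a shared root queue, so no per-stream state is created after CloseStream has run. The names are invented for illustration; this is a model of the reported mechanism, not the actual x/net/http2 source.

```go
package main

import "fmt"

type frame struct{ streamID uint32 }

type queue struct{ frames []frame }

func (q *queue) push(f frame) { q.frames = append(q.frames, f) }

// randomStyle mimics the leaking behaviour: every unseen stream ID,
// including one that was already closed, gets its own map entry that
// only CloseStream would remove.
type randomStyle struct{ sq map[uint32]*queue }

func (s *randomStyle) Push(f frame) {
	q, ok := s.sq[f.streamID]
	if !ok {
		q = &queue{}
		s.sq[f.streamID] = q
	}
	q.push(f)
}

// priorityStyle mimics the priority scheduler's approach: frames for
// stream IDs with no live node (e.g. RST_STREAM on a closed stream)
// are attached to the root queue instead of allocating new state.
type priorityStyle struct {
	root  queue
	nodes map[uint32]*queue // only live (open) streams appear here
}

func (s *priorityStyle) Push(f frame) {
	if q, ok := s.nodes[f.streamID]; ok {
		q.push(f)
		return
	}
	s.root.push(f) // closed/idle stream: no new per-stream entry
}

func main() {
	r := &randomStyle{sq: make(map[uint32]*queue)}
	p := &priorityStyle{nodes: make(map[uint32]*queue)}
	for id := uint32(1); id <= 1000; id += 2 {
		// Simulate RST_STREAM frames arriving after the streams have closed.
		r.Push(frame{streamID: id})
		p.Push(frame{streamID: id})
	}
	fmt.Println("random-style map entries:  ", len(r.sq))    // grows with every closed stream
	fmt.Println("priority-style map entries:", len(p.nodes)) // stays at 0
}
```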