net/http: memory leak due to a large number of items in transport.reqCanceler #61708
Comments
Issue #51354 also claims leaks in reqCanceler under certain conditions. If you get the chance, try to reproduce the issue with this patch; I believe this is the fix the author of the other issue suggested.

I've created #61724 to fix #51354. The test

diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index d30eb79508..fb3f9e168d 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -26,6 +26,7 @@ import (
"net/textproto"
"net/url"
"reflect"
+ "runtime"
"strings"
"sync"
"sync/atomic"
@@ -2248,6 +2249,9 @@ func (pc *persistConn) readLoop() {
}
case <-rc.req.Cancel:
alive = false
+ for _, v := range pc.t.reqCanceler {
+ fmt.Printf(">> %v\n", runtime.FuncForPC(reflect.ValueOf(v).Pointer()).Name())
+ }
pc.t.cancelRequest(rc.cancelKey, errRequestCanceled)
case <-rc.req.Context().Done():
 		alive = false

outputs
@mauri870 @AlexanderYastrebov Thanks a lot for your great work. I tested the solution you mentioned above, but it doesn't work; maybe my case is much more complicated. My team members and I read the Go net/http source code, and the memory leak could happen in src/net/http/transport.go:2230:

// Before looping back to the top of this function and peeking on
// the bufio.Reader, wait for the caller goroutine to finish
// reading the response body. (or for cancellation or death)
select {
case bodyEOF := <-waitForBodyRead:
replaced := pc.t.replaceReqCanceler(rc.cancelKey, nil) // before pc might return to idle pool
alive = alive &&
bodyEOF &&
!pc.sawEOF &&
pc.wroteRequest() &&
replaced && tryPutIdleConn(trace)
if bodyEOF {
eofc <- struct{}{}
}
case <-rc.req.Cancel:
alive = false
pc.t.CancelRequest(rc.req)
case <-rc.req.Context().Done():
alive = false
pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
case <-pc.closech: // memory may leak here. pc closed, rc.cancelKey would never be removed.
alive = false
}

There could be an unexpected case: case bodyEOF := <-waitForBodyRead is still blocked (the caller goroutine is reading the response body) while pconn is closed somewhere else. Thus rc.cancelKey would never be cleaned up. pc.closech is closed in src/net/http/transport.go:2716, which is called from src/net/http/transport.go:2696. Several callers call pc.close; which one could cause the problem? We guess it is the call to pc.close(err) in src/net/http/transport.go:2418. In a rare case (or with special timing) there is a write error (http: invalid Read on closed): the request body has not been completely sent to the server, the caller goroutine is meanwhile reading the response body, and pc.close(err) is called (could that be possible?).

pc.writeErrCh <- err // to the body reader, which might recycle us
wr.ch <- err // to the roundTrip function
if err != nil {
pc.close(err)
return
}

In addition, we found a possible solution to this memory leak problem, but we are not sure about its impact (would it affect normal requests?). Maybe it is safe to remove rc.cancelKey here, since pconn has already been closed:

// Before looping back to the top of this function and peeking on
// the bufio.Reader, wait for the caller goroutine to finish
// reading the response body. (or for cancellation or death)
select {
case bodyEOF := <-waitForBodyRead:
replaced := pc.t.replaceReqCanceler(rc.cancelKey, nil) // before pc might return to idle pool
alive = alive &&
bodyEOF &&
!pc.sawEOF &&
pc.wroteRequest() &&
replaced && tryPutIdleConn(trace)
if bodyEOF {
eofc <- struct{}{}
}
case <-rc.req.Cancel:
alive = false
pc.t.CancelRequest(rc.req)
case <-rc.req.Context().Done():
alive = false
pc.t.cancelRequest(rc.cancelKey, rc.req.Context().Err())
case <-pc.closech: // memory may leak here. pc closed, rc.cancelKey would never be removed.
alive = false
pc.t.replaceReqCanceler(rc.cancelKey, nil) // remove rc.cancelKey when pconn has been closed
}

In our test this change indeed helps to solve the problem, but is it really OK, or are there other, better solutions?
@hzzzzzb
I think you are right.

func getUnexportedFieldLen(field reflect.Value) int {
return reflect.NewAt(field.Type(), unsafe.Pointer(field.UnsafeAddr())).Elem().Len()
}
func countReqCanceler(t *http.Transport) int {
return getUnexportedFieldLen(reflect.ValueOf(t).Elem().FieldByName("reqCanceler"))
}
...
reverseProxy.ErrorLog = log.New(io.Discard, "", 0)
var requests atomic.Int64
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
reverseProxy.ServeHTTP(w, r)
n := requests.Add(1)
if n%1000 == 0 {
fmt.Println("proxy: reqCanceler:", countReqCanceler(t))
}
})
...

and the number grows even with the #51354 fix, but stops growing with

case <-pc.closech:
	alive = false
	pc.t.cancelRequest(rc.cancelKey, errRequestCanceled)
	// pc.t.replaceReqCanceler(rc.cancelKey, nil) // or this

I've added a second channel to send the close error; it is indeed ErrBodyReadAfterClose. I correlated it to Lines 2422 to 2424 in ed8cbaf, where it calls pc.close, and Lines 2444 to 2449 in ed8cbaf.
I think the problem could be reverseproxy-specific, due to the way it works, similar to #46866. I could not come up with a reproducer test yet, which we would need to make a proper fix.
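To make that last point concrete, here is a rough sketch of the kind of reproducer one might start from. This is my own sketch, not a test from this thread: the package and test names, request sizes, iteration count, and sleep are all assumptions, it relies on the unexported reqCanceler field still existing in the Go version under test, and since the race is timing-sensitive it may well not trigger the leak on a given machine. It proxies POSTs with a largish body to a backend that replies without reading the body, then counts reqCanceler entries with the same kind of reflection helper as above:

package leaktest

import (
	"bytes"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"reflect"
	"testing"
	"time"
	"unsafe"
)

// countReqCanceler reads the length of the unexported reqCanceler map via
// reflection, like the helper shown above.
func countReqCanceler(tr *http.Transport) int {
	f := reflect.ValueOf(tr).Elem().FieldByName("reqCanceler")
	return reflect.NewAt(f.Type(), unsafe.Pointer(f.UnsafeAddr())).Elem().Len()
}

func TestReqCancelerLeak(t *testing.T) {
	// The backend replies immediately without reading the request body, so
	// the proxy may still be writing the outbound body when the response
	// comes back.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		t.Fatal(err)
	}
	tr := &http.Transport{}
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = tr

	front := httptest.NewServer(proxy)
	defer front.Close()

	// Push a number of requests with 1 MiB bodies through the proxy.
	for i := 0; i < 500; i++ {
		resp, err := http.Post(front.URL, "application/octet-stream", bytes.NewReader(make([]byte, 1<<20)))
		if err == nil {
			resp.Body.Close()
		}
	}

	time.Sleep(time.Second) // let readLoop/writeLoop goroutines settle
	if n := countReqCanceler(tr); n != 0 {
		t.Errorf("reqCanceler entries = %d; want 0", n)
	}
}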
Due to a race condition, persistConn could be closed without removing the request canceler. Note that without the fix the test occasionally passes; to demonstrate the issue it has to be run multiple times, e.g. using -count=10. Fixes golang#61708
Change https://go.dev/cl/515796 mentions this issue:
Any update on this?
This reverts commit 291a32a due to a flaking test. Updates golang#61708
Change https://go.dev/cl/522097 mentions this issue:
This reverts CL 515796 due to a flaking test.

Updates #61708
Fixes #62224

Change-Id: I53911a07677d08c3196daaaa2708269593baf472
GitHub-Last-Rev: 3544648
GitHub-Pull-Request: #62233
Reviewed-on: https://go-review.googlesource.com/c/go/+/522097
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Due to a race condition, persistConn could be closed without removing the request canceler. Note that without the fix the test occasionally passes; to demonstrate the issue it has to be run multiple times, e.g. using -count=10.

Fixes golang#61708

Change-Id: I9029d7d65cf602dd29ee1b2a87a77a73e99d9c92
GitHub-Last-Rev: 6b31f98
GitHub-Pull-Request: golang#61745
Reviewed-on: https://go-review.googlesource.com/c/go/+/515796
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts CL 515796 due to a flaking test.

Updates golang#61708
Fixes golang#62224

Change-Id: I53911a07677d08c3196daaaa2708269593baf472
GitHub-Last-Rev: 3544648
GitHub-Pull-Request: golang#62233
Reviewed-on: https://go-review.googlesource.com/c/go/+/522097
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Change https://go.dev/cl/522916 mentions this issue:
A write error in pconn.writeLoop while the waitForBodyRead channel in pconn.readLoop is still blocking causes the request canceler to never be removed, which leaks memory in http.Transport. Note that the write error could be either "http: invalid Read on closed" (the issue 61708 case: the request body was closed by (*chunkWriter).writeHeader, src/net/http/server.go:1405, while (*persistConn).writeLoop was still reading it; a reverseproxy-specific problem) or "http: ContentLength=1048576 with Body length 1" (the PR's unit test case). Without the fix: pending requests = 5; want 0. With the fix: all tests pass. Thanks for the previous work done by @AlexanderYastrebov. Fixes golang#61708
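As a side note, the second error string quoted above is easy to provoke from plain client code. The following is a minimal sketch of my own (not the PR's actual test): it declares a ContentLength larger than the body actually supplied, so the transport's write loop fails while sending the request body.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

func main() {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	defer srv.Close()

	req, err := http.NewRequest(http.MethodPost, srv.URL, strings.NewReader("x"))
	if err != nil {
		panic(err)
	}
	// Claim a 1 MiB body while actually supplying a single byte. The request
	// body write is then expected to fail with an error along the lines of
	// "http: ContentLength=1048576 with Body length 1" (the exact error
	// surfaced by Do can depend on timing).
	req.ContentLength = 1 << 20

	_, err = http.DefaultClient.Do(req)
	fmt.Println(err)
}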
The writeLoop goroutine closes persistConn.closech in case of a request body write error, which in turn finishes readLoop without removing the request canceler. Fixes golang#61708
Change https://go.dev/cl/523296 mentions this issue:
net/http: memory leak due to a large number of items in transport.reqCanceler
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output

What did you do?
When using httputil.ReverseProxy to build an HTTP reverse proxy, we found a memory leak in http.Transport in a certain case.
There is a client, a proxy, and a server, as below (a rough sketch of the proxy part follows the links):
reproduce condition:
proxy code:
https://go.dev/play/p/dRzdohisHy5?v=goprev
server code:
https://go.dev/play/p/iyBuNbJWKVi?v=goprev
client code:
https://go.dev/play/p/7gPDzt9vrkw?v=goprev
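The linked playground programs contain the full reproduction. As a rough approximation (my own sketch; the linked code may differ in addresses, ports, timeouts, and handler details), the proxy side is along these lines:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Backend address and listen port are placeholders.
	target, err := url.Parse("http://127.0.0.1:8081")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{} // the transport whose reqCanceler map keeps growing
	log.Fatal(http.ListenAndServe(":8080", proxy))
}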
What did you expect to see?
httputil.ReverseProxy performs well and has no memory leak.
What did you see instead?
proxy error logs:
The pprof profile shows that the memory leak is in the transport:
dlv debug info shows the memory leak is caused by reqCanceler: the 1900+ items in the reqCanceler map should be garbage collected after we stop the client, but after a long time they are still there.
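For anyone who wants to confirm the same observation without attaching dlv, a small in-process watcher can force idle-connection cleanup and a GC and then log the map size; in the leaking case the count stays high after the client is stopped. This is my own sketch, reusing the reflection idea from the comments above, and it assumes a Go version in which http.Transport still has the unexported reqCanceler field.

package main

import (
	"log"
	"net/http"
	"reflect"
	"runtime"
	"time"
	"unsafe"
)

// countReqCanceler reads the length of the unexported reqCanceler map via
// reflection; it panics if the field does not exist in the running Go version.
func countReqCanceler(tr *http.Transport) int {
	f := reflect.ValueOf(tr).Elem().FieldByName("reqCanceler")
	return reflect.NewAt(f.Type(), unsafe.Pointer(f.UnsafeAddr())).Elem().Len()
}

// watchReqCanceler periodically closes idle connections, forces a GC, and
// logs how many reqCanceler entries remain.
func watchReqCanceler(tr *http.Transport) {
	for range time.Tick(10 * time.Second) {
		tr.CloseIdleConnections()
		runtime.GC()
		log.Printf("reqCanceler entries: %d", countReqCanceler(tr))
	}
}

func main() {
	tr := &http.Transport{} // in the real proxy, pass the ReverseProxy's *http.Transport
	go watchReqCanceler(tr)
	select {} // block forever; in the proxy this would be http.ListenAndServe
}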
A tcpdump diagram of a sample request (proxy -> server).
Source code and some files for further debugging:
transport-memory-leak.zip