x/net/http2: receive buffer memory usage increases by 400x with HTTP2 #15930
adg changed the title from "Receive buffer memory usage increases by 400x with HTTP2" to "net/http: receive buffer memory usage increases by 400x with HTTP2" on Jun 2, 2016
adg added this to the Go1.7 milestone on Jun 2, 2016
adg added the NeedsInvestigation label on Jun 2, 2016
jacobsa (Contributor) commented on Jun 2, 2016
Here is a minimal program to reproduce the issue. It downloads a 719 MB file from Google Cloud Storage with a loop that reads 1 MB every 5 ms (i.e. throughput should be up to 200 MB/s, likely bottlenecked on the network connection). This approximates what gcsfuse is doing, since fuse reads 128 KiB with high latency between calls.
The following runs are on a GCE VM in asia-east1-a running CentOS 7. I use the cputime tool to make it easier to see the OS's view of peak memory usage, but the GC trace also makes the difference clear.
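download_with_delay.go itself isn't reproduced here, but its core is roughly the following sketch (the URL is just a placeholder for the real GCS object; the 1 MB reads with a 5 ms sleep are the part that matters):

package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	// Placeholder URL; the real program downloads a ~719 MB object from GCS.
	resp, err := http.Get("https://storage.googleapis.com/BUCKET/OBJECT")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	start := time.Now()
	buf := make([]byte, 1<<20) // 1 MB per read
	var total int64
	for {
		n, err := io.ReadFull(resp.Body, buf)
		total += int64(n)
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		time.Sleep(5 * time.Millisecond) // simulate the slow consumer (fuse-like pacing)
	}
	d := time.Since(start)
	log.Printf("Copied %d bytes in %v (%f bytes/s)", total, d, float64(total)/d.Seconds())
}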
% go version
go version go1.6.2 linux/amd64
% go get github.com/jacobsa/cputime
% go build -o /tmp/download_with_delay download_with_delay.go
With HTTP/2:
% GODEBUG=gctrace=1,http2client=1 cputime /tmp/download_with_delay
gc 1 @0.040s 1%: 0.26+0.90+0.31 ms clock, 0.52+0.21/0.77/0.82+0.62 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
gc 2 @0.290s 0%: 0.017+1.0+0.27 ms clock, 0.068+0.41/0.79/1.2+1.1 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc 3 @0.416s 0%: 0.015+1.5+0.26 ms clock, 0.061+0.009/1.3/1.4+1.0 ms cpu, 9->9->6 MB, 10 MB goal, 4 P
gc 4 @0.430s 0%: 0.014+1.4+0.29 ms clock, 0.056+0/0.030/2.6+1.1 ms cpu, 14->14->10 MB, 15 MB goal, 4 P
gc 5 @0.446s 0%: 0.013+2.3+0.31 ms clock, 0.053+0/0.50/3.4+1.2 ms cpu, 25->25->18 MB, 26 MB goal, 4 P
gc 6 @0.509s 0%: 0.019+5.1+0.48 ms clock, 0.079+0/1.0/6.0+1.9 ms cpu, 49->49->33 MB, 50 MB goal, 4 P
gc 7 @0.575s 0%: 0.018+7.3+0.68 ms clock, 0.074+0/1.5/7.3+2.7 ms cpu, 96->96->65 MB, 97 MB goal, 4 P
gc 8 @0.725s 0%: 0.016+12+0.44 ms clock, 0.065+0/1.6/12+1.7 ms cpu, 191->191->128 MB, 192 MB goal, 4 P
gc 9 @1.335s 0%: 0.017+61+0.48 ms clock, 0.068+0/0.25/63+1.9 ms cpu, 381->381->380 MB, 382 MB goal, 4 P
gc 10 @2.009s 0%: 0.017+98+0.47 ms clock, 0.070+0/1.7/98+1.8 ms cpu, 885->885->506 MB, 886 MB goal, 4 P
11:28:30.849059 Copied 719208526 bytes in 3.940939993s (182496695.528853 bytes/s)
----------
Max RSS: 920.18 MiB
Without HTTP/2:
% GODEBUG=gctrace=1,http2client=0 cputime /tmp/download_with_delay
gc 1 @0.042s 1%: 0.071+0.66+0.28 ms clock, 0.21+0.30/0.58/0.56+0.86 ms cpu, 4->4->1 MB, 5 MB goal, 4 P
11:29:16.552639 Copied 719208526 bytes in 4.42289009s (162610535.501686 bytes/s)
----------
Max RSS: 12.95 MiB
In other words, the HTTP/2 run used 71 times as much memory and generated a whole lot of garbage while doing so. The difference in throughput was well within the run-to-run variation that I see on this VM. In contrast, I've never seen a run without HTTP/2 use more than 20 MB of memory.
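(For reference, the three heap figures in each gctrace line, e.g. 885->885->506 MB in gc 10, are heap size at GC start, heap size at GC end, and live heap, so even the live heap alone reached roughly half a gigabyte during the HTTP/2 run.)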
jacobsa (Contributor) commented on Jun 2, 2016
In other words, I'm now quite confident that this is a regression in the http package's management of memory for buffering. 900 MiB is enough to buffer over five seconds at that receive rate, which is longer than the entire program even took to run.
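(Rough arithmetic: 920 MiB is about 965 MB, and at the observed ~182 MB/s that is roughly 5.3 seconds' worth of data, while the whole transfer finished in under 4 seconds.)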
jacobsa referenced this issue on Jun 2, 2016: GoogleCloudPlatform/gcsfuse#173 "Possible Memory Leak with gcsfuse version 0.18.0" (closed)
A commit was added to GoogleCloudPlatform/gcsfuse referencing this issue on Jun 2, 2016
A commit was added to GoogleCloudPlatform/gcsfuse referencing this issue on Jun 2, 2016
adg (Contributor) commented on Jun 3, 2016
I think the issue is that the flow control doesn't take into account the bytes in clientStream.bufPipe.
The change below brings the Max RSS of @jacobsa's example program down to ~40 MB.
--- a/http2/transport.go
+++ b/http2/transport.go
@@ -1458,7 +1458,7 @@ func (b transportResponseBody) Read(p []byte) (n int, err error) {
 		connAdd = transportDefaultConnFlow - v
 		cc.inflow.add(connAdd)
 	}
-	if err == nil { // No need to refresh if the stream is over or failed.
+	if err == nil && cs.bufPipe.b.Len() < transportDefaultStreamFlow {
 		if v := cs.inflow.available(); v < transportDefaultStreamFlow-transportDefaultStreamMinRefresh {
 			streamAdd = transportDefaultStreamFlow - v
 			cs.inflow.add(streamAdd)

I'm not sure if this is the correct fix. We'll need to write a good test to be sure.
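To spell out the intent of that one-line change (the names and constants below are illustrative stand-ins, not the real transport code): bytes that have been received but not yet read out of the stream's buffer should count against the window we advertise, so a slow reader eventually stops the server from sending a whole extra window.

package flowsketch

const (
	streamWindow     = 4 << 20  // stand-in for transportDefaultStreamFlow
	streamMinRefresh = 64 << 10 // stand-in for transportDefaultStreamMinRefresh (value illustrative)
)

type stream struct {
	buffered  int // bytes received from the peer but not yet read by the caller
	available int // stream-level flow-control tokens the peer may still spend
}

// afterRead decides how many tokens to hand back via WINDOW_UPDATE once the
// caller has consumed n bytes from the buffer.
func (s *stream) afterRead(n int) (refresh int) {
	s.buffered -= n
	// Only replenish the window while the unread backlog is below it;
	// otherwise the peer can keep the buffer growing faster than we drain it.
	if s.buffered < streamWindow && s.available < streamWindow-streamMinRefresh {
		refresh = streamWindow - s.available
		s.available += refresh
	}
	return refresh
}

With that accounting, a reader that pauses for long stretches caps the per-stream backlog at roughly one window instead of letting it grow with the size of the response.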
adg (Contributor) commented on Jun 3, 2016
Note this comment:
// transportDefaultStreamFlow is how many stream-level flow
// control tokens we announce to the peer, and how many bytes
// we buffer per stream.
transportDefaultStreamFlow = 4 << 20

It's not actually true with the current behavior; a change like the one above is required to make the comment accurate.
A separate discussion is the size of that buffer. It should probably be smaller.
adg assigned bradfitz on Jun 3, 2016
gopherbot commented on Jun 6, 2016
CL https://golang.org/cl/23812 mentions this issue.
adg self-assigned this on Jun 6, 2016
gopherbot closed this in golang/net@154d9f9 on Jun 7, 2016
@adg Thanks!
artyom (Contributor) commented on Jun 7, 2016
@adg this fix introduces a race between the Len() call on the buffer and the Write() call from another goroutine.
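One way to avoid it (a sketch only, not necessarily how it should be fixed) is to give the pipe a Len method that takes the same lock its Write and Read use, and have transportResponseBody.Read call that instead of reaching into bufPipe.b from another goroutine:

package pipesketch

import (
	"bytes"
	"sync"
)

// pipe is a simplified stand-in for the transport's buffered pipe.
type pipe struct {
	mu sync.Mutex
	b  bytes.Buffer
	// (error/done signalling omitted)
}

func (p *pipe) Write(d []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.b.Write(d)
}

// Len reports the number of unread bytes while holding the same lock,
// so a concurrent Write from the connection read loop can't race with it.
func (p *pipe) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.b.Len()
}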
jacobsa commented on Jun 2, 2016
In GoogleCloudPlatform/gcsfuse#173 I discovered that building with Go 1.6.2 increases memory usage by a factor of 400 or so over Go 1.6, apparently due to using HTTP2 by default (406752b). This is on linux/amd64.
I think this is on account of the receive buffer allocated in http2clientConnReadLoop.handleResponse. Elsewhere in that file I see that there is an attempt to use fixed-size buffers for outgoing requests (correct?), but this buffer appears to grow without bound, no matter whether the caller consumes the response body or not. On the other hand, there seems to be connection-wide flow control updated in http2clientConnReadLoop.processData, so I'm not sure that this is actually a problem.

My question: is it a known issue that memory usage is much, much higher when making requests with large responses over HTTP/2? Is this RAM "well spent" in the sense that it's buying me more throughput?