x/net/http2: reduce Framer ReadFrame allocations #18502
Comments
cc @iamqizhao |
DataFrame
structs
The Framer could optionally (opt-in) keep a reference to one frame of each concrete frame type and reuse it if the caller wants. |
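To make the opt-in idea concrete, here is a minimal sketch using only the standard library. Everything in it (`toyFramer`, `dataFrame`, `setReuseFrames`, `readFrame`) is a hypothetical stand-in invented for illustration, not the actual x/net/http2 types or API. It only shows the shape of the design: off by default, and when enabled the same struct is handed back on every read.

```go
package main

import "fmt"

// dataFrame is a hypothetical stand-in for a parsed DATA frame.
type dataFrame struct {
	streamID uint32
	data     []byte
}

// toyFramer is a hypothetical framer used only for illustration;
// it is not the x/net/http2 Framer.
type toyFramer struct {
	reuse  bool       // opt-in flag, off by default
	cached *dataFrame // the single frame handed out repeatedly when reuse is on
}

// setReuseFrames opts in to frame reuse.
func (f *toyFramer) setReuseFrames() { f.reuse = true }

// readFrame "parses" a payload into a frame. With reuse enabled, the
// returned frame is only valid until the next call to readFrame.
func (f *toyFramer) readFrame(streamID uint32, payload []byte) *dataFrame {
	if !f.reuse {
		// Default behavior: a fresh struct per frame.
		return &dataFrame{streamID: streamID, data: payload}
	}
	if f.cached == nil {
		f.cached = &dataFrame{}
	}
	// Overwrite the cached struct in place instead of allocating.
	*f.cached = dataFrame{streamID: streamID, data: payload}
	return f.cached
}

// sameFrameReturned reports whether two consecutive reads yield the same struct.
func sameFrameReturned(f *toyFramer) bool {
	a := f.readFrame(1, []byte("ping"))
	b := f.readFrame(3, []byte("pong"))
	return a == b
}

func main() {
	def := &toyFramer{}
	opt := &toyFramer{}
	opt.setReuseFrames()
	fmt.Println("default reuses struct:", sameFrameReturned(def))
	fmt.Println("opt-in reuses struct: ", sameFrameReturned(opt))
}
```

Because the flag defaults to off, existing callers that hold on to returned frames would see no change in behavior under this sketch.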
The application is running on a 32 core machine. How much RAM does the
machine have? I ask because 2 GBytes of allocation over 30 seconds does not
seem like a lot for a 32 core machine.
Could you run it with GODEBUG=gctrace=1 and post the output?
|
There's about 30GB on this machine.
Yeah, capturing the gc trace over the server's whole life shows: https://gist.github.com/apolcyn/2f19a4230ed65c2b1738c98ee978a8c2
Note that in this log, the "reset" on lines 30 and 31 just separates the warmup and benchmark periods. |
This would work well for me! I can create a PR |
The GC is keeping the heap at a modest 45 Mbytes. The trace says 0%
overhead for the GC but that is rounded so it really means < 1%. Latency in
the warmup is a bit high but still < 9ms and latency during the actual run
is between 400 usecs and 1300 usecs. The GC seems to be working as designed.
Thanks for the trace.
|
fyi just submitted PR with "Change-Id: Iad93420ef6c3918f54249d867098f1dadfa324d8" |
CL https://golang.org/cl/34812 mentions this issue. |
The existing Framer in net/http2 allocates a new DataFrame struct for each DataFrame read on calls to ReadFrame. On long-lived http2 streams that use lots of small data frames, these DataFrame structs can be a significant source of memory allocation.

The SetReuseFrame option introduced here, if set on a Framer, allows the Framer to reuse Frame objects and changes the ReadFrame API so that returned Frame objects are only valid until the next call to ReadFrame. This opt-in API currently implements reuse of DataFrames only, but it allows the Framer to reuse any type of Frame object.

The footprint caused by creation of new DataFrame structs per data frame was noticed in micro benchmarks of "gRPC" server "streaming throughput", which uses the Framer in this package. After other experimental changes to the benchmark that reduce allocations, DataFrame structs appeared to be a significant source of allocations.

In this benchmark, 6,400 concurrent http2 streams on the server did a total of roughly 900,000 round trips per second (each round trip involves request and response http2 data frames with payloads of about 10 bytes each). Over a roughly 35 second benchmark period, this caused slightly more than 30 million allocations of DataFrame structs, totalling about 1.4 GB of allocated space. Reusing DataFrames in the Framer eliminated these allocations and was found to improve the benchmark's throughput by about 5%.

Fixes golang/go#18502
Change-Id: Iad93420ef6c3918f54249d867098f1dadfa324d8
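The figures quoted in this commit message can be cross-checked with a little arithmetic. The sketch below assumes one request data frame is read per round trip and takes 1 GB as 10^9 bytes; both are assumptions made here, not statements from the thread, and the function names are made up for illustration. Under those assumptions the estimate comes to 31.5 million frames and about 44 bytes per allocation, which is in line with "slightly more than 30 million allocations" of a small struct.

```go
package main

import "fmt"

// framesRead estimates how many request data frames the server reads,
// assuming exactly one frame per round trip (an assumption of this sketch).
func framesRead(roundTripsPerSec, seconds int) int {
	return roundTripsPerSec * seconds
}

// bytesPerAlloc divides the total allocated bytes by the number of allocations.
func bytesPerAlloc(totalBytes, allocations float64) float64 {
	return totalBytes / allocations
}

func main() {
	// Figures from the commit message: ~900,000 round trips/s over ~35 s.
	frames := framesRead(900000, 35)
	fmt.Println("estimated frames read:", frames)

	// ~1.4 GB of allocated space, taking 1 GB = 1e9 bytes (assumption).
	fmt.Printf("approx bytes per DataFrame allocation: %.1f\n",
		bytesPerAlloc(1.4e9, float64(frames)))
}
```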
The existing Framer in net/http2 allocates a new DataFrame struct for each DataFrame read on calls to ReadFrame.

The SetReuseFrame option introduced here, if set on a Framer, allows the Framer to reuse Frame objects and changes the ReadFrame API so that returned Frame objects are only valid until the next call to ReadFrame. This opt-in API currently implements reuse of DataFrames only, but it allows the Framer to reuse any type of Frame.

The footprint caused by creation of new DataFrame structs per data frame was noticed in micro benchmarks of "gRPC" server "streaming throughput", which uses the Framer in this package. This benchmark happened to use long-lived http2 streams that do client-server "ping-pong" requests with small data frames, and DataFrames were seen to be a significant source of allocations.

Running local benchmarks (from the x/net/http2 directory):

    $ go test -run=^$ -bench=BenchmarkServerToClientStream

Example output:

    BenchmarkServerToClientStreamDefaultOptions-12    30000    46216 ns/op    971 B/op    17 allocs/op
    BenchmarkServerToClientStreamReuseFrames-12       30000    44952 ns/op    924 B/op    16 allocs/op

* Expect an alloc reduction of at least 1 and a small memory reduction between "BenchmarkServerToClientStreamDefaultOptions" and "BenchmarkServerToClientStreamReuseFrames".

Fixes golang/go#18502
Change-Id: Iad93420ef6c3918f54249d867098f1dadfa324d8
Reviewed-on: https://go-review.googlesource.com/34812
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
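The "one fewer alloc per op" effect can be reproduced in isolation with `testing.AllocsPerRun` from the standard library. This is only a sketch: `dataFrame`, `readFresh`, and `readReused` are hypothetical stand-ins and do not exercise the real Framer or the benchmarks named above. The package-level `sink` variable is there so the freshly created struct has to live on the heap.

```go
package main

import (
	"fmt"
	"testing"
)

// dataFrame is a hypothetical stand-in for a parsed DATA frame.
type dataFrame struct {
	streamID uint32
	data     []byte
}

// sink keeps the last frame reachable so a fresh frame must be heap allocated.
var sink *dataFrame

// cached is the single struct reused by readReused.
var cached = &dataFrame{}

// readFresh allocates a new struct for every frame.
func readFresh(payload []byte) {
	sink = &dataFrame{streamID: 1, data: payload}
}

// readReused overwrites one long-lived struct instead of allocating.
func readReused(payload []byte) {
	*cached = dataFrame{streamID: 1, data: payload}
	sink = cached
}

// allocsPerRead reports the average number of heap allocations per call.
func allocsPerRead(read func([]byte)) float64 {
	payload := []byte("0123456789") // ~10 byte payload, allocated outside the measured function
	return testing.AllocsPerRun(1000, func() { read(payload) })
}

func main() {
	fmt.Println("fresh struct per frame, allocs/read:", allocsPerRead(readFresh))
	fmt.Println("reused struct, allocs/read:         ", allocsPerRead(readReused))
}
```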
Please answer these questions before submitting your issue. Thanks!
This is a performance-related issue/proposal for the net/http2 library in https://github.com/golang/net/tree/master/http2.
What version of Go are you using (go version)?
1.8beta2
What operating system and processor architecture are you using (go env)?
amd64
I'm running "grpc-go" (https://github.com/grpc/grpc-go) micro-benchmarks (grpc-go uses only the "framer" from the net/http2 library). Specifically, I'm looking at a benchmark that tests "grpc streaming" throughput, with a couple of clients against one 32-core server.
Details on this benchmark setup: the server has a total of 64 tcp/http2 connections, with 100 long-lived http2 streams over each http2 connection. A streaming "round-trip" is a grpc request-response of about 10 bytes, each fitting into one data frame. The server is run with a 5 second warmup and a 30 second benchmark, during which it does somewhere around 900K round trips per second.
After multiple changes that reduce memory allocations elsewhere, the memory "alloc_space" profile of the server after running this benchmark looks like:
(profile image not included)
Note that golang.org/x/net/http2.parseDataFrame appears to be allocating a new DataFrame struct per data frame, which ends up being the largest source of allocations. (The alloc appears to be from https://github.com/golang/net/blob/master/http2/frame.go#L577.)
Also, from the total "alloc_objects" profile, golang.org/x/net/http2.parseDataFrame appears to account for about 30M of the total ~80M object allocations.
Experimenting with a code change that repeatedly returns the same DataFrame struct instead of creating new ones, the allocations attributed to golang.org/x/net/http2.parseDataFrame disappear.
The current http2 framer returns a slice on "data frame reads" that's only valid until the next call to ReadFrame. I'm wondering if similar semantics for the entire DataFrame struct sound reasonable, or possibly an option to turn this on.
I can give more details on the benchmark setup if needed.
thanks
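The "only valid until the next call" semantics proposed above put a burden on callers: anything kept past the next read has to be copied first. The following sketch illustrates that with a hypothetical `reusingReader` that hands back the same struct and the same backing buffer on every call. None of these names come from x/net/http2; they are assumptions made for the example.

```go
package main

import "fmt"

// dataFrame is a hypothetical stand-in for a parsed DATA frame.
type dataFrame struct {
	streamID uint32
	data     []byte
}

// reusingReader is a hypothetical reader that returns the same struct and
// the same backing buffer on every call, mimicking the proposed semantics.
type reusingReader struct {
	frame dataFrame
	buf   []byte
}

// readFrame overwrites the shared buffer and struct, then returns a pointer
// that is only valid until the next call.
func (r *reusingReader) readFrame(streamID uint32, payload string) *dataFrame {
	r.buf = append(r.buf[:0], payload...)
	r.frame = dataFrame{streamID: streamID, data: r.buf}
	return &r.frame
}

// retain makes a deep copy of a frame so it survives later reads.
func retain(f *dataFrame) dataFrame {
	return dataFrame{streamID: f.streamID, data: append([]byte(nil), f.data...)}
}

// demo reads two frames and returns what the caller sees through the
// pointer it held on to (stale) and through its copy (kept).
func demo() (stale, kept string) {
	r := &reusingReader{}
	first := r.readFrame(1, "ping")
	saved := retain(first) // copy before the next read
	r.readFrame(3, "pong") // invalidates first
	return string(first.data), string(saved.data)
}

func main() {
	stale, kept := demo()
	fmt.Println("held pointer now shows:", stale)
	fmt.Println("copied frame still shows:", kept)
}
```

A caller that processes each frame fully before reading the next one needs no copy at all, which is the case such an option would be aimed at.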