x/net/http2: reduce Framer ReadFrame allocations #18502

Closed
apolcyn opened this issue Jan 3, 2017 · 8 comments

@apolcyn apolcyn commented Jan 3, 2017

Please answer these questions before submitting your issue. Thanks!

This is a performance-related issue/proposal for the net/http2 library in https://github.com/golang/net/tree/master/http2.

What version of Go are you using (go version)?

1.8beta2

What operating system and processor architecture are you using (go env)?

amd64

I'm running "grpc-go" (https://github.com/grpc/grpc-go) micro-benchmarks (grpc-go uses only the "framer" from the net/http2 library). Specifically, I'm looking at a benchmark that tests "grpc streaming" throughput, with a couple of clients against one 32 core server.

Details on this benchmark setup: the server has a total of 64 tcp/http2 connections, with 100 long-lived http2 streams over each http2 connection. A streaming "round-trip" is a grpc request-response of about 10 bytes, each fitting into one data frame. The server is run with a 5 second warmup and a 30 second benchmark, during which it does somewhere around 900K round trips per second.

After multiple changes that reduce memory allocations elsewhere, the memory "alloc_space" profile of the server after running this benchmark looks like:

2139.58MB of 2178.13MB total (98.23%)
Dropped 69 nodes (cum <= 10.89MB)
      flat  flat%   sum%        cum   cum%
 1436.57MB 65.95% 65.95%  1436.57MB 65.95%  golang.org/x/net/http2.parseDataFrame
  473.51MB 21.74% 87.69%   473.51MB 21.74%  google.golang.org/grpc.protoCodec.Marshal
  225.50MB 10.35% 98.05%   225.50MB 10.35%  google.golang.org/grpc/transport.(*http2Server).handleData
    2.50MB  0.11% 98.16%   494.02MB 22.68%  google.golang.org/grpc/benchmark.(*testServer).StreamingCall
       1MB 0.046% 98.21%   495.02MB 22.73%  google.golang.org/grpc.(*Server).processStreamingRPC
    0.50MB 0.023% 98.23%   477.01MB 21.90%  google.golang.org/grpc.(*serverStream).SendMsg
         0     0% 98.23%  1442.07MB 66.21%  golang.org/x/net/http2.(*Framer).ReadFrame
         0     0% 98.23%  1678.60MB 77.07%  google.golang.org/grpc.(*Server).handleRawConn
         0     0% 98.23%   495.02MB 22.73%  google.golang.org/grpc.(*Server).handleStream
         0     0% 98.23%  1677.60MB 77.02%  google.golang.org/grpc.(*Server).serveHTTP2Transport
         0     0% 98.23%  1676.57MB 76.97%  google.golang.org/grpc.(*Server).serveStreams
         0     0% 98.23%   495.02MB 22.73%  google.golang.org/grpc.(*Server).serveStreams.func1.1
         0     0% 98.23%   473.51MB 21.74%  google.golang.org/grpc.(*protoCodec).Marshal
         0     0% 98.23%    14.52MB  0.67%  google.golang.org/grpc.(*serverStream).RecvMsg
         0     0% 98.23%   473.51MB 21.74%  google.golang.org/grpc.encode
         0     0% 98.23%    14.52MB  0.67%  google.golang.org/grpc.recv
         0     0% 98.23%    14.52MB  0.67%  google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).RecvMsg
         0     0% 98.23%   477.01MB 21.90%  google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).Send
         0     0% 98.23%   494.02MB 22.68%  google.golang.org/grpc/benchmark/grpc_testing._BenchmarkService_StreamingCall_Handler
         0     0% 98.23%  1442.07MB 66.21%  google.golang.org/grpc/transport.(*framer).readFrame
         0     0% 98.23%  1676.57MB 76.97%  google.golang.org/grpc/transport.(*http2Server).HandleStreams
         0     0% 98.23%  2173.63MB 99.79%  runtime.goexit

Note golang.org/x/net/http2.parseDataFrame appears to be allocating a new DataFrame struct per-data frame, which ends up being the largest source of allocations. (alloc appears to be from https://github.com/golang/net/blob/master/http2/frame.go#L577)

Also, from the total "alloc_objects" profile, golang.org/x/net/http2.parseDataFrame appears to account for about 30M of the total ~80M object allocations.
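
To make the per-frame cost concrete, here is a minimal sketch that counts heap allocations for a parser that builds a new struct on every call versus one that overwrites a cached struct. The `frame` and `parser` types and both parse functions are hypothetical stand-ins for illustration, not the x/net/http2 code, and the package-level `sink` is only there so the returned pointers escape to the heap.

```go
package main

import (
	"fmt"
	"testing"
)

// frame is a hypothetical stand-in for a small per-frame struct.
type frame struct {
	streamID uint32
	data     []byte
}

// sink is package-level so that stored frames escape to the heap.
var sink *frame

// parseFresh allocates a new struct on every call.
func parseFresh(id uint32, payload []byte) *frame {
	return &frame{streamID: id, data: payload}
}

// parser keeps one struct and overwrites it on every call.
type parser struct{ cached frame }

// parseReused returns a pointer to the cached struct; the previous
// contents are overwritten, so earlier results are no longer valid.
func (p *parser) parseReused(id uint32, payload []byte) *frame {
	p.cached.streamID = id
	p.cached.data = payload
	return &p.cached
}

// measure returns the average number of allocations per call for
// both strategies, using testing.AllocsPerRun from the standard library.
func measure() (fresh, reused float64) {
	payload := make([]byte, 10)
	p := &parser{}
	fresh = testing.AllocsPerRun(1000, func() { sink = parseFresh(1, payload) })
	reused = testing.AllocsPerRun(1000, func() { sink = p.parseReused(1, payload) })
	return fresh, reused
}

func main() {
	fresh, reused := measure()
	fmt.Printf("fresh: %.0f allocs/call, reused: %.0f allocs/call\n", fresh, reused)
}
```

The exact numbers depend on the compiler's escape analysis, but the fresh variant should report more allocations per call than the reused one.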

Experimenting with a code change that repeatedly returns the same DataFrame struct instead of creating new ones:

  • memory allocations from golang.org/x/net/http2.parseDataFrame disappear,
  • total memory allocated by the benchmark goes down by ~1.5GB, to about 740MB,
  • QPS in the grpc micro-benchmark increases by about 5%.

The current http2 framer returns a slice on "data frame reads" that's only valid until the next call to ReadFrame. I'm wondering if similar semantics for the entire DataFrame struct sound reasonable, or possibly an option to turn this on.
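
The "valid only until the next call" contract can be illustrated with a small sketch. The `chunkReader` type below is a hypothetical reader that hands out slices backed by one shared buffer; it is not the http2 Framer, but it shows why a caller that wants to keep data across reads has to copy it first.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkReader is a hypothetical reader that returns fixed-size chunks
// backed by a single internal buffer.
type chunkReader struct {
	r   io.Reader
	buf []byte
}

// next returns the next chunk. The returned slice aliases cr.buf, so it
// is only valid until the following call to next.
func (cr *chunkReader) next() ([]byte, error) {
	if _, err := io.ReadFull(cr.r, cr.buf); err != nil {
		return nil, err
	}
	return cr.buf, nil
}

// readTwo reads two chunks and returns the first chunk as seen through
// the retained (aliased) slice, the first chunk as seen through a copy,
// and the second chunk.
func readTwo(input string, size int) (aliased, copied, second string, err error) {
	cr := &chunkReader{r: strings.NewReader(input), buf: make([]byte, size)}
	first, err := cr.next()
	if err != nil {
		return "", "", "", err
	}
	saved := append([]byte(nil), first...) // copy before the next read
	b2, err := cr.next()
	if err != nil {
		return "", "", "", err
	}
	// first still points at the shared buffer, which now holds chunk two.
	return string(first), string(saved), string(b2), nil
}

func main() {
	aliased, copied, second, err := readTwo("pingpong", 4)
	if err != nil {
		panic(err)
	}
	fmt.Printf("aliased=%q copied=%q second=%q\n", aliased, copied, second)
}
```

Extending the same contract from the payload slice to the whole frame struct means callers would need the same discipline for every field they want to keep.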

I can give more details on the benchmark setup if needed.

thanks

@apolcyn apolcyn commented Jan 3, 2017

@bradfitz bradfitz added this to the Unreleased milestone Jan 3, 2017
@bradfitz bradfitz added the Performance label Jan 3, 2017
@bradfitz bradfitz changed the title from "per-data-frame allocation overhead due to creation of new `DataFrame` structs" to "x/net/http2: reduce Framer ReadFrame allocations" Jan 3, 2017

@bradfitz bradfitz commented Jan 3, 2017

The Framer could conditionally (opt-in) keep a reference to each of concrete types of frames and reuse them if the caller wants.
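
One way this opt-in could look is sketched below with a toy framer. Everything here is an assumption made for illustration: the `toyFramer` type, its made-up wire format (4-byte stream ID, 4-byte length, payload), and the `setReuseFrame` and `newDataFrame` names are hypothetical and are not the x/net/http2 implementation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// dataFrame is a simplified stand-in for a parsed DATA frame.
type dataFrame struct {
	streamID uint32
	data     []byte
}

// toyFramer reads frames in a made-up wire format. It is a hypothetical
// model for illustration only.
type toyFramer struct {
	r      io.Reader
	hdr    [8]byte
	buf    []byte     // payload buffer, always reused between reads
	reuse  bool       // opt-in: also reuse the dataFrame struct
	cached *dataFrame // struct handed out again when reuse is set
}

// setReuseFrame opts in to struct reuse. Afterwards, a frame returned by
// readFrame is only valid until the next call to readFrame.
func (f *toyFramer) setReuseFrame() { f.reuse = true }

// newDataFrame returns a fresh struct by default, or the cached one
// when the caller has opted in to reuse.
func (f *toyFramer) newDataFrame() *dataFrame {
	if !f.reuse {
		return &dataFrame{}
	}
	if f.cached == nil {
		f.cached = &dataFrame{}
	}
	return f.cached
}

func (f *toyFramer) readFrame() (*dataFrame, error) {
	if _, err := io.ReadFull(f.r, f.hdr[:]); err != nil {
		return nil, err
	}
	n := int(binary.BigEndian.Uint32(f.hdr[4:]))
	if cap(f.buf) < n {
		f.buf = make([]byte, n)
	}
	f.buf = f.buf[:n]
	if _, err := io.ReadFull(f.r, f.buf); err != nil {
		return nil, err
	}
	df := f.newDataFrame()
	df.streamID = binary.BigEndian.Uint32(f.hdr[:4])
	df.data = f.buf
	return df, nil
}

// encode appends one frame to buf in the toy wire format.
func encode(buf *bytes.Buffer, streamID uint32, payload string) {
	var hdr [8]byte
	binary.BigEndian.PutUint32(hdr[:4], streamID)
	binary.BigEndian.PutUint32(hdr[4:], uint32(len(payload)))
	buf.Write(hdr[:])
	buf.WriteString(payload)
}

// sameStruct reports whether two consecutive reads return the same pointer.
func sameStruct(reuse bool) bool {
	var wire bytes.Buffer
	encode(&wire, 1, "ping")
	encode(&wire, 3, "pong")
	f := &toyFramer{r: &wire}
	if reuse {
		f.setReuseFrame()
	}
	a, err := f.readFrame()
	if err != nil {
		panic(err)
	}
	b, err := f.readFrame()
	if err != nil {
		panic(err)
	}
	return a == b
}

func main() {
	fmt.Println("default returns same struct:", sameStruct(false))
	fmt.Println("reuse returns same struct:  ", sameStruct(true))
}
```

Keeping the default behavior unchanged means existing callers that hold on to frames across reads are unaffected; only callers that opt in take on the shorter lifetime.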

@RLH RLH commented Jan 3, 2017

@apolcyn apolcyn commented Jan 3, 2017

> The application is running on a 32 core machine. How much RAM does the machine have? I ask because 2 GBytes of allocation over 30 seconds does not seem like a lot for a 32 core machine.

There's about 30GB on this machine.
Other changes to the server, separate from this one, brought total allocations down from a higher amount (~15GB to the ~2GB shown here), and I have still been seeing steady benchmark improvements from reducing allocations in general.

> Could you run it with GODEBUG=gctrace=1 and post the output?

Yeah capturing the gc trace from the server's whole life shows: https://gist.github.com/apolcyn/2f19a4230ed65c2b1738c98ee978a8c2

note in this log, the "reset" on lines 30 and 31 just separates the warmup and benchmark periods.

@apolcyn apolcyn commented Jan 3, 2017

> The Framer could conditionally (opt-in) keep a reference to each of concrete types of frames and reuse them if the caller wants.

This would work well for me! I can create a PR

@RLH RLH commented Jan 3, 2017

@apolcyn apolcyn commented Jan 4, 2017

fyi just submitted PR with "Change-Id: Iad93420ef6c3918f54249d867098f1dadfa324d8"

(https://go-review.googlesource.com/#/c/34812/)

@gopherbot gopherbot commented Jan 4, 2017

CL https://golang.org/cl/34812 mentions this issue.

apolcyn added a commit to apolcyn/grpc that referenced this issue Jan 5, 2017
The existing Framer in net/http2 allocates a new DataFrame struct
for each DataFrame read on calls to ReadFrame. On long-lived http2
streams that use lots of small data frames, these DataFrame structs
can be a significant source of memory allocation. The
SetReuseFrame option introduced here, if set on a Framer, allows the
Framer to reuse Frame objects and changes the ReadFrame API
so that returned Frame objects are only valid until the next call
to ReadFrame. This opt-in API now only implements reuse of DataFrames,
but it allows the Framer to reuse any type of Frame object.

The footprint caused by creation of new DataFrame structs per data
frame was noticed in micro benchmarks of "gRPC" server "streaming
throughput", which uses the Framer in this package. After other
experimental changes to the benchmark that reduce allocations,
DataFrame structs appeared to be a significant source of allocations.
In this benchmark, 6,400 concurrent http2 streams on the server
did a total of roughly 900,000 round trips per second (each round
trip involves request and response http2 data frames with payloads of about
10 bytes each). Over a roughly 35 second benchmark period, this caused
slightly more than 30 million allocations of DataFrame structs, totalling
about 1.4 GB of allocated space. Reusing DataFrames in the Framer eliminated these
allocations and was found to improve the benchmark's throughput by about
5%.

Fixes golang/go#18502

Change-Id: Iad93420ef6c3918f54249d867098f1dadfa324d8
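
As a rough sanity check of the figures in this commit message, the arithmetic can be written out as a short sketch. It assumes one server-side DataFrame read per round trip and treats "about 1.4 GB" as 1.4e9 bytes; both are assumptions for illustration, and the result is only an order-of-magnitude estimate.

```go
package main

import "fmt"

// Figures quoted in the issue and commit message; the byte count is an
// assumed decimal reading of "about 1.4 GB".
const (
	connections      = 64
	streamsPerConn   = 100
	roundTripsPerSec = 900000
	seconds          = 35
	dataFrameAllocs  = 30000000
	allocatedBytes   = 1.4e9
)

// totalStreams is the number of concurrent http2 streams on the server.
func totalStreams() int { return connections * streamsPerConn }

// totalRoundTrips estimates round trips over warmup plus benchmark.
func totalRoundTrips() int { return roundTripsPerSec * seconds }

// bytesPerAlloc estimates the average size of one DataFrame allocation.
func bytesPerAlloc() float64 { return allocatedBytes / dataFrameAllocs }

func main() {
	fmt.Println("streams:", totalStreams())
	fmt.Println("round trips:", totalRoundTrips())
	fmt.Printf("bytes per DataFrame allocation: %.1f\n", bytesPerAlloc())
}
```

Under these assumptions the stream count matches the 6,400 quoted above, the round-trip total is in line with "slightly more than 30 million" allocations, and each allocation comes out at a few tens of bytes.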
@golang golang locked and limited conversation to collaborators Feb 27, 2018
c3mb0 pushed a commit to c3mb0/net that referenced this issue Apr 2, 2018
The existing Framer in net/http2 allocates a new DataFrame struct
for each DataFrame read on calls to ReadFrame. The SetReuseFrame
option introduced here, if set on a Framer, allows the
Framer to reuse Frame objects and changes the ReadFrame API
so that returned Frame objects are only valid until the next call
to ReadFrame. This opt-in API now only implements reuse of DataFrames,
but it allows the Framer to reuse any type of Frame.

The footprint caused by creation of new DataFrame structs per data
frame was noticed in micro benchmarks of "gRPC" server "streaming
throughput", which uses the Framer in this package. This benchmark
happened to use long lived http2 streams that do client-server "ping-pong"
requests with small data frames, and DataFrames were seen to be a
significant source of allocations.

Running local benchmarks with: (from x/net/http2 directory)

$ go test -run=^$ -bench=BenchmarkServerToClientStream

example output:
* expect an alloc reduction of at least 1 and a small memory reduction between
"BenchmarkServerToClientStreamDefaultOptions" and
"BenchmarkServerToClientStreamReuseFrames"

BenchmarkServerToClientStreamDefaultOptions-12    30000    46216 ns/op    971 B/op    17 allocs/op
BenchmarkServerToClientStreamReuseFrames-12       30000    44952 ns/op    924 B/op    16 allocs/op

Fixes golang/go#18502

Change-Id: Iad93420ef6c3918f54249d867098f1dadfa324d8
Reviewed-on: https://go-review.googlesource.com/34812
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>