Severe performance regression when using ServeHTTP #586

Closed

tamird opened this issue Mar 4, 2016 · 25 comments
Labels: P3, Type: Performance (performance improvements: CPU, network, memory, etc.)

Comments

tamird (Contributor) commented Mar 4, 2016

Extracting from discussion in #549. See https://github.com/cockroachdb/rpc-bench; relevant parts as of this writing:

name                 time/op
GRPCServe_1K-4         72.6µs ±10%
GRPCServe_64K-4         785µs ± 8%
GRPCServeHTTP_1K-4      178µs ± 2%
GRPCServeHTTP_64K-4    1.32ms ± 2%

name                 speed
GRPCServe_1K-4       28.3MB/s ± 9%
GRPCServe_64K-4       167MB/s ± 7%
GRPCServeHTTP_1K-4   11.5MB/s ± 2%
GRPCServeHTTP_64K-4  99.4MB/s ± 2%

name                 alloc/op
GRPCServe_1K-4         15.6kB ± 0%
GRPCServe_64K-4         711kB ± 0%
GRPCServeHTTP_1K-4     88.5kB ± 0%
GRPCServeHTTP_64K-4     801kB ± 0%

name                 allocs/op
GRPCServe_1K-4           96.0 ± 0%
GRPCServe_64K-4           162 ± 0%
GRPCServeHTTP_1K-4        166 ± 0%
GRPCServeHTTP_64K-4       671 ± 0%
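
For context, the two serving paths these benchmarks compare look roughly like this (a minimal sketch, not code from rpc-bench; the service registration and certificate paths are hypothetical placeholders):

package main

import (
    "net"
    "net/http"

    "google.golang.org/grpc"
)

func main() {
    grpcServer := grpc.NewServer()
    // pb.RegisterEchoServer(grpcServer, &echoServer{}) // hypothetical service

    // GRPCServe path: gRPC's own listener loop and HTTP/2 transport.
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        panic(err)
    }
    go grpcServer.Serve(lis)

    // GRPCServeHTTP path: grpc.Server used as an http.Handler. net/http only
    // speaks HTTP/2 (which gRPC requires) over TLS out of the box.
    httpServer := &http.Server{Addr: ":50052", Handler: grpcServer}
    httpServer.ListenAndServeTLS("server.crt", "server.key") // placeholder certs
}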
tamird (Contributor, Author) commented May 4, 2016

@bradfitz any progress on this?

bradfitz (Contributor) commented May 4, 2016

No, I've been busy with Go 1.7 stuff for the May 1st freeze. Hopefully I can spend a bit of time on this soon, but then I'm out for a month starting in a couple weeks.

mwitkow (Contributor) commented May 16, 2016

@bradfitz is there some way in which we could help? :)

bradfitz (Contributor) commented May 16, 2016

> is there some way in which we could help? :)

Sure. Look at the http2 code and find ways to improve it, while not hurting readability or maintainability.

mwitkow (Contributor) commented May 16, 2016

Any pointers as to how to start? Is there any particular test you pprof to find bottlenecks? :)

bradfitz (Contributor) commented:

Honestly I forget. That code was mostly written for correctness and not performance. There's probably a lot of low-hanging fruit in there but there might not even be enough benchmarks yet to start using pprof & friends. You'd probably have to write at least a few new ones.
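
As a sketch of what one such benchmark might look like (the echo helper is a hypothetical stand-in; only the standard testing API is assumed):

package bench

import "testing"

// echoOverServeHTTP stands in for issuing one unary RPC against a server
// mounted via (*grpc.Server).ServeHTTP; a real benchmark would dial a test
// server brought up in TestMain.
func echoOverServeHTTP(payload []byte) error {
    _ = payload
    return nil // placeholder
}

func BenchmarkServeHTTP1K(b *testing.B) {
    payload := make([]byte, 1024)
    b.SetBytes(int64(len(payload))) // makes go test -bench report MB/s
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if err := echoOverServeHTTP(payload); err != nil {
            b.Fatal(err)
        }
    }
}

From there, go test -bench=ServeHTTP -cpuprofile=cpu.out followed by go tool pprof cpu.out is the usual route to finding hotspots.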

tamird (Contributor, Author) commented Jul 28, 2016

@iamqizhao this is a serious issue - can we include it for GA?

iamqizhao (Contributor) commented:

@tamird Sorry. We are very close to GA, and so far this has been treated as an experimental effort; we do not expect real usage of this feature at the current stage. This is on Brad's plate, and I am going to talk to him after GA to see how to make progress on this.

tamird (Contributor, Author) commented Mar 13, 2017

This is still a major inconvenience for us. I've updated the performance numbers at https://github.com/cockroachdb/rpc-bench; the relevant ones are:

name                         time/op
GRPCServe_1K-24                40.5µs ± 1%
GRPCServe_64K-24                365µs ± 1%
GRPCServe_Stream_1K-24         22.7µs ± 1%
GRPCServe_Stream_64k-24         298µs ± 1%
GRPCServeHTTP_1K-24             137µs ± 1%
GRPCServeHTTP_64K-24            585µs ± 1%
GRPCServeHTTP_Stream_1K-24     38.7µs ± 1%
GRPCServeHTTP_Stream_64k-24     455µs ± 1%

name                         speed
GRPCServe_1K-24              50.5MB/s ± 1%
GRPCServe_64K-24              359MB/s ± 1%
GRPCServe_Stream_1K-24       90.3MB/s ± 1%
GRPCServe_Stream_64k-24       439MB/s ± 1%
GRPCServeHTTP_1K-24          14.9MB/s ± 1%
GRPCServeHTTP_64K-24          224MB/s ± 1%
GRPCServeHTTP_Stream_1K-24   52.9MB/s ± 1%
GRPCServeHTTP_Stream_64k-24   288MB/s ± 1%

name                         alloc/op
GRPCServe_1K-24                16.3kB ± 0%
GRPCServe_64K-24                711kB ± 0%
GRPCServe_Stream_1K-24         11.6kB ± 0%
GRPCServe_Stream_64k-24         707kB ± 0%
GRPCServeHTTP_1K-24            34.2kB ± 0%
GRPCServeHTTP_64K-24            763kB ± 0%
GRPCServeHTTP_Stream_1K-24     12.4kB ± 0%
GRPCServeHTTP_Stream_64k-24     739kB ± 0%

name                         allocs/op
GRPCServe_1K-24                  97.0 ± 0%
GRPCServe_64K-24                  147 ± 0%
GRPCServe_Stream_1K-24           22.0 ± 0%
GRPCServe_Stream_64k-24          72.0 ± 0%
GRPCServeHTTP_1K-24               169 ± 0%
GRPCServeHTTP_64K-24              275 ± 0%
GRPCServeHTTP_Stream_1K-24       29.0 ± 0%
GRPCServeHTTP_Stream_64k-24       138 ± 0%

So ServeHTTP's throughput is 50%-70% lower than Serve's, and its latency is 60%-350% higher.
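
(Aside: tables in this format are what golang.org/x/perf/cmd/benchstat emits, so these numbers were most likely summarized with something like

    go test -bench=. -count=10 . > bench.txt
    benchstat bench.txt

run against the rpc-bench package.)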

bradfitz (Contributor) commented:

@tamird, the grpc-go team is undergoing a changing of the guard at the moment so this isn't much of a priority.

And I don't officially work on grpc-go. It's not surprising that ServeHTTP is slower considering it's shoehorned into the existing ServerTransport interface somewhat awkwardly.

I started working on a new design for all of grpc-go in https://github.com/bradfitz/grpc-go but it's not ready yet for wider discussion or testing. It was mostly a proof of concept. But it's something I intend to get back to. @philips et al at CoreOS are also interested.

Iulian7 commented May 3, 2017

Is there any update on this?

hsaliak commented May 16, 2017

@menghanl would you be able to take a look?

hsaliak added the P2 and Type: Performance labels on May 16, 2017
mwitkow (Contributor) commented Jun 4, 2017

@dfawley this is the bug I mentioned in person at CoreOS Fest. It would be fantastic if we could reach serving parity between the HandlerTransport and the pure socket-listener transport.

dfawley (Member) commented Jun 5, 2017

I agree, but unfortunately we won't have any time to spend on this for at least another quarter. It will be a pretty sizable effort. It's possible we may get some contributions for this before then, but I wouldn't want to set any expectations at this point.

glerchundi commented:

> at least another quarter

Now that a quarter has passed, is there any chance of making progress on this?

dfawley (Member) commented Feb 26, 2018

@mwitkow, @tamird, @glerchundi, I have been poking at this periodically and learning more about it since June.

The goal of throwing away the grpc http2 server and using x/net/http2 instead would be a major undertaking, and would require adding significant new features and hooks and (likely) optimizations into x/net/http2 to reach parity with the existing grpc http2 server. The handler-based transport was/is an experiment (and is documented as such), and isn't something fully supported. The core grpc-go team does not currently have the cycles to support this and keep up with our existing commitments -- let alone to completely replace our default transport with it -- but we would be happy to review PRs that improve features or performance of the handler-based transport.

Re: @bradfitz's comment:

> It's not surprising that ServeHTTP is slower considering it's shoehorned into the existing ServerTransport interface somewhat awkwardly.

I agree this is an awkward fit: the handler-based transport mimics a full transport for each RPC. As we continue to dig into and clean up the code and interfaces between the grpc layer and the transport layer (e.g., #1745, #1854), the handler-based transport may become a more natural fit, and its performance may improve as a result, to the extent that the awkward fit accounts for the performance gap.

mofirouz commented:

Is this still a performance issue when using (*grpc.Server).ServeHTTP?

I'm weighing whether we should wrap gRPC's server with a net/http server, since I need to serve / as a health check so that it passes GCLB.
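
One common shape for that (a sketch, not an officially supported pattern; the health-check handler and certificate paths are placeholders) dispatches on the gRPC content type so both protocols share one TLS port:

package main

import (
    "net/http"
    "strings"

    "google.golang.org/grpc"
)

func main() {
    grpcServer := grpc.NewServer()

    httpMux := http.NewServeMux()
    httpMux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok")) // placeholder health check for GCLB
    })

    mixed := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
            grpcServer.ServeHTTP(w, r) // pays the ServeHTTP overhead discussed in this issue
        } else {
            httpMux.ServeHTTP(w, r)
        }
    })

    // net/http only enables HTTP/2 (required by gRPC) over TLS by default.
    http.ListenAndServeTLS(":8443", "server.crt", "server.key", mixed) // placeholder certs
}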

snowzach commented Apr 6, 2019

@tamird is this still an issue for you guys?

tamird (Contributor, Author) commented Apr 6, 2019

I don't work on CockroachDB anymore, but it should be trivial for you to replicate my results using rpc-bench.

I can post new results on Monday if you'd like. Has any work been done in this area? I haven't seen any mention.

srenatus added a commit to chef/automate that referenced this issue May 22, 2019
This was carried over from some ancient grpc-gateway example code[1], and
we're not exposing the GRPC API that way right now.

Using (*grpc.Server).ServeHTTP has some issues:

1. it's known to be slow, very slow[2]
2. it seems to not work with go-grpc >= 1.19 [3]

Furthermore, we're exposing the proper GRPC server (the native go-grpc
one) on a different port. If we decide to (finally!) expose our GRPC API,
that's the thing we should expose.

So, since we don't need this multiplexing business, we shouldn't keep
the code around. This is the cleanup.

[1]: https://github.com/philips/grpc-gateway-example/
[2]: grpc/grpc-go#586
[3]: soheilhy/cmux#64 (comment)

Signed-off-by: Stephan Renatus <srenatus@chef.io>
srenatus added a commit to chef/automate that referenced this issue May 22, 2019
* automate-gateway: cleanup GRPC/HTTP multiplexing cruft

(same commit message as above)

* [integration] automate-gateway: fix tests to use GRPC port

So apparently these HAD used the multiplexing code I've deleted. Well,
they don't have to. Changed the hardcoded port to automate-gateway's
default grpc port.

Signed-off-by: Stephan Renatus <srenatus@chef.io>
stale bot commented Sep 6, 2019

This issue is labeled as requiring an update from the reporter, and no update has been received after 7 days. If no update is provided in the next 7 days, this issue will be automatically closed.

tcolgate commented Feb 20, 2020

I hacked around and got the rpc-bench tests working using the standard library http2 (instead of x/net/http2). Current results for GRPC Serve vs GRPC ServeHTTP:

BenchmarkGRPCServe_1K-10                           55292             22533 ns/op          90.89 MB/s
BenchmarkGRPCServe_64K-10                           4573            226846 ns/op         577.80 MB/s
BenchmarkGRPCServe_Stream_1K-10                   101989             11717 ns/op         174.79 MB/s
BenchmarkGRPCServe_Stream_64k-10                    5167            221615 ns/op         591.44 MB/s
BenchmarkGRPCServeHTTP_1K-10                       19066             63799 ns/op          32.10 MB/s
BenchmarkGRPCServeHTTP_64K-10                       3297            365040 ns/op         359.06 MB/s
BenchmarkGRPCServeHTTP_Stream_1K-10                59731             19344 ns/op         105.87 MB/s
BenchmarkGRPCServeHTTP_Stream_64k-10                3696            307412 ns/op         426.37 MB/s
BenchmarkGobRPC_1K-10                              90954             12879 ns/op         159.02 MB/s
BenchmarkGobRPC_64K-10                             10990            111328 ns/op        1177.34 MB/s
BenchmarkProtoHTTP1_1K-10                           7953            139164 ns/op          14.72 MB/s
BenchmarkProtoHTTP1_64K-10                          1718            601093 ns/op         218.06 MB/s
BenchmarkProtoHTTP2_1K-10                           9948            126565 ns/op          16.18 MB/s
BenchmarkProtoHTTP2_64K-10                          2184            565510 ns/op         231.78 MB/s

The ProtoRPC benchmarks aren't running, but I'm not that fussed about those.

tcolgate commented:

This implies the results are still in the same ballpark as the previous runs (grpc 1.27.1, go 1.13). For me the overhead is worth being able to run on a regular mux on a single port, without cmux (which gets confusing fast). But clearly it can be improved.
My guess is it's quite a big job to get enough of an understanding of this to help out, but I'm willing to give it a go. If anyone can point me at some relevant files, that would help. I'll start taking a look anyway.
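
For reference, the cmux route being avoided looks roughly like this (a sketch following soheilhy/cmux's README; MatchWithWriters is the variant that works around the grpc-go >= 1.19 caveat in the cmux#64 link above):

package main

import (
    "net"
    "net/http"

    "github.com/soheilhy/cmux"
    "google.golang.org/grpc"
)

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        panic(err)
    }
    m := cmux.New(lis)

    // Route connections whose first HTTP/2 headers carry the gRPC content
    // type to the gRPC server; MatchWithWriters sends the server's SETTINGS
    // frame first, which newer grpc-go clients wait for.
    grpcL := m.MatchWithWriters(
        cmux.HTTP2MatchHeaderFieldSendSettings("content-type", "application/grpc"))
    httpL := m.Match(cmux.HTTP1Fast())

    grpcServer := grpc.NewServer()
    httpServer := &http.Server{Handler: http.NewServeMux()}

    go grpcServer.Serve(grpcL)
    go httpServer.Serve(httpL)
    m.Serve() // blocks, demultiplexing connections onto the matched listeners
}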

snowzach commented:

@tcolgate I could never get the benchmarks to run. Admittedly, I didn't spend a ton of time on it. If you create a repo with working benchmarks, I wouldn't mind taking a crack at it as well or helping.

tcolgate commented:

Uploaded here: https://github.com/tcolgate/rpc-bench
It's probably worth putting some benchmarks in grpc-go directly, if they aren't there already. I started poking around the code. @bradfitz's attempted replacement is probably a good way to get to understand what's going on: even though it's not rebasable, there's less code and it's simpler, so it should at least be an aid to understanding.

dfawley (Member) commented May 3, 2021

Closing due to lack of activity and priority.
