Investigate header decoding performance #1587
Actually, this wasn't doubling performance, it was more like a ~30% slowdown with header decoding enabled. Also, I didn't fully understand what was meant by not decoding the headers (because I assumed the http2 library always did some decoding on its side), so now that I understand that it can be completely disabled, this seems reasonable.
@dfawley I just re-tested this and the slowdown I see is 45%. Flipped around, not decoding headers provides an 84% improvement in throughput. I haven't looked at all at what about header decoding is causing this problem. The only difference between the two tests is the following diff:

```diff
diff --git a/y.go b/y.go
index 1bfb580..b2ab044 100644
--- a/y.go
+++ b/y.go
@@ -36,7 +36,7 @@ func newYServer(conn net.Conn) *yServerConn {
 	y.wr = bufio.NewWriter(conn)
 	y.fr = http2.NewFramer(y.wr, y.rd)
 	y.fr.SetReuseFrames()
-	y.fr.ReadMetaHeaders = hpack.NewDecoder(4096, nil)
+	// y.fr.ReadMetaHeaders = hpack.NewDecoder(4096, nil)
 	y.sender.cond.L = &y.sender.Mutex
 	return y
 }
@@ -185,7 +185,7 @@ func newYClient(conn net.Conn) *yClientConn {
 	y.wr = bufio.NewWriter(conn)
 	y.fr = http2.NewFramer(y.wr, y.rd)
 	y.fr.SetReuseFrames()
-	y.fr.ReadMetaHeaders = hpack.NewDecoder(4096, nil)
+	// y.fr.ReadMetaHeaders = hpack.NewDecoder(4096, nil)
 	y.sender.cond.L = &y.sender.Mutex
 	y.receiver.streamID = 1
 	y.receiver.pending = make(map[uint32]*yPending)
```

With header decoding:

Without header decoding:
Ok, I took a quick look and it appears that ~50% of the performance delta is due to allocations of the
How are you getting that number? I ran a cpu profile on your benchmarks (btw, cpu profiling there is a little buggy, at least for y; if you have fixed your local branch, can you push the updates?) and it seems to me that most of
Did you make a prototype change already? Our current focus is on reducing contention for high-concurrency use cases. Vetting net/http's implementation for performance seems like a long shot right now. @dfawley what do you think?
I added caching of

Concretely (and these numbers differ from the ones above because they were gathered between 2 VMs in GCE):
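One way such caching could work is to intern decoded header strings, so repeated names and values (`:method`, `content-type`, ...) resolve to a shared string instead of a fresh allocation on every frame. This is a hypothetical sketch; `stringCache` and `intern` are invented names, not the change actually tested here:

```go
package main

import "fmt"

// stringCache interns byte slices so that repeatedly decoded header
// names/values share one string instead of allocating per frame.
type stringCache map[string]string

func (c stringCache) intern(b []byte) string {
	// The compiler optimizes a map lookup keyed by string(b) to avoid
	// allocating the temporary string on the hit path.
	if s, ok := c[string(b)]; ok {
		return s
	}
	s := string(b) // miss: allocate once and remember it
	c[s] = s
	return s
}

func main() {
	cache := stringCache{}
	a := cache.intern([]byte("content-type"))
	b := cache.intern([]byte("content-type"))
	fmt.Println(a == b, len(cache)) // second lookup reuses the cached string
}
```

A real version would need a bound on cache size, since header values are attacker-controlled input.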
Yes.
@petermattis Are those the latest numbers with gRPC or your x/y implementations? I'm assuming the latter because of this comment:
So you're saying optimizations in header decoding don't help gRPC at all? If so, that probably means the bottleneck is in the sending path right now. Maybe we can turn this into an issue/PR against http2. Reusing header frames seems like a useful feature for everyone.
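For illustration, frame reuse in the spirit of `Framer.SetReuseFrames` can be sketched with a `sync.Pool`: a decoded frame is returned to a pool after the caller is done with it, so the next read can reuse its backing storage. The `frame` and `readFrame` names here are stand-ins, not the http2 package's API:

```go
package main

import (
	"fmt"
	"sync"
)

// headerField is a stand-in for a decoded hpack header field.
type headerField struct{ name, value string }

// frame is a stand-in for an http2 header frame whose object and
// fields slice are recycled across reads.
type frame struct {
	fields []headerField
}

var framePool = sync.Pool{New: func() any { return &frame{} }}

// readFrame simulates decoding into a pooled frame: appending into
// fields[:0] reuses the previous frame's backing array when one is
// available, instead of allocating a new frame per read.
func readFrame(name, value string) *frame {
	f := framePool.Get().(*frame)
	f.fields = append(f.fields[:0], headerField{name, value})
	return f
}

// release hands the frame back once the caller no longer needs it.
func release(f *frame) { framePool.Put(f) }

func main() {
	f1 := readFrame(":method", "POST")
	release(f1)
	f2 := readFrame(":path", "/grpc.Svc/Call")
	fmt.Println(f2.fields[0].name) // f2 likely reuses f1's storage
}
```

The catch, as with `SetReuseFrames` itself, is that callers must not retain a frame past the next read, which is an API contract change rather than a drop-in optimization.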
These numbers were with my
Yes, these optimizations do not help gRPC at all right now. I would guess that the gRPC bottleneck is somewhere in the sending path, though it is also possible something is happening on the receiving path.
Sounds good to me.
cockroachdb/cockroach#17370 (comment)
@petermattis found that disabling header decoding can double overall throughput in his benchmarks. This is surprising, and should be investigated.