stream: allow codecs to reuse output buffers #2816

Closed
dfawley opened this issue May 15, 2019 · 3 comments · Fixed by #3167
Labels
P3 · Type: Performance (performance improvements: CPU, network, memory, etc.)

Comments

dfawley (Member) commented May 15, 2019

This would require a codec API change/extension to recycle the memory once grpc is done writing it to the wire (or compressing it).
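For illustration, a minimal Go sketch of the kind of extension this describes follows. The `BufferReturner` interface, the `pooledCodec` type, and the simplified `Marshal` signature are all hypothetical and are not part of the grpc-go API:

```go
package main

import (
	"fmt"
	"sync"
)

// BufferReturner is a made-up optional extension interface: a transport
// that sees the codec implements it can call ReturnBuffer once the
// encoded bytes have been written (or compressed).
type BufferReturner interface {
	ReturnBuffer(buf []byte)
}

// pooledCodec keeps a pool of output buffers instead of allocating a
// fresh slice on every Marshal call.
type pooledCodec struct {
	pool sync.Pool
}

func newPooledCodec() *pooledCodec {
	return &pooledCodec{pool: sync.Pool{
		New: func() interface{} { return make([]byte, 0, 16*1024) },
	}}
}

// Marshal encodes msg into a pooled buffer. Real encoding is elided;
// here the "message" is already a byte slice for simplicity.
func (c *pooledCodec) Marshal(msg []byte) []byte {
	buf := c.pool.Get().([]byte)[:0]
	return append(buf, msg...)
}

// ReturnBuffer hands the buffer back so the next Marshal can reuse it.
func (c *pooledCodec) ReturnBuffer(buf []byte) {
	c.pool.Put(buf[:0])
}

func main() {
	c := newPooledCodec()
	out := c.Marshal([]byte("hello"))
	fmt.Printf("encoded %d bytes\n", len(out)) // pretend this hit the wire
	c.ReturnBuffer(out)                        // buffer is now reusable
}
```

A transport could type-assert its codec against such an optional interface and fall back to the existing behavior when the codec does not implement it, which keeps the extension backward compatible.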

dfawley added the P2 and Type: Performance labels on May 15, 2019
stale bot added the stale label on Sep 6, 2019
dfawley removed the stale label on Sep 6, 2019
adtac pushed a commit to adtac/grpc-go that referenced this issue Nov 8, 2019
Performance benchmarks can be found below. Obviously, a 10 KB request and
10 KB response is tailored to showcase this improvement, as this is where
codec buffer re-use shines, but I've run other benchmarks too (like
1-byte requests and responses) and there's no discernible impact on
performance.

To no one's surprise, the number of bytes allocated per operation goes
down by almost exactly 10 KB across 60k+ queries, which suggests
excellent buffer re-use. The number of allocations per operation
increases by about six, but that's probably because of a few additional
slice pointers that we need to store; these are 8-byte allocations and
should have virtually no impact on the allocator and garbage collector
(a standalone micro-benchmark illustrating the same effect follows the
table below).

    streaming-networkMode_none-bufConn_false-keepalive_false-benchTime_10s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1-reqSize_10240B-respSize_10240B-compressor_off-channelz_false-preloader_false

	 Title       Before        After Percentage
      TotalOps        61821        65568     6.06%
       SendOps            0            0      NaN%
       RecvOps            0            0      NaN%
      Bytes/op    116033.83    105560.37    -9.03%
     Allocs/op       111.79       117.89     5.37%
       ReqT/op 506437632.00 537133056.00     6.06%
      RespT/op 506437632.00 537133056.00     6.06%
      50th-Lat    143.303µs    136.558µs    -4.71%
      90th-Lat    197.926µs    188.623µs    -4.70%
      99th-Lat    521.575µs    507.591µs    -2.68%
       Avg-Lat    161.294µs    152.038µs    -5.74%

Closes grpc#2816
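The table above comes from the grpc-go benchmark suite. For intuition about the Bytes/op and Allocs/op columns, here is a standalone Go micro-benchmark sketch of the same buffer-reuse effect; the package and benchmark names are made up, and this is not the grpc-go harness:

```go
package reuse

import (
	"sync"
	"testing"
)

// sink keeps the allocations live so the compiler can't elide them.
var sink []byte

var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 10*1024) // 10 KB, matching reqSize/respSize above
		return &b
	},
}

// BenchmarkFreshBuffer allocates a new 10 KB buffer per operation,
// so Bytes/op reports roughly 10240.
func BenchmarkFreshBuffer(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = make([]byte, 10*1024)
	}
}

// BenchmarkPooledBuffer recycles buffers through a sync.Pool, so
// Bytes/op drops to near zero at steady state.
func BenchmarkPooledBuffer(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf := bufPool.Get().(*[]byte)
		sink = *buf
		bufPool.Put(buf)
	}
}
```

Running it with `go test -bench=. -benchmem` reports Bytes/op and Allocs/op in the same units as the table.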
adtac pushed a commit to adtac/grpc-go that referenced this issue Nov 9, 2019
Performance benchmarks can be found below. Obviously, a 10 KB request and
10 KB response is tailored to showcase this improvement, as this is where
codec buffer re-use shines, but I've run other benchmarks too (like
1-byte requests and responses) and there's no discernible impact on
performance.

To no one's surprise, the number of bytes allocated per operation goes
down by almost exactly 10 KB across 370k+ queries, which suggests
excellent buffer re-use. The number of allocations per operation
increases by about seven, but that's probably because of a few
additional slice pointers that we need to store; these are 8-byte
allocations and should have virtually no impact on the allocator and
garbage collector.

We do not allow reuse of buffers when stats handlers or binary logging
are turned on, because either of them may need access to the data and
payload even after the data has been written to the wire. In such cases,
we never return the buffer to the pool (a sketch of this gating follows
the table below).

streaming-networkMode_none-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1-reqSize_10240B-respSize_10240B-compressor_off-channelz_false-preloader_false

               Title       Before        After Percentage
            TotalOps       370480       372395     0.52%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op    116049.91    105488.90    -9.10%
           Allocs/op       111.59       118.27     6.27%
             ReqT/op 505828693.33 508443306.67     0.52%
            RespT/op 505828693.33 508443306.67     0.52%
            50th-Lat    142.553µs    143.951µs     0.98%
            90th-Lat    193.714µs     192.51µs    -0.62%
            99th-Lat    549.345µs    545.059µs    -0.78%
             Avg-Lat    161.506µs    160.635µs    -0.54%

Closes grpc#2816
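As a rough sketch of the gating described in the commit message, assuming hypothetical names (`sendState`, `finishSend`) rather than the actual code from the PR:

```go
package main

import "sync"

// sendState is a hypothetical stand-in for the per-stream state that
// would decide whether a send buffer can be recycled.
type sendState struct {
	bufPool       *sync.Pool
	statsEnabled  bool // e.g. a stats handler is registered
	binlogEnabled bool // e.g. binary logging is turned on
}

// finishSend would run after the encoded message has been written to
// the wire (or handed off to the compressor).
func (s *sendState) finishSend(buf *[]byte) {
	if s.statsEnabled || s.binlogEnabled {
		// A stats handler or the binlog may still reference this
		// payload, so never return it to the pool in that case.
		return
	}
	s.bufPool.Put(buf)
}

func main() {
	pool := &sync.Pool{New: func() interface{} {
		b := make([]byte, 0, 10*1024)
		return &b
	}}
	s := &sendState{bufPool: pool} // stats and binlog both off
	buf := pool.Get().(*[]byte)
	// ... encode into *buf and write it ...
	s.finishSend(buf) // safe to recycle here
}
```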
lock bot locked as resolved and limited conversation to collaborators on Jun 24, 2020
dfawley reopened this on Jan 26, 2021
dfawley (Member, Author) commented Jan 26, 2021

The PR that implemented this ultimately needed to be rolled back (#3307); this should have been reopened at that time.

grpc deleted a comment from the stale bot on May 3, 2021
menghanl (Contributor) commented May 3, 2021

There was another attempt to reuse the buffer for reads, but the team didn't have time to review the PR (#3220 (comment)).
Future performance work may reopen it or reuse some of its changes.

dfawley added the P3 label and removed the P2 label on May 16, 2022
ginayeh (Contributor) commented Sep 19, 2023

Bring tags over to #6619 and close this down.

ginayeh closed this as completed on Sep 19, 2023