server: Stats handler and InTapHandle called during critical read path #2811

Open
dfawley opened this issue May 14, 2019 · 3 comments
Labels: Area: Server · P2 · Type: Performance

Comments

dfawley (Member) commented May 14, 2019

Moving the service handler goroutine forking into the transport (from server.go) would allow other reads to proceed without blocking on the tap/stats handlers.

@dfawley dfawley added the P1 and Type: Performance labels May 14, 2019
canguler commented
Anything below line 385, including the stats handler, can easily be moved to another goroutine: https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_server.go#L385

The code between lines 358 and 385 performs some checks and then saves the created stream into the activeStreams map. It uses the transport's mutex, so moving it into another goroutine risks lock contention.

The InTapHandle code lives between lines 341 and 358 and doesn't use the mutex. If it is OK to run it after stream creation, it can be moved into the new goroutine too. But its documentation says that InTapHandle should run before the stream is created.

This draft shows how the safe part can be moved into its own goroutine: https://github.com/grpc/grpc-go/compare/master...canguler:a?expand=1

canguler commented Jun 3, 2019

I changed my draft code a bit more and ran benchmarks. The new implementation (forking a goroutine) had ~5% worse QPS and latency.
Code here.

Benchmark results:

go run benchmark/benchmain/main.go -benchtime=10s -workloads=unary -compression=off -maxConcurrentCalls=100   -reqSizeBytes=1 -respSizeBytes=1 -networkMode=Local 

Without this change:

Unary-traceMode_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-Compressor_off-Preloader_false: 
50_Latency: 708.0560 µs 	90_Latency: 1323.9940 µs 	99_Latency: 1657.3950 µs 	Avg latency: 849.0950 µs 	Count: 1177079 	9450 Bytes/op	169 Allocs/op	
Histogram (unit: µs)
Count: 1177079  Min: 121.9  Max: 28079.0  Avg: 849.10
------------------------------------------------------------
[     121.905000,      121.906000)        1    0.0%    0.0%  
[     121.906000,      121.911720)        0    0.0%    0.0%  
[     121.911720,      121.950162)        0    0.0%    0.0%  
[     121.950162,      122.208504)        0    0.0%    0.0%  
[     122.208504,      123.944634)        0    0.0%    0.0%  
[     123.944634,      135.611937)        0    0.0%    0.0%  
[     135.611937,      214.019607)        6    0.0%    0.0%  
[     214.019607,      740.941973)   705278   59.9%   59.9%  ######
[     740.941973,     4282.013644)   468485   39.8%   99.7%  ####
[    4282.013644,    28079.046000)     3308    0.3%  100.0%  
[   28079.046000,             inf)        1    0.0%  100.0%  

Number of requests:  1177079	Request throughput:  941663.2 bit/s
Number of responses: 1177079	Response throughput: 941663.2 bit/s

With this change:

Unary-traceMode_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-Compressor_off-Preloader_false: 
50_Latency: 730.0850 µs 	90_Latency: 1383.5280 µs 	99_Latency: 1969.0170 µs 	Avg latency: 896.3180 µs 	Count: 1115219 	9460 Bytes/op	169 Allocs/op	
Histogram (unit: µs)
Count: 1115219  Min: 272.7  Max: 28344.5  Avg: 896.32
------------------------------------------------------------
[     272.665000,      272.666000)        1    0.0%    0.0%  
[     272.666000,      272.671723)        0    0.0%    0.0%  
[     272.671723,      272.710203)        0    0.0%    0.0%  
[     272.710203,      272.968918)        0    0.0%    0.0%  
[     272.968918,      274.708349)        0    0.0%    0.0%  
[     274.708349,      286.403149)        0    0.0%    0.0%  
[     286.403149,      365.031371)       21    0.0%    0.0%  
[     365.031371,      893.676332)   827263   74.2%   74.2%  #######
[     893.676332,     4447.940831)   283711   25.4%   99.6%  ###
[    4447.940831,    28344.501000)     4222    0.4%  100.0%  
[   28344.501000,             inf)        1    0.0%  100.0%  

Number of requests:  1115219	Request throughput:  892175.2 bit/s
Number of responses: 1115219	Response throughput: 892175.2 bit/s

@stale stale bot added the stale label Sep 6, 2019
@dfawley dfawley removed the stale label Sep 6, 2019
@grpc grpc deleted a comment from stale bot Nov 7, 2019
adtac (Contributor) commented Nov 7, 2019

ping @menghanl

@dfawley dfawley added P2 and removed P1 labels Dec 10, 2019
@aranjans aranjans added the Area: Server label Sep 8, 2024