Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: Fix deadlock produced during server shutdown #2966

Merged
merged 4 commits into from
Apr 17, 2023

Conversation

cthulhu-rider
Copy link
Contributor

special attention to b6e34b2, maybe I missed the hidden meaning

@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #2966 (e29c33e) into master (37f7c9d) will decrease coverage by 0.08%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##           master    #2966      +/-   ##
==========================================
- Coverage   84.86%   84.78%   -0.08%     
==========================================
  Files         329      328       -1     
  Lines       43028    43047      +19     
==========================================
- Hits        36515    36498      -17     
- Misses       5028     5066      +38     
+ Partials     1485     1483       -2     
Impacted Files Coverage Δ
cli/server/server.go 68.75% <66.66%> (+0.04%) ⬆️
pkg/services/rpcsrv/server.go 77.76% <100.00%> (+0.22%) ⬆️

... and 9 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

cli/server/server.go Outdated Show resolved Hide resolved
pkg/services/rpcsrv/server.go Outdated Show resolved Hide resolved
pkg/services/rpcsrv/server.go Outdated Show resolved Hide resolved
pkg/services/rpcsrv/server_test.go Outdated Show resolved Hide resolved
pkg/services/rpcsrv/server_test.go Outdated Show resolved Hide resolved
cli/server/server.go Outdated Show resolved Hide resolved
pkg/services/rpcsrv/server.go Outdated Show resolved Hide resolved
cli/server/server.go Show resolved Hide resolved
@cthulhu-rider cthulhu-rider force-pushed the bugfix/2896-rpc-shutdown-deadlock branch from b53b1bd to 4926d8b Compare April 14, 2023 11:01
@cthulhu-rider cthulhu-rider marked this pull request as ready for review April 14, 2023 11:16
pkg/services/rpcsrv/server_test.go Outdated Show resolved Hide resolved
pkg/services/rpcsrv/server_test.go Outdated Show resolved Hide resolved
There is an existing problem with RPC server shutdown freeze after start
failure due to some init actions (at least HTTP listen) described in
#2896.

Add dedicated unit test which checks that `Shutdown` returns within 5s
after `Start` method encounters internal problems.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
Previously RPC server could never be shut down completely due to
some start precondition failure (in particular, inability to serve HTTP
on any configured endpoint). The problem was caused by next facts:
 * start method ran subscription routine after HTTP init succeeded only
 * stop method blocked waiting for the subscription routine to return

Run `handleSubEvents` routine on fresh `Start` unconditionally. With
this change, `Shutdown` method won't produce deadlock since
`handleSubEvents` closes wait channel.

Refs #2896.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
Previously RPC server shutdown procedure listened to the execution
channel and stopped at the first element that arrived in the queue. This
could lead to the following problems:
 * stopper could steal the execution result from subscriber
 * stopper didn't wait for other subscription actions to complete

Add dedicated channel to `Server` for subscription routine. Close the
channel on `handleSubEvents` return and wait for signal in `Shutdown`.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
If `StartWhenSynchronized` is unset in config, `node` command runs RPC
service instantly. Previously there was a ground for deadlock. Command
started RPC server synchronously. According to server implementation, it
sends all internal failures to the parameterized error channel. Deadlock
occured because main routine didn't scan the channel.

Run `rpcsrv.Server.Start` in a separate go-routine in `startServer`.
This prevents potential deadlock caused by writing into unread channel.

Fixes #2896.

Signed-off-by: Leonard Lyubich <leonard@morphbits.io>
@roman-khimov roman-khimov merged commit a4cc6da into master Apr 17, 2023
@roman-khimov roman-khimov deleted the bugfix/2896-rpc-shutdown-deadlock branch April 17, 2023 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rpc: App hangs on when rpc server unhealthy
3 participants