Fix/Subscribe dead lock #2272
Codecov Report
@@            Coverage Diff             @@
##        support/v0.35   #2272   +/-   ##
=========================================
  Coverage       31.00%   31.00%
=========================================
  Files             384      384
  Lines           28461    28453     -8
=========================================
- Hits             8824     8823     -1
+ Misses          18897    18890     -7
  Partials          740      740
Could you also describe a particular deadlock scenario in the commit message?
pkg/morph/event/listener.go
	return nil
}

func (l *listener) listenLoop(ctx context.Context, intErr chan<- error,
Do we need to change its signature? Ideally, only startup code should be affected.
It was not `listenLoop` before: it was `subscribeAndListen`. It could be done in some other way, but I found this more readable.
> Ideally, only startup code should be affected.

These are in fact startup code changes. It is just that the listening and subscribing routines now share some channels (and they have to, in order to prevent the deadlock).
Extended commit msg.
pkg/morph/subscriber/subscriber.go
}

-func (s *subscriber) SubscribeForNotaryRequests(mainTXSigner util.Uint160) (<-chan *result.NotaryRequestEvent, error) {
+func (s *subscriber) SubscribeForNotaryRequests(rcv chan<- *result.NotaryRequestEvent, mainTXSigner util.Uint160) error {
Previously we could subscribe multiple times, but now we have a single channel. IMO this is a time-bomb.
I am also worried about possible race conditions when we switch to another endpoint.
Not something we would like to do in the support branch.
Specifically: can we prove that we cannot block when we read the channel, switch to another endpoint, and then send a notification?
Simplified changes.
case err := <-subErrCh:
	if intErr != nil {
		intErr <- err
	}
We had a log in the `else` branch in the previous implementation. Why do we not have it now?
Added an `Error` log (not `Debug`, as it was before).
It led to a neo-go deadlock in the `subscriber` component. Subscribing to notifications is an RPC like any other, so it can block forever if no asynchronous listening routine (one that reads the notification channel) exists. If the number of subscriptions is large enough (or the caller is unlucky enough), the subscribing loop may not have finished before the first notification is received. Then the subscribing RPC is blocked because the received notification is not handled, and the notification listening routine is blocked because the subscription loop has not finished.

This commit starts listening on the notification channel _before_ any subscription actions.

Signed-off-by: Pavel Karpy <p.karpy@yadro.com>