Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/Subscribe dead lock #2272

Merged
merged 1 commit into from
Feb 27, 2023
Merged

Fix/Subscribe dead lock #2272

merged 1 commit into from
Feb 27, 2023

Conversation

carpawell
Copy link
Member

It led to a neo-go dead-lock in the subscriber component.

@codecov
Copy link

codecov bot commented Feb 22, 2023

Codecov Report

Merging #2272 (a323e4d) into support/v0.35 (dea9713) will increase coverage by 0.00%.
The diff coverage is 0.00%.

❗ Current head a323e4d differs from pull request most recent head 5616177. Consider uploading reports for the commit 5616177 to get more accurate results

@@              Coverage Diff               @@
##           support/v0.35    #2272   +/-   ##
==============================================
  Coverage          31.00%   31.00%           
==============================================
  Files                384      384           
  Lines              28461    28453    -8     
==============================================
- Hits                8824     8823    -1     
+ Misses             18897    18890    -7     
  Partials             740      740           
Impacted Files Coverage Δ
pkg/morph/event/listener.go 0.00% <0.00%> (ø)
pkg/local_object_storage/engine/get.go 79.31% <0.00%> (-1.15%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@fyrchik fyrchik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also describe a particular deadlock scenario in the commit message?

return nil
}

func (l *listener) listenLoop(ctx context.Context, intErr chan<- error,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change it's signature? Ideally, only startup code should be affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was not listenLoop before: it was subscribeAndListen. it could be done in some other way but i found it more readable

Ideally, only startup code should be affected.

that is startup code changes in fact. just listening and subscribing routines shares some channels now (and they have to do that to prevent dead-lock)

@carpawell
Copy link
Member Author

Extended commit msg.

}

func (s *subscriber) SubscribeForNotaryRequests(mainTXSigner util.Uint160) (<-chan *result.NotaryRequestEvent, error) {
func (s *subscriber) SubscribeForNotaryRequests(rcv chan<- *result.NotaryRequestEvent, mainTXSigner util.Uint160) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we could subscribe multiple times, but now we have a single channel. IMO this is a time-bomb.
I am also bothered about possible race-conditions when we switch to another endpoint.
Not something we would like to do in the support branch.

Specifically: can we prove that we cannot block when we read the channel, switch to another endpoint and then send a notifications?

@carpawell
Copy link
Member Author

Simplified changes.

fyrchik
fyrchik previously approved these changes Feb 27, 2023
case err := <-subErrCh:
if intErr != nil {
intErr <- err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a log in the else branch in the previous implementation. Why do we not have it now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added Error (not Debug like was before) log

It led to a neo-go dead-lock in the `subscriber` component. Subscribing to
notifications is the same RPC as any others, so it could also be blocked
forever if no async listening (reading the notification channel) routine
exists. If a number of subscriptions is big enough (or a caller is lucky
enough) subscribing loop might have not finished subscribing before the
first notification is received and then: subscribing RPC is blocked by
received notification (non)handling and listening notifications routine is
blocked by not finished subscription loop.
That commit starts listening notification channel _before_ any subscription
actions.

Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
@fyrchik fyrchik merged commit 5867961 into nspcc-dev:support/v0.35 Feb 27, 2023
@carpawell carpawell deleted the fix/subscribe-dead-lock branch February 27, 2023 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants