Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: gracefully suspend bee node on interrupt signal #3671

Merged
merged 13 commits into from
Dec 22, 2022

Conversation

vladopajic
Copy link
Contributor

@vladopajic vladopajic commented Dec 21, 2022

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Closes: #3660


This change is Reviewable

@vladopajic vladopajic requested review from a team, istae and notanatol and removed request for a team December 21, 2022 10:21
@vladopajic vladopajic added the ready for review The PR is ready to be reviewed label Dec 21, 2022
},
stop: func() {
// Shutdown
// Whenever program is being stoppend we need to close
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in stoppend

@mrekucci
Copy link
Contributor

The interrupt signal still does not work when batch synchronization is in progress. Also when a node is waiting to be funded.
The interrupt signal probably has to be propagated to the postage listener and also to the init chequebook service.

@vladopajic vladopajic changed the title fix: interrupt bee node and shutdown fix: gracefully suspend bee node on interrupt signal Dec 22, 2022
select {
case <-sysInterruptChannel:
logger.Debug("received interrupt signal")
case <-time.After(shutdownTimeout):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. Adding a hard-coded timeout is not the right thing to do. There could be valid cases when the node takes more than 5 secs to stop.

As the comment said, the correct way to handle this is to check for a second interrupt signal to force stop if user does not have the patience. I am not sure if this can be done using context cancellation.

Copy link
Contributor

@aloknerurkar aloknerurkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. This looks good. Can you manually test the following scenarios, as we dont have proper unit tests in this package.

  1. Ctrl-C waiting for the chequebook deployment
  2. Ctrl-C while postage sync is running
  3. Ideally we should test if Ctrl-C also works during bootstrapping. But right now we have not yet updated the postage snapshot feed. When we do, we should test this.
  4. Randomly try to do Ctrl-C during different times. Also trying this after node warmup period (5mins) is done.

@vladopajic
Copy link
Contributor Author

@aloknerurkar I have manually testes all those scenarios. However there could be some issue if some logic is not responding to context cancelation signal. This would be hard to catch on this level; instead it should be covered/tested in respective package.

@vladopajic vladopajic merged commit 84b7604 into master Dec 22, 2022
@vladopajic vladopajic deleted the fix-bee-interrupt branch December 22, 2022 12:37
@istae istae added this to the 1.11.0 milestone Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull-request ready for review The PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

batch syncing does not respond to interrupt signal
5 participants