Improve distributed_work stopping with ongoing worker tasks #2369


@guilhermelawless guilhermelawless commented Oct 29, 2019

Found while running the RPC epoch_upgrade, which places its task in nano::worker. When the node is stopped preemptively (with SIGINT, not RPC "stop"), the I/O threads are destroyed and the background work_generate_blocking task cannot complete if work peers are in use.

This adds manual canceling of all ongoing work before attempting to destroy the worker.

Also fixes an issue where canceled work would turn into zero-filled work, which was only a problem when stopping the node.
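
To illustrate the zero-filled work problem described above, here is a minimal sketch, not the code from this PR. The diff is not shown here, so the shape of the fix is an assumption; `generate_blocking_sketch` and its parameters are hypothetical names. The idea is that a cancelled request returns an empty optional, so a caller cannot mistake cancellation for a work value of 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a blocking work request; not the node's real API.
// An empty optional means "cancelled", which is distinct from any numeric work value.
std::optional<uint64_t> generate_blocking_sketch (std::atomic<bool> const & cancelled, uint64_t candidate)
{
	if (cancelled.load ())
	{
		// Report cancellation explicitly instead of returning zero-filled work
		return std::nullopt;
	}
	return candidate;
}

// Example: a cancelled request yields no value at all
bool cancelled_work_is_empty ()
{
	std::atomic<bool> cancelled{ true };
	auto const result = generate_blocking_sketch (cancelled, 0x1234);
	return !result.has_value ();
}
```

A caller would check `has_value ()` before using the result, which matters most during shutdown, when cancellation is the expected outcome.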

@guilhermelawless guilhermelawless added this to the V20.0 milestone Oct 29, 2019
@guilhermelawless guilhermelawless self-assigned this Oct 29, 2019
@guilhermelawless guilhermelawless requested review from cryptocode and removed request for wezrule October 29, 2019 11:01
Comment on lines +454 to +455
void nano::distributed_work_factory::stop ()
{
Contributor

Should items be cleared at the end of this function? Otherwise they can be cancelled again in the destructor.

Contributor Author

Each vector of work is erased in cancel(), but I've added a clear to make sure, and a stopped flag to avoid calling stop() twice or adding new work in make() after stopping.
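
A rough sketch of the pattern described in this reply, not the actual nano::distributed_work_factory code: `work_factory_sketch`, its members, and the use of plain callbacks as work items are all hypothetical. It also assumes that `make ()` returns true on error, which the commit list does not spell out.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical factory: each item is represented only by its cancel callback
class work_factory_sketch
{
public:
	// Assumption: returns true on error (factory already stopped), false on success.
	// The caller decides whether to invoke its own callback on error.
	bool make (std::function<void ()> cancel_callback)
	{
		std::lock_guard<std::mutex> guard (mutex);
		if (stopped)
		{
			return true;
		}
		items.push_back (std::move (cancel_callback));
		return false;
	}

	// Cancels all ongoing items exactly once; later calls are no-ops
	void stop ()
	{
		std::vector<std::function<void ()>> to_cancel;
		{
			std::lock_guard<std::mutex> guard (mutex);
			if (stopped)
			{
				return;
			}
			stopped = true;
			// Swapping leaves items empty, so nothing can be cancelled twice
			to_cancel.swap (items);
		}
		// Run the callbacks outside the lock so they may call back into the factory
		for (auto & cancel : to_cancel)
		{
			cancel ();
		}
	}

	~work_factory_sketch ()
	{
		stop ();
	}

private:
	std::mutex mutex; // Protects stopped and items
	bool stopped{ false };
	std::vector<std::function<void ()>> items;
};
```

With this shape, an explicit `stop ()` before destroying the worker and the implicit one in the destructor can both run without cancelling any item a second time.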

@guilhermelawless guilhermelawless merged commit 2ad00ef into nanocurrency:master Oct 30, 2019
@guilhermelawless guilhermelawless deleted the distributed-work/improve-stop branch October 30, 2019 14:22
argakiig pushed a commit that referenced this pull request Oct 31, 2019
* Improve distributed_work stopping with ongoing worker tasks

* Another const

* Fix work_generate_blocking replacing cancelled work with zero-filled work

* Make sure items cannot be canceled twice

* Simplifying

* Protect stopped

* Fix occasionally stuck tests

* Return true/false for errors in distributed_work::make(), leaving the decision to do the callback for the caller

* Add a comment to clarify