Improve distributed_work stopping with ongoing worker tasks #2369

guilhermelawless · 2019-10-29T10:50:40Z

Found while running the RPC epoch_upgrade, which places its task in nano::worker. When the node is stopped preemptively (with SIGINT, not the RPC "stop"), the I/O threads are destroyed and the background work_generate_blocking task cannot complete if work peers are in use.

This adds manual canceling of all ongoing work before attempting to destroy the worker.

Also fixes an issue where canceled work would turn into zero-filled work, which was only a problem when stopping the node.
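To illustrate the zero-filled work problem, here is a minimal sketch of one way to keep "cancelled" distinct from a real result. It is not the code from this PR: `work_job`, `generate` and `try_use_work` are hypothetical names, and using `std::optional` is an assumption made for the example.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a work generation job; not the node's real class.
struct work_job
{
	std::atomic<bool> cancelled{ false };

	// Returns std::nullopt when the job was cancelled, so callers cannot
	// confuse "no result" with a result whose value happens to be zero.
	// A real implementation would block on the generation here; this
	// sketch simply hands back the candidate value.
	std::optional<std::uint64_t> generate (std::uint64_t candidate)
	{
		if (cancelled.load ())
		{
			return std::nullopt;
		}
		return candidate;
	}
};

// Caller side: only use the work value if one was actually produced.
inline bool try_use_work (work_job & job, std::uint64_t candidate, std::uint64_t & out)
{
	auto const result = job.generate (candidate);
	if (!result)
	{
		return false; // cancelled: leave `out` untouched instead of zeroing it
	}
	out = *result;
	return true;
}
```

With this shape, a caller that is interrupted during shutdown sees a failed call rather than a work value of zero.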

wezrule · 2019-10-29T18:21:50Z

nano/node/distributed_work.cpp

+void nano::distributed_work_factory::stop ()
+{


Should items be cleared at the end of this function? Otherwise in the destructor they can be cancelled again

Each vector of work is erased in cancel(), but I've added a clear to make sure, and a stopped flag to avoid calling stop() twice or adding new work in make() after stopping.
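A rough sketch of the pattern discussed in this thread, assuming a mutex-protected stopped flag and a container of cancel callbacks. This is not the actual nano::distributed_work_factory: the class name, the `cancel_fn` type, `size ()` and the "true means error" return convention of `make ()` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified factory; the real class in the node differs.
class work_factory_sketch
{
public:
	using cancel_fn = std::function<void ()>;

	// Registers a new work item. Returns true on error (assumed convention),
	// which here means the factory was already stopped.
	bool make (cancel_fn on_cancel)
	{
		std::lock_guard<std::mutex> guard (mutex);
		if (stopped)
		{
			return true;
		}
		items.push_back (std::move (on_cancel));
		return false;
	}

	// Cancels all ongoing work exactly once and empties the container, so a
	// later call (for example from the destructor) has nothing left to cancel.
	void stop ()
	{
		std::vector<cancel_fn> pending;
		{
			std::lock_guard<std::mutex> guard (mutex);
			if (stopped)
			{
				return;
			}
			stopped = true;
			pending.swap (items);
		}
		// Run the callbacks outside the lock so they cannot deadlock on it.
		for (auto & cancel : pending)
		{
			cancel ();
		}
	}

	~work_factory_sketch ()
	{
		stop ();
	}

	std::size_t size ()
	{
		std::lock_guard<std::mutex> guard (mutex);
		return items.size ();
	}

private:
	std::mutex mutex;
	bool stopped{ false };
	std::vector<cancel_fn> items;
};
```

Returning the error from `make ()` instead of invoking a callback leaves it to the caller to decide how to report the failure.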

* Improve distributed_work stopping with ongoing worker tasks
* Another const
* Fix work_generate_blocking replacing cancelled work with zero-filled work
* Make sure items cannot be canceled twice
* Simplifying
* Protect stopped
* Fix ocasionally stuck tests
* Return true/false for errors in distributed_work::make(), leaving the decision to do the callback for the caller
* Add a comment to clarify

Improve distributed_work stopping with ongoing worker tasks

1246557

guilhermelawless added the bug label Oct 29, 2019

guilhermelawless added this to the V20.0 milestone Oct 29, 2019

guilhermelawless requested a review from wezrule October 29, 2019 10:50

guilhermelawless self-assigned this Oct 29, 2019

guilhermelawless requested review from cryptocode and removed request for wezrule October 29, 2019 11:01

Another const

6765c9a

guilhermelawless requested a review from wezrule October 29, 2019 12:08

Fix work_generate_blocking replacing cancelled work with zero-filled work

3acab76

wezrule reviewed Oct 29, 2019

guilhermelawless added 5 commits October 29, 2019 18:43

Make sure items cannot be canceled twice

805ba44

Simplifying

07fe122

Protect stopped

9635453

Fix ocasionally stuck tests

bd145dd

Return true/false for errors in distributed_work::make(), leaving the decision to do the callback for the caller

f021dc7

cryptocode reviewed Oct 30, 2019

nano/node/distributed_work.cpp (resolved)

Add a comment to clarify

9aa579d

cryptocode approved these changes Oct 30, 2019

guilhermelawless requested a review from wezrule October 30, 2019 13:54

wezrule approved these changes Oct 30, 2019

guilhermelawless merged commit 2ad00ef into nanocurrency:master Oct 30, 2019

guilhermelawless deleted the distributed-work/improve-stop branch October 30, 2019 14:22
