Standardize work waiting pattern #2854

Merged
merged 1 commit into stellar:master from marta-lokhova:work_improvements on Jan 11, 2021

Conversation

marta-lokhova (Contributor):

Resolves #2519

self->wakeUp();
}
});
exit->async_wait(wakeSelfUpCallback());

Contributor:

This seems to be a separate change, and it shows that the abstractions are not quite right: we're not using any "waiting timer" in this situation, yet we call a function that talks about timers in its name...
We could add the concept of an error code to "wakeUp", but that doesn't seem to fit well from a generic point of view, so I would keep RunCommand untouched.
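
For reference, the quoted `exit->async_wait(wakeSelfUpCallback())` call suggests a helper shaped roughly like the sketch below. The helper body is not part of the quoted hunk, so this is only a guess at its shape; the asio `error_code` signature and the `weak_ptr` guard are assumptions.

```cpp
// Guess at a wakeSelfUpCallback-style helper: a weak_ptr-guarded handler
// that wakes the Work up when the awaited event (here, process exit)
// completes. Signature and member names are assumptions, not the real API.
std::function<void(asio::error_code const&)>
BasicWork::wakeSelfUpCallback()
{
    std::weak_ptr<BasicWork> weak = shared_from_this();
    return [weak](asio::error_code const&) {
        if (auto self = weak.lock())
        {
            self->wakeUp();
        }
    };
}
```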

marta-lokhova (Contributor Author):

Agreed, I think it's easier to reason about this without RunCommandWork.

@@ -27,7 +26,7 @@ BasicWork::State
 ResolveSnapshotWork::onRun()
 {
     ZoneScoped;
-    if (mEc)
+    if (waitingTimerFailed())

Contributor:

I think the code was (and still is) confusing: for this to fail, it would have to be a cancelled timer, which can only happen if we delete the Work object while it's waiting for the timer to trigger.

In fact it's so bad that with 3 classes, we have 3 slightly different behaviors!

  • TxSimGenerateBucketsWork was ignoring this case (no-op on failure), so it would never wake up the work in that situation (now we do?!).
  • ConditionalWork always reschedules, which is also wrong (was and still is probably wrong).

To me it looks like we should instead take control away from implementors over what happens when we want to "yield" (to avoid busy waits); a rough sketch follows the list:

  • add a new return code for onRun, WORK_SLEEPING, that tells BasicWork to set up the timer and callback in a consistent way (the internal state would still be WAITING, but BasicWork::crankWork() would make sure to do the setup to wake things up later and keep track of some internal state to handle edge cases).
  • probably make the sleep duration a property (default to 1 second) so that callers can make it shorter if they want to.
  • harden the behavior so that we can actually reason about what will happen if a Work gets aborted/deleted while waiting like this.
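
A rough sketch of how this could look. This is hypothetical only: `WORK_SLEEPING`, `mWaitingTimer`, `getSleepDuration`, and `setState` are illustrative names, not the existing API.

```cpp
// Hypothetical only: onRun() returns WORK_SLEEPING, and BasicWork arms the
// timer and callback itself so every implementor gets identical behavior.
void
BasicWork::crankWork()
{
    auto next = onRun();
    if (next == State::WORK_SLEEPING)
    {
        // Arm the waiting timer with a per-work duration (default 1 second).
        mWaitingTimer.expires_from_now(getSleepDuration());
        std::weak_ptr<BasicWork> weak = shared_from_this();
        mWaitingTimer.async_wait([weak](asio::error_code const& ec) {
            // No-op if the work is gone or the wait was cancelled
            // (work aborted/deleted while sleeping).
            if (auto self = weak.lock(); self && !ec)
            {
                self->wakeUp();
            }
        });
        setState(State::WORK_WAITING); // externally this is still WAITING
        return;
    }
    setState(next);
}
```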

marta-lokhova (Contributor Author):

Your comment made me go back to the implementors and understand the use cases better. What I realized is that the three works don't need to check on the timer status (e.g. ResolveSnapshotWork), and timer cancellation/expiration should be handled within BasicWork itself. With that in mind, I think the change is simpler than what you propose (it is only a slight variation of the current diff, and no new state is needed). The list of features is as follows:

  • Child works are only allowed to call the timer setup function (which is idempotent) and then transition to the WAITING state (no direct access to the timer).
  • All callbacks no-op on failure. I believe this is the right thing to do, because the timer only gets cancelled if: 1. the work is abruptly deleted (this already violates existing work safeguards, as work must be destructed in a finished state, so I don't see a reason to wake up the work here); 2. the work is abruptly reset (this also violates the allowed transitions, as work is only reset in the finished state); or 3. the work is aborting (I added timer cancellation when shutdown is issued; I don't think it makes sense to fire the callback and wake up the work, because once in the ABORTING state, onRun is never scheduled).

I implemented the changes and am working on adding a test for this behavior; in the meantime, let me know if you don't agree with the proposal.
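
For concreteness, the implementor side then looks roughly like the sketch below; `setupWaitingCallback` and `snapshotReady` are placeholder names for the idempotent setup helper and the work-specific condition, not necessarily the real identifiers.

```cpp
// Sketch of a child work under this pattern: no direct timer access, just
// an idempotent setup call followed by a transition to WAITING.
BasicWork::State
ResolveSnapshotWork::onRun()
{
    ZoneScoped;
    if (!snapshotReady())
    {
        // Idempotent: arms the waiting timer only if it is not already armed;
        // the callback no-ops if the timer was cancelled (delete/reset/abort).
        setupWaitingCallback(std::chrono::seconds(1));
        return State::WORK_WAITING;
    }
    return State::WORK_SUCCESS;
}
```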

Contributor:

Yes, this makes sense.

marta-lokhova (Contributor Author):

Updated, and added a test case as well.

@MonsieurNicolas added this to "In progress" in v15.2.0 via automation on Jan 5, 2021
@marta-lokhova force-pushed the work_improvements branch 2 times, most recently from 3bffe84 to 41b76ab on January 8, 2021

@MonsieurNicolas (Contributor):

r+ dc76982

@latobarita merged commit 3db5591 into stellar:master on Jan 11, 2021
v15.2.0 automation moved this from "In progress" to "Done" on Jan 11, 2021
Development

Successfully merging this pull request may close these issues.

Generalize "Work yields every N ms while waiting" pattern
3 participants