New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize work waiting pattern #2854
Conversation
src/historywork/RunCommandWork.cpp
Outdated
self->wakeUp(); | ||
} | ||
}); | ||
exit->async_wait(wakeSelfUpCallback()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a different change and shows that the abstractions are not exactly right: we're not using any "waiting timer" in this situation, yet we call a function that talks about timers in its name...
We could add the concept of error code to "wakeUp" but it doesn't seem to fit well from a generic point of view, so I would keep RunCommand
untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree I think it's easier to reason about this without RunCommandWork
@@ -27,7 +26,7 @@ BasicWork::State | |||
ResolveSnapshotWork::onRun() | |||
{ | |||
ZoneScoped; | |||
if (mEc) | |||
if (waitingTimerFailed()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code was (and still is) confusing: for this to fail, it would have to be a cancelled timer, which can only happen if we delete the Work object while it's waiting for the timer to trigger.
In fact it's so bad that with 3 classes, we have 3 slightly different behaviors!
TxSimGenerateBucketsWork
was ignoring this case (noop on failure) so would never wake up work in that situation (now we do?!).ConditionalWork
always reschedules which is also wrong (was and still is probably wrong).
To me it looks like we should instead take away control from implementors on what happens when we want to "yield" (to avoid busy waits):
- add a new return code for
onRun
WORK_SLEEPING
that tellsBasicWork
to setup the timer and callback in a consistent way (internal state would still beWAITING
butBasicWork::crankWork()
would make sure to do the setup to wake things up later and make keep track of some internal state to handle some edge cases). - probably make the sleep duration a property (default to 1 second) so that callers can make it shorter if they want to
- harden the behavior so that we can actually reason about what will happen if a
Work
gets aborted/deleted while waiting like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comment made me go back to the implementors and understand the use cases better. What I realize is that the three works don't need to check on the timer status (e.g. ResolveSnapshotWork), and timer cancellation/expiration should be handled within the BasicWork itself. After realizing that, I think the change is simpler than what you propose (and is only a slight variation of the current diff, and no new state is needed). The list of features is as follows:
- Child works are only allowed to call the timer setup function (which is idempotent) then transition to WAITING state (no direct access to the timer)
- All callbacks no-op on failure. I believe it's the right thing to do because the timer only gets cancelled if 1. work is abruptly deleted (this violates existing work safeguards already as work must be destructed in finished state, so I don't see a reason to wake up work here), 2. work is abruptly reset (this also violates the allowed transitions, as work is only reset in the finished state) and 3. work is aborting (I added timer cancellation when shutdown is issued -- I don't think it makes sense to fire the callback and wake up work, because once in ABORTING state,
onRun
is never scheduled)
I implemented the changes, and am working on adding a test for this behavior, but let me know if you don't agree with the proposal in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, added a test case as well.
3bffe84
to
41b76ab
Compare
41b76ab
to
dc76982
Compare
r+ dc76982 |
Resolves #2519