Handle workunits corresponding to canceled Nodes. #10659
WorkunitState::Completed { .. } => "Completed:",
fn log_workunit_state(&self, canceled: bool) {
    let state = match (&self.state, canceled) {
        (_, true) => "Canceled:",
I don't know this part of the codebase very well. Does it make sense to have WorkunitState::Canceled instead?
We'll eventually probably want to represent workunit failure as part of the state, but adding that variant to the type now means we have to modify code in the pathway to the streaming workunit handler right now. I think we still need to do a little bit of thinking about how we want to present failed workunits to plugins, and that can come as a follow-up commit to this one.
[ci skip-build-wheels]
sgtm
Problem
cf. #10650. When a node is canceled by the engine, the engine removes the future associated with that node without communicating this fact to the workunit store. As a result, the store has no way of knowing that a workunit which is Started but not yet Completed is never going to complete.
Solution
This commit adds a CanceledWorkunitGuard in the with_workunit public interface to workunits. If a workunit completes normally, the last thing it will do is call not_canceled() on the guard, and the guard will do nothing. If the guard is dropped before this happens, though, its Drop implementation will call a cleanup function on the workunit store to remove the workunit from the heavy hitters and display a log message stating that the workunit has been canceled.
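
For readers unfamiliar with the drop-guard pattern, here is a minimal, synchronous sketch of the idea described above. It is not the code in this PR: the WorkunitStore below is a hypothetical stand-in that only tracks names and log lines, the `armed` field and the `start`/`complete`/`cancel` methods are invented for illustration, and the real with_workunit wraps a future rather than a plain scope. Only the names CanceledWorkunitGuard and not_canceled() come from the description.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the workunit store: it tracks which
/// workunits are running and records log lines instead of printing them.
#[derive(Default)]
struct WorkunitStore {
    running: Mutex<HashSet<String>>,
    log: Mutex<Vec<String>>,
}

impl WorkunitStore {
    fn start(&self, name: &str) {
        self.running.lock().unwrap().insert(name.to_string());
    }

    fn complete(&self, name: &str) {
        self.running.lock().unwrap().remove(name);
        self.log.lock().unwrap().push(format!("Completed: {}", name));
    }

    /// Cleanup path (assumed shape): forget the workunit and log the cancellation.
    fn cancel(&self, name: &str) {
        self.running.lock().unwrap().remove(name);
        self.log.lock().unwrap().push(format!("Canceled: {}", name));
    }
}

/// Guard that reports a cancellation unless it is explicitly disarmed.
struct CanceledWorkunitGuard {
    store: Arc<WorkunitStore>,
    name: String,
    armed: bool, // hypothetical field: true until not_canceled() is called
}

impl CanceledWorkunitGuard {
    fn new(store: Arc<WorkunitStore>, name: &str) -> Self {
        store.start(name);
        CanceledWorkunitGuard { store, name: name.to_string(), armed: true }
    }

    /// Called as the last step of a workunit that finished normally.
    fn not_canceled(&mut self) {
        self.armed = false;
    }
}

impl Drop for CanceledWorkunitGuard {
    fn drop(&mut self) {
        // Runs whenever the guard goes away, including when the owning
        // scope (or, in an async setting, the owning future) is dropped early.
        if self.armed {
            self.store.cancel(&self.name);
        }
    }
}

/// Runs one workunit to completion and abandons another, returning the log.
fn demo() -> Vec<String> {
    let store = Arc::new(WorkunitStore::default());
    {
        let mut guard = CanceledWorkunitGuard::new(store.clone(), "finished");
        store.complete("finished");
        guard.not_canceled(); // disarm: Drop does nothing for this workunit
    }
    {
        // Dropped without calling not_canceled(), as if the node were canceled.
        let _guard = CanceledWorkunitGuard::new(store.clone(), "abandoned");
    }
    let log = store.log.lock().unwrap().clone();
    log
}
```

Calling demo() yields one "Completed:" entry for the first workunit and one "Canceled:" entry for the second. The point of putting the cleanup in Drop is that it needs no cooperation from whoever discards the work, which matches the situation in the Problem section where the engine removes a future without telling the store.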