planner: ensure result in planner_avail_resources_at #1038
Hold off on this for a minute. It works, but I think it works for the wrong reason. I'm pretty sure the planner is being asked for available resources at a negative time point.
During restart, the planner can be asked for a time before its base, in this case a negative time, so there is no earlier state for get_state to return. It feels like get_state should always return *some kind of state*, but it's not clear how to do that without accidentally pretending there are no resources available. This change avoids asking for that invalid state by checking the precondition in the planner: if an `at` before the `plan_start` is requested, it's treated as an invalid argument.
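The precondition described above can be illustrated with a small Python sketch. This is not the Fluxion planner code: `ToyPlanner`, `set_free`, and `_get_state` are hypothetical names invented for illustration, and `ValueError` stands in for the "invalid argument" error. It only shows why a lookup for a time before `plan_start` has no earlier state to return, and how rejecting that request up front avoids the ambiguity.

```python
import bisect


class ToyPlanner:
    """Hypothetical stand-in for a planner; not the Fluxion API."""

    def __init__(self, plan_start, total):
        self.plan_start = plan_start
        # Sorted time points at which the free-resource count changes.
        self.times = [plan_start]
        self.free = [total]

    def set_free(self, at, free):
        # Record that `free` resources are available from time `at` onward.
        i = bisect.bisect_left(self.times, at)
        if i < len(self.times) and self.times[i] == at:
            self.free[i] = free
        else:
            self.times.insert(i, at)
            self.free.insert(i, free)

    def _get_state(self, at):
        # Latest recorded point at or before `at`.
        # Returns None when `at` precedes every recorded point.
        i = bisect.bisect_right(self.times, at) - 1
        return None if i < 0 else self.free[i]

    def avail_resources_at(self, at):
        # Precondition: a time before the plan's base has no state,
        # so treat it as an invalid argument instead of guessing.
        if at < self.plan_start:
            raise ValueError("at is before plan_start")
        return self._get_state(at)


planner = ToyPlanner(plan_start=0, total=8)
planner.set_free(10, 3)
in_range = planner.avail_resources_at(5)   # state recorded at time 0 applies
no_state = planner._get_state(-1)          # nothing earlier exists to return
```

Without the check, a caller would have to decide what a missing state means, and reading it as "zero resources free" is exactly the mistake the description warns about.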
Ok, this version is better. I'm pretty sure there's still more to do to ensure planner always handles this right, but at least this case is addressed for now.
Testing shows this works and the reproducer that showed Fluxion couldn't restart with running jobs + queues now passes.
I went ahead and added the script as an issues reproducer, along with a new test, t5100-issues-test-driver.t (a copy of t4000-issues-test-driver.t from flux-core).
Mind if we cherry pick that commit and add it to this PR?
f6d0367
Problem: There is no reproducer for issue flux-framework#1035: fluxion can't restart with queues enabled.

Add a new test driver for issue reproducers: t5100-issues-test-driver.t. Then add a reproducer script for flux-framework#1035 to the t/issues subdirectory.
I went ahead and pushed the testsuite commit on top here @trws. Thanks!
Codecov Report
@@           Coverage Diff            @@
##           master   #1038    +/-   ##
========================================
- Coverage    74.4%   74.3%   -0.1%
========================================
  Files          86      86
  Lines        9434    9434
========================================
- Hits         7021    7017      -4
- Misses       2413    2417      +4
During restart, we can end up in a situation where a time before the
planner's base is requested, in this case negative time, so there's no
earlier time to return from get_state. It feels like get_state should
always return some kind of state, but it's not clear how to do that
and avoid accidentally pretending there are no resources available. This
avoids asking for that invalid state by checking the precondition in the
planner: if an `at` before the `plan_start` is requested, it's treated as
an invalid argument.
Fixes #1035 (comment)