qmanager: defer jobs with unmatchable constraints #1188
Conversation
Force-pushed from 9a54116 to 0c792c1
I see the coverage failure and will see what I can do about that, but that's going to be a few days. It looks like the uploader we're using is deprecated, and the actual error is because the uploader is using too old a version of gcov compared to the version of gcc on bookworm, but it only matters on qmanager for some reason. 🤷
We switched to
That would be awesome if you could, @grondo!
Force-pushed from 0c792c1 to 7fea593
This LGTM. I'm not that familiar with the code in question, but these look like quite clever fixes!
I'm not sure if you came up with any way to test the deferral of unreservable jobs due to other jobs with no time limit, though. Would it be enough to submit a job without a time limit that uses all resources, submit another job, then cancel the first job and ensure the second is started? (I'm sure I'm oversimplifying; apologies if so.)
I think I have a test to use, I just have to incorporate it into sharness. Thankfully the unreservable aspect is an easy test: submit two jobs requiring the same resource with no time limit, the first one running at least until the second is considered, then wait for both. If they both succeed, it's all good. The performance regression case I'm not sure how to test in a way that's reliable. Maybe with the updates @milroy is adding for stats, we could watch the number of failed matches and see if it's repeatedly trying to schedule when it can't? We may have to revisit that one.
Ok, added the test I was using locally basically as-is. I didn't see anywhere that it clearly fit, but if anyone knows a spot I can merge it into some other file.
I found a few nits which should be quick to fix.
I do have some questions about the behavior of some of the checks in this PR. I don't think they warrant changes, but I'd like to understand the behavior a bit better before the PR gets merged.
Force-pushed from 8f8c5bf to 4b75098
Ok, I think this is all cleaned up. Here are the highlights:
Force-pushed from 4b75098 to 790f16b
resource/traversers/dfu.cpp (Outdated)
```cpp
    // no schedulable point found even at the end of the time, return EBUSY
    errno = EBUSY;
    return -1;
}
if (*at < 0 or *at >= graph_end) {
```
I don't think this really matters, but `*at == graph_end` satisfies both the `if` condition above and this one.
Suggested change:
```diff
-if (*at < 0 or *at >= graph_end) {
+if (*at < 0 or *at > graph_end) {
```
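To make the overlap concrete, here is a minimal sketch of the control flow under discussion. Only the two conditions come from the diff above; the function name, the exact form of the earlier check, and the EINVAL fallback are assumptions for illustration. With `>=`, a `*at` equal to `graph_end` satisfies both conditions; with the suggested `>`, that case belongs only to the EBUSY branch, though the early return means it never reaches the second check at runtime anyway, which is why it doesn't really matter.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical condensation of the dfu.cpp logic above; only the two
// conditions are taken from the diff, everything else is assumed.
int check_sched_point (int64_t *at, int64_t graph_end)
{
    if (*at == graph_end) {
        // no schedulable point found even at the end of the time
        errno = EBUSY;
        return -1;
    }
    if (*at < 0 or *at > graph_end) {  // '>' so this branch no longer
        errno = EINVAL;                // overlaps the EBUSY case above
        return -1;
    }
    return 0;
}
```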
This looks cleaner and easier to understand; thanks! I also like that you renamed the `ov` variable. Fluxion has too many inscrutable variables.
I'm wondering if there is a way to include a test for jobs with unsatisfiable constraints being reconsidered many times. You could submit a few jobs that require a down node and then undrain the node. After the stats update PR #1187 gets merged you could then check for the number of failed matches.
I was thinking much the same. If we can get the stats PR in today it would be much easier to write a deterministic test for it with our current setup. In fact I think my existing test would work, it just needs some extra jobs added and a stats check.
Force-pushed from abd0f6e to d422100
The stats PR got merged. I agree that just adding some extra jobs should be sufficient to test this.
Force-pushed from b4c8be7 to 8fd75e0
Ok, the test has been extended. We'll have to see how well it holds. As far as I understand the system, it should reliably hit 10, but it's possible it's partially timing dependent. If we find that it's unreliable we may want to adjust the test to `<=` something.
Force-pushed from 8fd75e0 to 29063a7
@milroy, any chance I could convince you to take a (hopefully) last pass over here? I think this is about where we need it.
@trws: yes, looking now.
Force-pushed from 29063a7 to 89f38b7
A few commit messages need to be cleaned up, but once that's done I think this PR is ready to merge.
Great job fixing this tricky problem!
problem: Jobs with constraints that can't be matched because nodes are down or drained are currently considered every time we enter the scheduling loop. If they reach the head of the queue, which is likely because we currently only configure one sched queue, they get re-considered over and over despite the fact that they can't run, which greatly slows down scheduling and can cause severe blocking, observed at up to 20 seconds of delay for a single submission.
solution: Add a new "m_blocked" member to the qmanager base which holds jobs that return EBUSY from alloc_orelse_reserve. This state can only happen when the job is blocked by a constraint requiring a node in an unusable state. The jobs in m_blocked are moved into m_pending (ready to be considered) by the notify callback. Currently they are moved regardless of what status changes occurred (a node being drained causes them to move), but it's a relatively small cost to move them back afterward and it simplifies the logic considerably. The duration for 0.0-duration jobs is now set to the remaining time rather than the total time during meta build time.
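For readers skimming the thread, here is a minimal sketch of the defer/re-queue pattern this commit describes. The names `m_pending` and `m_blocked` come from the message above, but `job_t`, the container type, and the method names are placeholders, not the real qmanager API.

```cpp
#include <map>
#include <memory>

// Placeholder job type; the real qmanager job class is much richer.
struct job_t { unsigned int id; };
using job_map_t = std::map<unsigned int, std::shared_ptr<job_t>>;

struct queue_policy_sketch {
    job_map_t m_pending;   // jobs ready to be considered by the sched loop
    job_map_t m_blocked;   // jobs whose constraints cannot currently match

    // On EBUSY from alloc_orelse_reserve, park the job so the scheduling
    // loop stops reconsidering it on every pass.
    void block (const std::shared_ptr<job_t> &job)
    {
        m_pending.erase (job->id);
        m_blocked.emplace (job->id, job);
    }

    // The notify callback moves everything back for one reconsideration;
    // jobs that still cannot match simply flow back into m_blocked.
    void unblock_all ()
    {
        m_pending.merge (m_blocked);
        m_blocked.clear ();
    }
};
```

Moving every blocked job back on any status change is deliberately coarse, per the commit message: re-filtering by which status actually changed would complicate the logic for little gain.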
problem: "ov" is meaningless and impenetrable solution: expand it to overhead
problem: some parsers of the presets format break with includes that don't exist
solution: remove them
problem: cmake defaults to the Debug build type, which turns off many optimizations by default, making profiling and, importantly, some debugging tasks much harder.
solution: use the RelWithDebInfo build type, which more closely matches the default used by autoconf in core, and should be used by default.
problem: jobs that are satisfiable but unreservable, in queues that use reservation, would fail with either a match failure or an EINVAL return from resource. This seems to only happen when `schedule` returns -1, an errno of ENOENT, and an `at` value equal to the end of the graph.
solution: rather than returning EINVAL, it now passes out EBUSY, which causes the new code for blocked constraints to pull the job out of pending and place it in blocked. To allow the job to proceed, we move blocked jobs back into the pending queue whenever a job is removed from the queue due to completion or cancellation. We probably consider them more often than necessary, but this seems to solve the issue and still preserves the performance improvement.
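A hedged sketch of the errno translation this commit describes; the function and parameter names are invented for illustration, and only the ENOENT/graph-end/EBUSY relationship comes from the message above.

```cpp
#include <cerrno>
#include <cstdint>

// Illustrative only: map the "satisfiable but unreservable" outcome
// (schedule () failed with ENOENT and the at value at the end of the
// graph) to EBUSY, so qmanager defers the job rather than failing it
// with EINVAL.
int translate_schedule_errno (int rc, int64_t at, int64_t graph_end)
{
    if (rc < 0 && errno == ENOENT && at == graph_end) {
        errno = EBUSY;
        return -1;
    }
    return rc;
}
```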
problem: ensure we don't end up with unreservable jobs treated as unsatisfiable again
solution: add a test that runs three jobs that are only allowed to run on the same resource, each taking all remaining time. This forces jobs two and three to be unreservable until an earlier one completes. It's technically sensitive to timing, but in this constrained case I would be surprised if it proves to be flaky. We should still set something up in sharness for creating jobs that wait on a trigger on a domain socket, so we can easily make this kind of test fully deterministic.
problem: while the sched PYTHONPATH is set reliably by a mix of cmake and maybe-installtest, the flux PYTHONPATH is not
solution: wrap the test in an invocation of `flux env` to set the PYTHONPATH appropriately, allowing the python shebang in python tests to succeed
fixes flux-framework#1172
problem: we didn't have a test to reproduce the issue with blocked jobs being constantly reconsidered
solution: with the new failed-match stats support, after the fix there should be no more than 10 failures to match; 14 is somewhat deterministic for this test if the issue comes back.
Force-pushed from 89f38b7 to 80457fa
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
```
@@ Coverage Diff @@
##           master   #1188   +/-  ##
========================================
  Coverage        ?    74.0%
========================================
  Files           ?      102
  Lines           ?    14611
  Branches        ?        0
========================================
  Hits            ?    10822
  Misses          ?     3789
  Partials        ?        0
```
This is the in-progress PR for the constraint job blocking problem. Full description below, but we're still lacking two things I really want to have:
problem: Jobs with constraints that can't be matched because nodes are down or drained are currently considered every time we enter the scheduling loop. If they reach the head of the queue, which is likely because we currently only configure one sched queue, they get re-considered over and over despite the fact that they can't run, which greatly slows down scheduling and can cause severe blocking, observed at up to 20 seconds of delay for a single submission. This change also exposed a bug with the duration calculation for duration=0.0 jobs, which used to be set to the full duration of the graph rather than the remaining duration of the graph.

solution: Add a new "m_blocked" member to the qmanager base which holds jobs that return EBUSY from alloc_orelse_reserve. This state can only happen when the job is blocked by a constraint requiring a node in an unusable state. The jobs in m_blocked are moved into m_pending (ready to be considered) by the notify callback. Currently they are moved regardless of what status changes occurred (a node being drained causes them to move), but it's a relatively small cost to move them back afterward and it simplifies the logic considerably. The duration for 0.0-duration jobs is now set to the remaining time rather than the total time during meta build time.
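The duration bug mentioned above reads naturally as a one-line fix. This sketch is an assumption about shape, not the actual meta-build code: the point from the description is that a 0.0 duration should expand to the time remaining in the graph, not the graph's full span.

```cpp
#include <cstdint>

// Hypothetical illustration of the duration=0.0 fix described above:
// the bug used the graph's full duration; the fix uses the time
// remaining from now until the end of the graph.
int64_t effective_duration (double requested, int64_t now, int64_t graph_end)
{
    if (requested == 0.0)
        return graph_end - now;   // remaining, not total, graph time
    return static_cast<int64_t> (requested);
}
```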