plugin: rework increment/decrement of running job counts #325
Problem: The multi-factor priority plugin currently increments cur_run_jobs for a job when it enters job.state.priority. This is incorrect because a job will not necessarily enter RUN state after it receives a priority. Jobs enter SCHED state after receiving a priority and could stay there while waiting for the requested resources before actually running, or re-enter job.state.priority on a reprioritization of all jobs after a flux-accounting update is sent to the plugin, resulting in a double-increment of currently running jobs. Remove the increment of a user/bank combo's current running jobs count in priority_cb ().
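To make the double increment concrete, here is a minimal, self-contained model of the old behavior. It is a sketch, not the plugin's code: the `Association` struct, `buggy_priority_cb`, and `running_count_after` are hypothetical names made up for illustration, and only `cur_run_jobs` is taken from the plugin.

```cpp
#include <cassert>

// Hypothetical stand-in for a user/bank combo's counters.
struct Association {
    int cur_run_jobs = 0;
};

// Models the old behavior: every entry into PRIORITY state bumps the
// running-jobs count, whether or not the job ever reaches RUN.
void buggy_priority_cb (Association &a)
{
    a.cur_run_jobs++;
}

// Returns the running-jobs count recorded for a single job that enters
// PRIORITY state `priority_entries` times and never actually runs.
int running_count_after (int priority_entries)
{
    assert (priority_entries >= 0);
    Association a;
    for (int i = 0; i < priority_entries; i++)
        buggy_priority_cb (a);
    return a.cur_run_jobs;
}
```

With one initial prioritization plus one reprioritization, `running_count_after (2)` reports two running jobs for a job that is still waiting in SCHED state, where the true count is zero.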
Did you first try adding a callback for |
Oh, whoops. I didn't even listen to my own comment in that thread. 🤦 I'm sorry about that. Yeah, looks like adding it to |
OK, I think I've updated both the commit and PR description now to use |
Nice! 👍 Just some initial comments inline. Mostly very minor stuff.
src/plugins/mf_priority.cpp
Outdated
    int userid;
    struct bank_info *b;

    flux_t *h = flux_jobtap_get_flux (p);
Is h unused here?
Yup, h is unused here, thanks for catching this. Just force-pushed a fix to remove this line from the callback.
src/plugins/mf_priority.cpp
Outdated
const char *topic,
flux_plugin_arg_t *args,
void *data)
looks like some whitespace errors here?
Ah, yes, thanks. I always forget to adjust the whitespace for these function headers when I copy/paste the callbacks. 🤦 Just pushed up a change to fix this.
src/plugins/mf_priority.cpp
Outdated
    b = static_cast<bank_info *> (flux_jobtap_job_aux_get (
                                                    p,
                                                    FLUX_JOBTAP_CURRENT_JOB,
                                                    "mf_priority:bank_info"));
Just a suggestion, but this might be more readable as:
diff --git a/src/plugins/mf_priority.cpp b/src/plugins/mf_priority.cpp
index 79eebc9..7f6c61b 100644
--- a/src/plugins/mf_priority.cpp
+++ b/src/plugins/mf_priority.cpp
@@ -924,10 +924,10 @@ static int run_cb (flux_plugin_t *p,
     flux_t *h = flux_jobtap_get_flux (p);
-    b = static_cast<bank_info *> (flux_jobtap_job_aux_get (
-                                                    p,
-                                                    FLUX_JOBTAP_CURRENT_JOB,
-                                                    "mf_priority:bank_info"));
+    b = static_cast<bank_info *>
+            (flux_jobtap_job_aux_get (p,
+                                      FLUX_JOBTAP_CURRENT_JOB,
+                                      "mf_priority:bank_info"));
src/plugins/mf_priority.cpp
Outdated
    if (flux_jobtap_job_event_posted (p, FLUX_JOBTAP_CURRENT_JOB, "alloc")) {
        b->cur_run_jobs--;
        b->cur_active_jobs--;
    } else {
        b->cur_active_jobs--;

        return 0;
    }
IMO, this logic would be clearer by
- first decrement active jobs count, this makes it clear that count is always decremented.
- check if no alloc event posted, then nothing more to do, so return
- proceed to decrement cur_run_jobs and check for any held jobs that need to be released
i.e.:
    b->cur_active_jobs--;
    // Nothing more to do if this job was never running
    if (!flux_jobtap_job_event_posted (p, FLUX_JOBTAP_CURRENT_JOB, "alloc"))
        return 0;
    // This job was running, decrement running jobs count and check if a job can be released
    b->cur_run_jobs--;
@@ -0,0 +1,110 @@
#!/bin/bash

test_description='Test comparing job counts when submitting jobs that take up all resources'
Nicely done. Did you verify that before the changes in this PR this test fails? (just curious)
Yes I did - here are the results of running this test with the plugin before any changes:
query_1.json
expected
cur_running_jobs: 1
cur_active_jobs: 3
failed
cur_running_jobs: 3
cur_active_jobs: 3
query_2.json
expected
cur_running_jobs: 1
cur_active_jobs: 3
failed
cur_running_jobs: 5
cur_active_jobs: 3
query_3.json
expected
cur_running_jobs: 1
cur_active_jobs: 3
failed
cur_running_jobs: 8
cur_active_jobs: 3
query_4.json
expected
cur_running_jobs: 1
cur_active_jobs: 2
failed
cur_running_jobs: 7
cur_active_jobs: 2
query_5.json
expected
cur_running_jobs: 0
cur_active_jobs: 0
failed
cur_running_jobs: 5
cur_active_jobs: 0
Problem: The plugin needs a way to know when to increment the current running jobs count for a user/bank combo now that it does not increment the count when the job enters job.state.priority. Add a new callback to the plugin to increment the current running jobs count in job.state.run.
Problem: inactive_cb () incorrectly handles decrementing job counts when a job enters job.state.inactive. The plugin currently decrements both running and active job counts even when a job did not run (e.g., it was cancelled before ever receiving an "alloc" event). Rework the way the plugin decrements active and running job counts in inactive_cb (). Check if the job received an "alloc" event to determine if it actually ran. If it did, decrement both the running and the active job counts, and proceed to check if there are any held jobs to release. If it did not, the job never actually ran, so only decrement the active job count and return from inactive_cb ().
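The decrement logic can be sketched in isolation as follows, using the ordering suggested in the review (decrement the active count first, then return early if the job never ran). This is a simplified model under stated assumptions: `Association`, `Job`, `alloc_posted`, and `on_inactive` are hypothetical names, the boolean field stands in for the plugin's check for a posted "alloc" event, and only `cur_run_jobs` and `cur_active_jobs` come from the plugin.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-association counters.
struct Association {
    int cur_run_jobs = 0;
    int cur_active_jobs = 0;
};

// Hypothetical stand-in for what is known about a job when it becomes
// inactive.
struct Job {
    bool alloc_posted = false;  // models "an alloc event was posted"
};

// Returns true when the caller should go on to look for held jobs to
// release, which only makes sense if the job had actually been running.
bool on_inactive (Association &a, const Job &job)
{
    assert (a.cur_active_jobs > 0);
    a.cur_active_jobs--;        // always: the job is no longer active
    if (!job.alloc_posted)
        return false;           // never ran, nothing more to do
    assert (a.cur_run_jobs > 0);
    a.cur_run_jobs--;           // the job was running
    return true;
}
```

Starting from 1 running and 3 active jobs, cancelling a job that never received an allocation leaves 1 running and 2 active, and `on_inactive` returns false so no held-job check is made.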
Add a sharness test that tests the scenarios brought up in issue #262, which revealed problems with how the plugin handled running and active job counts. Add tests in this file that check for the following:
- Submit a job that is running and two more that are scheduled, but do not run. Check that job counts are correct (1 running, 3 active).
- Run flux account-priority-update while the jobs are running/scheduled and ensure that job counts are correct (1 running, 3 active).
- Change the priority of one of the scheduled jobs, reprioritize all jobs, and ensure job counts are still correct (1 running, 3 active).
- Cancel one of the scheduled jobs and ensure job counts are still correct (1 running, 2 active).
- Cancel the remaining running and scheduled jobs and ensure job counts are still correct (0 running, 0 active).
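The five scenarios can also be replayed against a small in-memory model to see where the expected counts come from. This is only an illustrative sketch, not the sharness test or the plugin: `Counts`, `Job`, `Model`, and `replay_scenarios` are hypothetical names, and the model assumes the active count is incremented at submission, the running count only when a job starts running, and nothing on reprioritization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counters for a single association.
struct Counts {
    int running = 0;
    int active = 0;
};

struct Job {
    bool alloc_posted = false;  // set once the job starts running
    bool inactive = false;
};

struct Model {
    Counts counts;
    std::vector<Job> jobs;

    std::size_t submit ()            // job becomes active
    {
        jobs.push_back (Job ());
        counts.active++;
        return jobs.size () - 1;
    }
    void run (std::size_t id)        // models job.state.run
    {
        jobs[id].alloc_posted = true;
        counts.running++;
    }
    void reprioritize () {}          // PRIORITY re-entry leaves counts alone
    void cancel (std::size_t id)     // models job.state.inactive
    {
        assert (!jobs[id].inactive);
        jobs[id].inactive = true;
        counts.active--;
        if (jobs[id].alloc_posted)
            counts.running--;
    }
};

// Replays the five scenarios and records the counts after each one.
std::vector<Counts> replay_scenarios ()
{
    Model m;
    std::vector<Counts> seen;
    std::size_t a = m.submit ();
    std::size_t b = m.submit ();
    std::size_t c = m.submit ();
    m.run (a);                       // one job runs, two stay scheduled
    seen.push_back (m.counts);
    m.reprioritize ();               // priority update from flux-accounting
    seen.push_back (m.counts);
    m.reprioritize ();               // priority change, then reprioritize
    seen.push_back (m.counts);
    m.cancel (b);                    // cancel one scheduled job
    seen.push_back (m.counts);
    m.cancel (a);                    // cancel the running job
    m.cancel (c);                    // cancel the last scheduled job
    seen.push_back (m.counts);
    return seen;
}
```

Under these assumptions the recorded counts match the expected values listed above: (1, 3), (1, 3), (1, 3), (1, 2), and (0, 0) as (running, active).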
Thanks for the review @grondo. Just force-pushed some fixes to this PR based on your feedback above, including the whitespace adjustments and re-work to the logic in |
Codecov Report
@@ Coverage Diff @@
## master #325 +/- ##
==========================================
- Coverage 83.76% 83.67% -0.10%
==========================================
Files 23 23
Lines 1226 1231 +5
==========================================
+ Hits 1027 1030 +3
- Misses 199 201 +2
LGTM!
Thanks! Will set MWP on this |
Problem
As noted in #262, the plugin currently incorrectly handles calculating running job counts for associations (user/bank combos). It increments the running job count in job.state.priority, which could result in a double increment if the job re-enters PRIORITY state after a reprioritization of jobs via flux account-priority-update.
The plugin also incorrectly handles decrementing running and active job counts when a job enters job.state.inactive. Even if a job did not run, the plugin still decrements the association's current running jobs count and looks to release any held jobs from a max running jobs dependency, which could result in incorrect running job counts.
This PR looks to rework the way the plugin increments and decrements running and active job counts for an association. It adds a new callback for when a job enters job.state.run and increments the current running jobs count there.
It also reworks the way the plugin decrements active and running job counts in inactive_cb (). It checks if the job received an alloc event to determine if it actually ran. If True, it decrements both the running and the active job counts, and proceeds to check if there are any held jobs to release. If False, then the plugin determines that the job never actually ran, so it only decrements the active job count and returns from inactive_cb ().
A new sharness test is also added that tests some of the problematic scenarios brought up in #262. Specifically, the test submits one job to run and two jobs that remain in SCHED state. It then simulates a couple of different cases and checks that the active and running job counts are expected. The scenarios include:
flux account-priority-update