Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: rework increment/decrement of running job counts #325

Merged
merged 4 commits into from Mar 14, 2023

Conversation

cmoussa1
Copy link
Member

@cmoussa1 cmoussa1 commented Mar 14, 2023

Problem

As noted in #262, the plugin currently incorrectly handles calculating running job counts for associations (user/bank combos). It increments the running job count in job.state.priority, which could result in a double increment if the job re-enters PRIORITY state after a reprioritization of jobs via flux account-priority-update.

The plugin also incorrectly handles decrementing running and active job counts when a job enters job.state.inactive. Even if a job did not run, the plugin still decrements the association's current running jobs count and looks to release any held jobs from a max running jobs dependency, which could result in incorrect running job counts.


This PR looks to rework the way the plugin increments and decrements running and active job counts for an association. It adds a new callback for when a job enters job.state.run and increments the current running jobs count there.

It also reworks the way the plugin decrements active and running job counts in inactive_cb (). It checks if the job received an alloc event to determine if it actually ran. If True, it decrements both the running and the active job counts, and proceeds to check if there are any held jobs to release. If False, then the plugin determines that the job never actually ran, so it only decrements the active job count and returns from inactive_cb ().

A new sharness test is also added that tests some of the problematic scenarios brought up in #262. Specifically, the test submits one job to run and two jobs that remain in SCHED state. It then simulates a couple of different cases and checks that the active and running job counts are expected. The scenarios include:

  • updating the plugin with flux-accounting DB information while jobs are running/scheduled via flux account-priority-update
  • changing the priority of a scheduled job while jobs are running/scheduled
  • canceling one of the scheduled jobs
  • canceling both running/scheduled jobs

Problem: The multi-factor priority plugin currently increments
cur_run_jobs for a job when it enters job.state.priority. This is
incorrect because a job will not necessarily enter RUN state after it
receives a priority. Jobs enter SCHED state after receiving a priority
and could stay there while waiting for the requested resources before
actually running, or re-enter job.state.priority on a reprioritization
of all jobs after a flux-accounting update is sent to the plugin,
resulting in a double-increment of currently running jobs.

Remove the increment of a user/bank combo's current running jobs count
in priority_cb ().
@cmoussa1 cmoussa1 added improvement Upgrades to an already existing feature bug-fix A proposal for something that isn't working labels Mar 14, 2023
@grondo
Copy link
Contributor

grondo commented Mar 14, 2023

It adds a new handler for a job's alloc event by subscribing to all events for the job in job.new. Once the job receives the alloc event, the current running jobs counter for the association is incremented. The addition of this new handler removes the need to increment the running jobs count in job.state.priority.

Did you first try adding a callback for job.state.run and increment the running jobs counter there? (This is the approach noted in this comment) That might be more lightweight than subscribing to all job events.

@cmoussa1
Copy link
Member Author

Did you first try adding a callback for job.state.run and increment the running jobs counter there? (This is the approach noted in #262 (comment)) That might be more lightweight than subscribing to all job events.

Oh, whoops. I didn't even listen to my own comment in that thread. 🤦 I'm sorry about that. Yeah, looks like adding it to job.state.run works, so I'll update the commit and PR description right now and force up a push. Thanks for reminding me!

@cmoussa1
Copy link
Member Author

OK, I think I've updated both the commit and PR description now to use job.state.run instead of adding a new event handler to the plugin. I think I'll mark this as ready for review whenever convenient.

@cmoussa1 cmoussa1 requested a review from grondo March 14, 2023 15:56
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 👍 Just some initial comments inline. Mostly very minor stuff.

int userid;
struct bank_info *b;

flux_t *h = flux_jobtap_get_flux (p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is h unused here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, h is unused here, thanks for catching this. Just force-pushed a fix to remove this line from the callback.

Comment on lines 918 to 920
const char *topic,
flux_plugin_arg_t *args,
void *data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some whitespace errors here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, thanks. I always forget to adjust the whitespace for these function headers when I copy/paste the callbacks. 🤦 Just pushed up a change to fix this.

Comment on lines 927 to 930
b = static_cast<bank_info *> (flux_jobtap_job_aux_get (
p,
FLUX_JOBTAP_CURRENT_JOB,
"mf_priority:bank_info"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion, but this might be more readable as:

diff --git a/src/plugins/mf_priority.cpp b/src/plugins/mf_priority.cpp
index 79eebc9..7f6c61b 100644
--- a/src/plugins/mf_priority.cpp
+++ b/src/plugins/mf_priority.cpp
@@ -924,10 +924,10 @@ static int run_cb (flux_plugin_t *p,
 
     flux_t *h = flux_jobtap_get_flux (p);
 
-    b = static_cast<bank_info *> (flux_jobtap_job_aux_get (
-                                                    p,
-                                                    FLUX_JOBTAP_CURRENT_JOB,
-                                                    "mf_priority:bank_info"));
+    b = static_cast<bank_info *>
+        (flux_jobtap_job_aux_get (p,
+                                  FLUX_JOBTAP_CURRENT_JOB,
+                                  "mf_priority:bank_info"));

Comment on lines 986 to 993
if (flux_jobtap_job_event_posted (p, FLUX_JOBTAP_CURRENT_JOB, "alloc")) {
b->cur_run_jobs--;
b->cur_active_jobs--;
} else {
b->cur_active_jobs--;

return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this logic would be clearer by

  1. first decrement active jobs count, this makes it clear that count is always decremented.
  2. check if no alloc event posted, then nothing more to do, so return
  3. proceed to decrement cur_run_jobs and check for any held jobs that need to be released

i.e.:

    b->cur_active_jobs--;
    // Nothing more to do if this job was never running
    if (!flux_jobtap_job_event_posted (p, FLUX_JOBTAP_CURRENT_JOB, "alloc"))
        return 0;

    // This job was running, decrement running jobs count and check if a job can be released
    b->cur_run_jobs--;

@@ -0,0 +1,110 @@
#!/bin/bash

test_description='Test comparing job counts when submitting jobs that take up all resources'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. Did you verify that before the changes in this PR this test fails? (just curious)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did - here are the results of running this test with the plugin before any changes:

query_1.json

expected

cur_running_jobs: 1
cur_active_jobs: 3

failed

cur_running_jobs: 3
cur_active_jobs: 3


query_2.json

expected

cur_running_jobs: 1
cur_active_jobs: 3

failed

cur_running_jobs: 5
cur_active_jobs: 3


query_3.json

expected

cur_running_jobs: 1
cur_active_jobs: 3

failed

cur_running_jobs: 8
cur_active_jobs: 3


query_4.json

expected

cur_running_jobs: 1
cur_active_jobs: 2

failed

cur_running_jobs: 7
cur_active_jobs: 2


query_5.json

expected

cur_running_jobs: 0
cur_active_jobs: 0

failed

cur_running_jobs: 5
cur_active_jobs: 0

Problem: The plugin needs a way to know when to increment the current
running jobs count for a user/bank combo now that it does not increment
the count when the job enters job.state.priority.

Add a new callback to the plugin to increment the current running jobs
count in job.state.run.
Problem: The inactive_cb () incorrectly handles decrementing job counts
when a job enters job.state.inactive. The plugin currently decrements
both running and active job counts even when a job did not run (e.g it
was cancelled before ever receiving an "alloc" event).

Rework the way the plugin decrements active and running job counts in
inactive_cb (). Check if the job received an "alloc" event to determine
if it actually ran. If True, decrement both the running and the active
job counts, and proceed to check if there are any held jobs to release.
If False, then the job never actually ran, so only decrement the active
job count and return from inactive_cb ().
Add a sharness test that tests the scenarios brought up in issue#262,
which revealed problems with how the plugin handled running and active
job counts. Add tests in this file that check for the following:

- Submit a job that is running and two more that are scheduled, but do
not run. Check that job counts are correct (1 running, 3 active).
- Run flux account-priority-update while the jobs are running/scheduled
and ensure that job counts are correct (1 running, 3 active).
- Change the priority of one of the scheduled jobs, reprioritize all
jobs, and ensure job counts are still correct (1 running, 3 active).
- Cancel one of the scheduled jobs and ensure job counts are still
correct (1 running, 2 active).
- Cancel the remaining running and scheduled jobs and ensure job counts
are still correct (0 running, 0 active).
@cmoussa1
Copy link
Member Author

Thanks for the review @grondo. Just force-pushed some fixes to this PR based on your feedback above, including the whitespace adjustments and re-work to the logic in inactive_cb ().

@codecov
Copy link

codecov bot commented Mar 14, 2023

Codecov Report

Merging #325 (1323fce) into master (77733b2) will decrease coverage by 0.10%.
The diff coverage is 77.77%.

❗ Current head 1323fce differs from pull request most recent head 6ca6866. Consider uploading reports for the commit 6ca6866 to get more accurate results

@@            Coverage Diff             @@
##           master     #325      +/-   ##
==========================================
- Coverage   83.76%   83.67%   -0.10%     
==========================================
  Files          23       23              
  Lines        1226     1231       +5     
==========================================
+ Hits         1027     1030       +3     
- Misses        199      201       +2     
Impacted Files Coverage Δ
src/plugins/mf_priority.cpp 84.38% <77.77%> (-0.38%) ⬇️

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cmoussa1
Copy link
Member Author

Thanks! Will set MWP on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix A proposal for something that isn't working improvement Upgrades to an already existing feature merge-when-passing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants