New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
job-manager: fix for job priority not reset after a duplicate urgency update #4941
Conversation
Problem: The job manager prioritization code contains an optimization that avoids posting a priority event if the priority did not change after reprioritization. However, this causes jobs that are in PRIORITY state to get stuck there, since they normally transition to SCHED as a result of the priority event. This occurs if a job's urgency is updated to the same value, e.g. if a held job is held again. Force a priority event to be emitted in states other than SCHED, even if the priority is unchanged. This allows jobs kicked back to PRIORITY state to transition to SCHED, even if a duplicate urgency value is set. Fixes flux-framework#4940
t/t2214-job-manager-hold-limited.t
Outdated
test_expect_success 'job-manager: hold job 3, 5 again (issue #4940)' ' | ||
flux job urgency $(cat job3.id) 16 && | ||
flux job urgency $(cat job3.id) 16 && | ||
flux job urgency $(cat job3.id) 16 && | ||
flux job urgency $(cat job3.id) 16 && | ||
flux job urgency $(cat job5.id) 16 | ||
' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be urgency 0/hold instead of 16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! 🤦 nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed that blunder and force pushed the result.
Problem: None of the job manager job "hold" tests have a reproducer for holding an already held job as described in issue flux-framework#4940. Sprinkle some `flux job urgency id 0` in the tests on already held jobs to act as a reproducer for flux-framework#4940.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm!
Thanks! I've set MWP. |
This PR fixes the issue described in #4940, and adds some tests to the
hold
suite of sharness tests to set urgency to hold already held jobs to ensure we don't see this particular issue again.