job-manager: drop sched.free response requirement #5786
Problem: `flux queue status -v` shows a count of pending sched.free requests, but once sched.free no longer requires a response, they will never be "pending". Drop pending free requests from the output. Update sharness tests that expected that in the output. Drop `free_pending` from the job-manager.alloc-query response payload.
LGTM! I guess it isn't an issue if the scheduler has an error handling a free request since we couldn't/didn't do anything with the error before?
Problem: sched.free requires a response, but this adds complexity to the job manager and has little benefit. Code that drops the sched.free response from flux-core is proposed in flux-framework/flux-core#5786. Drop the free response.
Oof just noticed a typo - an extra log message that shouldn't be there. Will force push.
Before this, a free error would cause the scheduler interface to be torn down (scheduler stopped). Now we just leave it to schedulers to log something. I think that is probably fine - the scheduler can choose to abort if it wants to, or log. (They both currently do log at LOG_ERR level anyway.)
Problem: the sched.free RPC requires a response but the response is unnecessary for correct job management. Moreover, the coupling between the RPC response and the job state machine makes some anticipated changes harder to design. Make sched.free a "fire and forget" RPC. Simply post the free event immediately instead of deferring it until the response is received. Update libschedutil to make schedutil_free_respond() a no-op. Update sched-simple to not directly respond when there is a free error. This change should be accompanied by an update to RFC 27.
c1d21db to a1abce4
Thanks - I'll set MWP.
Codecov Report
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5786       +/-   ##
===========================================
+ Coverage   54.85%   83.32%   +28.47%
===========================================
  Files         461      509       +48
  Lines       74808    82479     +7671
===========================================
+ Hits        41033    68724    +27691
+ Misses      33775    13755    -20020
Problem: the `sched.free` RPC requires a response but the response is unnecessary for correct job management. Moreover, the coupling between the RPC response and the job state machine makes some anticipated changes harder to design.
Make `sched.free` a "fire and forget" RPC. Simply post the `free` event immediately instead of deferring it until the response is received.
Update `flux queue status -v` output to not show pending free RPCs, since they can no longer be pending.
This change should be accompanied by an update to RFC 27.
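
To make the difference between the two behaviors concrete, here is a minimal, self-contained sketch in C. It is a toy model only, not code from flux-core: `struct toy_job`, `send_free_request()`, `release_deferred()`, `on_free_response()`, and `release_fire_and_forget()` are all hypothetical names invented for illustration, and no real messaging takes place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical, simplified job record (not flux-core's job structure). */
struct toy_job {
    const char *events[MAX_EVENTS]; /* event names in posting order */
    size_t event_count;
    bool free_request_sent;         /* true once the free request went out */
};

/* Append an event name to the job's toy eventlog. */
static void post_event (struct toy_job *job, const char *name)
{
    if (job->event_count < MAX_EVENTS)
        job->events[job->event_count++] = name;
}

/* Stand-in for sending the sched.free request; nothing is actually sent. */
static void send_free_request (struct toy_job *job)
{
    job->free_request_sent = true;
}

/* Old behavior (assumed shape): send the request, then wait.
 * The free event is posted later, in the response handler. */
static void release_deferred (struct toy_job *job)
{
    send_free_request (job);
}

static void on_free_response (struct toy_job *job)
{
    post_event (job, "free");
}

/* New behavior: send the request and post the free event right away.
 * No response is tracked, so nothing can be "pending". */
static void release_fire_and_forget (struct toy_job *job)
{
    send_free_request (job);
    post_event (job, "free");
}

/* Return true if the named event has been posted for this job. */
static bool has_event (const struct toy_job *job, const char *name)
{
    for (size_t i = 0; i < job->event_count; i++) {
        if (strcmp (job->events[i], name) == 0)
            return true;
    }
    return false;
}
```

In the deferred model there is a window between `release_deferred()` and `on_free_response()` in which the job has no `free` event; that window is what a pending count would have measured. In the fire-and-forget model the window does not exist, which is consistent with dropping the pending free count from the verbose queue status output.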