libschedutil: extend hello protocol #2231
Per discussion in flux-framework/flux-sched#493.
The job-manager/scheduler "hello" handshake supports scheduler state recovery. The initial interface in libschedutil issued a callback containing only
In addition to marking resources as allocated, flux-sched wants to reconstruct queue entries for running jobs, so it needs the same metadata that was passed in with the alloc request. Providing it adds little overhead, since the data is already part of the active job record that job-manager keeps in memory. This PR extends the hello handshake to send an array of "job objects" instead of an array of job IDs. Each job object contains the job ID, priority, userid, and submit time.
The hello_f callback prototype is expanded to include the new information in addition to
Problem: flux-sched wants to reconstruct queue entries for running jobs when restarting, but the "hello" response only includes an array of job IDs. Return an array of job objects in response to the hello request, and in each object include the same data that would have been received in an alloc request. Update the hello response parser in libschedutil to parse the new message format.
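To make the new response format concrete, here is a minimal sketch of the encoding side in C, assuming each job object is serialized as a JSON object inside a JSON array. The struct `hello_job`, the helpers `append`, `hello_job_encode` and `hello_jobs_encode`, and the key names `id`, `priority`, `userid` and `t_submit` are all hypothetical; the real job-manager builds this message with its own JSON library and may use different field names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical job object: the four fields this PR adds to each
 * entry of the hello response. */
struct hello_job {
    uint64_t id;       /* job ID */
    int priority;      /* job priority */
    uint32_t userid;   /* submitting user */
    double t_submit;   /* submit time, seconds since the epoch */
};

/* Append NUL-terminated s at buf[*off]; fail if it would not fit. */
static int append (char *buf, size_t size, size_t *off, const char *s)
{
    size_t len = strlen (s);
    if (*off + len >= size)
        return -1;
    memcpy (buf + *off, s, len + 1);
    *off += len;
    return 0;
}

/* Encode one job object as JSON. Returns the length written
 * (excluding NUL), or -1 if it did not fit in size bytes. */
static int hello_job_encode (const struct hello_job *job,
                             char *buf, size_t size)
{
    int n = snprintf (buf, size,
                      "{\"id\":%" PRIu64 ",\"priority\":%d,"
                      "\"userid\":%" PRIu32 ",\"t_submit\":%.6f}",
                      job->id, job->priority, job->userid, job->t_submit);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Encode an array of job objects as a JSON array "[{...},{...}]".
 * Returns the string length, or -1 if buf is too small. */
static int hello_jobs_encode (const struct hello_job *jobs, size_t count,
                              char *buf, size_t size)
{
    size_t off = 0;

    assert (jobs != NULL || count == 0);
    if (size == 0)
        return -1;
    buf[0] = '\0';
    if (append (buf, size, &off, "[") < 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (i > 0 && append (buf, size, &off, ",") < 0)
            return -1;
        int n = hello_job_encode (&jobs[i], buf + off, size - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    if (append (buf, size, &off, "]") < 0)
        return -1;
    return (int)off;
}
```

For a single job `{ 1234, 16, 1000, 1558000000.5 }`, `hello_jobs_encode()` produces `[{"id":1234,"priority":16,"userid":1000,"t_submit":1558000000.500000}]`. The updated parser in libschedutil would then unpack the same fields from each array element rather than reading bare job IDs.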
Add job id, priority, userid, and submit time to the hello_f callback, now that this information is supplied in the hello response message. Update users in sched-simple. Update sched-dummy test code.
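The expanded callback can be sketched as follows. This is a self-contained illustration, not the actual libschedutil header: `hello_cb_f`, `hello_dispatch`, `sched_queue` and `rebuild_entry` are hypothetical names, and any arguments the real `hello_f` prototype carries beyond the four new fields are omitted. It shows the library side invoking the callback once per job object, and a scheduler using the new fields to recreate a queue entry for each running job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job object carrying the fields added by this PR. */
struct hello_job {
    uint64_t id;
    int priority;
    uint32_t userid;
    double t_submit;
};

/* Hypothetical callback type modeled on the expanded hello_f:
 * the four new fields are passed as separate arguments. */
typedef int (hello_cb_f)(uint64_t id, int priority, uint32_t userid,
                         double t_submit, void *arg);

/* Invoke cb once per job object from the hello response.
 * Stops and returns -1 on the first callback failure. */
static int hello_dispatch (const struct hello_job *jobs, size_t count,
                           hello_cb_f *cb, void *arg)
{
    assert (cb != NULL);
    for (size_t i = 0; i < count; i++) {
        if (cb (jobs[i].id, jobs[i].priority, jobs[i].userid,
                jobs[i].t_submit, arg) < 0)
            return -1;
    }
    return 0;
}

/* Hypothetical fixed-size scheduler queue rebuilt during hello. */
#define QUEUE_MAX 64
struct sched_queue {
    struct hello_job entries[QUEUE_MAX];
    size_t count;
};

/* Example scheduler callback: recreate a queue entry for a running job. */
static int rebuild_entry (uint64_t id, int priority, uint32_t userid,
                          double t_submit, void *arg)
{
    struct sched_queue *q = arg;

    if (q->count >= QUEUE_MAX)
        return -1; /* no room for another entry */
    q->entries[q->count].id = id;
    q->entries[q->count].priority = priority;
    q->entries[q->count].userid = userid;
    q->entries[q->count].t_submit = t_submit;
    q->count++;
    return 0;
}
```

Stopping at the first callback failure is a choice made for this sketch; it mirrors the idea that a scheduler which cannot account for a running job should not proceed as if recovery succeeded.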
grondo left a comment
This is great!
In the interest of expediency I don't necessarily suggest it here, but even though sched-dummy doesn't use the extra metadata, it might be good to add some sanity checks in its
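A minimal sketch of the kind of sanity check suggested above, assuming the callback receives the four new fields as separate arguments. The priority bounds 0 to 31, the assumption that job IDs are nonzero, and the use of `UINT32_MAX` as an "unknown user" sentinel should all be verified against the flux-core headers; `hello_job_check` and `dummy_hello` are hypothetical names, not existing sched-dummy code.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed priority bounds; verify against the job-manager's definitions. */
#define HELLO_PRIORITY_MIN 0
#define HELLO_PRIORITY_MAX 31

/* Return 0 if the job object fields look sane, else -1 with errno = EPROTO. */
static int hello_job_check (uint64_t id, int priority, uint32_t userid,
                            double t_submit)
{
    if (id == 0                          /* job IDs assumed nonzero */
        || priority < HELLO_PRIORITY_MIN
        || priority > HELLO_PRIORITY_MAX
        || userid == UINT32_MAX          /* assumed "unknown user" sentinel */
        || !(t_submit > 0.)) {           /* also rejects NaN */
        errno = EPROTO;
        return -1;
    }
    return 0;
}

/* Hypothetical sched-dummy-style hello callback: validate the metadata it
 * otherwise ignores, then count the recovered job. */
static int dummy_hello (uint64_t id, int priority, uint32_t userid,
                        double t_submit, void *arg)
{
    int *recovered = arg;

    assert (recovered != NULL);
    if (hello_job_check (id, priority, userid, t_submit) < 0) {
        fprintf (stderr, "hello: bad job object for id %" PRIu64 "\n", id);
        return -1;
    }
    (*recovered)++;
    return 0;
}
```

Checks like these cost little in a test scheduler and would catch a job-manager that sends malformed or zero-filled job objects after a protocol change.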
```diff
@@            Coverage Diff             @@
##           master    #2231      +/-   ##
==========================================
- Coverage   80.72%   80.71%   -0.02%
==========================================
  Files         202      202
  Lines       32295    32297       +2
==========================================
- Hits        26069    26067       -2
- Misses       6226     6230       +4
```