job-manager / job-info: get job data changes from job-manager to job-info so job listings will have up to date information #3208
Conversation
Just read through #3052 again.
That makes a lot of sense. The one thought that occurs to me regarding a combined protocol for "job updates" is that we already invested some design effort in job eventlogs. Maybe rather than creating a new update protocol, we should just publish eventlog entries (adding the jobid to the context)? For that matter, would it be better to just "cc" all job eventlog updates to an in-memory circular buffer in the job manager that job-info could consume via a streaming RPC? Call it the job manager's "journal". Published events are easy to work with, but the big downside is that they are broadcast across the instance, for naught if there is only the one consumer. For the annotations and the upcoming fair-share priority changes, which probably would not be appropriate to land in the job's eventlog, we could still add an eventlog entry to the in-memory journal. Edit: a generic API for consuming eventlogs could potentially be used by job-info to consume this "eventlog" too.
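For illustration, an eventlog entry per the RFC-defined format is a JSON object with a timestamp, a name, and a context; the suggestion above amounts to folding the jobid into that context so a single journal stream can describe all jobs. A minimal sketch with jansson — the event name, jobid, and context fields here are made up:

```c
#include <stdio.h>
#include <stdlib.h>
#include <jansson.h>

int main (void)
{
    /* A hypothetical journal entry: an eventlog entry (timestamp,
     * name, context) with the jobid folded into the context.
     * All values here are illustrative only. */
    json_t *entry = json_pack ("{s:f s:s s:{s:I s:i}}",
                               "timestamp", 1599510000.0,
                               "name", "priority",
                               "context",
                                 "id", (json_int_t)1234,
                                 "priority", 14);
    char *s = json_dumps (entry, JSON_COMPACT);

    printf ("%s\n", s);
    free (s);
    json_decref (entry);
    return 0;
}
```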
I like this idea. The eventlog entry is most of what is needed anyways for the update, so it makes sense.
At one point in time I did something similar to this for the annotations. IIRC it worked, but I disliked it b/c unlike most streaming RPCs, this streaming RPC had to "last forever", i.e. you've always got to re-connect if the job-manager is reloaded.
Sounds like a decent idea.
OK, let me open an issue on that idea.
(force-pushed from b2eefb4 to 6908a65)
re-pushed with a rough implementation (needs to be cleaned up) where the job-info module gets information about job priority changes via a new streaming RPC. Currently the streaming RPC sends out one eventlog entry every time there is a priority change. This can be bad / slow b/c there can be a resulting re-sort of the pending job queue in the job-info module every time it receives a priority change. I was going to update the job-manager to "batch" send priority changes, not so dissimilar from the job-manager's current batching of job state transitions. BUT then I realized this same problem exists in the job-manager, since changing a job's priority is a single RPC. As discussed in #3052, there are multiple issues in play: how to update many priorities in the job-manager at once (RPC-wise and internal-data-structure-wise), how to send that data to the job-info module (both to be done + done efficiently), and how to re-sort in the job-info module (data-structure-wise). Many of these issues / solutions are still TBD. So I think that the current non-optimal implementation may be a good round 1 implementation to at least solve the initial problem brought up in #3052, but we should split out new issues for the performance problems on multiple fronts. I suspect how to update a large number of job priorities at one time in the job-manager will lead us to how to update a large number of job priorities in job-info as well.
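The kind of batching described above might look roughly like this: accumulate change records and flush them downstream in one message after a short delay. A sketch only; all names here are hypothetical, not the job-manager's actual batching code:

```c
#include <flux/core.h>
#include <jansson.h>

/* Hypothetical batcher: collect priority-change records and flush
 * them as a single message, rather than one message per change. */
struct batch {
    flux_t *h;
    json_t *events;        /* accumulated change records */
    flux_watcher_t *timer; /* non-repeating flush timer */
};

static void flush_cb (flux_reactor_t *r, flux_watcher_t *w,
                      int revents, void *arg)
{
    struct batch *b = arg;

    /* ... send b->events downstream in one message here ... */
    json_array_clear (b->events);
}

static void batch_append (struct batch *b, json_t *entry)
{
    json_array_append_new (b->events, entry);
    if (json_array_size (b->events) == 1)
        flux_watcher_start (b->timer); /* first record arms the timer */
}
```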
If the priority of one job changes, the reordering of the alloc queue will not change the priorities of other jobs, so that's still one event, right?
@garlick correct. Sorry, perhaps I didn't explain my thoughts correctly. Basically the issue I was thinking about was how you change the priority of multiple jobs right now (obviously in command-line form, but underneath it's just the RPCs), something like the sketch below:
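A hedged sketch of what five such priority changes amount to at the RPC level; the `job-manager.priority` topic and payload keys are assumptions for illustration:

```c
#include <jansson.h>
#include <flux/core.h>

/* Change one job's priority; the topic and payload keys are
 * assumptions for illustration. */
static int change_priority (flux_t *h, flux_jobid_t id, int priority)
{
    flux_future_t *f;
    int rc;

    if (!(f = flux_rpc_pack (h, "job-manager.priority", FLUX_NODEID_ANY, 0,
                             "{s:I s:i}",
                             "id", (json_int_t)id,
                             "priority", priority)))
        return -1;
    rc = flux_future_get (f, NULL);
    flux_future_destroy (f);
    return rc;
}

int main (void)
{
    flux_t *h;
    flux_jobid_t ids[5] = { 1, 2, 3, 4, 5 }; /* placeholder job ids */

    if (!(h = flux_open (NULL, 0)))
        return 1;
    /* Five independent RPCs: each one can trigger its own re-sort. */
    for (int i = 0; i < 5; i++)
        change_priority (h, ids[i], 10);
    flux_close (h);
    return 0;
}
```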
this will lead to 5 re-sorts of the alloc queue, which is probably not good. And in my PR the above will lead to 5 events sent to the job-info module, leading to 5 re-sorts of the job listing pending queue. Also probably not good.
The list is stored sorted, so it's just a remove and insert, at least in the job-manager case. We might be generating many eventlog entries in this "journal" thing if we change a lot of priorities at once, e.g. via bank scheduling, but I'm not super worried about, say, thousands of short messages sent point-to-point on the same node. Maybe I'm not envisioning the full horror though. (Like with a million jobs in the queue, and updating the banks every 5 minutes?) As you say, batching can help...
p.s. I was picturing that we would just add code to send all events that go to job eventlogs to this journal, and then we have a generic method of tracking all changes to all jobs, rather than a one-off for priority. Maybe that's where you're going, but wouldn't it be easier to just attach to the current batch mechanism and send out all events now?
@garlick ohhh, you mean send out every eventlog entry via the current batch mechanism?
hmmm, I didn't think about that, b/c that's a lot of events given the majority aren't needed by job-info. Perhaps there could be a filtering mechanism, so that only certain eventlog entries are sent out on the "updates" requests?
Great idea. A filter option in the request (like an allow/deny list) makes a lot of sense!
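For concreteness, such a filter might be expressed in the request payload as a list of event names. A sketch only; the key names ("allow", "deny") and the event names are assumptions, not necessarily the encoding this PR settled on:

```c
#include <jansson.h>

/* Hypothetical filter payloads for the event-stream request: an
 * allow list streams only the named events; a deny list streams
 * everything except them. */
static json_t *make_allow_filter (void)
{
    return json_pack ("{s:[ss]}", "allow", "priority", "annotations");
}

static json_t *make_deny_filter (void)
{
    return json_pack ("{s:[s]}", "deny", "depend");
}
```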
(force-pushed from f10c7c8 to 54d3a8d)
Just re-pushed. This version renames the streaming RPC to `job-manager.events`. I think this infrastructure works out really well with the circular buffer in #3209 being considered for the future.
(force-pushed from 54d3a8d to 38432db)
That sounds great. I was picturing using the actual newline-separated, RFC-defined eventlog format in basically a cbuf, but I'm on board with this approach if it works out better. Hopefully the events in the array are objects, not string-encoded JSON? (Double-encoding JSON should probably be an official anti-pattern by now :-)
Yup, arrays of objects, and super cheap to just pass along. The one inefficiency is the regular looping over the "filters" for comparison. I could do some hashing tricks in the future to eliminate the regular looping, but figured this was fine for now.
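The looping described is presumably along these lines: a linear scan of the listener's filter list for each event. A sketch, not the PR's actual code; a hash or set keyed by event name would remove the loop:

```c
#include <stdbool.h>
#include <string.h>
#include <jansson.h>

/* Sketch of a per-event filter check: scan the allow list for the
 * event name.  If no filter was supplied, everything passes. */
static bool event_allowed (json_t *allow, const char *name)
{
    size_t i;
    json_t *entry;

    if (!allow)
        return true;
    json_array_foreach (allow, i, entry) {
        if (!strcmp (json_string_value (entry), name))
            return true;
    }
    return false;
}
```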
(force-pushed from 38432db to d911885)
Just re-pushed, doing some cleanup + fixes + adding more tests. I removed WIP b/c we're approaching something that I think will be our final solution.
(force-pushed from e4369ab to b5d22ae)
re-pushed adding 1 more test for "reconnection" of the job-info module to the job-manager when the job-manager is reloaded. Perhaps this additional test was overkill.
LMK when you're ready for a review @chu11.
(force-pushed from b5d22ae to cdd06d4)
Call flux_log() instead of flux_log_error() in functions that do not set errno.
Add forgotten unsubscribe of 'job-annotations' during teardown.
Do not re-order the alloc queue if a priority change request does not actually change the job's priority.
Move the job-manager.disconnect event from wait.c to job-manager.c in preparation for additional disconnect needs in the future.
When first loading job data for listings, in addition to parsing the "submit" event, also look for "priority" events, in case the job's priority has already been updated.
Support new job-manager.events RPC. The RPC will stream job eventlogs as they occur. Streamed events can be configured via a filters option in the RPC. Support an associated job-manager.events-cancel RPC as well. (A consumer-side sketch follows this list.)
Add a job-manager stats callback to output the number of listeners currently active in the job-manager.
Store the timestamp of when a job's priority was changed.
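A hedged sketch of a consumer of the job-manager.events RPC, assuming responses carry an array of eventlog-entry objects under an "events" key and a filter keyed "allow" (both payload keys are guesses; the streaming/reset pattern is standard libflux usage):

```c
#include <errno.h>
#include <stdio.h>
#include <jansson.h>
#include <flux/core.h>

static void events_cb (flux_future_t *f, void *arg)
{
    json_t *events;
    size_t i;
    json_t *entry;

    /* Each response is assumed to carry an array of eventlog-entry
     * objects under an "events" key (the key name is a guess). */
    if (flux_rpc_get_unpack (f, "{s:o}", "events", &events) < 0) {
        /* ENODATA means the stream ended, e.g. the job-manager was
         * reloaded; a long-lived consumer would re-subscribe here. */
        if (errno != ENODATA)
            fprintf (stderr, "events: %s\n", flux_future_error_string (f));
        flux_future_destroy (f);
        return;
    }
    json_array_foreach (events, i, entry) {
        const char *name;
        if (json_unpack (entry, "{s:s}", "name", &name) == 0)
            printf ("event: %s\n", name);
    }
    flux_future_reset (f); /* arm the future for the next response */
}

int main (void)
{
    flux_t *h;
    flux_future_t *f;

    if (!(h = flux_open (NULL, 0)))
        return 1;
    f = flux_rpc_pack (h, "job-manager.events", FLUX_NODEID_ANY,
                       FLUX_RPC_STREAMING,
                       "{s:[s]}", "allow", "priority");
    if (!f || flux_future_then (f, -1., events_cb, NULL) < 0)
        return 1;
    flux_reactor_run (flux_get_reactor (h), 0);
    flux_close (h);
    return 0;
}
```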
I hope so, otherwise this mechanism wasn't the good idea I thought it was...
(force-pushed from 5f0ad64 to 30d15a8)
Request and listen to job-manager events from the job-manager.events streaming RPC. Filter out all events except for the "priority" event. Handle changes to job priorities and re-order the pending queue as needed (a sketch of that re-ordering follows this list). Fixes flux-framework#3052
With the job-info module now dependent on the job-manager.events RPC stream, the job-info module needs to be loaded after the job-manager module is loaded. To avoid unnecessary errors, the job-info module is now unloaded before the job-manager as well.
With the job-info module now dependent on job-manager, update tests to reflect this.
Add new tests for the job-manager.events and job-manager.events-cancel handlers. Add a new testing tool t/job-manager/event_stream to stream events from the job-manager.
Add new tests to ensure that priority changes made to jobs in the job-manager eventually reach the job-info module.
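A minimal sketch of the re-ordering mentioned above, using the remove-and-insert approach garlick described earlier (the list type and field names are hypothetical; the modules use their own data structures):

```c
#include <stddef.h>

/* On a priority change, unlink the job and re-insert it at its new
 * sorted position, rather than re-sorting the whole list. */
struct job {
    int priority;
    struct job *next;
};

static void list_remove (struct job **head, struct job *job)
{
    struct job **pp = head;
    while (*pp && *pp != job)
        pp = &(*pp)->next;
    if (*pp)
        *pp = job->next;
}

static void list_insert_sorted (struct job **head, struct job *job)
{
    struct job **pp = head;
    while (*pp && (*pp)->priority >= job->priority) /* highest first */
        pp = &(*pp)->next;
    job->next = *pp;
    *pp = job;
}

static void priority_change (struct job **head, struct job *job, int priority)
{
    list_remove (head, job);
    job->priority = priority;
    list_insert_sorted (head, job);
}
```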
(force-pushed from 30d15a8 to bb888be)
re-pushed with all of the fixes talked about above, squashed
It minimally solves "update job-info with info about changes in the job-manager", which is the important thing. I think it'll work out; it's just 1 or 2 steps ahead and hard to see the end at the moment.
Codecov Report
@@ Coverage Diff @@
## master #3208 +/- ##
========================================
Coverage 81.40% 81.40%
========================================
Files 292 292
Lines 44398 44632 +234
========================================
+ Hits 36141 36334 +193
- Misses 8257 8298 +41
If you're ready I'm fine with MWP!
@garlick thanks |
Per discussion in #3052, the `job-manager` did not have a way to inform the `job-info` module that a job's priority had changed, thus leading to invalid job listings. Several issues were discussed in #3052; this PR just tries to address the notification of the priority change between `job-manager` & `job-info`.

For prototyping purposes, I implemented about the most straightforward thing I could think of: adding a new `job-updates` event that the `job-manager` publishes. It publishes a change record of "jobid, timestamp, field that changed, value"; for example, `JOBID, TIMESTAMP, "priority", 14` would be a potential change. I transfer the "value" as a JSON object. In the future, I believe a JSON null can serve as a "cache invalidate" for when the data that has changed isn't known to the `job-manager`.
In the `job-info` module, it has to keep track of the timestamp of when the `priority` field was last set, to avoid any racing that can occur. B/c the priority field is (currently) the only job data that can change, I think this implementation is ok. But as more data can change, this approach will become cumbersome, managing timestamps of when all of the job data was last updated, so a refactoring would be necessary.
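A minimal sketch of that race guard, with hypothetical struct and field names: apply an update only if its timestamp is newer than the last one applied.

```c
/* Hypothetical per-job state in job-info: keep the timestamp of the
 * last applied priority change, and drop updates that are older. */
struct job {
    int priority;
    double priority_timestamp; /* when priority was last set */
};

static void update_priority (struct job *job, int priority, double timestamp)
{
    if (timestamp > job->priority_timestamp) {
        job->priority = priority;
        job->priority_timestamp = timestamp;
    }
}
```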
The `job-updates` and `job-annotations` events could be combined into one event. If we go with this approach, I think that would be a commit to add to the top of this series.

Any thoughts / comments?