job-manager: add job submit debug flag #2047
de49f62 to f7a253c
In general, I really like the idea of Some general thoughts while poking at this PR -- a lot of these are more thinking out loud rather than directed comments
|
One other thought -- what if debug events were named |
Thanks for the feedback.
I did propagate the "submit flags" to the eventlog, and added them to the
Good idea, I'll make that change.
Good thought. There are some open issues on how to manage configuration that we may need to work through first though. This is a good use case to consider in that discussion...
It's a fair point, though I'm concerned about keeping the size of the job manager's
It would be valuable to be able to synchronize on these events in a sharness test, so maybe that's a good idea! |
So were you thinking that the current

    flux_future_t *flux_job_submit (flux_t *h, const char *jobspec,
                                    int priority, int flags);

should be changed to something like:

    flux_future_t *flux_job_submit (flux_t *h, const char *jobspec,
                                    int priority, const char *flags_list);

and that the list of flags would go right into the submit event as is? |
Ah, good point! I wonder if internally the job manager could still store the flags it cares about as a mask, but set the flags from a json array? At this point sending the integer flags is probably fine though! |
Hm, I wasn't thinking in terms of API but rather the message format. I'm not sure my idea translates well to that style of api sorry. |
My thinking got a bit off track here.. For any subsystem that is reading jobspec anyway, extra options and flags selected by a user can be placed in the |
Restarted ubuntu builder that failed here:
I'm not sure what is segfaulting, unless maybe it's the broker as the instance is shut down? |
Add a new flag to job.h for flux_job_submit(), FLUX_JOB_DEBUG, to enable eventlog debugging. Then add flux job submit [--flags=[debug,...]] option to allow it to be set on the command line at submit time. Fixes flux-framework#2033.
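As a rough illustration of how a comma-separated --flags argument could be turned into a flags mask, here is a minimal sketch. It is not the code from this PR: EXAMPLE_JOB_DEBUG, its value, and the example_* helpers are hypothetical names made up for the example, and the real FLUX_JOB_DEBUG definition lives in job.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value for illustration; not copied from job.h. */
enum { EXAMPLE_JOB_DEBUG = 1 };

struct example_flag_name {
    const char *name;
    int value;
};

/* Table of flag names the example parser accepts. */
static const struct example_flag_name example_flag_table[] = {
    { "debug", EXAMPLE_JOB_DEBUG },
};

/* Look up a flag name of the given length; return 0 if unknown. */
static int example_lookup (const char *name, size_t len)
{
    size_t count = sizeof (example_flag_table) / sizeof (example_flag_table[0]);
    size_t i;

    for (i = 0; i < count; i++) {
        const char *s = example_flag_table[i].name;
        if (strlen (s) == len && strncmp (s, name, len) == 0)
            return example_flag_table[i].value;
    }
    return 0;
}

/* Parse a list such as "debug" into a bitmask stored in *flags.
 * Returns 0 on success, -1 on bad arguments or an unknown/empty name.
 */
int example_parse_flags (const char *list, int *flags)
{
    const char *p = list;
    int mask = 0;

    if (!list || !flags)
        return -1;
    while (*p != '\0') {
        const char *comma = strchr (p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen (p);
        int value = example_lookup (p, len);

        if (value == 0)
            return -1;
        mask |= value;
        p += len;
        if (*p == ',')
            p++;
    }
    *flags = mask;
    return 0;
}
```

A table-driven lookup keeps the parser unchanged if more flag names are added later; only the table grows.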
Add 'flags' to job's submit eventlog entry, and include flags with job-manager.submit RPC. Flags are checked for validity at ingest, and rejected immediately if invalid. Allow FLUX_JOB_DEBUG through.
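The validity check at ingest can be pictured as a mask test that rejects any bit outside the known set. The sketch below is an assumption about the general shape, not the job-ingest code: EXAMPLE_JOB_DEBUG, EXAMPLE_VALID_FLAGS, example_check_flags() and the choice of EINVAL are all hypothetical.

```c
#include <errno.h>

/* Hypothetical flag value for illustration; not copied from job.h. */
enum { EXAMPLE_JOB_DEBUG = 1 };

/* Every flag bit the example accepts; extend as flags are added. */
#define EXAMPLE_VALID_FLAGS (EXAMPLE_JOB_DEBUG)

/* Return 0 if only known flag bits are set.
 * Otherwise set errno to EINVAL (an assumed error code) and return -1,
 * so the submission can be rejected before anything is logged.
 */
int example_check_flags (int flags)
{
    if ((flags & ~EXAMPLE_VALID_FLAGS) != 0) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```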
Now that job-ingest is sending over submit flags and logging them with the submit event, the job-manager can capture these flags in 'struct job'. Update unit tests.
If job was submitted with debug flag, log debug.alloc-request and debug.free-request events to the job's eventlog when alloc and free requests are sent. These are in addition to the alloc and free events that are logged when the responses are received.
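To make the ordering concrete, here is a small self-contained sketch of the idea: a debug event is logged when the request is sent, only for jobs carrying the debug flag, and the regular event is logged when the response arrives. The struct, the in-memory event array, and all example_* names are hypothetical stand-ins; the real job manager writes to the job's eventlog.

```c
/* Hypothetical flag value for illustration; not copied from job.h. */
enum { EXAMPLE_JOB_DEBUG = 1 };

#define EXAMPLE_MAX_EVENTS 8

/* Simplified stand-in for a job and its eventlog. */
struct example_job {
    int flags;
    int event_count;
    const char *events[EXAMPLE_MAX_EVENTS];
};

/* Append an event name; return -1 if the example log is full. */
static int example_log (struct example_job *job, const char *name)
{
    if (job->event_count >= EXAMPLE_MAX_EVENTS)
        return -1;
    job->events[job->event_count++] = name;
    return 0;
}

/* Called when the alloc request is sent to the scheduler.
 * Only jobs submitted with the debug flag get the extra event.
 */
int example_send_alloc_request (struct example_job *job)
{
    /* ... the request itself would be sent here ... */
    if ((job->flags & EXAMPLE_JOB_DEBUG))
        return example_log (job, "debug.alloc-request");
    return 0;
}

/* Called when the scheduler responds; logged for every job. */
int example_handle_alloc_response (struct example_job *job)
{
    return example_log (job, "alloc");
}
```

With this shape, a job without the flag ends up with only the alloc event, while a debug job has debug.alloc-request followed by alloc.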
Enhance t2203-job-manager-dummysched.t sharness test:
* submit all jobs with --flags=debug option
* make existing event pattern matching more precise
* with two state=SCHED jobs in queue, check that only one logged a 'debug.alloc-request' (mode=single)
0cdd897 to c8abbaf
Want to add a |
I squashed down the incremental development and forced a push. |
Yes good idea! |
Add the same flag to submitbench that was just added to flux-job submit.
Thanks! Looks good to merge
Sounds good, thanks! |
Codecov Report
@@            Coverage Diff            @@
##           master    #2047    +/-   ##
=========================================
+ Coverage    80.4%   80.42%   +0.02%
=========================================
  Files         191      191
  Lines       30240    30252     +12
=========================================
+ Hits        24313    24329     +16
+ Misses       5927     5923      -4
|
This PR adds a debug flag to the submission API, and a --debug option to flux job submit as proposed in #2033.

This causes the job manager to emit some "extra" events that are not justified for every job but which might be useful to provide context when debugging. For now, it just adds alloc-request and free-request debug events indicating that the sched.alloc or sched.free request has been sent. Normally events are only logged when the response is received.

The sharness test with the dummy scheduler is modified to verify that these events are added as expected for sched mode=single.

This scheme may be too simple, but perhaps it's a reasonable first step?