flux-top: limit jobs and summary to specific queue #4847
Conversation
Force-pushed from c4fcebf to c7c1402.
Haven't taken a look yet, but I did want to comment on this:
Hm, this might be non-trivial, and it also means this PR can't be merged until that support has landed in flux-sched. Did you consider doing the filtering in flux-top instead? That could greatly simplify the implementation and would work without changing the scheduler status RPC.
Yeah, I don't think there is a use case for multiple queues besides the obvious one of displaying all queues.
I think it would actually be very advantageous to filter jobs in `flux-top`.
I hadn't considered it, but after skimming …
That's a good idea, and it would actually allow us to "flip" through queues on sub-instances that are selected. Let me look into doing it this way.
Force-pushed from c7c1402 to e973814.
Re-pushed, implementing the "filtering" within `flux-top`. I don't think I'll do the 'Q' (jump between queues) support in this PR; I'll punt that to a follow-up PR.
We could probably punt the runtime queue selection for a while. I was thinking to allocate left-arrow/right-arrow (maybe …).
Force-pushed from 464ad37 to 4064136.
Just tested on my system with two queues and seems to work well!
A few comments inline.
src/cmd/top/joblist_pane.c
Outdated
```diff
@@ -146,6 +146,11 @@ void joblist_pane_draw (struct joblist_pane *joblist)
                              "user",
                              "uri", &uri) < 0)
             fatal (0, "error decoding a job record from job-list RPC");
+
+        if (joblist->top->queue
+            && strcmp (joblist->top->queue, queue) != 0)
```
Might want to convert to `streq()` here (top.c already uses it).
src/cmd/top/summary_pane.c
Outdated
```c
rl_constraint = rlist_copy_constraint (rl_all,
                                       sum->top->queue_constraint,
                                       &error);
if (!rl_constraint)
    fatal (errno, "failed to create constrained rlist");
```
Consider including `error.text` as part of the failure message?
src/cmd/top/summary_pane.c
Outdated
```diff
-static int resource_count (json_t *o,
+static int resource_count (struct summary_pane *sum,
+                           json_t *o,
                            const char *name,
                            int *nnodes,
                            int *ncores,
                            int *ngpus)
```
Minor, but since this function is counting resources returned from `sched.resource-status`, maybe the function declaration would be clearer with the response payload first (as before), then a new `json_t *constraint` argument, rather than having the caller pass in all of `struct summary_pane` when it's only used to find the constraint?
src/cmd/top/summary_pane.c
Outdated
```c
if (sum->top->queue) {
    json_t *queues = NULL;
    json_t *q = NULL;
    size_t index;
    json_t *value;
    if (json_unpack (o, "{s:o}", "queues", &queues) < 0)
        fatal (0, "error getting job-list.job-stats queues");
    if (json_is_array (queues) == 0)
        fatal (EPROTO, "job-list.job-stats queue not an array");
    json_array_foreach (queues, index, value) {
        const char *name;
        if (json_unpack (value, "{s:s}", "name", &name) < 0)
            fatal (EPROTO, "job-list.job-stats queue name not found");
        if (!strcmp (name, sum->top->queue)) {
            q = value;
            break;
        }
    }
```
Suggestion: move this block to its own function, something like `json_t *parse_queue_stats (json_t *o, const char *name)`, and have it return NULL on failure; then handle the failure in one place as a protocol error rather than having three separate error messages for what is basically the same unlikely problem.
```c
if (!(f = flux_rpc (top->h, "config.get", NULL, FLUX_NODEID_ANY, 0))
    || flux_rpc_get_unpack (f, "o", &o) < 0)
    fatal (errno, "Error fetching flux config");
```
Suggestion: fetch the config object and store it within `struct top` before calling `setup_constraint()`? Seems like it might come in handy later, e.g. when multiple queues/constraints are supported.
src/cmd/top/top.c
Outdated
```c
json_t *array;
size_t index;
json_t *value;
if (!json_is_array (requires))
    fatal (0, "queue requires configuration invalid");
if (!(array = json_array ()))
    fatal (0, "Error creating json array");
json_array_foreach (requires, index, value) {
    if (json_array_append (array, value) < 0)
        fatal (0, "Error appending to json array");
}
if (!(top->queue_constraint = json_pack ("{s:o}", "properties", array)))
    fatal (0, "Error creating queue constraints");
```
Suggestion: instead of copying `requires`, just take a reference on it, e.g. `constraint = json_pack ("{s:O}", "properties", requires);`
Force-pushed from 4064136 to b91f380.
Rebased and re-pushed with fixups.
Force-pushed from b91f380 to 60bad34.
I've lost my context on this one. Want to squash the fixups and I'll give it another pass?
Force-pushed from 60bad34 to 8f0c752.
Squashed and re-pushed.
Force-pushed from 8f0c752 to 4903b9a.
Codecov Report

```
@@            Coverage Diff            @@
##           master    #4847     +/-   ##
===========================================
+ Coverage   54.25%   82.82%   +28.57%
===========================================
  Files         401      425       +24
  Lines       69596    74707     +5111
===========================================
+ Hits        37756    61873    +24117
+ Misses      31840    12834    -19006
```
Force-pushed from 4903b9a to 3172067.
This looks good. Thanks!
Problem: In the near future we may need to parse optional fields from a job-list.job-stats response. This is difficult given that fields are directly parsed in flux_rpc_get_unpack(). Solution: Refactor to use flux_rpc_get_unpack() to get the whole payload of the job-list.job-stats response. Then use json_unpack() to parse that object.
Problem: There is no way to filter jobs and summary data in flux-top by queue. Solution: Support a --queue option. Fixes flux-framework#4606
Problem: flux-top's --queue option isn't documented. Solution: Add documentation in the flux-top(1) manpage.
Problem: There is no coverage for the flux-top --queue option. Solution: Add some test coverage in t2801-top-cmd.t.
Force-pushed from 3172067 to 3e056ec.
Per #4606. I'm really glad I added that testing in #4833; it caught a couple of issues :P

- I assume `--queue` doesn't recurse into sub-instances. Sub-instances may have the same queue names configured, but I don't think that's what is desired from the `--queue` option.
- Right now `flux-top` decides whether the "QUEUE" column should be displayed based on whether any jobs have been submitted to any queue. Maybe this should be changed to just checking whether any queues have been configured, i.e. whether `queue.*` is actually set? While what is implemented is perhaps more "correct", it's heavier weight and probably only matters in testing scenarios (i.e. the queue configuration changed between job submissions). But for the time being I left it in.
- Will probably have to add support in `flux-sched` for the `resource-status` RPC.
- I somewhat lazily chose not to add "queues" (plural) support; I didn't think there was a big use case at the moment:
  - The big issue is that `job-list` only filters jobs by a single queue, so we would have to update the `job-list` module or build up the display from multiple responses. Edit: OR get all jobs and just filter them in `flux-top`, though I'm not sure I'd want to do that.
  - Getting resources for multiple queues should be easy; just add to the constraints list.
  - It's slightly annoying to support in job-stats, because the per-queue stats have to be added together.
  - I'm thinking I'll file an issue as a longer-term todo.
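For context, a single-queue constraint of the kind discussed above is derived from the queue's `requires` list in the broker config; the shapes involved are roughly as follows (values are illustrative):

```toml
# broker config: the batch queue requires nodes with the "batch" property
[queues.batch]
requires = [ "batch" ]
```

which yields a properties constraint object like:

```json
{ "properties": [ "batch" ] }
```

Supporting multiple queues would then mostly mean extending the `properties` list with each selected queue's required properties.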