New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
job-list: allow list-id to wait for job state #5213
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup! Just a few minor comments.
@@ -39,7 +40,9 @@ def set_and_check_context(self, key, value, type): | |||
kd2 = flux.kvs.KVSDir(self.f) | |||
nv = kd2[key] | |||
if isinstance(value, bytes) and type is str: | |||
self.assertEqual(value.decode('utf-8'), nv) | |||
self.assertEqual(value.decode("utf-8"), nv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think normally these files are not subject to formatting by black
. That might be why the formatting changed Did you run black
directly on the file?
Actually, it could be a mistake that the python test files aren't being formatted. The .pre-commit-config.yaml
has:
files: '^(src/bindings/python/flux|src/cmd|t/.*\\.py)'
exclude: "^(src/bindings/python/_flux/|src/bindings/python/flux/utils/)"
So it does appear that inclusion of python files under t/
was intended. I think the double backslash is preventing this regex from matching though. I suppose we should fix this in a separate PR. I leave whether you keep the unrelated formatting change in this PR up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI - I have a PR ready that fixes .pre-commit-config.yaml
and all the fallout...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I guess I ran black on it, leading to that change.
t/python/t0005-kvs.py
Outdated
elif isinstance(value, bytes) and type is dict: | ||
self.assertEqual(ast.literal_eval(value.decode("utf-8")), nv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a comment -- at least, I don't understand exactly why ast.literal_eval
is needed here.
Also instead of self.assertEqual
should you use self.assertDictEqual
?
goto enomem; | ||
|
||
return 0; | ||
|
||
enomem: | ||
zlistx_destroy (&list_isd); | ||
idsync_wait_list_destroy ((void **)&iwl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast to (void **)
is unnecessary here I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i get a error/warning with it :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, ok, my bad then!
@@ -54,6 +54,7 @@ static struct idsync_data *idsync_data_create (flux_t *h, | |||
flux_jobid_t id, | |||
const flux_msg_t *msg, | |||
json_t *attrs, | |||
flux_job_state_t state, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message: it might be fine, but I was stumbling over reading of:
Solution: Support an option job state in the job-list.list-id service.
Is it supposed to be either "Support an optional job state in the..." or maybe "Support a job state option in the..."?
@@ -145,6 +145,13 @@ static struct optparse_option list_inactive_opts[] = { | |||
OPTPARSE_TABLE_END | |||
}; | |||
|
|||
static struct optparse_option list_ids_opts[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message:
However, the flux job list-ids does not support anything to use it.
Suggestion: "the flux job list-ids command does not support any way to use it" ?
src/cmd/flux-job.c
Outdated
@@ -1471,12 +1480,27 @@ int cmd_list_ids (optparse_t *p, int argc, char **argv) | |||
if (!(h = flux_open (NULL, 0))) | |||
log_err_exit ("flux_open"); | |||
|
|||
/* if no job state specified by user, pick first job state of | |||
* depend, which means will return immediately |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more correct to say "will return as soon as the job-list module is aware of the job" instead of "immediately"?
c3ef2db
to
3c0c043
Compare
rebased picking up the black fixes on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! LGTM!
This could actually be really handy even outside of testing, nice! |
Problem: In python/t0005-kvs.py an illegally stored json object was stored as binary and tested. A legally stored json object stored as binary was not tested. Add a legally stored json object in binary for coverage.
Problem: In t2800-jobs-cmd.t some convenience functions are implemented that are identical to some helper functions in helper files. Solution: Use the helper functions provided in job-list-helper.sh and remove the unnecessary function declarations.
Problem: In the idsync_add_waiter() function, the cleanup path cleaned up the wrong memory. Solution: Clean up the zlistx_t that was possibly created within the function, not the data structure passed in as a parameter.
Problem: In the idsync API a list is stored in a hash and an externally stored jobid in a list entry is used as the key. This currently is fine because the entire list is read back and destroyed in idsync_check_waiting_id(). However, if we ever want to read/delete some list entries but not others, the externally stored jobid makes the hash lookup unsafe, as the storage of the key can no longer be guaranteed. Solution: Store the list in a new struct that also stores the jobid. Also, adjust the name of a cleanup function as a result.
Problem: It'd be convenient if the job-list.list-id service not only waited for a job id to be legal but to also waited for the job to pass a certain job state. Solution: Support an optional job state in the job-list.list-id service. Return when a job passes this job state or reaches the inactive state. Fixes flux-framework#2864
Problem: The job-list.list-ids RPC now supports the ability to return after a specific job state has passed. However, the flux job list-ids command does not support any way to use it. Solution: Add a --wait-state option which allows the caller to tell list-ids to only return after a specific job state has been hit.
Problem: There is no coverage of job-list.list-ids and its ability to return only after a specific job state has been reached. Add coverage in t2260-job-list.t.
Problem: The job-list module has eventual consistency for the jobs that are submitted to the job-manager queue. A number of helper functions have been written to check for this consistency before running tests. These functions are no longer necessary due to the new `flux job list-ids --wait-state` option. Use the flux job list-ids --wait-state option when appropriate. Remove no longer necessary code.
3c0c043
to
75a64be
Compare
Codecov Report
@@ Coverage Diff @@
## master #5213 +/- ##
===========================================
- Coverage 94.35% 83.68% -10.67%
===========================================
Files 84 444 +360
Lines 7560 75766 +68206
===========================================
+ Hits 7133 63405 +56272
- Misses 427 12361 +11934
|
Problem: It'd be convenient if the
job-list.list-id
service not only waited for a job id to be legal but to also waited for the job to passa certain job state. Then it can also be used to "wait on"
job-list
for a number of eventual consistency checks in the testsuite.Fixes #2864
Also stuck on a testsuite coverage for a prior PR that I just forgot to do.