New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flux-job status for jobs with exceptions before start #2784
Conversation
LGTM, need to add "Fixes #2782" to commit message though. |
src/cmd/flux-job.c
Outdated
stat->status = 256; | ||
stat->exit_code = 1; | ||
stat->exception = true; | ||
strncpy (stat->ex_type, type, sizeof(stat->ex_type) - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not a request for a change, but a question for my own understanding and edification)
If the length of type
is > sizeof(stat->ex_type) - 1
and stat
wasn't zero'd when alloc'd, then stat->ex_type
will not be NULL-terminated, right? Tracing back to stat
's allocation, it looks like it was calloc
d, so I don't think it is ultimately an issue here. For my own understanding though, is best-practice to NULL-terminate the string "just to be safe"? Or is it to fine to rely on structs and char[]
s to have been calloc
'd so that you don't have to worry about this edge-case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's often fine to rely on zero-initialized buffers when it's obvious that's the case.
Either NUL terminate, or make sure the array was zeroed beforehand. In this
case we know the memory was initialized to zero.
Thought about just mallocing each type string as needed, but that would
require a conditional free at the end of cmd_status... Maybe less
controversial?
…On Fri, Feb 28, 2020, 12:18 PM Stephen Herbein ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/cmd/flux-job.c
<#2784 (comment)>
:
> +
+ if (json_unpack (context, "{s:s s:i s?:s}",
+ "type", &type,
+ "severity", &severity,
+ "note", ¬e) < 0)
+ log_err_exit ("error decoding exception context");
+ if (severity == 0) {
+ /* Note: the exit_code and status will be overridden
+ * by the finish event if this job is still running
+ * o/w, for a non-running job with a fatal exception
+ * the default exit code is simply 1.
+ */
+ stat->status = 256;
+ stat->exit_code = 1;
+ stat->exception = true;
+ strncpy (stat->ex_type, type, sizeof(stat->ex_type) - 1);
(Not a request for a change, but a question for my own understanding and
edification)
If the length of type is > sizeof(stat->ex_type) - 1 and stat wasn't
zero'd when alloc'd, then stat->ex_type will not be NULL-terminated,
right? Tracing back to stat's allocation, it looks like it was callocd
<https://github.com/grondo/flux-core/blob/04abb6288b3812154de4a2e29223e9f624be727a/src/cmd/flux-job.c#L1947>,
so I don't think it is ultimately an issue here. For my own understanding
though, is best-practice to NULL-terminate the string "just to be safe"? Or
is it to fine to rely on structs and char[]s to have been calloc'd so
that you don't have to worry about this edge-case?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2784?email_source=notifications&email_token=AAFVEUSWXAZVLGVSWDGWPETRFFWQNA5CNFSM4K5Z56JKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCXNMNDI#pullrequestreview-366659213>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFVEUR4SVO4KHKKTCVKBFDRFFWQNANCNFSM4K5Z56JA>
.
|
This looks good to me. I initially thought it may be better to return a higher exit code than 1 so that this kind of error can bubble up better in a tool like flux-tree. But then, that might just be too arbitrary. I'm good as is. |
Maybe an option to set the exit code for exceptions?
…On Fri, Feb 28, 2020, 1:51 PM Dong H. Ahn ***@***.***> wrote:
This looks good to me. I initially thought it may be better to return a
higher exit code than 1 so that this kind of error can bubble up better in
a tool like flux-tree. But then, that might just be too arbitrary. I'm good
as is.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2784?email_source=notifications&email_token=AAFVEUUO7AP3M55UN2JXKATRFGBOTA5CNFSM4K5Z56JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENKJJCY#issuecomment-592745611>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFVEUTLZMCSIP43E4SVRQLRFGBOTANCNFSM4K5Z56JA>
.
|
If easy enough... yeah. I can definitely make use of this within |
9985b62
to
7f19833
Compare
Ok, I've added a fixup commit that adds |
LGTM! |
Codecov Report
@@ Coverage Diff @@
## master #2784 +/- ##
==========================================
+ Coverage 81.05% 81.05% +<.01%
==========================================
Files 250 250
Lines 39411 39428 +17
==========================================
+ Hits 31944 31958 +14
- Misses 7467 7470 +3
|
Problem: flux-job status works by waiting for the job finish event and grabbing the status key from its context. However, jobs that get an exception before start will not have a finish event. In these cases `flux job status` erroneously reports an exit status of 0. Additionally process job exceptions in flux-job status, and set the exit status to 1 or the value of a new option `--exception-exit-code` in these cases. Capture the exception type to report with `flux job status -v` if used. Fixes flux-framework#2782
Enhance t2501-job-status.t with jobs that get exceptions.
Ok, squashed the incremental development and force-pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Had to restart asan builder. 🙁 |
Thanks! |
Fix
flux job status
for jobs that get an exception before thestart
event (e.g. unsatisfiable request, or canceled).flux job status
only waits for thefinish
event, so currently reports any job without a finish event as exit code 0 (success). (As seen in flux-framework/flux-sched#602)