Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-info: refactor #2112

Merged
merged 9 commits into from Apr 4, 2019

Conversation

@chu11
Copy link
Contributor

commented Apr 2, 2019

spliced out from #2107, minor cleanup / fixes / new tests and refactoring in preparation for the big work in #2107.

Respond to rpc with errno before cleanup, otherwise errno could
be set to the wrong value.
@chu11 chu11 force-pushed the chu11:jobinfo-refactor branch from 7401c58 to 28452af Apr 2, 2019
Copy link
Member

left a comment

Looks good, couple comments.

goto error;
}
zlist_freefn (ctx->lookups, l, lookup_ctx_destroy, true);
l = NULL;

This comment has been minimized.

Copy link
@garlick

garlick Apr 3, 2019

Member

unnecessary?

This comment has been minimized.

Copy link
@chu11

chu11 Apr 3, 2019

Author Contributor

nah, it is a sort of personal style to say "this is now owned by someone else", but given we return right after, no need to set it.

activekvsdir=$(flux job id --to=kvs-active $jobid) &&
inactivekvsdir=$(echo $activekvsdir | sed 's/active/inactive/') &&
flux kvs move ${activekvsdir}.eventlog ${inactivekvsdir}.eventlog &&
move_inactive $jobid &&

This comment has been minimized.

Copy link
@garlick

garlick Apr 3, 2019

Member

should there be a flux job wait-event <id> depend (or whatever event) here to ensure the submitted job has reached its final state before moving? Otherwise the eventlog may appear in both directories.

This comment has been minimized.

Copy link
@chu11

chu11 Apr 3, 2019

Author Contributor

i don't load sched-simple or jog-manager in these tests, so I think this is fine. But just to be on the safe side for the future we could add a wait-event. But what should we wait for to be safe?

This comment has been minimized.

Copy link
@garlick

garlick Apr 3, 2019

Member

Right now there are two events: submit and depend but that behavior could change. In particular once we have dependencies, the depend event won't happen until after the dependency manager responds, and if you don't load it here, it won't.

To avoid being fragile with respect to ongoing development in the job system, it might be best to cancel the job and wait for finish. At that point you know there will be no more updates to the job.

(This pattern is repeated in other subtests and in the job-info-security test script)

This comment has been minimized.

Copy link
@chu11

chu11 Apr 3, 2019

Author Contributor

ahh, I now see that the job-manager module is now loaded in t/rc/rc1-job, which now makes this possible.

This comment has been minimized.

Copy link
@garlick

garlick Apr 3, 2019

Member

I think it has been that way for some time. In fact if you submit jobs while the job manager is not loaded, you will get an error on job submission. (A misleading one it turns out - I will open an issue)

This comment has been minimized.

Copy link
@chu11

chu11 Apr 3, 2019

Author Contributor

ok, I think I'm just getting confused, b/c of some tests load the job modules (e.g. t2202).

This comment has been minimized.

Copy link
@chu11

chu11 Apr 3, 2019

Author Contributor

the finish event requires sched-simple to be loaded as well. Perhaps we should wait for clean?

This comment has been minimized.

Copy link
@garlick

garlick Apr 3, 2019

Member

Right! Sorry, that's what I meant.

@@ -69,6 +78,28 @@ test_expect_success 'flux job eventlog fails (wrong user)' '
unset_userid
'

test_expect_success 'flux job eventlog works (owner, inactive)' '
jobid=$(submit_job) &&
move_inactive $jobid &&

This comment has been minimized.

Copy link
@garlick

garlick Apr 3, 2019

Member

Same comment here - need to wait for job manager to finish appending to eventlog?

@chu11 chu11 force-pushed the chu11:jobinfo-refactor branch from 28452af to 1882feb Apr 3, 2019
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Apr 3, 2019

just re-pushed with suggested changes

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Apr 3, 2019

oh wait, I just realized, the flux job wait-event clean should actually be in the submit_job(), not the move_inactive(). wait 1 sec before review.

chu11 added 8 commits Apr 1, 2019
Make common function to move the entire active directory to inactive
rather than just one path.  Most tests do not need to move just the one
key.
When submitting jobs in t2204-job-info and t2205-job-info-security, immediately
cancel the job and wait for the clean event.  This ensures that no other changes
will occur to the eventlog and all test manipulation of the eventlog will be safe.
Instead of passing a flag to job-info.eventlog-lookup to watch
an eventlog, create a new RPC target job-info.eventlog-watch.  Remove
job info flags in watch as a result, adjust rpc results that are returned to
callers.
Split eventlog lookup & watch infrastructure into two distinct
sets of functions / callbacks.  This is in prepration for lookups
of job data beyond just the eventlog.
Refactor code base into multiple files.
Rename job-info.eventlog-lookup target to job-info.lookup.  The key name
(e.g. eventlog) for the job is sent in the rpc request instead of being
assumed by the target.  Make lookup_key() function take key name for
construction of path to lookup in KVS.
@garlick

This comment has been minimized.

Copy link
Member

commented Apr 3, 2019

I'm going to be offline for the afternoon, but just for the record, I'm good with this PR once @chu11 is good with it. If @SteVwonder or someone else wants to press the button once he's ready, that works for me!

@chu11 chu11 force-pushed the chu11:jobinfo-refactor branch from 1882feb to f0ab1c0 Apr 3, 2019
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Apr 3, 2019

added one more commit which adds a job cancel and wait-event clean after a job has been submitted.

@codecov-io

This comment has been minimized.

Copy link

commented Apr 3, 2019

Codecov Report

Merging #2112 into master will decrease coverage by 0.03%.
The diff coverage is 71.19%.

@@            Coverage Diff             @@
##           master    #2112      +/-   ##
==========================================
- Coverage   80.29%   80.26%   -0.04%     
==========================================
  Files         197      201       +4     
  Lines       31546    31618      +72     
==========================================
+ Hits        25331    25377      +46     
- Misses       6215     6241      +26
Impacted Files Coverage Δ
src/modules/job-info/job-info.c 72.91% <100%> (+1.62%) ⬆️
src/common/libjob/job.c 78.14% <100%> (ø) ⬆️
src/modules/job-info/util.c 100% <100%> (ø)
src/modules/job-info/lookup.c 64.47% <64.47%> (ø)
src/modules/job-info/watch.c 68.03% <68.03%> (ø)
src/modules/job-info/allow.c 86.2% <86.2%> (ø)
src/modules/barrier/barrier.c 76.55% <0%> (-2.07%) ⬇️
src/common/libflux/message.c 81.27% <0%> (-0.13%) ⬇️
... and 3 more
@garlick

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Ready to merge?

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Apr 4, 2019

yup, and I got the next PR ready to go :-)

@garlick

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Sorry for the delay! In it goes.

@garlick garlick merged commit 18a16ac into flux-framework:master Apr 4, 2019
3 of 4 checks passed
3 of 4 checks passed
codecov/patch 71.19% of diff hit (target 80.29%)
Details
Mergify — Summary 1 potential rule
Details
codecov/project 80.26% (-0.04%) compared to 3746a1b
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.