Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-manager: combine coincident eventlog updates into single KVS commit #2062

Merged
merged 7 commits into from Mar 5, 2019

Conversation

@garlick
Copy link
Member

commented Mar 5, 2019

This removes some duplicated code in the job manager for appending to a job's eventlog, replacing it with a more sophisticated mechanism to batch KVS eventlog updates that occur close together in time.

Logic is added to ensure that any in-flight commits are finalized (and callbacks invoked, if any) if the job manager module is unloaded while busy, thus ensuring no data is lost from KVS job records.

garlick added 2 commits Mar 4, 2019
Problem: job-ingest no longer puts priority in
job.active.<jobid>.priority, but the job-manager
still updates this key when priority changes.

Drop the 'priority' key update from the KVS txn.

Update job-manager sharness test to look for
priority event, not key.
Problem: the job->exception_pending flag is unused.

Drop flag, update unit tests.
@garlick garlick requested a review from grondo Mar 5, 2019
@garlick

This comment has been minimized.

Copy link
Member Author

commented Mar 5, 2019

Coverage is not great here, but looking through the codecov diff, the missing coverage seems to be mostly unlikely errors.

Just added a commit that avoids a memory leak found by inspection. That should be squashed before this is merged.

@garlick

This comment has been minimized.

Copy link
Member Author

commented Mar 5, 2019

Hmm, hit a failure here in centos builder:

ERROR: t1008-kvs-eventlog
=========================
ok 1 - flux kvs eventlog append --timestamp works
PASS: t1008-kvs-eventlog.t 1 - flux kvs eventlog append --timestamp works
ok 2 - flux kvs eventlog append works
PASS: t1008-kvs-eventlog.t 2 - flux kvs eventlog append works
ok 3 - flux kvs eventlog get --watch --count=N works
PASS: t1008-kvs-eventlog.t 3 - flux kvs eventlog get --watch --count=N works
not ok 4 - flux kvs eventlog get --watch returns append order
FAIL: t1008-kvs-eventlog.t 4 - flux kvs eventlog get --watch returns append order
#	
#		flux kvs eventlog append --timestamp=1 test.d foo bar &&
#		echo "1.000000 foo bar" >get_d.exp
#		flux kvs eventlog get --unformatted --watch --count=20 test.d >get_d.out &
#		pid=$! &&
#		for i in $(seq 2 20); do \
#			flux kvs eventlog append --timestamp=$i test.d foo bar; \
#			echo "$i.000000 foo bar" >>get_d.exp; \
#		done &&
#		wait $pid &&
#		test_cmp get_d.exp get_d.out
#	
# failed 1 among 4 test(s)
2019-03-05T14:55:28.604523Z broker.err[0]: Run level 2 Exited with non-zero status (rc=1) 1.1s
flux-start: 0 (pid 10558) exited with rc=1
@grondo

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

benchmark results look good:

sched-bench: On branch master: v0.11.0-407-gbc15084
sched-bench: starting test with 4096 jobs. broker.pid=30432
sched-bench: ingested 4096 jobs in 15.98s (256.32 job/s)
sched-bench: allocated 4096 jobs in 21.932s (186.76 job/s)
sched-bench: elapsed time: 22.326s (183.46 job/s)
sched-bench: On branch eventlog_batch: v0.11.0-415-g0281bbc
sched-bench: starting test with 4096 jobs. broker.pid=32504
sched-bench: ingested 4096 jobs in 15.94s (256.96 job/s)
sched-bench: allocated 4096 jobs in 17.198s (238.17 job/s)
sched-bench: elapsed time: 17.623s (232.43 job/s)
@dongahn

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

Great work!

Just curious. Does the time include executing the test program as well (sleep 0)? Is the program a serial code requiring one core? Also what was the number of nodes being used?

@grondo

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

Just curious. Does the time include executing the test program as well (sleep 0)? Is the program a serial code requiring one core? Also what was the number of nodes being used?

Sorry, I should explain the benchmark first!

No jobs are actually being executed in the benchmark. The benchmark creates a fake input to our simple scheduler of 128 nodes with 32 cores per node. The test then submits 4096 single core jobs, and times the rate at which they are ingested (with t/ingest/submitbench -f 16 -r 4096) and the time it takes for all jobs to be allocated resources (i.e. the time of the first job's submit event to the last job's alloc event).

@garlick

This comment has been minimized.

Copy link
Member Author

commented Mar 5, 2019

Thanks for checking the performance @grondo! I think the test failure is unrelated so I'll go ahead and squash down that one commit.

garlick added 5 commits Mar 3, 2019
Problem: several subsystems in the job-manager independently
commit events to the job KVS eventlog, duplicating code and
missing an opportunity to batch temporally close commits
together for reduced overhead.

Add an new eventlog update mechanism, similar to the KVS update
batching in job-ingest.  Accept updates via two functions:
event_log(), and a varargs version, event_log_fmt(). These
functions add to a KVS transaction that grows in memory until
a timer expires, then the transaction is committed in one RPC.

Callbacks can optionally be registered with each event.
The callbacks are called upon completion of the transaction,
with access to the future used for the commit, so they can
determine whether it succeeded or failed.

A hardwired tunable, initially set to 10ms, determines how long
a batched transaction waits until it is flushed.  This value
may be tuned as we gain experience with the system.

If job-manager is trying to unload with commits in flight, block
until those complete, and invoke any registered callbacks before
tearing down.
Replace internal one-off event log commit function
with the new shared ones.
Replace internal one-off event log commit function
with the new shared ones.
Replace internal one-off event log commit function
with the new shared ones.
Drop util_eventlog_append() and util_attr_set()
as these functions are no longer used.

Update unit test.
@garlick garlick force-pushed the garlick:eventlog_batch branch from 0281bbc to a874f01 Mar 5, 2019
@grondo
grondo approved these changes Mar 5, 2019
Copy link
Contributor

left a comment

LGTM. I also ran through the benchmark under valgrind as a sanity check (at 5.2 job/s) and no errors or leaks reported.

@codecov-io

This comment has been minimized.

Copy link

commented Mar 5, 2019

Codecov Report

Merging #2062 into master will decrease coverage by 0.04%.
The diff coverage is 74.45%.

@@            Coverage Diff            @@
##           master   #2062      +/-   ##
=========================================
- Coverage   80.44%   80.4%   -0.05%     
=========================================
  Files         191     192       +1     
  Lines       30286   30345      +59     
=========================================
+ Hits        24365   24400      +35     
- Misses       5921    5945      +24
Impacted Files Coverage Δ
src/modules/job-manager/util.c 85.93% <ø> (+6.81%) ⬆️
src/modules/job-manager/priority.c 84.21% <100%> (-1.51%) ⬇️
src/modules/job-manager/raise.c 69.11% <100%> (-0.33%) ⬇️
src/modules/job-manager/event.c 70.37% <70.37%> (ø)
src/modules/job-manager/job-manager.c 58.76% <71.42%> (-0.38%) ⬇️
src/modules/job-manager/alloc.c 73.76% <92.3%> (+0.39%) ⬆️
src/modules/barrier/barrier.c 76.55% <0%> (-2.07%) ⬇️
src/broker/modservice.c 78.84% <0%> (-0.97%) ⬇️
src/modules/connector-local/local.c 73.48% <0%> (-0.15%) ⬇️
src/common/libflux/message.c 81.51% <0%> (-0.13%) ⬇️
... and 2 more
@grondo grondo merged commit 3a01140 into flux-framework:master Mar 5, 2019
2 checks passed
2 checks passed
Mergify — Summary 1 potential rule
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@grondo

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

Merged. Thanks!

@dongahn

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

Sorry, I should explain the benchmark first!

No jobs are actually being executed in the benchmark. The benchmark creates a fake input to our simple scheduler of 128 nodes with 32 cores per node. The test then submits 4096 single core jobs, and times the rate at which they are ingested (with t/ingest/submitbench -f 16 -r 4096) and the time it takes for all jobs to be allocated resources (i.e. the time of the first job's submit event to the last job's alloc event).

Ah I see. This is really a good testing method. I might as well hook the same tester into the resource-matching service. This way I can measure the performance differences between simple and advanced schedulers.

@garlick garlick deleted the garlick:eventlog_batch branch Mar 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.