
libkz: avoid blocking RPCs in reactor callbacks #1411

Merged: 10 commits from garlick:libkz_async into flux-framework:master on Apr 2, 2018

@garlick (Member) commented Apr 2, 2018

Here are some changes to libkz that defer the kz_ready_f callback until data is actually available, so it doesn't have to be fetched with a synchronous RPC from the callback. I hope this will solve the excessive re-queue calls noted in #1406.

There is some other unrelated cleanup too - just what I needed to fix up to keep my sanity in this old code.

@grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on kz_get() in the user callback since only the first call will be non-blocking.
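
For context, the consumer-side pattern being adopted looks roughly like the sketch below. This is a hedged illustration rather than code from this PR: the kz_ready_f and kz_get() signatures are approximated from memory, the include path is an assumption, and the output handling is a stand-in.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "kz.h"   /* libkz header; include path is an assumption */

/* Called by libkz when data is known to be available (kz_ready_f).
 * With this change, one kz_get() per callback is enough; looping until
 * EAGAIN would only re-introduce blocking on the later iterations. */
static void output_ready_cb (kz_t *kz, void *arg)
{
    char *data = NULL;
    int len = kz_get (kz, &data);   /* should not block: data was prefetched */

    if (len < 0) {
        if (errno != EAGAIN)
            fprintf (stderr, "kz_get: %s\n", strerror (errno));
        return;
    }
    if (len == 0) {
        /* EOF: the writer closed the stream */
    } else {
        fwrite (data, 1, len, stdout);   /* stand-in for the real consumer */
    }
    free (data);
}
```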

garlick added some commits Apr 1, 2018

libkz: [cleanup] preserve errno in kz_destroy
Problem: kz_destroy() can alter the errno value.

Save/restore errno at the top/bottom of this function.

Also drop redundant non-NULL checks before calls to
destructors that already check internally.
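
The idiom being applied is roughly the following; this is a generic sketch, not the actual kz_destroy() body, and the handle contents are placeholders.

```c
#include <errno.h>
#include <stdlib.h>

struct kz;   /* opaque handle; the real members live in kz.c */

void kz_destroy_sketch (struct kz *kz)
{
    if (kz) {
        int saved_errno = errno;   /* destructors often run on error paths,
                                    * so don't let cleanup clobber errno */
        /* free/destroy handle members here; destructors that already
         * tolerate NULL arguments need no extra "if (x != NULL)" guard */
        free (kz);
        errno = saved_errno;       /* restore before returning */
    }
}
```
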
libkz: [cleanup] avoid excessive malloc for keys
Problem: keys are computed using dynamically allocated
memory.

Use a buffer in the kz handle to build keys, which are
of a predictable size for any given kz handle.
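
The idea, illustrated with made-up member names and a made-up key format (the real layout in kz.c differs): format each key into a scratch buffer sized once per handle instead of allocating a fresh string for every lookup.

```c
#include <stdio.h>

/* Illustrative only; the real struct members and key format differ. */
struct kz_sketch {
    char name[128];   /* stream directory name */
    char key[160];    /* scratch buffer reused for every key */
};

/* Before: asprintf (&key, ...); ... free (key);
 * After: no per-key heap allocation. */
static const char *kz_key_sketch (struct kz_sketch *kz, int seq)
{
    snprintf (kz->key, sizeof (kz->key), "%s.%06d", kz->name, seq);
    return kz->key;
}
```
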
libkz: drop unused KZ_FLAGS_TRUNC flag
Problem: logic for KZ_FLAGS_TRUNC is a bit dubious, and
it appears that truncation should be the default behavior
when opening an existing kz stream for writing anyway.

Drop this flag and associated logic.

Update sharness test and 'kzutil', used only for testing.
libkz: drop unnecessary KZ_FLAGS_NOEXIST flag
Problem: the KZ_FLAGS_NOEXIST flag is always used, so the
code handling an open without that flag is never exercised.

Drop the code and the flag and update users.
libkz: if ready_cb is NULL, disable KVS watcher
Problem: if user calls kz_set_ready_cb() with a NULL handler,
the internal KVS watch callback is not disabled.

Add logic to unwatch the kz directory if the user registers
a NULL handler.
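
A rough sketch of the behavior described here; the handle members are illustrative and the watch/unwatch helpers are hypothetical stand-ins for whatever kz.c actually calls.

```c
/* Hypothetical slice of the kz handle plus stand-in watch helpers;
 * the real member and function names in kz.c differ. */
struct kz_sketch {
    void (*ready_cb) (struct kz_sketch *kz, void *arg);  /* kz_ready_f */
    void *ready_arg;
};

int kz_watch_dir_sketch (struct kz_sketch *kz);    /* arm the KVS watch */
int kz_unwatch_dir_sketch (struct kz_sketch *kz);  /* tear the watch down */

/* A NULL callback now also disables the internal KVS watcher rather than
 * leaving it armed with nothing to call. */
int kz_set_ready_cb_sketch (struct kz_sketch *kz,
                            void (*cb) (struct kz_sketch *, void *),
                            void *arg)
{
    kz->ready_cb = cb;
    kz->ready_arg = arg;
    return cb ? kz_watch_dir_sketch (kz) : kz_unwatch_dir_sketch (kz);
}
```
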
libkz: avoid synchronous RPC in watch callback
Problem: internal KVS watch callback makes synchronous
RPC calls.

Have the watch callback start a KVS lookup.  Call the
user's kz_ready_f handler in the KVS lookup continuation.
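
The shape of that change, sketched very roughly below. flux_kvs_lookup(), flux_future_then(), flux_kvs_lookup_get() and flux_future_destroy() are real flux-core calls, but the signatures are recalled from around the time of this PR and may not match current headers; the kz handle members and the watch-callback signature are illustrative only.

```c
#include <flux/core.h>

/* Hypothetical slice of the kz handle; the real struct lives in kz.c. */
struct kz_sketch {
    flux_t *h;
    void (*ready_cb) (struct kz_sketch *kz, void *arg);  /* kz_ready_f */
    void *ready_arg;
};

/* Continuation of the async lookup: cache the result, then invoke the
 * user's ready callback, which can now call kz_get() without blocking. */
static void lookup_continuation (flux_future_t *f, void *arg)
{
    struct kz_sketch *kz = arg;
    const char *json_str;

    if (flux_kvs_lookup_get (f, &json_str) == 0) {
        /* decode and stash json_str in the handle (details omitted) */
    }
    flux_future_destroy (f);
    if (kz->ready_cb)
        kz->ready_cb (kz, kz->ready_arg);
}

/* KVS watch callback (signature abbreviated): instead of making a
 * synchronous RPC here, start an asynchronous lookup and return. */
static void kvs_watch_cb_sketch (const char *key, void *arg)
{
    struct kz_sketch *kz = arg;
    flux_future_t *f;

    if ((f = flux_kvs_lookup (kz->h, 0, key))
        && flux_future_then (f, -1., lookup_continuation, kz) < 0)
        flux_future_destroy (f);
}
```
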
@coveralls commented Apr 2, 2018

Coverage Status

Coverage increased (+0.009%) to 78.911% when pulling 0371415 on garlick:libkz_async into 7f04a25 on flux-framework:master.

@codecov-io commented Apr 2, 2018

Codecov Report

Merging #1411 into master will increase coverage by <.01%.
The diff coverage is 78.64%.

@@            Coverage Diff            @@
##           master   #1411      +/-   ##
=========================================
+ Coverage   78.59%   78.6%   +<.01%     
=========================================
  Files         163     163              
  Lines       29995   30003       +8     
=========================================
+ Hits        23575   23583       +8     
  Misses       6420    6420
Impacted Files Coverage Δ
src/bindings/lua/flux-lua.c 81.27% <100%> (-0.77%) ⬇️
src/modules/wreck/wrexecd.c 75.68% <66.66%> (-0.09%) ⬇️
src/common/libkz/kz.c 70.53% <76.66%> (-2.29%) ⬇️
src/common/libflux/message.c 81.36% <0%> (-0.36%) ⬇️
src/common/libflux/reactor.c 93.73% <0%> (+0.28%) ⬆️
src/broker/overlay.c 74.45% <0%> (+0.31%) ⬆️
src/common/libflux/handle.c 84.15% <0%> (+0.49%) ⬆️
src/common/libutil/base64.c 95.77% <0%> (+0.7%) ⬆️
... and 6 more
@grondo (Contributor) commented Apr 2, 2018

This is impressive work, @garlick! -- and a lot less change than I was expecting!

I tested this with a 128-broker instance (over 4 real nodes) on ipa, and it looks stable to me running up to 1024 tasks. Unfortunately, anything higher than that hits #1407. Unsurprisingly, the I/O on this branch seems much faster (though I didn't do any actual quantitative tests).

> @grondo if you have a chance to look at the changes in the lua bindings and wrexecd I'd appreciate it. The goal there was to avoid looping on kz_get() in the user callback since only the first call will be non-blocking.

Yeah, even without your changes looping on kz_get() until EAGAIN/EWOULDBLOCK was probably really bad behavior in an event loop, and should have been removed no matter the solution.

@grondo (Contributor) commented Apr 2, 2018

Update: maybe we have an fd leak somewhere; setting ulimit -n 8192 seems to reduce or eliminate the incidence of #1407 (at least on this branch). Now I'm able to run up to 4096 tasks (above that I get fork() errors -- I also have to increase max user processes).

It still takes ~3m to read all the I/O through the reactor loop via the kz/lua code, but the job itself completed in 10s (meaning all tasks exited with their exit status saved in the KVS, and all I/O was complete and committed to the KVS):

$ time flux wreckrun -n4096 hostname
<snip 4096 lines of output>
real	3m12.563s
user	1m31.552s
sys	1m25.873s
$ flux wreck ls
    ID NTASKS STATE                    START      RUNTIME    RANKS COMMAND
     1   4096 exited     2018-04-02T11:27:06      10.086s  [0-127] hostname

However, this is so much better than the extreme runtimes reported by @trws.

@grondo (Contributor) commented Apr 2, 2018

@garlick, if this is ready to go in, it seems like a no-brainer at this point!

@garlick (Member, Author) commented Apr 2, 2018

Sure, thanks for the extra testing!

@trws (Member) commented Apr 2, 2018

Yeah, that sounds like a great improvement for sure! I'll test this with the splash branch on a few hundred nodes shortly.

grondo merged commit 0d32cd3 into flux-framework:master on Apr 2, 2018

4 checks passed:
- codecov/patch: 78.64% of diff hit (target 78.59%)
- codecov/project: 78.6% (+<.01%) compared to 7f04a25
- continuous-integration/travis-ci/pr: The Travis CI build passed
- coverage/coveralls: Coverage increased (+0.009%) to 78.911%

garlick deleted the garlick:libkz_async branch on Apr 2, 2018

@trws (Member) commented Apr 2, 2018

This certainly helps, but I'm sorry to say that it doesn't seem to have fixed it completely. Without this patch, a 180-node, single-task-per-node job would take minutes; now a 200-node job completes in less than two seconds. Unfortunately, a 400-node job still takes 5 minutes to start, and more than 10 to run with output redirected. Without output redirected, it takes about 3 seconds. There's some kind of degenerate case we haven't managed to squash; maybe setting up all of those watches?
