
libkz: defer kvs_watch until necessary #1424

Merged: 15 commits from garlick:libkz_nowatch into flux-framework:master, Apr 6, 2018

Conversation

garlick (Member) commented Apr 4, 2018

As discussed in #1420, this PR changes libkz so that the kz_ready_f registration does not immediately trigger a kvs_watch() on the stream directory. Instead, it uses chained KVS lookups and continuations to iterate over the stream until the available "blocks" are exhausted. At that point, if EOF has not been reached, a kvs_watch() is installed as before; if EOF has been reached, the kvs_watch() is avoided entirely.
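
For illustration, here is a minimal sketch of that chained-lookup pattern. This is not the verbatim kz.c code: the struct fields, the lookup_next() helper (named after a refactoring commit below), and the block key naming are assumptions, and flux_kvs_lookup() is shown with a simplified signature.

```c
#include <stdio.h>
#include <stdbool.h>
#include <errno.h>
#include <flux/core.h>

/* Pared-down stand-in for the real kz handle (fields hypothetical). */
struct kz_sketch {
    flux_t *h;
    const char *name;                     /* stream directory key */
    int seq;                              /* next block to look up */
    bool eof;                             /* EOF block consumed? */
    void (*ready_cb) (struct kz_sketch *kz, void *arg);
    void *ready_arg;
};

static int lookup_next (struct kz_sketch *kz);

static void lookup_continuation (flux_future_t *f, void *arg)
{
    struct kz_sketch *kz = arg;
    const char *json_str;

    if (flux_kvs_lookup_get (f, &json_str) == 0) {
        kz->ready_cb (kz, kz->ready_arg); /* deliver the block */
        kz->seq++;
        lookup_next (kz);                 /* chain the next lookup */
    }
    else if (errno == ENOENT && !kz->eof) {
        /* Available blocks are exhausted but EOF was not written:
         * only now install a kvs_watch() on the directory (elided). */
    }
    flux_future_destroy (f);
}

static int lookup_next (struct kz_sketch *kz)
{
    char key[128];
    flux_future_t *f;

    /* Block key naming is illustrative, not wreck's actual scheme. */
    snprintf (key, sizeof (key), "%s.%06d", kz->name, kz->seq);
    if (!(f = flux_kvs_lookup (kz->h, 0, key)))
        return -1;
    return flux_future_then (f, -1., lookup_continuation, kz);
}
```

Each continuation schedules the next lookup, so the stream is drained without a watch while data remains; only when the reader would actually block is a watch installed.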

This should allow flux wreck attach to process the output of a completed job without ever using kvs_watch(), provided that each stream has reached EOF (which might not be the case until we fix #1419). Hopefully this is more scalable.

It might also change the timing in the "redirected" case, since the kvs_watch() calls (with their embedded synchronous RPC) will only occur during I/O processing, not during initialization, and will possibly be more staggered in time.

I haven't tested this yet at scale but it seems to be working in my small tests. Will try to do some scale testing tomorrow.

garlick (Member, Author) commented Apr 4, 2018

Oops, here's a test failure in lua/t1003-iowatcher.t I need to run down:

```
not ok 4 - iowatcher returns error correctly
      got: userdata: 0x9370a94
 expected: nil
```

coveralls commented Apr 4, 2018

Coverage increased (+0.002%) to 78.902% when pulling 4ea96cd on garlick:libkz_nowatch into af89104 on flux-framework:master.

garlick force-pushed the garlick:libkz_nowatch branch from 7dcf93c to 83857c8 on Apr 4, 2018

garlick (Member, Author) commented Apr 4, 2018

I made lua/t1003-iowatcher.t a todo for the moment. The test tries to create an I/O watcher on a non-directory and expects the iowatcher creation to fail. That used to happen at kz_set_ready_cb() time, when flux_kvs_watch() was called, which caused a synchronous lookup to occur. Now that the initial lookup result is handled in a continuation, the user's callback will be called, and when it calls kz_get(), that call will fail.

It's not clear to me looking at the iowatcher implementation whether kz_get() errors are checked. If not, that will need to be fixed and may help us find other reader issues.

Then the test will have to start the reactor and check for error in the reactor callback.

codecov-io commented Apr 4, 2018

Codecov Report

Merging #1424 into master will increase coverage by <.01%.
The diff coverage is 75.47%.

```diff
@@            Coverage Diff            @@
##           master   #1424      +/-   ##
=========================================
+ Coverage   78.59%   78.6%   +<.01%
=========================================
  Files         163     163
  Lines       30082   30105      +23
=========================================
+ Hits        23643   23663      +20
- Misses       6439    6442       +3
```
Impacted Files Coverage Δ
src/common/libkz/kz.c 82.55% <73.95%> (+12.02%) ⬆️
src/bindings/lua/flux-lua.c 80.97% <90%> (-0.31%) ⬇️
src/cmd/cmdhelp.c 79.13% <0%> (-5.22%) ⬇️
src/modules/cron/cron.c 76.68% <0%> (-3.04%) ⬇️
src/common/libkvs/kvs_watch.c 90.12% <0%> (-1.29%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/broker/modservice.c 79.61% <0%> (-0.98%) ⬇️
src/broker/module.c 83.79% <0%> (-0.28%) ⬇️
src/common/libflux/message.c 81.6% <0%> (+0.11%) ⬆️
... and 4 more

grondo (Contributor) commented Apr 4, 2018

Sorry you had to deal with that, @garlick. Your approach seems good to me.

garlick (Member, Author) commented Apr 4, 2018

NP!

trws (Member) commented Apr 4, 2018

This looks really good! How hard would it be to have an option that says "even if it isn't done, don't establish the watchers, just stop" with this change in?

garlick (Member, Author) commented Apr 4, 2018

> This looks really good! How hard would it be to have an option that says "even if it isn't done, don't establish the watchers, just stop" with this change in?

Probably not hard. I'll look at that.

garlick (Member, Author) commented Apr 4, 2018

@trws, if you have the output of one of these completed jobs in a KVS somewhere, it would be interesting to know before-and-after numbers for flux wreck attach.

trws (Member) commented Apr 4, 2018

I have an instance up that's run 18 jobs, up to 4000 tasks each. Several have non-trivial output, would it be useful to pull something out of there?

garlick (Member, Author) commented Apr 4, 2018

> I have an instance up that's run 18 jobs, up to 4000 tasks each. Several have non-trivial output, would it be useful to pull something out of there?

Sure - if it's not too much trouble, just time flux-wreck attach on master and this branch on the same (completed) job.

grondo (Contributor) commented Apr 5, 2018

I just gave the kz iowatcher changes a quick look and they seem good to me. Just to be clear: the new iowatcher callback denotes EOF when both data and err are nil; otherwise, if err is not nil, an error occurred? (I know, sorry about the bad semantics of the original function.)

Just another warning that the fd iowatcher has completely different semantics (e.g. the data parameter passed to the callback is a Lua table, not just a string). This is because it was based on the old zio code, so I didn't want to bother fixing it yet. I just didn't want you to be surprised if you went looking there (and hopefully you don't have to).

Sorry that is all I have time for tonight.

garlick (Member, Author) commented Apr 5, 2018

Thanks @grondo!

That sounds right, or (what I was thinking was) you'd first check err: if it's non-nil, it's an error; otherwise, proceed as before.
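
For concreteness, a minimal sketch of a handler applying those semantics; the argument names and order here are assumptions, not taken from the bindings:

```lua
-- Hypothetical iowatcher handler; only the err/data meaning
-- (err non-nil on kz_get() error, both nil at EOF) comes from
-- this PR; the argument order is assumed.
local function on_output (iow, data, err)
    if err ~= nil then
        error ("iowatcher: " .. err)    -- kz_get () failed
    elseif data == nil then
        print ("EOF reached")           -- both nil: end of stream
    else
        io.write (data)                 -- normal output chunk
    end
end
```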

trws (Member) commented Apr 5, 2018

Well, the timing results are pretty conclusive. With this branch, over ssh, attach on a 400-line-output job with 400 ranks takes 0.7 seconds. The current master version has so far been running for five minutes, and I just got another line out of it, so it might eventually finish? The last two lines took a minute each to arrive.

For one that actually finished reasonably, the current master took about 1.5 seconds on average; this branch took about 0.5 (again with this version at the disadvantage of connecting over ssh).

On a possibly related note, do either of you know if it's possible we leave watchers around for cancelled or killed wreck attach commands? I've noticed long-running instances start to get slow, spending time processing kvs events even when the kvs isn't being actively poked from outside. It made me wonder if we're somehow leaking watchers when a client dies an untimely death or something (but then again, I'm way past loopy at this point, so... hey, another line!)

garlick (Member, Author) commented Apr 5, 2018

> do either of you know if it's possible we leave watchers around for cancelled or killed wreck attach commands?

In that case, where the reactor has gone away and the broker connection dropped, the KVS should get a disconnect message and take care of cleaning out any state for that connection. To get a count of service-side watcher state for the local rank, run:

```
flux module stats --parse "namespace.primary.#watchers" kvs
```

FWIW there is a regression test for this: t/t1103-apidisconnect.t

garlick added some commits Apr 2, 2018

libkz: drop unused kz_gopen()
Problem: kz_gopen() is not used but its implementation
contributes to complexity.

Drop this interface and associated logic.
libkz: [cleanup] factor out lookup_next()
Problem: the code that looks up the next key with a KVS
continuation is duplicated in the continuation and the watch
callback, and soon will be in another function.

Factor out the common code into lookup_next().

Also, drop some LOG_DEBUG messages and add a sequence
number to some LOG_ERR messages to make them more
informative.
libkz: ensure lookups stop after callback unregistered
Problem: KVS lookups continue without incrementing kz->seq
if the callback is unregistered while kz->seq < kz->last_dir_size,
resulting in an infinite loop.

Terminate lookups if the callback is NULL in the lookup continuation.
libkz: update kz->eof in kz_get_json()
Prepare for use of the kz->eof flag by ensuring it is set
when the end of the stream is reached, even if data is being
consumed in "raw" mode with kz_get_json().
libkz: terminate watcher at EOF
If the internal kz->eof flag is set, terminate the KVS watcher.
libkz: avoid setting up watch until blocked
Since KVS watches are costly, avoid setting one up
until any existing data in the stream has been
read and consumed.  If EOF is reached, then the
KVS watch is avoided entirely.
libkz: fix error handling in getnext_blocking()
Problem: zio decode errors were ignored in getnext_blocking().

Handle that error.
libkz: defer errors from continuations
Problem: errors that occur in the context of a KVS continuation
in the read path were not saved so they could be returned to
the user on the next kz_get(), kz_get_json(), or kz_close() call.

Check for saved error in kz_get(), kz_get_json(), and kz_close().
When a continuation encounters an error, save it, and call the
user's callback.

A couple of inappropriate log messages were also removed,
and some error paths were further cleaned up.
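
A minimal sketch of the deferred-error idiom this describes (the struct slice and helper names are illustrative, not kz.c's):

```c
#include <errno.h>

/* Hypothetical slice of the kz handle: one deferred errno. */
struct kz_err_sketch {
    int saved_errno;                /* 0 = no deferred error */
};

/* Called from a KVS continuation on failure: there is no caller
 * on the stack to return -1 to, so stash errno for later replay. */
static void defer_error (struct kz_err_sketch *kz, int errnum)
{
    if (kz->saved_errno == 0)       /* keep the first error */
        kz->saved_errno = errnum;
}

/* Called at the top of kz_get(), kz_get_json(), and kz_close(). */
static int check_deferred_error (struct kz_err_sketch *kz)
{
    if (kz->saved_errno != 0) {
        errno = kz->saved_errno;    /* replay the saved error */
        return -1;
    }
    return 0;
}
```
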
bindings/lua: add err arg to iowatcher callback
Problem: if an iowatcher internally gets a kz_get()
error, it has no way to inform the user.

Add an err argument to the iowatcher callback that is nil
when there is no error, and set to an error string when
there is one.
libkz: check for invalid arguments
Problem: many of the public kz functions will crash
if passed NULL arguments.

Add checks for absurd argument values.
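
A sketch of the guard pattern, with an assumed signature mirroring kz_get():

```c
#include <errno.h>
#include <stddef.h>

typedef struct kz kz_t;             /* opaque handle, as in kz.h */

/* Sketch only: parameter names are assumptions. */
int kz_get_sketch (kz_t *kz, char **datap)
{
    if (kz == NULL || datap == NULL) {
        errno = EINVAL;             /* reject absurd arguments */
        return -1;
    }
    /* ...normal read path elided... */
    return 0;
}
```
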
lua/t1003-iowatcher.t: fix test error check
Problem: lua iowatcher test 5 'iowatcher returns error
correctly' fails after kvs_watch deferral change.

This test attempts to create an iowatcher on a non-directory
and expects iowatcher creation to fail.  Now that the
lookup is no longer performed in the context of
iowatcher creation, creation succeeds.

Change the test's iowatcher callback to stop the reactor
if 'err' is non-nil.  Then change the test to run the reactor
and expect an ENOTDIR failure.
test/kz: [cleanup] rename kzutil to kzcopy
Problem: kzutil --attach is not used by any tests.

Drop the attach code which is hard wired for wreck's
stdio naming convention, and is not used by any test.
This leaves kzutil --copy as the main mode, so drop
the --copy option and rename the utility to kzcopy.

Update users.
test/kz: [cleanup] drop kzcopy --blocksize 4096
Problem: the sharness test driver for kzcopy explicitly sets
the I/O blocksize to 4096, which is the default.

Drop the -b 4096 option from kzcopy calls.

garlick added some commits Apr 4, 2018

test/kz: add kzcopy --non-blocking and tests
Problem: test coverage is thin for KZ_FLAGS_NONBLOCK.

Add a kzcopy --non-blocking option that uses the kz
ready_cb_f callbacks to read a kz stream.

Add a couple more sharness tests to exercise this option.
test/kz: add TAP unit test in libkz dir
Problem: kz EINVAL checks have no test coverage.

Add a TAP test to cover invalid arguments.

(Unfortunately, not a lot can be covered here because
kz requires a broker connection and a working KVS to
get very far.)

garlick force-pushed the garlick:libkz_nowatch branch from 9571a03 to 4ea96cd on Apr 5, 2018

garlick (Member, Author) commented Apr 5, 2018

Rebased, improved the commit messages somewhat, and squashed some incremental development.
I'd vote that this go in now and that further improvements (such as a flag to avoid "following" the stream when EOF was not written) be deferred to another PR.

trws (Member) commented Apr 5, 2018

We are now up to 9 hours and 46 minutes. The old version has printed 366 lines.

trws (Member) commented Apr 5, 2018

Agreed, I can even hack something in for splash short term.

The stat command is telling me there are 740 active kvs watches. The big flux instance is currently almost unusable, and I'm trying to figure out why. It's running, it's responding, but it's spending all of its time in the kvs module processing transactions. The reason this surprises me is that the long-running attach and a wreck ls are the only things running, and it's pegged at 100% of one CPU continually.

garlick (Member, Author) commented Apr 5, 2018

@chu11, would you mind having a quick look at this and pressing the button if you find nothing amiss?

The test coverage is somewhat improved from where it was, but commensurate with how much effort I think we want to put into this old stuff (in other words, not quite up to par). I think I'm OK with that.

garlick (Member, Author) commented Apr 6, 2018

Ping @chu11?

chu11 merged commit 95f8d8f into flux-framework:master on Apr 6, 2018

3 of 4 checks passed:

codecov/patch: 75.47% of diff hit (target 78.59%)
codecov/project: 78.6% (+<.01%) compared to af89104
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.002%) to 78.902%

garlick deleted the garlick:libkz_nowatch branch on Apr 6, 2018

grondo referenced this pull request in 0.9.0 Release #1479 (closed), May 10, 2018
