Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: Refactor in preparation for private namespaces #1286

Merged
merged 20 commits into from Dec 6, 2017

Conversation

Projects
None yet
6 participants
@chu11
Copy link
Contributor

chu11 commented Nov 13, 2017

As described in #1197, here is the part 1 pull request where we refactor the KVS to not initialize itself automatically and we store KVS metadata in a hash table. All in preparation for private namespace support.

After doing this, I'm debating if this should be a separate PR, b/c there's going to be a fair amount of code paths that I don't believe can be tested until private namespace support is added.

We'll see what the coverage looks like after it goes through travis and determine if we should have this as part of the mega-PR later on for private namespaces.

@chu11 chu11 requested a review from garlick Nov 13, 2017

@chu11 chu11 force-pushed the chu11:issue1197-part1 branch from 9afc727 to 3fcb23b Nov 13, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 13, 2017

oops, forgot to rebase on master, re-pushed

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 13, 2017

Coverage Status

Coverage decreased (-0.07%) to 78.523% when pulling 3fcb23b on chu11:issue1197-part1 into 3c3890c on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 13, 2017

oops, seems like I have a mem-leak, forgot to call zhash_freefnn after an insert.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 13, 2017

Restarted 4 failed builders - all had write errors (#1145)

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 13, 2017

I see what you mean about these changes feeling a bit incomplete and hard to test. Feel free to keep going here. Couple comments:

  • it might be good to point out in the second commit's message that the getroot call is (still) synchronous, and that it recurses up the TBON to the root;
  • does getroot_request_cb() respond with an error to the original request if its internal RPC fails?
  • You could define the primary namespace key in the public kvs.h since we may need the API to provide it in RPCs; also, the key could could be simpler - something like "primary" would suffice.
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 13, 2017

Those builders are still failing - I think maybe travis is having a bad day? We haven't seen those write errors for a long time; now all of a sudden they're back in force!

@garlick garlick closed this Nov 13, 2017

@garlick garlick reopened this Nov 13, 2017

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 13, 2017

Codecov Report

Merging #1286 into master will increase coverage by <.01%.
The diff coverage is 76.56%.

@@            Coverage Diff             @@
##           master    #1286      +/-   ##
==========================================
+ Coverage   78.25%   78.25%   +<.01%     
==========================================
  Files         154      154              
  Lines       27717    27941     +224     
==========================================
+ Hits        21689    21865     +176     
- Misses       6028     6076      +48
Impacted Files Coverage Δ
src/common/libkvs/kvs_commit.c 77.77% <100%> (+1.3%) ⬆️
src/common/libkvs/kvs_watch.c 88.54% <100%> (+0.99%) ⬆️
src/common/libkvs/kvs.c 100% <100%> (ø) ⬆️
src/common/libkvs/kvs_lookup.c 77.69% <100%> (+0.32%) ⬆️
src/modules/kvs/kvs.c 65.81% <74.65%> (+0.85%) ⬆️
src/cmd/flux-kvs.c 80.49% <85.71%> (+0.59%) ⬆️
src/common/libflux/content.c 86.66% <0%> (-3.34%) ⬇️
src/common/libflux/rpc.c 92.56% <0%> (-1.66%) ⬇️
src/common/libutil/dirwalk.c 93.57% <0%> (-0.72%) ⬇️
src/common/libkvs/kvs_txn.c 74.71% <0%> (-0.57%) ⬇️
... and 10 more
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 13, 2017

Whoops clicked wrong button! Sorry!

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 13, 2017

Coverage Status

Coverage decreased (-0.06%) to 78.537% when pulling 3fcb23b on chu11:issue1197-part1 into 3c3890c on flux-framework:master.

@morrone

This comment has been minimized.

Copy link
Contributor

morrone commented Nov 13, 2017

The valigrind failure looks consistent in buildbot...any idea what might be groing wrong there?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Nov 13, 2017

@morrone, does the buildot write out the contents of t[0-9]*.output on failure? The full output of the valgrind test would be in t5000-valgrind.output when it fails. (If FLUX_TESTS_LOGFILE=t is set in the environment of the tests).

In travis we were cat-ing these files in the after_failure: section.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 13, 2017

it might be good to point out in the second commit's message that the getroot call is (still) synchronous, and that it recurses up the TBON to the root;

sounds like a good idea

does getroot_request_cb() respond with an error to the original request if its internal RPC fails?

I believe it does. getroot_rpc() would fail, returning an errno back to getroot(), which will return an errno to the caller (such as get_request_cb()) which then returns errno back to the original rpc.

You could define the primary namespace key in the public kvs.h since we may need the API to provide it in RPCs; also, the key could could be simpler - something like "primary" would suffice.

sounds like a good idea

@chu11 chu11 force-pushed the chu11:issue1197-part1 branch from 3fcb23b to 6da5022 Nov 14, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 14, 2017

re-pushed with above fixes + the mem-leak fix. Went ahead and squashed since the fixes were so tiny.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 14, 2017

catching up on the thread, I think the mem-leaks were just a dumb mistake on my part. Not anything in travis or anything.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 14, 2017

Coverage Status

Coverage decreased (-0.07%) to 78.53% when pulling 6da5022 on chu11:issue1197-part1 into 3c3890c on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 14, 2017

So the 65.16% diff coverage is not as bad as I thought it would be. Some refactoring I did late in this PR made a big difference. And if I do #1287, that'll increase it a bit more.

I suppose this could be mergeable if we decided to merge so the private namespace PR is smaller. I could go either way.

Most of the error paths are basically untestable until private namespace support is done. So I'm leaning towards waiting till private namespace is done.

@morrone

This comment has been minimized.

Copy link
Contributor

morrone commented Nov 14, 2017

@grondo, ah, I'll need to look into something similar for buildbot.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Nov 15, 2017

Most of the error paths are basically untestable until private namespace support is done. So I'm leaning towards waiting till private namespace is done.

Sounds reasonable to me.

@chu11 chu11 force-pushed the chu11:issue1197-part1 branch from 6da5022 to 6302cdd Nov 30, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Nov 30, 2017

Just did a big push. Basically, with the exception of "remove namespace", this accomplishes steps 1 through 5 listed in #1197.

I know there are more tests to be made, but I wanted to see how good/bad my test coverage was at this point and perhaps get some initial comments. (TODO: tests to cover root error paths, watch tests,
multi-namespace stress tests, wait-version tests, probably more once I see how the coverage is)

The initial mechanism to specify a namespace is via environment variable, as that's easier than modifying a bajillion API functions to also take a namespace parameter.

I had considered not implementing "remove namespace" in this PR (as discussed in #1197), but I think it has to be done now. As it well make the testing better (i.e. don't have to keep on making more and more namespaces). So I'll do that later on.

@chu11 chu11 force-pushed the chu11:issue1197-part1 branch from 6302cdd to b314e64 Nov 30, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 30, 2017

Coverage Status

Changes Unknown when pulling b314e64 on chu11:issue1197-part1 into ** on flux-framework:master**.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 6, 2017

yup, lemme squash

chu11 added some commits Nov 9, 2017

modules/kvs; Refactor kvs initialization of root
In core KVS file, instead of initializing root whenever the module
is loaded, instead initialize root whenever the first kvs get, watch,
sync, or fence request is made.

In other callbacks, adjust appropriately.  In some functions, such
as kvs.unwatch, do nothing if the root is initialized.  In functions
such as dropcache, nothing changes b/c it is not relevant if the root
is initialized or not.  In stats.get, return all zeroes to user.  In a
getroot request, it must initialize itself before returning the root
to the requestor.  The getroot request remains synchronous and recurses
up to the root of the TBON to initialize.
modules/kvs: Refactor storage of root metadata
In preparation for future private namespace support, do not store
KVS root metadata (i.e. reference, sequence number) in a variable
within the KVS context.

Instead store it within a hash within the kvs context.  Determining
if a root has been initialized is now done via a lookup within this
hash.
modules/kvs: Refactor kvs watchlist
Move watchlist from kvs context and into kvs root, as in the future
a watchlist will exist per namespace.

Adjust some functions to iterate over all watchlists, not just the
singular one from before.
modules/kvs: Refactor kvs commit mgr
Move commit mgr from kvs context into kvs root, as in the future
commits will occur in a per namespace context.

Adjust some functions to iterate over all potential commit mgrs, not
just the singular one from before.
kvs: Add namespace parameters in rpcs
In KVS request rpcs, add namespace parameter that specifies
which namespace the RPC belongs to.  For the time being,
it is always the primary namespace.

In KVS server, adjust code that previously hardcoded the
primary namespace to instead use the namespace provided in
the rpc.

Rename KVS_PRIMARY_ROOT_KEY to KVS_PRIMARY_NAMESPACE as consequence.
modules/kvs: Refactor kvs.disconnect
Adjust kvs.disconnect for fact multiple namespaces could exist in the
future by iterating through all potential namespaces when a client
drops.
modules/kvs: Refactor kvs.stats.get
Refactor how kvs.stats.get works under namespaces, as the original
did not make much sense.  Instead, output stats independent of
namespaces (cache stats) and output stats for each namespace.

Adjust tests that check stats accordingly.
modules/kvs: Minor refactor kvsroot
Store name of the namespace in the kvsroot.

Also store back pointer to kvs_ctx_t in kvsroot.
modules/kvs: Minor refactor commit mgr aux data
Refactor to store the kvsroot of the enclosing commit
mgr as the commit mgr's aux data, instead of storing the
kvs_ctx_t as the aux data.  This works out better and removes
some unnecessary code.
modules/kvs: Refactor error & setroot events
Refactor error and setroot events to pass around and parse namespace.
modules/kvs: Refactor event names and subscription
Modify kvs.setroot and kvs.error events to be specific to each
namespace, so that nodes without the namespace don't receive
the events.

Update event subscription to only subscribe to the specific events
meaningful to the module.

In addition, don't subscribe to any events until the first root
is initialized.
modules/kvs: Refactor kvsroot
Support future flags option in each kvsroot.
common/libkvs: Add new unit tests
Add new unit tests for common/libkvs/kvs.c functions.
common/libkvs: Support FLUX_KVS_NAMESPACE
Support an environment variable FLUX_KVS_NAMESPACE, which allows
users to select a KVS namespace to use other than the default one.

Add some unit tests for internal private functions appropriately.

Update documentation appropriately.
modules/kvs: Return error for illegal namespaces
Return ENOTSUP if a kvs.getroot reaches rank0 and a namespace
cannot be found.  Adjust some error messages appropriately.
t/kvs: Add new namespace tests
Add new tests in t1004-kvs-namespace.t.
doc/man3: Document new potential errnos
Document errnos that map to illegal namespaces.

@chu11 chu11 force-pushed the chu11:issue1197-part1 branch from 5ba630b to 26e6dc7 Dec 6, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 6, 2017

just pushed with all appropriate changes squashed

@coveralls

This comment has been minimized.

Copy link

coveralls commented Dec 6, 2017

Coverage Status

Coverage increased (+0.03%) to 78.613% when pulling 26e6dc7 on chu11:issue1197-part1 into 54205c6 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 6, 2017

Hmmm, one build hung after

PASS: t4000-issues-test-driver.t 4 - t0900-wreck-invalid-cores.sh

Old issue search shows this has shown up atleast one other time (#1265). Restarting.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Dec 6, 2017

Thanks @chu11. Excellent work!

@garlick garlick merged commit 57e24fa into flux-framework:master Dec 6, 2017

3 checks passed

buildbot/core_standard Build done.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.03%) to 78.613%
Details

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.