Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs-watch: fix read-your-writes consistency #1863

Merged
merged 11 commits into from Dec 12, 2018

Conversation

Projects
None yet
3 participants
@chu11
Copy link
Contributor

chu11 commented Dec 10, 2018

Per discussion in #1838 on fixing read-your-writes consistency in the kvs-watch module.

  • add a new rpc target called kvs.lookup-plus (name changeable, couldn't think a better one) that returns the root_ref & root_seq used during a lookup. In addition, it returns the root_ref &
    root_seq used even on ENOENT.

  • for kvs-watch.lookup, only lookup via kvs.lookup-plus. The first lookup will always be done on the current kvs root, and latter lookups will be based on root refs & seqs from setroot events. With info from kvs.lookup-plus (most notably the root_seq), we can ensure read-your-writes consistency because no older references will be sent to users. A few unique corner cases had to be handled for the ENOTSUP (namespace missing) case.

  • for kvs-watch.getroot, a similar process is done, but a list of potential root refences & sequences have to be stored in a new list b/c such infrastructure was not yet available.

  • various refactoring done along the way to get to the point that the above could be done.

    • kvs.lookup now takes an optional root_seq as well as the optional root_ref. The root_seq is only for convenience, so that it can be returned back to the caller of a kvs.lookup-plus call.

    • commit structures in kvs-watch now have reference counters, so they will not be deleted on destruction (commit structs are stored in namespaces and possibly watcher lists).

    • kvs-watch no longer parses raw data out of lookup responses, it instead parses out json objects.

@chu11 chu11 force-pushed the chu11:issue1838 branch from bbe7895 to 684fea5 Dec 10, 2018

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 10, 2018

realized i forgot to destroy a watcher on a getroot failure. Also, travis found an unitialized variable warning. I went ahead and squashed these fixes since they were tiny.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Dec 10, 2018

Codecov Report

Merging #1863 into master will increase coverage by <.01%.
The diff coverage is 80.55%.

@@            Coverage Diff             @@
##           master    #1863      +/-   ##
==========================================
+ Coverage   80.05%   80.05%   +<.01%     
==========================================
  Files         196      196              
  Lines       34960    35105     +145     
==========================================
+ Hits        27986    28105     +119     
- Misses       6974     7000      +26
Impacted Files Coverage Δ
src/modules/kvs/lookup.c 82.21% <100%> (+0.26%) ⬆️
src/modules/kvs-watch/kvs-watch.c 75.33% <75%> (-1.17%) ⬇️
src/modules/kvs/kvs.c 66.56% <94.23%> (+0.95%) ⬆️
src/common/libflux/mrpc.c 87.55% <0%> (-1.21%) ⬇️
src/common/libutil/veb.c 98.28% <0%> (-0.58%) ⬇️
src/cmd/flux-event.c 77.97% <0%> (ø) ⬆️
src/modules/connector-local/local.c 73.77% <0%> (+0.14%) ⬆️
src/cmd/flux-module.c 83.96% <0%> (+0.23%) ⬆️
... and 2 more

@chu11 chu11 force-pushed the chu11:issue1838 branch from 684fea5 to 0232582 Dec 11, 2018

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 11, 2018

re-pushed, just added some comments

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Dec 11, 2018

I've scanned through this and I think I understand it! Nice work. It is a little taxing trying to follow the kvs-watch.c code though, keeping in mind its interactions with kvs and the API. Do you see any opportunities to break this up into separate C modules, sort of like what was done in job-manager module?

Thoughts on testing read-your-writes e.g. as suggested in #1832 ?

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 12, 2018

I've scanned through this and I think I understand it! Nice work. It is a little taxing trying to follow the kvs-watch.c code though, keeping in mind its interactions with kvs and the API. Do you see any opportunities to break this up into separate C modules, sort of like what was done in job-manager module?

I haven't put too much thought into the at the moment, as it was taxing just to get to this point :-) Let me ponder a bit. Would you think that should go in this PR or a follow up PR?

But I am glad that it mostly makes sense!

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 12, 2018

Thinking about potential cleanups. I could see putting a nice chunk of getroot handling / lookup handling in separate C files, but the common code of the main namespace monitor, watchers, etc. obviates need for shared code in other c modules. I'm not entirely sure it's a big net win. I see some potential to combine infrastructure of the getroot and lookup handling, but it'll be messy and maybe not a net win either.

One thought I had was perhaps having a namespace monitor for getroots and namespace monitor for lookups. By decoupling the common code, that could make things simpler overall and we could stick lots of code in separate modules. This would be nice especially for struct watcher (and maybe other data structures) as there are now an increasing number of fields just for getroots or just for lookups. Hypothetically, why not even have a kvs-watch-geroot module and a kvs-watch-lookup module?

Another thought I had. How often do you expect kvs-watch with getroot to be done? B/c with the new kvs.lookup-plus, we can completely remove all of the kvs-watch getroot handling. A getroot watch can be done with the kvs-watch.lookup code and the key .. The problem with this is the rpc count would be way higher / noisier b/c you'd effectively be doing a lookup on every single change to the KVS.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Dec 12, 2018

I don't think we have a real use case for flux_kvs_getroot (flags=FLUX_KVS_WATCH), so that option is pretty appealing!

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 12, 2018

I don't think we have a real use case for flux_kvs_getroot (flags=FLUX_KVS_WATCH), so that option is pretty appealing!

You mean collapsing it into kvs-watch.lookup?

If we don't have a real use case, should we just get rid of it?

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Dec 12, 2018

You mean collapsing it into kvs-watch.lookup?

Right.

If we don't have a real use case, should we just get rid of it?

I feel like I might be missing something, but I can't think of a good reason not to at this point, and in keeping with the spirit of removing half-baked API in this release, maybe we should nix it, especially if it would reduce complexity in the kvs-watch module.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 12, 2018

maybe we should nix it, especially if it would reduce complexity in the kvs-watch module.

Ok, how about we merge this PR in since it's a good "we fixed read-your-writes" stopping point, then I'll remove it in a later PR b/c it'll also involve cleaning up libkvs. I'll rebase once I merge in #1867.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Dec 12, 2018

That sounds good, then we'll at least capture the extra work you did to make read-your-writes work with getroot+watch.

chu11 added some commits Dec 6, 2018

modules/kvs: Add lookup convenience functions
Add lookup_get_root_ref() and lookup_get_root_seq() to get the
root ref & root seq used in a lookup.
modules/kvs: Add root_seq to lookup_create
Add a root_seq option to lookup_create().  This parameter is solely
for convenience so it can be "connected" to the root_ref being
passed in and retrieved later during lookup_get_root_seq().
modules/kvs: Create kvs.lookup-plus rpc target
Create a new kvs.lookup-plus rpc target to return the root_ref
and root_seq in a lookup response.  In addition, return ENOENT
error along with a root_ref & root_seq.
modules/kvs: Create lookup_common()
Create new function abstracting most of lookup_request_cb() into
a common lookup function.
modules/kvs-watch: Functionize some code
Refactor getroot / lookup handling into functions.
modules/kvs: Allow root_seq input to kvs.lookup
Allow user to optionally pass in root_seq to kvs.lookup.
modules/kvs-watch: Refactor kvs.lookup parsing
Instead of parsing / storing the raw data kvs.lookup response,
instead parse / store the json value from the response.
modules/kvs-watch: Fix lookup read-your-writes
Fix read-your-writes consistency with the kvs-watch module.

The primary technique is to do a lookup against the current
root head of the kvs, and not the cached root reference.

Several corner cases must be handled for special circumstances.
modules/kvs-watch: Add refcount to commit struct
Add a reference counter to internal commit struct, call commit_decref()
instead of commit_destroy() in appropriate places.
modules/kvs-watch: Rename getroot_continuation()
Rename function to namespace_getroot_continuation() to differentiate
it from a getroot continuation that will be added later.
modules/kvs-watch: Fix getroot read-your-writes
Fix read-your-writes consistency with the kvs-watch module and
getroot watches.

The primary technique is to do a getroot call against the current
root head of the kvs, and not a cached root reference.

Fixes #1838

@chu11 chu11 force-pushed the chu11:issue1838 branch from 0232582 to 3d70ccc Dec 12, 2018

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Dec 12, 2018

rebased

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Dec 12, 2018

NIce work here @chu11. As discussed, cleanup could happen in another PR and potentially after we tag 0.11.0 if we get down to the wire.

@garlick garlick merged commit 43c1c01 into flux-framework:master Dec 12, 2018

3 checks passed

codecov/patch 80.55% of diff hit (target 80.05%)
Details
codecov/project 80.05% (+<.01%) compared to 41515ad
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.