Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: add kvs-watch module #1622

Merged
merged 16 commits into from Aug 22, 2018

Conversation

Projects
None yet
5 participants
@garlick
Copy link
Member

garlick commented Aug 15, 2018

(N.B. just realized I didn't finish reworking commit messages - please hold off on any detailed review)

This PR is a precursor to reworking the KVS watch mechanism. It adds a new module kvs-watch that currently only supports "watching" the root hash of a namespace. (I thought this was complicated enough for a PR, and the generic watch could wait for another PR).

This PR adds the following API:

/* Request the current KVS root hash for namespace 'ns'.
 * If flags = FLUX_KVS_WATCH, a response is sent each time the root
 * hash changes.  In that case, the user must call flux_future_reset()
 * after consuming the response to re-arm the future for the next response.
 */
flux_future_t *flux_kvs_getroot (flux_t *h, const char *ns, int flags);

/* Decode KVS root hash response.
 *
 * treeobj - get the hash as an RFC 11 "dirref" object.
 * blobref - get the raw hash as a n RFC 10 "blobref".
 * sequence - get the commit sequence number
 * owner - get the userid of the namespace owner
 */
int flux_kvs_getroot_get_treeobj (flux_future_t *f, const char **treeobj);
int flux_kvs_getroot_get_blobref (flux_future_t *f, const char **blobref);
int flux_kvs_getroot_get_sequence (flux_future_t *f, int *seq);
int flux_kvs_getroot_get_owner (flux_future_t *f, uint32_t *owner);

/* Cancel a FLUX_KVS_WATCH "stream".
 * Once the cancel request is processed, an ENODATA error response is sent,
 * thus the user should continue to reset and consume responses until an
 * error occurs, after which it is safe to destroy the future.
 */
int flux_kvs_getroot_cancel (flux_future_t *f);

My plan for regular watch API is to simply allow the FLUX_KVS_WATCH flag to be passed to flux_kvs_lookup().

I added a new flux kvs getroot subcommand, a man page, and a sharness test.

Also, since flux_kvs_get_version() is a synchronous RPC interface that accomplishes the same thing as flux_kvs_getroot() + flux_kvs_getroot_get_sequence(), and had no users except tests, I went ahead and eliminated the former and also the flux kvs version subcommand, and updated tests accordingly.

garlick added some commits Jun 29, 2018

modules/kvs: publish key array with setroot event
Problem: new watch implementation needs to know which
keys are potentially modified in a KVS transaction.

Convert the array of 'operation' objects to an array
of keys when the transaction is created, and include
it in the setroot event.
modules/kvs/test: use real op in unit test
Problem: unit test that passes a string instead
of a transaction op as a shortcut fails when ops
are processed into keys.

Construct a real operation object instead of taking
a shortcut with the json_string.
libflux/rpc: add flux_rpc_get_matchtag()
Problem: It is difficult to implement multi-response
streaming RPC services that allow the client to terminate
the stream, because the flux_rpc() user (client) cannot
reconstruct the (sender-id, matchtag) tuple that uniquely
identifies a request on the server end.

Add an accessor for the flux_rpc() request matchtag.
The client can send a new request to the server that
contains this matchtag in its payload.  The server can
then extract the sender-id from the request route stack
and use the tuple to look up its state for the "stream"
and cancel it, sending an ENODATA response per RFC 6.

Note: this is not a basis for the "RPC cancellation"
feature that has been deferred for some time, since it
requires the client and server to implement the cancel
RPC out of band with respect to the original RPC.
However, practical experience with this may inform
a future design.
libkvs: add FLUX_KVS_WATCH flag
Add FLUX_KVS_WATCH to be used with the new flux_kvs_getroot()
to indicate that, after the current root has been returned,
each change to the root should generate another response.

Eventually, flux_kvs_lookup() will accept this flag to allow
keys to be watched through the lookup interface.
libkvs/getroot: add new API functions for getroot
Add API functions for sending a kvs.getroot RPC
to the kvs service to get the current root of a
KVS namespace.

If the FLUX_KVS_WATCH flag is set, divert the RPC
to the kvs-watch service, which in addition to
responding with the current root, will respond
each time the root changes after that.

flux_kvs_getroot_cancel() sends a cancellation
request to "unwatch" a namespace.  Unlike
flux_kvs_unwatch(), which finds the request to
cancel by sender and KVS key, this function
cancels by sender and matchtag, which can be
more precise.
modules/kvs: publish owner with setroot event
Problem: new watch implementation needs to know the
owner of a namespace in order to authenticate a user's
request to watch it.

Include the namespace owner with the setroot event.

@garlick garlick force-pushed the garlick:getroot branch from 3fc9ed6 to 8693eb6 Aug 15, 2018

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 16, 2018

Coverage Status

Coverage increased (+0.05%) to 79.578% when pulling 896ac45 on garlick:getroot into 62a8aa5 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 16, 2018

restarted a builder that failed with write error

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 16, 2018

well that builder hit another write error, but I just noticed your comment that you're pushing some updates so I'll leave it for now.

@garlick garlick force-pushed the garlick:getroot branch from 8693eb6 to 0c57055 Aug 16, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 16, 2018

OK, just added a long comment to the kvs-watch module commit (3bd9ab4). That was the most egregious omission.

I'll likely need to do some work on the test coverage as it seems a bit low.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 16, 2018

Restarted two builders that failed on write error.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 17, 2018

Also, since flux_kvs_get_version() is a synchronous RPC interface that accomplishes the same thing as flux_kvs_getroot() + flux_kvs_getroot_get_sequence(), and had no users except tests, I went ahead and eliminated the former and also the flux kvs version subcommand, and updated tests accordingly.

On second thought, I think this belongs in a separate PR. I'll snip that part off.

@garlick garlick force-pushed the garlick:getroot branch from 0c57055 to 501893e Aug 17, 2018

@chu11
Copy link
Contributor

chu11 left a comment

Other than 1 nit I commented on, I think you're missing a unit test addition for kvstxn_get_keys() in test/kvstxn.c. And I don't think I saw a test for kvs-watch.stats.get?

@@ -305,6 +325,13 @@ static struct optparse_subcommand subcommands[] = {
0,
NULL
},
{ "getroot",
"[-s] [-o]",

This comment has been minimized.

@chu11

chu11 Aug 17, 2018

Contributor

I think there are missing options here.

Also, b/c you can only specify one of s, o, or b, shouldn't it be [-s | -b | -o]? (and in the manpage too).

This comment has been minimized.

@garlick

garlick Aug 17, 2018

Author Member

Yep.

{ "wait",
"version",
"Block until the KVS reaches version",
cmd_wait,
0,
NULL
},
{ "getroot",
"[-s] [-o]",

This comment has been minimized.

@chu11

chu11 Aug 17, 2018

Contributor

I think there are missing options here.

Also, b/c you can only specify one of s, o, or b, shouldn't it be [-s | -b | -o]? (and in the manpage too).

This comment has been minimized.

@garlick

garlick Aug 17, 2018

Author Member

Yes good catch!

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Aug 17, 2018

@garlick looks good, although there's one subtlety I thought of. As I think about it, there's no unit test, so I'm not 100% of what the behavior is or isn't. We should definitely add a test though.

In the old kvs_watch, there is an explicit json_equal() call to ensure that the value has actually changed. So I think that a

flux kvs put a=1
flux kvs put a=1

wouldn't lead to a watch change.

I think in your code the setroot send all keys in the transaction, but it doesn't matter if the value actually changed?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 17, 2018

In the old kvs_watch, there is an explicit json_equal() call to ensure that the value has actually changed.

This PR lays the groundwork for watching keys, but the actual code for watching keys will come in another PR. I'm open to suggestions on what the semantics should be with respect to keys that are involved in a commit that doesn't change their value. If we want to maintain the old semantics, we can simply ignore a getroot update where the key value is the same as before.

For "getroot", I thought it should report every commit, even if the root blobref doesn't change. What do you think? (I guess the main use for the getroot API will be testing).

I'm working on tests now. Thanks for that feedback!

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Aug 17, 2018

This PR lays the groundwork for watching keys, but the actual code for watching keys will come in another PR.

Ahh ok. Perhaps I was thinking too far out, should wait until the implementation is done.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 17, 2018

For "getroot", I thought it should report every commit, even if the root blobref doesn't change

Just to weigh in, it would be less surprising to me if the getroot watch returned all commit updates as implemented.

Perhaps at a later time there could be an optimization to send a flag which says "only send me an update if the ref actually changed."

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 17, 2018

Ahh ok. Perhaps I was thinking too far out, should wait until the implementation is done.

Would appreciate your opinion if you do find time to ponder it though.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Aug 17, 2018

As @grondo says, it makes more sense to send all the keys that were updated. Re-reading #1556, I think I see the overall future design better now and I think I see how things will work.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 17, 2018

Sorry I had forgotten to reference #1556. Thanks for mentioning that.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Aug 18, 2018

Codecov Report

Merging #1622 into master will increase coverage by 0.04%.
The diff coverage is 81.11%.

@@            Coverage Diff             @@
##           master    #1622      +/-   ##
==========================================
+ Coverage   79.36%   79.41%   +0.04%     
==========================================
  Files         171      173       +2     
  Lines       31474    31885     +411     
==========================================
+ Hits        24979    25320     +341     
- Misses       6495     6565      +70
Impacted Files Coverage Δ
src/common/libflux/rpc.c 92.68% <100%> (-0.66%) ⬇️
src/modules/kvs/kvs.c 65.87% <100%> (ø) ⬆️
src/modules/kvs-watch/kvs-watch.c 78.03% <78.03%> (ø)
src/modules/kvs/kvstxn.c 79.14% <78.94%> (-0.01%) ⬇️
src/cmd/flux-kvs.c 81.34% <80%> (-0.1%) ⬇️
src/common/libkvs/kvs_getroot.c 92% <92%> (ø)
src/common/libflux/handle.c 83.66% <0%> (-2.23%) ⬇️
src/common/libutil/base64.c 95.07% <0%> (-0.71%) ⬇️
src/common/libkvs/kvs_txn.c 74.71% <0%> (-0.57%) ⬇️
... and 13 more
@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Aug 20, 2018

Overall looks good. Noticed you forgot to update:

*getroot* [-w] [-c count] [-s] [-o] [-b]::

in the manpage to be [-s | -o | -b].

It seems some unit tests to test for invalid inputs in many kvs_getroot.c functions should eek out a few more lines of code coverage.

For some reason, it appears remove_cb wasn't covered at all, but you definitely have a test for namespace removal and that test works.. Not sure if the codecov isn't showing the right thing.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 20, 2018

Thanks @chu11

That "namespace removal" test is disabled because it was hanging so I need to fix and repush.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 21, 2018

How's that look @chu11? LMK and I'll squash the incremental devel.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Aug 22, 2018

LGTM, squash away

garlick added some commits Jul 5, 2018

modules/kvs-watch: new module for KVS watch service
Add a new module 'kvs-watch' to track KVS commit events
and notify interested parties when the KVS changes.

This was implemented separately from the main KVS module
to avoid adding more complexity.  Eventually this service
can replace the current KVS watch machinery, but for now
it is built up independently.

The initial version implements a "kvs-watch.getroot"
method that sends an immediate response containing the
current root, and additional responses each time the watched
namespace changes.  It interacts with the main kvs module by
sending it an initial "kvs.getroot" RPC, and then tracking
changes via published "kvs.setroot" events.  Watcher state
is shared among multiple clients.  Event subscriptions are
established on-demand.

The user may make a "kvs.getroot" RPC to receive the
current root information in one response, or a
"kvs-watch.getroot" RPC to receive the current and
future changes in multiple responses.  These services
use compatible protocols.  The API will select which
service to used based on a flags setting.  Eventually
this approach will allow the now separate flux_kvs_watch()
API to be folded in with flux_kvs_lookup().

The stream of responses continues until the request is
explicitly cancelled or the client disconnects.  Explicit
cancellation consists of sending a "kvs-watch.cancel"
request containing the requestor UUID and matchatg.  This
request does not receive a response;  the response stream
is terminated with an error response (errnum = ENODATA)
as specified in RFC 6.  The sender should wait for this
error response before recycling the original request matchtag.
cmd/flux-kvs: add getroot subcommand
New flux kvs subcommand, usage:

flux kvs [OPTIONS] getroot [--watch] [--count=COUNT]
                           [--owner|--sequence|--blobref]

@garlick garlick force-pushed the garlick:getroot branch from c24c616 to 896ac45 Aug 22, 2018

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 22, 2018

Squashed. Ready to merge I think?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 22, 2018

Probably unrelated to this PR, but when I was playing on this branch I accidentally performed:

$ flux kvs watch -d foo &
$ flux kvs put foo=bar

and got a strange (jansson parse?) error in flux-kvs

flux-kvs: foo: invalid token near 'bar' (line 1 column 3)

Should I open an issue? Seems like expected behavior would be that flux kvs watch -d would return sensible error when watched key isn't a directory?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 22, 2018

Yeah not related, but sure open an issue, seems like a bug.

@chu11 chu11 merged commit 4a2674b into flux-framework:master Aug 22, 2018

4 checks passed

codecov/patch 81.11% of diff hit (target 79.36%)
Details
codecov/project 79.41% (+0.04%) compared to 62a8aa5
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.05%) to 79.578%
Details
@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 22, 2018

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.