Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add KVS blobref access functions #801

Merged
merged 14 commits into from Sep 12, 2016

Conversation

Projects
None yet
5 participants
@garlick
Copy link
Member

garlick commented Sep 9, 2016

This PR implements the KVS functions suggested in issue #778 for accessing the blobref of a key, and for creating a key with a blobref in place of a value. The new functions are

/* Get blobref associated with a key.  Caller must free.
 * If value is stored "by value", return EINVAL.
 */
int kvs_get_value_blobref (flux_t h, const char *key, char **blobref);
int kvs_get_directory_blobref (flux_t h, const char *key, char **blobref);

/* Create a reference to a value/directory object stored in content cache.
 */
int kvs_put_value_blobref (flux_t h, const char *key, const char *blobref);
int kvs_put_directory_blobref (flux_t h, const char *key, const char *blobref);

In addition the kvs.get and kvs.commit protocols were cleaned up, switching from a dictionary with a plethora of boolean flags to JSON requests more like those used in other parts of Flux, and an integer flags mask. And the client side "get" variants were refactor a little to reduce duplication.

Commands were added for testing the new functions. Here's a command to create a root snapshot

$ flux kvs put-blobref -d root-snapshot-1 `flux kvs get-blobref -d .`
$ flux kvs dir 
resource.
root-snapshot-1.
$

Finally kvs_copy() was modified to link blobrefs rather than recursively copy entire directory hierarchies. (kvs_move() is implemented on kvs_copy() so it benefits too).

I still need to add some tests here.

@garlick garlick added the review label Sep 9, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 9, 2016

Forgot to mention one other semi-gratuitous change in this PR: when calling kvs_put() on an object larger in size than a blobref, we used to do a synchronous content store in the kvs_put(). I changed that so even large objects are sent as part of the commit message. I'm not entirely sure what the effect will be, but the rationale was you could use kvs_put_value_blobref() to write a truly large value if there was concern about overwhelming the KVS module on rank 0, but that for fairly small objects, latency for a put+commit would be reduced by the RTT of the content store op (which recurses to rank 0).

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Sep 9, 2016

Current coverage is 74.80% (diff: 83.33%)

Merging #801 into master will increase coverage by 0.03%

@@             master       #801   diff @@
==========================================
  Files           145        145          
  Lines         24968      24878    -90   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          18668      18609    -59   
+ Misses         6300       6269    -31   
  Partials          0          0          
Diff Coverage File Path
••••••• 76% src/modules/kvs/libkvs.c
••••••• 79% src/cmd/flux-kvs.c
•••••••• 86% src/modules/kvs/json_dirent.c
••••••••• 91% src/modules/kvs/kvs.c
•••••••••• 100% src/modules/kvs/proto.c

Powered by Codecov. Last update ec5bd1f...c94f490

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 9, 2016

Coverage Status

Coverage decreased (-0.2%) to 74.789% when pulling ce8eb5a on garlick:issue_778 into 3fd9da0 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 9, 2016

Wow, this is very cool!

Just for my own curiosity, why are the calls split between values and dirs? It would be slick if there were only 2 calls here, but I understand if there is a technical reason -- just curious about it.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 9, 2016

Put needs the two variants (or it could easily be one function with flags) so that we don't have to try to parse every blob to decide if it's a directory or not. Its directory entry needs to tag it as a value or a directory, so we need to know that.

It's less clear with get, but I didn't want to make it easy for someone to do a get on a key that they thought was a directory, inadvertantly get a value, and then assign it to a directory.

Which gets right to one of my concerns about this PR, which is we're allowing "users" direct access to the metadata, thus we now have to be concerned about making the kvs module robust enough to not crash if random data is assingned to a directory. I'll try to add some tests for that.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 9, 2016

Another somewhat awkward thing about these functions is that get returns an error (EINVAL) if the entry is stored by value rather than by reference, and the heuristic used to decide whether that will be the case is an internal detail that perhaps should not be exposed.

I wonder if it would be better all around if this switched to two functions

int kvs_get_entry (flux_t h, const char *key, char **entry);
int kvs_put_entry (flux_t h, const char *key, const char *entry);

where "entry" is not a blobref but the directory entry. Then they could be agnostic with regard to type, and also whether it is a blobref or an actual value, or the proposed RFC11 "treeobj" where a value could be a list of blobrefs.

The catch is then you can't go directly to the content store with the result, so maybe we would need some public functions for picking apart the entries.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 9, 2016

What you propose does sound cleaner -- can you give some examples of what an "entry" looks like? Would this also expose internal details, or is the entry an instance of a standard format described in the RFC?

I think a way to create or get an entry, manipulate it, and link it back in the KVS actually might end up being quite powerful! (If I understand correctly)

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 9, 2016

Current "entries" looks like this:

{"FILEREF": "sha1-8714e0ef31edb00e33683f575274379955b3526c"}
{"DIRREF": "sha1-6eadd3a778e410597c85d74c287a57ad66071a45"}
{"FILEVAL": 42}
{"DIRVAL": {}}
{"LINKVAL": "c.d"}

DIRVAL value is a dict of names versus entries. FILEVAL value is any JSON object.

An approach might be to provide entry get/set now with the expectation that initial users would treat entries opaquely (only put an entry obtained from a get), then fill in offline manipulation functions when switching to RFC 11 treeobj.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 9, 2016

That sounds like a nice solution to me!

@garlick garlick force-pushed the garlick:issue_778 branch from ce8eb5a to ed13413 Sep 12, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 12, 2016

Coverage Status

Coverage increased (+0.1%) to 75.073% when pulling ed13413 on garlick:issue_778 into 3fd9da0 on flux-framework:master.

@garlick garlick force-pushed the garlick:issue_778 branch from ed13413 to 591e06e Sep 12, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 12, 2016

Coverage Status

Coverage increased (+0.08%) to 75.05% when pulling 591e06e on garlick:issue_778 into 3fd9da0 on flux-framework:master.

@garlick garlick force-pushed the garlick:issue_778 branch from 591e06e to 817af4d Sep 12, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 12, 2016

Coverage Status

Coverage increased (+0.2%) to 75.121% when pulling 817af4d on garlick:issue_778 into 3fd9da0 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

This has been reworked as described above. Since the values are supposed to be treated as opaque, I named it "treeobj" even though we haven't yet moved to RFC 11 treeobj KVS metadata.

int kvs_get_treeobj (flux_t h, const char *key, char **treeobj);
int kvs_put_treeobj (flux_t h, const char *key, const char *treeobj);

To make root snapshot:

$ flux kvs put-treeobj snap=$(flux kvs get-treeobj .)

I think the change from blobref to treeobj is a good one. For example, it allowed kvs_copy() to be simplified. (getobj() and kvs_put_dirent() are internal functions that avoid the unnecessary JSON encode/decode required with kvs_get_treeobj() and kvs_put_treeobj()).

int kvs_copy (flux_t h, const char *from, const char *to)
{
    JSON dirent;
    if (getobj (h, from, KVS_PROTO_TREEOBJ, &dirent) < 0)
        return -1;
    if (kvs_put_dirent (h, to, dirent) < 0) {
        Jput (dirent);
        return -1;
    }
    return 0;
}

I've added tests, including various mangled treeobj's passed to "put".

There is still a problem where writing a treeobj containing a dangling blobref can cause the KVS to hang when that object is traversed in a kvs_get(). This was covered in #792, but it's worse with this PR since this condition can now be created artificially with kvs_put_treeobj(). I prefer not to fix here as it would require some refactoring that is needed for #792 (will fix that in a new PR).

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 12, 2016

If treeobj will potentially have a set of accessors and thus perhaps graduate to a "type" would it be better to have it typedef'd to some abstract type now? I don't feel strongly about this so if there is some benefit to keeping it as a string, that is fine with me.

This is looking very nice to me

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

would it be better to have it typedef'd to some abstract type now

I should ask a more general question then: do we want a C API to pick apart these objects, or should we just define the JSON structure in the RFC 11 and let users use their chosen JSON library to manipulate them? It sometimes seems like overkill to add iterators and accessors to abstract times built from JSON, propagate those to bindings, etc., especially when bindings may have great JSON integration. Thoughts?

"put" would still have validation of course.

garlick added some commits Sep 7, 2016

modules/kvs: commit large values to object store
When unrolling the rootdir copy in commit processing, if
any FILEVAL directory objects are encountered that serialize
to a string longer than SHA1_STRING_SIZE, initiate store to
local cache/content store and convert to a FILEREF.

The KVS module can do this in parallel with other commit processing,
whereas the library code does it synchronously.
modules/kvs: client sends large objects by value
Drop conditional code in kvs_put() that would write large
objects to the content store (synchronously), then commit
it by reference rahter than by value.  Instead commit
all objects by value and let the kvs rank 0 module decide what
should be storeed as an independent object in commit processing.
modules/kvs: factor out kvs_put_dirent()
Restructure kvs_put() code so that kvs_put_dirent()
is exposed internally.  This function will be used to
implement kvs_put_*_blobref().
modules/kvs: add kvs_put_treeobj() function
Add kvs_put_treeobj() which links a key to a directory
entry object.  For now the only way to get a directory
entry object will be from kvs_get_treeobj(), to be
added in a later commit.

Add some cursory checks on the directory entry argument
to ensure incorrect metadata is added to the KVS.

Fixes #778
modules/kvs: simplify proto for kvs.get, kvs.watch
Clean up the JSON protocol for kvs.get and kvs.watch.
Eliminate dictionary method left over from when multiple
keys were supported by the protocol, and use explicit
"key" and "val" fields.

Add an integer "flags" field to replace the optional
".arg_*" dictionary flags.

Update codec unit tests.
test/kvs: kvs disconnect test must use codec
Problem: helper for kvs sharness test, "watch_disconnect"
was constructing its own kvs.watch request messages,
which broke when the protocol changed.

Link the test against src/modules/kvs/proto.o so that
this is no longer hardwired.
modules/kvs: move root lookup to lookup() func
Refactor common code to look up the root directory
from watch_request_cb() and get_request_cb() into the
lookup() function.  Call lookup() with a NULL root value
to indicate that the root should be looked up.
modules/kvs: add KVS_PROTO_TREEOBJ flag
If the KVS_PROTO_TREEOBJ flag is passed in to kvs.get or
kvs.watch, the dirent is returned in place of a value.
modules/kvs: avoid duplication in kvs_get variants
Factor out private "getobj" function that accepts a flags
argument reimplement kvs_get_<type>() functions using it.
This reduces duplicated code and prepares for a simple
implementation of kvs_get_blobref functions.
modules/kvs: add kvs_get_treeobj()
Add kvs_get_treeobj(), which given a key, obtains a directory
entry object.  The directory entry object may be passed
to kvs_put_treeobj().

garlick added some commits Sep 8, 2016

modules/kvs: reimplement kvs_copy(), kvs_move()
kvs_copy() and kvs_move() are trivially reimplemented
using treeobj operations.  We simply overwrite the destination
with a copy of the source's directory entry.

@garlick garlick force-pushed the garlick:issue_778 branch from 817af4d to 52a9331 Sep 12, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

Rebased after merge of lua fixes.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 12, 2016

I should ask a more general question then: do we want a C API to pick apart these objects, or should we just define the JSON structure in the RFC 11 and let users use their chosen JSON library to manipulate them? It sometimes seems like overkill to add iterators and accessors to abstract times built from JSON, propagate those to bindings, etc., especially when bindings may have great JSON integration. Thoughts?

Yes, keeping treeobj as JSON seems great to me! I hadn't quite wrapped my brain around what the objects would eventually look like and how they would be used, and the description of them as "opaque" got my thinking sideways, sorry! I think your plan to treat them like the rest of the JSON messages we use in Flux is most wise.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

My goal was to have the objects be pretty simple. A first cut is outlined in RFC11.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 12, 2016

Coverage Status

Coverage increased (+0.08%) to 75.16% when pulling 52a9331 on garlick:issue_778 into ec5bd1f on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

By the way, rereading what I wrote above, I did refer to object manipulation functions coming later, so my apologies to you for letting my thoughts drift and not clarifying!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 12, 2016

No my fault, I was confused by "values are supposed to be treated as opaque" statement.

Is this close to a merge candidate?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

Is this close to a merge candidate?

I think so but happy to address any concerns.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 12, 2016

Not sure exactly why yet, but I'm getting one failure from t1000-kvs-basic.t on this branch:
(on TOSS3 system)

expecting success: 
    flux kvs unlink $TEST &&
    dd if=/dev/random bs=4096 count=1 | flux kvs copy-tokvs $TEST.a - &&
    flux kvs get-treeobj $TEST.a | grep -q "FILEREF"

0+1 records in
0+1 records out
12 bytes (12 B) copied, 6.2178e-05 s, 193 kB/s
not ok 49 - kvs: get-treeobj: returns value ref for large value
#   
#       flux kvs unlink $TEST &&
#       dd if=/dev/random bs=4096 count=1 | flux kvs copy-tokvs $TEST.a - &&
#       flux kvs get-treeobj $TEST.a | grep -q "FILEREF"
#   

(Also, should you use /dev/urandom here in case entropy is low test won't hang?)

Same test in a session, looks like a FILEVAL is created not a FILEREF?

(flux-70996-) grondo@jade188:~/git/f$ dd if=/dev/random bs=4096 count=1 | flux kvs copy-tokvs x.a -
0+1 records in
0+1 records out
16 bytes (16 B) copied, 7.1836e-05 s, 223 kB/s
(flux-70996-) grondo@jade188:~/git/f$ flux kvs get x.a
{ "data": "gtMAs7gkRJApE0HrIeBqvQ==" }
(flux-70996-) grondo@jade188:~/git/f$ flux kvs get-treeobj x.a
{"FILEVAL":{"data":"gtMAs7gkRJApE0HrIeBqvQ=="}}

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

Hmm does look like the prob is short read on /dev/random. In this case we
may as well use /dev/zero - just need a decent size blob.

On Sep 12, 2016 11:53 AM, "Mark Grondona" notifications@github.com wrote:

Not sure exactly why yet, but I'm getting one failure from
t1000-kvs-basic.t on this branch:

expecting success:
flux kvs unlink $TEST &&
dd if=/dev/random bs=4096 count=1 | flux kvs copy-tokvs $TEST.a - &&
flux kvs get-treeobj $TEST.a | grep -q "FILEREF"

0+1 records in
0+1 records out
12 bytes (12 B) copied, 6.2178e-05 s, 193 kB/s
not ok 49 - kvs: get-treeobj: returns value ref for large value

flux kvs unlink $TEST &&

dd if=/dev/random bs=4096 count=1 | flux kvs copy-tokvs $TEST.a - &&

flux kvs get-treeobj $TEST.a | grep -q "FILEREF"

(Also, should you use /dev/urandom here in case entropy is low test won't
hang?)

Same test in a session, looks like a FILEVAL is created not a FILEREF?

(flux-70996-) grondo@jade188:/git/f$ dd if=/dev/random bs=4096 count=1 | flux kvs copy-tokvs x.a -
0+1 records in
0+1 records out
16 bytes (16 B) copied, 7.1836e-05 s, 223 kB/s
(flux-70996-) grondo@jade188:
/git/f$ flux kvs get x.a
{ "data": "gtMAs7gkRJApE0HrIeBqvQ==" }
(flux-70996-) grondo@jade188:~/git/f$ flux kvs get-treeobj x.a
{"FILEVAL":{"data":"gtMAs7gkRJApE0HrIeBqvQ=="}}


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#801 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAKX25BqAvVJeIVdkkxucnKFwyXCOmVKks5qpZ-mgaJpZM4J4lh9
.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Sep 12, 2016

Just FYI -- lack of entropy with random came up recently as a user issue and /dev/urandom (which doesn't block) was also recommended with the following link on security: http://www.2uo.de/myths-about-urandom

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

Yeah, I knew that one but apparently it didn't stop me from typing it. Fix coming.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 12, 2016

Coverage Status

Coverage increased (+0.08%) to 75.16% when pulling 1051a53 on garlick:issue_778 into ec5bd1f on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 12, 2016

Great, thanks for the fix. Want to squash it then I'll merge?

@garlick garlick force-pushed the garlick:issue_778 branch from 1051a53 to c94f490 Sep 12, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Sep 12, 2016

Squashed.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 12, 2016

Thanks! BTW, this functionality + a little magic will really help in testing as in #777.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 12, 2016

Coverage Status

Coverage increased (+0.04%) to 75.121% when pulling c94f490 on garlick:issue_778 into ec5bd1f on flux-framework:master.

@grondo grondo merged commit 3719509 into flux-framework:master Sep 12, 2016

4 checks passed

codecov/patch 83.33% of diff hit (target 74.76%)
Details
codecov/project 74.80% (+0.03%) compared to ec5bd1f
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.04%) to 75.121%
Details

@grondo grondo removed the review label Sep 12, 2016

@garlick garlick deleted the garlick:issue_778 branch Sep 12, 2016

@garlick garlick referenced this pull request Sep 13, 2016

Closed

Sched support for RFC 14 #192

@garlick garlick referenced this pull request Oct 26, 2016

Closed

0.5.0 release notes #879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.