Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: rework internal metadata format to conform to RFC 11 design #1154

Merged
merged 27 commits into from Aug 25, 2017

Conversation

Projects
None yet
5 participants
@chu11
Copy link
Contributor

chu11 commented Aug 18, 2017

This is a series of patches by myself and @garlick to integrate Tree Object Format specified in RFC 11 (https://github.com/flux-framework/rfc/blob/master/spec_11.adoc) in the KVS.

This PR is not a full implementation of RFC11, but is a simplified one, as to make the initial implementation in flux-core a bit easier. It doesn't include the hdir type and all arrays of blobrefs are implemented with a maximum length of 1. To some extent, it's like the RFC11 equivalent of the older implementation.

A lot of code here. After this weekend I want to review this all again with fresh eyes. But we'll start with this and hopefully this it's pretty close to what we want.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 18, 2017

Coverage Status

Coverage increased (+0.08%) to 77.927% when pulling 56d8166 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Aug 18, 2017

Codecov Report

Merging #1154 into master will increase coverage by 0.16%.
The diff coverage is 78.7%.

@@            Coverage Diff             @@
##           master    #1154      +/-   ##
==========================================
+ Coverage   77.56%   77.73%   +0.16%     
==========================================
  Files         158      158              
  Lines       28758    29074     +316     
==========================================
+ Hits        22307    22601     +294     
- Misses       6451     6473      +22
Impacted Files Coverage Δ
src/common/libkvs/kvs_commit.c 86.66% <ø> (ø) ⬆️
src/common/libkvs/kvs_dir.c 92.5% <100%> (+12.77%) ⬆️
src/cmd/flux-kvs.c 76.62% <100%> (ø) ⬆️
src/bindings/lua/flux-lua.c 81.67% <100%> (-0.1%) ⬇️
src/common/libkvs/kvs_lookup.c 68.04% <62.16%> (-16.17%) ⬇️
src/modules/kvs/commit.c 77.77% <73.01%> (+1%) ⬆️
src/modules/kvs/lookup.c 84.69% <74.19%> (-2.4%) ⬇️
src/modules/kvs/kvs.c 63.5% <75%> (+0.3%) ⬆️
src/common/libkvs/kvs_watch.c 87.22% <81.81%> (-0.22%) ⬇️
src/common/libkvs/kvs_txn.c 77.12% <82.43%> (+19.15%) ⬆️
... and 18 more
@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 18, 2017

Coverage Status

Coverage increased (+0.1%) to 77.955% when pulling 807dacc on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 18, 2017

Coverage Status

Coverage increased (+0.2%) to 78.031% when pulling 175ba7a on chu11:rfc11-final into 0c63795 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 18, 2017

Oh yeah, it's probably worth mentioning, soak hasn't been run yet and should.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 18, 2017

Got a valgrind error on one run. Not entirely sure what happened.

==26802== Memcheck, a memory error detector
==26802== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26802== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==26802== Command: /home/travis/build/flux-framework/flux-core/flux-core-dfff6c3/_build/sub/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /home/travis/build/flux-framework/flux-core/flux-core-dfff6c3/t/valgrind/valgrind-workload.sh 10
==26802== 
2017-08-18T23:13:53.073763Z broker.err[0]: rc1: flux-module: flux_open: No such file or directory
2017-08-18T23:13:53.119203Z broker.err[0]: Run level 1 Exited with non-zero status (rc=1) 0.9s
2017-08-18T23:13:53.384930Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] userdb: No such file or directory
2017-08-18T23:13:53.485824Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] cron: No such file or directory
2017-08-18T23:13:53.559402Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] job: No such file or directory
2017-08-18T23:13:53.670238Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] resource-hwloc: No such file or directory
2017-08-18T23:13:53.739372Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] aggregator: No such file or directory
2017-08-18T23:13:53.822058Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] kvs: No such file or directory
2017-08-18T23:13:53.899850Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] barrier: No such file or directory
2017-08-18T23:13:53.985997Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] content-sqlite: No such file or directory
==26802== 
==26802== HEAP SUMMARY:
==26802==     in use at exit: 81,357 bytes in 61 blocks
==26802==   total heap usage: 7,400 allocs, 7,339 frees, 2,239,782 bytes allocated
==26802== 
==26802== LEAK SUMMARY:
==26802==    definitely lost: 0 bytes in 0 blocks
==26802==    indirectly lost: 0 bytes in 0 blocks
==26802==      possibly lost: 0 bytes in 0 blocks
==26802==    still reachable: 81,357 bytes in 61 blocks
==26802==         suppressed: 0 bytes in 0 blocks
==26802== Reachable blocks (those to which a pointer was found) are not shown.
==26802== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26802== 
==26802== For counts of detected and suppressed errors, rerun with: -v
==26802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
not ok 1 - valgrind reports no new errors on single broker run
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 19, 2017

goto inval;
}
else if (!strcmp (type, "val")) {
}

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

I think that there can be a json_is_string() check here? B/c we are base64 encoding, the data should always be a string?

This comment has been minimized.

@garlick

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

what is the reference to "test_validate" in the commit message?

This comment has been minimized.

@chu11

chu11 Aug 22, 2017

Author Contributor

crud, I meant treeobj_validate. That commit message will be eliminated when I squash later on.

char *data;

if (json_unpack (obj, "{s:i s:s s:s}", "ver", &version,
"type", &type,

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

could use treeobj_unpack() here?

This comment has been minimized.

@garlick

garlick Aug 21, 2017

Member

agreed.

count = json_object_size (data);
}
else if (!strcmp (type, "symlink") || !strcmp (type, "value")) {
count = 1;

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

should be "val", not "value". A unit test should have caught this, so perhaps a unit test is missing.

/* get type-specific count.
* For dirref/valref, this is the number of blobrefs.
* For directory, this is number of entries
* For symlink or value, this is 1.

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

val instead of value for consistency.

return 0;
error:
json_decref (dirent);

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

do we have to do saved_errno here?

goto done;
rc = 0;
done:
json_decref (dirent);

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

do we have to do saved_errno here?

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

It might be clearer to have a dedicated error label that explicitly does a

error:
    saved_errno = errno;
    // ...cleanup...
    errno = saved_errno
    return -1

The downside is that cleanup has to be duplicated in both success and failure exit paths. However the other way seems a little bit brittle to somebody forgetting to set saved_errno in an exit path.

goto done;
rc = 0;
done:
json_decref (dirent);

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

do we have to do saved_errno here?

goto done;
rc = 0;
done:
json_decref (dirent);

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

do we have to do saved_errno here?

if (flux_future_aux_set (f, auxkey, ctx, (flux_free_f)free_ctx) < 0) {
free_ctx (ctx);
flux_future_destroy (f);
}

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

I think there should be a return NULL here?

}
if (flux_future_aux_set (f, auxkey, ctx, (flux_free_f)free_ctx) < 0) {
free_ctx (ctx);
flux_future_destroy (f);
}

This comment has been minimized.

@chu11

chu11 Aug 21, 2017

Author Contributor

I think there should be a return NULL here?

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 21, 2017

Couple general responses to your review comments above, in addition to the ones I made inline:

  • On saved_errno, I thought we determined free() and json_decref() didn't set errno, but it won't hurt to add that in
  • Missing return in aux_set error paths - grievous error! Good catch!
  • use of val vs value - yes cleaning that up is good.

If you already have these changes made please go ahead and tack them on; o/w let me know and I can do them

Apologies for taking longer than I thought to get back on this.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 21, 2017

@garlick I can't remember if we came to a conclusion if json_decref was safe or not, as it may be safe now, but would it in the future? I'll go ahead and do the remaining fix ups.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 21, 2017

@garlick Oh, for what its worth you decided to do it here. So we should probably do saved_errno around json_decref atleast for consistency.

void flux_kvs_txn_destroy (flux_kvs_txn_t *txn)
{
    if (txn) {
        int saved_errno = errno;
        json_decref (txn->ops);
        free (txn);
        errno = saved_errno;
    }
}
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 21, 2017

works for me.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 21, 2017

Coverage Status

Coverage increased (+0.008%) to 77.854% when pulling 8259d70 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 21, 2017

Coverage Status

Coverage increased (+0.03%) to 77.88% when pulling a3cacf1 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 22, 2017

Seem to have a low coverage diff. There's a few bad-input and missed code paths that I think we can improve on. Can add some unit tests.

Noticed flux_kvs_lookup_get_raw() was completely untested. Was going to leave flux-kvs comand raw input/output options for another PR, as its sort of independent of this PR (see chu11@c5b36e5), so perhaps the flux_kvs_lookup_get_raw() patches should be spliced out and put in there as well for another day?

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 22, 2017

Yeah that sounds like a good idea to me,

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 22, 2017

Coverage Status

Coverage increased (+0.09%) to 77.934% when pulling 7581258 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 22, 2017

Coverage Status

Coverage increased (+0.2%) to 78.039% when pulling 229cf18 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 22, 2017

Coverage Status

Coverage increased (+0.1%) to 77.952% when pulling 229cf18 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 22, 2017

BTW the valgrind error above is a recurrence of the one @grondo reported in #677 and added some code to address in PR #1148. If this continues to be a problem, we might want to increase the FLUX_LOCAL_CONNECTOR_RETRY_COUNT in the valgrind test script a bit more.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 22, 2017

code cov diff is now 76.61%, with a target of 77.49%. Once rebase + squash is done and patch for flux_kvs_lookup_get_raw() is spliced out we should be well above target (or close enough we don't care). So I think we're good here now.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 22, 2017

Yeah, nice work. Sometimes it is difficult to get patch coverage up dependent on what kind of error handling you have. Admirable job with the coverage here!

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 22, 2017

Yep, still doing some review/poking but I've not found anything worthy of a comment yet. I'll try to wrap that up this morning so we can move forward.

@garlick garlick referenced this pull request Aug 22, 2017

Closed

0.8.0 Release #1160

@garlick
Copy link
Member

garlick left a comment

Looks great on the whole. I've made a few minor comments inline.

errno = 0;
rc = flux_kvs_txn_pack (txn, 0xFFFF, "foo.bar.blorp", "s", "foo");
ok (rc < 0 && errno == EINVAL,
"3: flux_kvs_txn_pack(bad flags) fails with EINVAL");

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

Good addition, however I was misguidedly numbering the operations that are added to the transaction and referencing them later in tests, sort of a way to x-reference. This expected failure doesn't add an operation, so it shouldn't get a number. But maybe it would be a good idea to change the numbers to some unordered mnemonic lie "op-unlink"?

This comment has been minimized.

@chu11

chu11 Aug 22, 2017

Author Contributor

ahhh, i thought you were just plainly numbering from 1-9. So yeah, I'll adjust accordingly.

goto done;
rc = 0;
done:
json_decref (dirent);

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

It might be clearer to have a dedicated error label that explicitly does a

error:
    saved_errno = errno;
    // ...cleanup...
    errno = saved_errno
    return -1

The downside is that cleanup has to be duplicated in both success and failure exit paths. However the other way seems a little bit brittle to somebody forgetting to set saved_errno in an exit path.

goto inval;
}
else if (!strcmp (type, "val")) {
}

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

what is the reference to "test_validate" in the commit message?

if (j_dirent_validate (root_dirent) < 0
|| !(root_ref = json_object_get (root_dirent, "DIRREF"))) {
if (treeobj_validate (root_dirent) < 0
|| treeobj_is_dirref (root_dirent) < 0

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

treeobj_is_dirref() returns a boolean not -1 or 0

This comment has been minimized.

@chu11

chu11 Aug 22, 2017

Author Contributor

This was fixed in a prior commit, wondering why you didn't see it?

175ba7a

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

I was reviewing each commit and going back and forth to the final source. I guess I missed that this one was fix later - sorry about that.

@@ -42,7 +42,7 @@
#include "src/common/libutil/monotime.h"

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

Extra to/for in commit message.
Also maybe "main source module" would be better than "kvs"?

(sorry for placement of this comment - couldn't comment on the commit title directly)

@@ -2,29 +2,185 @@
#include "config.h"

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

improve commit message, like

test/kvs_dir: increase test coverage

the words in the body don't add anything, just drop.

@@ -38,7 +38,7 @@

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

perhaps say "internal lookup API" since there is an external one too

@@ -606,16 +629,30 @@ bool lookup (lookup_t *lh)
lh->errnum = ENOTDIR;
goto done;
}
reftmp = json_string_value (vp);
if (!(reftmp = treeobj_get_blobref (lh->wdirent, 0))) {

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

is it useful to check that the number of blobrefs is always one? Might be handy later when we make changes to allow more than one.

This comment has been minimized.

@chu11

chu11 Aug 22, 2017

Author Contributor

yeah, I think that's good check to add.

@@ -348,6 +348,88 @@ void test_dir (void)
json_decref (dir);

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

maybe replace comment about json_copy() with

json_copy(), which makes a shallow copy of a JSON object, cannot be used directly on a RFC 11 dir object, since the directory itself is one level down under the "data" key.

@@ -37,7 +37,7 @@
#include <jansson.h>

This comment has been minimized.

@garlick

garlick Aug 22, 2017

Member

(commit messge): how about

update internal commit API for RFC 11

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 23, 2017

just pushed a branch with a lot of the fixes listed above. Didn't fix the commit messages, as I'll do that when I squash all of these minor fixes into the earlier commits.

chu11 added some commits Aug 4, 2017

modules/kvs/test: Add new lookup tests
Add new unit tests, detecting if a valref points to an invalid
type, dirref points to an invalid type, caller passes in
illegal root reference, or if dirref/valref have multiple references.
common/libkvs: Add treeobj_copy_dir()
Add treeobj_copy_dir() to perform a shallow copy of a treeobj
dir.  json_copy() is not safe for copying.

Add unit tests appropriately.
modules/kvs: Refactor store_cache() logic
Refactor the logic of store_cache(), so that it no longer steals
references, but lets the caller keep ownership.  Simplifies logic
in several places and makes code easier to understand.
modules/kvs/test: Add new commit tests
Add test for invalid treeobj object as root and dirrefs that
have multiple references.
modules/kvs/test: Cleanup commit tests
Remove unnecessary code, in particular with error type tests.
t/kvs: Fix error in null tests
In original test of setting the string "null" within the KVS there
was an error that worked out by chance but was incorrect.

When doing a kvs put with the string "null", such as with:

flux kvs put key=null

this should create a json null object.  On the other hand:

flux kvs put key=\"null\"

should create a string "null".

Correct for this and create tests for both scenarios.
cmd/kvs,t/kvs,bindings/lua: Fix readlink w/ treeobj
With change to treeobj implementation, readlink code should be
adjusted, calling flux_kvs_lookup_get() instead of
flux_kvs_lookup_get_unpack().
t/kvs: Update t1002-kvs-extra tests for RFC 11
Update t1002-kvs-extra.t tests for RFC 11, notably adjusting
former dirent types (i.e. FILEVAL) for treeobj types (i.e. "val").
t/kvs: Add more tests to t1002-kvs-extra
Add additional bad treeobj test cases.
common/libkvs: Remove jansson_dirent
Remove jansson_dirent internal library, as it is no longer used.

Remove lingering includes of jansson_dirent.h.

@chu11 chu11 force-pushed the chu11:rfc11-final branch from 89e143e to de398d8 Aug 25, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 25, 2017

rebased

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 25, 2017

hit write error and #731 , restarting some builds

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 25, 2017

Coverage Status

Coverage increased (+0.2%) to 78.079% when pulling de398d8 on chu11:rfc11-final into 53e3f12 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 25, 2017

There are some calls to log_msg() added in this PR. Just double checking that the kvs module wants to log to stderr and not to the flux broker ring buffer?

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 25, 2017

@grondo The flux handle isn't available in the internal kvs lookup/commit API, which is why I used the log_msg() function.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 25, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Aug 25, 2017

@garlick We could just pass it in as a parameter to any number of the internal lookup/commit API functions. But not being dependent on the flux handle was part of the initial refactoring goal for these APIs, so that the APIs were independently unit testable.

We could pass in the handle optionally, and only log if the handle is available? But I think that's sort of ugly. I'd rather just remove the log_msg() calls entirely.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 25, 2017

I think the logging, especially of unexpected situations, is pretty valuable - but so is the unit testability. Could we fix by adding a log wrapper of some sort?

I would also be OK with leaving this as is and opening an issue to come back to it later.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 25, 2017

Ok, I didn't see any other issues with the PR and tested it on a few different machines (at least with make check). If you guys are ok with pushing logging changes off I'll merge this one.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 25, 2017

I'm good.

@grondo grondo merged commit 1c99b02 into flux-framework:master Aug 25, 2017

4 checks passed

codecov/patch 78.7% of diff hit (target 77.56%)
Details
codecov/project 77.73% (+0.16%) compared to 53e3f12
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.2%) to 78.079%
Details
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 25, 2017

Great! Thanks!

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 25, 2017

Nice work @chu11, and thanks @grondo for the feedback as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.