Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: [cleanup] improve isolation in internal cache #1274

Merged
merged 8 commits into from Nov 8, 2017

Conversation

Projects
None yet
4 participants
@garlick
Copy link
Member

garlick commented Nov 7, 2017

As discussed in #1264, cache_entry_set_raw() and cache_entry_set_treeobj() take ownership of a data/json_t pointer allocated in the caller. The caller must be trusted not to free or modify the data afterward, or the cache will be corrupted. This seems a bit unsafe.

Copy the data, increasing isolation between the cache and its callers.

In the treeobj case, avoid caching the json_t object directly, since it is hard to ensure that the caller won't change it later. We could json_deep_copy() it, but it is perhaps cheaper to skip caching it here, and instead allow the object to be regenerated from the encoded data on first use (if it gets used at all).

Plus one small optimization: avoid an extra call to treeobj_encode in the commit path, where we had to first encode the object to compute its hash key, then again to store it. Just encode it once, then compute the hash key, then store the raw data.

@garlick garlick requested a review from chu11 Nov 7, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.003%) to 78.517% when pulling 1695116 on garlick:cache_copy into d7300ec on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Nov 7, 2017

Codecov Report

Merging #1274 into master will increase coverage by 0.06%.
The diff coverage is 61.29%.

@@            Coverage Diff             @@
##           master    #1274      +/-   ##
==========================================
+ Coverage   77.88%   77.94%   +0.06%     
==========================================
  Files         154      154              
  Lines       29124    29097      -27     
==========================================
- Hits        22684    22681       -3     
+ Misses       6440     6416      -24
Impacted Files Coverage Δ
src/common/libkvs/kvs.c 65.15% <ø> (+1.11%) ⬆️
src/modules/kvs/commit.c 75.77% <52.94%> (+0.98%) ⬆️
src/modules/kvs/kvs.c 63.75% <59.09%> (+1.12%) ⬆️
src/modules/kvs/cache.c 90.44% <78.94%> (+1.72%) ⬆️
src/common/libflux/request.c 88.46% <0%> (-1.29%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/broker/modservice.c 80.58% <0%> (-0.98%) ⬇️
src/common/libflux/response.c 83.73% <0%> (-0.82%) ⬇️
src/modules/kvs/lookup.c 80% <0%> (-0.48%) ⬇️
src/broker/module.c 83.79% <0%> (-0.28%) ⬇️
... and 6 more
if (!(cpy = malloc (len)))
return -1;
memcpy (cpy, data, len);

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

nit, lingering empty line

* that alrdady has data stored.
* to set new data in a cache entry will silently succeed.
* A treeobj object passed to cache_entry_set_treeobj() will be
* json_decref()'d for a cache entry that alrdady has data stored.

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

typo "already"

* cache_entry_set_treeobj() will be json_decref()'d for a cache entry
* that alrdady has data stored.
* to set new data in a cache entry will silently succeed.
* A treeobj object passed to cache_entry_set_treeobj() will be

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

Is this correct "A treeobj object passed to cache_entry_set_treeobj() will be json_decref()'d for a cache entry that alrdady has data stored."? Doesn't seem to be the case anymore.

data = NULL;

ok (cache_entry_set_raw (e, NULL, 0) < 0
ok (cache_entry_set_raw (e, data, 4) < 0

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

actually, I think we want to keep the cache_entry_set_raw (e, NULL, 0) test, and also add cache_entry_set_raw (e, data, 4) as a new test to detect EBADE. Both are good corner case catches.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 7, 2017

@garlick overall looks good, lots of good cleanup. Just some nits I found along the way.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

Thanks - here come a few commits to address your issues. I'll squash once we're done here.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.04%) to 78.552% when pulling 88043ad on garlick:cache_copy into d7300ec on flux-framework:master.

@garlick garlick force-pushed the garlick:cache_copy branch from 88043ad to e937cc7 Nov 7, 2017

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

I went ahead and squashed those fixes. Hope that's OK...

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.04%) to 78.556% when pulling e937cc7 on garlick:cache_copy into d7300ec on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 7, 2017

LGTM, I'm not entirely sure why the merge button isn't green though. Everything seems to have passed. Perhaps something buggy in travis. I'm going to restart the builder.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 7, 2017

hmmm, travis and github don't seem synced up, it's really slow. Will merge whenever that's resolved.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

I restarted one builder that got stuck in the wreck tests (or maybe it was running slow and only got that far -hard to tell).

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

Uh, it looks like I at least missed a comment in store_initial_rootdir(). Possibly that code should be converted from treeobj_hash() + cache_entry_set_treeobj() to treeobj_encode() + cache_entry_set_raw() also...

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 7, 2017

ahhh, just like you did in store_cache(), yeah that could be enhanced too. Although, store_initial_rootdir() is a one time initialization, so avoiding the extra treeobj_encode() shouldn't matter too much.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

We can eliminate cache_entry_set_treeobj() that way though. Hang on, I've got a couple more commits for you to review.

@garlick garlick force-pushed the garlick:cache_copy branch from e937cc7 to 8dcaea2 Nov 7, 2017

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

Still need to cull some cache unit tests but wanted your opinion @chu11 on whether dropping set_treeobj() was a step in the right direction? It seemed like it was encouraging callers to encode once to compute the hash key, and again implicitly when setting the cache entry content. It seems better to pick one place to avoid the extra work and potentially getting something wrong in one and corrupting the cache.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.06%) to 78.573% when pulling 8dcaea2 on garlick:cache_copy into d7300ec on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.005%) to 78.52% when pulling e99c171 on garlick:cache_copy into d7300ec on flux-framework:master.

}
}

if (!json_is_null (rootdir))

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

i think it's important to leave a comment here indicating that if an error occurs in prime_cache_with_rootdir(), it's no big deal. We still set the setroot and things are generally fine.

This comment has been minimized.

@garlick

garlick Nov 7, 2017

Author Member

I did make the function return void and put a note about failure being OK over the function. More blatant warning needed you think?

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

yeah, just a bit more blatant is what I'm thinking.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 7, 2017

Good job on cleaning up set_initial_rootdir(), obviously didn't have to set dirty bit b/c the storage is synchronous.

Overall, I think trying to get rid of cache_entry_set_treeobj() is a good step to take. It removes a lot of additional complication in the cache.

void *data = NULL;
int len;

if (treeobj_validate (rootdir) < 0 || treeobj_is_dir (rootdir)) {

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

!treeobj_is_dir()?

This comment has been minimized.

@garlick

garlick Nov 7, 2017

Author Member

Just ensuring that the "rootdir" really is a dir, not some other object. Unlikely since the master would have already checked that. Drop?

This comment has been minimized.

@chu11

chu11 Nov 7, 2017

Contributor

I was thinking it should it be opposite logic. If is not-dir, then error out. Right now it errors if it is a dir.

This comment has been minimized.

@garlick

garlick Nov 7, 2017

Author Member

Oh, reverse logic, got it!

garlick added some commits Nov 6, 2017

modules/kvs: increase isolation in cache_entry_set_*
Problem: cache_entry_set_raw() and cache_entry_set_treeobj()
take ownership of a data/json_t pointer allocated in the
caller.  The caller must be trusted not to free or modify
the data afterward, or the cache will be corrupted.

Copy the data, making it easier to guarantee that the result
is not modified or freed by the original caller. In the
case of json_t, cache the encoded data but skip caching
the original object.  The object will be decoded from the
data (and then cached) upon first access.

Update users in kvs.c and commit.c.

Update unit tests.
modules/kvs: avoid extra treeobj_encode in store_cache
Problem: treeobj that is to be stored in the cache is encoded
once to calculate its hash, and again to be stored in raw
form.

Just encode the treeobj in raw form, then calculate its
hash and store it raw.
modules/kvs: streamline store_initial_rootdir
Problem: rootdir that is to be stored in the cache is encoded
once to calculate its hash, and again to be stored in raw form.

Change store_initial_rootdir() to use treeobj_encode(),
then cache_entry_set_raw().
modules/kvs: fix confusing root naming
Problem: the name "rootdir" is used to refer to the root
blobref, which is easily confused with the root directory
object.

Rename "rootdir" to "rootref" where appropriate.

Create a 'struct kvsroot' to hold the root blobref and
sequence number, since these are tightly coupled and
there will need to be more than one pair for when
planned support for multiple namespaces is implemented.
modules/kvs: change protocol for better naming
Problem: update kvs.sync, kvs.getroot, kvs.setroot (rpc),
and kvs.setroot (event) protocol uses naming inconsistent
with current norms in the code.

Change "rootdir" to "rootref", and "rootdirval" to "rootdir"
where appropriate.

@garlick garlick force-pushed the garlick:cache_copy branch from e99c171 to e4c605f Nov 7, 2017

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

@chu11: this look OK (last 2 commits)? If so I'll squash.

@chu11

This comment has been minimized.

Copy link
Contributor

chu11 commented Nov 7, 2017

yup, squash away

@garlick garlick force-pushed the garlick:cache_copy branch from e4c605f to ba26ea2 Nov 7, 2017

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Nov 7, 2017

OK, those are squashed now.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.07%) to 78.573% when pulling e4c605f on garlick:cache_copy into 4456339 on flux-framework:master.

garlick added some commits Nov 7, 2017

modules/kvs: setroot event handler cleanup
Problem: the important kvs.setroot event handler is a
bit complicated by the fact that the directory object is
optionally provided in the event message as an optimization.

Split the optimization off to a separate function,
don't try too hard to handle the case of an existing but
dirty/invalid cache entry, and recompute the hash after
encoding the directory rather than trusting the hash received
in the message to match.

This eliminates the last user of cache_entry_set_treeobj().
modules/kvs: drop cache_entry_set_treeobj()
This function no longer has any users except in tests.
Replicate it as a convenience function in the tests.
test/kvs/cache: drop some unnecessary tests
Drop tests that poke at cache_entry_set_treeobj() which
is now gone, or expect cache entries containing raw
versus treeobj data to behave differently in ways
that are no longer possible.
@coveralls

This comment has been minimized.

Copy link

coveralls commented Nov 8, 2017

Coverage Status

Coverage increased (+0.03%) to 78.541% when pulling ba26ea2 on garlick:cache_copy into 4456339 on flux-framework:master.

@chu11 chu11 merged commit 14943b8 into flux-framework:master Nov 8, 2017

3 of 4 checks passed

codecov/patch 61.29% of diff hit (target 77.88%)
Details
codecov/project 77.94% (+0.06%) compared to 4456339
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.03%) to 78.541%
Details

@garlick garlick deleted the garlick:cache_copy branch Nov 8, 2017

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.