Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: refactor kvs cache to handle raw data as "primary" data #1246

Merged
merged 13 commits into from Oct 24, 2017

Conversation

Projects
None yet
4 participants
@chu11
Copy link
Contributor

chu11 commented Oct 20, 2017

This PR refactors the KVS cache to favor the storage of raw data over json objects. This accomplishes several goals.

  1. before this PR, KVS cache entries had a data "type" associated with it. This can lead to a "race" in which how the data was loaded into the KVS cache the first time and could lead to errors/bugs later. For example, if data was incorrectly loaded into the KVS cache as type json, a later attempt to read the data out of the KVS cache as raw data would fail, leading to an error.

  2. This should clean up the code and hopefully make it less confusing. Many unit tests were removed in this PR, so I feel the less confusing part was accomplished.

The primary idea behind this refactor to remove the "type" system with KVS cache entries and make the KVS cache primarily for storing raw data.

Users can set get/set json objects in the KVS cache, but the API is simply a set of convenience functions converting those json objects to/from their raw string form.

As an aside, I am sometimes anal when it comes to code "ordering". In commit chu11@368f858 I literally just move "json" code below "raw data" code, b/c I want "raw data" code to be listed first as it is now the "primary" way the KVS cache works. I know its more code change than may be necessary.

chu11 added some commits Oct 18, 2017

modules/kvs/cache: Remove type parameter
Remove cache entry "type" parameter / concept from external facing
API.  It is still used internally for several purposes.

Adjust / remove many cache unit tests as a result.

@chu11 chu11 force-pushed the chu11:issue1239 branch from c5961d2 to 3cc234f Oct 20, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Oct 20, 2017

One of the builds hung and timed out, restarted it. One hung and timed out in #1244 earlier today and @garlick suggested one did in #1243. I'm concerned there is a racy-hang somewhere in the tests. But it's hard to see where it could be given the output of travis.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Oct 20, 2017

This is a high value cleanup! I will try provide some review commands before monday.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 20, 2017

Coverage Status

Coverage decreased (-0.04%) to 78.538% when pulling 3cc234f on chu11:issue1239 into b31a944 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Oct 20, 2017

Codecov Report

Merging #1246 into master will decrease coverage by 0.05%.
The diff coverage is 71.26%.

@@            Coverage Diff             @@
##           master    #1246      +/-   ##
==========================================
- Coverage   77.99%   77.93%   -0.06%     
==========================================
  Files         154      154              
  Lines       28964    28902      -62     
==========================================
- Hits        22590    22525      -65     
- Misses       6374     6377       +3
Impacted Files Coverage Δ
src/modules/kvs/lookup.c 82.72% <50%> (-1.46%) ⬇️
src/modules/kvs/kvs.c 62.7% <53.12%> (-1.02%) ⬇️
src/modules/kvs/cache.c 88.83% <83.72%> (-3.54%) ⬇️
src/modules/kvs/commit.c 76.98% <87.5%> (+0.33%) ⬆️
src/common/libkvs/kvs_lookup.c 75.19% <0%> (-1.56%) ⬇️
src/common/libutil/blobref.c 97.22% <0%> (-1.39%) ⬇️
src/common/libkvs/kvs.c 64.11% <0%> (-0.97%) ⬇️
src/broker/overlay.c 73.88% <0%> (-0.32%) ⬇️
... and 9 more
@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Oct 20, 2017

Just missed having coverage on the diff. Looking through coverage diff, I don't think I can eek out a few more lines. It's all bad error path scenarios.

@garlick
Copy link
Member

garlick left a comment

I made a few comments inline.

@@ -59,8 +59,9 @@ struct cache_entry {
waitqueue_t *waitlist_valid;
void *data; /* value object/data */
int len;
bool data_valid; /* flag indicating if data set, don't use
* data == NULL as test */
bool entry_valid; /* flag indicating if data set, don't use

This comment has been minimized.

@garlick

garlick Oct 23, 2017

Member

suggestion: rename to valid and use same declaration style as dirty (either bool or bitfield is fine with me).

This comment has been minimized.

@chu11

chu11 Oct 23, 2017

Author Contributor

agreed that this should be consistent, however I'll make a new issue and clean up in another PR. Some functions (such as cache_entry_clear_dirty()) return the current setting of the dirty bit as a 1 or 0. This should be a bool, not a 1 or 0. It'll be a bit more cleanup than I'd like in this PR.

else if (hp->type == CACHE_DATA_TYPE_RAW)
free (hp->data);
}
if (hp->o)

This comment has been minimized.

@garlick

garlick Oct 23, 2017

Member

test for null is redundant. both free() and json_decref() are no-ops if argument is NULL.

@@ -84,42 +84,42 @@ struct cache_entry *cache_entry_create (void)
return hp;
}

struct cache_entry *cache_entry_create_json (json_t *o)
struct cache_entry *cache_entry_create_raw (void *data, int len)

This comment has been minimized.

@garlick

garlick Oct 23, 2017

Member

It would appear this function has no users?

return hp;
}

struct cache_entry *cache_entry_create_raw (void *data, int len)
struct cache_entry *cache_entry_create_json (json_t *o)

This comment has been minimized.

@garlick

garlick Oct 23, 2017

Member

It would appear this function has only one user in setroot_event_cb().

suggest changing that user to cache_entry_create() + cache_entry_set_json()

This comment has been minimized.

@chu11

chu11 Oct 23, 2017

Author Contributor

This sounds like a good idea. Was honestly surprised the cache_entry_create_raw() and cache_entry_create_json functions had almost no use.

else if (!hp->data) {
/* attempt to change already valid cache entry,
* cannot, must call cache_entry_clear_data() */
errno = EBADE;

This comment has been minimized.

@garlick

garlick Oct 23, 2017

Member

this function doesn't set errno in all the -1 return cases (e.g. if args are invalid)

the second test for !hp->data is redundant with the first leg of the conditional.

maybe it would be better to change the hp->entry_valid leg of the outer conditional to simply

if (hp->entry_valid) {
    if (hp->o)
        json_decref (o); // already stored
    else
        hp->o = o;
    assert (hp->len > 0 && hp->data != NULL); // o/w would not be valid json
} else ...

This comment has been minimized.

@chu11

chu11 Oct 23, 2017

Author Contributor

ugh, why did I add the redundant check :-)

let me think about how to handle the logic here, as I seemed to have copied some logic from the set_raw() function, but perhaps it shouldn't have been copied over in that way.

json_decref (o);
else if (!hp->data) {
/* attempt to change already valid cache entry,
* cannot, must call cache_entry_clear_data() */

This comment has been minimized.

@garlick

garlick Oct 23, 2017

Member

cache_entry_clear_data() doesn't appear to have any users

This comment has been minimized.

@chu11

chu11 Oct 23, 2017

Author Contributor

Yeah created this function as a "just in case", knowing there were no real use cases at the time. Perhaps would be wise to remove. If anything, users can cache_remove_entry then recreate if they are in dire straits to put new data in the cache entry.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Oct 23, 2017

just re-pushed with some fixes. Removed cache_entry_clear_data(), cache_entry_create_raw(), and cache_entry_create_json().

Changed the logic in cache_entry_set_json() a bit to hopefully make more sense.

Removed unnecessary checks for NULL in cache_entry_destroy()

Will do the cleanup regarding consistent dirty bit and valid bit in a separate PR (#1247).

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 23, 2017

Coverage Status

Coverage decreased (-0.08%) to 78.494% when pulling c1f8414 on chu11:issue1239 into b31a944 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 24, 2017

Coverage Status

Coverage decreased (-0.07%) to 78.507% when pulling 67ca6a9 on chu11:issue1239 into b31a944 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Oct 24, 2017

coverage diff fell from about 77% to about 71%, I believe mostly due to high coverage functions that were removed.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Oct 24, 2017

IF you're feeling OK with this, I'd say squash it down and we can merge it.

chu11 added some commits Oct 18, 2017

modules/kvs/cache: Refactor data store/free
Internally refactor KVS cache code to store json object and
raw data memory in separate variables instead of one.
modules/kvs/cache: Code re-org
Make 'raw' KVS cache functions the primary function over 'json'
functions by moving them to be the first function listed over the
'json' equivalent.  There is no functional change in this commit,
only the movement and re-ordering of code.
modules/kvs: Refactor cache API
Refactor cache API to make all cache entries store raw data instead
of raw or json data (but not both).  All json functions are now convenience
functions operating under the assumption of raw data underneath.  For example,
cache_entry_create_json() and cache_entry_set_json() are convenience
functions that take a json object and extract the raw string out of
it.  The raw string is now the primary data of the cache entry.  The
cache_entry_get_json() function is a convenience function that returns
the json object equivalent of the raw data string stored within a
cache entry.

To avoid regularly encoding/decoding raw data into/from json objects,
a json object is cached in the cache entry.

Internally, the cache_data_type_t is no longer needed and has been
removed.

Update and add unit tests appropriately.
modules/kvs/lookup: Update for KVS cache changes
Update internal lookup API for KVS cache changes.

Adjust callback lookup_ref_f to not pass back raw_data flag, as it
is no longer relevant.  Adjust several log error messages to make more
sense given changes.

Update tests appropriately.  Most notably, valref's that previously
had a (at the time invalid) dirref reference within it, will now pass.
Valref blobrefs can now to point to anything.  If user wishes to
point to a random treeobj, they can.
modules/kvs/commit: Update for KVS cache changes
Remove cache type usage in internal commit API.  Take advantage
of fact all cache entries have raw data.

Adjust unit tests appropriately for loss of cache types.
modules/kvs: Update main kvs for cache changes
Handle the fact that KVS cache now automatically converts raw data
to json and json to raw data, so that the conversion no longer
needs to be done at this level.

Also adjust to removal of cache entry types.

Fixes #1239
t/kvs: Adjust kvs tests for cache changes
Most notably, valrefs that previously had a (at the time invalid)
dirref reference within it, will now pass.  Valref blobrefs can
now point to anything.
modules/kvs: Remove cache_entry_clear_data()
Function was unused except in tests, so remove it.

Remove / adjust unit tests appropriately.
modules/kvs: Remove unused/rarely used functions
Remove cache_entry_create_raw(), which was unused except in tests.

Remove cache_entry_create_json(), which was unused except for only
1 location.

Replace calls to these functions with cache_entry_create() and
call to either cache_entry_set_raw() or cache_entry_set_json().

Adjust unit tests appropriately.
modules/kvs/cache: Set EINVAL in several functions
In cache_entry_set_raw() and cache_entry_set_json(), set errno = EINVAL
when input is invalid.

Update unit tests appropriately.

@chu11 chu11 force-pushed the chu11:issue1239 branch from 67ca6a9 to b8b8c87 Oct 24, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Oct 24, 2017

squashed and re-pushed

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 24, 2017

Coverage Status

Coverage decreased (-0.03%) to 78.546% when pulling b8b8c87 on chu11:issue1239 into b31a944 on flux-framework:master.

@garlick garlick merged commit 0dfdcfc into flux-framework:master Oct 24, 2017

2 of 4 checks passed

codecov/patch 71.26% of diff hit (target 77.99%)
Details
codecov/project 77.93% (-0.06%) compared to b31a944
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.03%) to 78.546%
Details

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.