Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVS commit refactor #1105

Merged
merged 49 commits into from Jul 12, 2017

Conversation

Projects
None yet
4 participants
@chu11
Copy link
Contributor

chu11 commented Jul 11, 2017

This is a refactor of the "commit" side of the kvs.

There is only really one "epiphany" in this refactor. The original fence_t data structure was split into a fence_t and a commit_t data structure. By doing this, there is now a clear delineation between an actual "fence" that occurs when callers send RPCs and the data structures used behind the scene to actually commit data to the content store. IMO, splitting that up eliminated a lot of the monolithic-ness. Took me awhile to realize this.

After that, there isn't really anything magical per se, just the typical lineage of refactoring 1 tiny thing at a time: code cleanup, api change, move functions into a new file, create an API, etc. There is now a "fence" API and a "commit" API in new files that each have their own unit tests. The "commit" API has a "commit manager" that actually wraps around commit_t.

One interesting refactor of note is the function commit_process() which processes a commit and does all the commit mods and unrolling. It may return missing references to the user to load, or pointers to cache entries that are dirty and need to be backed up. By removing RPCs out of the commit logic, this allows the commit API to be separately unit-testable.

Still need to valgrind test & soak test. The introduction of the commit_t data structure adds some additional malloc() calls per fence. Hopefully this doesn't have a noticeable impact on performance. There are few error cases that are more optimized now which will help balance some things out.

chu11 added some commits Jun 21, 2017

modules/kvs: Code cleanup
Code between relayfence_request_cb and fence_request_cb is nearly
identical, but uses different code style, formatting, and logging
patterns.  Make them the same.
modules/kvs: Code cleanup, remove unused variable
Remove 'ref' from call to content_store_request_send,
in which ref isn't used.
modules/kvs: Code cleanup, remove unnecessary code
Remove flux_msg_destroy() call in fence_finalize(), as the
messages will be deleted shortly thereafter when the fence_t
data structure is destroyed.  Adjust surrounding code appropriately.
modules/kvs: Minor code cleanup
Move variable into the only block it is used.

@chu11 chu11 requested a review from garlick Jul 11, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 11, 2017

Coverage Status

Coverage increased (+0.1%) to 78.495% when pulling 6d374f9 on chu11:kvs_fence_refactor_try25 into 062ade5 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 11, 2017

Looks like clang is sad here:

In file included from ../../../../../src/modules/kvs/kvs.c:55:
../../../../../src/modules/kvs/types.h:3:14: error: redefinition of typedef 'href_t' is a C11 feature [-Werror,-Wtypedef-redefinition]
typedef char href_t[BLOBREF_MAX_STRING_SIZE];
             ^
../../../../../src/modules/kvs/types.h:3:14: note: previous definition is here
typedef char href_t[BLOBREF_MAX_STRING_SIZE];
             ^

while the gcc build is failing the valgrind test

chu11 added some commits Jul 11, 2017

modules/kvs: Rename fence_append_request()
Rename fence_append_request to fence_add_request_copy for clarity.
modules/kvs: Rename fence_append_ops()
Rename to fence_add_request_data() for clarity.
modules/kvs: Refactor fence_append_ops()
Include count increment in fence_append_ops(), which indicates
number of fence counts occuring so far.
modules/kvs: Add fence_process_fence_request()
Create new function fence_process_fence_request(), which wraps
up counting checks to see if a fence is ready for committing.
modules/kvs: Refactor commit_merge_all()
Splice out portion of code into new function fence_merge().
modules/kvs: Split fence_t structure
Split fence_t into fence_t and commit_t structs.  Add new
commit_create() and commit_destroy() functions.  Adjust
fence_create() function.
modules/kvs: Re-org fence_t
Move fence_t into a new file.  Move fence_create(), fence_destroy(),
fence_add_request_data(), fence_add_request_copy(), fence_merge() into
new file too.
modules/kvs: Add fence get/set functions
Add fence_count_reached(), fence_get_flags(), fence_set_flags(),
fence_get_json_ops(), and fence_get_json_names() to fence files
to abstract away fence_t.
modules/kvs: Rename functions
Rename commit_apply_fence() to commit_apply(), as former no longer
makes sense given fence_t struct split.
modules/kvs: Add new fence functions
Add new function fence_iter_request_copies().

@chu11 chu11 force-pushed the chu11:kvs_fence_refactor_try25 branch from 6d374f9 to e5b5583 Jul 11, 2017

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 11, 2017

Codecov Report

Merging #1105 into master will increase coverage by 0.08%.
The diff coverage is 85.88%.

@@            Coverage Diff             @@
##           master    #1105      +/-   ##
==========================================
+ Coverage   78.13%   78.22%   +0.08%     
==========================================
  Files         155      157       +2     
  Lines       25959    26132     +173     
==========================================
+ Hits        20284    20442     +158     
- Misses       5675     5690      +15
Impacted Files Coverage Δ
src/modules/kvs/cache.c 93.89% <100%> (+0.24%) ⬆️
src/modules/kvs/json_util.c 93.33% <100%> (+2.42%) ⬆️
src/common/libkvs/json_dirent.c 97.72% <100%> (ø) ⬆️
src/modules/kvs/kvs.c 79.11% <80.9%> (-0.32%) ⬇️
src/modules/kvs/fence.c 85.48% <85.48%> (ø)
src/modules/kvs/commit.c 87.65% <87.65%> (ø)
src/cmd/flux-event.c 65.55% <0%> (-1.12%) ⬇️
src/common/libutil/dirwalk.c 93.33% <0%> (-0.75%) ⬇️
src/broker/overlay.c 71.32% <0%> (-0.7%) ⬇️
src/common/libutil/veb.c 98.31% <0%> (-0.57%) ⬇️
... and 12 more
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 12, 2017

Coverage Status

Coverage increased (+0.2%) to 78.548% when pulling e5b5583 on chu11:kvs_fence_refactor_try25 into 062ade5 on flux-framework:master.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 12, 2017

Re-pushed with a variety of mini-fixes. .h file header guards fixed the clang issue. Found the mem-leak and fixed that. Coverage looks pretty good. I think I can add 1-2 tiny things to add a little coverage. Will run soak tests tomorrow and hopefully will be awesome.

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 12, 2017

ran some soak tests out of /tmp on catalyst

master

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1200  0.1200  0.1200  0.1236  0.1300  0.1400 
 0.1100  0.1200  0.1200  0.1218  0.1200  0.1400 
 0.1200  0.1200  0.1200  0.1228  0.1300  0.1300 
 0.1100  0.1200  0.1200  0.1243  0.1300  0.1700 
 0.1100  0.1200  0.1200  0.1222  0.1300  0.1300 
kvs refactor branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1200  0.1200  0.1200  0.1239  0.1300  0.1300 
 0.1200  0.1200  0.1200  0.1243  0.1300  0.1300 
 0.1200  0.1200  0.1200  0.1238  0.1300  0.1400 
 0.1200  0.1200  0.1200  0.1223  0.1200  0.1300 
 0.1200  0.1200  0.1200  0.1235  0.1300  0.1300 

master mean .12294
kvs refactor branch mean .12356

0.5% slower

Probably within range of what one would have expected given a new data structure allocation and minor API function calling overhead. Or just within range of randomness of runs. I'll see if I can get the number down a bit, but minimally it's well within range of what it was before.

chu11 added some commits Jun 22, 2017

modules/kvs: Refactor fence_finalize_bynames
Using new fence API function, refactor fence_finalize_bynames
and rename to finalize_fences_bynames for clarity.
modules/kvs: API-ize fence
Hide fence_t internally within fence api.
modules/kvs: Add types.h file with common types
Move href_t definition into new types.h file.
modules/kvs: fence cleanup
Now that fence is now abstracted, use zlist free function instead of
destroying each element on a list manually in fence_destroy().

chu11 added some commits Jun 28, 2017

modules/kvs: Refactor commit_unroll()
Instead of calling store(), which will store to cache and
send rpcs to the content store, instead only store to cache
and return a list of entries which need to be sent to content
store.  Caller to commit_unroll() is responsible for sending
data to content store and waiting on dirty entries.

Remove store(), as it is no longer used.
modules/kvs: Refactor commit_apply()
Splice large chunks of code into commit_process(), which
handles most processing, return to commit_apply() when
data must be loaded/stored/waited.
modules/kvs: Add state to commit_t
Cleanup code by adding a state variable to commit_t, which
more logically explains progression of code instead of checks
for flags and variables being NULL/non-NULL.
modules/kvs: Re-org commit_t
Move commit_t, commit_create(), and commit_destroy() into its own
files.  Slight adjustment, use aux variable to handle ctx.
modules/kvs: Add enum to commit_process return
Add enum of potential commit_process return types, to
make function more clear.
modules/kvs: Create commit_mgr_t struct
Move several kvs_ctx_t fields into a commit_mgr_t type.  Create
new commit_mgr_create() and commit_mgr_destroy() functions.
modules/kvs: Re-org/refactor commit_mgr
Move fence_add, fence_lookup, fence_process_fence_request, and
commit_merge_all into commit mgr API.  Adjust API and functions
accordingly.  As a result, some functions can be made static as
they are no longer called except inside commit_mgr API.
modules/kvs: Refactor commit_mgr_t, commit_t
Place variables into commit_mgr_t and commit_t to begin abstracting
away kvs_ctx_t from various functions.
modules/kvs: Add commit accessor functions
Add various functions to get/set/deal with commit_t and commit_mgr_t
internals, to abstract away commit internal details from callers.
modules/kvs: Refactor variable passing
With recent changes to commit_t and commit_mgr_t, no longer pass
around kvs_ctx_t to various functions such as store_cache(),
commit_unroll(), commit_link_dirent(), and commit_process().

As noop_stores is now in commit_mgr_t, eliminate the stats_t struct
and adjust appropriately in stat get/clear callbacks.
modules/kvs: Add iteration callbacks
Add iteration callback functions to commit API to access internal
lists.  As consequence, eliminate store_content_store().
modules/kvs: Re-org commit functions
Move commit_process(), store_cache(), commit_unroll(), and
commit_link_dirent() into commit API.  Latter 3 can be made
static and hidden from caller.
modules/kvs: API-ize commit_t & commit_mgr_t
Hide commit_t and commit_mgr_t from callers.
modules/kvs: Add state checks
Add state checks to some commit API functions.
modules/kvs: commit API optimization
Now that the commit functions have been API-ized, we can hide
some internals for optimization.  Collapse the missing_refs and
dirty_cache_entries lists into one, to remove an extra zlist create
and destroy.
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 12, 2017

Wow, nice cleanup. I've been through this at the 10,000 foot level and it seems good.

Thanks for the extra paranoia of running the soak test and analyzing the results. I wouldn't worry about the small performance hit - it's close enough that I doubt you've introduced any new stalls or other significant problems that will balloon up at scale.

The new unit tests are great addition!

The main thing here IMHO is that you are feeling like the code is getting structured in such a way that you can work on it with confidence. The next steps of converting to jansson and the treeobj metadata will be much easier with your help and with the new unit tests.

Anyway, awesome, and when you're done fine tuning it, let me know and I think this could go in.

chu11 added some commits Jul 12, 2017

modules/kvs: Fix API logic in commit_process()
Now that the commit functions have been API-ized, replace some
asserts with some logical checks instead and return appropriate
values to the user.

@chu11 chu11 force-pushed the chu11:kvs_fence_refactor_try25 branch from e5b5583 to 0e420ff Jul 12, 2017

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 12, 2017

A few tweaks, two lists in commit_t collapsed into one, removing an unnecessary branch check, a few minor cleanups/tweaks, a few more API-niceties (sp?), a few more tests for coverage.

It's closer to even now, with some normal randomness. I've gotten anywhere from 0.19% to 0.6% slower on this kvs refactor branch.

I just saw this when running soak with 1000 jobs on hype2 (which is empty of users and has less randomness).

master branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.100   0.110   0.110   0.111   0.110   0.160

kvs refactor branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1100  0.1100  0.1106  0.1100  0.1500

the kvs refactor branch was actually a tad faster for this particular run. That's probably not going to be normal, but it's clearly very close.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 12, 2017

Coverage Status

Coverage increased (+0.09%) to 78.486% when pulling 0e420ff on chu11:kvs_fence_refactor_try25 into 062ade5 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 12, 2017

Great! Ready to go in?

@chu11

This comment has been minimized.

Copy link
Contributor Author

chu11 commented Jul 12, 2017

Yup, it's good to go.

@garlick garlick merged commit afcbd94 into flux-framework:master Jul 12, 2017

4 checks passed

codecov/patch 85.88% of diff hit (target 78.13%)
Details
codecov/project 78.22% (+0.08%) compared to 062ade5
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.09%) to 78.486%
Details

@chu11 chu11 referenced this pull request Jul 26, 2017

Merged

KVS more oom() fixes #1128

@grondo grondo referenced this pull request Aug 23, 2017

Closed

0.8.0 Release #1160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.