kvs: support ability to "revert" / "back out" merged commits #1346

Merged
merged 8 commits into flux-framework:master from chu11:issue1337-part3 on Mar 7, 2018

5 participants
@chu11
Contributor

chu11 commented Feb 22, 2018

As described in #1337, a failure in a merged commit should not fail all transactions that make up that commit. To accomplish this, I did the following:

  1. When merging, do not merge ops/names into an existing commit_t; instead, create a new commit_t and merge into that data structure.

  2. Flag the new commit_t as a collection of merges, and flag the other ones as components of a merge. Leave all the component commit_t's on the ready queue as they were.

  3. Push this new merged commit_t onto the ready queue and use it.

  4. If the new merged commit_t succeeds, when it is done remove it and all the "components" of the merge. If the new merged commit_t fails, remove it from the ready queue, and leave all of the "components" there for processing later. Flag them as non-mergeable going forward and let processing continue as normal.
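
The four steps above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions, not the code in this PR: `struct txn`, `struct ready_queue`, `merge_ready()` and `finish_head()` are hypothetical names, the queue is a fixed-size array, and a commit is reduced to an op count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TXNS 16

/* Hypothetical stand-in for commit_t; field names are illustrative. */
struct txn {
    int nops;              /* number of ops carried by this commit */
    bool merged;           /* this commit is a collection of merges */
    bool merge_component;  /* this commit was folded into a merged one */
    bool no_merge;         /* never merge this commit again */
};

/* Hypothetical ready queue; index 0 is the head. */
struct ready_queue {
    struct txn q[MAX_TXNS];
    size_t len;
};

/* Steps 1-3: merge the mergeable commits at the head into a NEW commit,
 * flag the originals as components, and push the new commit onto the
 * head.  Returns 1 if a merged commit was created, 0 if there was
 * nothing to merge, -1 if the queue is full (queue left untouched). */
int merge_ready (struct ready_queue *rq)
{
    struct txn m = { .merged = true };
    size_t i, n = 0;

    while (n < rq->len && !rq->q[n].merged && !rq->q[n].merge_component
           && !rq->q[n].no_merge)
        n++;
    if (n < 2)
        return 0;
    if (rq->len == MAX_TXNS)
        return -1;
    for (i = 0; i < n; i++) {
        rq->q[i].merge_component = true;
        m.nops += rq->q[i].nops;
    }
    memmove (rq->q + 1, rq->q, rq->len * sizeof (rq->q[0]));
    rq->q[0] = m;
    rq->len++;
    return 1;
}

/* Step 4: retire the head commit.  A merged head that succeeded takes
 * its components with it; a merged head that failed is dropped alone,
 * and its components stay queued, flagged as non-mergeable. */
void finish_head (struct ready_queue *rq, bool success)
{
    size_t i, n = 1;

    if (rq->len == 0)
        return;
    if (rq->q[0].merged) {
        if (success) {
            while (n < rq->len && rq->q[n].merge_component)
                n++;
        } else {
            for (i = 1; i < rq->len && rq->q[i].merge_component; i++) {
                rq->q[i].merge_component = false;
                rq->q[i].no_merge = true;
            }
        }
    }
    assert (n <= rq->len);
    memmove (rq->q, rq->q + n, (rq->len - n) * sizeof (rq->q[0]));
    rq->len -= n;
}
```

In this model, a caller would run `merge_ready()`, process the head, and then call `finish_head()` with the outcome; after a failed merged commit, the originals are processed one at a time and `merge_ready()` declines to merge them again.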

Ran soak tests over 1000 jobs; overall performance is < 1% slower (the mean below goes from 0.2353 to 0.2376, roughly 0.98%). Some slowdown is understandable, as there are additional allocations and whatnot.

Before

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2300  0.2353  0.2400  0.3300 

After

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2376  0.2400  0.3100 

(I should say, the "before" is before PR #1343, the refactor done right before this PR.)

Note, I used the word "revert" in the documentation and code to describe a failed merged commit becoming "un-merged". I'm not 100% happy with the use of this term to describe what's going on, but
couldn't think of a better word. Both it and "backout" give the impression that the commit completed and you want to "revert" or "backout" of it. "unmerge" seems to give the wrong impression
(i.e. it's not mergeable). Perhaps this is something that should be solved with #1344, as once the data structure names are changed, a more obvious choice of language would emerge.

Also, there are currently no unit tests for the kvs module itself, only some unit tests for the internal commit API. Short of pure instrumentation, this is impossible to test, as the commit merging is racy. It would be easier once #1341 is implemented, as a stress test across namespaces could be done.

@chu11 chu11 added this to To do in multi-user parallel jobs via automation Feb 22, 2018

@chu11 chu11 moved this from To do to In progress in multi-user parallel jobs Feb 22, 2018

@chu11 chu11 self-assigned this Feb 22, 2018

@garlick

Member

garlick commented Feb 23, 2018

This sounds like a clean approach!

Better to get terminology right in the commit log, if possible. Maybe say "fall back to individual commits if merged commit fails"? Or "upon failure of merged commit, retry the original commits individually"?

@chu11

Contributor Author

chu11 commented Feb 23, 2018

You know what, I like the idea of using "fallback". I think it is far clearer than "revert" or "unmerge". I'll tweak the commit log, update the variable names, and re-push.

@chu11 chu11 force-pushed the chu11:issue1337-part3 branch from f680b56 to 3fb1fd7 Feb 23, 2018

@chu11

Contributor Author

chu11 commented Feb 23, 2018

re-pushed using the wording "fallback" instead of "revert" everywhere. Went ahead and squashed it since it was only a change in the last commit.

@codecov-io

codecov-io commented Feb 23, 2018

Codecov Report

Merging #1346 into master will increase coverage by 0.03%.
The diff coverage is 87.8%.

@@            Coverage Diff             @@
##           master    #1346      +/-   ##
==========================================
+ Coverage   78.47%   78.51%   +0.03%     
==========================================
  Files         162      162              
  Lines       29689    29739      +50     
==========================================
+ Hits        23298    23349      +51     
+ Misses       6391     6390       -1

Impacted Files                   Coverage Δ
src/modules/kvs/kvstxn.c         78.97% <87.5%> (+0.9%)
src/modules/kvs/kvs.c            65.21% <90%> (+0.28%)
src/common/libflux/rpc.c         93.38% <0%> (-0.83%)
src/common/libutil/base64.c      95.07% <0%> (-0.71%)
src/common/libflux/future.c      88.78% <0%> (ø)
src/common/libflux/message.c     81.72% <0%> (+0.47%)
src/common/libflux/mrpc.c        86.66% <0%> (+1.17%)

@garlick
Member

garlick left a comment

Why does continuing to merge after a commit has begun processing prevent fallback on error?

Would be good to explain this in the commit message so that we aren't tempted to add it back later for performance if it would break something.

@garlick

Member

garlick commented Feb 23, 2018

(oops meant to add that as a single review comment on the first commit)

@garlick
Member

garlick left a comment

Some comments, mainly suggestions for improving commit messages.

@@ -1222,8 +1222,7 @@ static int commit_merge (commit_t *dest, commit_t *src)
 /* Merge ready commits that are mergeable, where merging consists of
  * popping the "donor" commit off the ready list, and appending its
  * ops to the top commit. The top commit can be appended to if it
- * hasn't started, or is still building the rootcpy, e.g. stalled
- * walking the namespace.
+ * hasn't started.

@garlick

garlick Feb 23, 2018

Member

Suggestion: add a note here on why only COMMIT_STATE_INIT is mergeable.

Commit message summary should be more descriptive, e.g. "modules/kvs: only merge commit in INIT state" or similar.

@@ -1219,10 +1219,46 @@ static int commit_merge (commit_t *dest, commit_t *src)
return -1;
}

static commit_t *commit_create_empty (commit_mgr_t *cm)

@garlick

garlick Feb 23, 2018

Member

Suggestion: change commit message to "modules/kvs: merge to new empty commit".

Suggestion: restructure to avoid repetition, e.g.

if (!(cnew = calloc ()))
    goto error_nomem;
if (!(cnew->ops = json_array()))
    goto error_nomem;
...
error_nomem:
    commit_destroy (cnew);
    errno = ENOMEM;
    return NULL;
...
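
A compilable version of the single-cleanup-label pattern suggested above might look like the following. It is a sketch under assumptions: `struct commit` is a hypothetical stand-in that uses plain C arrays where the real code uses jansson arrays, and the `max_entries` parameter does not exist in the PR.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for commit_t. */
struct commit {
    char **ops;
    char **names;
};

/* Safe to call on NULL or on a partially constructed commit, which is
 * what lets every failure path share one cleanup label. */
void commit_destroy (struct commit *c)
{
    if (c) {
        free (c->ops);
        free (c->names);
        free (c);
    }
}

/* Create an empty commit with room for max_entries ops and names.
 * Returns NULL with errno set on failure. */
struct commit *commit_create_empty (size_t max_entries)
{
    struct commit *cnew = NULL;

    if (max_entries == 0) {
        errno = EINVAL;
        return NULL;
    }
    if (!(cnew = calloc (1, sizeof (*cnew))))
        goto error_nomem;
    if (!(cnew->ops = calloc (max_entries, sizeof (cnew->ops[0]))))
        goto error_nomem;
    if (!(cnew->names = calloc (max_entries, sizeof (cnew->names[0]))))
        goto error_nomem;
    return cnew;
error_nomem:
    commit_destroy (cnew);
    errno = ENOMEM;
    return NULL;
}
```

Because `calloc()` zeroes the structure, unset members are NULL and `free (NULL)` is a no-op, so `commit_destroy()` works no matter which allocation failed.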
@@ -1164,59 +1164,34 @@ int commit_mgr_ready_commit_count (commit_mgr_t *cm)

@garlick

garlick Feb 23, 2018

Member

Not sure what the "atomic" comment refers to. This appears to just be about how this function cleans up on error?

Better commit summary message would be useful, as well as expanded commit main message.

@chu11

chu11 Feb 23, 2018

Author Contributor

Yeah, perhaps "atomic" is the wrong word. In the past, data structures were modified on the fly as merging occurred. Any error would lead to exit(). As we refactored exit() away and returned errors, we couldn't return half modified data structures. So a number of functions were modified to be "atomic", where the data structure was successfully modified completely or not at all. Maybe there's a better word than "atomic" here.

@garlick

garlick Feb 23, 2018

Member

Ah, well I would say in the context of processing transactions/commits, the "atomic" term is a bit overloaded :-) Maybe just "fully clean up on error"?
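
The "modified completely or not at all" behavior discussed in this thread can be illustrated with a small all-or-nothing append. This is a hypothetical sketch, not the PR's `commit_merge()`: `struct oplist` and `oplist_merge()` are invented names, and ops are reduced to integers.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list of ops, reduced to integers for illustration. */
struct oplist {
    int *ops;
    size_t len;
};

/* Append all of src's ops to dest.  On failure, dest is left exactly
 * as it was: the combined array is built on the side and only swapped
 * in once every step that can fail has succeeded. */
int oplist_merge (struct oplist *dest, const struct oplist *src)
{
    int *tmp;

    if (src->len == 0)
        return 0;
    if (!(tmp = malloc ((dest->len + src->len) * sizeof (*tmp)))) {
        errno = ENOMEM;
        return -1;
    }
    if (dest->len > 0)
        memcpy (tmp, dest->ops, dest->len * sizeof (*tmp));
    memcpy (tmp + dest->len, src->ops, src->len * sizeof (*tmp));
    free (dest->ops);
    dest->ops = tmp;
    dest->len += src->len;
    return 0;
}
```

The caller can then return the error upward without having to repair a half-modified structure.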

@@ -1253,6 +1253,7 @@ int commit_mgr_merge_ready_commits (commit_mgr_t *cm)
{

@garlick

garlick Feb 23, 2018

Member

maybe it would be clearer to make the commit summary "modules/kvs: don't modify ready queue on error"?

@@ -45,7 +45,9 @@

#define FENCE_READY_MASK 0x01

@garlick

garlick Feb 23, 2018

Member

maybe "modules/kvs: preserve orig commits during merge"?

@@ -163,6 +163,13 @@ int commit_set_aux_errnum (commit_t *c, int errnum)
return c->aux_errnum;

@garlick

garlick Feb 23, 2018

Member

Suggestion: modules/kvs: try commits individually if merged commit fails

"core KVS file" could be KVS main or kvs.c.

@garlick

garlick Feb 23, 2018

Member

or "try orig commit if merged commit fails"

@chu11

Contributor Author

chu11 commented Feb 23, 2018

Why does continuing to merge after a commit has begun processing prevent fallback on error?

Ahh, I should change that commit log message. It's no longer true (was from a prior attempt). I left this in for another reason. Now that we are no longer removing commits from the queue, I'd have to regularly scan the ready queue to see if there are new things to merge.

That or keep a pointer to the "last merge" point. Hmmm. I suppose this would be doable, it's just a single pointer. Let me think about this.

chu11 added some commits Mar 7, 2018

modules/kvs: Update kvstxn comments

    Fix forgotten change to function name.

modules/kvs: Merge to new empty kvstxn_t

    In kvstxn_mgr_merge_ready_transactions(), instead of merging
    transactions into the current head ready transaction, create a
    new empty transaction and merge contents into it.  Then push
    that new transaction onto the head of the ready list.

    Requires users to call kvstxn_mgr_get_ready_commit() after the
    merge to get the new head.

modules/kvs: Refactor internal kvstxn_merge()

    With recent changes, kvstxn_merge() no longer needs to be fully cleaned
    up on error.  An error code can be returned to the caller
    kvstxn_mgr_merge_ready_commits(), which will handle full cleanup.

modules/kvs: Check flags on kvstxn merge

    When merging transactions, also ensure flags are identical.

modules/kvs: don't modify ready queue on error

    Alter logic in kvstxn_mgr_merge_ready_transactions(), so that
    on error, no modifications to the kvstxn ready queue occur.

modules/kvs: Add check in internal kvstxn API

    Add internal checks that ensure only kvstxn's that are
    ready for processing are passed to processing functions.

    Add unit tests appropriately.

modules/kvs: Preserve orig kvstxns during merge

    Do not destroy transactions after they have been merged.  Instead
    flag them as components of a larger merge.  When the kvstxn
    of a set of merged transactions completes/is removed, at that point
    in time remove all of the components of the larger merge.

    As a consequence of this change and for optimization purposes, once
    a merger of transactions has occurred, there can no longer be any more
    mergers until the head merged transaction has completed.  If this were
    not done, the ready queue would constantly be iterated through and
    new head merged transactions would be created.  This can be optimized
    at a later time.

    Add unit tests.

modules/kvs: try orig transactions on kvstxn error

    In kvstxn_mgr_remove_transaction(), support a flag that lets the user
    fall back from a merged kvstxn to the original transactions that made
    up the merge.

    By doing so, the user need not send an error to all transactions merged
    into that kvstxn.  Instead, each of the original transactions
    can be replayed individually, and an error will only be sent to
    the offending commit/fence transaction.

    Support kvstxn_fallback_mergeable() so the user knows if a kvstxn can
    be fallen back on.

    In kvstxn_apply(), take advantage of this by not sending an error
    when a kvstxn's merging can be fallen back on.  As an exception,
    do not fall back if it's a "death"-like error (e.g. ENOMEM).

    Add internal kvstxn API unit tests.

    Fixes #1337
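
The decision described in the last commit message (fall back instead of sending an error, except for "death"-like errors) can be sketched as a small decision function. This is a simplified model under assumptions, not the kvs module's code: `struct txn_model`, `choose_action()` and the `ACT_*` values are hypothetical, `fallback_mergeable()` only mimics the role the commit message gives to kvstxn_fallback_mergeable(), and ENOMEM is the only fatal error listed because it is the only one the commit message names.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a transaction; only what this sketch needs. */
struct txn_model {
    bool merged;   /* built by merging other ready transactions */
    int errnum;    /* 0 on success, otherwise an errno value */
};

/* Only a merged transaction has originals to fall back to. */
bool fallback_mergeable (const struct txn_model *t)
{
    return t->merged;
}

/* "Death"-like errors: replaying the originals would not help.
 * Assumption: the real list may contain more than ENOMEM. */
bool fatal_errnum (int errnum)
{
    return errnum == ENOMEM;
}

enum action {
    ACT_RESPOND_OK,     /* send success to every requester */
    ACT_RESPOND_ERROR,  /* send the error to every requester */
    ACT_FALLBACK,       /* drop the merged txn, replay the originals */
};

/* Decide what to do once a transaction has finished processing. */
enum action choose_action (const struct txn_model *t)
{
    if (t->errnum == 0)
        return ACT_RESPOND_OK;
    if (fallback_mergeable (t) && !fatal_errnum (t->errnum))
        return ACT_FALLBACK;
    return ACT_RESPOND_ERROR;
}
```

With this shape, an error such as EINVAL on a merged transaction leads to the originals being retried one by one, so only the offending commit/fence sees the error.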

@chu11 chu11 force-pushed the chu11:issue1337-part3 branch from 3fb1fd7 to 96b46c0 Mar 7, 2018

@chu11

Contributor Author

chu11 commented Mar 7, 2018

Just re-pushed with updated patches based on current master. Discounting the renaming of data structures/variables/names/etc., changes are largely the same. I did decide to squash some patches into other ones.

The one notable difference is that I removed my prior change where merges can only occur for transactions in state KVSTXN_STATE_INIT. Instead, I do not allow merges if a merge has already occurred. The net effect is identical, it is clearer, and it protects against a corner case where the user calls the merge function multiple times.

I put in the commit message why I did this and note that the reason for doing this could be optimized in the future. I may try and optimize before this PR is merged. Gonna think about it a bit, but didn't want that to be the hold up for pushing/merging this PR.

@chu11

Contributor Author

chu11 commented Mar 7, 2018

and two soak runs to compare

master

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2362  0.2400  0.2900 

this branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2353  0.2400  0.2900 

A little surprised the mean is faster. Perhaps just a lucky run. Or at least ballpark similar performance.

@coveralls

coveralls commented Mar 7, 2018

Coverage Status

Coverage increased (+0.04%) to 78.825% when pulling 96b46c0 on chu11:issue1337-part3 into c6c48fd on flux-framework:master.

@garlick

Member

garlick commented Mar 7, 2018

I instead do not allow merges if a merge has already occurred

Does this limit merging to 2:1?

@chu11

Contributor Author

chu11 commented Mar 7, 2018

It limits merging to whatever was in the ready queue at the time of the merge, be it 2 transactions or a bajillion transactions.

@grondo

Contributor

grondo commented Mar 7, 2018

or a bajillion transactions.

I'll need to see a test case added for that.
😜

@garlick

Member

garlick commented Mar 7, 2018

OK, I thought I must have misunderstood you there. Good! Ready for merge?

@chu11

Contributor Author

chu11 commented Mar 7, 2018

yup, and I'll write up an issue for the optimization of merges. Already working on an idea.

@garlick garlick merged commit 5dc1611 into flux-framework:master Mar 7, 2018

4 checks passed

codecov/patch: 87.8% of diff hit (target 78.47%)
codecov/project: 78.51% (+0.03%) compared to c6c48fd
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.04%) to 78.825%

multi-user parallel jobs automation moved this from In progress to Done Mar 7, 2018

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479
