New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge-base: Remove redundant merge bases #3492

Merged
merged 5 commits into from Nov 2, 2015

Conversation

Projects
None yet
4 participants
@vmg
Copy link
Member

vmg commented Oct 30, 2015

Following up on the Famous Case Of The Extra Merge Base. Comparing our runtime vs Git's, it turns out we never implemented a second optimization pass after finding the merge bases:

https://github.com/git/git/blob/80980a1d5c2678ab9031d7c60faf38b9631eb1ce/commit.c#L900-L954

Some commit in the array may be an ancestor another commit. Move such commit to the end the array, and return the number of commits are independent from each other.

So, here's my implementation, ported mostly verbatim from Git. I'm aware tests are missing, but it's not immediately obvious to me how to reproduce the "minimal test case" here (I haven't put much thought into it to be fair). Any assistance on that would be welcome.

cc @carlosmn @ethomson @peff

src/merge.c Outdated
return -1;
}

/* more than one merge base -- remove redundants */

This comment has been minimized.

@carlosmn

carlosmn Oct 30, 2015

Member

This makes it sound like there are redundant entries every time we have more than one merge base, which is not what we're doing here. If there are multiple merge bases, we may have some redundant ones.

src/merge.c Outdated
int error = 0;

redundant = git__calloc(commits->length, 1);
filled_index = git__calloc((commits->length - 1), sizeof(unsigned int));

This comment has been minimized.

@carlosmn

carlosmn Oct 30, 2015

Member

These should have a GITERR_CHECK_ALLOC() to check for OOM.

@carlosmn

This comment has been minimized.

Copy link
Member

carlosmn commented Oct 30, 2015

Looks fine; as far as testing goes, were you able to identify what was special about the repository in which this happened? Maybe we can recreate part of the graph which made us have redundant commits.

@vmg

This comment has been minimized.

Copy link
Member

vmg commented Oct 30, 2015

Waiting for some feedback from @peff too.

@vmg

This comment has been minimized.

Copy link
Member

vmg commented Oct 30, 2015

Looks fine; as far as testing goes, were you able to identify what was special about the repository in which this happened? Maybe we can recreate part of the graph which made us have redundant commits.

Nope, the history on that repo is quite dense. I could use the whole anonymized repo as the test case, but that'd be a tad too large I think.

@ethomson

This comment has been minimized.

Copy link
Member

ethomson commented Oct 30, 2015

Yeah, even if we can't use the exact repo that you found, I would like to be able to have a test repo that shows what a redundant merge-base is...

Though I think to create such a beast, we'll have to understand WTF is actually going on here, which I certainly don't yet.

@peff

This comment has been minimized.

Copy link
Member

peff commented Oct 30, 2015

I looked at the graph of the repo that triggered this. What happens to cause the redundant base is that you have a merge one one side that "straddles" a merge on the other. Here's a picture:

       *--*--S
      /     /
--*--F--*--R---M
         \    /
          *--*

F = fork point
R = re-merge point
S = side branch tip
M = master tip

Here we have a side branch S that has forked from master at fork point F. It then remerged later from master at R. Meanwhile, master itself had its own unrelated merges, one of which straddles the re-merge.

Imagine we want to find the merge base of the side branch and master. We walk backwards from M and S. Obviously we'll find R, but we still need to follow up S^1 and M^2 to see if they meet. They do, at F.

In a true criss-cross merge, F would not be an ancestor of R, and we would truly have two merge bases (which is why we need to follow up even after finding R). But in this case, one is an ancestor of the other, and can therefore be removed.

I drew the diagram above by hand. It does represent the situation in the repo we found, but I didn't actually recreate the simplified version and test it. But it should be pretty straightforward to make a libgit2 test case out of it.

@peff

This comment has been minimized.

Copy link
Member

peff commented Oct 30, 2015

I don't see anything obviously wrong with the code itself, but I am not that familiar with the libgit2 code. I'm fairly sure that failing to remove the redundant bases is the source of the problem we found, though. So at least the intent of this PR is correct. :)

@vmg

This comment has been minimized.

Copy link
Member

vmg commented Nov 2, 2015

So, I couldn't make heads or tails of @peff's ASCII diagram, but running the original repository through the anonymizer and manually pruning some extra branches gave a pretty small reproduction case that seems to work really well. I've committed it to the resources folder and added the corresponding test.

I think this is ready to merge now.

@vmg vmg force-pushed the vmg/redundant branch from 1d68000 to b656e5e Nov 2, 2015

vmg added a commit that referenced this pull request Nov 2, 2015

Merge pull request #3492 from libgit2/vmg/redundant
merge-base: Remove redundant merge bases

@vmg vmg merged commit 1318ec9 into master Nov 2, 2015

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@vmg vmg referenced this pull request Nov 2, 2015

Merged

Fix redundant merge bases #544

@peff

This comment has been minimized.

Copy link
Member

peff commented Nov 2, 2015

@vmg Hmm. I did recreate my diagram as a git repository, but running I git, I noticed that it didn't actually trigger remove_redundant. So it's entirely possible that my analysis of the root cause here was wrong. Or it may simply be that there are some vagaries in paint_down_to_common based on the exact number of commits between each event, and especially on the order we visit them based on commit timestamps (we always end up with the right answer, but traversal order sometimes matters for things like early-cutoff behavior).

I'd give it 50/50 odds on one explanation versus the other. I don't think it's worth spending a lot of time trying to recreate the minimal test case, though. That would give me a warm fuzzy feeling, but since you have a working anonymized case, that's probably enough.

@ethomson ethomson deleted the vmg/redundant branch Jan 9, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment