New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accelerate "git merge-base --is-ancestor" #664
Accelerate "git merge-base --is-ancestor" #664
Conversation
The next change will make repo_in_merge_bases() depend on the logic in is_descendant_of(), but we need to make the method independent of the_repository first. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The repo_is_descendant_of() method is aware of the existence of the commit-graph file. It checks for generation_numbers_enabled() before deciding on using can_all_from_reach() or repo_in_merge_bases() depending on the situation. The reason here is that can_all_from_reach() uses a depth-first search that is limited by the minimum generation number of the target commits, and that algorithm can be very slow when generation numbers are not present. The alternative uses paint_down_to_common() which will walk the entire merge-base boundary, which is typically slower. This method is used by commands like "git tag --contains" and "git branch --contains" for very fast results when a commit-graph file exists. Unfortunately, it is _not_ used in commands like "git merge-base --is-ancestor" which is doing an even simpler request. This issue was raised recently [1] with respect to a change to how generation numbers are stored, but was also reported much earlier [2] before commit-reach.c existed to simplify these reachability queries. [1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/ [2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/ The root cause is that builtin/merge-base.c has a method handle_is_ancestor() that calls in_merge_bases(), an older version of repo_in_merge_bases(). It would be better if we have every caller to in_merge_bases() use the logic in can_all_from_reach() when possible. This is where things get a little tricky: repo_is_descendant_of() calls repo_in_merge_bases() in the non-generation numbers enabled case! If we simply update repo_in_merge_bases() to call repo_is_descendant_of() instead of repo_in_merge_bases_many(), then we will get a recursive call loop. Thankfully, this is caught by the test suite in the default mode (i.e. GIT_TEST_COMMIT_GRAPH=0). The trick, then, is to make the non-generation number case for repo_is_descendant_of() call repo_in_merge_bases_many() directly, skipping the non-_many version. This allows us to take advantage of this faster code path, when possible. The easiest way to measure the performance impact is to test the following command on the Linux kernel repository: git merge-base --is-ancestor <A> <B> | A | B | Time Before | Time After | |------|------|-------------|------------| | v3.0 | v5.7 | 0.459s | 0.028s | | v4.0 | v5.7 | 0.267s | 0.021s | | v5.0 | v5.7 | 0.074s | 0.013s | Note that each of these samples return success. The old code performed the same operation when <A> and <B> are swapped. However, can_all_from_reach() will return immediately if the generation numbers show that <A> has larger generation number than <B>. Thus, the time for the swapped case is universally 0.004s in each case. Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
ab7af71
to
ead5a0d
Compare
/submit |
Submitted as pull.664.git.1592414670.gitgitgadget@gmail.com |
On the Git mailing list, Junio C Hamano wrote (reply to this):
|
This branch is now known as |
This patch series was integrated into pu via git@92cfb92. |
On the Git mailing list, Derrick Stolee wrote (reply to this):
|
This patch series was integrated into pu via git@3a602fe. |
This patch series was integrated into pu via git@dd64d65. |
This patch series was integrated into next via git@86b34a3. |
On the Git mailing list, Abhishek Kumar wrote (reply to this):
|
This patch series was integrated into seen via git@168f87a. |
This patch series was integrated into seen via git@d082ecb. |
This patch series was integrated into seen via git@dc4b3cf. |
This patch series was integrated into next via git@dc4b3cf. |
This patch series was integrated into master via git@dc4b3cf. |
Closed via dc4b3cf. |
@@ -283,7 +283,9 @@ struct commit_list *repo_get_merge_bases(struct repository *r, | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Wed, Jun 17, 2020 at 05:24:28PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The next change will make repo_in_merge_bases() depend on the logic in
> is_descendant_of(), but we need to make the method independent of
> the_repository first.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> commit-reach.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index 4ca7e706a18..13722430aa5 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -283,7 +283,9 @@ struct commit_list *repo_get_merge_bases(struct repository *r,
> /*
> * Is "commit" a descendant of one of the elements on the "with_commit" list?
> */
> -int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
> +static int repo_is_descendant_of(struct repository *r,
> + struct commit *commit,
> + struct commit_list *with_commit)
> {
> if (!with_commit)
> return 1;
> @@ -301,13 +303,18 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
>
> other = with_commit->item;
> with_commit = with_commit->next;
> - if (in_merge_bases(other, commit))
> + if (repo_in_merge_bases(r, other, commit))
> return 1;
> }
> return 0;
> }
> }
>
> +int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
> +{
> + return repo_is_descendant_of(the_repository, commit, with_commit);
> +}
> +
I don't think that it makes a big deal either way, but I wonder about
moving 'repo_is_descendant_of' to the header file, and making
'is_descendant_of' be 'static inline int' as you defined it here.
Since this has already graduated up to master already, I don't think
that it's worth going back just to shuffle this code around, but I was
wondering if you had any specific reason for doing it this way.
> /*
> * Is "commit" an ancestor of one of the "references"?
> */
> --
> gitgitgadget
>
Thanks,
Taylor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 6/29/2020 9:40 AM, Taylor Blau wrote:
> On Wed, Jun 17, 2020 at 05:24:28PM +0000, Derrick Stolee via GitGitGadget wrote:
>> +int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
>> +{
>> + return repo_is_descendant_of(the_repository, commit, with_commit);
>> +}
>> +
>
> I don't think that it makes a big deal either way, but I wonder about
> moving 'repo_is_descendant_of' to the header file, and making
> 'is_descendant_of' be 'static inline int' as you defined it here.
>
> Since this has already graduated up to master already, I don't think
> that it's worth going back just to shuffle this code around, but I was
> wondering if you had any specific reason for doing it this way.
I have good news for you. [1]
[1] https://lore.kernel.org/git/20200623184222.54201-1-carenas@gmail.com/
Thanks,
-Stolee
It was recently [1] reported (and not-so-recently [2] reported) that "git merge-base --is-ancestor" can be pretty slow. In fact, it is regularly slower than "git branch --contains" or "git tag --contains", which are answering a "harder" query.
[1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/
The root cause is that the in_merge_base() implementation is skipping the fast can_all_from_reach() implementation and using paint_down_to_common() instead. Note that these are equivalent: a commit A is in the set of merge-bases between A and B if and only if B can reach A.
This fixes the issue, and makes the performance degradation reported by Szeder a non-issue.
Thanks,
-Stolee
Cc: szeder.dev@gmail.com, avarab@gmail.com, abhishekkumar8222@gmail.com, me@ttaylorr.com