Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit-reach: fix first-parent heuristic #51

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Oct 18, 2018

I originally reported this fix [1] after playing around with the trace2 series for measuring performance. Since trace2 isn't merging quickly, I pulled the performance fix patch out and am sending it on its own.
The only difference here is that we don't have the tracing to verify the performance fix in the test script.

See the patch message for details about the fix.

Thanks,
-Stolee

[1] https://public-inbox.org/git/20180906151309.66712-7-dstolee@microsoft.com/

[RFC PATCH 6/6] commit-reach: fix first-parent heuristic

The algorithm in can_all_from_reach_with_flags() performs a depth-
first-search, terminated by generation number, intending to use
a hueristic that "important" commits are found in the first-parent
history. This heuristic is valuable in scenarios like fetch
negotiation.

However, there is a problem! After the search finds a target commit,
it should pop all commits off the stack and mark them as "can reach".
This logic is incorrect, so the algorithm instead walks all reachable
commits above the generation-number cutoff.

The existing algorithm is still an improvement over the previous
algorithm, as the worst-case complexity went from quadratic to linear.
The performance measurement at the time was good, but not dramatic.
By fixing this heuristic, we reduce the number of walked commits.

We can also re-run the performance tests from commit 4fbcca4
"commit-reach: make can_all_from_reach... linear".

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case:

4fbcca4~1: 0.85 s
  4fbcca4: 0.26 s (num_walked: 1,011,035)
      HEAD: 0.14 s (num_walked:     8,601)

Large Case:

4fbcca4~1: 24.0  s
  4fbcca4:  0.12 s (num_walked:  503,925)
      HEAD:  0.06 s (num_walked:  217,243)

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 18, 2018

Submitted as pull.51.git.gitgitgadget@gmail.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant