pr-2149/spkrka/side-exhaust-pr-v3
tagged this
26 Jun 13:08
commit-reach: terminate merge-base walk when one paint side is exhausted
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.
When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.
origin/HEAD o o PR HEAD
| |
(import) o :
/ \ /
| o merge-base
| |
: : (~2.5M commits)
| |
import root main root
In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patches 2-3 incorporate test cases from Elijah's branch.
This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.
Patch layout:
1/8 Documentation/technical: add paint-down-to-common doc 2/8 t6600: add
test cases for side-exhaustion edge cases 3/8 t6099, t6600: add
side-exhaustion regression tests 4/8 commit-reach: add trace2
instrumentation to paint_down_to_common() 5/8 commit-reach: introduce struct
paint_state with per-side counters 6/8 commit-reach: remove unused
nonstale_queue dedup wrappers 7/8 commit-reach: terminate merge-base walk
when one paint side is exhausted 8/8 commit-reach: move min_generation check
into paint_queue_get()
Benchmarks
Step counts are deterministic (measured via trace2_data_intmax added in
patch 4). Wall-clock times are best-of-11 runs.
2.6M-commit monorepo with commit-graph:
steps wall-clock
merge-base --all (across import) 2143438 -> 3 3.67s -> 5ms
merge-base --all (1000 apart) 2692915 -> 1035 4.41s -> 7ms
merge-base --all (5000 apart) 2692915 -> 6401 4.45s -> 13ms
merge-base --all (HEAD vs import) 2698872 -> 45960 4.50s -> 79ms
merge-tree (across import) 2143438 -> 3 4.42s -> 11ms
git.git (88k commits, commit-graph):
steps wall-clock
merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 110ms -> 68ms
merge-base --all HEAD HEAD~1000 9891 -> 3828 18ms -> 10ms
merge-base --all HEAD HEAD~10000 72303 -> 41487 101ms -> 50ms
Changes since v2:
* New patch 8/8: moved the min_generation termination check and the
last_gen monotonicity assertion into paint_queue_get(), consolidating
halt conditions. commit_graph_generation() is now called once per
dequeued commit and shared across all checks.
* Widened the generation-monotonicity BUG assertion to fire
unconditionally, not only when min_generation is set. The side-exhaustion
optimization depends on correct generation ordering, so the assertion
should always be active. This is a behavior change: the BUG() now fires
for any generation ordering violation, regardless of the caller.
* Moved all halt conditions inside paint_queue_get() with the "pop first"
form: pop, check, then decrement counters. This keeps the optimization
commit's diff minimal (just inserting the new checks between pop and
decrement).
* Shortened the doc comment on paint_queue_get() to describe what it does
rather than how. Inline comments on each return NULL explain the specific
halt condition.
* Replaced the manual commit-graph setup in the step-count test with
run_all_modes, which now sets GIT_TRACE2_EVENT per mode and produces
trace-mode-{none,full,half,no-gdat}.txt files.
* Added a test_paint_down_steps helper for concise 4-mode step assertions
with diagnostic output on mismatch (prints "expected X, got Y" instead of
a silent grep failure).
* Added step-count assertions to the single-walk edge-case tests:
in_merge_bases_many:self, pending-stale, infinity-both-sides,
mixed-finite-infinity.
* Included step counts alongside wall-clock times in the benchmark tables.
Changes since v1:
* Reordered patches: documentation first (describing the existing
algorithm), tests before code changes, so they demonstrate passing with
old logic first.
* Dropped the ahead_behind decoupling patch. paint_state is now a NEW
struct alongside nonstale_queue instead of replacing it. ahead_behind()
is completely untouched.
* Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead
code after the conversion) in a separate commit.
* Renamed: struct paint_queue -> paint_state, field pq -> queue,
paint_count_add/remove -> paint_count_update (single function with signed
delta parameter).
* Split the old paint_count_transition (which handled both old and new
flags in one call) into separate remove/add calls with a signed delta.
This eliminates the need for the case 0 handler (which tracked "not in
the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 |
STALE) that documents all valid flag combinations, with BUG() in default.
* Added trace2_data_intmax() instrumentation to report the number of
commits visited per paint walk (separate commit), with deterministic
step-count assertions in t6600.
* Expanded switch statements to multi-line format per .clang-format.
* Used !count style throughout instead of count == 0.
* Updated technical documentation alongside code changes.
[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150
Elijah Newren (1):
t6600: add test cases for side-exhaustion edge cases
Kristofer Karlsson (7):
Documentation/technical: add paint-down-to-common doc
t6099, t6600: add side-exhaustion regression tests
commit-reach: add trace2 instrumentation to paint_down_to_common()
commit-reach: introduce struct paint_state with per-side counters
commit-reach: remove unused nonstale_queue dedup wrappers
commit-reach: terminate merge-base walk when one paint side is
exhausted
commit-reach: move min_generation check into paint_queue_get()
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 137 +++++++++++++
commit-reach.c | 147 ++++++++++----
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 ++++++++
t/t6600-test-reach.sh | 181 ++++++++++++++++--
7 files changed, 501 insertions(+), 49 deletions(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
base-commit: 6c3d7b73556db708feb3b16232fab1efc4353428
Submitted-As: https://lore.kernel.org/git/pull.2149.v3.git.1782479286.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.git.1781951820.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.v2.git.1782303254.gitgitgadget@gmail.com