pr-2149/spkrka/side-exhaust-pr-v2
tagged this
24 Jun 12:14
commit-reach: terminate merge-base walk when one side is exhausted
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.
When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.
origin/HEAD o o PR HEAD
| |
(import) o :
/ \ /
| o merge-base
| |
: : (~2.5M commits)
| |
import root main root
In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patches 2-3 incorporate test cases from Elijah's branch.
This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.
Patch layout:
1/7 Documentation/technical: add paint-down-to-common doc 2/7 t6600: add
test cases for side-exhaustion edge cases 3/7 t6099, t6600: add
side-exhaustion regression tests 4/7 commit-reach: add trace2
instrumentation to paint_down_to_common() 5/7 commit-reach: introduce struct
paint_state with per-side counters 6/7 commit-reach: remove unused
nonstale_queue dedup wrappers 7/7 commit-reach: terminate merge-base walk
when one paint side is exhausted
Benchmarks
Step counts are deterministic (measured via trace2_data_intmax added in
patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
to performance.
2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):
steps wall-clock
merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms
merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms
merge-tree (across import) - 8.11s -> 100ms
git.git (88k commits, commit-graph):
steps wall-clock
merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms
merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms
merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms
merge-base HEAD HEAD~1000 - 9ms -> 9ms
merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms
Changes since v1:
* Reordered patches: documentation first (describing the existing
algorithm), tests before code changes, so they demonstrate passing with
old logic first.
* Dropped the ahead_behind decoupling patch. paint_state is now a NEW
struct alongside nonstale_queue instead of replacing it. ahead_behind()
is completely untouched.
* Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead
code after the conversion) in a separate commit.
* Renamed: struct paint_queue -> paint_state, field pq -> queue,
paint_count_add/remove -> paint_count_update (single function with signed
delta parameter).
* Split the old paint_count_transition (which handled both old and new
flags in one call) into separate remove/add calls with a signed delta.
This eliminates the need for the case 0 handler (which tracked "not in
the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 |
STALE) that documents all valid flag combinations, with BUG() in default.
* Moved all termination conditions into paint_queue_get(). The all-zero
check and the side-exhaustion check are merged under a shared
!pending_merge_bases guard. paint_queue_get() derives the generation from
the dequeued commit itself, so no extra parameter is needed.
* Added trace2_data_intmax() instrumentation to report the number of
commits visited per paint walk (separate commit), with deterministic
step-count assertions in t6600.
* Expanded switch statements to multi-line format per .clang-format.
* Used !count style throughout instead of count == 0.
* Updated technical documentation alongside code changes.
* Added benchmark data (both git-bench wall-clock and trace2 step counts)
to commit messages.
[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150
Elijah Newren (1):
t6600: add test cases for side-exhaustion edge cases
Kristofer Karlsson (6):
Documentation/technical: add paint-down-to-common doc
t6099, t6600: add side-exhaustion regression tests
commit-reach: add trace2 instrumentation to paint_down_to_common()
commit-reach: introduce struct paint_state with per-side counters
commit-reach: remove unused nonstale_queue dedup wrappers
commit-reach: terminate merge-base walk when one paint side is
exhausted
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 128 ++++++++++++++
commit-reach.c | 119 ++++++++++---
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++
t/t6600-test-reach.sh | 157 ++++++++++++++++++
7 files changed, 464 insertions(+), 25 deletions(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
base-commit: ab776a62a78576513ee121424adb19597fbb7613
Submitted-As: https://lore.kernel.org/git/pull.2149.v2.git.1782303254.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.git.1781951820.gitgitgadget@gmail.com