Skip to content

pr-2149/spkrka/side-exhaust-pr-v5

tagged this 01 Jul 16:37
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.

When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.

A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.

origin/HEAD  o   o  PR HEAD
             |   |
   (import)  o   :
            / \ /
           |   o  merge-base
           |   |
           :   :  (~2.5M commits)
           |   |
import root   main root

In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patch 3 incorporates test cases from Elijah's branch.

This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.

Patch 2 adds a test_trace2_data_singular helper to test-lib-functions.sh
that reports expected/actual values on assertion failure instead of a silent
grep exit. This was invaluable during development for iterating on step
counts across the series, and should be valuable for repairing tests after
future algorithmic changes. Happy to drop it if it is considered unnecessary
infrastructure.

The final patch removes the commit-date ordering fallback introduced by
091f4cf3 (commit: don't use generation numbers if not needed, 2018-08-30).
With side-exhaustion in place, the fallback is no longer needed for
performance, and removing it ensures the queue is always generation-ordered
regardless of graph version, so every termination condition can rely on a
single ordering invariant. This patch can be dropped if the scope is too
broad for this series.

NOTE: If the final patch is kept, the separate "commit-reach: fix !FIND_ALL
early exit with v1 commit graph" topic becomes unnecessary. Either way, the
two topics conflict trivially and I am happy to reroll whichever lands
second.

Benchmarks

Trace2 step counts are deterministic (measured via trace2_data_intmax added
in patch 5). Wall-clock times are best-of-11 runs.

2.6M-commit monorepo with commit-graph:

                                      steps              wall-clock
merge-base --all  (across import)  2143438 ->      3     3.67s ->    5ms
merge-base --all  (1000 apart)     2692915 ->   1035     4.41s ->    7ms
merge-base --all  (5000 apart)     2692915 ->   6401     4.45s ->   13ms
merge-base --all  (HEAD vs import) 2698872 ->  45960     4.50s ->   79ms
merge-tree        (across import)  2143438 ->      3     4.42s ->   11ms

git.git (88k commits, commit-graph):

                                      steps              wall-clock
merge-base --all v2.0.0 v2.55.0-rc1 72264 ->  44589      110ms ->   68ms
merge-base --all HEAD HEAD~1000      9891 ->   3828       18ms ->   10ms
merge-base --all HEAD HEAD~10000    72303 ->  41487      101ms ->   50ms

CC: Derrick Stolee stolee@gmail.com CC: Elijah Newren newren@gmail.com

[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150

Changes since v4:

 * New patch 2/10: added test_trace2_data_singular helper to
   test-lib-functions.sh. Shows expected/actual values on assertion failure
   instead of a silent grep failure. Makes iterating on step counts much
   easier.

 * New patch 6/10: added clock-skew topologies (se-, se2-) that expose
   side-exhaustion bugs when the commit-date ordering fallback fires with a
   v1 commit graph. All topologies use a shared skew_commit helper. Includes
   step count assertions for edge-case tests from patch 3.

 * Folded the nonstale_queue dedup wrapper removal (previously separate
   patch 6/8) into the paint_state introduction in patch 7/10.

 * New patch 10/10: remove the commit-date ordering fallback in
   paint_down_to_common(). The fallback (091cf18e) was a performance
   optimization for v1 commit graphs, but it breaks the generation ordering
   invariant that both the side-exhaustion and single-result optimizations
   depend on. With side-exhaustion in place, the fallback is no longer
   needed. If kept, this supersedes the separate "commit-reach: fix
   !FIND_ALL early exit with v1 commit graph" topic.

Changes since v3:

 * Fixed BUG assertion that was accidentally made unconditional in v3:
   restored the min_generation guard so it only fires when generation-based
   ordering is active.

 * Moved generation cutoff and single-result termination conditions into the
   documentation in patch 1, since they describe existing behavior.

 * Renamed paint_state counter fields for clarity: p1_count ->
   parent1_count, p2_count -> parent2_count, pending_merge_bases ->
   mb_candidate_count. Changed counter types from int to size_t. (Suggested
   by Rene Scharfe.)

Changes since v2:

 * New patch 9/10 (was 8/8): moved the min_generation termination check and
   the last_gen monotonicity assertion into paint_queue_get(), consolidating
   halt conditions. commit_graph_generation() is now called once per
   dequeued commit and shared across all checks.

 * Moved all halt conditions inside paint_queue_get() with the "pop first"
   form: pop, check, then decrement counters. This keeps the optimization
   commit's diff minimal (just inserting the new checks between pop and
   decrement).

 * Shortened the doc comment on paint_queue_get() to describe what it does
   rather than how. Inline comments on each return NULL explain the specific
   halt condition.

 * Replaced the manual commit-graph setup in the step-count test with
   run_all_modes, which now sets GIT_TRACE2_EVENT per mode and produces
   trace-mode-{none,full,half,no-gdat}.txt files.

 * Added a test_paint_down_steps helper for concise 4-mode step assertions
   with diagnostic output on mismatch (prints "expected X, got Y" instead of
   a silent grep failure).

 * Added step-count assertions to the single-walk edge-case tests:
   in_merge_bases_many:self, pending-stale, infinity-both-sides,
   mixed-finite-infinity.

 * Included step counts alongside wall-clock times in the benchmark tables.

Changes since v1:

 * Reordered patches: documentation first (describing the existing
   algorithm), tests before code changes, so they demonstrate passing with
   old logic first.

 * Dropped the ahead_behind decoupling patch. paint_state is now a NEW
   struct alongside nonstale_queue instead of replacing it. ahead_behind()
   is completely untouched.

 * Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead
   code after the conversion) in a separate commit.

 * Renamed: struct paint_queue -> paint_state, field pq -> queue,
   paint_count_add/remove -> paint_count_update (single function with signed
   delta parameter).

 * Split the old paint_count_transition (which handled both old and new
   flags in one call) into separate remove/add calls with a signed delta.
   This eliminates the need for the case 0 handler (which tracked "not in
   the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 |
   STALE) that documents all valid flag combinations, with BUG() in default.

 * Added trace2_data_intmax() instrumentation to report the number of
   commits visited per paint walk (separate commit), with step-count
   assertions in tests for deterministic regression detection.

Elijah Newren (1):
  t6600: add test cases for side-exhaustion edge cases

Kristofer Karlsson (9):
  Documentation/technical: add paint-down-to-common doc
  test-lib-functions: improve diagnostic output for trace2 data
    assertions
  t6099, t6600: add side-exhaustion regression tests
  commit-reach: add trace2 instrumentation to paint_down_to_common()
  t6600: add clock-skew topologies and step counts for edge cases
  commit-reach: introduce struct paint_state with per-side counters
  commit-reach: terminate merge-base walk when one paint side is
    exhausted
  commit-reach: move min_generation check into paint_queue_get()
  commit-reach: remove commit-date ordering fallback

 Documentation/Makefile                        |   1 +
 Documentation/technical/meson.build           |   1 +
 .../technical/paint-down-to-common.adoc       | 151 +++++++++
 commit-graph.c                                |  11 -
 commit-graph.h                                |   6 -
 commit-reach.c                                | 152 ++++++---
 t/meson.build                                 |   1 +
 t/t6099-merge-base-side-exhaustion.sh         |  82 +++++
 t/t6600-test-reach.sh                         | 289 +++++++++++++++++-
 t/test-lib-functions.sh                       |  36 +++
 10 files changed, 663 insertions(+), 67 deletions(-)
 create mode 100644 Documentation/technical/paint-down-to-common.adoc
 create mode 100755 t/t6099-merge-base-side-exhaustion.sh

base-commit: e9019fcafe0040228b8631c30f97ae1adb61bcdc

Submitted-As: https://lore.kernel.org/git/pull.2149.v5.git.1782923832.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.git.1781951820.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.v2.git.1782303254.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.v3.git.1782479286.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2149.v4.git.1782649547.gitgitgadget@gmail.com
Assets 2
Loading