
pr-2120/mmontalbo/mm/structural-diff-backend-clean-v4

tagged this 14 Jun 18:59
Language-aware diff tools (e.g., Difftastic) and format-specific analyzers
can produce better line matching than Git's builtin diff algorithm, but
diff.<driver>.command replaces Git's output entirely, losing downstream
features like word diff, function context, color, and blame.

This series adds diff.<driver>.process, a long-running subprocess protocol
that lets an external tool control which lines Git considers changed while
Git handles all output formatting. The protocol follows
filter.<driver>.process: pkt-line over stdin/stdout, capability negotiation,
one process per Git invocation.

The tool receives both file versions and returns changed regions (line
ranges in the old and new file). Git validates and feeds them into the xdiff
pipeline in place of the builtin diff algorithm. When the tool returns no
hunks, Git treats the files as having no changes.

 * Patch 1: xdiff plumbing for externally supplied hunks.
 * Patch 2: diff.<driver>.process config key.
 * Patch 3: refactor subprocess API to separate process lifecycle from
   hashmap management, since the diff process stores its subprocess on the
   userdiff driver rather than in a hashmap.
 * Patch 4: the main feature.
 * Patch 5: bypass knobs (--no-ext-diff, format-patch).
 * Patch 6: blame integration so the tool can declare commits as having no
   changes.

Changes since v3:

 * Replaced the Python test backend with a C test-tool helper (thanks to
   Johannes Schindelin).
 * Added test cases for a deleted file, a malformed hunk line, and a
   missing capability.
 * Fixed a potential overflow in the synchronization invariant check by
   counting from the changed[] arrays instead of accumulating.
 * Accept start=0 with count=0 in the hunk protocol, matching what git diff
   itself emits for empty file sides.
 * Warn on external hunk validation failure with specific reasons (range
   exceeded, overlap, sync mismatch) to help tool authors debug their
   implementations.
 * The test backend now follows the same convention (start=0 when count=0
   for empty file sides).

Michael Montalbo (6):
  xdiff: support external hunks via xpparam_t
  userdiff: add diff.<driver>.process config
  sub-process: separate process lifecycle from hashmap management
  diff: add long-running diff process via diff.<driver>.process
  diff: bypass diff process with --no-ext-diff and in format-patch
  blame: consult diff process for no-hunk detection

 Documentation/config/diff.adoc           |   5 +
 Documentation/diff-algorithm-option.adoc |   3 +
 Documentation/diff-options.adoc          |   4 +-
 Documentation/gitattributes.adoc         | 143 ++++++
 Makefile                                 |   2 +
 blame.c                                  |  40 +-
 builtin/log.c                            |   7 +
 diff-process.c                           | 297 ++++++++++++
 diff-process.h                           |  39 ++
 diff.c                                   |  29 +-
 diff.h                                   |   5 +
 meson.build                              |   1 +
 sub-process.c                            |  28 +-
 sub-process.h                            |   9 +-
 t/helper/meson.build                     |   1 +
 t/helper/test-diff-process-backend.c     | 299 ++++++++++++
 t/helper/test-tool.c                     |   1 +
 t/helper/test-tool.h                     |   1 +
 t/meson.build                            |   1 +
 t/t4080-diff-process.sh                  | 553 +++++++++++++++++++++++
 userdiff.c                               |   7 +
 userdiff.h                               |   5 +
 xdiff-interface.c                        |   7 +-
 xdiff/xdiff.h                            |  14 +
 xdiff/xdiffi.c                           | 123 ++++-
 xdiff/xprepare.c                         |  10 +
 xdiff/xprepare.h                         |   1 +
 27 files changed, 1614 insertions(+), 21 deletions(-)
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100644 t/helper/test-diff-process-backend.c
 create mode 100755 t/t4080-diff-process.sh

base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c

Submitted-As: https://lore.kernel.org/git/pull.2120.v4.git.1781463564.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2120.git.1779415884.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2120.v2.git.1779733799.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.2120.v3.git.1780087700.gitgitgadget@gmail.com