
Improve LCS alignment using frequency-capped row matching #53

Merged: trotzig merged 1 commit into main from claude/compassionate-grothendieck on Mar 22, 2026

trotzig (Contributor) commented Mar 22, 2026

Summary

  • Unlimited drift: removes the old 10%-of-height band limit so very tall images with large content shifts align correctly
  • Frequency-capped LCS anchors: rows appearing more than 20 times in either image are excluded from LCS matching (replaced with per-image unique sentinels). This prevents ubiquitous rows (e.g. hundreds of identical white rows in a blank-vs-graph diff) from creating spurious long-range matches that stretch the diff image, while still letting moderately-repeated rows (e.g. shared list-item icons) serve as alignment anchors
  • Sentinels use distinct \0a / \0b prefixes so they never accidentally match each other across images
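The sentinel replacement described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the function name `cap_frequent_rows` is hypothetical, rows are assumed to already be reduced to hashable string keys (for example a hash of each pixel row), and real row keys are assumed never to start with a NUL character.

```python
from collections import Counter

MAX_ROW_OCCURRENCES = 20  # threshold named in the PR description


def cap_frequent_rows(rows_a, rows_b, max_occurrences=MAX_ROW_OCCURRENCES):
    """Return LCS keys for both images, replacing ubiquitous rows with sentinels.

    Hypothetical helper: assumes each row is already a string key.
    """
    count_a = Counter(rows_a)
    count_b = Counter(rows_b)

    def too_common(row):
        # A row is excluded if it is over the cap in EITHER image.
        return count_a[row] > max_occurrences or count_b[row] > max_occurrences

    # Each sentinel embeds the row index, so it is unique within its image,
    # and the "\x00a" / "\x00b" prefixes keep the two images' sentinels
    # from ever comparing equal to each other.
    keys_a = ["\x00a" + str(i) if too_common(r) else r
              for i, r in enumerate(rows_a)]
    keys_b = ["\x00b" + str(i) if too_common(r) else r
              for i, r in enumerate(rows_b)]
    return keys_a, keys_b
```

With 30 identical white rows in one image and 3 shared icon rows in both, the white rows would become unmatched sentinels in both images (even where the other image has only a few of them), while the icon rows keep their keys and remain available as LCS anchors.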

Fixes

  • blank-graph: was being stretched to nearly double height due to white rows matching across large vertical distances — now shows a clean pixel-level diff
  • whitespace-shift: preserved — single-band diff showing only the inserted blank row
  • Tall images (long-example, tall-content-change-and-shift): alignment still works correctly with unlimited drift
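To illustrate why a drift band can break alignment for large shifts, here is a small sketch of an LCS over row keys with an optional band. It is an assumption-laden illustration, not the repository's implementation: `lcs_pairs` and its `max_drift` parameter are hypothetical, and the band is modeled simply as allowing a match only when `|i - j| <= max_drift`.

```python
def lcs_pairs(a, b, max_drift=None):
    """Return matched (i, j) index pairs from an LCS of sequences a and b.

    Hypothetical sketch: if max_drift is given, rows may only match when
    |i - j| <= max_drift (a band limit); None means unlimited drift.
    """
    n, m = len(a), len(b)

    def can_match(i, j):
        in_band = max_drift is None or abs(i - j) <= max_drift
        return in_band and a[i] == b[j]

    # dp[i][j] = length of the best alignment of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            if can_match(i, j):
                best = max(best, dp[i + 1][j + 1] + 1)
            dp[i][j] = best

    # Walk the table from the top-left to recover the matched pairs.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if can_match(i, j) and dp[i][j] == dp[i + 1][j + 1] + 1:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

For example, if the second image is the first with 5 new rows inserted at the top, every shared row has drifted by 5. With `max_drift=1` this sketch finds no matches at all, while with `max_drift=None` it matches every original row to its shifted position. Note that this plain dynamic-programming version uses O(n·m) time and memory, so it is only meant to show the behavior, not to scale to very tall images.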

Test plan

  • All 14 snapshot tests pass
  • blank-graph diff shows clean magenta overlay (no artificial gap insertion)
  • whitespace-shift diff shows single pink band in middle, clean items above and below
  • tall-content-change-and-shift diff still correctly aligns shifted content

🤖 Generated with Claude Code

Replace the band-limited drift with two improvements:

1. Unlimited drift: remove the 10%-of-height drift cap so content that
   has shifted by more than ~500 rows (e.g. in very tall images) still
   aligns correctly.

2. Frequency-capped LCS anchors: before running the LCS, rows that
   appear more than MAX_ROW_OCCURRENCES (20) times in either image are
   replaced with per-image unique sentinels and excluded from matching.
   This prevents ubiquitous rows (e.g. hundreds of identical white rows
   in a blank-vs-graph comparison) from creating spurious long-range
   matches that distort the diff, while still allowing rows that repeat
   a small number of times (e.g. shared list-item icons) to serve as
   alignment anchors. Sentinels use distinct prefixes (\0a / \0b) per
   image so they never accidentally match each other.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
trotzig merged commit 695a5cc into main on Mar 22, 2026
1 check passed
trotzig deleted the claude/compassionate-grothendieck branch on March 22, 2026 23:28