Improve LCS alignment using frequency-capped row matching #53
Merged
Replace the band-limited drift with two improvements:

1. Unlimited drift: remove the 10%-of-height drift cap so content that has shifted by more than ~500 rows (e.g. in very tall images) still aligns correctly.
2. Frequency-capped LCS anchors: before running the LCS, rows that appear more than MAX_ROW_OCCURRENCES (20) times in either image are replaced with per-image unique sentinels and excluded from matching. This prevents ubiquitous rows (e.g. hundreds of identical white rows in a blank-vs-graph comparison) from creating spurious long-range matches that distort the diff, while still allowing rows that repeat a small number of times (e.g. shared list-item icons) to serve as alignment anchors. Sentinels use distinct prefixes (\0a / \0b) per image so they never accidentally match each other.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
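The frequency cap described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names `cap_frequent_rows` and `lcs_pairs` are hypothetical, rows are assumed to be represented as string keys (for example a hash of each pixel row) that never start with a NUL character, and a textbook quadratic LCS stands in for whatever LCS routine the project actually uses. Only `MAX_ROW_OCCURRENCES = 20` and the `\0a` / `\0b` sentinel prefixes come from the description.

```python
from collections import Counter

MAX_ROW_OCCURRENCES = 20  # threshold named in the PR description


def cap_frequent_rows(rows_a, rows_b, max_occurrences=MAX_ROW_OCCURRENCES):
    """Replace over-frequent rows with per-image unique sentinels.

    Assumption: a row is "too common" if it occurs more than
    max_occurrences times in either image, and it is then replaced
    in both images.
    """
    count_a, count_b = Counter(rows_a), Counter(rows_b)
    too_common = {r for r, n in count_a.items() if n > max_occurrences}
    too_common |= {r for r, n in count_b.items() if n > max_occurrences}
    # "\x00a" / "\x00b" are the \0a / \0b prefixes; the row index makes
    # every sentinel unique, so sentinels can never match anything.
    capped_a = [f"\x00a{i}" if r in too_common else r for i, r in enumerate(rows_a)]
    capped_b = [f"\x00b{i}" if r in too_common else r for i, r in enumerate(rows_b)]
    return capped_a, capped_b


def lcs_pairs(a, b):
    """Return (index_in_a, index_in_b) pairs of one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover the matched pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

As a usage example under the same assumptions: with `rows_a = ["white"] * 30 + ["icon", "text"]` and `rows_b = ["white"] * 5 + ["icon", "text"] + ["white"] * 30`, running `lcs_pairs` on the raw rows matches only white rows across a large vertical distance and skips the real content, whereas running it on the output of `cap_frequent_rows` anchors on the `icon` and `text` rows alone. No drift limit is applied in this sketch, which corresponds to the unlimited-drift behaviour.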
Summary

- Sentinels use \0a / \0b prefixes so they never accidentally match each other across images

Fixes

- blank-graph: was being stretched to nearly double height due to white rows matching across large vertical distances; now shows a clean pixel-level diff
- whitespace-shift: preserved; single-band diff showing only the inserted blank row
- long-example, tall-content-change-and-shift: alignment still works correctly with unlimited drift

Test plan

- blank-graph diff shows clean magenta overlay (no artificial gap insertion)
- whitespace-shift diff shows single pink band in middle, clean items above and below
- tall-content-change-and-shift diff still correctly aligns shifted content

🤖 Generated with Claude Code