feat!: replace configurable DMP with perf-based heuristics #36
Merged
This stack of pull requests is managed by Graphite. Learn more about stacking. |
This was referenced Jun 3, 2025
55343b7 to
23a0335
Compare
This was referenced Jun 4, 2025
rexxars
requested changes
Jun 6, 2025
BREAKING CHANGE: Remove `DiffMatchPatchOptions` and `DiffOptions` interfaces.

The library now uses automatic heuristics based on document size (1MB limit) and change ratio (40% threshold) instead of user-configurable thresholds. This optimizes for real-world editing scenarios like keystrokes and copy-paste operations while avoiding slow algorithm paths on large text replacements.

- Remove lengthThresholdAbsolute and lengthThresholdRelative options
- Add shouldUseDiffMatchPatch() function with performance-tested heuristics
- Simplify getDiffMatchPatch() to use direct makePatches(source, target) approach
- Update constants: DMP_MAX_DOCUMENT_SIZE, DMP_MAX_CHANGE_RATIO, DMP_MIN_SIZE_FOR_RATIO_CHECK
- Remove DiffMatchPatchOptions and DiffOptions from public API exports
63bee6e to
db4ab0e
Compare
rexxars
approved these changes
Jun 9, 2025

This PR introduces a BREAKING CHANGE by removing the user-configurable options for `diff-match-patch` (DMP) behavior. Instead, the library now employs a set of automatic, performance-tested heuristics to determine when to use DMP for string diffs versus falling back to a simple `set` operation.

This change aims to simplify the API, provide more consistent and optimized behavior out-of-the-box, and better handle real-world editing scenarios like keystrokes and copy-paste operations, while avoiding known performance pitfalls of DMP on large text replacements.
Key Changes:
- Removed the `diffMatchPatch` options (`enabled`, `lengthThresholdAbsolute`, `lengthThresholdRelative`) from `PatchOptions`.
- Removed the `DiffMatchPatchOptions` and `DiffOptions` (which included `diffMatchPatch`) interfaces from the public API.
- Removed the `mergeOptions` function and the DMP-specific parts of `defaultOptions`.
- Added `shouldUseDiffMatchPatch(source: string, target: string): boolean`. This function encapsulates the new logic for deciding whether to use DMP:
  - Documents larger than 1MB (`DMP_MAX_DOCUMENT_SIZE`) will use `set` operations.
  - If more than 40% (`DMP_MAX_CHANGE_RATIO`) of the text changes, `set` is used (indicates replacement vs. editing).
  - Strings below the minimum size for the ratio check (`DMP_MIN_SIZE_FOR_RATIO_CHECK`) always use DMP, as performance is consistently high for these.
  - Keys starting with `_` (system keys) continue to use `set` operations.
- Added documentation to `shouldUseDiffMatchPatch` detailing the heuristic rationale, performance characteristics (based on testing `@sanity/diff-match-patch` on an M2 MacBook Pro), algorithm details, and test methodology.
- The `getDiffMatchPatch` function now uses `shouldUseDiffMatchPatch` to make its decision and no longer accepts DMP-related options.
- Simplified the call to the `@sanity/diff-match-patch` library within `getDiffMatchPatch` to use `makePatches(source, target)` directly. This is more concise and leverages the internal optimizations of that library, with performance validated to be equivalent to the previous multi-step approach.
- Updated the constants `SYSTEM_KEYS`, `DMP_MAX_DOCUMENT_SIZE`, `DMP_MAX_CHANGE_RATIO`, and `DMP_MIN_SIZE_FOR_RATIO_CHECK` to define these thresholds.

Rationale for Change:
The previous configurable thresholds for DMP were somewhat arbitrary and could lead to suboptimal performance or overly verbose patches in certain scenarios. This change is based on empirical performance testing of the `@sanity/diff-match-patch` library itself. The new heuristics are designed to:

By hardcoding these well-tested heuristics, we aim for a more robust and performant string diffing strategy by default.
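
For reviewers who want to reason about the decision order, the heuristics listed under Key Changes can be sketched roughly as follows. This is an illustrative approximation, not the code from this PR: the 1MB and 40% values come from the description above, but the value of `DMP_MIN_SIZE_FOR_RATIO_CHECK` (10 KB here), the use of string length as the size measure, the length-difference estimate of the change ratio, and the helper name `chooseStringOperation` are all assumptions made for the example.

```typescript
// Values taken from the PR description: 1MB size limit, 40% change ratio.
// Assumption: size is measured in string length (UTF-16 code units).
const DMP_MAX_DOCUMENT_SIZE = 1024 * 1024
const DMP_MAX_CHANGE_RATIO = 0.4
// Assumption: the PR text does not state this value; 10 KB is a placeholder.
const DMP_MIN_SIZE_FOR_RATIO_CHECK = 10 * 1024

function shouldUseDiffMatchPatch(source: string, target: string): boolean {
  const maxLength = Math.max(source.length, target.length)

  // Very large documents always fall back to `set`.
  if (maxLength > DMP_MAX_DOCUMENT_SIZE) return false

  // Small strings always use DMP; the ratio check is skipped for them.
  if (maxLength < DMP_MIN_SIZE_FOR_RATIO_CHECK) return true

  // Assumption: the change ratio is estimated from the difference in length,
  // which is cheap to compute but only approximates how much text changed.
  const changeRatio = Math.abs(target.length - source.length) / maxLength
  return changeRatio <= DMP_MAX_CHANGE_RATIO
}

// Hypothetical helper showing how the system-key rule could combine with the
// heuristic: keys starting with `_` never use DMP.
function chooseStringOperation(
  key: string,
  source: string,
  target: string,
): 'set' | 'diffMatchPatch' {
  if (key.startsWith('_')) return 'set'
  return shouldUseDiffMatchPatch(source, target) ? 'diffMatchPatch' : 'set'
}
```

Checking the size limit first keeps the most expensive case (multi-megabyte strings) from ever reaching the diff algorithm, and exempting small strings from the ratio check means short fields still get fine-grained patches even when most of their content changes.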