ricokahler commented Jun 3, 2025

This PR introduces a BREAKING CHANGE by removing the user-configurable options for diff-match-patch (DMP) behavior. Instead, the library now employs a set of automatic, performance-tested heuristics to determine when to use DMP for string diffs versus falling back to a simple set operation.

This change aims to simplify the API, provide more consistent and optimized behavior out-of-the-box, and better handle real-world editing scenarios like keystrokes and copy-paste operations, while avoiding known performance pitfalls of DMP on large text replacements.

Key Changes:

  • BREAKING CHANGE:
    • Removed the diffMatchPatch options (enabled, lengthThresholdAbsolute, lengthThresholdRelative) from PatchOptions.
    • Removed the DiffMatchPatchOptions and DiffOptions (which included diffMatchPatch) interfaces from the public API.
    • Removed the internal mergeOptions function and the DMP-specific parts of defaultOptions.
  • New Performance-Based Heuristics for DMP:
    • Introduced a new exported utility function shouldUseDiffMatchPatch(source: string, target: string): boolean. This function encapsulates the new logic for deciding whether to use DMP.
    • The decision is now based on:
      • Document Size Limit: Documents larger than 1MB (DMP_MAX_DOCUMENT_SIZE) will use set operations.
      • Change Ratio Threshold: If more than 40% (DMP_MAX_CHANGE_RATIO) of the text changes, set is used, since a change that large indicates replacement rather than editing.
      • Small Document Optimization: Documents smaller than 10KB (DMP_MIN_SIZE_FOR_RATIO_CHECK) always use DMP regardless of change ratio, as performance is consistently high for these.
      • System Key Protection: Properties starting with _ (system keys) continue to use set operations.
    • Added extensive JSDoc to shouldUseDiffMatchPatch detailing the heuristic rationale, performance characteristics (based on testing @sanity/diff-match-patch on an M2 MacBook Pro), algorithm details, and test methodology.
  • Internal Simplification:
    • The internal getDiffMatchPatch function now uses shouldUseDiffMatchPatch to make its decision and no longer accepts DMP-related options.
    • Simplified the call to the underlying @sanity/diff-match-patch library within getDiffMatchPatch to use makePatches(source, target) directly. This is more concise and leverages the internal optimizations of that library, with performance validated to be equivalent to the previous multi-step approach.
  • Constants: Introduced SYSTEM_KEYS, DMP_MAX_DOCUMENT_SIZE, DMP_MAX_CHANGE_RATIO, and DMP_MIN_SIZE_FOR_RATIO_CHECK to define these thresholds.
  • Test Updates: Snapshots have been updated to reflect the new DMP behavior based on these heuristics.
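
The size and ratio checks listed above can be sketched roughly as follows. This is an illustrative approximation, not the code from this PR: the function name shouldUseDiffMatchPatchSketch and the helper estimateChangeRatio are hypothetical, the exact numeric values behind "1MB" and "10KB" are assumed, and the PR description does not say how the change ratio is measured, so the length-difference metric here is an assumption. System keys are left out because the exported function only receives the two strings.

```typescript
// Threshold values follow the PR description; measuring size in UTF-16
// code units (string length) rather than bytes is an assumption.
const DMP_MAX_DOCUMENT_SIZE = 1024 * 1024 // "1MB"
const DMP_MAX_CHANGE_RATIO = 0.4 // "40%"
const DMP_MIN_SIZE_FOR_RATIO_CHECK = 10 * 1024 // "10KB"

// Hypothetical helper: approximates "how much of the text changed" by the
// length difference relative to the longer string. The real metric may differ.
function estimateChangeRatio(source: string, target: string): number {
  const longest = Math.max(source.length, target.length)
  if (longest === 0) return 0
  return Math.abs(target.length - source.length) / longest
}

// Returns true when DMP should be used, false when a plain set is preferable.
function shouldUseDiffMatchPatchSketch(source: string, target: string): boolean {
  const longest = Math.max(source.length, target.length)
  // Very large documents: skip DMP entirely.
  if (longest > DMP_MAX_DOCUMENT_SIZE) return false
  // Small documents: always use DMP, no ratio check.
  if (longest < DMP_MIN_SIZE_FOR_RATIO_CHECK) return true
  // Mid-sized documents: use DMP only while the change looks like an edit.
  return estimateChangeRatio(source, target) <= DMP_MAX_CHANGE_RATIO
}
```

Under these assumptions, a single keystroke in a 20,000-character string stays well below the 40% ratio and uses DMP, while pasting in a replacement five times longer exceeds it and falls back to set. One limitation of the assumed metric is that a same-length replacement registers as no change at all.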

Rationale for Change:

The previous configurable thresholds for DMP were somewhat arbitrary and could lead to suboptimal performance or overly verbose patches in certain scenarios. This change is based on empirical performance testing of the @sanity/diff-match-patch library itself. The new heuristics are designed to:

  • Optimize for common editing patterns: Ensure fast performance for keystrokes and small pastes, which are the most frequent operations.
  • Prevent performance degradation: Avoid triggering complex and potentially slow DMP algorithm paths when users perform large text replacements (e.g., pasting entirely new content).
  • Simplify the API: Remove the burden of configuration from the user, providing sensible defaults.
  • Maintain conflict-resistance: Continue to leverage DMP's strengths for collaborative editing where appropriate.

By hardcoding these well-tested heuristics, we aim for a more robust and performant string diffing strategy by default.
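
To make the resulting control flow concrete, here is a minimal sketch of how a string diff might choose between the two operations once the options are gone. Everything in it is hypothetical and for illustration only: the StringPatch shape, the StringDiffDeps interface, and the diffStringSketch function are not the library's internals. The DMP decision and the patch generation are passed in as parameters so the snippet stays self-contained; in the library these roles are played by shouldUseDiffMatchPatch and makePatches from @sanity/diff-match-patch.

```typescript
// Hypothetical patch shape, for illustration only.
type StringPatch =
  | {op: 'set'; key: string; value: string}
  | {op: 'diffMatchPatch'; key: string; value: string}

// Injected dependencies so the sketch does not import any real package.
interface StringDiffDeps {
  shouldUseDmp: (source: string, target: string) => boolean
  makePatchText: (source: string, target: string) => string
}

function diffStringSketch(
  key: string,
  source: string,
  target: string,
  deps: StringDiffDeps,
): StringPatch {
  // System keys (properties starting with "_") always use set.
  const isSystemKey = key.startsWith('_')
  if (isSystemKey || !deps.shouldUseDmp(source, target)) {
    return {op: 'set', key, value: target}
  }
  // Otherwise emit a DMP patch describing the edit.
  return {op: 'diffMatchPatch', key, value: deps.makePatchText(source, target)}
}
```

Note that no caller-supplied thresholds appear anywhere in the signature, which is the practical effect of the breaking change: the decision is made entirely from the key and the two strings.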


ricokahler commented Jun 3, 2025

BREAKING CHANGE: Remove `DiffMatchPatchOptions` and `DiffOptions` interfaces. The library now uses automatic heuristics based on document size (1MB limit) and change ratio (40% threshold) instead of user-configurable thresholds. This optimizes for real-world editing scenarios like keystrokes and copy-paste operations while avoiding slow algorithm paths on large text replacements.

- Remove lengthThresholdAbsolute and lengthThresholdRelative options
- Add shouldUseDiffMatchPatch() function with performance-tested heuristics
- Simplify getDiffMatchPatch() to use direct makePatches(source, target) approach
- Add constants: DMP_MAX_DOCUMENT_SIZE, DMP_MAX_CHANGE_RATIO, DMP_MIN_SIZE_FOR_RATIO_CHECK
- Remove DiffMatchPatchOptions and DiffOptions from public API exports

@ricokahler force-pushed the v6_new-dmp-behavior branch from 63bee6e to db4ab0e on June 6, 2025 21:57

ricokahler commented Jun 13, 2025

Merge activity

  • Jun 13, 7:24 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jun 13, 7:24 PM UTC: @ricokahler merged this pull request with Graphite.

@ricokahler merged commit 9577019 into main on Jun 13, 2025
7 checks passed