New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve cursor logic, emoji handling, canonicality, and tests #4

merged 6 commits into from Oct 8, 2018


None yet
2 participants
Copy link

dgreensp commented Oct 5, 2018

This PR makes the following changes:

  • Changes the behavior of the third, "cursor" argument of diff(...) in a roughly backwards-compatible way to 1) meet Quill's diffing needs for text editing, and 2) make it easier to return canonical diffs with correct unicode handling
    • Previously, this argument was always a number, representing the cursor position before the change. While the exact behavior has never been specified, and was previously not unit-tested, the idea was that diff('aaa', 'aa', 1) would delete the first a (under the notion that the user positioned the cursor at position 1 and then backspaced), while diff('aaa', 'aa', 2) would delete the second a, and so on. However, the implementation was flawed, and fixing it would have been very complex due to interactions with the diff optimizer. It failed on documents as simple as xaa or aax.
    • The new implementation takes either a number or a structure with shape {oldSelection: {index, length}, newSelection: {index, length}} representing the selection range or text cursor (a range with length 0) before and after the change. If a number is given, it represents the position of a zero-length oldSelection, with the newSelection being inferred. Using selection ranges allows cases such as forward-delete and typing to replace a range of text to be taken into account. The major difference from the old implementation when a number is given, besides working correctly on arbitrary splices, is that it only works if the entire diff is a splice at the cursor position. It doesn't try to "fix up" the diff around the cursor position in the presence of other changes like the old implementation. It seems unlikely that someone was using this argument and relying on that precise behavior.
    • Added a bunch of unit tests for the cursor argument
  • Fix unicode/emoji handling to be fully valid and tested. The previous code for fixing invalid emojis was not exhaustive.
  • Improve canonicality of resulting diffs.
    • The previous implementation would sometimes include empty spans in the output! This is a quirk of the original code from Google
    • The original code sought to ensure a certain pattern to the output spans, e.g. any two EQUAL spans with only INSERT and DELETE spans between them must have at least one INSERT or DELETE span, and at most one of each, and if there is one of each, the DELETE must come before the INSERT. This pattern was disrupted by the handling of cursor position and unicode, and in fact all three concerns worked against each other. By integrating unicode handling with the diff cleanup routine, this is no longer the case, and we can ensure good properties of the diff.
  • I removed the tests that compare our diff output with the original Google code's diff output, and the dependency on googlediff. Our diff output now diverges, even without emojis and without a cursor position, due to my insistence on eliminating empty spans from the output. There is new test code that makes sure diffs are correct, e.g. that they are retaining or deleting all the old text, in order, and retaining or inserting all the new text, in order, in addition to checking canonicality.
  • Minor formatting cleanups. I kept to the original style of the code in many ways (e.g. using var and function). I pretty much wrote in ES5 style, though I have not made sure the code is ES5 clean. I did change == to === in a few places as a matter of style and changed some indentation so I could use my IDE's auto-formatter.

dgreensp added some commits Oct 5, 2018


jhchen approved these changes Oct 5, 2018

Copy link

jhchen left a comment

I'd slightly prefer if we can call it newRange/oldRange instead of newSelection/oldSelection since it's shorter and in Quill selection is mostly related to the module and range is object representing position info.

Also we could just change all usage of == into === to keep things consistent, with the exception of an intentional comparison with null/undefined.

Otherwise looks good and happy to have this edge case taken care of!

@@ -449,7 +465,7 @@ function diff_halfMatch_(text1, text2) {
* Any edit section can move as long as it doesn't cross an equality.
* @param {Array} diffs Array of diff tuples.
function diff_cleanupMerge(diffs) {
function diff_cleanupMerge(diffs, fix_unicode) {

This comment has been minimized.


jhchen Oct 5, 2018


Should add @params as well


This comment has been minimized.

Copy link
Contributor Author

dgreensp commented Oct 8, 2018

newRange/oldRange sounds good. Took all feedback.

I will try using the latest commit in Quill, maybe before merging.

@dgreensp dgreensp force-pushed the dg-cursor-and-unicode branch 2 times, most recently from da3e786 to ea599b6 Oct 8, 2018

@dgreensp dgreensp force-pushed the dg-cursor-and-unicode branch from ea599b6 to f4e19f4 Oct 8, 2018

@dgreensp dgreensp merged commit bc1e462 into master Oct 8, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@dgreensp dgreensp deleted the dg-cursor-and-unicode branch Oct 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment