A new list diffing algorithm #1274

Merged 13 commits into master from fix-perf-regression on Mar 24, 2018

Conversation

@Rich-Harris (Member) commented Mar 23, 2018

1.58.0 introduced a new list diffing algorithm, intended to solve the performance problems in #588. The original PR (#1249) was incredibly fast, but unfortunately had incorrect behaviour in some cases (where sequences of blocks were moving together). I modified the algorithm, thinking I was fixing the bug, but in fact I was also undoing the reason it was so fast, and everything actually became slower than it was before.

Oops.

I first tried to implement the classic list diffing algorithm described in this Medium post but... it's too complicated for my liberal arts major brain. I had a hunch that there might be a simpler approach with the same performance characteristics, one that would avoid ping-ponging between the front and the back of the list. And here it is, in pseudo-code:

let o = length of old_list
let n = length of new_list

let deltas = Map<key, delta> where key is present in both lists,
                             and delta is abs(change in key position)

let did_move = Map<key, boolean>
let will_move = Map<key, boolean>

let insertion_point = null (i.e. the end)

while (o > 0 and n > 0)
  let old_key = old_list[o - 1];
  let new_key = new_list[n - 1];

  if (old_key === new_key)
    o--
    n--
    insertion_point = new_key

  else if (old_key is not in new_list)
    remove(old_key)
    o--

  else if (new_key is not in old_list)
    insert(new_key, insertion_point)
    insertion_point = new_key
    n--

  else
    if (did_move[old_key])
      o--

    else if (will_move[new_key])
      insert(new_key, insertion_point)
      insertion_point = new_key
      n--

    else if (deltas[new_key] > deltas[old_key])
      did_move[new_key] = true
      insert(new_key, insertion_point)
      insertion_point = new_key
      n--

    else
      will_move[old_key] = true
      o--

while (o-- > 0)
  let old_key = old_list[o]
  if (old_key is not in new_list)
    remove(old_key)

while (n-- > 0)
  let new_key = new_list[n]
  insert(new_key, insertion_point)
  insertion_point = new_key
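
Here's a minimal runnable sketch of the same thing in TypeScript, for anyone who wants to poke at it. diffKeys, Op and the returned operation list are illustrative names only, nothing from the Svelte codebase, and for a key that already exists an 'insert' really means 'move the existing node':

type Key = string;
type Op =
  | { kind: 'remove'; key: Key }
  | { kind: 'insert'; key: Key; before: Key | null }; // before === null means 'at the end'

function diffKeys(oldList: Key[], newList: Key[]): Op[] {
  const ops: Op[] = [];

  // position of each key in each list, so membership checks are O(1)
  const oldIndex = new Map<Key, number>();
  const newIndex = new Map<Key, number>();
  oldList.forEach((key, i) => oldIndex.set(key, i));
  newList.forEach((key, i) => newIndex.set(key, i));

  // delta = how far a key that appears in both lists has to travel
  const deltas = new Map<Key, number>();
  oldIndex.forEach((i, key) => {
    if (newIndex.has(key)) deltas.set(key, Math.abs(newIndex.get(key)! - i));
  });

  const didMove = new Set<Key>();
  const willMove = new Set<Key>();

  let o = oldList.length;
  let n = newList.length;
  let insertionPoint: Key | null = null;

  const insert = (key: Key) => {
    ops.push({ kind: 'insert', key, before: insertionPoint });
    insertionPoint = key;
  };

  while (o > 0 && n > 0) {
    const oldKey = oldList[o - 1];
    const newKey = newList[n - 1];

    if (oldKey === newKey) {
      // already in the right place relative to what follows
      o -= 1;
      n -= 1;
      insertionPoint = newKey;
    } else if (!newIndex.has(oldKey)) {
      // old key no longer exists
      ops.push({ kind: 'remove', key: oldKey });
      o -= 1;
    } else if (!oldIndex.has(newKey)) {
      // brand new key
      insert(newKey);
      n -= 1;
    } else if (didMove.has(oldKey)) {
      // already moved into place earlier, skip past it
      o -= 1;
    } else if (willMove.has(newKey)) {
      // we decided earlier that this key should move; do it now
      insert(newKey);
      n -= 1;
    } else if (deltas.get(newKey)! > deltas.get(oldKey)!) {
      // the new key has further to travel, so moving it is the cheaper choice
      didMove.add(newKey);
      insert(newKey);
      n -= 1;
    } else {
      // the old key has further to travel; defer moving it
      willMove.add(oldKey);
      o -= 1;
    }
  }

  // anything left in the old list that isn't in the new list gets removed...
  while (o-- > 0) {
    const oldKey = oldList[o];
    if (!newIndex.has(oldKey)) ops.push({ kind: 'remove', key: oldKey });
  }

  // ...and anything left in the new list gets inserted at the front
  while (n-- > 0) insert(newList[n]);

  return ops;
}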

It's likely that this isn't novel at all, and it's possible that it has negative performance characteristics in circumstances I haven't considered. But it seems to work pretty well.

The secret sauce is deltas. When changing ABCDE to EABCD and working backwards (which is slightly more convenient), there are two possibilities...

  • move D to the end, then C (which is now in front of the E) in front of the D, B in front of the C, A in front of the B, or
  • move the E to the front

...one of which is obviously better — the second is one move, the first is four moves. But if you codify the rule that on encountering different keys you should always move the old key rather than the new key, then going from ABCDE to BCDEA would take four moves instead of one.

deltas allows us to make the right decision in both cases. In the first example, E would have to move four spaces to get to its new home, whereas D would only have to move one. In other words, moving the old key (E) is 'worth' four moves. In the second example, E would only have to move one space, whereas A would have to move four — so we move the new key (A), not the old key (E).
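
As a quick sanity check of that reasoning, running the two examples through the diffKeys sketch above (illustrative output):

// ABCDE -> EABCD: E's delta is 4, D's is 1, so the old key (E) is the one that moves
console.log(diffKeys(['A', 'B', 'C', 'D', 'E'], ['E', 'A', 'B', 'C', 'D']));
// [ { kind: 'insert', key: 'E', before: 'A' } ]   (one operation)

// ABCDE -> BCDEA: A's delta is 4, E's is 1, so the new key (A) is the one that moves
console.log(diffKeys(['A', 'B', 'C', 'D', 'E'], ['B', 'C', 'D', 'E', 'A']));
// [ { kind: 'insert', key: 'A', before: null } ]  (also one operation)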

In the row-swapping case that motivated all this work (the js-framework-benchmark test), this algorithm is as fast as #1249, but it also works for those tricky edge cases.

Would be grateful if anyone out there who is more knowledgeable about these sorts of things could sanity-check me though...

TODO

  • tidy things up a little bit
  • double check all the benchmarks this time
  • remove all the linked list stuff, which is no longer necessary (store the array of keys instead)

@codecov-io commented Mar 23, 2018

Codecov Report

Merging #1274 into master will decrease coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #1274      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.02%     
==========================================
  Files         126      126              
  Lines        4548     4541       -7     
  Branches     1478     1478              
==========================================
- Hits         4179     4172       -7     
  Misses        153      153              
  Partials      216      216
Impacted Files Coverage Δ
src/generators/nodes/EachBlock.ts 97.95% <100%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jacwright (Contributor)

I borrowed an algorithm from Google's observe-js polyfill and implemented it here: https://github.com/chip-js/differences-js/blob/master/src/diff.js#L199. (The original is too enmeshed in their other code and had to be extracted, so my version is the easier one to try out and use.)

I really don't know what performance characteristics are being targeted so this may not be helpful at all, but I thought I ought to share in case it was.

@thysultan

If it also handles both carousel directions:

1, 2, 3, 4, 5, 6 -> 2, 3, 4, 5, 6, 1 = (append 1)
2, 3, 4, 5, 6, 1 -> 1, 2, 3, 4, 5, 6 = (prepend 1)

1, 2, 3, 4, 5, 6 -> 2, 3, 4, 5, 6, 7 (remove 1, create append 7)
2, 3, 4, 5, 6, 7 -> 1, 2, 3, 4, 5, 6 (remove 7, create prepend 1)

Then it should handle the common cases.

Having some tests to ensure the resulting DOM operations match expectations would help prevent a regression in the future. ivi, domvm, dio, etc. do this to some reasonable extent, since some DOM ops can impact the correctness of a UI's end state with regard to inputs and other stateful elements, beyond the supposed HTML representation.
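
For example, with a pure function like the diffKeys sketch further up (illustrative only, not a test in this repo), the carousel cases above boil down to assertions on the returned operation list:

import * as assert from 'assert';

// each carousel rotation should cost exactly one operation
assert.deepStrictEqual(
  diffKeys(['1', '2', '3', '4', '5', '6'], ['2', '3', '4', '5', '6', '1']),
  [{ kind: 'insert', key: '1', before: null }] // append 1
);

assert.deepStrictEqual(
  diffKeys(['2', '3', '4', '5', '6', '1'], ['1', '2', '3', '4', '5', '6']),
  [{ kind: 'insert', key: '1', before: '2' }] // prepend 1
);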

@Rich-Harris changed the title from "[WIP] A new list diffing algorithm" to "A new list diffing algorithm" on Mar 24, 2018
@Rich-Harris (Member, Author)

Thanks @jacwright. The observe-js approach looks like it probably has significantly worse performance in cases where there are many edits; it has a loop inside a loop in calcEditDifferences meaning that the comparison is (I think?) O(n^2) in the worst case, compared to O(n) for this algo (though they're probably similar when it comes to the actual DOM moves, which is the expensive part).

@thysultan yep, it handles those cases as described. You're right about regressions. Not sure how to test how many move operations it performs in a given case without adding code specifically for testing, but perhaps we need a more robust approach to benchmarking generally. Will freely admit that I generally skip that part, because running decent benchmarks takes forever.

Have tidied everything up here and double checked the benchmarks, so I'll go ahead and merge this.

@Rich-Harris merged commit 89c0864 into master on Mar 24, 2018
@Rich-Harris deleted the fix-perf-regression branch on Mar 24, 2018 at 16:10
@btakita (Contributor) commented Mar 25, 2018

Perhaps one way to test for performance regressions is to add a wrapper around the JSDOM API which records DOM operations. Then the tests could assert that n DOM operations were performed.
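
A rough sketch of that idea, assuming a JSDOM (or browser) environment where Node is global; countDomOps is a hypothetical helper, not something that exists in the test suite:

// Count structural DOM operations performed while `run` executes, by
// temporarily patching the prototype methods JSDOM provides.
function countDomOps(run: () => void): number {
  const proto: any = Node.prototype;
  const patched = ['appendChild', 'insertBefore', 'removeChild'];
  const originals: Record<string, Function> = {};
  let count = 0;

  for (const name of patched) {
    originals[name] = proto[name];
    proto[name] = function (...args: unknown[]) {
      count += 1;
      return originals[name].apply(this, args);
    };
  }

  try {
    run();
  } finally {
    // always restore the originals, even if `run` throws
    for (const name of patched) proto[name] = originals[name];
  }

  return count;
}

// e.g. assert that re-ordering a keyed each block stays within an expected budget:
// assert.ok(countDomOps(() => component.set({ rows: reordered })) <= expected);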

Also, I thought I covered all of the cases in the PR. Apologies for what I missed. It does look like you have something that works well though.

@Rich-Harris (Member, Author)

No apologies necessary! It was a real corner case; I only spotted it because one of the 'random permute' cases happened to fail in CI, and to reproduce it locally I had to increase the count from 100 to 1000. One of the reasons I'm keen on extracting stuff like this out into shared helpers is that it's much easier to understand and change the code when it's written in a JavaScript file, rather than in a giant string with interpolated variable names etc. #588 stood open so long because I didn't have the stomach to wade into that unreadable mess, but you did!
