Rewrite Merge to take two HighlightEvent Iterators #3695

the-mikedavis · 2022-09-04T23:44:58Z

Currently, the HighlightEvent merge function overlays spans (Vec<(usize, std::ops::Range<usize>)> on an Iterator of HighlightEvent. The current merge doesn't handle overlapping spans well though, specifically with LSP diagnostics:

diagnostic ranges may be discarded if they overlap
ghost text may appear when a language server sends overlapping diagnostics at the end of a document

This merge function takes two HighlightEvent Iterators which may themselves have overlapping highlights and produces a new Iterator with highlights from right overlaid on left. Also included is a new Iterator that properly subslices overlapping ranges in a Vec<(usize, std::ops::Range<usize>) which can be passed to the merge function and satisfies all of merge's invariants and assumptions.

The new merge is more complicated and it looks like we get dinged for the extra VecDeques but otherwise the performance looks comparable. I couldn't find any noticeable changes in the flamegraph except for two new tiny slivers for VecDeque::extend.

Fixes #2248
I rebased #2857 on this and it works well and actually the types work out much nicer than that branch's merge function.

helix-core/src/syntax.rs

* Refactor Merge to take two HighlightEvent Iterators * Add an Iterator for subslicing overlapping spans

In these cases, right or left has finished, respectively. We drain right or left's queue and emit those events but previously we forgot to then move the iterator to the next, discarding the Source being emitted in these branches. The result was duplicated emitted text after the last diagnostic in a file, or after the last selection.

The ranges for highlight spans are inclusive, so the span in right may be discarded if its end is equal to left's start. For a graphical example: |----------------| L |---| R Here R ends as L starts. In this case we discard R. Without this change, we subsliced R and made it into a Source which starts at L and ends at L which fails the min_width_1 invariant. Also included is a change that eagerly discards the following HighlightEnd events in the iterator and queued HighlightStart events. This case is technically covered by the outer `loop { match { .. } }` but it's better to do it eagerly: we don't need to branch on all of the possibilities, just blindly empty the right_queue and drop all HighlightEnds.

This change adds two counters which are incremented for every queued HighlightStart and decremented with every dequeued HighlightEnd. This is used to track the invariant that HighlightStarts and HighlightEnds are balanced in the left iterator. In the right iterator, the same checks are made except that the counter is also used to balance the number of HighlightStarts and HighlightEnds when the right iterator is being stopped before having finished. Without this change, the first test case from this commit would be missing a HighlightEnd. Also, the second test case would provoke a panic attempting to match on `(None, Some(HighlightEnd))`. This has been fixed by the change which allows a `(None, Some(event))` pattern. Output-wise, that change allows a `HighlightStart, HighlightEnd` sequence from `right` to end up in the emitted events. These are slightly wasteful but are otherwise benign. The given test case reflects a panic from running on the rainbow brackets branch.

Rather than transforming the input `Vec<Span>` into an iterator, this version of SpanIter keeps the Vec as-is and iterates through it manually with an index. This is necessary because in cases where many spans overlap one another, the SpanIter implementation needs to look arbitrarily far ahead in the Vec to find all ranges which start on the same point. The basic structure of this version is the same as the prior version. The following are the differences: * The position in the Vec is tracked manually with an index rather than automatically with an Iterator. * Subslicing applies to all spans with the same starting point. * All spans with the same start point are emitted simultaneously. The new version covers a real breakage noticed from diagnostics produced by rust-analyzer.

Selection highlights are already known to be satisfy all of the invariants in syntax::merge so syntax::span_iter is pure overhead. This change introduces another iterator with a naive flat_map implementation that splits Spans into HighlightEvents without checking if any spans are overlapping or sorted. We wrap the iterator in a new type to avoid any performance penalty of a `Box<dyn Iterator<Item = HighlightEvent>>` (which uses dynamic dispatch).

This change covers an edge-case that can arise in the SpanIter from subslicing spans. Spans which are subsliced may start after some other span later in the span Vec (see the new test case). To handle this, we re-sort the span Vec after subslicing any spans. Since the subslice only affects the starts of some spans and otherwise leaves the ordering unchanged, we accomplish the sort by finding any spans which should come before the subsliced spans and rotate them to come first. This preserves the ordering in the subsliced spans and in the rotated spans but ensures that the global ordering of the Vec remains correct after the subslice.

The span iterators and tests took up a significant portion of the syntax module but were fairly self-contained. In addition, the Span type was an informal (usize, std::ops::Range<usize>) which was a bit tricky to deal with: * the type was long to write out for all producers/consumers * std::ops::Range<usize> does not implement Copy Plus we don't end up using any Range methods except its Ord/PartialOrd implementations. Given its first-class usage for diagnostics and selections, it seems appropriate to separate it into its own struct type and module.

the-mikedavis · 2022-09-22T16:07:56Z

I think I've debugged this pretty thoroughly but it's still a somewhat risky change: bugs in SpanIter (which should all be squashed now) manifest as duplicated "ghost" text. I'll be driving this branch for the foreseeable future to see if I can find any rendering bugs. But this branch could probably use some time to "simmer".

It's also probably going to be pretty impossible to review 😅. My desk is completely covered in number-line drawings of every possible scenario for overlapping ranges. I think I have test cases for most scenarios though and the ascii art is very pretty 🙂

pascalkuthe

I reviewed the SpanIter locally and found/fixed some bugs and implemented some performance optimizations.
I did the changes locally because it would be really annoying to do trough the GH webui.
I opened at PR against your fork (the-mikedavis#3) that I will refer to during my review comments.

The algorithm was a bit unapproachable at first.
However it looked like the perfect candidate for property testing, because you clearly defined some properties that the resulting HighlightEvents must uphold.
I used proptest to implement this in 924e92c.
I used proptest instead of quickeck, because it makes writing reduces much nicer which is relevant because the input data also needs to uphold some variants.I converted the one existing use of quickcheck to use proptest instead.

The resulting testt almost instantly reduced some crashes. It took me quite a while to fix this because there were actually multiple bugs (see review comments).

With the suggested changes from my branch I let the proptest run ~50 million different random range inputs and found no more test failures. To find the source of the bugs (and fix them) I also had to really get into the details of the algorithm.
Therefore both from a qualitative as well as quantitative point of view I am now very confident that the updated version works as intended.

I haven't looked at the merge implementation yet, I think that will also be a good fit for proptesting.

pascalkuthe · 2022-09-25T11:47:13Z

helix-core/src/syntax/span.rs

+    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
+        // Sort by range: ascending by start and then ascending by end for ties.
+        if self.start == other.start {
+            self.end.cmp(&other.end)


This should be sorting the end in descending order (its specified like that in the documentation below and that also makes more sense because produces less events). Fixed in 447270a

pascalkuthe · 2022-09-25T11:48:40Z

helix-core/src/syntax/span.rs

+            // `i` starts on the index after the last subsliced span.
+            loop {
+                match self.spans.get(i) {
+                    Some(span) if span.start < intersect => {


You need to compare span.end aswell to maintain the descending order of span ends (or even simpler just compare the entire span), in case a new span starts exactly at the intersect. Fixed in 276e58b

pascalkuthe · 2022-09-25T11:57:06Z

helix-core/src/syntax/span.rs

+                            // If this span needs to be subsliced, consume the
+                            // left part of the subslice and leave the right.
+                            self.range_ends.push(intersect);
+                            span.start = intersect;


intersect may be larger than the spans end if multiple spans start at the same point.
This will lead to spans that have their end moved past their start.
Lead to an assertion failure in debug builds or messed up highlighting in release builds.
This is the main bug I found. Its fixed in f73a14e.
I added a pretty detailed description of the bug to the commit message (including an example) so look there for details. I also added unit tests to cover these situations.

Note: If the ranges are sorted in ascending order (as was the case originally) this bug is already fixed by 276e58b. However sorting in descending order will emit less events so I think its preferable. Furthermore with this change the entire span portioning can be split into a separate function (done in c64cf12) which I also see as a plus

Edit: Discard the Note above note this is not true. Ascending order would still need a (different) fix for this edgecase. Descending order looks at the longest span first and moves the start of the span past the end which leads to easy to detect assertions failures. In ascending order the algorithm instead does not detect the interception at all and treats all spans at the same position as tough they all end before any interception would occur. This produces incorrect Highlight Events as a result. However this is currently not caught by property testing because all invariants still hold. Last time I look at ascending order I was only using property testing and hence taught it was fixed. But the unit tests I have added for this edge-case since should fail for ascending order too (I think).

pascalkuthe · 2022-09-25T11:58:17Z

helix-core/src/syntax/span.rs

+        // Ensure range-ends are sorted ascending. Ranges which start at the
+        // same point may be in descending order because of the assumed
+        // sort-order of input ranges.
+        self.range_ends.sort_unstable();


Sorting range_ends in reverse (meaning that the smallest values are at the end) allows some pretty nice performance optimizations by replacing the retain above with a simple pop off the end.
Done in c64cf12

pascalkuthe · 2022-09-25T12:00:03Z

helix-core/src/syntax/span.rs

+
+/// Converts a Vec of spans into an [Iterator] over [HighlightEvent]s
+///
+/// This implementation does not resolve overlapping spans. Zero-width spans are


Zero-width spans were actually not eliminated but it was trivial to implement. (See 36d44f5)

the-mikedavis · 2022-10-05T16:11:03Z

This approach adds a lot of complexity which ends up not being necessary: diagnostics can be handled as a special case and there isn't currently another producer of Vec<(usize, std::ops::Range<usize>)> that has overlapping spans or needs multiple highlights per range. (Selections are non-overlapping by nature and rainbow highlights end up not overlapping and only ever need one color each.)

Plus this branch has some rendering bugs when there are overlaps.

We can revisit the Merge change if it ends up being necessary later down the line but for now it doesn't look necessary to take two iterators

archseer reviewed Sep 5, 2022

View reviewed changes

helix-core/src/syntax.rs Outdated Show resolved Hide resolved

the-mikedavis force-pushed the md-highlight-event-stream-merge branch 4 times, most recently from 36e4955 to 8584189 Compare September 5, 2022 22:36

the-mikedavis mentioned this pull request Sep 5, 2022

Rainbow tree-sitter matches 🌈 #2857

Draft

the-mikedavis marked this pull request as draft September 6, 2022 15:13

kirawi added A-tree-sitter Area: Tree-sitter A-core Area: Helix core improvements S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Sep 13, 2022

the-mikedavis removed the A-tree-sitter Area: Tree-sitter label Sep 17, 2022

the-mikedavis mentioned this pull request Sep 18, 2022

Re-sort diagnostics after transaction transform #3895

Merged

the-mikedavis added 11 commits September 22, 2022 10:51

Rewrite Merge to take two HighlightEvent Iterators

f1c1605

* Refactor Merge to take two HighlightEvent Iterators * Add an Iterator for subslicing overlapping spans

Convert span Vecs to Boxed Iterators before merging

d3ddd0b

Add tests for highlight stream merge functions

cdd564e

Assert highlight merge invariants in debug mode

1b0aed5

the-mikedavis force-pushed the md-highlight-event-stream-merge branch from 23f265b to a142391 Compare September 22, 2022 15:54

the-mikedavis marked this pull request as ready for review September 22, 2022 16:08

pascalkuthe mentioned this pull request Sep 25, 2022

Span merge review the-mikedavis/helix#3

Closed

pascalkuthe requested changes Sep 25, 2022

View reviewed changes

the-mikedavis marked this pull request as draft September 25, 2022 13:29

pascalkuthe mentioned this pull request Sep 29, 2022

Make Merge behave consistently #4042

Closed

the-mikedavis closed this Oct 5, 2022

the-mikedavis deleted the md-highlight-event-stream-merge branch October 5, 2022 16:11

the-mikedavis mentioned this pull request Oct 5, 2022

Overlay all diagnostics with highest severity on top #4113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite Merge to take two HighlightEvent Iterators #3695

Rewrite Merge to take two HighlightEvent Iterators #3695

the-mikedavis commented Sep 4, 2022 •

edited

Loading

the-mikedavis commented Sep 22, 2022

pascalkuthe left a comment •

edited

Loading

pascalkuthe Sep 25, 2022

pascalkuthe Sep 25, 2022 •

edited

Loading

pascalkuthe Sep 25, 2022 •

edited

Loading

pascalkuthe Sep 25, 2022

pascalkuthe Sep 25, 2022 •

edited

Loading

the-mikedavis commented Oct 5, 2022

Rewrite Merge to take two HighlightEvent Iterators #3695

Rewrite Merge to take two HighlightEvent Iterators #3695

Conversation

the-mikedavis commented Sep 4, 2022 • edited Loading

the-mikedavis commented Sep 22, 2022

pascalkuthe left a comment • edited Loading

Choose a reason for hiding this comment

pascalkuthe Sep 25, 2022

Choose a reason for hiding this comment

pascalkuthe Sep 25, 2022 • edited Loading

Choose a reason for hiding this comment

pascalkuthe Sep 25, 2022 • edited Loading

Choose a reason for hiding this comment

pascalkuthe Sep 25, 2022

Choose a reason for hiding this comment

pascalkuthe Sep 25, 2022 • edited Loading

Choose a reason for hiding this comment

the-mikedavis commented Oct 5, 2022

the-mikedavis commented Sep 4, 2022 •

edited

Loading

pascalkuthe left a comment •

edited

Loading

pascalkuthe Sep 25, 2022 •

edited

Loading

pascalkuthe Sep 25, 2022 •

edited

Loading

pascalkuthe Sep 25, 2022 •

edited

Loading