-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite Merge to take two HighlightEvent Iterators #3695
Rewrite Merge to take two HighlightEvent Iterators #3695
Conversation
36e4955
to
8584189
Compare
* Refactor Merge to take two HighlightEvent Iterators * Add an Iterator for subslicing overlapping spans
In these cases, right or left has finished, respectively. We drain right or left's queue and emit those events but previously we forgot to then move the iterator to the next, discarding the Source being emitted in these branches. The result was duplicated emitted text after the last diagnostic in a file, or after the last selection.
The ranges for highlight spans are inclusive, so the span in right may be discarded if its end is equal to left's start. For a graphical example: |----------------| L |---| R Here R ends as L starts. In this case we discard R. Without this change, we subsliced R and made it into a Source which starts at L and ends at L which fails the min_width_1 invariant. Also included is a change that eagerly discards the following HighlightEnd events in the iterator and queued HighlightStart events. This case is technically covered by the outer `loop { match { .. } }` but it's better to do it eagerly: we don't need to branch on all of the possibilities, just blindly empty the right_queue and drop all HighlightEnds.
This change adds two counters which are incremented for every queued HighlightStart and decremented with every dequeued HighlightEnd. This is used to track the invariant that HighlightStarts and HighlightEnds are balanced in the left iterator. In the right iterator, the same checks are made except that the counter is also used to balance the number of HighlightStarts and HighlightEnds when the right iterator is being stopped before having finished. Without this change, the first test case from this commit would be missing a HighlightEnd. Also, the second test case would provoke a panic attempting to match on `(None, Some(HighlightEnd))`. This has been fixed by the change which allows a `(None, Some(event))` pattern. Output-wise, that change allows a `HighlightStart, HighlightEnd` sequence from `right` to end up in the emitted events. These are slightly wasteful but are otherwise benign. The given test case reflects a panic from running on the rainbow brackets branch.
Rather than transforming the input `Vec<Span>` into an iterator, this version of SpanIter keeps the Vec as-is and iterates through it manually with an index. This is necessary because in cases where many spans overlap one another, the SpanIter implementation needs to look arbitrarily far ahead in the Vec to find all ranges which start on the same point. The basic structure of this version is the same as the prior version. The following are the differences: * The position in the Vec is tracked manually with an index rather than automatically with an Iterator. * Subslicing applies to all spans with the same starting point. * All spans with the same start point are emitted simultaneously. The new version covers a real breakage noticed from diagnostics produced by rust-analyzer.
Selection highlights are already known to be satisfy all of the invariants in syntax::merge so syntax::span_iter is pure overhead. This change introduces another iterator with a naive flat_map implementation that splits Spans into HighlightEvents without checking if any spans are overlapping or sorted. We wrap the iterator in a new type to avoid any performance penalty of a `Box<dyn Iterator<Item = HighlightEvent>>` (which uses dynamic dispatch).
This change covers an edge-case that can arise in the SpanIter from subslicing spans. Spans which are subsliced may start after some other span later in the span Vec (see the new test case). To handle this, we re-sort the span Vec after subslicing any spans. Since the subslice only affects the starts of some spans and otherwise leaves the ordering unchanged, we accomplish the sort by finding any spans which should come before the subsliced spans and rotate them to come first. This preserves the ordering in the subsliced spans and in the rotated spans but ensures that the global ordering of the Vec remains correct after the subslice.
The span iterators and tests took up a significant portion of the syntax module but were fairly self-contained. In addition, the Span type was an informal (usize, std::ops::Range<usize>) which was a bit tricky to deal with: * the type was long to write out for all producers/consumers * std::ops::Range<usize> does not implement Copy Plus we don't end up using any Range methods except its Ord/PartialOrd implementations. Given its first-class usage for diagnostics and selections, it seems appropriate to separate it into its own struct type and module.
23f265b
to
a142391
Compare
I think I've debugged this pretty thoroughly but it's still a somewhat risky change: bugs in It's also probably going to be pretty impossible to review 😅. My desk is completely covered in number-line drawings of every possible scenario for overlapping ranges. I think I have test cases for most scenarios though and the ascii art is very pretty 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the SpanIter locally and found/fixed some bugs and implemented some performance optimizations.
I did the changes locally because it would be really annoying to do trough the GH webui.
I opened at PR against your fork (the-mikedavis#3) that I will refer to during my review comments.
The algorithm was a bit unapproachable at first.
However it looked like the perfect candidate for property testing, because you clearly defined some properties that the resulting HighlightEvents
must uphold.
I used proptest
to implement this in 924e92c.
I used proptest
instead of quickeck
, because it makes writing reduces much nicer which is relevant because the input data also needs to uphold some variants.I converted the one existing use of quickcheck to use proptest instead.
The resulting testt almost instantly reduced some crashes. It took me quite a while to fix this because there were actually multiple bugs (see review comments).
With the suggested changes from my branch I let the proptest run ~50 million different random range inputs and found no more test failures. To find the source of the bugs (and fix them) I also had to really get into the details of the algorithm.
Therefore both from a qualitative as well as quantitative point of view I am now very confident that the updated version works as intended.
I haven't looked at the merge implementation yet, I think that will also be a good fit for proptesting.
fn cmp(&self, other: &Self) -> std::cmp::Ordering { | ||
// Sort by range: ascending by start and then ascending by end for ties. | ||
if self.start == other.start { | ||
self.end.cmp(&other.end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be sorting the end in descending order (its specified like that in the documentation below and that also makes more sense because produces less events). Fixed in 447270a
// `i` starts on the index after the last subsliced span. | ||
loop { | ||
match self.spans.get(i) { | ||
Some(span) if span.start < intersect => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to compare span.end aswell to maintain the descending order of span ends (or even simpler just compare the entire span), in case a new span starts exactly at the intersect. Fixed in 276e58b
// If this span needs to be subsliced, consume the | ||
// left part of the subslice and leave the right. | ||
self.range_ends.push(intersect); | ||
span.start = intersect; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intersect
may be larger than the spans end if multiple spans start at the same point.
This will lead to spans that have their end moved past their start.
Lead to an assertion failure in debug builds or messed up highlighting in release builds.
This is the main bug I found. Its fixed in f73a14e.
I added a pretty detailed description of the bug to the commit message (including an example) so look there for details. I also added unit tests to cover these situations.
Note: If the ranges are sorted in ascending order (as was the case originally) this bug is already fixed by 276e58b. However sorting in descending order will emit less events so I think its preferable. Furthermore with this change the entire span portioning can be split into a separate function (done in c64cf12) which I also see as a plus
Edit: Discard the Note above note this is not true. Ascending order would still need a (different) fix for this edgecase. Descending order looks at the longest span first and moves the start of the span past the end which leads to easy to detect assertions failures. In ascending order the algorithm instead does not detect the interception at all and treats all spans at the same position as tough they all end before any interception would occur. This produces incorrect Highlight Events as a result. However this is currently not caught by property testing because all invariants still hold. Last time I look at ascending order I was only using property testing and hence taught it was fixed. But the unit tests I have added for this edge-case since should fail for ascending order too (I think).
// Ensure range-ends are sorted ascending. Ranges which start at the | ||
// same point may be in descending order because of the assumed | ||
// sort-order of input ranges. | ||
self.range_ends.sort_unstable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting range_ends
in reverse (meaning that the smallest values are at the end) allows some pretty nice performance optimizations by replacing the retain
above with a simple pop off the end.
Done in c64cf12
|
||
/// Converts a Vec of spans into an [Iterator] over [HighlightEvent]s | ||
/// | ||
/// This implementation does not resolve overlapping spans. Zero-width spans are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zero-width spans were actually not eliminated but it was trivial to implement. (See 36d44f5)
This approach adds a lot of complexity which ends up not being necessary: diagnostics can be handled as a special case and there isn't currently another producer of Plus this branch has some rendering bugs when there are overlaps. We can revisit the |
Currently, the HighlightEvent merge function overlays spans (
Vec<(usize, std::ops::Range<usize>)>
on an Iterator of HighlightEvent. The current merge doesn't handle overlapping spans well though, specifically with LSP diagnostics:This merge function takes two HighlightEvent Iterators which may themselves have overlapping highlights and produces a new Iterator with highlights from
right
overlaid onleft
. Also included is a new Iterator that properly subslices overlapping ranges in aVec<(usize, std::ops::Range<usize>)
which can be passed to themerge
function and satisfies all ofmerge
's invariants and assumptions.The new merge is more complicated and it looks like we get dinged for the extra
VecDeque
s but otherwise the performance looks comparable. I couldn't find any noticeable changes in the flamegraph except for two new tiny slivers forVecDeque::extend
.Fixes #2248
I rebased #2857 on this and it works well and actually the types work out much nicer than that branch's merge function.