Improve getting non-overlapping rectangles from RTree #42399

knopp · 2023-05-29T13:25:28Z

Fixes flutter/flutter#116070
Fixes flutter/flutter#126202

Introduces DlRegion class which implements subset of SkRegion required to get non-overlapping rectangles from region.

The implementation is different and faster than SkRegion for this particular use-case (display_list_region_benchmarks):

Edit: Updated benchmark to latest revision and natively (initial run went through rosetta)

----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BM_RegionBenchmarkDlRegion/Tiny          616 us          616 us          908
BM_RegionBenchmarkSkRegion/Tiny        70559 us        70557 us           10
BM_RegionBenchmarkDlRegion/Small        1315 us         1314 us          537
BM_RegionBenchmarkSkRegion/Small      121736 us       121717 us            6
BM_RegionBenchmarkDlRegion/Medium       1079 us         1079 us          650
BM_RegionBenchmarkSkRegion/Medium      22039 us        22035 us           32
BM_RegionBenchmarkDlRegion/Large         399 us          399 us         1763
BM_RegionBenchmarkSkRegion/Large        1510 us         1510 us          466

Pre-launch Checklist

I read the Contributor Guide and followed the process outlined there for submitting PRs.
I read the Tree Hygiene wiki page, which explains my responsibilities.
I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
I listed at least one issue that this PR fixes in the description above.
I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
I updated/added relevant documentation (doc comments with ///).
I signed the CLA.
All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

flar

It's late and I just got started on looking at this, but I wanted to hit send before I turned in.

I'd love to get rid of flow/rtree, but it is still used in a couple of places that need to be taught to use Dl classes...

flar · 2023-05-30T10:29:59Z

display_list/geometry/dl_region.h

+
+ private:
+  struct Span;
+  struct SpanLine;


Span and SpanLine have very simple definitions (ignoring the methods on SpanLine which can just be declared here for simplicity). Would it be better to define them here rather than forward reference them?

I prefer keeping the header files as terse as possible, but that's just personal preference. I don't mind removing the forward declaration if you think it's better.

The style guide recommends against them and provides some reasoning:
https://google.github.io/styleguide/cppguide.html#Forward_Declarations

In this case, is there a potential danger in that the constructor allows stack allocation of the object which has a vector of a forwardly-defined type in it? Maybe that is new as of the change to bulk load rectangles, though, as I see you've already inlined their definitions.

display_list/geometry/dl_region.cc

flar · 2023-05-30T11:23:52Z

display_list/geometry/dl_region.cc

+}
+
+std::vector<SkIRect> DlRegion::getRects(bool deband) const {
+  std::vector<SkIRect> rects;


I think the algorithm below only works as long as rects remains sorted by bottom coordinate? Is that worth mentioning here in a comment.

New rectangles are inserted at the end of the rects vector and they are only ever grown upward to absorb a previous rect so they can never violate the bottom ordering.

Just some things that I had to realize about the code before I could see if it was working as intended.

Yes. Rects are stored in SpanLines. Each span line contains rects with same top and bottom coordinates. Span lines are non overlapping and always kept sorted, which has the added benefit of being able to use binary search (upper_bound) when looking for initial span line in addRect.

display_list/geometry/dl_region.cc

jonahwilliams · 2023-05-30T16:36:43Z

Does this improve the behavior of RTree for issues like flutter/flutter#126202 ?

knopp · 2023-05-30T17:53:59Z

Does this improve the behavior of RTree for issues like flutter/flutter#126202 ?

It does. I just tested the example, DlRTree completely disappears from the profiler :)

knopp · 2023-05-30T18:07:25Z

I'm not sure why the clang_tidy checks are timing out. It only seems to be with this PR, regardless of whether it's rebased or not.

jonahwilliams · 2023-05-30T18:09:08Z

I think we're having some infra issues right now, I'm seeing this across builds. Also woot!

knopp · 2023-05-30T19:28:03Z

Actually, I found it. But at 4.2% for 5000 rectangles it's a decent improvement nonetheless.

For this particular call site we could chose not to deband the rectangles which could shave off some of the 1.3% spent in getRects at the expense of DisplayListBuilder::DrawDisplayList having to deal with more rectangles.

Edit: Made debanding optional and got it down to 3.4%.

flar

I've read through the algorithm pretty well and it looks fine to me.

But, I wonder if there is any value in having Region be mutable and simultaneously able to accept new rectangles or give out lists? Would a Builder paradigm work better, perhaps combined with a DlRegion::Make(vector of rectangles)?

The reason that this might be better is that the region object itself would be immutable and easier to share when needed and the algorithm could sort the provided rectangles (into a new vector perhaps) by starting y then starting x to make insertion much easier and faster.

display_list/geometry/dl_region.cc

knopp · 2023-06-01T12:09:52Z

But, I wonder if there is any value in having Region be mutable and simultaneously able to accept new rectangles or give out lists? Would a Builder paradigm work better, perhaps combined with a DlRegion::Make(vector of rectangles)?

Probably not for any use-case that we have for this. It would be worth doing if bulk loading is faster.

The reason that this might be better is that the region object itself would be immutable and easier to share when needed and the algorithm could sort the provided rectangles (into a new vector perhaps) by starting y then starting x to make insertion much easier and faster.

This would be interesting to profile. Right now there is a binary search for each added rectangle. But we're only searching through span lines, which is less than number of rectangles.

If presorted, we can know that we don't need to touch spans above current rect top. It might also make difference for adding span to spanline if the rects are sorted horizontally.

This would be a significantly different algorithm. But maybe worth implementing and benchmarking against the one in PR.

knopp · 2023-06-01T12:49:07Z

Just pre-sorting the rects without modifying anything else I already got faster result in first two benchmarks (that's including std::sort). This is definitely worth exploring.

flar · 2023-06-01T12:55:13Z

The way I've done it in past lives is to start with y-then-x sorted rectangles and to keep an active list of rectangles that are currently being sliced off into lines. New rectangles are added to the active list as you reach their starting Y and rectangles are pruned from that list as you reach their ending Y. The final list of lines are created in order as you go with their full complement of spans. The active list is kept in X order to make slicing the spans for each line simple. Since the rectangles were originally sub-sorted by X, they often come in in order so there is an insertion merge of new rectangles into the active list, but no need to sort the active rectangles other than that.

Using a list of pointers to the original bulk rectangle objects makes the sorting go faster with simpler moves and inserts of pointers. Then the active list of rectangles can actually be maintained in place in the sorted list of pointers, slowly consuming it as the active list shrinks and grows and is copied into Line objects in the final region.

knopp · 2023-06-02T22:20:02Z

So I tried this implementation: https://gist.github.com/knopp/1029e85421aabf4ec13a644d10ebc515

It does one pass through a sorted list of rectangles, but in pathological cases where there is lot of large overdrawn rectangles (and thus many overlapping rectangles in each span) it performs much worse than the original algorithm. That's likely because in the original algorithm each large rectangle can possibly reduce the number of span lines and spans, reducing the work needed for subsequent rectangles.

The best result I get so far is a variation of original algorithm, where the rectangles are presorted by Y axis, which removes the need of binary search for each rectangles: Here are the benchmarks for reference:

----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
BM_RegionBenchmarkDlRegion/Tiny               1348 us         1344 us          492
BM_RegionBenchmarkDlRegion2/Tiny               712 us          712 us          981
BM_RegionBenchmarkDlRegionSorted/Tiny          547 us          547 us         1285
BM_RegionBenchmarkSkRegion/Tiny              70130 us        70090 us           10
BM_RegionBenchmarkDlRegion/Small              3030 us         3030 us          232
BM_RegionBenchmarkDlRegion2/Small             1668 us         1668 us          423
BM_RegionBenchmarkDlRegionSorted/Small        1246 us         1246 us          558
BM_RegionBenchmarkSkRegion/Small            121010 us       120951 us            6
BM_RegionBenchmarkDlRegion/Medium             1434 us         1434 us          488
BM_RegionBenchmarkDlRegion2/Medium            2323 us         2323 us          302
BM_RegionBenchmarkDlRegionSorted/Medium       1049 us         1049 us          670
BM_RegionBenchmarkSkRegion/Medium            21712 us        21712 us           32
BM_RegionBenchmarkDlRegion/Large               395 us          395 us         1791
BM_RegionBenchmarkDlRegion2/Large             4146 us         4146 us          169
BM_RegionBenchmarkDlRegionSorted/Large         384 us          384 us         1829
BM_RegionBenchmarkSkRegion/Large              1484 us         1484 us          474

DlRegion is the original region implementation.
DlRegion2 is the one from gist - it regresses with medium/large rectangles
DlRegionSorted is the sorted variation of original region I'm going to go with. Has best results across all test data.

flar · 2023-06-03T00:17:15Z

So I tried this implementation: https://gist.github.com/knopp/1029e85421aabf4ec13a644d10ebc515

It does one pass through a sorted list of rectangles, but in pathological cases where there is lot of large overdrawn rectangles (and thus many overlapping rectangles in each span) it performs much worse than the original algorithm. That's likely because in the original algorithm each large rectangle can possibly reduce the number of span lines and spans, reducing the work needed for subsequent rectangles.

There is a lot of work to manage the spans. In my vision, the spans are already known on each line and it is just a matter of slicing them off into a SpanLine. I'll see if I can mock up what I'm talking about.

knopp · 2023-06-03T08:42:37Z

There is a lot of work to manage the spans. In my vision, the spans are already known on each line and it is just a matter of slicing them off into a SpanLine. I'll see if I can mock up what I'm talking about.

The problem there is that even if I completely commented out the body of DlRegion2::SpanLine::insertSpan it was still significantly slower than the original algorithm in some cases. My assumption is that continuous merging of spans and span lines reduces the work needed to add each subsequent rectangle and for medium / large size rects that is a significant amount.

If you could put together something that we can benchmark against current implementation that would be great. It is possible that I've been looking at the current algorithm for way too long and I am missing something.

flar · 2023-06-05T18:30:10Z

You're good to go then!

…ter/engine#42399)

…128282) flutter/engine@7f12e34...220ece4 2023-06-05 zanderso@users.noreply.github.com Ensure Dart roll script picks up udpates to dart-sdk/tools (flutter/engine#42576) 2023-06-05 chillers@google.com Run dependabot in off peak hours (flutter/engine#42572) 2023-06-05 flar@google.com fix bounds of inverted rendered rectangles (flutter/engine#42556) 2023-06-05 skia-flutter-autoroll@skia.org Roll Fuchsia Mac SDK from fgfgAhpxFpse7Xi4i... to tCxDcZ3yi0rnKGVHt... (flutter/engine#42574) 2023-06-05 kjlubick@users.noreply.github.com Add missing #include of SkCFObject.h (flutter/engine#42573) 2023-06-05 matej.knopp@gmail.com Improve getting non-overlapping rectangles from RTree (flutter/engine#42399) Also rolling transitive DEPS: fuchsia/sdk/core/mac-amd64 from fgfgAhpxFpse to tCxDcZ3yi0rn If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC jonahwilliams@google.com,rmistry@google.com,zra@google.com on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://bugs.chromium.org/p/skia/issues/entry?template=Autoroller+Bug Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md

flar · 2023-06-07T22:27:47Z

I played around with a more deterministic algorithm and ran into the same issues as you did - that it is faster for the smaller rect benchmarks, but gets problematic as the rectangles increase size. Really it is related to how many overlapping rectangles there are in the active list of rects with the smaller sizes having a more limited list of 10-20 active rects per loop (most of which are not overlapping), up to 300 active rects (which produce only 1-2 spans between them) for the large size. Some improvement on how to manage the generation of spans from active rects would help - perhaps a mini RTree of active spans with the top level being the actual spans that will get added for each line?

My gist: https://gist.github.com/flar/8ebcde27481bbedb5225b5577490bfcc

The results on a profile build:

-----------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations
-----------------------------------------------------------------------------
BM_RegionBenchmarkDlRegion/Tiny           595 us          593 us         1162
BM_RegionBenchmarkDlRegion2/Tiny          348 us          347 us         2012  (42% faster)
BM_RegionBenchmarkSkRegion/Tiny         73261 us        73058 us            9
BM_RegionBenchmarkDlRegion/Small         1227 us         1223 us          571
BM_RegionBenchmarkDlRegion2/Small         626 us          625 us         1120  (49% faster)
BM_RegionBenchmarkSkRegion/Small       124687 us       124355 us            6
BM_RegionBenchmarkDlRegion/Medium        1091 us         1088 us          641
BM_RegionBenchmarkDlRegion2/Medium        724 us          723 us          962  (34% faster)
BM_RegionBenchmarkSkRegion/Medium       22905 us        22844 us           30
BM_RegionBenchmarkDlRegion/Large          396 us          396 us         1767
BM_RegionBenchmarkDlRegion2/Large        1666 us         1662 us          421  (320% slower)
BM_RegionBenchmarkSkRegion/Large         1543 us         1540 us          455

One thing that concerns me is that the rectangles used for these benchmarks aren't really representative of what we see in a Flutter app. Overlapping rectangles are seen given the rendering of backgrounds of widgets covered by a few rendering ops in the widget content, but not nearly as bad as 300 active rects per span line. Still, these benchmarks do show the lack of scalability of this algorithm, but if we throw away a 42-49% improvement for the common case (and I'm not saying that Tiny and Small are representative either) then it would be nice to fix the quadratic behavior of the faster algorithm and even accept a (somewhat smaller) regression for the non-representative "Large" benchmark case to get faster common case behavior.

The stats outputs for the various benchmark cases (that use 2000 input rects) are:

Benchmark case	Spanlines	avg spans/line	avg active rects/loop
Tiny	2500	7.626	7.89524
Small	2353	17.9949	25.748
Medium	272	2.875	97.7298
Large	39	1.17949	323.6

knopp · 2023-06-08T07:25:42Z

I think optimizing for many smaller rectangles is more likely to cover possible edge cases than optimizing for large rectangles (see flutter/flutter#126202).

I'm going to play with your implementation a bit but so far I like. So I'd say that improvements in small/tiny/medium are more important than regression in large, as long the regression isn't catastrophic :)

knopp · 2023-06-08T13:02:20Z

@flar, I think the tradeoff that your addRects implementation makes is clearly better. I replaced the old implementation with it in #42620 and it works really well. With custom SpanBuffer that keeps all span data together the numbers are even better:

main branch:

BM_RegionBenchmarkDlRegion/Tiny          614 us          613 us         1026
BM_RegionBenchmarkDlRegion/Small        1326 us         1325 us          521
BM_RegionBenchmarkDlRegion/Medium       1049 us         1049 us          669
BM_RegionBenchmarkDlRegion/Large         381 us          381 us         1808

new code (your addRect + custom SpanBuffer)

BM_DlRegion_AddRects/Tiny                      202 us          202 us         3487
BM_DlRegion_AddRects/Small                     505 us          503 us         1349
BM_DlRegion_AddRects/Medium                    491 us          491 us         1414
BM_DlRegion_AddRects/Large                    1346 us         1346 us          516

I don't think the regression of /Large is going to be a problem in practice.

Do you want to make a PR for this, or should we just keep it in #42620 ?

flar · 2023-06-08T18:03:26Z

I'll take a look at #42620.

Another thing to consider is what the callers do with this info.

iOS will join all of the rects into 1 if there are more than 2 of them. Android will always join them all into 1. At least Fuchsia transfers the answer into its own version of a region.

So, for Android it could do with a much much faster "just get me the bounds of all ops that intersect this rect" function that avoids the old quadratic behavior and even the new region construction costs". iOS doesn't really find much more use than that. Fuchsia would probably prefer to not deband the rects (hard to tell how that interacts with their HitRegion construction).

Can iOS or Android be upgrading to handing off a region instead of processing the rects individually?

knopp · 2023-06-08T18:15:47Z

Returning DlRegion instead of the rects from RTree would be beneficial. I'm currently implementing platform views on macOS. Part of it is optimizing the number of layers used for platform views. Thought some of the RTree information will be used for hit testing as well. For embedder API eventually we'll pass the overlay regions as array of rectangles, but that's after some processing.

The way iOS handles overlay regions right now is quite unfortunate. But if it eventually gets migrated to embedder API the both the platform views and overlays would get much cheaper.

knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 3 times, most recently from 3422516 to 35078a2 Compare May 29, 2023 13:30

knopp changed the title ~~WIP: Improve getting non-overlapping rectangles from RTree~~ Improve getting non-overlapping rectangles from RTree May 29, 2023

flutter-dashboard bot added the platform-ios label May 29, 2023

knopp requested a review from flar May 29, 2023 19:29

knopp removed the platform-ios label May 29, 2023

knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 3 times, most recently from b1b1427 to 6465ae8 Compare May 30, 2023 09:25

flar reviewed May 30, 2023

View reviewed changes

knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 2 times, most recently from 423ec35 to 81e3805 Compare May 30, 2023 19:35

flar reviewed Jun 1, 2023

View reviewed changes

flutter-dashboard bot added the platform-ios label Jun 1, 2023

knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 2 times, most recently from 9edd122 to 64c8151 Compare June 2, 2023 23:17

knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 2 times, most recently from ea9f47d to 2774ba3 Compare June 4, 2023 16:24

knopp added 13 commits June 5, 2023 19:37

Rename i1, i2 to y1, y2

2233837

Fixup comments

4e6deb5

Skip rectangles from current span line when debanding

2a5f14d

Opt out debanding in DisplayListBuilder::DrawDisplayList

80e722b

Benchmarks with different implementation

36892e2

Use DlRegionSorted as the algorithm

856bc35

Restore missing constructor and destructor

324e040

Change DlRegion API usage to closer resemble DlRTree

46da2d1

Restore random seed

e2dd021

Free the spanvec pool as soon as we're done with it.

199f43b

Simplify dirty_start/dirty_end checks

7a852a4

Replace condition with DCHECK, replace asserts with DCHECK

9dba789

Adjust previous_span_end when erasing rectangle from previous span

794ce5c

knopp force-pushed the rtree_improve_non_overlapping_rectangles branch from fa996f5 to 794ce5c Compare June 5, 2023 17:37

knopp merged commit 042ebae into flutter:main Jun 5, 2023
30 checks passed

knopp deleted the rtree_improve_non_overlapping_rectangles branch June 5, 2023 18:41

jonahwilliams mentioned this pull request Jun 5, 2023

[Impeller] TextField widget, long text causes severe frame drop. flutter/flutter#126562

Closed

2 tasks

engine-flutter-autoroll mentioned this pull request Jun 5, 2023

Roll Flutter Engine from 7f12e3497428 to 042ebae586c1 (1 revision) flutter/flutter#128272

Closed

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 5, 2023

042ebae58 Improve getting non-overlapping rectangles from RTree (flut…

70bc62a

…ter/engine#42399)

engine-flutter-autoroll mentioned this pull request Jun 5, 2023

Roll Flutter Engine from 7f12e3497428 to 5c35213ffa33 (4 revisions) flutter/flutter#128275

Closed

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 5, 2023

042ebae58 Improve getting non-overlapping rectangles from RTree (flut…

f879b58

…ter/engine#42399)

engine-flutter-autoroll mentioned this pull request Jun 5, 2023

Roll Flutter Engine from 7f12e3497428 to 220ece4d9faa (6 revisions) flutter/flutter#128282

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 5, 2023

042ebae58 Improve getting non-overlapping rectangles from RTree (flut…

c88512e

…ter/engine#42399)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve getting non-overlapping rectangles from RTree #42399

Improve getting non-overlapping rectangles from RTree #42399

knopp commented May 29, 2023 •

edited

flar left a comment

flar May 30, 2023

knopp May 30, 2023 •

edited

flar Jun 5, 2023 •

edited

flar May 30, 2023

knopp May 30, 2023 •

edited

jonahwilliams commented May 30, 2023

knopp commented May 30, 2023 •

edited

knopp commented May 30, 2023

jonahwilliams commented May 30, 2023

knopp commented May 30, 2023 •

edited

flar left a comment

knopp commented Jun 1, 2023

knopp commented Jun 1, 2023

flar commented Jun 1, 2023 •

edited

knopp commented Jun 2, 2023 •

edited

flar commented Jun 3, 2023

knopp commented Jun 3, 2023 •

edited

flar commented Jun 5, 2023

flar commented Jun 7, 2023 •

edited

knopp commented Jun 8, 2023

knopp commented Jun 8, 2023

flar commented Jun 8, 2023 •

edited

knopp commented Jun 8, 2023

Improve getting non-overlapping rectangles from RTree #42399

Improve getting non-overlapping rectangles from RTree #42399

Conversation

knopp commented May 29, 2023 • edited

Pre-launch Checklist

flar left a comment

Choose a reason for hiding this comment

flar May 30, 2023

Choose a reason for hiding this comment

knopp May 30, 2023 • edited

Choose a reason for hiding this comment

flar Jun 5, 2023 • edited

Choose a reason for hiding this comment

flar May 30, 2023

Choose a reason for hiding this comment

knopp May 30, 2023 • edited

Choose a reason for hiding this comment

jonahwilliams commented May 30, 2023

knopp commented May 30, 2023 • edited

knopp commented May 30, 2023

jonahwilliams commented May 30, 2023

knopp commented May 30, 2023 • edited

flar left a comment

Choose a reason for hiding this comment

knopp commented Jun 1, 2023

knopp commented Jun 1, 2023

flar commented Jun 1, 2023 • edited

knopp commented Jun 2, 2023 • edited

flar commented Jun 3, 2023

knopp commented Jun 3, 2023 • edited

flar commented Jun 5, 2023

flar commented Jun 7, 2023 • edited

knopp commented Jun 8, 2023

knopp commented Jun 8, 2023

flar commented Jun 8, 2023 • edited

knopp commented Jun 8, 2023

knopp commented May 29, 2023 •

edited

knopp May 30, 2023 •

edited

flar Jun 5, 2023 •

edited

knopp May 30, 2023 •

edited

knopp commented May 30, 2023 •

edited

knopp commented May 30, 2023 •

edited

flar commented Jun 1, 2023 •

edited

knopp commented Jun 2, 2023 •

edited

knopp commented Jun 3, 2023 •

edited

flar commented Jun 7, 2023 •

edited

flar commented Jun 8, 2023 •

edited