Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve getting non-overlapping rectangles from RTree #42399

Merged
merged 21 commits into from Jun 5, 2023

Conversation

knopp
Copy link
Member

@knopp knopp commented May 29, 2023

Fixes flutter/flutter#116070
Fixes flutter/flutter#126202

Introduces DlRegion class which implements subset of SkRegion required to get non-overlapping rectangles from region.

The implementation is different and faster than SkRegion for this particular use-case (display_list_region_benchmarks):

Edit: Updated benchmark to latest revision and natively (initial run went through rosetta)

----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BM_RegionBenchmarkDlRegion/Tiny          616 us          616 us          908
BM_RegionBenchmarkSkRegion/Tiny        70559 us        70557 us           10
BM_RegionBenchmarkDlRegion/Small        1315 us         1314 us          537
BM_RegionBenchmarkSkRegion/Small      121736 us       121717 us            6
BM_RegionBenchmarkDlRegion/Medium       1079 us         1079 us          650
BM_RegionBenchmarkSkRegion/Medium      22039 us        22035 us           32
BM_RegionBenchmarkDlRegion/Large         399 us          399 us         1763
BM_RegionBenchmarkSkRegion/Large        1510 us         1510 us          466

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@knopp knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 3 times, most recently from 3422516 to 35078a2 Compare May 29, 2023 13:30
@knopp knopp changed the title WIP: Improve getting non-overlapping rectangles from RTree Improve getting non-overlapping rectangles from RTree May 29, 2023
@knopp knopp requested a review from flar May 29, 2023 19:29
@knopp knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 3 times, most recently from b1b1427 to 6465ae8 Compare May 30, 2023 09:25
Copy link
Contributor

@flar flar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's late and I just got started on looking at this, but I wanted to hit send before I turned in.

I'd love to get rid of flow/rtree, but it is still used in a couple of places that need to be taught to use Dl classes...


private:
struct Span;
struct SpanLine;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span and SpanLine have very simple definitions (ignoring the methods on SpanLine which can just be declared here for simplicity). Would it be better to define them here rather than forward reference them?

Copy link
Member Author

@knopp knopp May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer keeping the header files as terse as possible, but that's just personal preference. I don't mind removing the forward declaration if you think it's better.

Copy link
Contributor

@flar flar Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style guide recommends against them and provides some reasoning:
https://google.github.io/styleguide/cppguide.html#Forward_Declarations

In this case, is there a potential danger in that the constructor allows stack allocation of the object which has a vector of a forwardly-defined type in it? Maybe that is new as of the change to bulk load rectangles, though, as I see you've already inlined their definitions.

display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
}

std::vector<SkIRect> DlRegion::getRects(bool deband) const {
std::vector<SkIRect> rects;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the algorithm below only works as long as rects remains sorted by bottom coordinate? Is that worth mentioning here in a comment.

New rectangles are inserted at the end of the rects vector and they are only ever grown upward to absorb a previous rect so they can never violate the bottom ordering.

Just some things that I had to realize about the code before I could see if it was working as intended.

Copy link
Member Author

@knopp knopp May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Rects are stored in SpanLines. Each span line contains rects with same top and bottom coordinates. Span lines are non overlapping and always kept sorted, which has the added benefit of being able to use binary search (upper_bound) when looking for initial span line in addRect.

display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
@jonahwilliams
Copy link
Member

Does this improve the behavior of RTree for issues like flutter/flutter#126202 ?

@knopp
Copy link
Member Author

knopp commented May 30, 2023

Does this improve the behavior of RTree for issues like flutter/flutter#126202 ?

It does. I just tested the example, DlRTree completely disappears from the profiler :)

Screenshot 2023-05-30 at 19 52 40

@knopp
Copy link
Member Author

knopp commented May 30, 2023

I'm not sure why the clang_tidy checks are timing out. It only seems to be with this PR, regardless of whether it's rebased or not.

@jonahwilliams
Copy link
Member

I think we're having some infra issues right now, I'm seeing this across builds. Also woot!

@knopp
Copy link
Member Author

knopp commented May 30, 2023

Actually, I found it. But at 4.2% for 5000 rectangles it's a decent improvement nonetheless.

Screenshot 2023-05-30 at 21 19 49

For this particular call site we could chose not to deband the rectangles which could shave off some of the 1.3% spent in getRects at the expense of DisplayListBuilder::DrawDisplayList having to deal with more rectangles.

Edit: Made debanding optional and got it down to 3.4%.

Screenshot 2023-05-30 at 21 50 53

@knopp knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 2 times, most recently from 423ec35 to 81e3805 Compare May 30, 2023 19:35
Copy link
Contributor

@flar flar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through the algorithm pretty well and it looks fine to me.

But, I wonder if there is any value in having Region be mutable and simultaneously able to accept new rectangles or give out lists? Would a Builder paradigm work better, perhaps combined with a DlRegion::Make(vector of rectangles)?

The reason that this might be better is that the region object itself would be immutable and easier to share when needed and the algorithm could sort the provided rectangles (into a new vector perhaps) by starting y then starting x to make insertion much easier and faster.

display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
display_list/geometry/dl_region.cc Show resolved Hide resolved
display_list/geometry/dl_region.cc Show resolved Hide resolved
display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
display_list/geometry/dl_region.cc Outdated Show resolved Hide resolved
display_list/geometry/dl_region.cc Show resolved Hide resolved
@knopp
Copy link
Member Author

knopp commented Jun 1, 2023

But, I wonder if there is any value in having Region be mutable and simultaneously able to accept new rectangles or give out lists? Would a Builder paradigm work better, perhaps combined with a DlRegion::Make(vector of rectangles)?

Probably not for any use-case that we have for this. It would be worth doing if bulk loading is faster.

The reason that this might be better is that the region object itself would be immutable and easier to share when needed and the algorithm could sort the provided rectangles (into a new vector perhaps) by starting y then starting x to make insertion much easier and faster.

This would be interesting to profile. Right now there is a binary search for each added rectangle. But we're only searching through span lines, which is less than number of rectangles.

If presorted, we can know that we don't need to touch spans above current rect top. It might also make difference for adding span to spanline if the rects are sorted horizontally.

This would be a significantly different algorithm. But maybe worth implementing and benchmarking against the one in PR.

@knopp
Copy link
Member Author

knopp commented Jun 1, 2023

Just pre-sorting the rects without modifying anything else I already got faster result in first two benchmarks (that's including std::sort). This is definitely worth exploring.

@flar
Copy link
Contributor

flar commented Jun 1, 2023

The way I've done it in past lives is to start with y-then-x sorted rectangles and to keep an active list of rectangles that are currently being sliced off into lines. New rectangles are added to the active list as you reach their starting Y and rectangles are pruned from that list as you reach their ending Y. The final list of lines are created in order as you go with their full complement of spans. The active list is kept in X order to make slicing the spans for each line simple. Since the rectangles were originally sub-sorted by X, they often come in in order so there is an insertion merge of new rectangles into the active list, but no need to sort the active rectangles other than that.

Using a list of pointers to the original bulk rectangle objects makes the sorting go faster with simpler moves and inserts of pointers. Then the active list of rectangles can actually be maintained in place in the sorted list of pointers, slowly consuming it as the active list shrinks and grows and is copied into Line objects in the final region.

@knopp
Copy link
Member Author

knopp commented Jun 2, 2023

So I tried this implementation: https://gist.github.com/knopp/1029e85421aabf4ec13a644d10ebc515

It does one pass through a sorted list of rectangles, but in pathological cases where there is lot of large overdrawn rectangles (and thus many overlapping rectangles in each span) it performs much worse than the original algorithm. That's likely because in the original algorithm each large rectangle can possibly reduce the number of span lines and spans, reducing the work needed for subsequent rectangles.

The best result I get so far is a variation of original algorithm, where the rectangles are presorted by Y axis, which removes the need of binary search for each rectangles: Here are the benchmarks for reference:

----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
BM_RegionBenchmarkDlRegion/Tiny               1348 us         1344 us          492
BM_RegionBenchmarkDlRegion2/Tiny               712 us          712 us          981
BM_RegionBenchmarkDlRegionSorted/Tiny          547 us          547 us         1285
BM_RegionBenchmarkSkRegion/Tiny              70130 us        70090 us           10
BM_RegionBenchmarkDlRegion/Small              3030 us         3030 us          232
BM_RegionBenchmarkDlRegion2/Small             1668 us         1668 us          423
BM_RegionBenchmarkDlRegionSorted/Small        1246 us         1246 us          558
BM_RegionBenchmarkSkRegion/Small            121010 us       120951 us            6
BM_RegionBenchmarkDlRegion/Medium             1434 us         1434 us          488
BM_RegionBenchmarkDlRegion2/Medium            2323 us         2323 us          302
BM_RegionBenchmarkDlRegionSorted/Medium       1049 us         1049 us          670
BM_RegionBenchmarkSkRegion/Medium            21712 us        21712 us           32
BM_RegionBenchmarkDlRegion/Large               395 us          395 us         1791
BM_RegionBenchmarkDlRegion2/Large             4146 us         4146 us          169
BM_RegionBenchmarkDlRegionSorted/Large         384 us          384 us         1829
BM_RegionBenchmarkSkRegion/Large              1484 us         1484 us          474

DlRegion is the original region implementation.
DlRegion2 is the one from gist - it regresses with medium/large rectangles
DlRegionSorted is the sorted variation of original region I'm going to go with. Has best results across all test data.

@knopp knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 2 times, most recently from 9edd122 to 64c8151 Compare June 2, 2023 23:17
@flar
Copy link
Contributor

flar commented Jun 3, 2023

So I tried this implementation: https://gist.github.com/knopp/1029e85421aabf4ec13a644d10ebc515

It does one pass through a sorted list of rectangles, but in pathological cases where there is lot of large overdrawn rectangles (and thus many overlapping rectangles in each span) it performs much worse than the original algorithm. That's likely because in the original algorithm each large rectangle can possibly reduce the number of span lines and spans, reducing the work needed for subsequent rectangles.

There is a lot of work to manage the spans. In my vision, the spans are already known on each line and it is just a matter of slicing them off into a SpanLine. I'll see if I can mock up what I'm talking about.

@knopp
Copy link
Member Author

knopp commented Jun 3, 2023

There is a lot of work to manage the spans. In my vision, the spans are already known on each line and it is just a matter of slicing them off into a SpanLine. I'll see if I can mock up what I'm talking about.

The problem there is that even if I completely commented out the body of DlRegion2::SpanLine::insertSpan it was still significantly slower than the original algorithm in some cases. My assumption is that continuous merging of spans and span lines reduces the work needed to add each subsequent rectangle and for medium / large size rects that is a significant amount.

If you could put together something that we can benchmark against current implementation that would be great. It is possible that I've been looking at the current algorithm for way too long and I am missing something.

@knopp knopp force-pushed the rtree_improve_non_overlapping_rectangles branch 2 times, most recently from ea9f47d to 2774ba3 Compare June 4, 2023 16:24
@knopp knopp force-pushed the rtree_improve_non_overlapping_rectangles branch from fa996f5 to 794ce5c Compare June 5, 2023 17:37
@flar
Copy link
Contributor

flar commented Jun 5, 2023

You're good to go then!

@knopp knopp merged commit 042ebae into flutter:main Jun 5, 2023
30 checks passed
@knopp knopp deleted the rtree_improve_non_overlapping_rectangles branch June 5, 2023 18:41
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 5, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 5, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 5, 2023
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Jun 5, 2023
…128282)

flutter/engine@7f12e34...220ece4

2023-06-05 zanderso@users.noreply.github.com Ensure Dart roll script picks up udpates to dart-sdk/tools (flutter/engine#42576)
2023-06-05 chillers@google.com Run dependabot in off peak hours (flutter/engine#42572)
2023-06-05 flar@google.com fix bounds of inverted rendered rectangles (flutter/engine#42556)
2023-06-05 skia-flutter-autoroll@skia.org Roll Fuchsia Mac SDK from fgfgAhpxFpse7Xi4i... to tCxDcZ3yi0rnKGVHt... (flutter/engine#42574)
2023-06-05 kjlubick@users.noreply.github.com Add missing #include of SkCFObject.h (flutter/engine#42573)
2023-06-05 matej.knopp@gmail.com Improve getting non-overlapping rectangles from RTree (flutter/engine#42399)

Also rolling transitive DEPS:
  fuchsia/sdk/core/mac-amd64 from fgfgAhpxFpse to tCxDcZ3yi0rn

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC jonahwilliams@google.com,rmistry@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://bugs.chromium.org/p/skia/issues/entry?template=Autoroller+Bug

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
@flar
Copy link
Contributor

flar commented Jun 7, 2023

I played around with a more deterministic algorithm and ran into the same issues as you did - that it is faster for the smaller rect benchmarks, but gets problematic as the rectangles increase size. Really it is related to how many overlapping rectangles there are in the active list of rects with the smaller sizes having a more limited list of 10-20 active rects per loop (most of which are not overlapping), up to 300 active rects (which produce only 1-2 spans between them) for the large size. Some improvement on how to manage the generation of spans from active rects would help - perhaps a mini RTree of active spans with the top level being the actual spans that will get added for each line?

My gist: https://gist.github.com/flar/8ebcde27481bbedb5225b5577490bfcc

The results on a profile build:

-----------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations
-----------------------------------------------------------------------------
BM_RegionBenchmarkDlRegion/Tiny           595 us          593 us         1162
BM_RegionBenchmarkDlRegion2/Tiny          348 us          347 us         2012  (42% faster)
BM_RegionBenchmarkSkRegion/Tiny         73261 us        73058 us            9
BM_RegionBenchmarkDlRegion/Small         1227 us         1223 us          571
BM_RegionBenchmarkDlRegion2/Small         626 us          625 us         1120  (49% faster)
BM_RegionBenchmarkSkRegion/Small       124687 us       124355 us            6
BM_RegionBenchmarkDlRegion/Medium        1091 us         1088 us          641
BM_RegionBenchmarkDlRegion2/Medium        724 us          723 us          962  (34% faster)
BM_RegionBenchmarkSkRegion/Medium       22905 us        22844 us           30
BM_RegionBenchmarkDlRegion/Large          396 us          396 us         1767
BM_RegionBenchmarkDlRegion2/Large        1666 us         1662 us          421  (320% slower)
BM_RegionBenchmarkSkRegion/Large         1543 us         1540 us          455

One thing that concerns me is that the rectangles used for these benchmarks aren't really representative of what we see in a Flutter app. Overlapping rectangles are seen given the rendering of backgrounds of widgets covered by a few rendering ops in the widget content, but not nearly as bad as 300 active rects per span line. Still, these benchmarks do show the lack of scalability of this algorithm, but if we throw away a 42-49% improvement for the common case (and I'm not saying that Tiny and Small are representative either) then it would be nice to fix the quadratic behavior of the faster algorithm and even accept a (somewhat smaller) regression for the non-representative "Large" benchmark case to get faster common case behavior.

The stats outputs for the various benchmark cases (that use 2000 input rects) are:

Benchmark case Spanlines avg spans/line avg active rects/loop
Tiny 2500 7.626 7.89524
Small 2353 17.9949 25.748
Medium 272 2.875 97.7298
Large 39 1.17949 323.6

@knopp
Copy link
Member Author

knopp commented Jun 8, 2023

I think optimizing for many smaller rectangles is more likely to cover possible edge cases than optimizing for large rectangles (see flutter/flutter#126202).

I'm going to play with your implementation a bit but so far I like. So I'd say that improvements in small/tiny/medium are more important than regression in large, as long the regression isn't catastrophic :)

@knopp
Copy link
Member Author

knopp commented Jun 8, 2023

@flar, I think the tradeoff that your addRects implementation makes is clearly better. I replaced the old implementation with it in #42620 and it works really well. With custom SpanBuffer that keeps all span data together the numbers are even better:

main branch:

BM_RegionBenchmarkDlRegion/Tiny          614 us          613 us         1026
BM_RegionBenchmarkDlRegion/Small        1326 us         1325 us          521
BM_RegionBenchmarkDlRegion/Medium       1049 us         1049 us          669
BM_RegionBenchmarkDlRegion/Large         381 us          381 us         1808

new code (your addRect + custom SpanBuffer)

BM_DlRegion_AddRects/Tiny                      202 us          202 us         3487
BM_DlRegion_AddRects/Small                     505 us          503 us         1349
BM_DlRegion_AddRects/Medium                    491 us          491 us         1414
BM_DlRegion_AddRects/Large                    1346 us         1346 us          516

I don't think the regression of /Large is going to be a problem in practice.

Do you want to make a PR for this, or should we just keep it in #42620 ?

@flar
Copy link
Contributor

flar commented Jun 8, 2023

I'll take a look at #42620.

Another thing to consider is what the callers do with this info.

iOS will join all of the rects into 1 if there are more than 2 of them. Android will always join them all into 1. At least Fuchsia transfers the answer into its own version of a region.

So, for Android it could do with a much much faster "just get me the bounds of all ops that intersect this rect" function that avoids the old quadratic behavior and even the new region construction costs". iOS doesn't really find much more use than that. Fuchsia would probably prefer to not deband the rects (hard to tell how that interacts with their HitRegion construction).

Can iOS or Android be upgrading to handing off a region instead of processing the rects individually?

@knopp
Copy link
Member Author

knopp commented Jun 8, 2023

Returning DlRegion instead of the rects from RTree would be beneficial. I'm currently implementing platform views on macOS. Part of it is optimizing the number of layers used for platform views. Thought some of the RTree information will be used for hit testing as well. For embedder API eventually we'll pass the overlay regions as array of rectangles, but that's after some processing.

The way iOS handles overlay regions right now is quite unfortunate. But if it eventually gets migrated to embedder API the both the platform views and overlays would get much cheaper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
3 participants