New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve getting non-overlapping rectangles from RTree #42399
Improve getting non-overlapping rectangles from RTree #42399
Conversation
3422516
to
35078a2
Compare
b1b1427
to
6465ae8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's late and I just got started on looking at this, but I wanted to hit send before I turned in.
I'd love to get rid of flow/rtree, but it is still used in a couple of places that need to be taught to use Dl classes...
display_list/geometry/dl_region.h
Outdated
|
||
private: | ||
struct Span; | ||
struct SpanLine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Span and SpanLine have very simple definitions (ignoring the methods on SpanLine which can just be declared here for simplicity). Would it be better to define them here rather than forward reference them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer keeping the header files as terse as possible, but that's just personal preference. I don't mind removing the forward declaration if you think it's better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The style guide recommends against them and provides some reasoning:
https://google.github.io/styleguide/cppguide.html#Forward_Declarations
In this case, is there a potential danger in that the constructor allows stack allocation of the object which has a vector of a forwardly-defined type in it? Maybe that is new as of the change to bulk load rectangles, though, as I see you've already inlined their definitions.
} | ||
|
||
std::vector<SkIRect> DlRegion::getRects(bool deband) const { | ||
std::vector<SkIRect> rects; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the algorithm below only works as long as rects remains sorted by bottom coordinate? Is that worth mentioning here in a comment.
New rectangles are inserted at the end of the rects vector and they are only ever grown upward to absorb a previous rect so they can never violate the bottom ordering.
Just some things that I had to realize about the code before I could see if it was working as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Rects are stored in SpanLines. Each span line contains rects with same top and bottom coordinates. Span lines are non overlapping and always kept sorted, which has the added benefit of being able to use binary search (upper_bound
) when looking for initial span line in addRect
.
Does this improve the behavior of RTree for issues like flutter/flutter#126202 ? |
It does. I just tested the example, DlRTree completely disappears from the profiler :) |
I'm not sure why the |
I think we're having some infra issues right now, I'm seeing this across builds. Also woot! |
423ec35
to
81e3805
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through the algorithm pretty well and it looks fine to me.
But, I wonder if there is any value in having Region be mutable and simultaneously able to accept new rectangles or give out lists? Would a Builder paradigm work better, perhaps combined with a DlRegion::Make(vector of rectangles)
?
The reason that this might be better is that the region object itself would be immutable and easier to share when needed and the algorithm could sort the provided rectangles (into a new vector perhaps) by starting y then starting x to make insertion much easier and faster.
Probably not for any use-case that we have for this. It would be worth doing if bulk loading is faster.
This would be interesting to profile. Right now there is a binary search for each added rectangle. But we're only searching through span lines, which is less than number of rectangles. If presorted, we can know that we don't need to touch spans above current rect top. It might also make difference for adding span to spanline if the rects are sorted horizontally. This would be a significantly different algorithm. But maybe worth implementing and benchmarking against the one in PR. |
Just pre-sorting the rects without modifying anything else I already got faster result in first two benchmarks (that's including std::sort). This is definitely worth exploring. |
The way I've done it in past lives is to start with y-then-x sorted rectangles and to keep an active list of rectangles that are currently being sliced off into lines. New rectangles are added to the active list as you reach their starting Y and rectangles are pruned from that list as you reach their ending Y. The final list of lines are created in order as you go with their full complement of spans. The active list is kept in X order to make slicing the spans for each line simple. Since the rectangles were originally sub-sorted by X, they often come in in order so there is an insertion merge of new rectangles into the active list, but no need to sort the active rectangles other than that. Using a list of pointers to the original bulk rectangle objects makes the sorting go faster with simpler moves and inserts of pointers. Then the active list of rectangles can actually be maintained in place in the sorted list of pointers, slowly consuming it as the active list shrinks and grows and is copied into Line objects in the final region. |
So I tried this implementation: https://gist.github.com/knopp/1029e85421aabf4ec13a644d10ebc515 It does one pass through a sorted list of rectangles, but in pathological cases where there is lot of large overdrawn rectangles (and thus many overlapping rectangles in each span) it performs much worse than the original algorithm. That's likely because in the original algorithm each large rectangle can possibly reduce the number of span lines and spans, reducing the work needed for subsequent rectangles. The best result I get so far is a variation of original algorithm, where the rectangles are presorted by Y axis, which removes the need of binary search for each rectangles: Here are the benchmarks for reference:
|
9edd122
to
64c8151
Compare
There is a lot of work to manage the spans. In my vision, the spans are already known on each line and it is just a matter of slicing them off into a SpanLine. I'll see if I can mock up what I'm talking about. |
The problem there is that even if I completely commented out the body of If you could put together something that we can benchmark against current implementation that would be great. It is possible that I've been looking at the current algorithm for way too long and I am missing something. |
ea9f47d
to
2774ba3
Compare
fa996f5
to
794ce5c
Compare
You're good to go then! |
…128282) flutter/engine@7f12e34...220ece4 2023-06-05 zanderso@users.noreply.github.com Ensure Dart roll script picks up udpates to dart-sdk/tools (flutter/engine#42576) 2023-06-05 chillers@google.com Run dependabot in off peak hours (flutter/engine#42572) 2023-06-05 flar@google.com fix bounds of inverted rendered rectangles (flutter/engine#42556) 2023-06-05 skia-flutter-autoroll@skia.org Roll Fuchsia Mac SDK from fgfgAhpxFpse7Xi4i... to tCxDcZ3yi0rnKGVHt... (flutter/engine#42574) 2023-06-05 kjlubick@users.noreply.github.com Add missing #include of SkCFObject.h (flutter/engine#42573) 2023-06-05 matej.knopp@gmail.com Improve getting non-overlapping rectangles from RTree (flutter/engine#42399) Also rolling transitive DEPS: fuchsia/sdk/core/mac-amd64 from fgfgAhpxFpse to tCxDcZ3yi0rn If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC jonahwilliams@google.com,rmistry@google.com,zra@google.com on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://bugs.chromium.org/p/skia/issues/entry?template=Autoroller+Bug Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
I played around with a more deterministic algorithm and ran into the same issues as you did - that it is faster for the smaller rect benchmarks, but gets problematic as the rectangles increase size. Really it is related to how many overlapping rectangles there are in the active list of rects with the smaller sizes having a more limited list of 10-20 active rects per loop (most of which are not overlapping), up to 300 active rects (which produce only 1-2 spans between them) for the large size. Some improvement on how to manage the generation of spans from active rects would help - perhaps a mini RTree of active spans with the top level being the actual spans that will get added for each line? My gist: https://gist.github.com/flar/8ebcde27481bbedb5225b5577490bfcc The results on a profile build:
One thing that concerns me is that the rectangles used for these benchmarks aren't really representative of what we see in a Flutter app. Overlapping rectangles are seen given the rendering of backgrounds of widgets covered by a few rendering ops in the widget content, but not nearly as bad as 300 active rects per span line. Still, these benchmarks do show the lack of scalability of this algorithm, but if we throw away a 42-49% improvement for the common case (and I'm not saying that Tiny and Small are representative either) then it would be nice to fix the quadratic behavior of the faster algorithm and even accept a (somewhat smaller) regression for the non-representative "Large" benchmark case to get faster common case behavior. The stats outputs for the various benchmark cases (that use 2000 input rects) are:
|
I think optimizing for many smaller rectangles is more likely to cover possible edge cases than optimizing for large rectangles (see flutter/flutter#126202). I'm going to play with your implementation a bit but so far I like. So I'd say that improvements in small/tiny/medium are more important than regression in large, as long the regression isn't catastrophic :) |
@flar, I think the tradeoff that your main branch:
new code (your
I don't think the regression of /Large is going to be a problem in practice. Do you want to make a PR for this, or should we just keep it in #42620 ? |
I'll take a look at #42620. Another thing to consider is what the callers do with this info. iOS will join all of the rects into 1 if there are more than 2 of them. Android will always join them all into 1. At least Fuchsia transfers the answer into its own version of a region. So, for Android it could do with a much much faster "just get me the bounds of all ops that intersect this rect" function that avoids the old quadratic behavior and even the new region construction costs". iOS doesn't really find much more use than that. Fuchsia would probably prefer to not deband the rects (hard to tell how that interacts with their Can iOS or Android be upgrading to handing off a region instead of processing the rects individually? |
Returning DlRegion instead of the rects from RTree would be beneficial. I'm currently implementing platform views on macOS. Part of it is optimizing the number of layers used for platform views. Thought some of the RTree information will be used for hit testing as well. For embedder API eventually we'll pass the overlay regions as array of rectangles, but that's after some processing. The way iOS handles overlay regions right now is quite unfortunate. But if it eventually gets migrated to embedder API the both the platform views and overlays would get much cheaper. |
Fixes flutter/flutter#116070
Fixes flutter/flutter#126202
Introduces
DlRegion
class which implements subset ofSkRegion
required to get non-overlapping rectangles from region.The implementation is different and faster than
SkRegion
for this particular use-case (display_list_region_benchmarks
):Edit: Updated benchmark to latest revision and natively (initial run went through rosetta)
Pre-launch Checklist
///
).If you need help, consider asking for advice on the #hackers-new channel on Discord.