New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve polyline framerate for map moves, i.e. panning (performance i… #1423
Conversation
…s unchanged for zooming/rotating), by reducing repaints and doing less work on the critical path.
Thanks again :) @moonag will you get time to look at this and compare to your approach any time soon? If not, I'll have a look at this one over yours (as I know yours still has a few bugs). If you can fix, then we can do a direct performance comparison. @ignatz If you ever join the Discord server, let us know who you are (ping me) and I'll give you the Awesome Person role - has no effect as of now (might in future), but a special thanks from us. |
@JaffaKetchup @ignatz I always say to be careful optimization that aren't extensively profiled like previously we have had code in flutter_map where someone added a bunch of code to cache a calculation for instance, but the reality is if you took some time to profile it, that code even though it was caching and resulted in a 5% (using example numbers here) uplift in performance while panning, when zooming since it had to regenerate the cache it caused a 10% performance loss. When in reality all you had to do was remove the caching code and gain 5% performance overall. Especially things related to caching its very important to know exactly why we are doing it and make sure that it is justifiable. Just to be curious, would you be willing to profile your PR while zooming before and after? I am curious to see how this caching mechanism profiles in the "worst case"! I will try and get some work done in the next few days and then we can work on merging the two methods, if I do not please just review this make sure it is good; we can always change it later and as long as while it is panning it doesn't harm performance I would be comfortable if I am not able to get any work done by the end of the week. |
Hey folks, Thanks for the prompt response. Much appreciated. Let me start saying that I don't have any strong opinions on "what is the correct" way. There's many strategies and the devil is in the detail. Let me try to get to all your points
You're preaching to the choir :). I'd go further to say that any project that is serious about peformance needs to invest into continuous monitoring. Many of the topics you're touching on are very sensitive to implementation details, render surface, flutter engine, ... .
You're correct that the patch, as is, only helps with panning (I tried to be transparent about this, see PR title). Theoretically, we could even try to cover rotating and zoom as well, at the end of the day it's the same transformation that is already applied to map tiles.
There's a lot in here to dissect. To be more specific, I think you're alluding to the fact that while GPU rendering tends to be faster than CPU rendering, there's some non-trivial overhead associated that becomes particularly noticeable for many small canvases. By batching you intend to amortize that overhead by rendering multiple, potentially small, poly(gons,lines) into a single canvas. That seems like a good idea and should work well with the current feature set. Playing devils advocate, I construct some issues, e.g. someone wanted to wrap the individual polygons into gesture detectors or similar. Am I guessing correctly that you've seen some good results with zooming and panning as well? 👍 Moreover, note that Flutter doesn't always render to SKIA gpu surfaces, notably web doesn't. There are cases where batching, as an optimization, will be of limited use. At the same time, you could imagine the Flutter engine implementing an optimization to render small canvases on the CPU instead (maybe we should request that anyways) making batching unnecessary. Some browsers do that already. For example, https://groups.google.com/a/chromium.org/g/blink-dev/c/NPSQdiXSK4w/m/jgzIaJPJxh8J is a discussion of the leaflet and the chrome folks about map tiles being rendered unexpectedly on the CPU. Which brings us full circle to your plea for measuring. You don't need to convince me :). Finally, touching on your statement "Batching is the correct way to get real performance improvements because of how GPUs render things": doing less work on the GPU, either by offloading work to the CPU or just not redrawing canvases using RepaintBoundaries, also seem to provide real performance benefits in the light of GPU idiosyncrasies and even for non GPU surfaces. Based on the thread linked above, that seems to be something the leaflet folks did at least consider (we could also render on separate isolates and stream the results in. Not too unlike how tile loading works)
I guess you mostly need to merge the bounding boxes of batched polygons/polylines.
Sure thing. Do you have a benchmark I could run. Meanwhile let me share some less scientific results. Unsurprisingly, the work done on the GPU seems pretty unchanged, however there's quite a bit of overhead on the UI thread.
I can also offer you a polygon version of my batch. I haven't shared it because I haven't broken it out of my further customized flutter_map yet (I'm trying here to pay back step by step). |
Hi! I have just finished my first writeup for polyline batching! I am using the polyline example app in flutter_map with this added code snippet which adds 1000 polylines to the map and zooming out to show the entire block of green. for(int i = 0; i < 1000; i++)
Polyline(
points: [
LatLng(60.5 + (i /100), -10.09 - (i /100)),
LatLng(61.3498 - (i /100), -16.2603 + (i /100)),
LatLng(63.8566 + (i /100), 12.3522 - (i /100)),
],
strokeWidth: 1,
color: Colors.green,
), Using master branch on my Z fold 4 I am getting a raster time of 450-750ms, and a UI time around 8ms in profile mode while zooming into the block of green the Raster time increases to about 1500ms (less than 1 frame per second). With my batching implementation raster time is a consistent 70ms when zoomed all the way out and the UI time is about 3-4ms. When Zooming into the blob performance increases dramatically with a raster time of 32ms and a UI time of 2.5ms. This implies to me that the limiting factor for performance here is the pixel fill rate; as the further I zoom in the greater the performance. Obviously batching will not result in a 10-20x performance in the real world, but 3x or more is realistic in many situations. This profile is early into the optimization and only supports non-dotted polylines. Tested on my fork on commit #d763ddb. Here is a CPU flame profile with my branch during the zoomed out phase. Please try out my branch to profile it's performance in your app! You can change your pubspec like this: flutter_map:
git: https://github.com/MooNag/flutter_map Lastly I would like to address your helpful comments! Yeah I would love to see the implementation being able to cache projections in some meaningful way. I have tried before and the overhead for re-calculating the projections is less than the cost of caching it in every case I have tested due to what I believe are cache misses in the CPU itself. Yeah exactly, GPU rendering is faster than CPU rendering by many many many times, but all of this is being rendered on the GPU anyways, the detail here is that there is an issue with how the CPU instructs the GPU to render things. GPUs render many things at once, (think about pixel fill that I mentioned earlier), so in graphics programming (I am not a GOOD graphics programmer); we use a concept called batching, where the CPU can tell the GPU that it want's to render a material, lets say a green "pen" with green color and a width of 4. Then we send it a list of things to draw, lets say 500 different polygon offsets. The CPU is really good at calculating where on screen to draw the polygons (using the map projection) so it can tell the GPU exactly what "pixels" are each path made of all at once. Then the gpu will render all 500 polygons with that "pen" quickly due to it being really good at doing many things at once. This is called batching, and one command is sent per unique material. Currently flutter_map calculates a single polygon, then tells the gpu to draw 1 polygon. The problem with this is that the time it takes to tell the GPU to do something is much longer than the time it takes for the GPU to render the polygon. This is why batching is so important to rendering fast. That is potentially a good point about flutter_web. We will have to test and profile GPU acceleration for web rendering. I am not sure if it works or does not. Either way, batching will not harm performance in this case but we should profile to make absolutely certain. I belive the link you sent is exactly about overhead to setting up GPU acceleration which doesn't really apply to flutter since that process happens when booting up the app, not when loading the map! So this is a good thing for us that we do not have to worry about it! We can always assume that GPU acceleration will be active and used when it is available as flutter relies on GPU acceleration. Awesome profiles! I am curious if the extra UI overhead is coming from too much caching of the bounding boxes and related information. This is just speculation though and you will want to properly profile to see where that extra time is coming from!
Awesome! No worries or stress to get it out immediately. Please do contribute your improvements one at a time, that is ideal! Let me know if there are any questions for me or if you find any bugs with my implementation! |
Awesome that sounds like a great relative improvement. I will definitely go and try out your changes - nice. Given your powerful phone, it's a bit sad that that we still seem to be dropping around every other frame (or more being so close to the 30fps render time threshold). Btw, do you refer to avg or max times, just so that we can talk about the same thing going forward?
Could you expand? This hypothesis goes against my expectations. Since you're filling only a few medium sized canvases I wouldn't suspect gpu bandwidth limitations.
Maybe we're talking past each other, I was never suggesting to cache projections. As you point out, projections are cheap especially in the greater scheme of things. I wouldn't expect them to add significantly to the UI thread. I don't think you need to speculate on cache hit rates to understand the magnitude of non-impact. Re batching and GPU overhead, it seems we're both saying the same thing 👍
I don't think that's correct. IIUC, leaflet was seeing issues while rendering certain tile sets. Depending on their oversampling rate the tile would either be rendered on the CPU or the GPU (which also seemed to impact their visual fidelity) based on heuristics of the browser renderer. My two take aways from that were mostly: 1 . The flutter engine doesn't implement such a heuristic today but it easily could and it could greatly benefit a whole array of applications
Now you're just pulling my leg :) - Different flutter render surfaces use different strategies depending on their implementations: https://github.com/flutter/engine/tree/aa4b3ea2f733aa53edbbc6198099cfc1b21671fb/shell/platform.
Note that my patch doesn't cache any additional bounding boxes. I haven't looked into the details. Naively, since I'm not doing any extra work it might be spent in the engine. Just speculation, might be interesting to look. That said, this is academic. Note that the increase in UI times for my example doesn't add any extra jank, since render times are dominated by the GPU rasterization. If I'm pumping out a new frame only every ~90ms, for keeping the render pipeline saturated it doesn't matter if my UI thread times are 10ms, 20ms or 90ms. Maybe I wasn't quite clear on my last response, I'm actually very interested in your batching. I could see it work better for my use case than my more naive repaint boundary approach (I'll share some results as soon as I get to it). Your preliminary results of 32ms rasterization times, while promising, make me think that combining both approaches could still be beneficial in reducing jank even further. |
I just quickly wanted to share some early qualitative results: I took your polyline implementation for a spin and it gives me a significantly greater performance uplift (both on android and web). Great job! EDIT: Thinking about your clustering approach a bit more, clustering by polyline style properties will lead in unexpected visual results since you're ultimately changing the draw order. |
Good work, I think what would also be useful are some replicable test cases we use, eg some with a large set of polys, and another with a set of polys that may have a lot of points but less of them iyswim (where things like simplification could come into play). Then it may be easier to compare and work on future improvements as well on different devices/web. |
@moonag I took the liberty to iterate on your polyline layer: https://github.com/ignatz/flutter_map-clone/blob/fast_polylines_batching/lib/src/layer/polyline_layer.dart. The only differences are:
Otherwise it's virtually the same and has the same performance characteristics as yours. It should be in a PR'able state. Combining your approach with avoiding repaints on panning/rotating would require quite a few changes. I'd definitely postpone that. What do you think? Would you like to merge my changes back into your or want me to send out a new PR for polyline batching (credit for the proposed approach clearly goes to you)? |
FYI, gave the polygons a shot as well: https://github.com/ignatz/flutter_map-clone/tree/fast_polygons. It should pretty much work including hole drawing: |
Happy new year everyone! Everyone's hopefully very busy over the holidays, so no need to rush :). I prepared, https://github.com/ignatz/flutter_map-clone/tree/fast_poly, just to give folks an overview of what's in flight. It's a culmination of everything we've discussed:
I kept the changes for poly*, batching, and caching separate. Let me know ho you'd like to proceed. We can probably close the PR at hand, since the direction has changed quite a bit with @moonag 's input. |
Hey, sorry about that! I am going to be quite busy with the FRC season starting up for the next few months. I took a quick read over your batching stuff and it looks mostly good! There are a few optimizations that could potentially be made; for instance using the library provided distance method and potentially some stuff with offset generation and iterating efficiency but that would need to be profiled! Ill try and get to some review tomorrow assuming I can get my main OS install back to being in a fully ideal state with my new GPU. It will be likely that caching points will cause a performance loss overall but again that will need to be extensively profiled before any conclusions can be made and I will be happy to do a lot of that profiling here as well as PR in some code improvements into your branch for merging into FM (I think that is possible). We can definitely do a new PR for these changes if that seems best to you otherwise the changes can be made here. I will likely close my PR but still push my other optimizations into your branch for merging. Sorry about being so slow about this on my end! |
No worries, I appreciate your time. Thanks to your input this turned out more performant than I had initially hoped for ;). I'll hold off on sending out any new PRs before you had a chance to take a look. We can then discuss how to proceed. I'd love to hear more about the further optimizations your thinking off. More frames more better. That said, I would suggest to start with a straight-forward implementation. We can always iterate and improve. Cheers mate. |
// If we were smarter we could set willChange to true during | ||
// zooming/rotating the map and to false during moves. | ||
willChange: true, | ||
isComplex: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you noticed if the raster caching resulting from isComplex: true
suffer from lack of anti-aliasing just like when caching using a RepaintBoundary
? It's usually not that glaring, but for geometries with thin strokes like polylines the jagged strokes became a bit glaring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't noticed. Is there an example I can use to validate myself? Would you suggest to not raster cache at all or rather make it configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just to avoid confusion, this PR is a bit stale. I've moved my own focus over to https://github.com/ignatz/flutter_map-clone/tree/fast_poly, which does both: raster caching but more importantly reduce the canvas draw operations (based on MooNag's proposal). The only reason I haven't closed this PR yet it because I was waiting for MooNag's input to tell me how they'd like to handle the PR/swaperoo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to continue wherever! I can close my PR once we merge yours in. If you need to make a new PR feel free to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. Let's keep the ball rolling. I opened #1442. And I will close this PR to avoid confusion.
|
||
double _dist2(Offset v, Offset w) { | ||
return _sqr(v.dx - w.dx) + _sqr(v.dy - w.dy); | ||
return sqrt(_sqr(v.dx - w.dx) + _sqr(v.dy - w.dy)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before simplifying, this one line:
final totalDistance = _dist(o0, o1);
Is the only thing that depends on:
double _dist(Offset v, Offset w) {
return sqrt(_dist2(v, w));
}
double _dist2(Offset v, Offset w) {
return _sqr(v.dx - w.dx) + _sqr(v.dy - w.dy);
}
double _sqr(double x) {
return x * x;
}
I propose removing not only _dist2
but all of those functions and simply doing something like:
final totalDistance = (o0 - o1).distance;
Closing in favor of #1442 |
…s unchanged for zooming/rotating), by reducing repaints and doing less work on the critical path.
This is a pretty straight forward change (with some opportunistic cleanups).
I noticed #1381 as a "competing" approach for the Polygon layer. Note that my approach should likely also work in a straightforward fashion for the PolygonLayer w/o the need for any batching. I'm not sure how well they'd be combinable, i.e. avoiding repaints of clustered draws in the light of culling.
Anyway, I'm getting significantly improved draw times (and thus frame rates):