Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve polyline framerate for map moves, i.e. panning (performance i… #1423

Closed
wants to merge 1 commit into from

Conversation

ignatz
Copy link
Contributor

@ignatz ignatz commented Dec 27, 2022

…s unchanged for zooming/rotating), by reducing repaints and doing less work on the critical path.

This is a pretty straight forward change (with some opportunistic cleanups).

I noticed #1381 as a "competing" approach for the Polygon layer. Note that my approach should likely also work in a straightforward fashion for the PolygonLayer w/o the need for any batching. I'm not sure how well they'd be combinable, i.e. avoiding repaints of clustered draws in the light of culling.

Anyway, I'm getting significantly improved draw times (and thus frame rates):

  • Before: Screenshot_2022-12-27-22-11-32-55_1b269ddfa23dc77342248c30e8616ea5
  • After: Screenshot_2022-12-27-22-32-26-03_1b269ddfa23dc77342248c30e8616ea5

…s unchanged for zooming/rotating), by reducing repaints and doing less work on the critical path.
@JaffaKetchup
Copy link
Member

Thanks again :)

@moonag will you get time to look at this and compare to your approach any time soon? If not, I'll have a look at this one over yours (as I know yours still has a few bugs). If you can fix, then we can do a direct performance comparison.

@ignatz If you ever join the Discord server, let us know who you are (ping me) and I'll give you the Awesome Person role - has no effect as of now (might in future), but a special thanks from us.

@mootw
Copy link
Collaborator

mootw commented Dec 28, 2022

@JaffaKetchup @ignatz
Yeah! I may be able to do some work on it later tonight or tomorrow. I can also do the batching stuff for polylines too. From what I can see quickly looking over the changes here this method just avoids repainting when panning, which will improve perceived panning performance; but it's more of a performance optimization bandaid (not in a bad way! It just has limited scope) rather than actually improving the architecture. Batching is the correct way to get real performance improvements because of how GPUs render things. Both of these changes should be compatible, but batching should be done first, then this second. It will require fin-nicking with culling to get them to both work together but that is fine.

I always say to be careful optimization that aren't extensively profiled like previously we have had code in flutter_map where someone added a bunch of code to cache a calculation for instance, but the reality is if you took some time to profile it, that code even though it was caching and resulted in a 5% (using example numbers here) uplift in performance while panning, when zooming since it had to regenerate the cache it caused a 10% performance loss. When in reality all you had to do was remove the caching code and gain 5% performance overall. Especially things related to caching its very important to know exactly why we are doing it and make sure that it is justifiable.

Just to be curious, would you be willing to profile your PR while zooming before and after? I am curious to see how this caching mechanism profiles in the "worst case"!

I will try and get some work done in the next few days and then we can work on merging the two methods, if I do not please just review this make sure it is good; we can always change it later and as long as while it is panning it doesn't harm performance I would be comfortable if I am not able to get any work done by the end of the week.

@ignatz
Copy link
Contributor Author

ignatz commented Dec 28, 2022

Hey folks,

Thanks for the prompt response. Much appreciated. Let me start saying that I don't have any strong opinions on "what is the correct" way. There's many strategies and the devil is in the detail.

Let me try to get to all your points

I always say to be careful optimization that aren't extensively profiled...

You're preaching to the choir :). I'd go further to say that any project that is serious about peformance needs to invest into continuous monitoring. Many of the topics you're touching on are very sensitive to implementation details, render surface, flutter engine, ... .

From what I can see quickly looking over the changes here this method just avoids repainting when panning, which will improve perceived panning performance; but it's more of a performance optimization bandaid (not in a bad way! It just has limited scope) rather than actually improving the architecture.

You're correct that the patch, as is, only helps with panning (I tried to be transparent about this, see PR title). Theoretically, we could even try to cover rotating and zoom as well, at the end of the day it's the same transformation that is already applied to map tiles.

Batching is the correct way to get real performance improvements because of how GPUs render things.

There's a lot in here to dissect. To be more specific, I think you're alluding to the fact that while GPU rendering tends to be faster than CPU rendering, there's some non-trivial overhead associated that becomes particularly noticeable for many small canvases. By batching you intend to amortize that overhead by rendering multiple, potentially small, poly(gons,lines) into a single canvas.

That seems like a good idea and should work well with the current feature set. Playing devils advocate, I construct some issues, e.g. someone wanted to wrap the individual polygons into gesture detectors or similar. Am I guessing correctly that you've seen some good results with zooming and panning as well? 👍

Moreover, note that Flutter doesn't always render to SKIA gpu surfaces, notably web doesn't. There are cases where batching, as an optimization, will be of limited use. At the same time, you could imagine the Flutter engine implementing an optimization to render small canvases on the CPU instead (maybe we should request that anyways) making batching unnecessary. Some browsers do that already. For example, https://groups.google.com/a/chromium.org/g/blink-dev/c/NPSQdiXSK4w/m/jgzIaJPJxh8J is a discussion of the leaflet and the chrome folks about map tiles being rendered unexpectedly on the CPU. Which brings us full circle to your plea for measuring. You don't need to convince me :).

Finally, touching on your statement "Batching is the correct way to get real performance improvements because of how GPUs render things": doing less work on the GPU, either by offloading work to the CPU or just not redrawing canvases using RepaintBoundaries, also seem to provide real performance benefits in the light of GPU idiosyncrasies and even for non GPU surfaces. Based on the thread linked above, that seems to be something the leaflet folks did at least consider (we could also render on separate isolates and stream the results in. Not too unlike how tile loading works)

Both of these changes should be compatible, but batching should be done first, then this second. It will require fin-nicking with culling to get them to both work together but that is fine.

I guess you mostly need to merge the bounding boxes of batched polygons/polylines.

Just to be curious, would you be willing to profile your PR while zooming before and after? I am curious to see how this caching mechanism profiles in the "worst case"!

Sure thing. Do you have a benchmark I could run. Meanwhile let me share some less scientific results. Unsurprisingly, the work done on the GPU seems pretty unchanged, however there's quite a bit of overhead on the UI thread.

  • Before: Screenshot_2022-12-29-00-07-02-84_1b269ddfa23dc77342248c30e8616ea5
  • After: Screenshot_2022-12-29-00-04-18-78_1b269ddfa23dc77342248c30e8616ea5

I can also do the batching stuff for polylines too.

I can also offer you a polygon version of my batch. I haven't shared it because I haven't broken it out of my further customized flutter_map yet (I'm trying here to pay back step by step).

@mootw
Copy link
Collaborator

mootw commented Dec 29, 2022

Hi! I have just finished my first writeup for polyline batching!

I am using the polyline example app in flutter_map with this added code snippet which adds 1000 polylines to the map and zooming out to show the entire block of green.
This situation is ideal for batching since there are 1000 polylines that can share the same exact draw call. Normal use will have lower batching ratios and not see the same performance uplift.

for(int i = 0; i < 1000; i++)
        Polyline(
          points: [
            LatLng(60.5 + (i /100), -10.09 - (i /100)),
            LatLng(61.3498 - (i /100), -16.2603 + (i /100)),
            LatLng(63.8566 + (i /100), 12.3522 - (i /100)),
          ],
          strokeWidth: 1,
          color: Colors.green,
        ),

Using master branch on my Z fold 4 I am getting a raster time of 450-750ms, and a UI time around 8ms in profile mode while zooming into the block of green the Raster time increases to about 1500ms (less than 1 frame per second).

With my batching implementation raster time is a consistent 70ms when zoomed all the way out and the UI time is about 3-4ms. When Zooming into the blob performance increases dramatically with a raster time of 32ms and a UI time of 2.5ms.

This implies to me that the limiting factor for performance here is the pixel fill rate; as the further I zoom in the greater the performance.

Obviously batching will not result in a 10-20x performance in the real world, but 3x or more is realistic in many situations.

This profile is early into the optimization and only supports non-dotted polylines. Tested on my fork on commit #d763ddb.
I have also included the "benchmark" code in my commits for both polygons and polylines in the example app!

Here is a CPU flame profile with my branch during the zoomed out phase.
image
Here is a profile of flutter_map master in the same setup. I have hidden core flutter libraries in this snippet due to the chart being extremely tall. As you can see draw calls are extremely expensive.
image

Please try out my branch to profile it's performance in your app! You can change your pubspec like this:

flutter_map:
    git: https://github.com/MooNag/flutter_map

Lastly I would like to address your helpful comments! Yeah I would love to see the implementation being able to cache projections in some meaningful way. I have tried before and the overhead for re-calculating the projections is less than the cost of caching it in every case I have tested due to what I believe are cache misses in the CPU itself. Yeah exactly, GPU rendering is faster than CPU rendering by many many many times, but all of this is being rendered on the GPU anyways, the detail here is that there is an issue with how the CPU instructs the GPU to render things. GPUs render many things at once, (think about pixel fill that I mentioned earlier), so in graphics programming (I am not a GOOD graphics programmer); we use a concept called batching, where the CPU can tell the GPU that it want's to render a material, lets say a green "pen" with green color and a width of 4. Then we send it a list of things to draw, lets say 500 different polygon offsets. The CPU is really good at calculating where on screen to draw the polygons (using the map projection) so it can tell the GPU exactly what "pixels" are each path made of all at once. Then the gpu will render all 500 polygons with that "pen" quickly due to it being really good at doing many things at once. This is called batching, and one command is sent per unique material. Currently flutter_map calculates a single polygon, then tells the gpu to draw 1 polygon. The problem with this is that the time it takes to tell the GPU to do something is much longer than the time it takes for the GPU to render the polygon. This is why batching is so important to rendering fast.

That is potentially a good point about flutter_web. We will have to test and profile GPU acceleration for web rendering. I am not sure if it works or does not. Either way, batching will not harm performance in this case but we should profile to make absolutely certain.

I belive the link you sent is exactly about overhead to setting up GPU acceleration which doesn't really apply to flutter since that process happens when booting up the app, not when loading the map! So this is a good thing for us that we do not have to worry about it! We can always assume that GPU acceleration will be active and used when it is available as flutter relies on GPU acceleration.

Awesome profiles! I am curious if the extra UI overhead is coming from too much caching of the bounding boxes and related information. This is just speculation though and you will want to properly profile to see where that extra time is coming from!

I can also offer you a polygon version of my batch.

Awesome! No worries or stress to get it out immediately. Please do contribute your improvements one at a time, that is ideal!

Let me know if there are any questions for me or if you find any bugs with my implementation!

@ignatz
Copy link
Contributor Author

ignatz commented Dec 29, 2022

Using master branch on my Z fold 4 I am getting a raster time of 450-750ms, and a UI time around 8ms in profile mode while zooming into the block of green the Raster time increases to about 1500ms (less than 1 frame per second).

With my batching implementation raster time is a consistent 70ms when zoomed all the way out and the UI time is about 3-4ms. When Zooming into the blob performance increases dramatically with a raster time of 32ms and a UI time of 2.5ms.

Awesome that sounds like a great relative improvement. I will definitely go and try out your changes - nice. Given your powerful phone, it's a bit sad that that we still seem to be dropping around every other frame (or more being so close to the 30fps render time threshold). Btw, do you refer to avg or max times, just so that we can talk about the same thing going forward?

This implies to me that the limiting factor for performance here is the pixel fill rate; as the further I zoom in the greater the performance.

Could you expand? This hypothesis goes against my expectations. Since you're filling only a few medium sized canvases I wouldn't suspect gpu bandwidth limitations.

Yeah I would love to see the implementation being able to cache projections in some meaningful way. I have tried before and the overhead for re-calculating the projections is less than the cost of caching it in every case I have tested due to what I believe are cache misses in the CPU itself.

Maybe we're talking past each other, I was never suggesting to cache projections. As you point out, projections are cheap especially in the greater scheme of things. I wouldn't expect them to add significantly to the UI thread. I don't think you need to speculate on cache hit rates to understand the magnitude of non-impact.

Re batching and GPU overhead, it seems we're both saying the same thing 👍

I belive the link you sent is exactly about overhead to setting up GPU acceleration which doesn't really apply to flutter since that process happens when booting up the app, not when loading the map! So this is a good thing for us that we do not have to worry about it!

I don't think that's correct. IIUC, leaflet was seeing issues while rendering certain tile sets. Depending on their oversampling rate the tile would either be rendered on the CPU or the GPU (which also seemed to impact their visual fidelity) based on heuristics of the browser renderer. My two take aways from that were mostly:

1 . The flutter engine doesn't implement such a heuristic today but it easily could and it could greatly benefit a whole array of applications
2. We ourselves could choose to push the rendering of small canvases on the CPU (ideally separate isolates) to take load off the GPU which is currently mostly buckled by overhead.

We can always assume that GPU acceleration will be active and used when it is available as flutter relies on GPU acceleration.

Now you're just pulling my leg :) - Different flutter render surfaces use different strategies depending on their implementations: https://github.com/flutter/engine/tree/aa4b3ea2f733aa53edbbc6198099cfc1b21671fb/shell/platform.

Awesome profiles! I am curious if the extra UI overhead is coming from too much caching of the bounding boxes and related information. This is just speculation though and you will want to properly profile to see where that extra time is coming from!

Note that my patch doesn't cache any additional bounding boxes. I haven't looked into the details. Naively, since I'm not doing any extra work it might be spent in the engine. Just speculation, might be interesting to look. That said, this is academic. Note that the increase in UI times for my example doesn't add any extra jank, since render times are dominated by the GPU rasterization. If I'm pumping out a new frame only every ~90ms, for keeping the render pipeline saturated it doesn't matter if my UI thread times are 10ms, 20ms or 90ms.

Maybe I wasn't quite clear on my last response, I'm actually very interested in your batching. I could see it work better for my use case than my more naive repaint boundary approach (I'll share some results as soon as I get to it). Your preliminary results of 32ms rasterization times, while promising, make me think that combining both approaches could still be beneficial in reducing jank even further.

@ignatz
Copy link
Contributor Author

ignatz commented Dec 29, 2022

I just quickly wanted to share some early qualitative results: I took your polyline implementation for a spin and it gives me a significantly greater performance uplift (both on android and web). Great job!

EDIT: Thinking about your clustering approach a bit more, clustering by polyline style properties will lead in unexpected visual results since you're ultimately changing the draw order.

@ibrierley
Copy link
Collaborator

Good work, I think what would also be useful are some replicable test cases we use, eg some with a large set of polys, and another with a set of polys that may have a lot of points but less of them iyswim (where things like simplification could come into play). Then it may be easier to compare and work on future improvements as well on different devices/web.

@ignatz
Copy link
Contributor Author

ignatz commented Dec 29, 2022

@moonag I took the liberty to iterate on your polyline layer: https://github.com/ignatz/flutter_map-clone/blob/fast_polylines_batching/lib/src/layer/polyline_layer.dart. The only differences are:

  • should work with all styles
  • doesn't change the order in which polylines are drawn from what the user provided.

Otherwise it's virtually the same and has the same performance characteristics as yours. It should be in a PR'able state.

Combining your approach with avoiding repaints on panning/rotating would require quite a few changes. I'd definitely postpone that. What do you think? Would you like to merge my changes back into your or want me to send out a new PR for polyline batching (credit for the proposed approach clearly goes to you)?

@ignatz
Copy link
Contributor Author

ignatz commented Dec 30, 2022

FYI, gave the polygons a shot as well: https://github.com/ignatz/flutter_map-clone/tree/fast_polygons. It should pretty much work including hole drawing:

image

@ignatz
Copy link
Contributor Author

ignatz commented Jan 2, 2023

Happy new year everyone! Everyone's hopefully very busy over the holidays, so no need to rush :).

I prepared, https://github.com/ignatz/flutter_map-clone/tree/fast_poly, just to give folks an overview of what's in flight. It's a culmination of everything we've discussed:

  1. draw batching for polygons and polylines
  2. as well as caching paints to minimize redraws on panning (given in-scope poly*s don't change).

I kept the changes for poly*, batching, and caching separate.

Let me know ho you'd like to proceed. We can probably close the PR at hand, since the direction has changed quite a bit with @moonag 's input.

@JaffaKetchup
Copy link
Member

Hey @ignatz. Re-pinging @moonag for you :)

@mootw
Copy link
Collaborator

mootw commented Jan 10, 2023

Hey, sorry about that! I am going to be quite busy with the FRC season starting up for the next few months. I took a quick read over your batching stuff and it looks mostly good! There are a few optimizations that could potentially be made; for instance using the library provided distance method and potentially some stuff with offset generation and iterating efficiency but that would need to be profiled! Ill try and get to some review tomorrow assuming I can get my main OS install back to being in a fully ideal state with my new GPU. It will be likely that caching points will cause a performance loss overall but again that will need to be extensively profiled before any conclusions can be made and I will be happy to do a lot of that profiling here as well as PR in some code improvements into your branch for merging into FM (I think that is possible). We can definitely do a new PR for these changes if that seems best to you otherwise the changes can be made here. I will likely close my PR but still push my other optimizations into your branch for merging. Sorry about being so slow about this on my end!

@ignatz
Copy link
Contributor Author

ignatz commented Jan 10, 2023

No worries, I appreciate your time. Thanks to your input this turned out more performant than I had initially hoped for ;). I'll hold off on sending out any new PRs before you had a chance to take a look. We can then discuss how to proceed.

I'd love to hear more about the further optimizations your thinking off. More frames more better. That said, I would suggest to start with a straight-forward implementation. We can always iterate and improve.

Cheers mate.

// If we were smarter we could set willChange to true during
// zooming/rotating the map and to false during moves.
willChange: true,
isComplex: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you noticed if the raster caching resulting from isComplex: true suffer from lack of anti-aliasing just like when caching using a RepaintBoundary? It's usually not that glaring, but for geometries with thin strokes like polylines the jagged strokes became a bit glaring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't noticed. Is there an example I can use to validate myself? Would you suggest to not raster cache at all or rather make it configurable?

Copy link
Contributor Author

@ignatz ignatz Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just to avoid confusion, this PR is a bit stale. I've moved my own focus over to https://github.com/ignatz/flutter_map-clone/tree/fast_poly, which does both: raster caching but more importantly reduce the canvas draw operations (based on MooNag's proposal). The only reason I haven't closed this PR yet it because I was waiting for MooNag's input to tell me how they'd like to handle the PR/swaperoo)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to continue wherever! I can close my PR once we merge yours in. If you need to make a new PR feel free to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG. Let's keep the ball rolling. I opened #1442. And I will close this PR to avoid confusion.


double _dist2(Offset v, Offset w) {
return _sqr(v.dx - w.dx) + _sqr(v.dy - w.dy);
return sqrt(_sqr(v.dx - w.dx) + _sqr(v.dy - w.dy));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before simplifying, this one line:

final totalDistance = _dist(o0, o1);

Is the only thing that depends on:

double _dist(Offset v, Offset w) {
  return sqrt(_dist2(v, w));
}

double _dist2(Offset v, Offset w) {
  return _sqr(v.dx - w.dx) + _sqr(v.dy - w.dy);
}

double _sqr(double x) {
  return x * x;
}

I propose removing not only _dist2 but all of those functions and simply doing something like:

final totalDistance = (o0 - o1).distance;

@ignatz
Copy link
Contributor Author

ignatz commented Feb 4, 2023

Closing in favor of #1442

@ignatz ignatz closed this Feb 4, 2023
@fleaflet fleaflet locked and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants