
Vectorize geometry.dart in dart:ui #94596

Open
dnfield opened this issue Dec 3, 2021 · 15 comments
Labels
c: performance (Relates to speed or footprint issues; see "perf:" labels)
engine (flutter/engine repository; see also e: labels)
P2 (Important issues not at the top of the work list)
perf: speed (Performance issues related to (mostly rendering) speed)
team-engine (Owned by Engine team)
triaged-engine (Triaged by Engine team)

Comments

@dnfield
Contributor

dnfield commented Dec 3, 2021

We should be able to back OffsetBase with a Float64x2 and then vectorize the operators on Offset and Size, as well as some other things like lerp. This should yield some performance improvement.
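A rough sketch of what that could look like (a hypothetical VecOffset standing in for the real dart:ui Offset; the names and details are illustrative only), where the pair is stored directly in a Float64x2 so that +, -, and lerp each become a single vector operation:

import 'dart:typed_data';

class VecOffset {
  VecOffset(double dx, double dy) : _v = Float64x2(dx, dy);
  VecOffset._(this._v);

  // The two components live in one SIMD value instead of two double fields.
  final Float64x2 _v;

  double get dx => _v.x;
  double get dy => _v.y;

  VecOffset operator +(VecOffset other) => VecOffset._(_v + other._v);
  VecOffset operator -(VecOffset other) => VecOffset._(_v - other._v);

  // Linear interpolation expressed as a single vector expression.
  static VecOffset lerp(VecOffset a, VecOffset b, double t) =>
      VecOffset._(a._v + (b._v - a._v).scale(t));
}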

It might or might not be an improvement to vectorize Rect. For better or worse, we switched Rect to use 64-bit doubles internally, so we can't vectorize it as a Float32x4, but it might help to use two Float64x2s. At the same time, when we pass Rects back to Skia we give it 32-bit values. The main concern is that when we use Rect in the framework, we don't want precision loss from casting to/from 32- and 64-bit values to throw things off, which was the motivation for going to 64-bit doubles in the first place.
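For Rect, a similar sketch (again hypothetical, keeping 64-bit doubles internally as described above) could pack the corners into two Float64x2 values so that, for example, translating the rect becomes two vector additions instead of four scalar adds:

import 'dart:typed_data';

class VecRect {
  VecRect.fromLTRB(double left, double top, double right, double bottom)
      : _lt = Float64x2(left, top),
        _rb = Float64x2(right, bottom);
  VecRect._(this._lt, this._rb);

  final Float64x2 _lt; // (left, top)
  final Float64x2 _rb; // (right, bottom)

  double get left => _lt.x;
  double get top => _lt.y;
  double get right => _rb.x;
  double get bottom => _rb.y;

  // Shift the whole rect by (dx, dy) with two vector adds.
  VecRect translate(double dx, double dy) {
    final Float64x2 d = Float64x2(dx, dy);
    return VecRect._(_lt + d, _rb + d);
  }
}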

I'm not sure whether this could help the web as well. /cc @yjbanov or @hterkelsen

dnfield added the engine, c: performance, perf: speed, and P2 labels on Dec 3, 2021
@jonahwilliams
Member

Now this is podracing

Something to be aware of: on the web, SIMD types are emulated, so they will be extremely slow and should be avoided at all costs. That's OK, though; the web can keep using the existing implementation.

You might want to consider dropping some precision and moving to Float32x4, which would give you more of a speedup. (I also have an old branch around with SIMD matrix utils; I'll find it next week.)

The other big thing to watch out for is that x64 and ARM SIMD support is not exactly the same, and some operations are emulated.

Also, SIMD types can't be const, so it may not be possible to use them as backing storage, only as an intermediate value.
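One way to work within that constraint (a sketch under the assumption that the const constructor and the plain double fields must stay; this simplified Offset is a stand-in, not the actual dart:ui code) is to keep dx and dy as separate fields and build Float64x2 values only inside the hot operators, accepting the pack/unpack cost discussed further down in this thread:

import 'dart:typed_data';

class Offset {
  // Stays const-constructible because the backing storage is plain doubles.
  const Offset(this.dx, this.dy);

  final double dx;
  final double dy;

  // Float64x2 is used only as an intermediate: packed from the fields,
  // operated on, and unpacked again before returning.
  Offset operator +(Offset other) {
    final Float64x2 sum = Float64x2(dx, dy) + Float64x2(other.dx, other.dy);
    return Offset(sum.x, sum.y);
  }
}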

@dnfield
Contributor Author

dnfield commented Dec 3, 2021

I'm working on a similar change for package:path_parsing. It seems to help on ARM (Pixel 4) more than on x64 (MacBook Pro), but it doesn't seem to hurt on x64.

@dnfield
Contributor Author

dnfield commented Dec 3, 2021

And yeah, if we drop precision we might go faster, but then we get weird precision-loss bugs in the framework, which is why we stopped doing that in the first place :\

@jonahwilliams
Member

For things like semantics rects, the precision probably doesn't matter.

@zanderso
Member

zanderso commented Dec 3, 2021

Related: dart-lang/sdk#41950

@zanderso
Member

zanderso commented Dec 3, 2021

Also: #60538

@jonahwilliams
Member

Also FYI @yjbanov, because I believe Flutter web pays an additional cost for double precision that isn't necessary in the web engine.

@dnfield
Contributor Author

dnfield commented Dec 5, 2021

I'm kind of wondering if this should just be a compiler optimization instead. If it's really hard to do, maybe it could be made easier with some kind of pragma, e.g.

@pragma('vm:simd_add', ['dx', 'dy'], ['other.dx', 'other.dy'])
Offset operator +(Offset other) => Offset(dx + other.dx, dy + other.dy);

@alexmarkov wdyt?

@alexmarkov
Contributor

If we store dx and dy separately, then we would need to pack/unpack them to/from SIMD vectors before/after each operation, which may eat up all of the performance improvement from using a vector operation. It would be more efficient to store them as vectors in the first place. So I would suggest using Float64x2 explicitly instead of introducing such pragmas for vectorization.

When values are mutable, it's okay to create new Float64x2 objects. If that becomes a bottleneck, we can teach the AOT compiler to unbox SIMD values in more cases (currently, Float64x2 objects are not allocated for intermediate values during a calculation, but they are still allocated when passing values as parameters, returning values, or storing them in fields).

@dnfield
Contributor Author

dnfield commented Dec 6, 2021

Part of the problem is that the Float64x2 type is not const constructible.

@alexmarkov
Contributor

I think it is possible to make const constructors for SIMD types, as they are built-in and can be treated like int/double, but it might require a special exception and support for SIMD constants in the CFE.

Related: dart-lang/sdk#31487.

@mraleph
Member

mraleph commented Dec 7, 2021

@dnfield do you have a link to the change to path_parsing that you are working on (and the microbenchmark you are testing it on)? I would not be surprised if we simply don't properly unbox all of the involved doubles in the parsing loop.

@dnfield
Contributor Author

dnfield commented Dec 7, 2021

see dnfield/dart_path_parsing#8

That PR is slightly faster than the original, measured on an Android Go 32-bit ARM device. It performs very similarly on my MacBook Pro, however. It gets even faster if I remove the decomposeArctoCubic routine and let Skia take care of that in C++, which seems a little unfortunate to me.

@mraleph
Member

mraleph commented Dec 7, 2021

@dnfield Based on a cursory glance, I'd guess that most of the improvement comes from saving on indirection rather than from the vectorization (though vectorization might play some minor role). You are going from storing floating-point values through indirection in a heap-allocated object to storing them inline in the SIMD objects (which are also unboxed).

@mraleph
Member

mraleph commented Dec 10, 2021

(/cc @leafpetersen @lrhn: this might be considered an example where value objects (or structs) could help with writing performant code.)

flutter-triage-bot added the team-engine and triaged-engine labels on Jul 8, 2023