core,render: Improve performance of "round to nearest, ties to even" float operations #16250

Merged
merged 4 commits into from
May 8, 2024

kjarosh
Member

@kjarosh kjarosh commented May 8, 2024

This patch improves performance of ecma_conversions::round_to_even() and matrix::round_to_i32():

  1. by using f64::round_ties_even()/f32::round_ties_even(), which have been stable since 1.77.0, instead of a custom algorithm; and
  2. by removing an unnecessary comparison to i32::MIN, as casting a float to an integer automatically saturates values smaller than the minimum integer value to the minimum value of the integer type.
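As a rough illustration of the two points above, the sketch below rounds with the standard-library method and relies on the saturating float-to-integer `as` cast. The function names `round_f64_to_i32` and `round_f32_to_i32` are hypothetical and are not taken from the Ruffle code base; this only shows the language behavior the patch depends on.

```rust
/// Hypothetical helper (not Ruffle's actual function): rounds to the nearest
/// integer with ties going to the even neighbor, then converts to `i32`.
pub fn round_f64_to_i32(value: f64) -> i32 {
    // `f64::round_ties_even` is stable since Rust 1.77.0.
    // The `as` cast saturates: values below `i32::MIN` become `i32::MIN`,
    // values above `i32::MAX` become `i32::MAX`, and NaN becomes 0,
    // so no explicit comparison against `i32::MIN` is needed.
    value.round_ties_even() as i32
}

/// Same idea for `f32`, again under a hypothetical name.
pub fn round_f32_to_i32(value: f32) -> i32 {
    value.round_ties_even() as i32
}
```

For example, `round_f64_to_i32(2.5)` yields `2` and `round_f64_to_i32(1.5)` yields `2`, while `round_f64_to_i32(-1e20)` saturates to `i32::MIN`.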

A simple microbenchmark shows these operations running about 20% faster. For matrix::round_to_i32(), this may translate into a noticeable improvement in overall performance, as the operation is used quite commonly.

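| Platform   | Operation                          | Type | Improvement |
|------------|------------------------------------|------|-------------|
| amd64      | ecma_conversions::round_to_even()  | f64  | ~21%        |
| WASM amd64 | ecma_conversions::round_to_even()  | f64  | ~20%        |
| amd64      | matrix::round_to_i32()             | f32  | ~22%        |
| WASM amd64 | matrix::round_to_i32()             | f32  | ~23%        |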

Those functions have been covered by tests to ensure the behavior hasn't changed.
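A behavior-preservation check of this kind could look roughly like the sketch below. The previous custom algorithm is not shown in this conversation, so `reference_round_ties_even` is a hypothetical hand-written stand-in, and `agrees_on_quarter_grid` is likewise an illustrative name rather than one of the actual tests.

```rust
/// Hypothetical hand-written "ties to even" rounding, used only as a
/// reference to compare against the standard-library method.
pub fn reference_round_ties_even(x: f32) -> f32 {
    // The fractional part is computed exactly, so this detects exact ties.
    if (x - x.trunc()).abs() == 0.5 {
        // On a tie, pick the even neighbor: halve, round, then double.
        2.0 * (x / 2.0).round()
    } else {
        // Otherwise ordinary rounding to nearest is already correct.
        x.round()
    }
}

/// Compares the reference with `f32::round_ties_even` on every multiple
/// of 0.25 in [-100, 100], which includes all ties in that range.
pub fn agrees_on_quarter_grid() -> bool {
    (-400..=400).all(|i| {
        let x = i as f32 * 0.25;
        reference_round_ties_even(x) as i32 == x.round_ties_even() as i32
    })
}
```

A grid of quarter steps is a small, deterministic set of inputs; a real test suite would likely also cover NaN, infinities, and values outside the `i32` range.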

This patch improves performance of ecma_conversions::round_to_even():
1. by using f64::round_ties_even(), which has been stable
   since 1.77.0, instead of a custom algorithm; and
2. by removing an unnecessary comparison to i32::MIN,
   as casting a float to an integer automatically saturates
   values smaller than the minimum integer value to the minimum
   value of the integer type.

This patch improves performance of matrix::round_to_i32()
by using f32::round_ties_even(), which has been stable
since 1.77.0, instead of a custom algorithm.
Collaborator

@adrian17 adrian17 left a comment

LGTM.
Out of curiosity, did you find any content that's particularly slow due to these, or is this PR more inspired by round_ties_even being stabilized?

@kjarosh
Member Author

kjarosh commented May 8, 2024

I remember analyzing performance of some SWF game where the results showed that sometimes matrix::round_to_i32 took a noticeable amount of time. I tried optimizing matrix operations using SIMD intrinsics (round_ties_even was unstable at the time), but that was extremely complicated, and I basically would have had to implement portable SIMD.

Today I noticed that round_ties_even has been stabilized so that was a no-brainer optimization to do :)

@torokati44
Member

torokati44 commented May 8, 2024

> and I basically would have had to implement portable SIMD.

We use the wide crate for this over in https://github.com/ruffle-rs/h263-rs/. It maps neatly to intrinsics on supported platforms, and expands to scalar stuff otherwise.

EDIT: That is, in the colorspace conversion part (the yuv package). In the decoding itself, the most critical part was the IDCT, which thankfully gets autovectorized anyway.

@adrian17 adrian17 merged commit b99bdad into ruffle-rs:master May 8, 2024
17 checks passed
@kjarosh kjarosh deleted the round-to-even branch May 8, 2024 20:25