Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matrix3 multiplication is 4x slower in 0.18 #543

Closed
KeyboardDanni opened this issue Mar 10, 2022 · 5 comments
Closed

Matrix3 multiplication is 4x slower in 0.18 #543

KeyboardDanni opened this issue Mar 10, 2022 · 5 comments
Labels

Comments

@KeyboardDanni
Copy link

AMD Ryzen 7 3700X
Windows 10 19044.1566
rustc 1.60.0-nightly (1e12aef3f 2022-02-13)

The other day I was testing sprite performance in my engine and noticed it was a lot lower than usual. Investigated the cause and found it was when I updated cgmath from 0.17 to 0.18.

With 0.17, my benchmark that does 100k translations, rotations, and scales on a Matrix3:

test bench_transform_matrix ... bench: 670,110 ns/iter (+/- 13,076)

With 0.18:

test bench_transform_matrix ... bench: 2,755,590 ns/iter (+/- 10,904)

The bench itself
Code for the transform wrapper

Since the transform happens for each sprite, the performance difference adds up quickly.

Using default features with -O2/-O3. Compiler target is x86_64-pc-windows-msvc.

@kvark
Copy link
Collaborator

kvark commented Mar 11, 2022

Wow that's quite concerning! Thank you for bringing this up.

@kvark kvark added the bug label Mar 11, 2022
@peppidesu
Copy link

peppidesu commented Apr 19, 2023

comparing the benchmarks provided by the crate shows a similar (3x) increase:

cargo bench --feature rand _bench_matrix3_mul_m    (0.18.0)
>> test _bench_matrix3_mul_m       ... bench:          13 ns/iter (+/- 0)
cargo bench _bench_matrix3_mul_m                   (0.17.0)
>> test _bench_matrix3_mul_m       ... bench:          4 ns/iter (+/- 0)

There was no significant increase for any of the other benchmarks regarding Matrix3

@peppidesu
Copy link

It seems the problem is already fixed on the master branch, getting 4ns results there.

@peppidesu
Copy link

peppidesu commented Apr 19, 2023

Found the issue:

cgmath/src/vector.rs

Lines 311 to 337 in 637c566

impl<S: BaseNum> ElementWise for $VectorN<S> {
#[inline] default_fn!( add_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field + rhs.$field),+) } );
#[inline] default_fn!( sub_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field - rhs.$field),+) } );
#[inline] default_fn!( mul_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field * rhs.$field),+) } );
#[inline] default_fn!( div_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field / rhs.$field),+) } );
#[inline] fn rem_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field % rhs.$field),+) }
#[inline] default_fn!( add_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field += rhs.$field);+ } );
#[inline] default_fn!( sub_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field -= rhs.$field);+ } );
#[inline] default_fn!( mul_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field *= rhs.$field);+ } );
#[inline] default_fn!( div_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field /= rhs.$field);+ } );
#[inline] fn rem_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field %= rhs.$field);+ }
}
impl<S: BaseNum> ElementWise<S> for $VectorN<S> {
#[inline] default_fn!( add_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field + rhs),+) } );
#[inline] default_fn!( sub_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field - rhs),+) } );
#[inline] default_fn!( mul_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field * rhs),+) } );
#[inline] default_fn!( div_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field / rhs),+) } );
#[inline] fn rem_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field % rhs),+) }
#[inline] default_fn!( add_assign_element_wise(&mut self, rhs: S) { $(self.$field += rhs);+ } );
#[inline] default_fn!( sub_assign_element_wise(&mut self, rhs: S) { $(self.$field -= rhs);+ } );
#[inline] default_fn!( mul_assign_element_wise(&mut self, rhs: S) { $(self.$field *= rhs);+ } );
#[inline] default_fn!( div_assign_element_wise(&mut self, rhs: S) { $(self.$field /= rhs);+ } );
#[inline] fn rem_assign_element_wise(&mut self, rhs: S) { $(self.$field %= rhs);+ }
}

This is on v0.18.0. The addition of default_fn! prevents #[inline] from working, which is causing the slowdown. This was fixed in #548 by moving the #[inline] into the macro.

@kvark
Copy link
Collaborator

kvark commented May 30, 2023

Great, thank you for investigation!

@kvark kvark closed this as completed May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants