Matrix3 multiplication is 4x slower in 0.18 #543

KeyboardDanni · 2022-03-10T00:29:03Z

AMD Ryzen 7 3700X
Windows 10 19044.1566
rustc 1.60.0-nightly (1e12aef3f 2022-02-13)

The other day I was testing sprite performance in my engine and noticed it was a lot lower than usual. I investigated and traced the regression to updating cgmath from 0.17 to 0.18.

With 0.17, my benchmark that does 100k translations, rotations, and scales on a Matrix3:

test bench_transform_matrix ... bench: 670,110 ns/iter (+/- 13,076)

With 0.18:

test bench_transform_matrix ... bench: 2,755,590 ns/iter (+/- 10,904)

The bench itself
Code for the transform wrapper

Since the transform happens for each sprite, the performance difference adds up quickly.

Using default features with -O2/-O3. Compiler target is x86_64-pc-windows-msvc.
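For readers who cannot open the linked bench, here is a minimal, dependency-free sketch of the kind of workload described above (repeatedly composing a translation, a rotation, and a scale into a 3x3 matrix). It is not the benchmark from this report: `Mat3`, `run_transforms`, and `time_transforms` are hypothetical names, and a hand-rolled matrix stands in for cgmath's Matrix3, so absolute timings will not match the numbers quoted here.

```rust
use std::hint::black_box;
use std::ops::Mul;
use std::time::{Duration, Instant};

/// Hypothetical row-major 3x3 matrix, standing in for cgmath's Matrix3.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Mat3([[f32; 3]; 3]);

impl Mat3 {
    fn identity() -> Self {
        Mat3([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    }
    fn translation(x: f32, y: f32) -> Self {
        Mat3([[1.0, 0.0, x], [0.0, 1.0, y], [0.0, 0.0, 1.0]])
    }
    fn rotation(radians: f32) -> Self {
        let (s, c) = radians.sin_cos();
        Mat3([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    }
    fn scale(sx: f32, sy: f32) -> Self {
        Mat3([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])
    }
}

impl Mul for Mat3 {
    type Output = Mat3;

    // The hot path: if this call is not inlined, every sprite pays for it.
    #[inline]
    fn mul(self, rhs: Mat3) -> Mat3 {
        let mut out = [[0.0f32; 3]; 3];
        for r in 0..3 {
            for c in 0..3 {
                for k in 0..3 {
                    out[r][c] += self.0[r][k] * rhs.0[k][c];
                }
            }
        }
        Mat3(out)
    }
}

/// Builds `iterations` translate * rotate * scale transforms and returns the last one.
fn run_transforms(iterations: u32) -> Mat3 {
    let mut m = Mat3::identity();
    for i in 0..iterations {
        let t = i as f32 * 1e-5;
        m = Mat3::translation(t, -t) * Mat3::rotation(t) * Mat3::scale(1.0, 1.0);
        // Keep the optimizer from discarding the work.
        m = black_box(m);
    }
    m
}

/// Wall-clock time for one run; a real benchmark harness would repeat and average.
fn time_transforms(iterations: u32) -> Duration {
    let start = Instant::now();
    black_box(run_transforms(iterations));
    start.elapsed()
}
```

Calling `time_transforms(100_000)` in a release build gives a rough per-run figure that can be compared across library versions, although a proper harness such as `cargo bench` will give steadier numbers.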


kvark · 2022-03-11T05:36:01Z

Wow that's quite concerning! Thank you for bringing this up.

peppidesu · 2023-04-19T16:56:36Z

Comparing the benchmarks provided by the crate shows a similar (roughly 3x) slowdown:

cargo bench --features rand _bench_matrix3_mul_m   (0.18.0)
>> test _bench_matrix3_mul_m       ... bench:          13 ns/iter (+/- 0)
cargo bench _bench_matrix3_mul_m                   (0.17.0)
>> test _bench_matrix3_mul_m       ... bench:          4 ns/iter (+/- 0)

There was no significant increase for any of the other Matrix3 benchmarks.

peppidesu · 2023-04-19T17:42:26Z

It seems the problem is already fixed on the master branch, getting 4ns results there.

peppidesu · 2023-04-19T17:47:40Z

Found the issue:

cgmath/src/vector.rs

Lines 311 to 337 in 637c566

    
    impl<S: BaseNum> ElementWise for $VectorN<S> {
        #[inline] default_fn!( add_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field + rhs.$field),+) } );
        #[inline] default_fn!( sub_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field - rhs.$field),+) } );
        #[inline] default_fn!( mul_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field * rhs.$field),+) } );
        #[inline] default_fn!( div_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field / rhs.$field),+) } );
        #[inline] fn rem_element_wise(self, rhs: $VectorN<S>) -> $VectorN<S> { $VectorN::new($(self.$field % rhs.$field),+) }
        #[inline] default_fn!( add_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field += rhs.$field);+ } );
        #[inline] default_fn!( sub_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field -= rhs.$field);+ } );
        #[inline] default_fn!( mul_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field *= rhs.$field);+ } );
        #[inline] default_fn!( div_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field /= rhs.$field);+ } );
        #[inline] fn rem_assign_element_wise(&mut self, rhs: $VectorN<S>) { $(self.$field %= rhs.$field);+ }
    }
    impl<S: BaseNum> ElementWise<S> for $VectorN<S> {
        #[inline] default_fn!( add_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field + rhs),+) } );
        #[inline] default_fn!( sub_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field - rhs),+) } );
        #[inline] default_fn!( mul_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field * rhs),+) } );
        #[inline] default_fn!( div_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field / rhs),+) } );
        #[inline] fn rem_element_wise(self, rhs: S) -> $VectorN<S> { $VectorN::new($(self.$field % rhs),+) }
        #[inline] default_fn!( add_assign_element_wise(&mut self, rhs: S) { $(self.$field += rhs);+ } );
        #[inline] default_fn!( sub_assign_element_wise(&mut self, rhs: S) { $(self.$field -= rhs);+ } );
        #[inline] default_fn!( mul_assign_element_wise(&mut self, rhs: S) { $(self.$field *= rhs);+ } );
        #[inline] default_fn!( div_assign_element_wise(&mut self, rhs: S) { $(self.$field /= rhs);+ } );
        #[inline] fn rem_assign_element_wise(&mut self, rhs: S) { $(self.$field %= rhs);+ }
    }

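This is on v0.18.0. Wrapping the methods in default_fn! prevents #[inline] from taking effect, because the attribute now sits on the macro invocation rather than on the function the macro generates, and that is what causes the slowdown. This was fixed in #548 by moving the #[inline] into the macro.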
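To illustrate the shape of that fix, here is a minimal sketch in which the macro itself emits `#[inline]` in front of the generated `fn`. This is a simplified, hypothetical stand-in: the actual `default_fn!` in cgmath and the exact change in #548 may differ, and the concrete `Vector3` struct and cut-down `ElementWise` trait below exist only to make the example compile on its own.

```rust
// Simplified stand-in for cgmath's `default_fn!` (assumption: the real macro differs).
// The attribute is part of the expansion, so it lands on the generated function
// instead of on the macro invocation.
macro_rules! default_fn {
    { $($tt:tt)* } => { #[inline] fn $($tt)* };
}

// Hypothetical concrete vector type; cgmath generates these via `$VectorN<S>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector3 {
    x: f32,
    y: f32,
    z: f32,
}

// Cut-down version of the trait, with only two of the methods.
trait ElementWise {
    fn add_element_wise(self, rhs: Self) -> Self;
    fn mul_element_wise(self, rhs: Self) -> Self;
}

impl ElementWise for Vector3 {
    // No `#[inline]` at the call site: the macro adds it to the expanded `fn`.
    default_fn!( add_element_wise(self, rhs: Vector3) -> Vector3 {
        Vector3 { x: self.x + rhs.x, y: self.y + rhs.y, z: self.z + rhs.z }
    } );

    default_fn!( mul_element_wise(self, rhs: Vector3) -> Vector3 {
        Vector3 { x: self.x * rhs.x, y: self.y * rhs.y, z: self.z * rhs.z }
    } );
}
```

The methods behave exactly as hand-written ones would. The only difference from the v0.18.0 snippet is where the attribute is written, which is why the change is invisible to callers apart from the timing.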

kvark · 2023-05-30T16:05:27Z

Great, thank you for the investigation!

kvark added the bug label Mar 11, 2022

kvark closed this as completed May 30, 2023
