Use u128 for multiplication, remove asm #38
I haven't attempted to use
src/uint.rs
if x.overflowing_pow($uint_ty::from(2)).1 || x.overflowing_pow($uint_ty::from(3)).1 {
    return TestResult::discard();
}
let (p2, o) = x.overflowing_pow($uint_ty::from(2));
Some strange whitespace here.
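The reviewed test first discards inputs whose square or cube overflows, then checks the `overflowing_pow` results. Below is a minimal sketch of that guard-then-check pattern. It assumes `u64` in place of the macro's `$uint_ty` and a fixed list of inputs in place of quickcheck's random generation. The `Outcome` enum and `pow_matches_mul` function are hypothetical names, not part of the crate.

```rust
/// Result of one property check, mirroring quickcheck's
/// pass / fail / discard distinction (hypothetical stand-in for TestResult).
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Failed,
    Discarded,
}

/// Checks that x^2 == x * x and x^3 == x^2 * x, discarding inputs
/// where x^2 or x^3 would overflow (u64 stands in for `$uint_ty`).
fn pow_matches_mul(x: u64) -> Outcome {
    // Same guard as the reviewed snippet: skip inputs whose powers overflow.
    if x.overflowing_pow(2).1 || x.overflowing_pow(3).1 {
        return Outcome::Discarded;
    }
    let (p2, o2) = x.overflowing_pow(2);
    let (p3, o3) = x.overflowing_pow(3);
    // The guard above means neither overflow flag should be set here.
    if o2 || o3 {
        return Outcome::Failed;
    }
    // No overflow is possible in these multiplications because x^3 fits.
    if p2 == x * x && p3 == p2 * x {
        Outcome::Passed
    } else {
        Outcome::Failed
    }
}

fn main() {
    // 1 << 21 cubed is 2^63 (fits); 1 << 22 cubed is 2^66 (overflows).
    let inputs = [0u64, 1, 3, 1 << 20, 1 << 21, 1 << 22, u64::MAX];
    for &x in &inputs {
        println!("{} -> {:?}", x, pow_matches_mul(x));
    }
}
```

Discarding rather than failing keeps overflowing inputs from being reported as bugs while still exercising every input whose powers fit.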
Amazing!
Here's the generated asm for
That's more like it
Benchcmp results before removing the zero check vs after (notice that small numbers are slower, but if you know you're going to multiply by a small number then you might as well use
I think that the speed increase over the inline assembly is probably due to the loads being spread out, which makes them easier to parallelise. Both the Rust inline ASM and the C inline ASM do all their loads upfront. It looks like LLVM's emitting some dark magic with … Compilers are really good.
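Below is a hypothetical sketch of what a zero check in a schoolbook limb multiplication can look like. It is not the crate's code. When a limb of the left operand is zero, its whole row of partial products is zero, so the inner loop can be skipped. The 4-limb little-endian layout, the `mul_limbs` name, and the `skip_zero` flag are assumptions for illustration. Small values have zero high limbs, so a check like this plausibly explains why small numbers got slower once it was removed.

```rust
/// Number of 64-bit limbs, as in a 256-bit integer (assumption).
const LIMBS: usize = 4;

/// Truncating (mod 2^256) schoolbook multiplication of little-endian u64 limbs.
/// `skip_zero` toggles the zero-limb short-circuit (hypothetical flag).
fn mul_limbs(a: &[u64; LIMBS], b: &[u64; LIMBS], skip_zero: bool) -> [u64; LIMBS] {
    let mut result = [0u64; LIMBS];
    for i in 0..LIMBS {
        // Zero check: a zero limb contributes nothing, so skip its whole row.
        if skip_zero && a[i] == 0 {
            continue;
        }
        let mut carry: u64 = 0;
        // Only columns i + j < LIMBS are kept; higher ones are truncated.
        for j in 0..LIMBS - i {
            // Multiply-accumulate in u128 cannot overflow:
            // (2^64 - 1)^2 + 2 * (2^64 - 1) = 2^128 - 1.
            let t = (a[i] as u128) * (b[j] as u128)
                + result[i + j] as u128
                + carry as u128;
            result[i + j] = t as u64;
            carry = (t >> 64) as u64;
        }
        // The carry out of the top limb is dropped (truncating multiply).
    }
    result
}

fn main() {
    let small = [5u64, 0, 0, 0];
    let big = [u64::MAX, 1, 0, 0];
    // Both variants compute the same product; only the amount of work differs.
    assert_eq!(mul_limbs(&small, &big, true), mul_limbs(&small, &big, false));
    println!("{:?}", mul_limbs(&small, &big, true));
}
```

The trade-off is one extra branch per row: it pays off when operands have many zero limbs and costs a little when they do not.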
This uses the (now stable!) `u128` type for multiplication. Additionally, the inline assembly is no longer the fastest option (see the benchcmp results below), so I removed it completely.

Benchcmp results (existing u64-based impl for stable vs u128):
Benchcmp results (existing asm-based impl for nightly vs u128):
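The core change is computing the full 128-bit product of two 64-bit limbs with `u128` instead of inline assembly or u64-only arithmetic. Below is a minimal sketch contrasting the two approaches on stable Rust. `mul_u128` and `mul_u64_halves` are hypothetical names. The 32-bit-halves routine is a generic portable technique, not necessarily the crate's previous u64-based implementation.

```rust
/// Full 64x64 -> 128-bit product via the native u128 type.
/// Returns (high, low) 64-bit halves.
fn mul_u128(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    ((p >> 64) as u64, p as u64)
}

/// The same product using only u64 arithmetic, by splitting each
/// operand into 32-bit halves (portable fallback technique).
fn mul_u64_halves(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four 32x32 -> 64-bit partial products; none can overflow u64.
    let lo_lo = a_lo * b_lo;
    let lo_hi = a_lo * b_hi;
    let hi_lo = a_hi * b_lo;
    let hi_hi = a_hi * b_hi;

    // Middle column: the sum is at most 2^64 - 1, so it fits in u64.
    let cross = (lo_lo >> 32) + (lo_hi & MASK) + hi_lo;

    // High word gathers everything at or above bit 64.
    let high = hi_hi + (lo_hi >> 32) + (cross >> 32);
    // Low word: low 32 bits of `cross` on top, low 32 bits of lo_lo below.
    let low = (cross << 32) | (lo_lo & MASK);
    (high, low)
}

fn main() {
    let samples = [
        (0u64, 0u64),
        (1, u64::MAX),
        (u64::MAX, u64::MAX),
        (0x1234_5678_9abc_def0, 0x0fed_cba9_8765_4321),
    ];
    for &(a, b) in &samples {
        assert_eq!(mul_u128(a, b), mul_u64_halves(a, b));
    }
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1, i.e. high = 2^64 - 2, low = 1.
    println!("{:?}", mul_u128(u64::MAX, u64::MAX));
}
```

With `u128` the compiler sees the widening multiply directly. On x86-64 it can typically lower this to a single `mul` instruction, while the four-product version spells out the work by hand.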