Optimize binary operations on bigints #26
Conversation
ret[i] = res2;
carry = overflow1 as u64 + overflow2 as u64;
if carry != 0 {
why do you need to check carry here?
why not just
let (res1, overflow1) = ($fn)(me[i], you[i]);
let (res2, overflow2) = ($fn)(res1, carry);
<ptr::write...>
let carry = overflow1 | overflow2
since overflow can happen only once?
Honestly, because the benchmarks told me to. I'll check them again to make certain it's a win though.
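For reference, here is a minimal standalone sketch of the two carry-update styles under discussion, written against plain u64 limbs rather than the PR's actual macro-generated code; the function names are made up for this illustration only:

```rust
// Sketch only: contrasts the PR's carry update (summing the overflow flags)
// with the reviewer's suggestion (OR-ing them). Not the crate's real code.

// Variant from the PR: accumulate the two overflow flags into the carry.
fn add_carry_sum(me: &[u64; 4], you: &[u64; 4]) -> ([u64; 4], bool) {
    let mut ret = [0u64; 4];
    let mut carry = 0u64;
    for i in 0..4 {
        let (res1, overflow1) = me[i].overflowing_add(you[i]);
        let (res2, overflow2) = res1.overflowing_add(carry);
        ret[i] = res2;
        carry = overflow1 as u64 + overflow2 as u64;
    }
    (ret, carry != 0)
}

// Variant suggested in the review: when the incoming carry is 0 or 1, at most
// one of the two additions per limb can overflow, so OR-ing the flags is
// equivalent.
fn add_carry_or(me: &[u64; 4], you: &[u64; 4]) -> ([u64; 4], bool) {
    let mut ret = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (res1, overflow1) = me[i].overflowing_add(you[i]);
        let (res2, overflow2) = res1.overflowing_add(carry as u64);
        ret[i] = res2;
        carry = overflow1 | overflow2;
    }
    (ret, carry)
}

fn main() {
    let a = [!0u64; 4];
    let b = [1u64, 0, 0, 0];
    assert_eq!(add_carry_sum(&a, &b), add_carry_or(&a, &b));
}
```

Both variants produce the same result; the question the thread raises is purely which form the optimizer handles better. The `if carry != 0` branch from the diff isn't reproduced here because its body isn't shown in this hunk.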
rustc-hex = { version = "1.0", optional = true }
heapsize = { version = "0.4", optional = true }
byteorder = { version = "1", default-features = false }
crunchy = "0.1.5"
The crunchy dependency you introduced must have a non-std variant.
Did you try to compile bigint with --no-default-features?
Gah, thanks. It doesn't even require a core dependency; it's 100% macro code.
EDIT: Looks like it doesn't matter; compiling with --no-default-features works even with the version of crunchy that doesn't include #![no_std]. I guess it's inferred(?)
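For context, this is a hypothetical sketch of the usual no_std feature wiring for a crate with a manifest like the one above, not the actual Cargo.toml from this PR: the std-only optional dependencies hang off a default `std` feature, so building with `--no-default-features` drops them while byteorder and crunchy stay in both builds.

```toml
# Hypothetical feature wiring, for illustration only -- not the crate's
# actual manifest. `cargo build --no-default-features` disables `std` and
# with it the std-only optional dependencies.
[features]
default = ["std"]
std = ["rustc-hex", "heapsize", "byteorder/std"]

[dependencies]
rustc-hex = { version = "1.0", optional = true }
heapsize = { version = "0.4", optional = true }
byteorder = { version = "1", default-features = false }
crunchy = "0.1.5"
```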
Ok, leave it as it is.
I will update it myself if something goes wrong in no-std binaries.
This is one of the first times that I've seen non-marginal wins from loop unrolling and inlining, each of them doubling the speed in this case. This (as well as significant algorithmic improvements) now causes the Rust versions of all operations to be significantly faster than the inline-asm ones. Yes, even the asm implementations of U256::add and U256::sub, which were 4 instructions each. I have no good answer as to why that is; I'll check out the disassembly as part of a write-up/investigation. My guess is vectorization, although in terms of explaining optimisations that's about one level above "a wizard did it".

Benchmarks (using the updated benchmarks included in this PR)
Before (Rust):
Before (asm):
After (Rust):
Benchcmp results (Before (Rust) vs After (Rust)):
Benchcmp results (Before (asm) vs After (Rust)):
I should note that the u256 benchmarks are the ones to look at, since that's what we actually use in Parity.
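To make the unrolling/inlining idea concrete, here is a simplified sketch of the technique described above, not the PR's actual macro code: a fully unrolled, always-inlined 4x64-bit limb add in plain Rust, with names chosen purely for illustration.

```rust
// Sketch only: manual unrolling via a local macro (in the spirit of crunchy's
// unroll! macro), so the optimizer sees straight-line code with no loop.
#[inline(always)]
fn add_unrolled(me: &[u64; 4], you: &[u64; 4]) -> ([u64; 4], bool) {
    let mut ret = [0u64; 4];
    let mut carry = 0u64;
    // One copy of the limb addition per index; no loop counter or branch
    // for the compiler to reason about.
    macro_rules! limb {
        ($i:expr) => {{
            let (r1, o1) = me[$i].overflowing_add(you[$i]);
            let (r2, o2) = r1.overflowing_add(carry);
            ret[$i] = r2;
            carry = o1 as u64 + o2 as u64;
        }};
    }
    limb!(0);
    limb!(1);
    limb!(2);
    limb!(3);
    (ret, carry != 0)
}

fn main() {
    // 0xFFFF...FFFF + 1 in the low limb carries into the second limb.
    let (sum, overflow) = add_unrolled(&[!0u64, 0, 0, 0], &[1, 0, 0, 0]);
    assert_eq!(sum, [0, 1, 0, 0]);
    assert!(!overflow);
}
```

The real crate drives this kind of expansion through macros rather than writing each limb out by hand, but the generated shape is similar: a short run of overflowing adds that the backend is free to schedule or vectorize.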