Use u128 for multiplication, remove asm #38

Merged
merged 3 commits into paritytech:master from Vurich:master on May 11, 2018

Vurich commented May 11, 2018

This uses the (now stable!) u128 type for multiplication. Additionally, the inline assembly is no longer the fastest option (see benchcmp results below) so I removed it completely.
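
As a rough illustration of the idea (not the code from this PR), the core step of a u128-based multiply is a widening 64x64 multiply-with-carry. The name `mul_carry` below is hypothetical and chosen only for this sketch:

```rust
/// Computes a * b + carry in 128-bit arithmetic and splits the result
/// into (low, high) 64-bit halves.
/// The sum cannot overflow: (2^64 - 1)^2 + (2^64 - 1) = 2^128 - 2^64.
fn mul_carry(a: u64, b: u64, carry: u64) -> (u64, u64) {
    // Widen both operands before multiplying so no bits are lost.
    let wide = (a as u128) * (b as u128) + (carry as u128);
    // Truncation keeps the low 64 bits; the shift extracts the high 64 bits.
    (wide as u64, (wide >> 64) as u64)
}
```

On x86-64 a compiler can typically lower the widening multiply to a single `mulq`, which is presumably why hand-written assembly is not needed for this step.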

Benchcmp results (existing u64-based impl for stable vs u128):

 name            u64.bench ns/iter  u128.bench ns/iter  diff ns/iter   diff %  speedup 
 u128_mul        63,974             22,677                   -41,297  -64.55%   x 2.82 
 u256_full_mul   243,159            73,416                  -169,743  -69.81%   x 3.31 
 u256_mul        268,750            85,797                  -182,953  -68.08%   x 3.13 
 u256_mul_small  1,608              558                       -1,050  -65.30%   x 2.88  
 u512_mul        1,103,888          365,976                 -737,912  -66.85%   x 3.02 
 u512_mul_small  10,459             3,942                     -6,517  -62.31%   x 2.65 

Benchcmp results (existing asm-based impl for nightly vs u128):

 name            inline_asm.bench ns/iter  u128.bench ns/iter  diff ns/iter   diff %  speedup 
 u256_mul        95,843                    85,797                   -10,046  -10.48%   x 1.12 
 u256_mul_small  789                       558                         -231  -29.28%   x 1.41 

Vurich requested a review from NikVolf May 11, 2018

Vurich commented May 11, 2018

I haven't attempted to use u128 for addition or subtraction (or to store the numbers using u128), since I knew that the biggest win would be multiplication and I wanted to get that pushed first.
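
For illustration only, carrying an addition through u128 might look roughly like the sketch below. The function name `add_via_u128` and the little-endian `[u64; 4]` limb layout are assumptions made for this example, not code from this PR, and nothing here has been benchmarked against the existing addition.

```rust
/// Adds two 256-bit values stored as little-endian u64 limbs, using a u128
/// accumulator to hold each limb sum plus the incoming carry.
/// Returns the wrapped sum and whether the addition overflowed 256 bits.
fn add_via_u128(a: [u64; 4], b: [u64; 4]) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = 0u128;
    for i in 0..4 {
        // Each limb sum is at most 2 * (2^64 - 1) + 1, which fits in a u128.
        let sum = (a[i] as u128) + (b[i] as u128) + carry;
        out[i] = sum as u64;
        carry = sum >> 64;
    }
    (out, carry != 0)
}
```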

debris approved these changes May 11, 2018

src/uint.rs Outdated
if x.overflowing_pow($uint_ty::from(2)).1 || x.overflowing_pow($uint_ty::from(3)).1 {
return TestResult::discard();
}
let (p2, o) = x.overflowing_pow($uint_ty::from(2));

NikVolf (Member) commented May 11, 2018

There's some strange whitespace here.

NikVolf left a comment

Amazing!

Vurich commented May 11, 2018

Here's the generated asm for U256::full_mul. It looks like there's still room for improvement:

(lldb) dis -n disassemble
libbigint.so`disassemble:
libbigint.so[0x58750] <+0>:   pushq  %rbp
libbigint.so[0x58751] <+1>:   movq   %rsp, %rbp
libbigint.so[0x58754] <+4>:   pushq  %r15
libbigint.so[0x58756] <+6>:   pushq  %r14
libbigint.so[0x58758] <+8>:   pushq  %r13
libbigint.so[0x5875a] <+10>:  pushq  %r12
libbigint.so[0x5875c] <+12>:  pushq  %rbx
libbigint.so[0x5875d] <+13>:  subq   $0x18, %rsp
libbigint.so[0x58761] <+17>:  movq   0x10(%rbp), %r8
libbigint.so[0x58765] <+21>:  movq   0x18(%rbp), %r14
libbigint.so[0x58769] <+25>:  movq   0x20(%rbp), %r13
libbigint.so[0x5876d] <+29>:  movq   0x28(%rbp), %r15
libbigint.so[0x58771] <+33>:  movq   0x30(%rbp), %rsi
libbigint.so[0x58775] <+37>:  movq   0x38(%rbp), %r12
libbigint.so[0x58779] <+41>:  movq   %rsi, %rax
libbigint.so[0x5877c] <+44>:  mulq   %r8
libbigint.so[0x5877f] <+47>:  movq   %rdx, %rcx
libbigint.so[0x58782] <+50>:  movq   0x40(%rbp), %rdx
libbigint.so[0x58786] <+54>:  movq   %rdx, -0x30(%rbp)
libbigint.so[0x5878a] <+58>:  testq  %r8, %r8
libbigint.so[0x5878d] <+61>:  cmoveq %r8, %rcx
libbigint.so[0x58791] <+65>:  movq   0x48(%rbp), %rbx
libbigint.so[0x58795] <+69>:  cmoveq %r8, %rax
libbigint.so[0x58799] <+73>:  movq   %rax, -0x38(%rbp)
libbigint.so[0x5879d] <+77>:  testq  %r14, %r14
libbigint.so[0x587a0] <+80>:  je     0x5895d                   ; <+525> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 456 at lib.rs:34
libbigint.so[0x587a6] <+86>:  movq   %rsi, %rax
libbigint.so[0x587a9] <+89>:  mulq   %r14
libbigint.so[0x587ac] <+92>:  movq   %rdx, %r11
libbigint.so[0x587af] <+95>:  addq   %rax, %rcx
libbigint.so[0x587b2] <+98>:  adcq   $0x0, %r11
libbigint.so[0x587b6] <+102>: testq  %r13, %r13
libbigint.so[0x587b9] <+105>: je     0x58969                   ; <+537> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 468 at lib.rs:34
libbigint.so[0x587bf] <+111>: movq   %rsi, %rax
libbigint.so[0x587c2] <+114>: mulq   %r13
libbigint.so[0x587c5] <+117>: movq   %rdx, %r10
libbigint.so[0x587c8] <+120>: addq   %rax, %r11
libbigint.so[0x587cb] <+123>: adcq   $0x0, %r10
libbigint.so[0x587cf] <+127>: testq  %r15, %r15
libbigint.so[0x587d2] <+130>: je     0x58975                   ; <+549> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 480 at lib.rs:34
libbigint.so[0x587d8] <+136>: movq   %rsi, %rax
libbigint.so[0x587db] <+139>: mulq   %r15
libbigint.so[0x587de] <+142>: movq   %rdx, %rsi
libbigint.so[0x587e1] <+145>: addq   %rax, %r10
libbigint.so[0x587e4] <+148>: adcq   $0x0, %rsi
libbigint.so[0x587e8] <+152>: testq  %r8, %r8
libbigint.so[0x587eb] <+155>: je     0x58980                   ; <+560> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 491 at lib.rs:34
libbigint.so[0x587f1] <+161>: movq   %r12, %rax
libbigint.so[0x587f4] <+164>: mulq   %r8
libbigint.so[0x587f7] <+167>: addq   %rax, %rcx
libbigint.so[0x587fa] <+170>: adcq   %rdx, %r11
libbigint.so[0x587fd] <+173>: setb   %al
libbigint.so[0x58800] <+176>: movzbl %al, %r9d
libbigint.so[0x58804] <+180>: movq   %rcx, -0x40(%rbp)
libbigint.so[0x58808] <+184>: movq   %r9, %rax
libbigint.so[0x5880b] <+187>: orq    %r14, %rax
libbigint.so[0x5880e] <+190>: je     0x5882b                   ; <+219> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 150 at lib.rs:34
libbigint.so[0x58810] <+192>: movq   %r12, %rax
libbigint.so[0x58813] <+195>: mulq   %r14
libbigint.so[0x58816] <+198>: addq   %rax, %r11
libbigint.so[0x58819] <+201>: adcq   %rdx, %r9
libbigint.so[0x5881c] <+204>: setb   %al
libbigint.so[0x5881f] <+207>: addq   %r9, %r10
libbigint.so[0x58822] <+210>: setb   %cl
libbigint.so[0x58825] <+213>: orb    %al, %cl
libbigint.so[0x58827] <+215>: movzbl %cl, %r9d
libbigint.so[0x5882b] <+219>: movq   %r9, %rax
libbigint.so[0x5882e] <+222>: orq    %r13, %rax
libbigint.so[0x58831] <+225>: je     0x5884e                   ; <+254> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 185 at lib.rs:34
libbigint.so[0x58833] <+227>: movq   %r12, %rax
libbigint.so[0x58836] <+230>: mulq   %r13
libbigint.so[0x58839] <+233>: addq   %rax, %r10
libbigint.so[0x5883c] <+236>: adcq   %rdx, %r9
libbigint.so[0x5883f] <+239>: setb   %al
libbigint.so[0x58842] <+242>: addq   %r9, %rsi
libbigint.so[0x58845] <+245>: setb   %cl
libbigint.so[0x58848] <+248>: orb    %al, %cl
libbigint.so[0x5884a] <+250>: movzbl %cl, %r9d
libbigint.so[0x5884e] <+254>: movq   %r9, %rax
libbigint.so[0x58851] <+257>: orq    %r15, %rax
libbigint.so[0x58854] <+260>: je     0x58998                   ; <+584> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 515 at lib.rs:34
libbigint.so[0x5885a] <+266>: movq   %r12, %rax
libbigint.so[0x5885d] <+269>: mulq   %r15
libbigint.so[0x58860] <+272>: addq   %rax, %rsi
libbigint.so[0x58863] <+275>: adcq   %rdx, %r9
libbigint.so[0x58866] <+278>: movq   %rbx, %r12
libbigint.so[0x58869] <+281>: testq  %r8, %r8
libbigint.so[0x5886c] <+284>: je     0x589a7                   ; <+599> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 530 at lib.rs:34
libbigint.so[0x58872] <+290>: movq   -0x30(%rbp), %rax
libbigint.so[0x58876] <+294>: mulq   %r8
libbigint.so[0x58879] <+297>: addq   %rax, %r11
libbigint.so[0x5887c] <+300>: adcq   %rdx, %r10
libbigint.so[0x5887f] <+303>: setb   %al
libbigint.so[0x58882] <+306>: movzbl %al, %ebx
libbigint.so[0x58885] <+309>: movq   %rbx, %rax
libbigint.so[0x58888] <+312>: orq    %r14, %rax
libbigint.so[0x5888b] <+315>: je     0x588a8                   ; <+344> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 275 at lib.rs:34
libbigint.so[0x5888d] <+317>: movq   -0x30(%rbp), %rax
libbigint.so[0x58891] <+321>: mulq   %r14
libbigint.so[0x58894] <+324>: addq   %rax, %r10
libbigint.so[0x58897] <+327>: adcq   %rdx, %rbx
libbigint.so[0x5889a] <+330>: setb   %al
libbigint.so[0x5889d] <+333>: addq   %rbx, %rsi
libbigint.so[0x588a0] <+336>: setb   %cl
libbigint.so[0x588a3] <+339>: orb    %al, %cl
libbigint.so[0x588a5] <+341>: movzbl %cl, %ebx
libbigint.so[0x588a8] <+344>: movq   %rbx, %rax
libbigint.so[0x588ab] <+347>: orq    %r13, %rax
libbigint.so[0x588ae] <+350>: je     0x588cb                   ; <+379> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 310 at lib.rs:34
libbigint.so[0x588b0] <+352>: movq   -0x30(%rbp), %rax
libbigint.so[0x588b4] <+356>: mulq   %r13
libbigint.so[0x588b7] <+359>: addq   %rax, %rsi
libbigint.so[0x588ba] <+362>: adcq   %rdx, %rbx
libbigint.so[0x588bd] <+365>: setb   %al
libbigint.so[0x588c0] <+368>: addq   %rbx, %r9
libbigint.so[0x588c3] <+371>: setb   %cl
libbigint.so[0x588c6] <+374>: orb    %al, %cl
libbigint.so[0x588c8] <+376>: movzbl %cl, %ebx
libbigint.so[0x588cb] <+379>: movq   %rbx, %rax
libbigint.so[0x588ce] <+382>: orq    %r15, %rax
libbigint.so[0x588d1] <+385>: je     0x589ba                   ; <+618> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 549 at lib.rs:34
libbigint.so[0x588d7] <+391>: movq   -0x30(%rbp), %rax
libbigint.so[0x588db] <+395>: mulq   %r15
libbigint.so[0x588de] <+398>: addq   %rax, %r9
libbigint.so[0x588e1] <+401>: adcq   %rdx, %rbx
libbigint.so[0x588e4] <+404>: testq  %r8, %r8
libbigint.so[0x588e7] <+407>: je     0x589c5                   ; <+629> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 560 at lib.rs:34
libbigint.so[0x588ed] <+413>: movq   %r12, %rax
libbigint.so[0x588f0] <+416>: mulq   %r8
libbigint.so[0x588f3] <+419>: addq   %rax, %r10
libbigint.so[0x588f6] <+422>: adcq   %rdx, %rsi
libbigint.so[0x588f9] <+425>: setb   %al
libbigint.so[0x588fc] <+428>: movzbl %al, %ecx
libbigint.so[0x588ff] <+431>: movq   %rcx, %rax
libbigint.so[0x58902] <+434>: orq    %r14, %rax
libbigint.so[0x58905] <+437>: je     0x58921                   ; <+465> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 396 at lib.rs:34
libbigint.so[0x58907] <+439>: movq   %r12, %rax
libbigint.so[0x5890a] <+442>: mulq   %r14
libbigint.so[0x5890d] <+445>: addq   %rax, %rsi
libbigint.so[0x58910] <+448>: adcq   %rdx, %rcx
libbigint.so[0x58913] <+451>: setb   %al
libbigint.so[0x58916] <+454>: addq   %rcx, %r9
libbigint.so[0x58919] <+457>: setb   %cl
libbigint.so[0x5891c] <+460>: orb    %al, %cl
libbigint.so[0x5891e] <+462>: movzbl %cl, %ecx
libbigint.so[0x58921] <+465>: movq   %rcx, %rax
libbigint.so[0x58924] <+468>: orq    %r13, %rax
libbigint.so[0x58927] <+471>: je     0x58943                   ; <+499> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 430 at lib.rs:34
libbigint.so[0x58929] <+473>: movq   %r12, %rax
libbigint.so[0x5892c] <+476>: mulq   %r13
libbigint.so[0x5892f] <+479>: addq   %rax, %r9
libbigint.so[0x58932] <+482>: adcq   %rdx, %rcx
libbigint.so[0x58935] <+485>: setb   %al
libbigint.so[0x58938] <+488>: addq   %rcx, %rbx
libbigint.so[0x5893b] <+491>: setb   %cl
libbigint.so[0x5893e] <+494>: orb    %al, %cl
libbigint.so[0x58940] <+496>: movzbl %cl, %ecx
libbigint.so[0x58943] <+499>: movq   %rcx, %rax
libbigint.so[0x58946] <+502>: orq    %r15, %rax
libbigint.so[0x58949] <+505>: je     0x589d8                   ; <+648> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 579 at lib.rs:34
libbigint.so[0x5894f] <+511>: movq   %r12, %rax
libbigint.so[0x58952] <+514>: mulq   %r15
libbigint.so[0x58955] <+517>: addq   %rax, %rbx
libbigint.so[0x58958] <+520>: adcq   %rdx, %rcx
libbigint.so[0x5895b] <+523>: jmp    0x589da                   ; <+650> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 581 at lib.rs:34
libbigint.so[0x5895d] <+525>: xorl   %r11d, %r11d
libbigint.so[0x58960] <+528>: testq  %r13, %r13
libbigint.so[0x58963] <+531>: jne    0x587bf                   ; <+111> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 42 at lib.rs:34
libbigint.so[0x58969] <+537>: xorl   %r10d, %r10d
libbigint.so[0x5896c] <+540>: testq  %r15, %r15
libbigint.so[0x5896f] <+543>: jne    0x587d8                   ; <+136> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 67 at lib.rs:34
libbigint.so[0x58975] <+549>: xorl   %esi, %esi
libbigint.so[0x58977] <+551>: testq  %r8, %r8
libbigint.so[0x5897a] <+554>: jne    0x587f1                   ; <+161> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 92 at lib.rs:34
libbigint.so[0x58980] <+560>: xorl   %r9d, %r9d
libbigint.so[0x58983] <+563>: movq   %rcx, -0x40(%rbp)
libbigint.so[0x58987] <+567>: movq   %r9, %rax
libbigint.so[0x5898a] <+570>: orq    %r14, %rax
libbigint.so[0x5898d] <+573>: jne    0x58810                   ; <+192> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 123 at lib.rs:34
libbigint.so[0x58993] <+579>: jmp    0x5882b                   ; <+219> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 150 at lib.rs:34
libbigint.so[0x58998] <+584>: xorl   %r9d, %r9d
libbigint.so[0x5899b] <+587>: movq   %rbx, %r12
libbigint.so[0x5899e] <+590>: testq  %r8, %r8
libbigint.so[0x589a1] <+593>: jne    0x58872                   ; <+290> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 221 at lib.rs:34
libbigint.so[0x589a7] <+599>: xorl   %ebx, %ebx
libbigint.so[0x589a9] <+601>: movq   %rbx, %rax
libbigint.so[0x589ac] <+604>: orq    %r14, %rax
libbigint.so[0x589af] <+607>: jne    0x5888d                   ; <+317> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 248 at lib.rs:34
libbigint.so[0x589b5] <+613>: jmp    0x588a8                   ; <+344> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 275 at lib.rs:34
libbigint.so[0x589ba] <+618>: xorl   %ebx, %ebx
libbigint.so[0x589bc] <+620>: testq  %r8, %r8
libbigint.so[0x589bf] <+623>: jne    0x588ed                   ; <+413> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 344 at lib.rs:34
libbigint.so[0x589c5] <+629>: xorl   %ecx, %ecx
libbigint.so[0x589c7] <+631>: movq   %rcx, %rax
libbigint.so[0x589ca] <+634>: orq    %r14, %rax
libbigint.so[0x589cd] <+637>: jne    0x58907                   ; <+439> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 370 at lib.rs:34
libbigint.so[0x589d3] <+643>: jmp    0x58921                   ; <+465> [inlined] bigint::uint::U256::full_mul::hbc51c1594ac73c0f + 396 at lib.rs:34
libbigint.so[0x589d8] <+648>: xorl   %ecx, %ecx
libbigint.so[0x589da] <+650>: movq   -0x38(%rbp), %rax
libbigint.so[0x589de] <+654>: movq   %rax, (%rdi)
libbigint.so[0x589e1] <+657>: movq   -0x40(%rbp), %rax
libbigint.so[0x589e5] <+661>: movq   %rax, 0x8(%rdi)
libbigint.so[0x589e9] <+665>: movq   %r11, 0x10(%rdi)
libbigint.so[0x589ed] <+669>: movq   %r10, 0x18(%rdi)
libbigint.so[0x589f1] <+673>: movq   %rsi, 0x20(%rdi)
libbigint.so[0x589f5] <+677>: movq   %r9, 0x28(%rdi)
libbigint.so[0x589f9] <+681>: movq   %rbx, 0x30(%rdi)
libbigint.so[0x589fd] <+685>: movq   %rcx, 0x38(%rdi)
libbigint.so[0x58a01] <+689>: movq   %rdi, %rax
libbigint.so[0x58a04] <+692>: addq   $0x18, %rsp
libbigint.so[0x58a08] <+696>: popq   %rbx
libbigint.so[0x58a09] <+697>: popq   %r12
libbigint.so[0x58a0b] <+699>: popq   %r13
libbigint.so[0x58a0d] <+701>: popq   %r14
libbigint.so[0x58a0f] <+703>: popq   %r15
libbigint.so[0x58a11] <+705>: popq   %rbp
libbigint.so[0x58a12] <+706>: retq   

Vurich force-pushed the Vurich:master branch from 497009a to b7582db May 11, 2018

Vurich force-pushed the Vurich:master branch from b7582db to c7480f1 May 11, 2018

Vurich commented May 11, 2018

That's more like it

libbigint.so`disassemble:
libbigint.so[0x580e0] <+0>:   pushq  %rbp
libbigint.so[0x580e1] <+1>:   movq   %rsp, %rbp
libbigint.so[0x580e4] <+4>:   pushq  %r15
libbigint.so[0x580e6] <+6>:   pushq  %r14
libbigint.so[0x580e8] <+8>:   pushq  %r13
libbigint.so[0x580ea] <+10>:  pushq  %r12
libbigint.so[0x580ec] <+12>:  pushq  %rbx
libbigint.so[0x580ed] <+13>:  subq   $0x48, %rsp
libbigint.so[0x580f1] <+17>:  movq   0x10(%rbp), %r11
libbigint.so[0x580f5] <+21>:  movq   0x18(%rbp), %rsi
libbigint.so[0x580f9] <+25>:  movq   %rsi, -0x38(%rbp)
libbigint.so[0x580fd] <+29>:  movq   0x30(%rbp), %rcx
libbigint.so[0x58101] <+33>:  movq   %rcx, %rax
libbigint.so[0x58104] <+36>:  mulq   %r11
libbigint.so[0x58107] <+39>:  movq   %rdx, %r9
libbigint.so[0x5810a] <+42>:  movq   %rax, -0x70(%rbp)
libbigint.so[0x5810e] <+46>:  movq   %rcx, %rax
libbigint.so[0x58111] <+49>:  mulq   %rsi
libbigint.so[0x58114] <+52>:  movq   %rax, %rbx
libbigint.so[0x58117] <+55>:  movq   %rdx, %r8
libbigint.so[0x5811a] <+58>:  movq   0x20(%rbp), %rsi
libbigint.so[0x5811e] <+62>:  movq   %rcx, %rax
libbigint.so[0x58121] <+65>:  mulq   %rsi
libbigint.so[0x58124] <+68>:  movq   %rsi, %r13
libbigint.so[0x58127] <+71>:  movq   %r13, -0x40(%rbp)
libbigint.so[0x5812b] <+75>:  movq   %rax, %r10
libbigint.so[0x5812e] <+78>:  movq   %rdx, %r14
libbigint.so[0x58131] <+81>:  movq   0x28(%rbp), %rdx
libbigint.so[0x58135] <+85>:  movq   %rdx, -0x48(%rbp)
libbigint.so[0x58139] <+89>:  movq   %rcx, %rax
libbigint.so[0x5813c] <+92>:  mulq   %rdx
libbigint.so[0x5813f] <+95>:  movq   %rax, %r12
libbigint.so[0x58142] <+98>:  movq   %rdx, -0x58(%rbp)
libbigint.so[0x58146] <+102>: movq   0x38(%rbp), %r15
libbigint.so[0x5814a] <+106>: movq   %r15, %rax
libbigint.so[0x5814d] <+109>: mulq   %r11
libbigint.so[0x58150] <+112>: addq   %r9, %rbx
libbigint.so[0x58153] <+115>: adcq   %r8, %r10
libbigint.so[0x58156] <+118>: pushfq 
libbigint.so[0x58157] <+119>: popq   %rcx
libbigint.so[0x58158] <+120>: addq   %rax, %rbx
libbigint.so[0x5815b] <+123>: movq   %rbx, -0x68(%rbp)
libbigint.so[0x5815f] <+127>: adcq   %rdx, %r10
libbigint.so[0x58162] <+130>: pushfq 
libbigint.so[0x58163] <+131>: popq   %r8
libbigint.so[0x58165] <+133>: pushq  %rcx
libbigint.so[0x58166] <+134>: popfq  
libbigint.so[0x58167] <+135>: adcq   %r14, %r12
libbigint.so[0x5816a] <+138>: pushfq 
libbigint.so[0x5816b] <+139>: popq   %rax
libbigint.so[0x5816c] <+140>: movq   %rax, -0x50(%rbp)
libbigint.so[0x58170] <+144>: movq   %r15, %rax
libbigint.so[0x58173] <+147>: movq   -0x38(%rbp), %rsi
libbigint.so[0x58177] <+151>: mulq   %rsi
libbigint.so[0x5817a] <+154>: movq   %rdx, %rbx
libbigint.so[0x5817d] <+157>: movq   %rax, %r9
libbigint.so[0x58180] <+160>: addq   %r10, %r9
libbigint.so[0x58183] <+163>: adcq   $0x0, %rbx
libbigint.so[0x58187] <+167>: pushq  %r8
libbigint.so[0x58189] <+169>: popfq  
libbigint.so[0x5818a] <+170>: adcq   $0x0, %rbx
libbigint.so[0x5818e] <+174>: setb   -0x29(%rbp)
libbigint.so[0x58192] <+178>: addq   %r12, %rbx
libbigint.so[0x58195] <+181>: setb   %r10b
libbigint.so[0x58199] <+185>: movq   %r15, %rax
libbigint.so[0x5819c] <+188>: mulq   %r13
libbigint.so[0x5819f] <+191>: movq   %rax, %r12
libbigint.so[0x581a2] <+194>: movq   %rdx, %r8
libbigint.so[0x581a5] <+197>: movq   0x40(%rbp), %r14
libbigint.so[0x581a9] <+201>: movq   %r14, %rax
libbigint.so[0x581ac] <+204>: mulq   %r11
libbigint.so[0x581af] <+207>: movq   %rdx, %r13
libbigint.so[0x581b2] <+210>: movq   %rax, %rcx
libbigint.so[0x581b5] <+213>: movq   %r14, %rax
libbigint.so[0x581b8] <+216>: mulq   %rsi
libbigint.so[0x581bb] <+219>: movq   %rdx, %rsi
libbigint.so[0x581be] <+222>: addq   %r9, %rcx
libbigint.so[0x581c1] <+225>: movq   %rcx, -0x60(%rbp)
libbigint.so[0x581c5] <+229>: leaq   (%r12,%rbx), %rcx
libbigint.so[0x581c9] <+233>: adcq   %rcx, %r13
libbigint.so[0x581cc] <+236>: pushfq 
libbigint.so[0x581cd] <+237>: popq   %rcx
libbigint.so[0x581ce] <+238>: addq   %rax, %r13
libbigint.so[0x581d1] <+241>: adcq   $0x0, %rsi
libbigint.so[0x581d5] <+245>: pushq  %rcx
libbigint.so[0x581d6] <+246>: popfq  
libbigint.so[0x581d7] <+247>: adcq   $0x0, %rsi
libbigint.so[0x581db] <+251>: setb   -0x2a(%rbp)
libbigint.so[0x581df] <+255>: orb    -0x29(%rbp), %r10b
libbigint.so[0x581e3] <+259>: addq   %r12, %rbx
libbigint.so[0x581e6] <+262>: movzbl %r10b, %ebx
libbigint.so[0x581ea] <+266>: adcq   %r8, %rbx
libbigint.so[0x581ed] <+269>: setb   %al
libbigint.so[0x581f0] <+272>: movq   -0x50(%rbp), %rcx
libbigint.so[0x581f4] <+276>: pushq  %rcx
libbigint.so[0x581f5] <+277>: popfq  
libbigint.so[0x581f6] <+278>: adcq   -0x58(%rbp), %rbx
libbigint.so[0x581fa] <+282>: setb   %r8b
libbigint.so[0x581fe] <+286>: orb    %al, %r8b
libbigint.so[0x58201] <+289>: movq   %r15, %rax
libbigint.so[0x58204] <+292>: mulq   -0x48(%rbp)
libbigint.so[0x58208] <+296>: movq   %rdx, %r12
libbigint.so[0x5820b] <+299>: movq   %rax, %rcx
libbigint.so[0x5820e] <+302>: addq   %rbx, %rcx
libbigint.so[0x58211] <+305>: movzbl %r8b, %eax
libbigint.so[0x58215] <+309>: adcq   %rax, %r12
libbigint.so[0x58218] <+312>: addq   %rsi, %rcx
libbigint.so[0x5821b] <+315>: setb   %r10b
libbigint.so[0x5821f] <+319>: movq   %r14, %rax
libbigint.so[0x58222] <+322>: mulq   -0x40(%rbp)
libbigint.so[0x58226] <+326>: movq   %rax, %r8
libbigint.so[0x58229] <+329>: movq   %rdx, %rsi
libbigint.so[0x5822c] <+332>: movq   0x48(%rbp), %r15
libbigint.so[0x58230] <+336>: movq   %r15, %rax
libbigint.so[0x58233] <+339>: mulq   %r11
libbigint.so[0x58236] <+342>: movq   %rdx, %r9
libbigint.so[0x58239] <+345>: movq   %rax, %r11
libbigint.so[0x5823c] <+348>: movq   %r15, %rax
libbigint.so[0x5823f] <+351>: mulq   -0x38(%rbp)
libbigint.so[0x58243] <+355>: movq   %rdx, %rbx
libbigint.so[0x58246] <+358>: addq   %r13, %r11
libbigint.so[0x58249] <+361>: leaq   (%r8,%rcx), %rdx
libbigint.so[0x5824d] <+365>: adcq   %rdx, %r9
libbigint.so[0x58250] <+368>: pushfq 
libbigint.so[0x58251] <+369>: popq   %rdx
libbigint.so[0x58252] <+370>: addq   %rax, %r9
libbigint.so[0x58255] <+373>: adcq   $0x0, %rbx
libbigint.so[0x58259] <+377>: pushq  %rdx
libbigint.so[0x5825a] <+378>: popfq  
libbigint.so[0x5825b] <+379>: adcq   $0x0, %rbx
libbigint.so[0x5825f] <+383>: setb   %r13b
libbigint.so[0x58263] <+387>: orb    -0x2a(%rbp), %r10b
libbigint.so[0x58267] <+391>: addq   %r8, %rcx
libbigint.so[0x5826a] <+394>: movzbl %r10b, %ecx
libbigint.so[0x5826e] <+398>: adcq   %rsi, %rcx
libbigint.so[0x58271] <+401>: setb   %al
libbigint.so[0x58274] <+404>: addq   %r12, %rcx
libbigint.so[0x58277] <+407>: setb   %r8b
libbigint.so[0x5827b] <+411>: orb    %al, %r8b
libbigint.so[0x5827e] <+414>: movq   %r14, %rax
libbigint.so[0x58281] <+417>: movq   -0x48(%rbp), %r14
libbigint.so[0x58285] <+421>: mulq   %r14
libbigint.so[0x58288] <+424>: movq   %rdx, %r10
libbigint.so[0x5828b] <+427>: movq   %rax, %rsi
libbigint.so[0x5828e] <+430>: addq   %rcx, %rsi
libbigint.so[0x58291] <+433>: movzbl %r8b, %eax
libbigint.so[0x58295] <+437>: adcq   %rax, %r10
libbigint.so[0x58298] <+440>: addq   %rbx, %rsi
libbigint.so[0x5829b] <+443>: setb   %cl
libbigint.so[0x5829e] <+446>: orb    %r13b, %cl
libbigint.so[0x582a1] <+449>: movq   %r15, %rax
libbigint.so[0x582a4] <+452>: mulq   -0x40(%rbp)
libbigint.so[0x582a8] <+456>: movq   %rdx, %rbx
libbigint.so[0x582ab] <+459>: movq   %rax, %r8
libbigint.so[0x582ae] <+462>: addq   %rsi, %r8
libbigint.so[0x582b1] <+465>: movzbl %cl, %eax
libbigint.so[0x582b4] <+468>: adcq   %rax, %rbx
libbigint.so[0x582b7] <+471>: setb   %al
libbigint.so[0x582ba] <+474>: addq   %r10, %rbx
libbigint.so[0x582bd] <+477>: setb   %cl
libbigint.so[0x582c0] <+480>: orb    %al, %cl
libbigint.so[0x582c2] <+482>: movq   %r15, %rax
libbigint.so[0x582c5] <+485>: mulq   %r14
libbigint.so[0x582c8] <+488>: addq   %rbx, %rax
libbigint.so[0x582cb] <+491>: movzbl %cl, %ecx
libbigint.so[0x582ce] <+494>: adcq   %rcx, %rdx
libbigint.so[0x582d1] <+497>: movq   -0x70(%rbp), %rcx
libbigint.so[0x582d5] <+501>: movq   %rcx, (%rdi)
libbigint.so[0x582d8] <+504>: movq   -0x68(%rbp), %rcx
libbigint.so[0x582dc] <+508>: movq   %rcx, 0x8(%rdi)
libbigint.so[0x582e0] <+512>: movq   -0x60(%rbp), %rcx
libbigint.so[0x582e4] <+516>: movq   %rcx, 0x10(%rdi)
libbigint.so[0x582e8] <+520>: movq   %r11, 0x18(%rdi)
libbigint.so[0x582ec] <+524>: movq   %r9, 0x20(%rdi)
libbigint.so[0x582f0] <+528>: movq   %r8, 0x28(%rdi)
libbigint.so[0x582f4] <+532>: movq   %rax, 0x30(%rdi)
libbigint.so[0x582f8] <+536>: movq   %rdx, 0x38(%rdi)
libbigint.so[0x582fc] <+540>: movq   %rdi, %rax
libbigint.so[0x582ff] <+543>: addq   $0x48, %rsp
libbigint.so[0x58303] <+547>: popq   %rbx
libbigint.so[0x58304] <+548>: popq   %r12
libbigint.so[0x58306] <+550>: popq   %r13
libbigint.so[0x58308] <+552>: popq   %r14
libbigint.so[0x5830a] <+554>: popq   %r15
libbigint.so[0x5830c] <+556>: popq   %rbp
libbigint.so[0x5830d] <+557>: retq   
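
For readers who want a source-level picture of a branch-free full multiply, here is a minimal sketch of a schoolbook 256 x 256 -> 512-bit multiplication built on u128. It is not the crate's actual implementation: the helper `mac`, the free function `full_mul`, and the little-endian `[u64; 4]` layout are assumptions made for this example.

```rust
/// Hypothetical helper: computes a * b + acc + carry and splits the result
/// into (low, high) 64-bit limbs.
/// The sum fits in 128 bits: (2^64 - 1)^2 + 2 * (2^64 - 1) = 2^128 - 1.
fn mac(a: u64, b: u64, acc: u64, carry: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128) + (acc as u128) + (carry as u128);
    (wide as u64, (wide >> 64) as u64)
}

/// Schoolbook multiply over little-endian limbs with no zero checks:
/// every one of the 16 limb products is computed unconditionally.
fn full_mul(a: [u64; 4], b: [u64; 4]) -> [u64; 8] {
    let mut out = [0u64; 8];
    for i in 0..4 {
        let mut carry = 0u64;
        for j in 0..4 {
            // Accumulate a[j] * b[i] into the output limb at position i + j.
            let (lo, hi) = mac(a[j], b[i], out[i + j], carry);
            out[i + j] = lo;
            carry = hi;
        }
        // This limb has not been written by any earlier row, so the final
        // carry of the row can simply be stored.
        out[i + 4] = carry;
    }
    out
}
```

With fixed loop bounds like these, a compiler is free to unroll both loops into a straight-line sequence of multiplies and add-with-carry instructions.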

Vurich commented May 11, 2018

Benchcmp results before vs. after removing the zero check. Notice that multiplying by small numbers is slower for U256, but if you know you're going to multiply by a small number then you might as well use overflowing_mul_u32.

 name            u128.bench ns/iter  onlycheck512.bench ns/iter  diff ns/iter   diff %  speedup 
 u128_mul        22,677              20,588                            -2,089   -9.21%   x 1.10 
 u256_add        24,294              25,937                             1,643    6.76%   x 0.94 
 u256_from_be    6                   6                                      0    0.00%   x 1.00 
 u256_from_le    9                   9                                      0    0.00%   x 1.00 
 u256_full_mul   73,416              44,425                           -28,991  -39.49%   x 1.65 
 u256_mul        85,797              74,585                           -11,212  -13.07%   x 1.15 
 u256_mul_small  558                 635                                   77   13.80%   x 0.88 
 u256_sub        25,904              25,908                                 4    0.02%   x 1.00 
 u512_add        30,691              30,699                                 8    0.03%   x 1.00 
 u512_mul        365,976             365,511                             -465   -0.13%   x 1.00 
 u512_mul_small  3,942               3,938                                 -4   -0.10%   x 1.00 
 u512_sub        33,361              33,353                                -8   -0.02%   x 1.00 
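
To show why a dedicated small-multiplier path is cheap, here is a hedged sketch of multiplying a 256-bit value by a u32. The function `mul_u32` below is a hypothetical stand-in written for this example; it is not the crate's overflowing_mul_u32, and the little-endian `[u64; 4]` layout is an assumption.

```rust
/// Multiplies a 256-bit little-endian value by a u32 in a single pass.
/// Returns the wrapped product and whether it overflowed 256 bits.
fn mul_u32(a: [u64; 4], b: u32) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = 0u64;
    for i in 0..4 {
        // One widening multiply per limb: 4 multiplies in total, compared
        // with 16 limb products for a general 256 x 256 multiply.
        let wide = (a[i] as u128) * (b as u128) + (carry as u128);
        out[i] = wide as u64;
        carry = (wide >> 64) as u64;
    }
    (out, carry != 0)
}
```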

Vurich commented May 11, 2018

I think that the speed increase over the inline assembly is probably due to the loads being spread out, which makes them easier to parallelise. Both the Rust inline ASM and the C inline ASM do all their loads upfront. It looks like LLVM's emitting some dark magic with pushf/popf too.

Compilers are really good.

debris merged commit 44a3133 into paritytech:master May 11, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

pefish commented on e58c922 Aug 13, 2018

error[E0658]: 128-bit type is unstable

Vurich replied Aug 13, 2018

Run rustup update @pefish
