Use 128-bit Widening Multiply on More Platforms #62

CryZe · 2025-07-07T15:26:30Z

The 128-bit widening multiplication was previously gated by simply checking the target pointer width. This works as a simple heuristic, but a better heuristic can be used:

1. Most 64-bit architectures except SPARC64 and Wasm64 support the 128-bit widening multiplication, so it shouldn't be used on those two architectures.
2. The target pointer width doesn't always indicate that we are dealing with a 64-bit architecture, as there are ABIs that reduce the pointer width, especially on AArch64 and x86-64.
3. WebAssembly (regardless of pointer width) supports 64-bit to 128-bit widening multiplication with the wide-arithmetic proposal.
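
The three points above can be sketched as a single `cfg!` condition. This is a minimal illustration, not the merged code: the names `fold_wide`, `fold_narrow`, `fold`, and `HAS_FAST_WIDENING_MUL` are hypothetical, the exact `cfg` expression is one reading of the description, and the 32-bit fallback is only an illustrative stand-in.

```rust
/// Wide path: a single u64 x u64 -> u128 product, folded back to 64 bits
/// by XOR-ing the low and high halves.
fn fold_wide(x: u64, y: u64) -> u64 {
    let full = (x as u128).wrapping_mul(y as u128);
    (full as u64) ^ ((full >> 64) as u64)
}

/// Fallback path (illustrative stand-in, an assumption of this sketch):
/// uses only u32 x u32 -> u64 products, which cannot overflow a u64.
fn fold_narrow(x: u64, y: u64) -> u64 {
    let (lx, hx) = (x as u32 as u64, x >> 32);
    let (ly, hy) = (y as u32 as u64, y >> 32);
    (lx * hy) ^ (hx * ly).rotate_right(32)
}

/// Assumed gating condition, following the three points in the description:
/// 1. 64-bit pointer width, excluding SPARC64 and Wasm64;
/// 2. AArch64 and x86-64 regardless of pointer width (reduced-pointer ABIs);
/// 3. any WebAssembly target with the `wide-arithmetic` feature enabled.
const HAS_FAST_WIDENING_MUL: bool = cfg!(any(
    all(
        target_pointer_width = "64",
        not(any(target_arch = "sparc64", target_arch = "wasm64")),
    ),
    target_arch = "aarch64",
    target_arch = "x86_64",
    all(target_family = "wasm", target_feature = "wide-arithmetic"),
));

/// Picks the path at compile time; the untaken branch is removed as dead code.
pub fn fold(x: u64, y: u64) -> u64 {
    if HAS_FAST_WIDENING_MUL {
        fold_wide(x, y)
    } else {
        fold_narrow(x, y)
    }
}
```

The two paths intentionally produce different values for the same inputs, so a sketch like this is only suitable where the result does not need to be stable across targets.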

The wide-arithmetic proposal has been available since the LLVM 20 update and works perfectly for this use case, as can be seen here:

https://rust.godbolt.org/z/9jY7fxqxK

Using wasmtime explore, we can see it compiles down to the ideal instructions on x86-64:

mulx rax, rdx, r10
xor rax, rdx

Based on the same change in foldhash (orlp/foldhash#17).

WaffleLapkin · 2025-08-06T13:45:40Z

src/lib.rs

        // We compute the full u64 x u64 -> u128 product, this is a single mul
        // instruction on x86-64, one mul plus one mulhi on ARM64.
-        let full = (x as u128) * (y as u128);
+        let full = (x as u128).wrapping_mul(y as u128);


See orlp/foldhash#16 for why this change was applied
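
The linked discussion is not reproduced here. What can be checked locally is that both forms in the diff compute the same value, because the product of two `u64` values is at most (2^64 - 1)^2 = 2^128 - 2^65 + 1, which fits in a `u128`. A plausible motivation, stated here as an assumption, is that `wrapping_mul` never emits an overflow check, even in builds with overflow checks enabled. The function names below are hypothetical:

```rust
/// Widening product written with the plain `*` operator (the removed line).
fn widening_mul_plain(x: u64, y: u64) -> u128 {
    (x as u128) * (y as u128)
}

/// The same product written with `wrapping_mul` (the added line).
fn widening_mul_wrapping(x: u64, y: u64) -> u128 {
    (x as u128).wrapping_mul(y as u128)
}

/// Returns true when both forms agree for the given inputs.
fn forms_agree(x: u64, y: u64) -> bool {
    widening_mul_plain(x, y) == widening_mul_wrapping(x, y)
}
```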

scottmcm reviewed Jul 7, 2025

CryZe force-pushed the 128-bit-on-more-platforms branch from 146ff74 to 6849c16 Compare July 7, 2025 17:22

WaffleLapkin reviewed Aug 6, 2025

WaffleLapkin approved these changes Aug 6, 2025

WaffleLapkin merged commit 1a998d5 into rust-lang:master Aug 6, 2025
10 checks passed
