Contributor

@CryZe CryZe commented Jul 7, 2025

The 64-bit to 128-bit widening multiplication was previously gated only
on the target pointer width. That works as a rough heuristic, but a more
precise one is possible:

1. Most 64-bit architectures support the 128-bit widening
   multiplication. SPARC64 and Wasm64 are the exceptions, so it
   shouldn't be used on those two architectures.
2. The target pointer width doesn't always indicate that we are dealing
   with a 64-bit architecture, as there are ABIs that reduce the pointer
   width, especially on AArch64 and x86-64.
3. WebAssembly (regardless of pointer width) supports 64-bit to 128-bit
   widening multiplication with the `wide-arithmetic` proposal.
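
The three points above could be combined into a single compile-time predicate. The following is only a sketch of what such a gate might look like, not necessarily the exact condition used in this PR: the helper name `has_fast_widening_mul` is hypothetical, and the `target_feature = "wide-arithmetic"` spelling assumes the compiler exposes the proposal under that name.

```rust
/// Hypothetical helper (not from the PR): reports whether the fast
/// 64-bit to 128-bit widening multiplication path should be selected
/// for the target this code is compiled for.
const fn has_fast_widening_mul() -> bool {
    cfg!(any(
        // Point 1: 64-bit pointers, minus the two architectures named above.
        all(
            target_pointer_width = "64",
            not(any(target_arch = "sparc64", target_arch = "wasm64")),
        ),
        // Point 2: these stay 64-bit architectures even under ABIs that
        // shrink the pointer width, so they are matched by architecture.
        target_arch = "aarch64",
        target_arch = "x86_64",
        // Point 3: any WebAssembly target with wide-arithmetic enabled
        // (assumed feature name).
        all(target_family = "wasm", target_feature = "wide-arithmetic"),
    ))
}

fn main() {
    println!("fast widening multiply: {}", has_fast_widening_mul());
}
```

Because `cfg!` expands to a literal `true` or `false`, a branch on this predicate should be resolved at compile time and the unused path dropped by the optimizer.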

The `wide-arithmetic` proposal has been available since the LLVM 20
update and works perfectly for this use case, as can be seen here:

https://rust.godbolt.org/z/9jY7fxqxK

Using `wasmtime explore`, we can see it compiles down to the ideal
instructions on x86-64:

```nasm
mulx rax, rdx, r10
xor rax, rdx
```
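
For readers mapping those two instructions back to source: `mulx` produces the high and low 64-bit halves of a 64-bit by 64-bit product, and `xor` combines them. A minimal Rust sketch of that shape follows; the function name `folded_multiply` is hypothetical, and the code behind the Godbolt link may differ in detail.

```rust
/// Hypothetical name: multiplies two 64-bit values into a 128-bit
/// product and XORs the high and low 64-bit halves together.
fn folded_multiply(x: u64, y: u64) -> u64 {
    // Full u64 x u64 -> u128 product.
    let full = (x as u128).wrapping_mul(y as u128);
    let lo = full as u64; // low 64 bits
    let hi = (full >> 64) as u64; // high 64 bits
    lo ^ hi
}

fn main() {
    // Small product: the high half is zero, so the result is the product.
    assert_eq!(folded_multiply(2, 3), 6);
    // 2^32 * 2^32 = 2^64: the low half is 0 and the high half is 1.
    assert_eq!(folded_multiply(1 << 32, 1 << 32), 1);
    println!("{:#x}", folded_multiply(0x1234_5678_9abc_def0, 0x0fed_cba9_8765_4321));
}
```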

This is based on the same change in
[`foldhash`](orlp/foldhash#17).
@CryZe CryZe force-pushed the 128-bit-on-more-platforms branch from 146ff74 to 6849c16 Compare July 7, 2025 17:22

      // We compute the full u64 x u64 -> u128 product, this is a single mul
      // instruction on x86-64, one mul plus one mulhi on ARM64.
    - let full = (x as u128) * (y as u128);
    + let full = (x as u128).wrapping_mul(y as u128);

Member
See orlp/foldhash#16 for why this change was applied
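
Whatever the motivation given in the linked issue, the switch to `wrapping_mul` does not change the computed value: two `u64` inputs widened to `u128` can never overflow when multiplied, because (2^64 - 1)^2 is less than 2^128. A small sketch to check this, with hypothetical function names:

```rust
/// The multiplication as written before the change.
fn product_plain(x: u64, y: u64) -> u128 {
    (x as u128) * (y as u128)
}

/// The multiplication as written after the change.
fn product_wrapping(x: u64, y: u64) -> u128 {
    (x as u128).wrapping_mul(y as u128)
}

fn main() {
    let samples = [0u64, 1, 2, 0xDEAD_BEEF, 1 << 63, u64::MAX - 1, u64::MAX];
    for &x in &samples {
        for &y in &samples {
            // Both forms agree on every sampled pair.
            assert_eq!(product_plain(x, y), product_wrapping(x, y));
        }
    }
    // Even the largest possible product fits in 128 bits.
    assert!((u64::MAX as u128).checked_mul(u64::MAX as u128).is_some());
    println!("plain and wrapping products agree on all samples");
}
```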

@WaffleLapkin WaffleLapkin merged commit 1a998d5 into rust-lang:master Aug 6, 2025
10 checks passed