<__msvc_int128.hpp>: use __umulh on ARM64/ARM64EC (#6184)#6281
Open
Adesh4477 wants to merge 1 commit into
Conversation
Pull request overview
Adds an ARM64/ARM64EC runtime fast-path for _Base128::_UMul128() in <__msvc_int128.hpp> by computing the high 64 bits via __umulh() and the low 64 bits via a normal 64-bit multiply, avoiding the existing Knuth base-2^32 fallback in non-constant-evaluated code.
Changes:
- Add an ARM64/ARM64EC __umulh-based implementation for the high half of the 128-bit product.
- Keep the existing constexpr/Knuth fallback for constant evaluation and other targets.
Adds an ARM64/ARM64EC fast path to _Base128::_UMul128 that uses the __umulh intrinsic for the high 64 bits and a plain 64-bit multiply for the low 64 bits, in place of the Knuth base-2^32 fallback.

Microbench on Snapdragon X Elite (5M random uint64 pairs * 5 reps):

Knuth fallback : ~82 ms (~3.27 ns/op)
__umulh path   : ~27 ms (~1.08 ns/op)
Speedup        : ~3.03x

Disassembly collapses from ~30 ops (incl. /GS cookie push) to 4 ops (umulh / mul / str / ret).

_STL_128_INTRINSICS is intentionally not enabled for ARM64; that macro also gates _addcarry_u64, _subborrow_u64, __shiftleft128, __shiftright128, and _udiv128/_div128, which have no direct single-instruction ARM64 equivalents and are out of scope for this change.

Per the issue author, x64 is intentionally not modified -- _umul128 remains preferable there.
d4723b9 to 00f4ead
AlexGuteniev approved these changes May 12, 2026
On ARM64, _UMul128 falls through to the Knuth fallback because _STL_128_INTRINSICS is x64-only. ARM64 has umulh as a single instruction, so we can do this in two ops instead of ~thirty. The patch adds the obvious #elif branch, using __umulh for the high half and a regular 64-bit multiply for the low half.
Tested locally with this microbench on Snapdragon X Elite (5M random uint64 pairs * 5 reps):
Knuth fallback : ~82 ms (~3.27 ns/op)
__umulh path : ~27 ms (~1.08 ns/op)
Speedup : ~3.03x
Per the issue author, x64 is intentionally not modified -- _umul128 remains preferable there.
Fixes #6184