refactor: optimize number/uint32/base/muldw implementation for better performance #11702

Open
impawstarlight wants to merge 4 commits into stdlib-js:develop from impawstarlight:refactor/optimize-umuldw

Conversation

@impawstarlight
Contributor

Resolves none.

Description

What is the purpose of this pull request?

This pull request:

  • Optimizes the number/uint32/base/muldw implementation by moving the isnan check out of the lower-level implementation.
  • Uses number/uint32/base/mul to compute the lower half of the double-word product, avoiding unnecessary calculation.
  • Removes the NaN tests for the lower-level implementation.
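
A minimal sketch of the approach in the list above, written in plain JavaScript rather than taken from the package source. The high word is built from 16-bit partial products, and the low word is delegated to `Math.imul`, which stands in here for number/uint32/base/mul. The name `umuldwSketch` is hypothetical, and the sketch deliberately performs no NaN check.

```javascript
var LOW_WORD_MASK = 0x0000ffff;

// Hypothetical stand-in for a double-word multiply of two unsigned 32-bit integers.
// Returns [ high word, low word ] of the 64-bit product.
function umuldwSketch( a, b ) {
    var ha;
    var hb;
    var la;
    var lb;
    var w1;
    var k;
    var t;

    a >>>= 0; // coerce to uint32
    b >>>= 0;

    ha = a >>> 16;
    la = a & LOW_WORD_MASK;
    hb = b >>> 16;
    lb = b & LOW_WORD_MASK;

    // Each intermediate sum stays below 2^32, so no precision is lost in a double.
    t = la * lb;
    k = t >>> 16; // carry out of the lowest 16 bits

    t = ( ha*lb ) + k;
    w1 = t >>> 16;

    t = ( la*hb ) + ( t & LOW_WORD_MASK );
    k = t >>> 16;

    return [
        ( ( ha*hb ) + w1 + k ) >>> 0, // higher 32 bits
        Math.imul( a, b ) >>> 0       // lower 32 bits (wraps modulo 2^32)
    ];
}

var out = umuldwSketch( 0xAAAAAAAA, 0x55555555 );
// returns [ 954437176, 1908874354 ]
```

The low word no longer depends on the intermediate values, so the `& LOW_WORD_MASK` of the first partial product and the final shift-and-add are not needed.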

Related Issues

Does this pull request have any related issues?

No.

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

AI Assistance

When authoring the changes proposed in this PR, did you use any kind of AI assistance?

  • Yes
  • No

If you answered "yes" above, how did you use AI assistance?

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Disclosure

If you answered "yes" to using AI assistance, please provide a short disclosure indicating how you used AI assistance. This helps reviewers determine how much scrutiny to apply when reviewing your contribution. Example disclosures: "This PR was written primarily by Claude Code." or "I consulted ChatGPT to understand the codebase, but the proposed changes were fully authored manually by myself.".

{{TODO: add disclosure if applicable}}


@stdlib-js/reviewers

…`umul`

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: passed
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: passed
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
@impawstarlight impawstarlight requested a review from a team April 21, 2026 19:58
@stdlib-bot stdlib-bot added the Needs Review A pull request which needs code review. label Apr 21, 2026
@stdlib-bot
Contributor

stdlib-bot commented Apr 21, 2026

Coverage Report

Package | Statements | Branches | Functions | Lines
number/uint32/base/muldw | 185/185 (+0.00%) | 8/8 (+0.00%) | 2/2 (+0.00%) | 185/185 (+0.00%)

The above coverage report was generated for the changes in this PR.

@kgryte kgryte added difficulty: 2 May require some initial design or R&D, but should be straightforward to resolve and/or implement. review: 5 and removed Needs Review A pull request which needs code review. labels Apr 21, 2026
Comment thread lib/node_modules/@stdlib/number/uint32/base/muldw/lib/assign.js Outdated
* // returns [ 954437176, 1908874354 ]
*/
function umuldw( a, b ) {
if ( isnan( a ) || isnan( b ) ) {
Member

@impawstarlight What is the rationale for keeping these checks here but not in assign.js?

Contributor Author

@impawstarlight impawstarlight Apr 22, 2026

Nothing in particular, just trying to stay as close to the original as possible since the main tests also check for NaN handling. But I see how this might be inconsistent with mul or imul, since we don't do any input validation there. Should it be removed?

Member

Yeah, I'd say let's go ahead and remove it. If we are not going to include it in assign, we shouldn't deviate in the main export.
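
For illustration, a minimal sketch of what dropping the check means in practice. The helper names are hypothetical and this is not the package's code. Bitwise operations and `Math.imul` coerce NaN to 0, so an unchecked routine silently returns 0, and a caller that cares about NaN has to guard before calling. Returning NaN from the guard is an assumption made for this sketch.

```javascript
// Hypothetical low-level routine: no input validation, NaN is coerced to 0.
function mulLowUnchecked( a, b ) {
    return Math.imul( a, b ) >>> 0;
}

// Hypothetical caller-side guard (the NaN return value is an assumed convention).
function mulLowChecked( a, b ) {
    if ( a !== a || b !== b ) { // NaN is the only value not equal to itself
        return NaN;
    }
    return mulLowUnchecked( a, b );
}

var unchecked = mulLowUnchecked( NaN, 5 );
// returns 0

var checked = mulLowChecked( NaN, 5 );
// returns NaN
```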

Co-authored-by: Athan <kgryte@gmail.com>
Signed-off-by: Athan <kgryte@gmail.com>

  out[ offset ] = ( ( ha*hb ) + w1 + k ) >>> 0; // compute the higher 32 bits and cast to an unsigned 32-bit integer
- out[ offset + stride ] = ( ( t << 16 ) + w3) >>> 0; // compute the lower 32 bits and cast to an unsigned 32-bit integer
+ out[ offset + stride ] = umul( a, b ) >>> 0; // compute the lower 32 bits and cast to an unsigned 32-bit integer
Member

@impawstarlight I am a bit dense, but how does this manage to produce the same result? Previously, the logic for computing the lower 32 bits doesn't exceed the max uint32, but, here, a*b could, resulting in wraparound, which is a bit counterintuitive to me that it achieves the same result.

Member

Ultimately, this boils down to a call to imul, but not obvious to me why imul is faster than a bit shift plus addition.

Contributor Author

@impawstarlight impawstarlight Apr 22, 2026

> Previously, the logic for computing the lower 32 bits doesn't exceed the max uint32, but, here, a*b could, resulting in wraparound, which is a bit counterintuitive to me that it achieves the same result.

Actually, the way it avoids overflow is a clever trick for extracting the bits that would normally overflow past the lower 32 bits. This is done because those overflow bits contribute to the higher 32 bits and hence are necessary for that calculation.

But for the lower 32 bits, we could very well make do with allowing overflow if we didn't have to calculate the higher 32 bits, like here in our imul polyfill.

Ultimately, it is fully equivalent to imul because of its purpose: calculating the low 32 bits of a 32x32-bit multiplication, which is basically the definition of imul.

So the wraparound behavior of imul also happens in the shift-add approach; it is just not very obvious because it is handled through the 16-bit splitting logic, which eliminates any intermediate overflow.
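
A minimal sketch of that equivalence, with hypothetical function names and assuming both inputs are already unsigned 32-bit integers. `lowWordShiftAdd` recombines the 16-bit partial products the way the removed line did, and `lowWordImul` lets the multiplication wrap directly.

```javascript
var LOW_WORD_MASK = 0x0000ffff;

// Low 32 bits via 16-bit splitting (no intermediate value exceeds 2^32).
function lowWordShiftAdd( a, b ) {
    var ha = a >>> 16;
    var la = a & LOW_WORD_MASK;
    var hb = b >>> 16;
    var lb = b & LOW_WORD_MASK;
    var w0 = la * lb;
    var w3 = w0 & LOW_WORD_MASK;          // lowest 16 bits of the product
    var t = ( ha*lb ) + ( w0 >>> 16 );
    t = ( la*hb ) + ( t & LOW_WORD_MASK );
    // `t << 16` discards the bits of `t` above 2^16: this is the same wraparound imul performs.
    return ( ( t << 16 ) + w3 ) >>> 0;
}

// Low 32 bits by letting the 32x32-bit product wrap modulo 2^32.
function lowWordImul( a, b ) {
    return Math.imul( a, b ) >>> 0;
}

var x = lowWordShiftAdd( 0xFFFFFFFF, 0xFFFFFFFF );
// returns 1

var y = lowWordImul( 0xFFFFFFFF, 0xFFFFFFFF );
// returns 1
```

Both results equal the product modulo 2^32; the bits that the shift discards are exactly the ones that belong to the higher word.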

Contributor Author

@impawstarlight impawstarlight Apr 22, 2026

> Ultimately, this boils down to a call to imul, but not obvious to me why imul is faster than a bit shift plus addition.

imul is probably faster here because otherwise we were doing 3 operations before:

w3 = ( t & LOW_WORD_MASK ) >>> 0;
...
out[ offset + stride ] = ( ( t << 16 ) + w3) >>> 0;

So we're comparing AND + SHIFT + ADD vs IMUL. Although individual add and bitwise instructions are very fast, the combination is probably slower than a single IMUL instruction because of other factors, such as moving intermediate values between registers. Just my guess, but the benchmark agrees.
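
A rough way to compare the two variants locally is sketched below. This is not the package's benchmark suite: all names are hypothetical, timings depend heavily on the engine and hardware, and a JIT may optimize either loop in ways that mask the difference. The checksum only confirms that both variants compute the same results.

```javascript
var LOW_WORD_MASK = 0x0000ffff;

// Variant 1: low word recombined from partial products (AND + SHIFT + ADD).
function muldwShiftAdd( a, b, out ) {
    var ha = a >>> 16;
    var la = a & LOW_WORD_MASK;
    var hb = b >>> 16;
    var lb = b & LOW_WORD_MASK;
    var t = la * lb;
    var w3 = t & LOW_WORD_MASK;
    var k = t >>> 16;
    var w1;

    t = ( ha*lb ) + k;
    w1 = t >>> 16;
    t = ( la*hb ) + ( t & LOW_WORD_MASK );
    k = t >>> 16;
    out[ 0 ] = ( ( ha*hb ) + w1 + k ) >>> 0;
    out[ 1 ] = ( ( t << 16 ) + w3 ) >>> 0;
    return out;
}

// Variant 2: low word computed with a single wrapping multiply.
function muldwImul( a, b, out ) {
    var ha = a >>> 16;
    var la = a & LOW_WORD_MASK;
    var hb = b >>> 16;
    var lb = b & LOW_WORD_MASK;
    var t = ( ha*lb ) + ( ( la*lb ) >>> 16 );
    var w1 = t >>> 16;

    t = ( la*hb ) + ( t & LOW_WORD_MASK );
    out[ 0 ] = ( ( ha*hb ) + w1 + ( t >>> 16 ) ) >>> 0;
    out[ 1 ] = Math.imul( a, b ) >>> 0;
    return out;
}

// Times `iterations` calls on pseudo-random uint32 inputs.
function bench( fcn, iterations ) {
    var checksum = 0;
    var out = [ 0, 0 ];
    var x = 0x9e3779b9;
    var start = Date.now();
    var i;
    for ( i = 0; i < iterations; i++ ) {
        x = ( Math.imul( x, 1664525 ) + 1013904223 ) >>> 0; // simple LCG to vary inputs
        fcn( x, ( x ^ 0x5bd1e995 ) >>> 0, out );
        checksum = ( checksum + out[ 0 ] + out[ 1 ] ) >>> 0; // keeps the work observable
    }
    return { 'elapsed': Date.now() - start, 'checksum': checksum };
}

console.log( 'shift-add:', bench( muldwShiftAdd, 1e6 ) );
console.log( 'imul:     ', bench( muldwImul, 1e6 ) );
```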

Member

Okay, sounds good.

@kgryte kgryte added review: 4 Needs Discussion Needs further discussion. and removed review: 5 labels Apr 21, 2026
@kgryte kgryte removed the Needs Discussion Needs further discussion. label Apr 22, 2026
Comment thread lib/node_modules/@stdlib/number/uint32/base/muldw/lib/main.js Outdated
Comment thread lib/node_modules/@stdlib/number/uint32/base/muldw/lib/main.js Outdated
Co-authored-by: Athan <kgryte@gmail.com>
Signed-off-by: Athan <kgryte@gmail.com>
Member

@kgryte kgryte left a comment

@impawstarlight With the removal of the isnan checks in the main export, do the corresponding tests also need to be removed?

@kgryte kgryte added the Needs Changes Pull request which needs changes before being merged. label Apr 22, 2026
Labels

difficulty: 2 May require some initial design or R&D, but should be straightforward to resolve and/or implement. Needs Changes Pull request which needs changes before being merged. review: 4

3 participants