Conversation

@missa-prime
Contributor

@missa-prime missa-prime commented Jun 24, 2025

The changes described below are meant to resolve the performance regression introduced by the x86_64 cbrt double precision floating point scalar intrinsic in #24470.

  1. Check for +0, -0, +INF, -INF, and NaN before any other input values.
  2. If these special values are found, return immediately with minimal modifications to the result register.
  3. Performance testing shows that, compared with the original intrinsic, the modified intrinsic improves throughput by 65.1% on average for the special values, while throughput drops by 5.5% for the normal value range (-INF, -2^(-1022)], [2^(-1022), INF).
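
The ordering in steps 1 and 2 can be sketched in Java. This is only an illustration of the control flow, not the x86_64 stub from the patch: the class and method names are hypothetical, and `Math.cbrt` stands in for the heavy compute path. The sketch relies on the documented `Math.cbrt` special cases, where a zero, an infinity, or a NaN argument yields a result equal to the argument.

```java
// Illustrative sketch only; all names are hypothetical and this is not the intrinsic's code.
final class CbrtFastPathSketch {
    private static final long ABS_MASK = 0x7fffffffffffffffL;     // clears the sign bit
    private static final long EXP_ALL_ONES = 0x7ff0000000000000L; // bit pattern of +INF

    // True for +0, -0, +INF, -INF, and NaN.
    static boolean isSpecial(double x) {
        long magnitude = Double.doubleToRawLongBits(x) & ABS_MASK;
        // Zero magnitude covers +0 and -0; an all-ones exponent covers INF and NaN.
        return magnitude == 0L || magnitude >= EXP_ALL_ONES;
    }

    static double cbrt(double x) {
        // Steps 1 and 2: test the special values first and return the input unchanged.
        if (isSpecial(x)) {
            return x;
        }
        // Stand-in for the heavy compute path taken by all other inputs.
        return Math.cbrt(x);
    }

    public static void main(String[] args) {
        double[] inputs = {
            0.0, -0.0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN, 27.0
        };
        for (double x : inputs) {
            System.out.println(x + " -> special=" + isSpecial(x) + ", cbrt=" + cbrt(x));
        }
    }
}
```

Every other input pays for the extra comparison before reaching the compute path, which is consistent with the small drop reported for the normal range.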

The commands to run all relevant micro-benchmarks are posted below.

make test TEST="micro:CbrtPerf.CbrtPerfRanges"
make test TEST="micro:CbrtPerf.CbrtPerfSpecialValues"

The results of all tests posted below were captured with an Intel® Xeon 8488C using OpenJDK v26-b1 as the baseline version. The term baseline1 refers to runs with the intrinsic enabled and baseline2 refers to runs with the intrinsic disabled.

Each result is the mean of 8 individual runs, and the input ranges match those used in the original Java implementation. Overall, the changes provide a significant uplift over baseline1, except for a mild regression in the (2^(-1022) <= |x| < INF) input range, which is expected due to the extra checks. Compared against baseline2, the modified intrinsic still significantly outperforms for the inputs in (-INF < x < INF) that require heavy compute. However, the special-value inputs that trigger fast-path returns still perform better with baseline2.

| Input range(s) | Baseline1 (ops/ms) | Change (ops/ms) | Change vs baseline1 (%) |
|---|---|---|---|
| [-2^(-1022), 2^(-1022)] | 18470 | 20847 | +12.87 |
| (-INF, -2^(-1022)], [2^(-1022), INF) | 210538 | 198925 | -5.52 |
| [0] | 344990 | 627561 | +81.91 |
| [-0] | 291983 | 629941 | +115.75 |
| [INF] | 382685 | 542211 | +41.68 |
| [-INF] | 386174 | 542291 | +40.43 |
| [NaN] | 421700 | 615157 | +45.88 |

| Input range(s) | Baseline2 (ops/ms) | Change (ops/ms) | Change vs baseline2 (%) |
|---|---|---|---|
| [-2^(-1022), 2^(-1022)] | 7072 | 20847 | +194.78 |
| (-INF, -2^(-1022)], [2^(-1022), INF) | 147884 | 198925 | +34.51 |
| [0] | 1890520 | 627561 | -66.80 |
| [-0] | 1890404 | 629941 | -66.68 |
| [INF] | 1247633 | 542211 | -56.54 |
| [-INF] | 1242287 | 542291 | -56.35 |
| [NaN] | 1253700 | 615157 | -50.93 |
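
The percentage columns appear to be the relative difference between the throughput with the change and the baseline throughput, i.e. (change - baseline) / baseline * 100. The following is a small sketch for re-deriving them from the ops/ms columns. The class and method names are hypothetical, and the figures are copied from the baseline1 table above.

```java
// Illustrative helper with hypothetical names; the input figures come from the tables above.
final class ThroughputChange {
    // Relative change of the throughput with the change against a baseline, in percent.
    static double percentChange(double baselineOpsPerMs, double changeOpsPerMs) {
        return (changeOpsPerMs - baselineOpsPerMs) / baselineOpsPerMs * 100.0;
    }

    // Arithmetic mean of the per-row percentage changes; each row is {baseline, change}.
    static double meanPercentChange(double[][] rows) {
        double sum = 0.0;
        for (double[] row : rows) {
            sum += percentChange(row[0], row[1]);
        }
        return sum / rows.length;
    }

    // Special-value rows of the baseline1 table: {baseline1, change}, in ops/ms.
    static final double[][] SPECIAL_ROWS_BASELINE1 = {
        {344990, 627561}, // [0]
        {291983, 629941}, // [-0]
        {382685, 542211}, // [INF]
        {386174, 542291}, // [-INF]
        {421700, 615157}, // [NaN]
    };

    public static void main(String[] args) {
        System.out.printf("[0] vs baseline1: %+.2f%%%n", percentChange(344990, 627561));
        System.out.printf("normal range vs baseline1: %+.2f%%%n", percentChange(210538, 198925));
        System.out.printf("special-value mean vs baseline1: %+.1f%%%n",
                meanPercentChange(SPECIAL_ROWS_BASELINE1));
    }
}
```

Averaging the five special-value rows this way gives the 65.1% figure quoted in the description.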

Finally, the jtreg:test/jdk/java/lang/Math/CubeRootTests.java test passed with the changes.
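
For context, the special-case contract that a fast path has to preserve is the one stated in the `Math.cbrt` Javadoc: NaN maps to NaN, an infinity maps to an infinity of the same sign, and a zero maps to a zero of the same sign. The standalone check below is a minimal sketch of that contract. It is not taken from CubeRootTests.java, and its class and method names are hypothetical.

```java
// Minimal standalone check with hypothetical names; not the contents of CubeRootTests.java.
final class CbrtSpecialCaseCheck {
    // True when both doubles share a bit pattern, which distinguishes +0.0 from -0.0.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    // Verifies the documented Math.cbrt results for NaN, +INF, -INF, +0, and -0.
    static boolean specialCasesHold() {
        return Double.isNaN(Math.cbrt(Double.NaN))
            && Math.cbrt(Double.POSITIVE_INFINITY) == Double.POSITIVE_INFINITY
            && Math.cbrt(Double.NEGATIVE_INFINITY) == Double.NEGATIVE_INFINITY
            && sameBits(Math.cbrt(0.0), 0.0)
            && sameBits(Math.cbrt(-0.0), -0.0);
    }

    public static void main(String[] args) {
        System.out.println("special cases hold: " + specialCasesHold());
    }
}
```

The zeros are compared by bit pattern because `0.0 == -0.0` is true in Java, so a plain `==` would not catch a lost sign.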


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8358179: Performance regression in Math.cbrt (Bug - P3)(⚠️ The fixVersion in this issue is [25] but the fixVersion in .jcheck/conf is 26, a new backport will be created when this pr is integrated.)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25962/head:pull/25962
$ git checkout pull/25962

Update a local copy of the PR:
$ git checkout pull/25962
$ git pull https://git.openjdk.org/jdk.git pull/25962/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25962

View PR using the GUI difftool:
$ git pr show -t 25962

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25962.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 24, 2025

👋 Welcome back missa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 24, 2025

@missa-prime This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8358179: Performance regression in Math.cbrt

Reviewed-by: sviswanathan, sparasa, epeter

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 213 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@sviswa7, @eme64) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Jun 24, 2025

@missa-prime The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 24, 2025
@missa-prime missa-prime marked this pull request as ready for review June 25, 2025 18:38
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 25, 2025
@mlbridge

mlbridge bot commented Jun 25, 2025

Webrevs

@missa-prime
Contributor Author

@eme64 There seems to be an environment configuration issue (unrelated to code changes) on the Windows machine(s) this PR is landing on when running pre-submit tests. Could you run the internal tests on this PR to verify everything is ok?

@eme64
Contributor

eme64 commented Jun 26, 2025

@missa-prime I just launched some testing. Yes, it seems the windows failures are unrelated - I've seen them on other PRs too.

@eme64
Contributor

eme64 commented Jun 26, 2025

@missa-prime All tests passed in my internal testing.

@eme64
Contributor

eme64 commented Jun 26, 2025

I'll hold off with approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review.

@sviswa7 sviswa7 left a comment

Looks good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 27, 2025
@missa-prime
Contributor Author

> I'll hold off with approval until someone else who is more knowledgeable has reviewed first. But feel free to ping me for a second review.

@eme64 Second review with the latest changes?

@eme64
Contributor

eme64 commented Jun 30, 2025

@missa-prime The patch still looks good, though I ran testing again because of the new changes. Should complete in about 24h.

@theRealAph
Contributor

> The changes described below are meant to resolve the performance regression introduced by the x86_64 cbrt double precision floating point scalar intrinsic in #24470.

Please add the performance for arguments in the normal range to this list.

@missa-prime
Contributor Author

> > The changes described below are meant to resolve the performance regression introduced by the x86_64 cbrt double precision floating point scalar intrinsic in #24470.
>
> Please add the performance for arguments in the normal range to this list.

Sure, I added a line covering this.

Contributor

@vamsi-parasa vamsi-parasa left a comment

I did independent testing by running the correctness tests and performance benchmarks. The change looks good to me.

Thanks,
Vamsi

Contributor

@eme64 eme64 left a comment

Did not review the patch in detail, but looks reasonable.

Tests are passing on my end with commit 3 / v01.

@missa-prime Thanks for taking care of this!

@missa-prime
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jul 1, 2025
@openjdk

openjdk bot commented Jul 1, 2025

@missa-prime
Your change (at version 615169d) is now ready to be sponsored by a Committer.

@sviswa7

sviswa7 commented Jul 1, 2025

Thanks a lot @eme64.
/sponsor

@openjdk

openjdk bot commented Jul 1, 2025

Going to push as commit 38f59f8.
Since your change was applied there have been 221 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 1, 2025
@openjdk openjdk bot closed this Jul 1, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jul 1, 2025
@openjdk

openjdk bot commented Jul 1, 2025

@sviswa7 @missa-prime Pushed as commit 38f59f8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@missa-prime missa-prime deleted the user/missa-prime/cbrt branch July 1, 2025 15:43
__ jmp(B1_4);

__ bind(L_2TAG_PACKET_6_0_1);
__ movsd(xmm0, ExternalAddress(NEG_INF), r11 /*rscratch*/);
Contributor

note that NEG_INF is now unused

Contributor Author

Got it - thanks.

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

7 participants