8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5) #3536

Closed
wants to merge 4 commits into from

@DamonFool (Member) commented Apr 16, 2021

Hi all,

I'd like to optimize StubRoutines::dpow() for Math.pow(x, 0.5).

In the pow/sqrt discussion [1], Joe pointed out to me that the Java library implementation of pow already special-cases pow(x, 2.0) [2] and pow(x, 0.5) [3].
However, HotSpot's StubRoutines::dpow() implements the same optimization only for pow(x, 2.0), not for pow(x, 0.5).
This patch optimizes StubRoutines::dpow() for pow(x, 0.5).

Although not every Math.pow(x, 0.5) call can be replaced with sqrt(x), the replacement is safe in the following cases:

  1. x >= 0.0 (fully implemented)
  2. x is +Inf (fully implemented)
  3. x is NaN (NaN can be further divided into +NaN and -NaN; only +NaN is implemented)
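
A quick way to see why the list above stops where it does: the hypothetical class below (not part of the patch) compares Math.sqrt(x) with Math.pow(x, 0.5) on special values whose results the java.lang.Math Javadoc specifies exactly. Per that Javadoc, pow(-0.0, 0.5) is +0.0 while sqrt(-0.0) is -0.0, and pow(-Infinity, 0.5) is +Infinity while sqrt(-Infinity) is NaN. Note that in Java -0.0 >= 0.0 evaluates to true, so a guard written as a numeric comparison also admits -0.0, whereas a sign-bit test does not.

```java
public class PowHalfEdgeCases {

    // Double.compare distinguishes -0.0 from +0.0 and treats every NaN as
    // equal to every other NaN, so it compares values, not NaN bit patterns.
    static boolean sqrtMatchesPowHalf(double x) {
        return Double.compare(Math.sqrt(x), Math.pow(x, 0.5)) == 0;
    }

    public static void main(String[] args) {
        // Inputs whose pow(x, 0.5) and sqrt(x) results are fixed by the Javadoc.
        double[] inputs = {
            0.0, -0.0,
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
            Double.NaN, -1.0
        };
        for (double x : inputs) {
            System.out.printf("x = %-10s sqrt(x) = %-10s pow(x, 0.5) = %-10s match = %b%n",
                    x, Math.sqrt(x), Math.pow(x, 0.5), sqrtMatchesPowHalf(x));
        }
    }
}
```

For a negative finite input such as -1.0, both calls return NaN, so Double.compare reports a match; among these inputs, only -0.0 and -Infinity give different values. Because Double.compare treats all NaNs as equal, this check says nothing about the sign of a NaN result.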

Benchmarks on several platforms show a 3.0x ~ 6.3x performance improvement for pow(x, 0.5).
No performance regression was observed.

Testing:

  • tier1 ~ tier3 on Linux/x64

Thanks.
Best regards,
Jie

[1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-April/076220.html
[2]


[3]

Detailed performance numbers:

  • Linux/Intel
--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
MathBench.powDouble                 0  thrpt    8  218783.605 ±   838.379  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   45498.351 ±     7.558  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   45243.530 ±  1097.100  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.031 ±     0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  176106.602 ± 13127.650  ops/ms
----------------------------

--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
MathBench.powDouble                 0  thrpt    8  219930.462 ±   181.922  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  204966.834 ±   329.032  ops/ms   <-- 4.5x up
MathBench.powDouble0Dot5Const       0  thrpt    8  203004.302 ±   684.072  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.121 ±     0.001  ops/ms   <-- 3.9x up
MathBench.powDoubleLoop             0  thrpt    8       0.031 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  178818.861 ± 16235.465  ops/ms
----------------------------
  • Linux/AMD
--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
MathBench.powDouble                 0  thrpt    8  100741.348 ± 207.766  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   33896.623 ± 103.352  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   34195.944 ± 230.703  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.039 ±   0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8   72000.166 ± 135.002  ops/ms
----------------------------

--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score     Error   Units
MathBench.powDouble                 0  thrpt    8  100738.866 ± 222.820  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  100799.098 ±  95.537  ops/ms   <-- 3.0x up
MathBench.powDouble0Dot5Const       0  thrpt    8  100765.571 ± 178.436  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.244 ±   0.002  ops/ms   <-- 6.3x up
MathBench.powDoubleLoop             0  thrpt    8       0.038 ±   0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8   71758.725 ± 339.660  ops/ms
----------------------------
  • MacOS/Intel
--------- Before -----------
Benchmark                      (seed)   Mode  Cnt       Score      Error   Units
MathBench.powDouble                 0  thrpt    8  238064.722 ± 5181.318  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8   59235.979 ± 2046.519  ops/ms
MathBench.powDouble0Dot5Const       0  thrpt    8   59695.014 ± 1079.692  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.040 ±    0.001  ops/ms
MathBench.powDoubleLoop             0  thrpt    8       0.041 ±    0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  238391.026 ± 2743.385  ops/ms
----------------------------

--------- After -----------
Benchmark                      (seed)   Mode  Cnt       Score       Error   Units
MathBench.powDouble                 0  thrpt    8  238582.414 ±  3661.261  ops/ms
MathBench.powDouble0Dot5            0  thrpt    8  224102.701 ±  2846.892  ops/ms   <-- 3.8x up
MathBench.powDouble0Dot5Const       0  thrpt    8  224542.331 ± 19027.596  ops/ms
MathBench.powDouble0Dot5Loop        0  thrpt    8       0.158 ±     0.002  ops/ms   <-- 4.0x up
MathBench.powDoubleLoop             0  thrpt    8       0.041 ±     0.001  ops/ms
StrictMathBench.powDouble         N/A  thrpt    8  233689.504 ± 10141.034  ops/ms
----------------------------
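
The "x up" annotations are ratios of the mean After and Before scores. The small sketch below recomputes them from the table values for anyone who wants to check the arithmetic; the class name is hypothetical and the ± error columns are ignored.

```java
public class PowSpeedups {

    // Speedup as the ratio of mean throughput scores (ops/ms), After / Before.
    // The error columns are not taken into account.
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        String[] rows = {
            "Linux/Intel powDouble0Dot5",
            "Linux/Intel powDouble0Dot5Loop",
            "Linux/AMD   powDouble0Dot5",
            "Linux/AMD   powDouble0Dot5Loop",
            "MacOS/Intel powDouble0Dot5",
            "MacOS/Intel powDouble0Dot5Loop",
        };
        // {Before, After} mean scores copied from the benchmark tables.
        double[][] scores = {
            {45498.351, 204966.834},
            {0.031, 0.121},
            {33896.623, 100799.098},
            {0.039, 0.244},
            {59235.979, 224102.701},
            {0.040, 0.158},
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-32s %.2fx%n", rows[i], speedup(scores[i][0], scores[i][1]));
        }
    }
}
```

The ratios come out between roughly 2.97x and 6.26x, which round to the 3.0x ~ 6.3x range quoted in the description.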

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3536/head:pull/3536
$ git checkout pull/3536

Update a local copy of the PR:
$ git checkout pull/3536
$ git pull https://git.openjdk.java.net/jdk pull/3536/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3536

View PR using the GUI difftool:
$ git pr show -t 3536

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3536.diff

@DamonFool (Member, Author) commented Apr 16, 2021

/test
/label add hotspot-compiler
/cc hotspot-compiler

@bridgekeeper (bot) commented Apr 16, 2021

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Apr 16, 2021

@DamonFool
The hotspot-compiler label was successfully added.

@openjdk (bot) commented Apr 16, 2021

@DamonFool The hotspot-compiler label was already applied.

@mlbridge (bot) commented Apr 16, 2021

@vnkozlov (Contributor) left a comment

Thank you for testing all modes.

Note: when StubRoutines::dpow() is not used (-XX:-UseLibmIntrinsic or -XX:DisableIntrinsic=_pow), we use C code which already has this optimization (and others):
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntimeTrans.cpp#L483
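
For comparison, the shortcut in that C fallback can be sketched in Java. The class below is a hypothetical illustration modeled on the y == 0.5 special case in the classic fdlibm __ieee754_pow, which appears to be the basis of the pow code in sharedRuntimeTrans.cpp; it is not a transcription of that file. It recognizes y == 0.5 bit-exactly through its 32-bit words (in the spirit of fdlibm's __HI/__LO macros) and takes sqrt(x) only when the high word of x is non-negative, i.e. the sign bit of x is clear.

```java
public class FdlibmStylePowHalf {

    // Upper and lower 32-bit words of a double, like fdlibm's __HI(x) / __LO(x).
    static int hi(double d) {
        return (int) (Double.doubleToRawLongBits(d) >>> 32);
    }

    static int lo(double d) {
        return (int) Double.doubleToRawLongBits(d);
    }

    // Hypothetical wrapper: use sqrt only when y is exactly 0.5 (0x3fe00000_00000000)
    // and x has a clear sign bit (+0.0, positive values, +Infinity, positive NaN).
    // Everything else goes to Math.pow, standing in for the general algorithm.
    static double pow(double x, double y) {
        if (lo(y) == 0 && hi(y) == 0x3fe00000 && hi(x) >= 0) {
            return Math.sqrt(x);
        }
        return Math.pow(x, y);
    }

    public static void main(String[] args) {
        System.out.println(pow(16.0, 0.5));                      // shortcut path
        System.out.println(pow(-0.0, 0.5));                      // sign bit set: general path
        System.out.println(pow(Double.NEGATIVE_INFINITY, 0.5));  // sign bit set: general path
        System.out.println(pow(2.0, -0.5));                      // y != 0.5: general path
    }
}
```

Because the guard looks at the sign bit, -0.0 and -Infinity fall through to the general path, which returns +0.0 and +Infinity for them as the Math.pow specification requires.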

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2016, Intel Corporation.
+ * Copyright (c) 2016, 2021, Intel Corporation.
@vnkozlov (Contributor), Apr 16, 2021

Please don't modify another company's copyright line. Add a new line for your company.

// pow(x, 0.5) isn't replaced with sqrt(x) for x < 0.0
if (a < 0.0) return;

double r1 = Math.sqrt(a);

@vnkozlov (Contributor), Apr 16, 2021

r1 should be static value to avoid compiling sqrt() method as intrinsic.

@DamonFool (Member, Author), Apr 16, 2021

> r1 should be static value to avoid compiling sqrt() method as intrinsic.

Hi @vnkozlov,

Thanks for your review.

The patch has been updated based on your comments.

To be honest, I don't see why r1 should be a static value.
I think both static and non-static should be OK for the test.

So what would happen if sqrt() is intrinsified?
Could you please clarify?

Thanks.
Best regards,
Jie

@vnkozlov (Contributor), Apr 16, 2021

Sorry, my bad. I thought to calculate sqrt(a) outside test method so that only pow() code is tested.
Like calculating gold value not in compiled code.
But the test have to use sqrt() for each value a and sqrt is 1 HW instruction so my suggestion was stupid.
Please, revert this change. Original test code was fine.

@DamonFool (Member, Author), Apr 16, 2021

> Sorry, my bad. I thought to calculate sqrt(a) outside test method so that only pow() code is tested.
> Like calculating gold value not in compiled code.
> But the test have to use sqrt() for each value a and sqrt is 1 HW instruction so my suggestion was stupid.
> Please, revert this change. Original test code was fine.

Thanks for your clarification.
I got your point.

Updated.
Thanks.
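
A self-contained version of the kind of check discussed in this thread might look like the sketch below. The class and method names are hypothetical. As in the test fragment under review, sqrt(a) is computed in the same method as pow(a, 0.5) (sqrt being a single hardware instruction, as noted above) and inputs with a < 0.0 are skipped. NaN results need an explicit isNaN check because NaN != NaN. Math.pow is only specified to within 1 ulp, so exact equality for finite inputs assumes a JDK where pow(x, 0.5) agrees with sqrt(x); the inputs are perfect squares and special values to keep that assumption mild.

```java
public class PowHalfCheck {

    // Returns true when pow(a, 0.5) agrees with sqrt(a), or when a is skipped.
    static boolean check(double a) {
        // pow(x, 0.5) isn't replaced with sqrt(x) for x < 0.0
        if (a < 0.0) {
            return true;
        }
        double r1 = Math.sqrt(a);
        double r2 = Math.pow(a, 0.5);
        if (Double.isNaN(r1) || Double.isNaN(r2)) {
            return Double.isNaN(r1) && Double.isNaN(r2);
        }
        // == treats +0.0 and -0.0 as equal, so the sign of zero is not checked here.
        return r1 == r2;
    }

    public static void main(String[] args) {
        // Perfect squares plus special values, so sqrt(a) is exact for finite inputs.
        double[] inputs = {0.0, -0.0, 0.25, 1.0, 2.25, 4.0, 1.0e6,
                           Double.POSITIVE_INFINITY, Double.NaN, -1.0};
        // Repeat many times so the JIT has a chance to compile check().
        for (int iter = 0; iter < 100_000; iter++) {
            for (double a : inputs) {
                if (!check(a)) {
                    throw new AssertionError("pow(a, 0.5) != sqrt(a) for a = " + a);
                }
            }
        }
        System.out.println("all checks passed");
    }
}
```

Since -0.0 < 0.0 is false, a = -0.0 is not skipped, and the == comparison cannot tell +0.0 from -0.0; catching a wrong sign there would take a separate assertion that Math.pow(-0.0, 0.5) has a positive sign.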

@vnkozlov (Contributor) left a comment

Good.

@openjdk (bot) commented Apr 17, 2021

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265325: Optimize StubRoutines::dpow() for Math.pow(x, 0.5)

Reviewed-by: kvn, neliasso

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 35 new commits pushed to the master branch:

  • d9e19f1: 8265226: (dc) API note in DatagramChannel.open should link to StandardProtocolFamily.UNIX
  • 49b9e68: 8262165: NMT report should state how many callsites had been skipped
  • e390e55: 8265066: Split ReservedSpace constructor to avoid default parameter
  • c607d12: 8249528: Remove obsolete comment in G1RootProcessor::process_java_roots
  • fa58aae: 8265245: depChecker_ don't have any functionalities
  • a2b0e0f: 8265323: Leftover local variables in PcDesc
  • 1ac25b8: 8264372: Threads::destroy_vm only ever returns true
  • 73d5f3b: 8265313: Obsolete the unused AssertOnSuspendWaitFailure and TraceSuspendWaitFailures flags
  • cb8394a: 8265304: Temporarily make Metal the default 2D rendering pipeline for macOS
  • 66f8987: 8265298: Hard VM crash when deadlock between "access" and higher ranked lock is detected
  • ... and 25 more: https://git.openjdk.java.net/jdk/compare/3423f3e1f5a2120e8f761a238c2929c44957760d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk (bot) added the ready label Apr 17, 2021

@DamonFool (Member, Author) commented Apr 19, 2021

> Good.

Thanks @vnkozlov.

May I get a second review for this change?
Or it's already OK to be pushed?
Thanks.

@neliasso (Contributor) commented Apr 19, 2021

> May I get a second review for this change?
> Or it's already OK to be pushed?

You need a second review for changes to Hotspot. I will provide it.

@neliasso (Contributor) left a comment

Approved.

@DamonFool (Member, Author) commented Apr 19, 2021

> Approved.

Thanks @neliasso.

@DamonFool (Member, Author) commented Apr 19, 2021

/integrate

@openjdk (bot) closed this Apr 19, 2021
@openjdk (bot) added integrated and removed ready rfr labels Apr 19, 2021

@openjdk (bot) commented Apr 19, 2021

@DamonFool Since your change was applied there have been 37 commits pushed to the master branch:

  • d1c8c9e: 8197811: Test java/awt/Choice/PopupPosTest/PopupPosTest.java fails on Windows
  • 7d01c98: 8265414: Variable assigned but not used in G1FreeHumongousRegionClosure
  • d9e19f1: 8265226: (dc) API note in DatagramChannel.open should link to StandardProtocolFamily.UNIX
  • 49b9e68: 8262165: NMT report should state how many callsites had been skipped
  • e390e55: 8265066: Split ReservedSpace constructor to avoid default parameter
  • c607d12: 8249528: Remove obsolete comment in G1RootProcessor::process_java_roots
  • fa58aae: 8265245: depChecker_ don't have any functionalities
  • a2b0e0f: 8265323: Leftover local variables in PcDesc
  • 1ac25b8: 8264372: Threads::destroy_vm only ever returns true
  • 73d5f3b: 8265313: Obsolete the unused AssertOnSuspendWaitFailure and TraceSuspendWaitFailures flags
  • ... and 27 more: https://git.openjdk.java.net/jdk/compare/3423f3e1f5a2120e8f761a238c2929c44957760d...master

Your commit was automatically rebased without conflicts.

Pushed as commit b64a3fb.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@DamonFool deleted the JDK-8265325 branch Apr 19, 2021
3 participants