8338694: x86_64 intrinsic for tanh using libm #20657

Closed
wants to merge 17 commits into from

vamsi-parasa
Contributor

@vamsi-parasa vamsi-parasa commented Aug 21, 2024

The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm.

| Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup |
|---|---|---|---|
| MathBench.tanhDouble | 70900 | 95618 | 1.35x |

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8338694: x86_64 intrinsic for tanh using libm (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657
$ git checkout pull/20657

Update a local copy of the PR:
$ git checkout pull/20657
$ git pull https://git.openjdk.org/jdk.git pull/20657/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20657

View PR using the GUI difftool:
$ git pr show -t 20657

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20657.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 21, 2024

👋 Welcome back vamsi-parasa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 21, 2024

@vamsi-parasa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8338694: x86_64 intrinsic for tanh using libm

Reviewed-by: kvn, jbhateja, sgibbons, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 122 new commits pushed to the master branch:

  • 2669e22: 8340793: Fix client builds after JDK-8337987
  • 85aed87: 8338405: JFR: Use FILE type for dcmds
  • caa751c: 8338546: Speed up ConstantPoolBuilder::classEntry(ClassDesc)
  • 279086d: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue
  • 3c97d24: 8340383: VM issues warning failure to find kernel32.dll on Windows nanoserver
  • 49d15ed: 8340657: [PPC64] SA determines wrong unextendedSP
  • e1c4d30: 8339299: C1 will miss type profile when inline final method
  • 3e673d9: 8340680: Fix typos in javax.lang.model.SourceVersion
  • 4cd8c75: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option
  • 4402482: 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers
  • ... and 112 more: https://git.openjdk.org/jdk/compare/418bb42b95b177f5f31f756054d0dd83740c6686...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@asgibbons, @sviswa7, @vnkozlov, @jatin-bhateja) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Aug 21, 2024

@vamsi-parasa The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Aug 21, 2024
@vamsi-parasa vamsi-parasa marked this pull request as ready for review August 26, 2024 15:34
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 26, 2024
@jddarcy
Member

jddarcy commented Aug 26, 2024

This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method.

__ movdqu(xmm3, ExternalAddress(pv + 16), r11 /*rscratch*/);
__ mulpd(xmm1, xmm1);
__ movdqu(xmm4, ExternalAddress(pv + 32), r11 /*rscratch*/);
__ mulpd(xmm2, xmm1);
Member

I would encourage you to either add detailed comments or give meaningful names to the registers to ease the review process.

Contributor

I agree, this is all rather obscure. Ideally, use the same names that are used wherever this comes from.

Where does the algorithm come from? What are its accuracy guarantees?

In addition, given the rarity of hyperbolic tangents in Java applications, do we need this?

Contributor Author

@theRealAph, this implementation is based on the Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments.

Contributor

Do you have a copy of this information? Should it be in the commit?

Member

@vamsi-parasa don't hesitate to add as much explicit information as possible about the original source from which the algorithm was taken, even though the PR explicitly mentions libm. Adding links to source references is good practice.

@jatin-bhateja This is based on Intel internal LIBM sources and so there is no public link available.

> Do you have a copy of this information? Should it be in the commit?

@theRealAph The standard (non-fast-mode) LIBM functions are accurate to within errors of < 1 ulp. LIBM is part of the Intel C++ compiler. The documentation can be found here: https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/programming-tradeoffs-floating-point-applications.html.

@vamsi-parasa
Contributor Author

> This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method.

Thank you, Joe, for the suggestion. I will add more tests. (This PR passes the tier-1 tanh tests in HyperbolicTests.java.)

@jddarcy
Member

jddarcy commented Aug 27, 2024

> > This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method.
>
> Thank you, Joe, for the suggestion. I will add more tests. (This PR passes the tier-1 tanh tests in HyperbolicTests.java.)

Yes, @vamsi-parasa; running that test is a good backstop, and it is written to be applicable to any implementation of {sinh, cosh, tanh} that meets the general quality-of-implementation criteria for java.lang.Math. To be explicit, the WorstCaseTests.java file, and for good measure all the java.lang.Math tests, should also be run for a change like this.

For a hypothetical example, if an intrinsic used different polynomials for different ranges of the input, it would be a reasonable regression test for that implementation to probe around the boundary of the transition between the polynomials to make sure the monotonicity requirements were being met.

That kind of check could be written to be generally applicable and be suitable for a regression test in java/lang/Math, or could be suitable for a regression test in the HotSpot area. HTH
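
A boundary probe of the kind described here could look roughly like the following. This is only a sketch: the class and method names are hypothetical, the boundary values 0.02 and 5.1 are illustrative placeholders rather than documented thresholds of the intrinsic, and the helper takes the function under test as a parameter so it can be pointed at any implementation.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, not part of the PR: walks across a suspected
// polynomial-transition boundary one ulp at a time.
final class BoundaryProbe {

    // Counts adjacent pairs (x, nextUp(x)) around `boundary` where f decreases.
    static int countMonotonicityViolations(DoubleUnaryOperator f, double boundary, int steps) {
        // Start `steps` representable values below the boundary.
        double x = boundary;
        for (int i = 0; i < steps; i++) {
            x = Math.nextDown(x);
        }
        int violations = 0;
        double prev = f.applyAsDouble(x);
        // Walk to `steps` representable values above the boundary.
        for (int i = 0; i < 2 * steps; i++) {
            x = Math.nextUp(x);
            double cur = f.applyAsDouble(x);
            if (cur < prev) {
                violations++;
            }
            prev = cur;
        }
        return violations;
    }

    public static void main(String[] args) {
        double[] boundaries = {0.02, 5.1}; // illustrative values only
        for (double b : boundaries) {
            int v = countMonotonicityViolations(Math::tanh, b, 1000);
            System.out.println("boundary " + b + ": " + v + " violations");
        }
    }
}
```

Because the function is passed in, the same probe could be run against Math.tanh, StrictMath.tanh, or any other candidate implementation.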

@mlbridge

mlbridge bot commented Aug 28, 2024

Mailing list message from Andrew Haley on core-libs-dev:

On 8/27/24 12:13, Jatin Bhateja wrote:

> Hi @vamsi-parasa, kindly also add a JMH microbenchmark. I did a first
> run and see around a 4% performance drop with the attached micro on Sapphire
> Rapids.
> [test.txt](https://github.com/user-attachments/files/16761142/test.txt)

If I had to guess, that's because there are no infinities and no NaNs
so HotSpot will perfectly predict .

Given that this "normal" execution is the expected use of tanh(), it
seems to me that this intrinsic won't help in the usual case, and may
make things worse.

@vamsi-parasa
Contributor Author

vamsi-parasa commented Aug 30, 2024

Hi Joe (@jddarcy) and Andrew (@theRealAph),

Please see the updates below:

> This PR doesn't include any additional tests. It is often appropriate to add more regression testing when introducing a new implementation of a method.

Added 1500 regression tests in HyperbolicTests.java which check the accuracy of the Math.tanh intrinsic against StrictMath.tanh (which calls FdLibm.Tanh.compute) as a reference. The tests pass within 2.5 ulps of the expected result. The tests are fairly exhaustive and also cover the boundary transitions.

> Yes, @vamsi-parasa; running that test is a good backstop, and it is written to be applicable to any implementation of {sinh, cosh, tanh} that meets the general quality-of-implementation criteria for java.lang.Math. To be explicit, the WorstCaseTests.java file, and for good measure all the java.lang.Math tests, should also be run for a change like this.

Ran WorstCaseTests.java and all the java.lang.Math tests; they pass on my local machine.

> For a hypothetical example, if an intrinsic used different polynomials for different ranges of the input, it would be a reasonable regression test for that implementation to probe around the boundary of the transition between the polynomials to make sure the monotonicity requirements were being met.

Added new tests in HyperbolicTests.java which probe around the various transition boundaries. All 1500 test cases passed within 2.5 ulps of the reference StrictMath.tanh.

> That kind of check could be written to be generally applicable and be suitable for a regression test in java/lang/Math, or could be suitable for a regression test in the HotSpot area. HTH

Please let me know if anything more needs to be added.

static int testTanhIntrinsicWithReference() {
double b1 = 0.02;
double b2 = 5.1;
double b3 = 55 * Math.log(2)/2; // ~19.062
Member

Probably better to use StrictMath.log here or, better yet, precompute the value as a constant and document its conceptual origin.
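
One way to follow this suggestion is sketched below. The class and constant names are hypothetical, and the explanatory comment is one reading of where 55 * ln(2) / 2 comes from, not something stated in the PR: for large x, 1 - tanh(x) is approximately 2 * exp(-2x), which equals 2^-54 at x = 55 * ln(2) / 2, and 2^-54 is half an ulp of the doubles just below 1.0.

```java
// Hypothetical holder for the boundary constant; not code from the PR.
final class TanhBoundary {

    // Conceptual origin (an interpretation, see the text above):
    // 1 - tanh(x) ~= 2 * exp(-2x), which drops to 2^-54 at x = 55 * ln(2) / 2.
    // Beyond this point the nearest double to tanh(x) is expected to be 1.0.
    // StrictMath.log is used so the value is the same on every platform.
    static final double B3 = 55 * StrictMath.log(2) / 2; // ~19.062

    public static void main(String[] args) {
        // Print the value so it can be pasted into a test as a literal.
        System.out.println("B3 = " + B3);
    }
}
```

Running main once and copying the printed value into the test as a literal, with the origin comment kept next to it, would give the precomputed form.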

Contributor Author

Please see the updated code which uses the precomputed value of b3.


for(int i = 0; i < testCases.length; i++) {
double testCase = testCases[i];
failures += testTanhWithReferenceUlpDiff(testCase, StrictMath.tanh(testCase), 2.5);
Member

The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error.

For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a larger error of approximately 2X the nominal error should be allowed (in case FDLIBM errs in one direction and the Math method errs in the other).

If the test is going to use randomness, then its jtreg tags should include

@key randomness

and it is preferable to use jdk.test.lib.RandomFactory to get a Random object since that handles printing out a key so the random sequence can be replicated if the test fails.
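
jdk.test.lib.RandomFactory is only available inside the JDK test library, so the sketch below uses plain java.util.Random as a stand-in to show the underlying idea: report the seed so that a failing run can be replayed. The class name, method name, and input range are hypothetical.

```java
import java.util.Random;

// Stand-in for the RandomFactory idea; not the JDK test library API.
final class SeededInputs {

    // Generates `count` doubles in [-range, range) from the given seed.
    static double[] generate(long seed, int count, double range) {
        Random rnd = new Random(seed);
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = (2 * rnd.nextDouble() - 1) * range;
        }
        return out;
    }

    public static void main(String[] args) {
        // Reuse a seed passed on the command line, otherwise pick a fresh one.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed = " + seed); // needed to replay a failure
        for (double x : generate(seed, 5, 20.0)) {
            System.out.println(x + " -> " + Math.tanh(x));
        }
    }
}
```

Passing the printed seed back as the first argument reproduces the same inputs.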

Contributor Author

> If the test is going to use randomness, then its jtreg tags should include
>
> @key randomness
>
> and it is preferable to use jdk.test.lib.RandomFactory to get a Random object since that handles printing out a key so the random sequence can be replicated if the test fails.

Please see the updated test, which uses @key randomness and jdk.test.lib.RandomFactory to get a Random object.

> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error.
> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a larger error of approximately 2X the nominal error should be allowed (in case FDLIBM errs in one direction and the Math method errs in the other).

So far the tests haven't failed with an error bound of 2.5 ulp. Would it be better to make it 5 ulp? Please let me know.
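
The reasoning behind the 2X allowance can be sketched as follows. If Math.tanh and StrictMath.tanh may each be off by up to 2.5 ulp from the exact result, and in opposite directions, the two can legitimately differ by up to about 5 ulp. The class and method names below are hypothetical; this is not the code in HyperbolicTests.java.

```java
// Hypothetical comparison helper illustrating the doubled tolerance.
final class UlpCompare {

    static final double NOMINAL_ULPS = 2.5; // allowed error of each implementation

    // Distance between actual and reference, measured in ulps of the reference.
    static double ulpDiff(double actual, double reference) {
        if (Double.compare(actual, reference) == 0) {
            return 0.0; // also covers NaN compared with NaN
        }
        return Math.abs(actual - reference) / Math.ulp(reference);
    }

    // Two implementations that are each within NOMINAL_ULPS of the exact
    // result can differ from one another by up to twice that amount.
    static boolean consistent(double x) {
        return ulpDiff(Math.tanh(x), StrictMath.tanh(x)) <= 2 * NOMINAL_ULPS;
    }

    public static void main(String[] args) {
        double[] inputs = {0.0, 0.02, 1.0, 5.1, 19.0, 25.0};
        for (double x : inputs) {
            System.out.println(x + ": consistent = " + consistent(x));
        }
    }
}
```

A check against a correctly rounded reference would not need the doubling, since only one side would carry an error.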

Member

So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written, the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criterion for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms).

If there was a correctly rounded tanh to compare against, then this style of testing would be valid.

Are there any plans to intrinsify sinh or cosh?

I think that, instead of random inputs, we should generate additional correctly rounded fixed test points offline, using a high-precision arithmetic library, to cover the new algorithm, and then simply extend HyperbolicTests.java with these new fixed test points using the existing ulp testing mechanism in the test.
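
A table-driven check of this kind might look like the sketch below. In practice the reference values would come from an offline high-precision computation; the few entries here are only the special cases plus the well-known value tanh(1) ≈ 0.7615941559557649, chosen for illustration. The class and method names are hypothetical and this is not the table from HyperbolicTests.java.

```java
// Hypothetical fixed-point table test; entries are illustrative only.
final class FixedPointTanhCheck {

    // Each row is {input, expected tanh(input)}.
    static final double[][] CASES = {
        {0.0, 0.0},
        {1.0, 0.7615941559557649},
        {-1.0, -0.7615941559557649},
        {Double.POSITIVE_INFINITY, 1.0},
        {Double.NEGATIVE_INFINITY, -1.0},
    };

    // Returns how many rows differ from Math.tanh by more than `ulps`.
    static int countFailures(double ulps) {
        int failures = 0;
        for (double[] c : CASES) {
            double actual = Math.tanh(c[0]);
            double err = Math.abs(actual - c[1]) / Math.ulp(c[1]);
            if (!(err <= ulps)) { // written this way so a NaN error counts as a failure
                failures++;
                System.err.println("tanh(" + c[0] + ") = " + actual + ", expected " + c[1]);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(2.5));
    }
}
```

Since the expected values are fixed, the test needs no randomness key and is valid for any Math.tanh implementation that meets the 2.5 ulp bound.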

Contributor Author

Thank you, Sandhya (@sviswa7), for the suggestion! I will update the existing HyperbolicTests.java with new tests at fixed points that use quad-precision reference values.

Contributor Author

Hi Joe (@jddarcy),

As suggested by Sandhya (@sviswa7), I added ~750 tests at fixed points for tanh in TanhTests.java, with reference values computed by the quad-precision tanh implementation in the libquadmath library from GCC.

Please let me know if this looks good.

@vamsi-parasa I think the best way to do this is to add the additional test points to HyperbolicTests.java itself, in the testcases array of the testTanh() method. We should remove all the other changes from HyperbolicTests.java. There is also no need for a separate TanhTests.java file.

Contributor Author

Hi Sandhya (@sviswa7), please see the updated code in HyperbolicTests.java, which replaces the previous random-based tests with the new tests at fixed points. I also removed TanhTests.java.

@@ -807,7 +807,7 @@ void LIRGenerator::do_MathIntrinsic(Intrinsic* x) {
     if (x->id() == vmIntrinsics::_dexp || x->id() == vmIntrinsics::_dlog ||
         x->id() == vmIntrinsics::_dpow || x->id() == vmIntrinsics::_dcos ||
         x->id() == vmIntrinsics::_dsin || x->id() == vmIntrinsics::_dtan ||
-        x->id() == vmIntrinsics::_dlog10) {
+        x->id() == vmIntrinsics::_dlog10 || x->id() == vmIntrinsics::_dtanh) {

The tanh case needs to be under #ifdef _LP64, as we are generating the stub only for 64-bit.

Contributor Author

Please see the newly added #ifdef in the updated code.

vamsi-parasa and others added 2 commits September 13, 2024 15:26
Contributor

@asgibbons asgibbons left a comment

I hand-verified the code.

@sviswa7 sviswa7 left a comment

The PR looks good to me.

@openjdk

openjdk bot commented Sep 20, 2024

⚠️ @vamsi-parasa the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout onetanh
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 20, 2024
@vamsi-parasa
Contributor Author

Hello Vladimir (@vnkozlov),

Could you please run the tests for this PR and let us know?
We're hoping to integrate this PR soon.

Thanks,
Vamsi

Contributor

@vnkozlov vnkozlov left a comment

Looks good. I have only one nitpick. I will start testing.

@@ -167,6 +167,9 @@ bool Compiler::is_intrinsic_supported(vmIntrinsics::ID id) {
case vmIntrinsics::_dsin:
case vmIntrinsics::_dcos:
case vmIntrinsics::_dtan:
#if defined(X86)
Contributor

Use #ifdef AMD64 for x64 only

Contributor Author

Thanks Vladimir! Please see the code updated with #ifdef AMD64.

@vamsi-parasa
Contributor Author

> Looks good. I have only one nitpick. I will start testing.

Thank you, Vladimir!

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 23, 2024
Contributor

@vnkozlov vnkozlov left a comment

My testing passed.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 24, 2024
@vamsi-parasa
Contributor Author

> My testing passed.

Thank you, Vladimir!

@vamsi-parasa
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 24, 2024
@openjdk

openjdk bot commented Sep 24, 2024

@vamsi-parasa
Your change (at version 4dc2e36) is now ready to be sponsored by a Committer.

@jatin-bhateja
Member

/sponsor

@openjdk

openjdk bot commented Sep 24, 2024

Going to push as commit 212e329.
Since your change was applied there have been 122 commits pushed to the master branch:

  • 2669e22: 8340793: Fix client builds after JDK-8337987
  • 85aed87: 8338405: JFR: Use FILE type for dcmds
  • caa751c: 8338546: Speed up ConstantPoolBuilder::classEntry(ClassDesc)
  • 279086d: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue
  • 3c97d24: 8340383: VM issues warning failure to find kernel32.dll on Windows nanoserver
  • 49d15ed: 8340657: [PPC64] SA determines wrong unextendedSP
  • e1c4d30: 8339299: C1 will miss type profile when inline final method
  • 3e673d9: 8340680: Fix typos in javax.lang.model.SourceVersion
  • 4cd8c75: 8340398: [JVMCI] Unintuitive behavior of UseJVMCICompiler option
  • 4402482: 8340585: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement.java fails with -XX:-UseCompressedClassPointers
  • ... and 112 more: https://git.openjdk.org/jdk/compare/418bb42b95b177f5f31f756054d0dd83740c6686...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 24, 2024
@openjdk openjdk bot closed this Sep 24, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 24, 2024
@openjdk

openjdk bot commented Sep 24, 2024

@jatin-bhateja @vamsi-parasa Pushed as commit 212e329.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

//
// Special cases:
// tanh(NaN) = quiet NaN, and raise invalid exception
// tanh(INF) = that INF
Contributor

This should be

tanh(POSITIVE_INFINITY) = +1.0
tanh(NEGATIVE_INFINITY) = -1.0
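
The values in this correction match the documented special cases of java.lang.Math.tanh, and they can be checked directly. The class name below is hypothetical, and the sketch exercises whichever Math.tanh implementation the running JVM provides.

```java
// Hypothetical check of the documented special cases of Math.tanh.
final class TanhSpecialCases {

    static boolean holds() {
        return Math.tanh(Double.POSITIVE_INFINITY) == 1.0
            && Math.tanh(Double.NEGATIVE_INFINITY) == -1.0
            && Double.isNaN(Math.tanh(Double.NaN))
            // A zero argument keeps its sign, so compare bit patterns.
            && Double.doubleToRawLongBits(Math.tanh(-0.0)) == Double.doubleToRawLongBits(-0.0)
            && Double.doubleToRawLongBits(Math.tanh(0.0)) == Double.doubleToRawLongBits(0.0);
    }

    public static void main(String[] args) {
        System.out.println("special cases hold: " + holds());
    }
}
```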

Labels
core-libs core-libs-dev@openjdk.org graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
9 participants