Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8265491: Math Signum optimization for x86 #3581

Closed
wants to merge 16 commits into from

Conversation

mgkwill
Copy link
Contributor

@mgkwill mgkwill commented Apr 19, 2021

x86 Math.Signum() uses two floating point compares and a copy sign operation involving data movement to gpr and XMM.

We can optimize to one floating point compare and sign computation in XMM. We observe ~25% performance improvement with this optimization.

Base:

Benchmark                       Mode Cnt Score Error Units
Signum._1_signumFloatTest avgt 5 4.660 ? 0.040 ns/op
Signum._2_overheadFloat avgt 5 3.314 ? 0.023 ns/op
Signum._3_signumDoubleTest avgt 5 4.809 ? 0.043 ns/op
Signum._4_overheadDouble avgt 5 3.313 ? 0.015 ns/op

Optimized:
signum intrinsic patch

Benchmark                       Mode  Cnt  Score   Error  Units
Signum._1_signumFloatTest       avgt    5  3.769 ? 0.015  ns/op
Signum._2_overheadFloat         avgt    5  3.312 ? 0.025  ns/op
Signum._3_signumDoubleTest      avgt    5  3.765 ? 0.005  ns/op
Signum._4_overheadDouble        avgt    5  3.309 ? 0.010  ns/op

Signed-off-by: Marcus G K Williams marcus.williams@intel.com


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3581/head:pull/3581
$ git checkout pull/3581

Update a local copy of the PR:
$ git checkout pull/3581
$ git pull https://git.openjdk.java.net/jdk pull/3581/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3581

View PR using the GUI difftool:
$ git pr show -t 3581

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3581.diff

Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@bridgekeeper
Copy link

bridgekeeper bot commented Apr 19, 2021

👋 Welcome back mgkwill! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 19, 2021
@openjdk
Copy link

openjdk bot commented Apr 19, 2021

@mgkwill The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Apr 19, 2021
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 19, 2021

/label hotspot-compiler

@mlbridge
Copy link

mlbridge bot commented Apr 19, 2021

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 19, 2021
@openjdk
Copy link

openjdk bot commented Apr 19, 2021

@mgkwill
The hotspot-compiler label was successfully added.

@jatin-bhateja
Copy link
Member

@mgkwill, your newly added benchmark has 4 micro worklets, please publish the results for all of them.

@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 20, 2021

@mgkwill, your newly added benchmark has 4 micro worklets, please publish the results for all of them.

@jatin-bhateja Certainly. Omission was an oversight. I've updated the description of the pull request with overhead results.

Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
mgkwill added 5 commits April 21, 2021 10:33
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 21, 2021

Updated benchmarks from patch updated from comments of @DamonFool:

Benchmark                       Mode  Cnt  Score   Error  Units
Signum._1_signumFloatTest       avgt    5  3.769 ? 0.015  ns/op
Signum._2_overheadFloat         avgt    5  3.312 ? 0.025  ns/op
Signum._3_signumDoubleTest      avgt    5  3.765 ? 0.005  ns/op
Signum._4_overheadDouble        avgt    5  3.309 ? 0.010  ns/op

Updated Summary.

mgkwill added 2 commits April 22, 2021 11:07
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 22, 2021

  • Switched to 3 operand xorps/d on colleague suggestion to insure I don't introduce a hard to find register bug.

  • Implemented @DamonFool suggestion for Default Enable of UseSignumIntrinsic on x86. Slightly worried about !has_match_rule(opcode) for aarch64. But as long as it doesn't hit this I think it will work.

src/hotspot/cpu/aarch64/aarch64.ad:

const bool Matcher::match_rule_supported(int opcode) {
  if (!has_match_rule(opcode))
    return false;

Also moved matcher checks in src/hotspot/cpu/x86/x86.ad outside of

#ifndef _LP64
#endif // !LP64

Thanks for the review @DamonFool @dholmes-ora @jatin-bhateja.

Updated Performance Numbers are essentially unchanged:

Benchmark                       Mode  Cnt  Score   Error  Units
Signum._1_signumFloatTest       avgt    5  3.770 ? 0.004  ns/op
Signum._2_overheadFloat         avgt    5  3.349 ? 0.013  ns/op
Signum._3_signumDoubleTest      avgt    5  3.773 ? 0.080  ns/op
Signum._4_overheadDouble        avgt    5  3.312 ? 0.007  ns/op

@DamonFool
Copy link
Member

Looks good to me.
Thanks for your update.

Please make sure the copyright year has been updated (TestSignumIntrinsic.java?) and all the imports are actually needed (import org.openjdk.jmh.annotations.Setup; ?).

mgkwill added 2 commits April 23, 2021 10:03
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 23, 2021

Looks good to me.
Thanks for your update.

Please make sure the copyright year has been updated (TestSignumIntrinsic.java?) and all the imports are actually needed (import org.openjdk.jmh.annotations.Setup; ?).

Thanks @DamonFool. Updated with your suggestions.

Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 27, 2021

Thanks for the review @sviswa7.

I've updated to StubRoutines::x86::float_sign_flip() & StubRoutines::x86::double_sign_flip() as I was using when the code was in the src/hotspot/cpu/x86/x86.ad. If I understood your comment correctly this should use 128 bit aligned.

I've added comments as suggested.

Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@sviswa7
Copy link

sviswa7 commented Apr 27, 2021

@mgkwill I was wrong. It looks like the CodeEntryAlignment is set to 16 bytes i.e. 128 bits as minimum (globals_x86.hpp). So please continue to use the vector_float_sign_flip and vector_double_sign_flip masks. Xorps and xorpd are vector instructions and we should use the vector masks here.

Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 28, 2021

@mgkwill I was wrong. It looks like the CodeEntryAlignment is set to 16 bytes i.e. 128 bits as minimum (globals_x86.hpp). So please continue to use the vector_float_sign_flip and vector_double_sign_flip masks. Xorps and xorpd are vector instructions and we should use the vector masks here.

Thanks @sviswa7. I've reverted to StubRoutines::x86::vector_float_sign_flip() StubRoutines::x86::vector_double_sign_flip()

@sviswa7
Copy link

sviswa7 commented Apr 28, 2021

Looks good to me. Thanks.

Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

mgkwill commented Apr 28, 2021

Thanks for your review @jatin-bhateja. I've implemented your suggested changes.

@jatin-bhateja
Copy link
Member

Thanks for your review @jatin-bhateja. I've implemented your suggested changes.

Thanks Marcus!

@sviswa7
Copy link

sviswa7 commented Apr 29, 2021

@vnkozlov @neliasso Could you please help review this small patch from Marcus.

Copy link

@neliasso neliasso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@openjdk
Copy link

openjdk bot commented Apr 29, 2021

@mgkwill This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265491: Math Signum optimization for x86

Reviewed-by: jiefu, jbhateja, neliasso

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 256 new commits pushed to the master branch:

  • 55cc0af: 8266185: Shenandoah: Fix incorrect comment/assertion messages
  • 880c138: 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion.
  • 001c514: 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate
  • 194bcec: 8265984: Concurrent GC: Some tests fail "assert(is_frame_safe(f)) failed: Frame must be safe"
  • 1d9ea3a: 8266083: Shenandoah: Consolidate dedup/no dedup oop closures
  • 80941f4: 8234446: Post-CMS workgroup hierarchy cleanup
  • ac760c7: 8266295: Remove unused _concurrent_iteration_safe_limit
  • b42d496: 8266388: C2: Improve constant ShiftCntV on x86
  • 05cfac9: 8266412: Remove redundant TemplateInterpreter entries
  • c5dc657: 8266056: runtime/stringtable/StringTableCleaningTest.java failed with "RuntimeException: Missing Callback in [10, 11]"
  • ... and 246 more: https://git.openjdk.java.net/jdk/compare/713483c77d3007309a3e019218f1ad26453e7c28...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@DamonFool, @jatin-bhateja, @neliasso) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 29, 2021
@sviswa7
Copy link

sviswa7 commented Apr 29, 2021

@neliasso Thanks a lot for the review.

@mgkwill
Copy link
Contributor Author

mgkwill commented May 3, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 3, 2021
@openjdk
Copy link

openjdk bot commented May 3, 2021

@mgkwill
Your change (at version a3fae39) is now ready to be sponsored by a Committer.

@sviswa7
Copy link

sviswa7 commented May 3, 2021

/sponsor

@openjdk openjdk bot closed this May 3, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 3, 2021
@openjdk
Copy link

openjdk bot commented May 3, 2021

@sviswa7 @mgkwill Since your change was applied there have been 256 commits pushed to the master branch:

  • 55cc0af: 8266185: Shenandoah: Fix incorrect comment/assertion messages
  • 880c138: 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion.
  • 001c514: 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate
  • 194bcec: 8265984: Concurrent GC: Some tests fail "assert(is_frame_safe(f)) failed: Frame must be safe"
  • 1d9ea3a: 8266083: Shenandoah: Consolidate dedup/no dedup oop closures
  • 80941f4: 8234446: Post-CMS workgroup hierarchy cleanup
  • ac760c7: 8266295: Remove unused _concurrent_iteration_safe_limit
  • b42d496: 8266388: C2: Improve constant ShiftCntV on x86
  • 05cfac9: 8266412: Remove redundant TemplateInterpreter entries
  • c5dc657: 8266056: runtime/stringtable/StringTableCleaningTest.java failed with "RuntimeException: Missing Callback in [10, 11]"
  • ... and 246 more: https://git.openjdk.java.net/jdk/compare/713483c77d3007309a3e019218f1ad26453e7c28...master

Your commit was automatically rebased without conflicts.

Pushed as commit ff65920.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mgkwill mgkwill deleted the 8265491_signum_intrinsic branch May 3, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

7 participants