Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8285973: x86_64: Improve fp comparison and cmove for eq/ne #8525

Closed
wants to merge 9 commits into from

Conversation

merykitty
Copy link
Contributor

@merykitty merykitty commented May 4, 2022

Hi,

This patch optimises the matching rules for floating-point comparison with respects to eq/ne on x86-64

1, When the inputs of a comparison is the same (i.e isNaN patterns), ZF is always set, so we don't need cmpOpUCF2 for the eq/ne cases, which improves the sequence of If (CmpF x x) (Bool ne) from

ucomiss xmm0, xmm0
jp      label
jne     label

into

ucomiss xmm0, xmm0
jp      label

2, The move rules for cmpOpUCF2 is missing, which makes patterns such as x == y ? 1 : 0 to fall back to cmpOpU, which have a really high cost of fixing the flags, such as

    xorl    ecx, ecx
    ucomiss xmm0, xmm1
    jnp     done
    pushf
    andq    [rsp], 0xffffff2b
    popf
done:
    movl    eax, 1
    cmovel  eax, ecx

The patch changes this sequence into

xorl    ecx, ecx
ucomiss xmm0, xmm1
movl    eax, 1
cmovpl  eax, ecx
cmovnel eax, ecx

3, The patch also changes the pattern of isInfinite to be more optimised by using Math.abs to reduce 1 comparison and compares the result with MAX_VALUE since > is more optimised than == for floating-point types.

The benchmark results are as follow:

                                                 Before              After
Benchmark                      Mode  Cnt     Score     Error    Score     Error   Unit   Ratio
FPComparison.equalDouble       avgt    5  2876.242 ±  58.875  594.636 ±   8.922  ns/op    4.84
FPComparison.equalFloat        avgt    5  3062.430 ±  31.371  663.849 ±   3.656  ns/op    4.61
FPComparison.isFiniteDouble    avgt    5   475.749 ±  19.027  518.309 ± 107.352  ns/op    0.92
FPComparison.isFiniteFloat     avgt    5   506.525 ±  14.417  515.576 ±  14.669  ns/op    0.98
FPComparison.isInfiniteDouble  avgt    5  1232.800 ±  31.677  621.185 ±  11.935  ns/op    1.98
FPComparison.isInfiniteFloat   avgt    5  1234.708 ±  70.239  623.566 ±  15.206  ns/op    1.98
FPComparison.isNanDouble       avgt    5  2255.847 ±   7.238  400.124 ±   0.762  ns/op    5.64
FPComparison.isNanFloat        avgt    5  2567.044 ±  36.078  546.486 ±   1.509  ns/op    4.70

Thank you very much.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (1 review required, with at least 1 reviewer)

Issue

  • JDK-8285973: x86_64: Improve fp comparison and cmove for eq/ne

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8525/head:pull/8525
$ git checkout pull/8525

Update a local copy of the PR:
$ git checkout pull/8525
$ git pull https://git.openjdk.java.net/jdk pull/8525/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8525

View PR using the GUI difftool:
$ git pr show -t 8525

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8525.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 4, 2022

👋 Welcome back merykitty! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label May 4, 2022
@openjdk
Copy link

openjdk bot commented May 4, 2022

@merykitty The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler core-libs labels May 4, 2022
@mlbridge
Copy link

mlbridge bot commented May 4, 2022

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Thank you for including tests. But you need additional other float compare (not just ea ne) tests since you removed some of Cmp instructions.

You need approval from core libs for Java methods changes (it affects all platforms). And we will get intrinsics for them soon (I think): #8459.

Please add comments to cmpOpU* operands explaining changes in them.

You did not explain removal of some float compare instructions.

src/hotspot/cpu/x86/x86_64.ad Show resolved Hide resolved
Copy link
Member

@PaulSandoz PaulSandoz left a comment

The changes to Float and Double look good. I don't think we need additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.

At first i thought we no longer need PR #8459 but it seems both PRs are complimentary, albeit PR #8459 has more modest performance gains for the intrinsics.

@vnkozlov
Copy link
Contributor

vnkozlov commented May 4, 2022

The changes to Float and Double look good. I don't think we need additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.

Thank you, Paul for pointing the test. It means we need to run tier4 (which runs these tests with -Xcomp) to make sure methods are compiled by C2.

@merykitty
Copy link
Contributor Author

merykitty commented May 18, 2022

I have reverted the changes to java.lang.Float and java.lang.Double to not interfere with the intrinsic PR. More tests are added to cover all cases regarding floating-point comparison of compiled code.

The rules for fp comparison that output the result to rFlagRegsU are expensive and should be avoided. As a result, I removed the shortcut rules with memory or constant operands to reduce the number of match rules. Only the basic rules are kept.

Thanks.

@vamsi-parasa
Copy link
Contributor

vamsi-parasa commented May 19, 2022

Could you pls show the performance delta with the baseline and after the patch? Otherwise, people reviewing this PR have to manually compute how much improvement is obtained. For example, FPComparison.isNanFloat is showing 4.7x improvement. Kindly fill the delta column for rest of the data points.

Instead of two separate tables, the suggested table format for each row would be:
<benchmark> , <baseline throughput> , <this patch throughput>, <performance delta in % or x>

@sviswa7
Copy link

sviswa7 commented May 20, 2022

@merykitty Very nice work! The patch looks good to me.

1 similar comment
@sviswa7
Copy link

sviswa7 commented May 20, 2022

@merykitty Very nice work! The patch looks good to me.

@merykitty
Copy link
Contributor Author

merykitty commented May 21, 2022

@vnkozlov I have added comments to describe the changes in cmpOpUCF and the reasons behind the cmov_regUCF2_eq match rules. Using expand broke the build with Syntax Error: :For expand in cmovI_regUCF2_eq to work, parameter declaration order in cmovI_regUCF2_ne must follow matchrule.

@sviswa7 (x != y) ? a : b can be calculated using pseudocode as follow:

res = (!ZF || PF) ? a : b
    = !ZF ? a : (PF ? a : b)

which can be calculated using

cmovp  rb, ra // rb1 = PF ? ra : rb
cmovne rb, ra // rb2 = !ZF ? ra : rb1 = !ZF ? ra : (PF ? ra : rb)

Furthermore, since (x == y) == !(x != y), we have ((x == y) ? a : b) == ((x != y) ? b : a), which explains the implementation of cmov_regUCF2_eq.

@vamsi-parasa Thanks a lot for your suggestion, I have modified the PR description as you say.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Thank you for trying my suggestion and new comments. Approved.

@vnkozlov
Copy link
Contributor

vnkozlov commented May 21, 2022

I started our testing. Please, wait results.

@openjdk
Copy link

openjdk bot commented May 21, 2022

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8285973: x86_64: Improve fp comparison and cmove for eq/ne

Reviewed-by: kvn, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 82 new commits pushed to the master branch:

  • 782ae38: 8286262: Windows: Cleanup deprecation warning suppression
  • 9f73fb5: 8225013: sanity/client/SwingSet/src/ScrollPaneDemoTest.java fails on Linux
  • 02fec1e: 8287155: Additional make typos
  • 5b7d066: 8287109: Distrust.java failed with CertificateExpiredException
  • 6a19220: 8286832: JavaDoc pages call browser history API too often
  • 9df93a1: 8286887: Remove logging from search.js
  • 646c8aa: 8286277: CDS VerifyError when calling clone() on object array
  • ef7a9f8: 8279185: Support for IsoFields in JapaneseDate/MinguoDate/ThaiBuddhistDate
  • ac274c4: 8286956: Loom: Define test groups for development/porting use
  • 5d8d6da: 8286972: Support the new loop induction variable related PopulateIndex IR node on x86
  • ... and 72 more: https://git.openjdk.java.net/jdk/compare/72bd41b844e03da4bcb19c2cb38d96975a9ebceb...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @PaulSandoz, @sviswa7) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label May 21, 2022
@vnkozlov
Copy link
Contributor

vnkozlov commented May 21, 2022

I started our testing. Please, wait results.

Testing passed clean.

@merykitty
Copy link
Contributor Author

merykitty commented May 23, 2022

Thank you very much for your reviews and testing.
/integrate

@openjdk openjdk bot added the sponsor label May 23, 2022
@openjdk
Copy link

openjdk bot commented May 23, 2022

@merykitty
Your change (at version 7fcfe4a) is now ready to be sponsored by a Committer.

@sviswa7
Copy link

sviswa7 commented May 24, 2022

/integrate

@openjdk
Copy link

openjdk bot commented May 24, 2022

@sviswa7 Only the author (@merykitty) is allowed to issue the integrate command. As this pull request is ready to be sponsored, and you are an eligible sponsor, did you mean to issue the /sponsor command?

@sviswa7
Copy link

sviswa7 commented May 24, 2022

/sponsor

@openjdk
Copy link

openjdk bot commented May 24, 2022

Going to push as commit c1db70d.
Since your change was applied there have been 82 commits pushed to the master branch:

  • 782ae38: 8286262: Windows: Cleanup deprecation warning suppression
  • 9f73fb5: 8225013: sanity/client/SwingSet/src/ScrollPaneDemoTest.java fails on Linux
  • 02fec1e: 8287155: Additional make typos
  • 5b7d066: 8287109: Distrust.java failed with CertificateExpiredException
  • 6a19220: 8286832: JavaDoc pages call browser history API too often
  • 9df93a1: 8286887: Remove logging from search.js
  • 646c8aa: 8286277: CDS VerifyError when calling clone() on object array
  • ef7a9f8: 8279185: Support for IsoFields in JapaneseDate/MinguoDate/ThaiBuddhistDate
  • ac274c4: 8286956: Loom: Define test groups for development/porting use
  • 5d8d6da: 8286972: Support the new loop induction variable related PopulateIndex IR node on x86
  • ... and 72 more: https://git.openjdk.java.net/jdk/compare/72bd41b844e03da4bcb19c2cb38d96975a9ebceb...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label May 24, 2022
@openjdk openjdk bot closed this May 24, 2022
@openjdk openjdk bot removed ready rfr sponsor labels May 24, 2022
@openjdk
Copy link

openjdk bot commented May 24, 2022

@sviswa7 @merykitty Pushed as commit c1db70d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs hotspot-compiler integrated
5 participants