Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8297359: RISC-V: improve performance of floating Max Min intrinsics #11276

Closed
wants to merge 3 commits into from

Conversation

VladimirKempik
Copy link

@VladimirKempik VladimirKempik commented Nov 21, 2022

Please review this change.

It improves performance of Math.min/max intrinsics for Floats and Doubles.

The main issue in these intrinsics is the requirement to return NaN if any of arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers are NaN ( quiet NaN).
That requires additional logic to handle the case where only of of src is NaN.

Here the postcheck with flt (floating less than comparision) and flags analysis replaced with precheck. The precheck is done with fadd-ing srcs into dst and checking the dst for NaN ( with fclass).

The results on the thead c910:

The results, thead c910:

before

Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 54023.827 ± 268.645 ns/op
FpMinMaxIntrinsics.dMin avgt 25 54309.850 ± 323.551 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 42192.140 ± 12.114 ns/op
FpMinMaxIntrinsics.fMax avgt 25 53797.657 ± 15.816 ns/op
FpMinMaxIntrinsics.fMin avgt 25 54135.710 ± 313.185 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 42196.156 ± 13.424 ns/op
MaxMinOptimizeTest.dAdd avgt 25 650.810 ± 169.998 us/op
MaxMinOptimizeTest.dMax avgt 25 4561.967 ± 40.367 us/op
MaxMinOptimizeTest.dMin avgt 25 4589.100 ± 75.854 us/op
MaxMinOptimizeTest.dMul avgt 25 759.821 ± 240.092 us/op
MaxMinOptimizeTest.fAdd avgt 25 300.137 ± 13.495 us/op
MaxMinOptimizeTest.fMax avgt 25 4348.885 ± 20.061 us/op
MaxMinOptimizeTest.fMin avgt 25 4372.799 ± 27.296 us/op
MaxMinOptimizeTest.fMul avgt 25 304.024 ± 12.120 us/op

after

Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 10545.196 ± 140.137 ns/op
FpMinMaxIntrinsics.dMin avgt 25 10454.525 ± 9.972 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 3104.703 ± 0.892 ns/op
FpMinMaxIntrinsics.fMax avgt 25 10449.709 ± 7.284 ns/op
FpMinMaxIntrinsics.fMin avgt 25 10445.261 ± 7.206 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 3104.769 ± 0.951 ns/op
MaxMinOptimizeTest.dAdd avgt 25 487.769 ± 170.711 us/op
MaxMinOptimizeTest.dMax avgt 25 929.394 ± 158.697 us/op
MaxMinOptimizeTest.dMin avgt 25 864.230 ± 284.794 us/op
MaxMinOptimizeTest.dMul avgt 25 894.116 ± 342.550 us/op
MaxMinOptimizeTest.fAdd avgt 25 284.664 ± 1.446 us/op
MaxMinOptimizeTest.fMax avgt 25 384.388 ± 15.004 us/op
MaxMinOptimizeTest.fMin avgt 25 371.952 ± 15.295 us/op
MaxMinOptimizeTest.fMul avgt 25 305.226 ± 12.467 us/op

significant improvement

On hifive u74 ( unmatched) the improvements is less significant:

hifive:

before
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 30219.666 ± 12.878 ns/op
FpMinMaxIntrinsics.dMin avgt 25 30242.249 ± 31.374 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 15394.622 ± 2.803 ns/op
FpMinMaxIntrinsics.fMax avgt 25 30150.114 ± 22.421 ns/op
FpMinMaxIntrinsics.fMin avgt 25 30149.752 ± 20.813 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 15396.402 ± 4.251 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1143.582 ± 4.444 us/op
MaxMinOptimizeTest.dMax avgt 25 2556.317 ± 3.795 us/op
MaxMinOptimizeTest.dMin avgt 25 2556.569 ± 2.274 us/op
MaxMinOptimizeTest.dMul avgt 25 1142.769 ± 1.593 us/op
MaxMinOptimizeTest.fAdd avgt 25 748.688 ± 7.342 us/op
MaxMinOptimizeTest.fMax avgt 25 2280.381 ± 1.535 us/op
MaxMinOptimizeTest.fMin avgt 25 2280.760 ± 1.532 us/op
MaxMinOptimizeTest.fMul avgt 25 748.991 ± 7.261 us/op

after:

Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 27723.791 ± 22.784 ns/op
FpMinMaxIntrinsics.dMin avgt 25 27760.799 ± 45.411 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 12875.949 ± 2.829 ns/op
FpMinMaxIntrinsics.fMax avgt 25 25992.753 ± 23.788 ns/op
FpMinMaxIntrinsics.fMin avgt 25 25994.554 ± 32.060 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 11200.737 ± 2.169 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1144.128 ± 4.371 us/op
MaxMinOptimizeTest.dMax avgt 25 1968.145 ± 2.346 us/op
MaxMinOptimizeTest.dMin avgt 25 1970.249 ± 4.712 us/op
MaxMinOptimizeTest.dMul avgt 25 1143.356 ± 2.203 us/op
MaxMinOptimizeTest.fAdd avgt 25 748.634 ± 7.229 us/op
MaxMinOptimizeTest.fMax avgt 25 1523.719 ± 0.570 us/op
MaxMinOptimizeTest.fMin avgt 25 1524.534 ± 1.109 us/op
MaxMinOptimizeTest.fMul avgt 25 748.643 ± 7.291 us/op

fAdd/dAdd and fMul/dMull is unaffected likely due to :

private double dAddBench(double a, double b) {
return Math.max(a, b) + Math.min(a, b);
}

private double dMulBench(double a, double b) {
return Math.max(a, b) * Math.min(a, b);
}
may get reduces to just a + b and a*b respectively without actually using min/max

Testing : tier1/tier2 in progress, will update this as soon as it finishes


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8297359: RISC-V: improve performance of floating Max Min intrinsics

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11276/head:pull/11276
$ git checkout pull/11276

Update a local copy of the PR:
$ git checkout pull/11276
$ git pull https://git.openjdk.org/jdk pull/11276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11276

View PR using the GUI difftool:
$ git pr show -t 11276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11276.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 21, 2022

👋 Welcome back vkempik! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 21, 2022
@openjdk
Copy link

openjdk bot commented Nov 21, 2022

@VladimirKempik The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 21, 2022
@mlbridge
Copy link

mlbridge bot commented Nov 21, 2022

Webrevs

@VladimirKempik
Copy link
Author

VladimirKempik commented Nov 22, 2022

Withdrawn, this version have issues when operating with infinity, I'll redo the change

test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)

@VladimirKempik VladimirKempik deleted the minmax_fadd branch March 29, 2023 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

1 participant