8297359: RISC-V: improve performance of floating Max Min intrinsics #11276

VladimirKempik · 2022-11-21T20:48:00Z

Please review this change.

It improves performance of Math.min/max intrinsics for Floats and Doubles.

The main issue in these intrinsics is the requirement to return NaN if any of arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers are NaN ( quiet NaN).
That requires additional logic to handle the case where only of of src is NaN.

Here the postcheck with flt (floating less than comparision) and flags analysis replaced with precheck. The precheck is done with fadd-ing srcs into dst and checking the dst for NaN ( with fclass).

The results on the thead c910:

The results, thead c910:

before

Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 54023.827 ± 268.645 ns/op
FpMinMaxIntrinsics.dMin avgt 25 54309.850 ± 323.551 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 42192.140 ± 12.114 ns/op
FpMinMaxIntrinsics.fMax avgt 25 53797.657 ± 15.816 ns/op
FpMinMaxIntrinsics.fMin avgt 25 54135.710 ± 313.185 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 42196.156 ± 13.424 ns/op
MaxMinOptimizeTest.dAdd avgt 25 650.810 ± 169.998 us/op
MaxMinOptimizeTest.dMax avgt 25 4561.967 ± 40.367 us/op
MaxMinOptimizeTest.dMin avgt 25 4589.100 ± 75.854 us/op
MaxMinOptimizeTest.dMul avgt 25 759.821 ± 240.092 us/op
MaxMinOptimizeTest.fAdd avgt 25 300.137 ± 13.495 us/op
MaxMinOptimizeTest.fMax avgt 25 4348.885 ± 20.061 us/op
MaxMinOptimizeTest.fMin avgt 25 4372.799 ± 27.296 us/op
MaxMinOptimizeTest.fMul avgt 25 304.024 ± 12.120 us/op

after

Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 10545.196 ± 140.137 ns/op
FpMinMaxIntrinsics.dMin avgt 25 10454.525 ± 9.972 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 3104.703 ± 0.892 ns/op
FpMinMaxIntrinsics.fMax avgt 25 10449.709 ± 7.284 ns/op
FpMinMaxIntrinsics.fMin avgt 25 10445.261 ± 7.206 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 3104.769 ± 0.951 ns/op
MaxMinOptimizeTest.dAdd avgt 25 487.769 ± 170.711 us/op
MaxMinOptimizeTest.dMax avgt 25 929.394 ± 158.697 us/op
MaxMinOptimizeTest.dMin avgt 25 864.230 ± 284.794 us/op
MaxMinOptimizeTest.dMul avgt 25 894.116 ± 342.550 us/op
MaxMinOptimizeTest.fAdd avgt 25 284.664 ± 1.446 us/op
MaxMinOptimizeTest.fMax avgt 25 384.388 ± 15.004 us/op
MaxMinOptimizeTest.fMin avgt 25 371.952 ± 15.295 us/op
MaxMinOptimizeTest.fMul avgt 25 305.226 ± 12.467 us/op

significant improvement

On hifive u74 ( unmatched) the improvements is less significant:

hifive:

before
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 30219.666 ± 12.878 ns/op
FpMinMaxIntrinsics.dMin avgt 25 30242.249 ± 31.374 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 15394.622 ± 2.803 ns/op
FpMinMaxIntrinsics.fMax avgt 25 30150.114 ± 22.421 ns/op
FpMinMaxIntrinsics.fMin avgt 25 30149.752 ± 20.813 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 15396.402 ± 4.251 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1143.582 ± 4.444 us/op
MaxMinOptimizeTest.dMax avgt 25 2556.317 ± 3.795 us/op
MaxMinOptimizeTest.dMin avgt 25 2556.569 ± 2.274 us/op
MaxMinOptimizeTest.dMul avgt 25 1142.769 ± 1.593 us/op
MaxMinOptimizeTest.fAdd avgt 25 748.688 ± 7.342 us/op
MaxMinOptimizeTest.fMax avgt 25 2280.381 ± 1.535 us/op
MaxMinOptimizeTest.fMin avgt 25 2280.760 ± 1.532 us/op
MaxMinOptimizeTest.fMul avgt 25 748.991 ± 7.261 us/op

after:

Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 27723.791 ± 22.784 ns/op
FpMinMaxIntrinsics.dMin avgt 25 27760.799 ± 45.411 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 12875.949 ± 2.829 ns/op
FpMinMaxIntrinsics.fMax avgt 25 25992.753 ± 23.788 ns/op
FpMinMaxIntrinsics.fMin avgt 25 25994.554 ± 32.060 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 11200.737 ± 2.169 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1144.128 ± 4.371 us/op
MaxMinOptimizeTest.dMax avgt 25 1968.145 ± 2.346 us/op
MaxMinOptimizeTest.dMin avgt 25 1970.249 ± 4.712 us/op
MaxMinOptimizeTest.dMul avgt 25 1143.356 ± 2.203 us/op
MaxMinOptimizeTest.fAdd avgt 25 748.634 ± 7.229 us/op
MaxMinOptimizeTest.fMax avgt 25 1523.719 ± 0.570 us/op
MaxMinOptimizeTest.fMin avgt 25 1524.534 ± 1.109 us/op
MaxMinOptimizeTest.fMul avgt 25 748.643 ± 7.291 us/op

fAdd/dAdd and fMul/dMull is unaffected likely due to :

private double dAddBench(double a, double b) {
return Math.max(a, b) + Math.min(a, b);
}

private double dMulBench(double a, double b) {
return Math.max(a, b) * Math.min(a, b);
}
may get reduces to just a + b and a*b respectively without actually using min/max

Testing : tier1/tier2 in progress, will update this as soon as it finishes

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8297359: RISC-V: improve performance of floating Max Min intrinsics

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11276/head:pull/11276
$ git checkout pull/11276

Update a local copy of the PR:
$ git checkout pull/11276
$ git pull https://git.openjdk.org/jdk pull/11276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11276

View PR using the GUI difftool:
$ git pr show -t 11276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11276.diff

bridgekeeper · 2022-11-21T20:49:37Z

👋 Welcome back vkempik! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-11-21T20:52:57Z

@VladimirKempik The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2022-11-21T20:56:29Z

Webrevs

00: Full (a5f281e6)

VladimirKempik · 2022-11-22T08:29:10Z

Withdrawn, this version have issues when operating with infinity, I'll redo the change

test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)

VladimirKempik added 3 commits November 15, 2022 10:33

minmax with fadd and fclass

b675249

Merge branch 'master' into minmax_fadd

0d41939

remove unneeded space addition

a5f281e

openjdk bot added the rfr Pull request is ready for review label Nov 21, 2022

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 21, 2022

VladimirKempik closed this Nov 22, 2022

VladimirKempik deleted the minmax_fadd branch March 29, 2023 11:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8297359: RISC-V: improve performance of floating Max Min intrinsics #11276

8297359: RISC-V: improve performance of floating Max Min intrinsics #11276

Uh oh!

VladimirKempik commented Nov 21, 2022 •

edited by openjdk bot

Loading

Uh oh!

bridgekeeper bot commented Nov 21, 2022

Uh oh!

openjdk bot commented Nov 21, 2022

Uh oh!

mlbridge bot commented Nov 21, 2022

Uh oh!

VladimirKempik commented Nov 22, 2022 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

8297359: RISC-V: improve performance of floating Max Min intrinsics #11276

8297359: RISC-V: improve performance of floating Max Min intrinsics #11276

Uh oh!

Conversation

VladimirKempik commented Nov 21, 2022 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewing

Uh oh!

bridgekeeper bot commented Nov 21, 2022

Uh oh!

openjdk bot commented Nov 21, 2022

Uh oh!

mlbridge bot commented Nov 21, 2022

Webrevs

Uh oh!

VladimirKempik commented Nov 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

VladimirKempik commented Nov 21, 2022 •

edited by openjdk bot

Loading

VladimirKempik commented Nov 22, 2022 •

edited

Loading