-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8297359: RISC-V: improve performance of floating Max Min intrinsics #11276
Conversation
👋 Welcome back vkempik! A progress list of the required criteria for merging this PR into |
@VladimirKempik The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Withdrawn, this version have issues when operating with infinity, I'll redo the change
|
Please review this change.
It improves performance of Math.min/max intrinsics for Floats and Doubles.
The main issue in these intrinsics is the requirement to return NaN if any of arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers are NaN ( quiet NaN).
That requires additional logic to handle the case where only of of src is NaN.
Here the postcheck with flt (floating less than comparision) and flags analysis replaced with precheck. The precheck is done with fadd-ing srcs into dst and checking the dst for NaN ( with fclass).
The results on the thead c910:
The results, thead c910:
before
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 54023.827 ± 268.645 ns/op
FpMinMaxIntrinsics.dMin avgt 25 54309.850 ± 323.551 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 42192.140 ± 12.114 ns/op
FpMinMaxIntrinsics.fMax avgt 25 53797.657 ± 15.816 ns/op
FpMinMaxIntrinsics.fMin avgt 25 54135.710 ± 313.185 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 42196.156 ± 13.424 ns/op
MaxMinOptimizeTest.dAdd avgt 25 650.810 ± 169.998 us/op
MaxMinOptimizeTest.dMax avgt 25 4561.967 ± 40.367 us/op
MaxMinOptimizeTest.dMin avgt 25 4589.100 ± 75.854 us/op
MaxMinOptimizeTest.dMul avgt 25 759.821 ± 240.092 us/op
MaxMinOptimizeTest.fAdd avgt 25 300.137 ± 13.495 us/op
MaxMinOptimizeTest.fMax avgt 25 4348.885 ± 20.061 us/op
MaxMinOptimizeTest.fMin avgt 25 4372.799 ± 27.296 us/op
MaxMinOptimizeTest.fMul avgt 25 304.024 ± 12.120 us/op
after
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 10545.196 ± 140.137 ns/op
FpMinMaxIntrinsics.dMin avgt 25 10454.525 ± 9.972 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 3104.703 ± 0.892 ns/op
FpMinMaxIntrinsics.fMax avgt 25 10449.709 ± 7.284 ns/op
FpMinMaxIntrinsics.fMin avgt 25 10445.261 ± 7.206 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 3104.769 ± 0.951 ns/op
MaxMinOptimizeTest.dAdd avgt 25 487.769 ± 170.711 us/op
MaxMinOptimizeTest.dMax avgt 25 929.394 ± 158.697 us/op
MaxMinOptimizeTest.dMin avgt 25 864.230 ± 284.794 us/op
MaxMinOptimizeTest.dMul avgt 25 894.116 ± 342.550 us/op
MaxMinOptimizeTest.fAdd avgt 25 284.664 ± 1.446 us/op
MaxMinOptimizeTest.fMax avgt 25 384.388 ± 15.004 us/op
MaxMinOptimizeTest.fMin avgt 25 371.952 ± 15.295 us/op
MaxMinOptimizeTest.fMul avgt 25 305.226 ± 12.467 us/op
significant improvement
On hifive u74 ( unmatched) the improvements is less significant:
hifive:
before
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 30219.666 ± 12.878 ns/op
FpMinMaxIntrinsics.dMin avgt 25 30242.249 ± 31.374 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 15394.622 ± 2.803 ns/op
FpMinMaxIntrinsics.fMax avgt 25 30150.114 ± 22.421 ns/op
FpMinMaxIntrinsics.fMin avgt 25 30149.752 ± 20.813 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 15396.402 ± 4.251 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1143.582 ± 4.444 us/op
MaxMinOptimizeTest.dMax avgt 25 2556.317 ± 3.795 us/op
MaxMinOptimizeTest.dMin avgt 25 2556.569 ± 2.274 us/op
MaxMinOptimizeTest.dMul avgt 25 1142.769 ± 1.593 us/op
MaxMinOptimizeTest.fAdd avgt 25 748.688 ± 7.342 us/op
MaxMinOptimizeTest.fMax avgt 25 2280.381 ± 1.535 us/op
MaxMinOptimizeTest.fMin avgt 25 2280.760 ± 1.532 us/op
MaxMinOptimizeTest.fMul avgt 25 748.991 ± 7.261 us/op
after:
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 27723.791 ± 22.784 ns/op
FpMinMaxIntrinsics.dMin avgt 25 27760.799 ± 45.411 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 12875.949 ± 2.829 ns/op
FpMinMaxIntrinsics.fMax avgt 25 25992.753 ± 23.788 ns/op
FpMinMaxIntrinsics.fMin avgt 25 25994.554 ± 32.060 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 11200.737 ± 2.169 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1144.128 ± 4.371 us/op
MaxMinOptimizeTest.dMax avgt 25 1968.145 ± 2.346 us/op
MaxMinOptimizeTest.dMin avgt 25 1970.249 ± 4.712 us/op
MaxMinOptimizeTest.dMul avgt 25 1143.356 ± 2.203 us/op
MaxMinOptimizeTest.fAdd avgt 25 748.634 ± 7.229 us/op
MaxMinOptimizeTest.fMax avgt 25 1523.719 ± 0.570 us/op
MaxMinOptimizeTest.fMin avgt 25 1524.534 ± 1.109 us/op
MaxMinOptimizeTest.fMul avgt 25 748.643 ± 7.291 us/op
fAdd/dAdd and fMul/dMull is unaffected likely due to :
private double dAddBench(double a, double b) {
return Math.max(a, b) + Math.min(a, b);
}
private double dMulBench(double a, double b) {
return Math.max(a, b) * Math.min(a, b);
}
may get reduces to just a + b and a*b respectively without actually using min/max
Testing : tier1/tier2 in progress, will update this as soon as it finishes
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11276/head:pull/11276
$ git checkout pull/11276
Update a local copy of the PR:
$ git checkout pull/11276
$ git pull https://git.openjdk.org/jdk pull/11276/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11276
View PR using the GUI difftool:
$ git pr show -t 11276
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11276.diff