8322790: RISC-V: Tune costs for shuffles with no conversion #17206

Closed
wants to merge 2 commits into from

@Ilyagavrilin Ilyagavrilin commented Dec 30, 2023

Hi all, please review this small change to the ins_cost values of some RISC-V nodes.
We currently have several nodes that implement shuffles with no conversion:

// stack <-> reg and reg <-> reg shuffles with no conversion
instruct MoveF2I_stack_reg(iRegINoSp dst, stackSlotF src) %{
match(Set dst (MoveF2I src));
effect(DEF dst, USE src);
ins_cost(LOAD_COST);
format %{ "lw $dst, $src\t#@MoveF2I_stack_reg" %}
ins_encode %{
__ lw(as_Register($dst$$reg), Address(sp, $src$$disp));
%}
ins_pipe(iload_reg_reg);
%}
instruct MoveI2F_stack_reg(fRegF dst, stackSlotI src) %{
match(Set dst (MoveI2F src));
effect(DEF dst, USE src);
ins_cost(LOAD_COST);
format %{ "flw $dst, $src\t#@MoveI2F_stack_reg" %}
ins_encode %{
__ flw(as_FloatRegister($dst$$reg), Address(sp, $src$$disp));
%}
ins_pipe(fp_load_mem_s);
%}
instruct MoveD2L_stack_reg(iRegLNoSp dst, stackSlotD src) %{
match(Set dst (MoveD2L src));
effect(DEF dst, USE src);
ins_cost(LOAD_COST);
format %{ "ld $dst, $src\t#@MoveD2L_stack_reg" %}
ins_encode %{
__ ld(as_Register($dst$$reg), Address(sp, $src$$disp));
%}
ins_pipe(iload_reg_reg);
%}
instruct MoveL2D_stack_reg(fRegD dst, stackSlotL src) %{
match(Set dst (MoveL2D src));
effect(DEF dst, USE src);
ins_cost(LOAD_COST);
format %{ "fld $dst, $src\t#@MoveL2D_stack_reg" %}
ins_encode %{
__ fld(as_FloatRegister($dst$$reg), Address(sp, $src$$disp));
%}
ins_pipe(fp_load_mem_d);
%}
instruct MoveF2I_reg_stack(stackSlotI dst, fRegF src) %{
match(Set dst (MoveF2I src));
effect(DEF dst, USE src);
ins_cost(STORE_COST);
format %{ "fsw $src, $dst\t#@MoveF2I_reg_stack" %}
ins_encode %{
__ fsw(as_FloatRegister($src$$reg), Address(sp, $dst$$disp));
%}
ins_pipe(fp_store_reg_s);
%}
instruct MoveI2F_reg_stack(stackSlotF dst, iRegI src) %{
match(Set dst (MoveI2F src));
effect(DEF dst, USE src);
ins_cost(STORE_COST);
format %{ "sw $src, $dst\t#@MoveI2F_reg_stack" %}
ins_encode %{
__ sw(as_Register($src$$reg), Address(sp, $dst$$disp));
%}
ins_pipe(istore_reg_reg);
%}
instruct MoveD2L_reg_stack(stackSlotL dst, fRegD src) %{
match(Set dst (MoveD2L src));
effect(DEF dst, USE src);
ins_cost(STORE_COST);
format %{ "fsd $src, $dst\t#@MoveD2L_reg_stack" %}
ins_encode %{
__ fsd(as_FloatRegister($src$$reg), Address(sp, $dst$$disp));
%}
ins_pipe(fp_store_reg_d);
%}
instruct MoveL2D_reg_stack(stackSlotD dst, iRegL src) %{
match(Set dst (MoveL2D src));
effect(DEF dst, USE src);
ins_cost(STORE_COST);
format %{ "sd $src, $dst\t#@MoveL2D_reg_stack" %}
ins_encode %{
__ sd(as_Register($src$$reg), Address(sp, $dst$$disp));
%}
ins_pipe(istore_reg_reg);
%}
instruct MoveF2I_reg_reg(iRegINoSp dst, fRegF src) %{
match(Set dst (MoveF2I src));
effect(DEF dst, USE src);
ins_cost(XFER_COST);
format %{ "fmv.x.w $dst, $src\t#@MoveF2I_reg_reg" %}
ins_encode %{
__ fmv_x_w(as_Register($dst$$reg), as_FloatRegister($src$$reg));
%}
ins_pipe(fp_f2i);
%}
instruct MoveI2F_reg_reg(fRegF dst, iRegI src) %{
match(Set dst (MoveI2F src));
effect(DEF dst, USE src);
ins_cost(XFER_COST);
format %{ "fmv.w.x $dst, $src\t#@MoveI2F_reg_reg" %}
ins_encode %{
__ fmv_w_x(as_FloatRegister($dst$$reg), as_Register($src$$reg));
%}
ins_pipe(fp_i2f);
%}
instruct MoveD2L_reg_reg(iRegLNoSp dst, fRegD src) %{
match(Set dst (MoveD2L src));
effect(DEF dst, USE src);
ins_cost(XFER_COST);
format %{ "fmv.x.d $dst, $src\t#@MoveD2L_reg_reg" %}
ins_encode %{
__ fmv_x_d(as_Register($dst$$reg), as_FloatRegister($src$$reg));
%}
ins_pipe(fp_d2l);
%}
instruct MoveL2D_reg_reg(fRegD dst, iRegL src) %{
match(Set dst (MoveL2D src));
effect(DEF dst, USE src);
ins_cost(XFER_COST);
format %{ "fmv.d.x $dst, $src\t#@MoveL2D_reg_reg" %}
ins_encode %{
__ fmv_d_x(as_FloatRegister($dst$$reg), as_Register($src$$reg));
%}
ins_pipe(fp_l2d);
%}

On most RISC-V CPUs we prefer reg<->reg operations because they are faster, but currently the stack<->reg variants are selected (see the linked JBS issue for the reasons).
After changing the ins_cost values, the reg<->reg variants are selected, and we see performance improvements in benchmarks that use such shuffles (tested on a T-Head C910 board):

| Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | Difference (%) |
| --- | --- | --- | --- |
| MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 |
| StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 |
| StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 |
| StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 |
| StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 |
| StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 |
| StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 |
| StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 |
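
The difference column above is consistent with the usual relative-change formula, (patched - upstream) / upstream * 100. A minimal sketch that recomputes one row, assuming that is how the column was derived (the class and method names are hypothetical):

```java
import java.util.Locale;

class PercentDiff {
    // Relative change of the patched score against the upstream score, in percent.
    static double percentDiff(double upstream, double patched) {
        return (patched - upstream) / upstream * 100.0;
    }

    public static void main(String[] args) {
        // StrictMathBench.ceilDouble row: 24682.810 -> 29782.050 ops/ms
        double diff = percentDiff(24682.810, 29782.050);
        System.out.println(String.format(Locale.ROOT, "%+.2f", diff)); // +20.66
    }
}
```

Since the scores are throughput (ops/ms), a positive value means the patched build is faster.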

New benchmark for changed nodes:

--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -540,4 +540,11 @@ public class MathBench {
         return  Math.ulp(float7);
     }
 
+    @Benchmark
+    public long doubleToRawLongBitsDouble() {
+        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
+        double dbl3 = double2 + double1;
+        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
+    }
+
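
For readers less familiar with these methods: Double.doubleToRawLongBits and Double.longBitsToDouble reinterpret the 64 bits of a value without any numeric conversion, which is the kind of "shuffle with no conversion" the MoveD2L and MoveL2D nodes stand for (as far as I understand C2's handling of them). A small sketch contrasting that with a numeric cast; the class and helper names are hypothetical:

```java
class RawBitsSketch {
    // Bit-for-bit reinterpretation of a double as a long (no numeric conversion).
    static long rawBits(double d) {
        return Double.doubleToRawLongBits(d);
    }

    public static void main(String[] args) {
        double one = 1.0d;
        // IEEE 754 encoding of 1.0: sign 0, biased exponent 0x3ff, fraction 0.
        System.out.println(Long.toHexString(rawBits(one)));          // 3ff0000000000000
        // A cast converts the numeric value instead of copying the bits.
        System.out.println((long) one);                              // 1
        // Reinterpreting the bits back yields the original double.
        System.out.println(Double.longBitsToDouble(rawBits(one)));   // 1.0
    }
}
```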


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8322790: RISC-V: Tune costs for shuffles with no conversion (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17206/head:pull/17206
$ git checkout pull/17206

Update a local copy of the PR:
$ git checkout pull/17206
$ git pull https://git.openjdk.org/jdk.git pull/17206/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17206

View PR using the GUI difftool:
$ git pr show -t 17206

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17206.diff

bridgekeeper bot commented Dec 30, 2023

👋 Welcome back igavrilin! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 30, 2023

openjdk bot commented Dec 30, 2023

@Ilyagavrilin The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 30, 2023

mlbridge bot commented Dec 30, 2023

Webrevs

Contributor

@robehn robehn left a comment

Thanks, seems reasonable to me.

openjdk bot commented Jan 2, 2024

@Ilyagavrilin This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8322790: RISC-V: Tune costs for shuffles with no conversion

Reviewed-by: rehn, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 85 new commits pushed to the master branch:

  • c8fa3e2: 8320310: CompiledMethod::has_monitors flag can be incorrect
  • 57a65fe: 8322003: JShell - Incorrect type inference in lists of records implementing interfaces
  • c90768c: 8318444: Write details about compilation bailouts into crash reports
  • 29397d2: 8320317: ObjectMonitor NotRunnable is not really an optimization
  • fc04750: 8321371: SpinPause() not implemented for bsd_aarch64/macOS
  • 458e563: 8310711: [IR Framework] Remove safepoint while printing handling
  • 71aac7a: 8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows
  • 09c6c4f: 8322489: 22-b27: Up to 7% regression in all Footprint3-*-G1/ZGC
  • eb9e754: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination
  • a40d397: 8323110: Eliminate -Wparentheses warnings in ppc code
  • ... and 75 more: https://git.openjdk.org/jdk/compare/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@robehn, @RealFYang) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 2, 2024
@@ -8530,7 +8531,7 @@ instruct MoveF2I_stack_reg(iRegINoSp dst, stackSlotF src) %{
   effect(DEF dst, USE src);

-  ins_cost(LOAD_COST);
+  ins_cost(ALU_COST + LOAD_COST);
Member

Adding an extra cost of ALU_COST for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do?

Those nodes would need to go below 100, which then starts looking ugly.
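
The trade-off between the two options discussed above can be sketched with a toy model in which the rule with the strictly lowest ins_cost is picked. This is not HotSpot's matcher: the cheapest helper, the tie-breaking policy, and all numeric values (100 for ALU and store, 300 for the register transfer) are assumptions chosen only to illustrate the "below 100" remark.

```java
class CostSketch {
    // Returns the name of the cheapest rule. Assumed policy: only a strictly
    // lower cost wins, so on a tie the earlier rule is kept.
    static String cheapest(String[] names, int[] costs) {
        int best = 0;
        for (int i = 1; i < costs.length; i++) {
            if (costs[i] < costs[best]) {
                best = i;
            }
        }
        return names[best];
    }

    public static void main(String[] args) {
        String[] rules = {"MoveD2L_reg_stack", "MoveD2L_reg_reg"};
        int alu = 100, store = 100, xfer = 300; // assumed values, not taken from this PR

        // Starting point: the store variant is cheaper than the register move.
        System.out.println(cheapest(rules, new int[] {store, xfer}));
        // Lowering only the register move to the store cost gives a tie...
        System.out.println(cheapest(rules, new int[] {store, store}));
        // ...so it would have to drop below the store cost to be picked.
        System.out.println(cheapest(rules, new int[] {store, store - 1}));
        // Alternative: raise the store variant by an ALU cost instead
        // (the register-move cost of 100 here is hypothetical).
        System.out.println(cheapest(rules, new int[] {alu + store, store}));
    }
}
```

Under these assumptions the first two calls select the store variant and the last two select the register move.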

Member

@RealFYang RealFYang Jan 3, 2024

Seems that the performance gain is still there (tested on lichee-pi-4a board) when reverting part of the changes. I haven't checked the JIT code though. Try this addon change:

addon-change.diff.txt

Author

Thanks, reverting part of the changes still gives good code generation. I ran some more benchmarks on the T-Head board; in all cases the expected instructions are generated in the JIT code.

@Ilyagavrilin
Author

Thanks @RealFYang for the suggested changes. I ran some additional tests on the T-Head board and also checked the JIT code for some of them.

| Benchmark | Upstream | Old patch | Current patch |
| --- | --- | --- | --- |
| lang.MathBench.doubleToRawLongBitsDouble | 30495.868 | 32332.48 | 31635.15 |
| lang.MathBench.longBitsToDoubleLong | 35161.101 | 34542.878 | 34146.705 |
| lang.StrictMathBench.ceilDouble | 24272.224 | 29797.862 | 29094.981 |
| lang.StrictMathBench.cosDouble | 6967.161 | 6930.468 | 6960.957 |
| lang.StrictMathBench.expDouble | 6812.605 | 7211.988 | 7123.429 |
| lang.StrictMathBench.floorDouble | 29893.151 | 34193.412 | 33257.669 |
| lang.StrictMathBench.maxDouble | 34684.497 | 35194.694 | 35199.944 |
| lang.StrictMathBench.minDouble | 34692.521 | 34673.531 | 34678.324 |
| lang.StrictMathBench.sinDouble | 6769.593 | 6714.003 | 6736.884 |
| math.FpRoundingBenchmark.testnativeceil | 67.801 | 115.6 | 116.822 |
| math.FpRoundingBenchmark.testnativefloor | 71.745 | 116.59 | 116.662 |

Additional benchmarks:

diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
index 27d8033b8b7..fd39cc58222 100644
--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -540,4 +540,17 @@ public class MathBench {
         return  Math.ulp(float7);
     }
 
+    @Benchmark
+    public long doubleToRawLongBitsDouble() {
+        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
+        double dbl3 = double2 + double1;
+        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
+    }
+
+    @Benchmark
+    public double longBitsToDoubleLong() {
+        long lng14 = long13 + long1;
+        long lng750 = long747 + 3;
+        return Double.longBitsToDouble(lng14) + Double.longBitsToDouble(lng750);
+    }
 }
diff --git a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
index cf0eed32e07..3687f43b886 100644
--- a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
+++ b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
@@ -75,4 +75,16 @@ public class FpRoundingBenchmark {
     for (int i = 0; i < TESTSIZE; i++)
       Res[i] = Math.rint(DargV1[i]);
   }
+
+  @Benchmark
+  public void testnativeceil(Blackhole bh) {
+    for (int i = 0; i < TESTSIZE; i++)
+      Res[i] = StrictMath.ceil(DargV1[i]);
+  }
+
+  @Benchmark
+  public void testnativefloor(Blackhole bh) {
+    for (int i = 0; i < TESTSIZE; i++)
+      Res[i] = StrictMath.floor(DargV1[i]);
+  }
 }
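
A floor function can be written in pure Java with exactly these raw bit moves, which may be why the ceil/floor benchmarks react to the cost change. The sketch below only illustrates that technique under this assumption; it is not the JDK's StrictMath code, and floorViaBits and the class name are hypothetical.

```java
class FloorSketch {
    // Floor computed by clearing the fractional bits of the significand.
    static double floorViaBits(double a) {
        int exponent = Math.getExponent(a);
        if (exponent < 0) {
            // |a| < 1: zeros are returned unchanged (keeps the sign of -0.0).
            return (a == 0.0) ? a : (a < 0.0 ? -1.0 : 0.0);
        }
        if (exponent >= 52) {
            return a; // already integral, infinite, or NaN
        }
        long bits = Double.doubleToRawLongBits(a);     // double -> long, no conversion
        long fractionMask = 0x000FFFFFFFFFFFFFL >> exponent;
        if ((bits & fractionMask) == 0L) {
            return a; // no fractional bits set
        }
        // long -> double, no conversion: the value truncated toward zero.
        double truncated = Double.longBitsToDouble(bits & ~fractionMask);
        return (a < 0.0) ? truncated - 1.0 : truncated;
    }

    public static void main(String[] args) {
        System.out.println(floorViaBits(2.5));   // 2.0
        System.out.println(floorViaBits(-2.5));  // -3.0
    }
}
```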

Contributor

@robehn robehn left a comment

Still reasonable to me.

@Ilyagavrilin
Author

@robehn @RealFYang Thanks for your reviews.
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jan 8, 2024

openjdk bot commented Jan 8, 2024

@Ilyagavrilin
Your change (at version ae8bca9) is now ready to be sponsored by a Committer.

@VladimirKempik

/sponsor

openjdk bot commented Jan 8, 2024

Going to push as commit 2acb5bd.
Since your change was applied there have been 85 commits pushed to the master branch:

  • c8fa3e2: 8320310: CompiledMethod::has_monitors flag can be incorrect
  • 57a65fe: 8322003: JShell - Incorrect type inference in lists of records implementing interfaces
  • c90768c: 8318444: Write details about compilation bailouts into crash reports
  • 29397d2: 8320317: ObjectMonitor NotRunnable is not really an optimization
  • fc04750: 8321371: SpinPause() not implemented for bsd_aarch64/macOS
  • 458e563: 8310711: [IR Framework] Remove safepoint while printing handling
  • 71aac7a: 8276809: java/awt/font/JNICheck/FreeTypeScalerJNICheck.java shows JNI warning on Windows
  • 09c6c4f: 8322489: 22-b27: Up to 7% regression in all Footprint3-*-G1/ZGC
  • eb9e754: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination
  • a40d397: 8323110: Eliminate -Wparentheses warnings in ppc code
  • ... and 75 more: https://git.openjdk.org/jdk/compare/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 8, 2024
@openjdk openjdk bot closed this Jan 8, 2024
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review), and sponsor (Pull request is ready to be sponsored) labels Jan 8, 2024

openjdk bot commented Jan 8, 2024

@VladimirKempik @Ilyagavrilin Pushed as commit 2acb5bd.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
