Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8256438: AArch64: Implement match rules with ROR shift register value #1858

Closed
wants to merge 4 commits into from

Conversation

e1iu
Copy link
Member

@e1iu e1iu commented Dec 21, 2020

This patch transforms '(x >>> rshift) + (x << lshift)' into
'RotateRight(x, rshift)' during GVN phase when both the shift exponents
are constants and their sum equals to the number of bits for the type
of shift base.

This patch implements some new match rules for AArch64 instructions
which can take ROR as the optional shift. Such instructions are 'and',
'or', 'eor', 'eon', 'bic' and 'orn'.

ror w11, w2, #5
eor w0, w1, w11

With this patch, above code could be optimized to below:

eor w0, w1, w2, ror #5

Finally, the patch refactors TestRotate.java[1][2].

Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps,
jdk::jdk_core, langtools::tier1.

[1] https://bugs.openjdk.java.net/browse/JDK-8252776
[2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8256438: AArch64: Implement match rules with ROR shift register value

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1858/head:pull/1858
$ git checkout pull/1858

This patch transforms '(x >>> rshift) + (x << lshift)' into
'RotateRight(x, rshift)' during GVN phase when both the shift exponents
are constants and their sum equals to the number of bits for the type
of shift base.

This patch implements some new match rules for AArch64 instructions
which can take ROR as the optional shift. Such instructions are 'and',
'or', 'eor', 'eon', 'bic' and 'orn'.

  ror     w11, w2, openjdk#5
  eor     w0, w1, w11

With this patch, above code could be optimized to below:

  eor     w0, w1, w2, ror openjdk#5

Finally, the patch refactors TestRotate.java[1][2].

Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps,
jdk::jdk_core, langtools::tier1.

[1] https://bugs.openjdk.java.net/browse/JDK-8252776
[2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html

Change-Id: I70842bcdb7cbc31bdf261c3223ea882076c2c66b
@bridgekeeper
Copy link

bridgekeeper bot commented Dec 21, 2020

👋 Welcome back theRealELiu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 21, 2020
@openjdk
Copy link

openjdk bot commented Dec 21, 2020

@theRealELiu The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 21, 2020
@mlbridge
Copy link

mlbridge bot commented Dec 21, 2020

Webrevs

@e1iu
Copy link
Member Author

e1iu commented Jan 4, 2021

Could anyone help to take a look at this PR?

@e1iu
Copy link
Member Author

e1iu commented Jan 28, 2021

Ping

@AzeemJiva
Copy link

What impact does this have on runtime? There are also Tier1 failures in x86, which you'll need to investigate

@e1iu
Copy link
Member Author

e1iu commented Jan 28, 2021

I suppose this failure was not caused by my patch, and it has been fixed by 1594372c.

@mlbridge
Copy link

mlbridge bot commented Jan 28, 2021

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 1/28/21 1:50 AM, Eric Liu wrote:

I'm reviewing it now.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@mlbridge
Copy link

mlbridge bot commented Jan 28, 2021

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 1/28/21 9:30 AM, Andrew Haley wrote:

On 1/28/21 1:50 AM, Eric Liu wrote:

I'm reviewing it now.

Looks basically OK, but was tough to review because of the flags and
comments cleanup that made the diff hard to read.

An oddity. For rotations we have:

match(Set dst (AndI src1 (XorI(RotateRight src2 src3) src4)));

but for all other shifts we have:

match(Set dst (AndL src1 (XorL(RShiftL src2 src3) src4)));

That is to say, the other shifts have a type suffix. This doesn't look
right. Do you know what is going on?

Otherwise, this looks OK.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@e1iu
Copy link
Member Author

e1iu commented Jan 28, 2021

Thanks for your kindly review.

An oddity. For rotations we have:

match(Set dst (AndI src1 (XorI(RotateRight src2 src3) src4)));

but for all other shifts we have:

match(Set dst (AndL src1 (XorL(RShiftL src2 src3) src4)));

That is to say, the other shifts have a type suffix. This doesn't look
right. Do you know what is going on?

For rotation node, they have an extra field to identify the type, maybe this could save some code size... And I didn't use 'predicate' to check that because middle-end could guarantee Xor's input type was correct.

@AzeemJiva
Copy link

I suppose this failure was not caused by my patch, and it has been fixed by 1594372c.

Benchmark results?

@theRealAph
Copy link
Contributor

An oddity. For rotations we have:
match(Set dst (AndI src1 (XorI(RotateRight src2 src3) src4)));
but for all other shifts we have:
match(Set dst (AndL src1 (XorL(RShiftL src2 src3) src4)));
That is to say, the other shifts have a type suffix. This doesn't look
right. Do you know what is going on?

For rotation node, they have an extra field to identify the type, maybe this could save some code size... And I didn't use 'predicate' to check that because middle-end could guarantee Xor's input type was correct.

Ok, I see. That's very ugly, but it's not your fault.

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that remains to do is the benchmarks.

@e1iu
Copy link
Member Author

e1iu commented Jan 29, 2021

I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Before:

Benchmark                Mode  Cnt    Score   Error  Units
Rotation.addRotateRight  avgt   15  270.080 ± 0.630  ns/op
Rotation.andRotateRight  avgt   15  269.000 ± 0.226  ns/op
Rotation.bicRotateRight  avgt   15  269.269 ± 0.769  ns/op
Rotation.eonRotateRight  avgt   15  269.456 ± 0.056  ns/op
Rotation.ornRotateRight  avgt   15  269.681 ± 1.028  ns/op
Rotation.xorRotateRight  avgt   15  270.953 ± 2.764  ns/op

After:

Benchmark                Mode  Cnt    Score    Error  Units
Rotation.addRotateRight  avgt   15  399.995 ± 10.702  ns/op
Rotation.andRotateRight  avgt   15  399.517 ±  7.933  ns/op
Rotation.bicRotateRight  avgt   15  396.602 ±  2.038  ns/op
Rotation.eonRotateRight  avgt   15  401.978 ± 17.371  ns/op
Rotation.ornRotateRight  avgt   15  431.052 ± 73.711  ns/op
Rotation.xorRotateRight  avgt   15  410.447 ± 42.929  ns/op

[1] http://cr.openjdk.java.net/~xgong/rfr/8256438/Rotation.java

@mlbridge
Copy link

mlbridge bot commented Jan 29, 2021

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 1/29/21 8:35 AM, Eric Liu wrote:

I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Wow. I'll have to look at this.

In theory we prefer combined instructions because it should reduce code
size. However, we can't escape the possibility that on some implementations
combined instructions might cause pipeline stalls.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@mlbridge
Copy link

mlbridge bot commented Jan 29, 2021

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 29/01/2021 10:11, Andrew Haley wrote:

On 1/29/21 8:35 AM, Eric Liu wrote:

I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Your benchmark did not work for me. It did not generate the correct
instructions.

Please try with this or similar:

@benchmark
public void xorRotateRight(MyState s, Blackhole blackhole) {
int x = s.xi;
int y = s.yi;
for (int i = 0; i < COUNT; i++) {
y = x ^ ((y >>> 5) | (y << -5));
x = y ^ ((x >>> 5) | (x << -5));
}
blackhole.consume(x);
}

I get:

Benchmark Mode Cnt Score Error Units
Rotation.xorRotateRight (before) avgt 3 6142.575 ? 15.940 ns/op
Rotation.xorRotateRight (after) avgt 3 4081.587 ? 33.904 ns/op

Please integrate the corrected benchmark into your patch.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Change-Id: Ia357074efc8488a57030863b3eab7b27839cd3d0
@e1iu e1iu closed this Feb 1, 2021
@e1iu e1iu deleted the eor-shifted branch February 1, 2021 11:09
@e1iu e1iu restored the eor-shifted branch February 1, 2021 11:10
@e1iu e1iu reopened this Feb 1, 2021
Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da
@mlbridge
Copy link

mlbridge bot commented Feb 2, 2021

Mailing list message from Eric Liu on hotspot-compiler-dev:

Hi Andrew,

Thanks for your feedback.

I refined those benchmarks and on two platforms could have a better performance.
One can get about 100% gain, and 30% for another one. Other platforms were hard to
see any performance changes.

Before:
Benchmark Mode Cnt Score Error Units
Rotation.andRotateRight avgt 15 3860.994 ? 3.409 ns/op
Rotation.bicRotateRight avgt 15 3861.247 ? 3.321 ns/op
Rotation.eonRotateRight avgt 15 3860.865 ? 3.003 ns/op
Rotation.ornRotateRight avgt 15 3860.884 ? 3.260 ns/op
Rotation.xorRotateRight avgt 15 3860.886 ? 2.728 ns/op

After:
Benchmark Mode Cnt Score Error Units
Rotation.andRotateRight avgt 15 1933.495 ? 0.263 ns/op
Rotation.bicRotateRight avgt 15 1933.436 ? 0.244 ns/op
Rotation.eonRotateRight avgt 15 1933.459 ? 0.255 ns/op
Rotation.ornRotateRight avgt 15 1933.559 ? 0.316 ns/op
Rotation.xorRotateRight avgt 15 1933.467 ? 0.245 ns/op

I would update my patch with the benchmark tests after finishing the whole tests.

--Eric

-----Original Message-----
From: hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> On Behalf Of Andrew Haley
Sent: Saturday, January 30, 2021 12:30 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

On 29/01/2021 10:11, Andrew Haley wrote:

On 1/29/21 8:35 AM, Eric Liu wrote:

I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Your benchmark did not work for me. It did not generate the correct instructions.

Please try with this or similar:

@benchmark
public void xorRotateRight(MyState s, Blackhole blackhole) {
int x = s.xi;
int y = s.yi;
for (int i = 0; i < COUNT; i++) {
y = x ^ ((y >>> 5) | (y << -5));
x = y ^ ((x >>> 5) | (x << -5));
}
blackhole.consume(x);
}

I get:

Benchmark Mode Cnt Score Error Units
Rotation.xorRotateRight (before) avgt 3 6142.575 ? 15.940 ns/op
Rotation.xorRotateRight (after) avgt 3 4081.587 ? 33.904 ns/op

Please integrate the corrected benchmark into your patch.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

@e1iu
Copy link
Member Author

e1iu commented Feb 9, 2021

@theRealAph Could you help to take a look ?

@e1iu
Copy link
Member Author

e1iu commented Feb 17, 2021

Ping

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. For what it's worth, I doubt that this will be suitable for backporting to 8u or 11u.

@openjdk
Copy link

openjdk bot commented Feb 17, 2021

@theRealELiu This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8256438: AArch64: Implement match rules with ROR shift register value

Reviewed-by: aph, roland

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 321 new commits pushed to the master branch:

  • fac37bf: 8262269: javadoc test TestGeneratedClasses.java fails on Windows
  • 3e13b66: 8262157: LingeredApp.startAppExactJvmOpts does not print app output when launching fails
  • c769388: 8262266: JDK-8262049 fails validate-source
  • 03e781b: 8262265: ProblemList jdk/javadoc/doclet/testGeneratedClasses/TestGeneratedClasses.java on Windows
  • c6eae06: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode
  • e5304b3: 8253409: Double-rounding possibility in float fma
  • 3132b1c: 8261665: Clean up naming of StringContent and FixedStringContent
  • c30a90b: 8261976: Normalize id's used by the standard doclet
  • 53b1545: 8223355: Redundant output by javadoc
  • d2b9c22: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread
  • ... and 311 more: https://git.openjdk.java.net/jdk/compare/aec037721cf8990c3f7c4364730812447e0811e1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @rwestrel) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 17, 2021
@e1iu
Copy link
Member Author

e1iu commented Feb 18, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 18, 2021
@openjdk
Copy link

openjdk bot commented Feb 18, 2021

@theRealELiu
Your change (at version 492f4ca) is now ready to be sponsored by a Committer.

@e1iu
Copy link
Member Author

e1iu commented Feb 18, 2021

OK. For what it's worth, I doubt that this will be suitable for backporting to 8u or 11u.

Thanks for your review, I'd like to backport this.

BTW, Is there anyone could review those trivial shared code?

@e1iu
Copy link
Member Author

e1iu commented Feb 20, 2021

@TobiHartmann Could you help to review those shared code?

Change-Id: I18dda4a01154bce72fd4025685fa0721263092ce
@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Feb 23, 2021
Copy link
Contributor

@rwestrel rwestrel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared code looks good to me.

@e1iu
Copy link
Member Author

e1iu commented Feb 23, 2021

Thanks for your review. I will integrate it after some tests were finished.

@e1iu
Copy link
Member Author

e1iu commented Feb 24, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 24, 2021
@openjdk
Copy link

openjdk bot commented Feb 24, 2021

@theRealELiu
Your change (at version 9357723) is now ready to be sponsored by a Committer.

@nsjian
Copy link

nsjian commented Feb 24, 2021

/sponsor

@openjdk openjdk bot closed this Feb 24, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 24, 2021
@openjdk
Copy link

openjdk bot commented Feb 24, 2021

@nsjian @theRealELiu Since your change was applied there have been 321 commits pushed to the master branch:

  • fac37bf: 8262269: javadoc test TestGeneratedClasses.java fails on Windows
  • 3e13b66: 8262157: LingeredApp.startAppExactJvmOpts does not print app output when launching fails
  • c769388: 8262266: JDK-8262049 fails validate-source
  • 03e781b: 8262265: ProblemList jdk/javadoc/doclet/testGeneratedClasses/TestGeneratedClasses.java on Windows
  • c6eae06: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode
  • e5304b3: 8253409: Double-rounding possibility in float fma
  • 3132b1c: 8261665: Clean up naming of StringContent and FixedStringContent
  • c30a90b: 8261976: Normalize id's used by the standard doclet
  • 53b1545: 8223355: Redundant output by javadoc
  • d2b9c22: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread
  • ... and 311 more: https://git.openjdk.java.net/jdk/compare/aec037721cf8990c3f7c4364730812447e0811e1...master

Your commit was automatically rebased without conflicts.

Pushed as commit 382e38d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@e1iu e1iu deleted the eor-shifted branch March 18, 2021 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
5 participants