Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8283699: Improve the peephole mechanism of hotspot #8025

Closed
wants to merge 34 commits into from

Conversation

merykitty
Copy link
Member

@merykitty merykitty commented Mar 29, 2022

Hi,

The current peephole mechanism has several drawbacks:

  • Can only match and remove adjacent instructions.
  • Cannot match machine ideal nodes (e.g MachSpillCopyNode).
  • Can only replace 1 instruction, the position of insertion is limited to the position at which the matched nodes reside.
  • Is actually broken since the nodes are not connected properly and OptoScheduling requires true dependencies between nodes.

The patch proposes to enhance the peephole mechanism by allowing a peep rule to call into a dedicated function, which takes the responsibility to perform all required transformations on the basic block. This allows the peephole mechanism to perform several transformations effectively in a more fine-grain manner.

The patch uses the peephole optimisation to perform some classic peepholes, transforming on x86 the sequences:

mov r1, r2    ->    lea r1, [r2 + r3/i]
add r1, r3/i

and

mov r1, r2    ->    lea r1, [r2 << i], with i = 1, 2, 3
shl r1, i

On the added benchmarks, the transformations show positive results:

Benchmark             Mode  Cnt     Score     Error  Units
LeaPeephole.B_D_int   avgt    5  1200.490 ± 104.662  ns/op
LeaPeephole.B_D_long  avgt    5  1211.439 ±  30.196  ns/op
LeaPeephole.B_I_int   avgt    5  1118.831 ±   7.995  ns/op
LeaPeephole.B_I_long  avgt    5  1112.389 ±  15.838  ns/op
LeaPeephole.I_S_int   avgt    5  1262.528 ±   7.293  ns/op
LeaPeephole.I_S_long  avgt    5  1223.820 ±  17.777  ns/op

Benchmark             Mode  Cnt    Score    Error  Units
LeaPeephole.B_D_int   avgt    5  860.889 ±  6.089  ns/op
LeaPeephole.B_D_long  avgt    5  945.455 ± 21.603  ns/op
LeaPeephole.B_I_int   avgt    5  849.109 ±  9.809  ns/op
LeaPeephole.B_I_long  avgt    5  851.283 ± 16.921  ns/op
LeaPeephole.I_S_int   avgt    5  976.594 ± 23.004  ns/op
LeaPeephole.I_S_long  avgt    5  936.984 ±  9.601  ns/op

A following patch would add IR tests for these transformations since the IR framework has not been able to parse the ideal scheduling yet although printing the scheduling itself has been made possible recently.

Thank you very much.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8283699: Improve the peephole mechanism of hotspot

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8025/head:pull/8025
$ git checkout pull/8025

Update a local copy of the PR:
$ git checkout pull/8025
$ git pull https://git.openjdk.org/jdk pull/8025/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8025

View PR using the GUI difftool:
$ git pr show -t 8025

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8025.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 29, 2022

👋 Welcome back merykitty! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 30, 2022

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 30, 2022
@merykitty merykitty marked this pull request as ready for review April 1, 2022 19:22
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 1, 2022
@mlbridge
Copy link

mlbridge bot commented Apr 1, 2022

Webrevs

@merykitty
Copy link
Member Author

In case this has not reached the mailing list, may someone take a look at this, please.
Thank you very much.

@bridgekeeper
Copy link

bridgekeeper bot commented May 4, 2022

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!


#ifdef COMPILER2

#include "opto/peephole.hpp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why opto/peephole.hpp is useful. Why not just include peephole_x86.hpp? Then the empty peephole_.hpp for the other platforms are no longer needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opto/peephole.hpp is needed from the generated ad_x86_peephole.cpp so that addI_rRegNode::peephole can call the helper functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about including peephole_<cpu>.hpp only when "peepprocedure" is seen, and delete opto/peephole.hpp and empty peephole_<cpu>.hpp files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if I include peephole_x86.hpp in x86_64.ad in a source hpp block? This will result in the include appearing in ad_x86.hpp, which will be transitively included in ad_x86_peephole.cpp. Thanks a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering peephole_x86.cpp contains only x64-specific code, I had a suggestion to rename it into peephole_x86_64.cpp and move #ifdef _LP64 into peephole_x86.hpp to guard x64-specific declarations. But if you intend to include the header directly from x86_64.ad, you can rename the header to peephole_x86_64.hpp and get rid of #ifdef _LP64 completely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored that and removed opto/peephole.hpp as well as other peephole headers. peephole_x86.cpp is also renamed to peephole_x86_64.cpp. Thanks very much.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 10, 2022
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started build testing. But I can only verify 64-bit.
@merykitty Can you verify 32 build too?

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 10, 2022
@@ -319,6 +319,12 @@ reg_class int_rdi_reg(RDI);
//----------SOURCE BLOCK-------------------------------------------------------
// This is a block of C++ code which provides values, functions, and
// definitions necessary in the rest of the architecture description

source_hpp %{
#include CPU_HEADER(peephole)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you simply include peephole_x86_64.hpp here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@merykitty
Copy link
Member Author

@vnkozlov I saw GHA being happy with x86_32, I also tried cross-compiling locally for 32-bit build.

@vnkozlov
Copy link
Contributor

@vnkozlov I saw GHA being happy with x86_32, I also tried cross-compiling locally for 32-bit build.

Good. My builds in tier1 also passed. I don't have any more comments.

@merykitty
Copy link
Member Author

Thank you very much for your reviews!
/integrate

@openjdk
Copy link

openjdk bot commented Oct 11, 2022

@merykitty This pull request has not yet been marked as ready for integration.

@merykitty
Copy link
Member Author

@vnkozlov Could you reapprove this PR, please?

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 11, 2022
@merykitty
Copy link
Member Author

Thanks a lot
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Oct 11, 2022
@openjdk
Copy link

openjdk bot commented Oct 11, 2022

@merykitty
Your change (at version 9a0b65f) is now ready to be sponsored by a Committer.

@vnkozlov
Copy link
Contributor

/sponsor

@openjdk
Copy link

openjdk bot commented Oct 12, 2022

Going to push as commit 703a6ef.
Since your change was applied there have been 187 commits pushed to the master branch:

  • 94a9b04: 8295013: OopStorage should derive from CHeapObjBase
  • 3a980b9: 8295168: Remove superfluous period in @throws tag description
  • 9bb932c: 8295154: Documentation for RemoteExecutionControl.invoke(Method) inherits non-existent documentation
  • 945950d: 8295069: [PPC64] Performance regression after JDK-8290025
  • d362e16: 8294689: The SA transported_core.html file needs quite a bit of work
  • 07946aa: 8289552: Make intrinsic conversions between bit representations of half precision values and floats
  • 2586b1a: 8295155: Incorrect javadoc of java.base module
  • e1a77cf: 8295163: Remove old hsdis Makefile
  • 3c7ae12: 8294821: Class load improvement for AES crypto engine
  • 619cd82: 8294702: BufferedInputStream uses undefined value range for markpos
  • ... and 177 more: https://git.openjdk.org/jdk/compare/3419363e89eaeef61a44fa1ab12d6a355323eb68...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 12, 2022
@openjdk openjdk bot closed this Oct 12, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Oct 12, 2022
@openjdk
Copy link

openjdk bot commented Oct 12, 2022

@vnkozlov @merykitty Pushed as commit 703a6ef.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@merykitty merykitty deleted the peephole branch April 27, 2023 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants