Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8275317: AArch64: Support some type conversion vectorization in SLP #6145

Closed
wants to merge 1 commit into from

Conversation

fg1417
Copy link
Contributor

@fg1417 fg1417 commented Oct 28, 2021

Current SLP vectorizer in C2 compiler doesn't support type conversion
operations. But AArch64 has vector type conversion instructions in
both NEON and SVE.

The type conversion involves two kinds of scenarios, conversion between
the same data sizes and conversion between different data sizes. If we
want to support casts between different data sizes, we need to amend
the code part for identifying adjacent memory references and the code
part for justifying if the combination is profitable. I suppose it
would be easier to review if we split the whole task to support type
conversion into two separate patches, one for the same data sizes and
the other one for different data sizes. The goal of this patch is just
to support conversions within the same data size, including:
int -> float
float -> int
long -> double
double -> long

A typical test case:

for (int i = start; i < limit; i++) {
b[i] = (float) a[i];
}

To implement it, the patch completed the necessary instructions and
matching rules in the backend and added implementation for SLP in
the middle end.

The percentage of performance uplift on aarch64 system:
Mode: avgt
Cnt: 15
Metric: (ns/op)

benchmark percentage change [(After-Before)/Before]
VectorLoop.convertD2L -48.46%
VectorLoop.convertF2I -55.67%
VectorLoop.convertI2F -55.27%
VectorLoop.convertL2D -48.75%


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8275317: AArch64: Support some type conversion vectorization in SLP

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6145/head:pull/6145
$ git checkout pull/6145

Update a local copy of the PR:
$ git checkout pull/6145
$ git pull https://git.openjdk.java.net/jdk pull/6145/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6145

View PR using the GUI difftool:
$ git pr show -t 6145

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6145.diff

Current SLP vectorizer in C2 compiler doesn't support type conversion
operations. But AArch64 has vector type conversion instructions in
both NEON and SVE.

The type conversion involves two kinds of scenarios, conversion between
the same data sizes and conversion between different data sizes. If we
want to support casts between different data sizes, we need to amend
the code part for identifying adjacent memory references and the code
part for justifying if the combination is profitable. I suppose it
would be easier to review if we split the whole task to support type
conversion into two separate patches, one for the same data sizes and
the other one for different data sizes. The goal of this patch is just
to support conversions within the same data size, including:
  int -> float
  float -> int
  long -> double
  double -> long

A typical test case:

for (int i = start; i < limit; i++) {
    b[i] = (float) a[i];
}

To implement it, the patch completed the necessary instructions and
matching rules in the backend and added implementation for SLP in
the middle end.

The percentage of performance uplift on aarch64 system:

benchmark	        Mode   Cnt    percentage change        Metric
                                    (After-Before)/Before
VectorLoop.convertD2L	avgt    15       -48.46%	        ns/op
VectorLoop.convertF2I	avgt    15       -55.67%	        ns/op
VectorLoop.convertI2F	avgt    15       -55.27%	        ns/op
VectorLoop.convertL2D	avgt    15       -48.75%	        ns/op

Change-Id: I91fe4455fa63047e980ffd212ec6e9233210ef47
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 28, 2021

👋 Welcome back fg1417! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Oct 28, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Oct 28, 2021

@fg1417 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler label Oct 28, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 28, 2021

Webrevs

@fg1417
Copy link
Contributor Author

@fg1417 fg1417 commented Nov 5, 2021

Hi, the patch mostly benefits AArch64, but the changes are all shared code. Could anyone help review it ?

Copy link
Member

@TobiHartmann TobiHartmann left a comment

That looks good to me but x86 supports vector instructions for these operations as well, right? Or am I missing something?

match(Set dst (VectorCastI2X src));

Do you have perf numbers for x86?

@fg1417
Copy link
Contributor Author

@fg1417 fg1417 commented Nov 11, 2021

That looks good to me but x86 supports vector instructions for these operations as well, right? Or am I missing something?

match(Set dst (VectorCastI2X src));

Do you have perf numbers for x86?

Yes, I got perf data on normal X86 as showed below:

Before the patch:
Benchmark (length) Mode Cnt Score Error Units
VectorLoop.convertD2L 523 avgt 15 527.330 ± 6.159 ns/op
VectorLoop.convertF2I 523 avgt 15 545.808 ± 4.677 ns/op
VectorLoop.convertI2F 523 avgt 15 373.227 ± 1.259 ns/op
VectorLoop.convertL2D 523 avgt 15 869.646 ± 0.183 ns/op

After the patch:
Benchmark (length) Mode Cnt Score Error Units
VectorLoop.convertD2L 523 avgt 15 530.785 ± 4.767 ns/op
VectorLoop.convertF2I 523 avgt 15 545.831 ± 7.576 ns/op
VectorLoop.convertI2F 523 avgt 15 66.562 ± 2.270 ns/op
VectorLoop.convertL2D 523 avgt 15 869.510 ± 0.075 ns/op

X86 supports int to FP only, and got performance uplift on convertI2F. But it has implementation limitation on both FP to integer types and double to long, and got no obvious positive effect on these scenarios.

x86 supports Long to double vector operations only on AVX512. Here is the perf data on AVX512:

Before the patch:
Benchmark (length) Mode Cnt Score Error Units
VectorLoop.convertD2L 523 avgt 15 525.806 ± 4.374 ns/op
VectorLoop.convertF2I 523 avgt 15 427.714 ± 2.437 ns/op
VectorLoop.convertI2F 523 avgt 15 429.706 ± 2.365 ns/op
VectorLoop.convertL2D 523 avgt 15 869.453 ± 0.056 ns/op

After the patch:
Benchmark (length) Mode Cnt Score Error Units
VectorLoop.convertD2L 523 avgt 15 528.772 ± 3.064 ns/op
VectorLoop.convertF2I 523 avgt 15 428.822 ± 1.801 ns/op
VectorLoop.convertI2F 523 avgt 15 67.560 ± 0.057 ns/op
VectorLoop.convertL2D 523 avgt 15 98.849 ± 0.922 ns/op

As described above, there are obvious uplift on convertI2F and convertL2D.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Thanks for sharing these numbers. Your changes look good to me and our internal testing all passed.

@TobiHartmann
Copy link
Member

@TobiHartmann TobiHartmann commented Nov 11, 2021

A second review would be good.

/reviewers 2

@openjdk
Copy link

@openjdk openjdk bot commented Nov 11, 2021

⚠️ @fg1417 the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout fg208048
$ git commit -c user.name='Preferred Full Name' --allow-empty -m 'Update full name'
$ git push

@openjdk
Copy link

@openjdk openjdk bot commented Nov 11, 2021

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8275317: AArch64: Support some type conversion vectorization in SLP

Reviewed-by: thartmann, ngasson

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 259 new commits pushed to the master branch:

  • 23e5117: 8276559: (httpclient) Consider adding an HttpRequest.Builder.HEAD method to build a HEAD request.
  • a77d8dd: 8276787: Improve warning messages for -XX:+RecordDynamicDumpInfo
  • 8ed384c: 8276609: Document setting property jdk.serialFilter to an invalid value throws ExceptionInInitializerError
  • cddc6ce: 8275811: Incorrect instance to dispose
  • b0a463f: 8169468: NoResizeEventOnDMChangeTest.java fails because FS Window didn't receive all resizes!
  • e5ffdf9: 8276231: ciReplay: SIGSEGV when replay compiling lambdas
  • d5e47d6: 8277089: Use system binutils to build hsdis
  • f3eb501: 8276162: Optimise unsigned comparison pattern
  • 9a9a157: 8276905: Use appropriate macosx_version_minimum value while compiling metal shaders
  • 7906eb0: 8277119: Add asserts in GenericTaskQueueSet methods
  • ... and 249 more: https://git.openjdk.java.net/jdk/compare/9a3e9542997860de79d07a4411b1007e9cd5c348...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TobiHartmann, @nick-arm) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Nov 11, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 11, 2021

@TobiHartmann
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

@openjdk openjdk bot removed the ready label Nov 11, 2021
@fg1417
Copy link
Contributor Author

@fg1417 fg1417 commented Nov 12, 2021

A second review would be good.

/reviewers 2

Thanks for your review, @TobiHartmann . Can I get a second review please?

@jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Nov 12, 2021

Hi @fg1417 , Thanks for fixing this, I have X86 backend changes for missing same sized vector cast operations. Will push for review after this PR gets integrated.

@fg1417
Copy link
Contributor Author

@fg1417 fg1417 commented Nov 17, 2021

Hi @nick-arm , can I have your review please?

Copy link
Member

@nick-arm nick-arm left a comment

Looks OK to me.

@openjdk openjdk bot added the ready label Nov 17, 2021
@fg1417
Copy link
Contributor Author

@fg1417 fg1417 commented Nov 17, 2021

Looks OK to me.

Thanks, @nick-arm

@fg1417
Copy link
Contributor Author

@fg1417 fg1417 commented Nov 17, 2021

/integrate

@openjdk openjdk bot added the sponsor label Nov 17, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 17, 2021

@fg1417
Your change (at version 8b9394d) is now ready to be sponsored by a Committer.

@TobiHartmann
Copy link
Member

@TobiHartmann TobiHartmann commented Nov 17, 2021

/sponsor

@openjdk
Copy link

@openjdk openjdk bot commented Nov 17, 2021

Going to push as commit 9aa30de.
Since your change was applied there have been 260 commits pushed to the master branch:

  • 08f65a5: 8277313: Validate header failed for test/jdk/java/net/httpclient/HeadTest.java
  • 23e5117: 8276559: (httpclient) Consider adding an HttpRequest.Builder.HEAD method to build a HEAD request.
  • a77d8dd: 8276787: Improve warning messages for -XX:+RecordDynamicDumpInfo
  • 8ed384c: 8276609: Document setting property jdk.serialFilter to an invalid value throws ExceptionInInitializerError
  • cddc6ce: 8275811: Incorrect instance to dispose
  • b0a463f: 8169468: NoResizeEventOnDMChangeTest.java fails because FS Window didn't receive all resizes!
  • e5ffdf9: 8276231: ciReplay: SIGSEGV when replay compiling lambdas
  • d5e47d6: 8277089: Use system binutils to build hsdis
  • f3eb501: 8276162: Optimise unsigned comparison pattern
  • 9a9a157: 8276905: Use appropriate macosx_version_minimum value while compiling metal shaders
  • ... and 250 more: https://git.openjdk.java.net/jdk/compare/9a3e9542997860de79d07a4411b1007e9cd5c348...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 17, 2021
@openjdk openjdk bot added integrated and removed ready labels Nov 17, 2021
@openjdk openjdk bot removed rfr sponsor labels Nov 17, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 17, 2021

@TobiHartmann @fg1417 Pushed as commit 9aa30de.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler integrated
4 participants