Conversation

Member

@e1iu e1iu commented Nov 15, 2023

The Vector API defines zero-extend operations [1], which C2 intrinsifies into VectorUCastNode. This patch adds the backend implementation of VectorUCastNode on AArch64.

The microbenchmark shows a significant performance improvement. On my test machine (SVE, 256-bit), the results are as follows, where Gain is (After - Before) / Before:


  Benchmark                     Before     After       Units   Gain
  VectorZeroExtend.byte2Int     3168.251   243012.399  ops/ms  75.70
  VectorZeroExtend.byte2Long    3212.201   216291.588  ops/ms  66.33
  VectorZeroExtend.byte2Short   3391.968   182655.365  ops/ms  52.85
  VectorZeroExtend.int2Long     1012.197    80448.553  ops/ms  78.48
  VectorZeroExtend.short2Int    1812.471   153416.828  ops/ms  83.65
  VectorZeroExtend.short2Long   1788.382   129794.814  ops/ms  71.58

On other NEON systems, we get a similar performance boost because intrinsification now succeeds.

Since VectorUCastNode is currently used only for the Vector API's zero extension, this patch also adds assertions to the node definitions to clarify their usage.

[TEST]
compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
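For readers less familiar with the distinction, here is a minimal scalar sketch of what a zero-extending cast does per lane compared with the ordinary sign-extending cast. It is only a model of the semantics, not the HotSpot implementation; the function names `sign_extend_b2i` and `zero_extend_b2i` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sign-extending cast: each byte lane keeps its signed value (0xFF becomes -1).
std::vector<int32_t> sign_extend_b2i(const std::vector<int8_t>& src) {
  std::vector<int32_t> dst;
  dst.reserve(src.size());
  for (int8_t b : src) {
    dst.push_back(static_cast<int32_t>(b));
  }
  return dst;
}

// Zero-extending cast: each byte lane is treated as unsigned before widening
// (0xFF becomes 255), so the upper bits of the wider lane are always zero.
std::vector<int32_t> zero_extend_b2i(const std::vector<int8_t>& src) {
  std::vector<int32_t> dst;
  dst.reserve(src.size());
  for (int8_t b : src) {
    dst.push_back(static_cast<int32_t>(static_cast<uint8_t>(b)));
  }
  return dst;
}
```

The two functions differ only for lanes whose top bit is set, which is why the backend can share most of the code between the signed and unsigned paths.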


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16670/head:pull/16670
$ git checkout pull/16670

Update a local copy of the PR:
$ git checkout pull/16670
$ git pull https://git.openjdk.org/jdk.git pull/16670/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16670

View PR using the GUI difftool:
$ git pr show -t 16670

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16670.diff


Change-Id: I10770759f158975ead1eecd3fb63280e563ed5e2

bridgekeeper bot commented Nov 15, 2023

👋 Welcome back eliu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 15, 2023

openjdk bot commented Nov 15, 2023

@e1iu The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Nov 15, 2023

mlbridge bot commented Nov 15, 2023

Webrevs

%}
ins_pipe(pipe_slow);
%}

Contributor

The following hunk does not seem to be making good use of the macro processor.

Member Author

I updated the m4 file. Please help review it, thanks!

Change-Id: I82bf5f9384f79e09965a0498ad2de45cec6f0a29
  // 4B/8B to 4S/8S
  assert(dst_vlen_in_bytes == 8 || dst_vlen_in_bytes == 16, "unsupported");
- sxtl(dst, T8H, src, T8B);
+ (this->*ext)(dst, T8H, src, T8B);
Contributor

One thing might make this cleaner: I suggest you make _xshll protected rather than private, then here

_xshll(is_unsigned, dst, T8H, src, T8B, 0);
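As a rough illustration of this suggestion, the sketch below uses a toy assembler that records mnemonics instead of emitting machine code. The class and method names (`ToyAssembler`, `xshll`, `extend_b2s`) are hypothetical stand-ins, not the HotSpot classes; the only point is that a protected helper taking `is_unsigned` lets the derived class pass a flag through rather than select a member-function pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an assembler: records mnemonics rather than encoding bits.
class ToyAssembler {
 protected:
  // One shared helper; signedness is an argument, not a separate code path.
  void xshll(bool is_unsigned, const std::string& dst, const std::string& src) {
    emitted.push_back(std::string(is_unsigned ? "uxtl " : "sxtl ") + dst + ", " + src);
  }

 public:
  std::vector<std::string> emitted;

  void sxtl(const std::string& dst, const std::string& src) { xshll(false, dst, src); }
  void uxtl(const std::string& dst, const std::string& src) { xshll(true, dst, src); }
};

// Toy stand-in for the macro assembler layer that sits on top.
class ToyMacroAssembler : public ToyAssembler {
 public:
  // The caller threads is_unsigned through; because xshll is protected,
  // no pointer to sxtl or uxtl has to be chosen up front.
  void extend_b2s(bool is_unsigned, const std::string& dst, const std::string& src) {
    xshll(is_unsigned, dst, src);
  }
};
```

With this shape, `extend_b2s(true, "v0", "v1")` and `extend_b2s(false, "v0", "v1")` differ only in the flag, which keeps the call sites readable.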

  case S:
-   sve_sunpklo(dst, H, src);
-   sve_sunpklo(dst, S, dst);
+   (this->*unpklo)(dst, H, src);
Contributor

As above: try making is_unsigned a parameter.

Member Author

Got it. I will fix it soon. Thanks!

Member Author

compiler/vectorapi and jdk/incubator/vector passed. The full test run is in progress; I will report the result when it finishes.

Member Author

Full jtreg passed with no new failures.

Change-Id: Ic19836feb8a73ea7e65443794f2a0eb1363f6e2f
// Signed unpack and extend half of vector - low half
void sve_sunpklo(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) {
  _sve_xunpk(/* is_unsigned */ false, /* is_high */ false, Zd, T, Zn);
}
Contributor

@theRealAph theRealAph Nov 21, 2023

This code expansion does not look right. You should be able to make this change without so much code expansion.

#define INSN(NAME, is_unsigned, is_high)                                        \
  void NAME(FloatRegister Zd, SIMD_RegVariant T, FloatRegister Zn) {            \
    _sve_xunpk(is_unsigned, is_high, Zd, T, Zn);                                \
  }
  INSN(sve_uunpkhi, true, true) ...
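To make the macro idea concrete, here is a small self-contained sketch in which one `INSN` macro stamps out the four unpack wrappers around a single helper. `ToySveAssembler` and `xunpk` are hypothetical names that only model the pattern; register operands are plain strings and the helper logs a mnemonic instead of encoding an instruction.

```cpp
#include <cassert>
#include <string>
#include <vector>

class ToySveAssembler {
  // Shared helper (a model of the _sve_xunpk role): builds the mnemonic from
  // the two flags and records it.
  void xunpk(bool is_unsigned, bool is_high,
             const std::string& zd, const std::string& zn) {
    log.push_back(std::string(is_unsigned ? "u" : "s") + "unpk" +
                  (is_high ? "hi " : "lo ") + zd + ", " + zn);
  }

 public:
  std::vector<std::string> log;

// Each INSN line expands to one thin wrapper, so adding a variant costs one line.
#define INSN(NAME, is_unsigned, is_high)                              \
  void NAME(const std::string& zd, const std::string& zn) {           \
    xunpk(is_unsigned, is_high, zd, zn);                              \
  }

  INSN(sunpklo, false, false)
  INSN(sunpkhi, false, true)
  INSN(uunpklo, true, false)
  INSN(uunpkhi, true, true)

#undef INSN
};
```

The trade-off is the usual one for macros: less repetition in the header, at the cost of wrappers that are harder to step into in a debugger.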

_sve_xunpk(is_unsigned, /* is_high */ false, dst, H, src);
_sve_xunpk(is_unsigned, /* is_high */ false, dst, S, dst);
_sve_xunpk(is_unsigned, /* is_high */ false, dst, D, dst);
break;
Contributor

Why is this change here? It doesn't do anything. Does it?

Member Author

is_unsigned is also used in this function. It is used in VectorUCastNode for zero extending.

Contributor

Ah, I see.
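The three-step widening in the hunk discussed in this thread can be modelled in scalar code to show why `is_unsigned` has to reach every step. This is a simplified sketch under stated assumptions: lanes are held in `int64_t`, each unpack step keeps the low half of the lanes and widens them, and the names `widen_lane`, `unpklo` and `extend_b2l` are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reinterpret the low `bits` bits of a lane as a signed or unsigned value.
// Assumes bits <= 32 so that the shift below cannot overflow.
int64_t widen_lane(bool is_unsigned, int64_t lane, int bits) {
  const int64_t modulus = int64_t{1} << bits;
  int64_t low = lane & (modulus - 1);  // keep only the narrow lane's bits
  if (!is_unsigned && low >= modulus / 2) {
    low -= modulus;                    // top bit set: value is negative
  }
  return low;
}

// Model of one "unpack low" step: keep the low half of the lanes and widen each.
std::vector<int64_t> unpklo(bool is_unsigned, const std::vector<int64_t>& src,
                            int src_bits) {
  assert(src.size() % 2 == 0);
  std::vector<int64_t> dst(src.size() / 2);
  for (std::size_t i = 0; i < dst.size(); ++i) {
    dst[i] = widen_lane(is_unsigned, src[i], src_bits);
  }
  return dst;
}

// Byte to long takes three steps (8 -> 16 -> 32 -> 64 bits); the same flag is
// passed to each one.
std::vector<int64_t> extend_b2l(bool is_unsigned, const std::vector<int64_t>& bytes) {
  std::vector<int64_t> h = unpklo(is_unsigned, bytes, 8);
  std::vector<int64_t> s = unpklo(is_unsigned, h, 16);
  return unpklo(is_unsigned, s, 32);
}
```

In this model, a first byte lane holding -1 widens to 255 when `is_unsigned` is true and stays -1 when it is false.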

Change-Id: Icfe9619af1c9e7d5ea8cac457ccebb4eec5c34ad
@e1iu e1iu requested a review from theRealAph November 27, 2023 06:59
Member Author

e1iu commented Nov 30, 2023

@theRealAph Could you please take a look? Thanks.

Contributor

@theRealAph theRealAph left a comment

That looks nice. Thanks.


openjdk bot commented Nov 30, 2023

@e1iu This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts

Reviewed-by: aph, xgong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 316 new commits pushed to the master branch:

  • 2b00ac0: 8308753: Class-File API transition to Preview
  • b9df827: 8309871: jdk/jfr/api/consumer/recordingstream/TestSetEndTime.java timed out
  • 9498469: 8318983: Fix comment typo in PKCS12Passwd.java
  • 4dcbd13: 8314905: jdk/jfr/tool/TestView.java fails with RuntimeException 'Invoked Concurrent' missing from stdout/stderr
  • 5dee2a3: 8320440: Implementation of Structured Concurrency (Second Preview)
  • 6f7bb79: 8320931: [REDO] dsymutil command leaves around temporary directories
  • 8be3e39: 8320129: "top" command during jtreg failure handler does not display CPU usage on OSX
  • 2f299e4: 8321182: SourceExample.SOURCE_14 comment should refer to 'switch expressions' instead of 'text blocks'
  • 3a09a05: 8313722: JFR: Avoid unnecessary calls to Events.from(Recording)
  • 42af8ce: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10
  • ... and 306 more: https://git.openjdk.org/jdk/compare/4e8c0364a2d3d4b445ff3a0d3da1da079748f05f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 30, 2023
@XiaohongGong XiaohongGong left a comment

LGTM!

Member Author

e1iu commented Dec 4, 2023

/integrate


openjdk bot commented Dec 4, 2023

Going to push as commit 9b8eaa2.
Since your change was applied there have been 318 commits pushed to the master branch:

  • b9b8263: 8317307: test/jdk/com/sun/jndi/ldap/LdapPoolTimeoutTest.java fails with ConnectException: Connection timed out: no further information
  • 0d0a657: 5108458: JTable does not properly layout its content
  • 2b00ac0: 8308753: Class-File API transition to Preview
  • b9df827: 8309871: jdk/jfr/api/consumer/recordingstream/TestSetEndTime.java timed out
  • 9498469: 8318983: Fix comment typo in PKCS12Passwd.java
  • 4dcbd13: 8314905: jdk/jfr/tool/TestView.java fails with RuntimeException 'Invoked Concurrent' missing from stdout/stderr
  • 5dee2a3: 8320440: Implementation of Structured Concurrency (Second Preview)
  • 6f7bb79: 8320931: [REDO] dsymutil command leaves around temporary directories
  • 8be3e39: 8320129: "top" command during jtreg failure handler does not display CPU usage on OSX
  • 2f299e4: 8321182: SourceExample.SOURCE_14 comment should refer to 'switch expressions' instead of 'text blocks'
  • ... and 308 more: https://git.openjdk.org/jdk/compare/4e8c0364a2d3d4b445ff3a0d3da1da079748f05f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 4, 2023
@openjdk openjdk bot closed this Dec 4, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 4, 2023

openjdk bot commented Dec 4, 2023

@e1iu Pushed as commit 9b8eaa2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


Labels

core-libs core-libs-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

3 participants