Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8269725: AArch64: Add VectorMask query implementation for NEON #4699

Closed
wants to merge 7 commits into from

Conversation

XiaohongGong
Copy link
Contributor

@XiaohongGong XiaohongGong commented Jul 7, 2021

The VectorMask query (trueCount, firstTrue, lastTrue) APIs can be intrinsified after [1] is closed. This patch adds the Arm NEON backend implementation for the new added vector nodes.

Here is the performance comparison data for the three APIs with and without this patch:

Benchmark                                        (bits) (inputs) Before       After      Gain  Units
MaskQueryOperationsBenchmark.testFirstTrueByte    128      1    42583.141   103900.253   2.44  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueByte    128      2    37158.470   108234.110   2.91  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueByte    128      3    42583.584   108235.231   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueInt     128      1    42583.625   108236.859   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueInt     128      2    42583.288   107368.205   2.52  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueInt     128      3    42583.673   108232.371   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueLong    128      1    42583.408   108232.617   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueLong    128      2    42583.443   107367.035   2.52  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueLong    128      3    42583.111   108236.036   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueShort   128      1    42583.536   108230.365   2.54  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueShort   128      2    41231.639   108239.148   2.62  ops/ms
MaskQueryOperationsBenchmark.testFirstTrueShort   128      3    42583.630   108238.542   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueByte     128      1    42584.067   108238.989   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueByte     128      2    36845.596   108234.297   2.94  ops/ms
MaskQueryOperationsBenchmark.testLastTrueByte     128      3    42583.759   108237.501   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueInt      128      1    42583.319   108236.218   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueInt      128      2    42583.112   108234.516   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueInt      128      3    42583.340   108238.777   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueLong     128      1    42581.004   108233.701   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueLong     128      2    42583.266   108238.323   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueLong     128      3    42583.542   108234.327   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueShort    128      1    42583.552   108238.011   2.54  ops/ms
MaskQueryOperationsBenchmark.testLastTrueShort    128      2    41231.142   108237.919   2.63  ops/ms
MaskQueryOperationsBenchmark.testLastTrueShort    128      3    44784.270   108238.011   2.42  ops/ms
MaskQueryOperationsBenchmark.testTrueCountByte    128      1    37075.556   108233.571   2.92  ops/ms
MaskQueryOperationsBenchmark.testTrueCountByte    128      2    37527.370   108233.396   2.88  ops/ms
MaskQueryOperationsBenchmark.testTrueCountByte    128      3    36585.788   107372.032   2.93  ops/ms
MaskQueryOperationsBenchmark.testTrueCountInt     128      1    42583.608   108233.721   2.54  ops/ms
MaskQueryOperationsBenchmark.testTrueCountInt     128      2    42584.733   107369.578   2.52  ops/ms
MaskQueryOperationsBenchmark.testTrueCountInt     128      3    42583.623   107367.859   2.52  ops/ms
MaskQueryOperationsBenchmark.testTrueCountLong    128      1    42583.671   107368.004   2.52  ops/ms
MaskQueryOperationsBenchmark.testTrueCountLong    128      2    42583.661   108233.301   2.54  ops/ms
MaskQueryOperationsBenchmark.testTrueCountLong    128      3    42583.015   108232.783   2.54  ops/ms
MaskQueryOperationsBenchmark.testTrueCountShort   128      1    41229.280   108233.369   2.63  ops/ms
MaskQueryOperationsBenchmark.testTrueCountShort   128      2    41231.914   107366.904   2.60  ops/ms
MaskQueryOperationsBenchmark.testTrueCountShort   128      3    41231.734   108233.606   2.63  ops/ms

All VectorAPI jtreg tests pass with patch [2] is applied together.

[1] https://bugs.openjdk.java.net/browse/JDK-8256973
[2] openjdk/jdk17#168

Tested tier1 and jdk:tier3.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8269725: AArch64: Add VectorMask query implementation for NEON

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4699/head:pull/4699
$ git checkout pull/4699

Update a local copy of the PR:
$ git checkout pull/4699
$ git pull https://git.openjdk.java.net/jdk pull/4699/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4699

View PR using the GUI difftool:
$ git pr show -t 4699

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4699.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Jul 7, 2021

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Jul 7, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jul 7, 2021

@XiaohongGong The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler label Jul 7, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Jul 7, 2021

Webrevs

@XiaohongGong
Copy link
Contributor Author

@XiaohongGong XiaohongGong commented Jul 8, 2021

Hi there, could anyone please take a look at this PR? Thanks so much!

@XiaohongGong
Copy link
Contributor Author

@XiaohongGong XiaohongGong commented Jul 14, 2021

Hi @theRealAph , all your significant comments have been addressed. Would you mind taking a look at this PR again? Thanks so much!

Copy link
Contributor

@theRealAph theRealAph left a comment

It's looking good, just a few minor issues.

src/hotspot/cpu/aarch64/aarch64.ad Show resolved Hide resolved
src/hotspot/cpu/aarch64/aarch64_neon.ad Show resolved Hide resolved
src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 Outdated Show resolved Hide resolved
Copy link
Contributor

@theRealAph theRealAph left a comment

OK, thanks.

@openjdk
Copy link

@openjdk openjdk bot commented Jul 15, 2021

@XiaohongGong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8269725: AArch64: Add VectorMask query implementation for NEON

Reviewed-by: aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 12 new commits pushed to the master branch:

  • 7240d67: Merge
  • e104ded: 8268635: Corrupt oop in ClassLoaderData
  • a32d2ee: 8269276: Additional tests for MessageDigest with different providers
  • bb82005: 8270468: TestRangeCheckEliminated fails because methods are not compiled
  • 057992f: 8269387: jpackage --add-launcher should have option to not create shortcuts for additional launchers
  • 746fe5d: 8270366: C2: Add associative rule to add/sub node
  • 1f995e5: 8265888: StandardJavaFileManager::setLocationForModule specification misses 'Implementation Requirements:'
  • c962e6e: 8261006: 'super' qualified method references cannot occur in a static context
  • 99d7f9a: 8264908: Investigate adding BOT range check in G1BlockOffsetTablePart::block_at_or_preceding
  • e92e2fd: 8270517: Add Zero support for LoongArch
  • ... and 2 more: https://git.openjdk.java.net/jdk/compare/1ebd9469db1adada9f5ad41f8599e9458da58399...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Jul 15, 2021
@XiaohongGong
Copy link
Contributor Author

@XiaohongGong XiaohongGong commented Jul 16, 2021

/integrate

@XiaohongGong
Copy link
Contributor Author

@XiaohongGong XiaohongGong commented Jul 16, 2021

Thanks for the review @theRealAph !

@openjdk openjdk bot added the sponsor label Jul 16, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jul 16, 2021

@XiaohongGong
Your change (at version b631047) is now ready to be sponsored by a Committer.

nsjian
nsjian approved these changes Jul 16, 2021
@nsjian
Copy link

@nsjian nsjian commented Jul 16, 2021

/sponsor

@openjdk
Copy link

@openjdk openjdk bot commented Jul 16, 2021

Going to push as commit ea77ef8.
Since your change was applied there have been 12 commits pushed to the master branch:

  • 7240d67: Merge
  • e104ded: 8268635: Corrupt oop in ClassLoaderData
  • a32d2ee: 8269276: Additional tests for MessageDigest with different providers
  • bb82005: 8270468: TestRangeCheckEliminated fails because methods are not compiled
  • 057992f: 8269387: jpackage --add-launcher should have option to not create shortcuts for additional launchers
  • 746fe5d: 8270366: C2: Add associative rule to add/sub node
  • 1f995e5: 8265888: StandardJavaFileManager::setLocationForModule specification misses 'Implementation Requirements:'
  • c962e6e: 8261006: 'super' qualified method references cannot occur in a static context
  • 99d7f9a: 8264908: Investigate adding BOT range check in G1BlockOffsetTablePart::block_at_or_preceding
  • e92e2fd: 8270517: Add Zero support for LoongArch
  • ... and 2 more: https://git.openjdk.java.net/jdk/compare/1ebd9469db1adada9f5ad41f8599e9458da58399...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 16, 2021
@openjdk openjdk bot added integrated and removed ready rfr sponsor labels Jul 16, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jul 16, 2021

@nsjian @XiaohongGong Pushed as commit ea77ef8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@XiaohongGong XiaohongGong deleted the JDK-8269725 branch Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler integrated
3 participants