8304450: [vectorapi] Refactor VectorShuffle implementation#13093
8304450: [vectorapi] Refactor VectorShuffle implementation#13093merykitty wants to merge 16 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back qamai! A progress list of the required criteria for merging this PR into |
|
@merykitty The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
PaulSandoz
left a comment
There was a problem hiding this comment.
That looks like a very good simplification and performance enhancement, and removing a limitation, the byte[] representation. This should likely also help with Valhalla integration.
IIUC it has the same upper bound limitation for vector lengths greater than the maximum index size that can be represented as a lane element (although in practice there may not be any hardware where this can occur). Which is fine, i am not suggesting we try and fix this.
Perhaps it may be possible to move some methods on the concrete implementations to the abstract implementations as helper methods or template methods, thereby reducing the amount of generated code? It seems so in some cases, but i did not look very closely. It may require the introduction of an an element type specific abstract shuffle, and if that's the case it may not be worth it.
--
Relatedly, i would be interested in your opinion on the following. One annoyance in the API which propagates down into the implementation is VectorShuffle<E> and VectorMask<E> have E that is the lane element type. But, in theory they should not need E, and any shuffle or mask with the same lanes as the vector being operated on should be compatible, and it's an implementation detail of the shuffle/mask how its state represented as a hardware register. However, i don't have a good sense of the implications this has to the current HotSpot implementation and whether it is feasible.
|
Yes I will try to polish the patch more after finding the cause of the failure in x86_32. The failure is strange, though, it does not occur on x86_64 for some reasons.
Yes I agree, a shuffle merely contains the lane indices while a mask is an array of boolean, it would be a good cleanup to remove
Note that generics are erased, so from the VM point of view, a Thanks a lot. |
Yes, that's the easy bit :-) The mask implementation is specialized by the species of vectors it operates on, but does it have to be and can we make it independent of the species and bind to the lane count? Then the user does not need to explicitly cast from and to species that have the same lane count, which means we can remove the VectorMask::cast method (since it already throws if the lane counts are not equal). |
|
I have moved most of the methods to
Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type. However, this information does not have to be visible to the API, similar to how we currently handle the vector length, we can have What do you think? |
The Java changes look good to me. I need to have another look, but will not be able to do so until next week. |
|
/label hotspot-compiler |
|
@merykitty |
Yes, the way you have implemented shuffle is tightly connected, that looks ok. I am wondering if we can make the mask implementation more loosely coupled and modified such that it does not have to take into consideration the element type (or species) of the vector it operates on, and instead compatibility is based solely on the lane count. Ideally it would be good to change the What you propose seems a possible a interim step towards a more preferable API, if the performance is good. |
|
@PaulSandoz As some hardware does differentiate masks based on element type, at some point we have to differentiate between them. From a design point of view, they are both implementation details so there might be no consideration regarding the API. On the other hand, having more in the Java side seems to be more desirable, as it does illustrate the operations more intuitively compared to the graph management in C2. Another important point I can think of is that having a constant shape for a Java class would help us in implementing the vector calling convention, as we can rely on the class information instead of some side channels. As a result, I think I do prefer the current class hierarchy. |
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
Show resolved
Hide resolved
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template
Show resolved
Hide resolved
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template
Show resolved
Hide resolved
PaulSandoz
left a comment
There was a problem hiding this comment.
The changes look very good to me. It would be useful if @jatin-bhateja could also take a look.
|
@merykitty This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 155 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
Vector API tests pass on AArch64 platforms (NEON & SVE). So looks good to me! Please do not forget to update the copyright for two additional touched files |
|
Thanks @PaulSandoz and @XiaohongGong for the reviews and testings. |
Running tier2/3 tests. |
|
Tier 2/3 tests passed. |
|
Thanks, may I integrate the changes now? |
You might need another HotSpot reviewer? @vnkozlov is that correct? |
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java
Show resolved
Hide resolved
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
Show resolved
Hide resolved
iwanowww
left a comment
There was a problem hiding this comment.
Nice refactoring! Happy to see so much code gone.
Looks good.
|
@jatin-bhateja @iwanowww Thanks a lot for your approvals, I will integrate the patch |
|
/integrate |
|
Going to push as commit e846a1d.
Your commit was automatically rebased without conflicts. |
|
@merykitty Pushed as commit e846a1d. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi,
This patch reimplements
VectorShuffleimplementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks:rearrangeoperations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing therearrangeoperations.VectorShuffle::toVector, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones.Upon these changes, a
rearrangecan emit more efficient code:Please take a look and leave reviews. Thanks a lot.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093$ git checkout pull/13093Update a local copy of the PR:
$ git checkout pull/13093$ git pull https://git.openjdk.org/jdk.git pull/13093/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13093View PR using the GUI difftool:
$ git pr show -t 13093Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13093.diff
Webrev
Link to Webrev Comment