8304450: [vectorapi] Refactor VectorShuffle implementation #13093

Closed
wants to merge 16 commits into from


Member

@merykitty merykitty commented Mar 19, 2023

Hi,

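This patch reimplements the VectorShuffle implementations as vectors of the bit type, that is, the integral type with the same width as the lane type (for example, int lanes for a FloatVector shuffle). Currently, a VectorShuffle is stored as a byte array and is expanded whenever it is used. This has several drawbacks: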

  1. Inefficient conversions between a shuffle and its corresponding vector. This hurts performance when the shuffle indices are not constant but are loaded or computed dynamically.
  2. Redundant expansions in rearrange operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before the rearrange operation is executed.
  3. Some redundant intrinsics are needed to support this handling, as well as special considerations in the C2 compiler.
  4. Range checks are performed using VectorShuffle::toVector, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than their integral counterparts.
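The first two drawbacks can be modelled in scalar Java. This is a hypothetical sketch, not the JDK implementation: `ShuffleStorageModel` and its methods are invented names, each loop stands in for what would be a single vector instruction, and all indexes are assumed to be in range.

```java
// Hypothetical scalar model of the two shuffle representations for int lanes.
final class ShuffleStorageModel {

    // Old scheme, toShuffle(): each int lane is narrowed to a byte for storage
    // (assumes the indexes already fit in a byte).
    static byte[] narrowToBytes(int[] indexes) {
        byte[] stored = new byte[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            stored[i] = (byte) indexes[i];
        }
        return stored;
    }

    // Old scheme, rearrange(): the stored bytes are widened back to int lanes first.
    static int[] rearrangeFromBytes(int[] src, byte[] shuffle) {
        int[] widened = new int[shuffle.length];
        for (int i = 0; i < shuffle.length; i++) {
            widened[i] = shuffle[i];   // sign-extending widening on every use
        }
        return rearrangeFromInts(src, widened);
    }

    // New scheme: the shuffle already has int lanes and is used directly.
    static int[] rearrangeFromInts(int[] src, int[] shuffle) {
        int[] dst = new int[shuffle.length];
        for (int i = 0; i < shuffle.length; i++) {
            dst[i] = src[shuffle[i]];  // assumes every index is in range
        }
        return dst;
    }
}
```

Both paths produce the same lanes; the difference is that the old one pays for a narrow on conversion and a widen on every use.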

With these changes, a rearrange can emit more efficient code:

var species = IntVector.SPECIES_128;
var v1 = IntVector.fromArray(species, SRC1, 0);
var v2 = IntVector.fromArray(species, SRC2, 0);
v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
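As a reference for what the snippet computes, here is a scalar sketch. It assumes the semantics described in the Vector API documentation: `toShuffle()` partially wraps out-of-range lane values into the exceptional range [-VLENGTH, -1], and `rearrange` throws `IndexOutOfBoundsException` if any exceptional index is present. `RearrangeModel` and its methods are hypothetical names; the real implementation operates on whole vectors, not arrays.

```java
// Hypothetical scalar reference for dst = v1.rearrange(v2.toShuffle()).
final class RearrangeModel {

    // Partial wrapping: valid indexes are kept; out-of-range ones are reduced
    // modulo VLENGTH and shifted into the exceptional range [-VLENGTH, -1].
    static int partiallyWrap(int index, int vlength) {
        if (index >= 0 && index < vlength) {
            return index;
        }
        return Math.floorMod(index, vlength) - vlength;
    }

    // Models toShuffle(): every lane value becomes a partially wrapped index.
    static int[] toShuffle(int[] v) {
        int[] s = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            s[i] = partiallyWrap(v[i], v.length);
        }
        return s;
    }

    // Models rearrange(): any exceptional (negative) index throws.
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] dst = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            if (shuffle[i] < 0) {
                throw new IndexOutOfBoundsException("exceptional index in lane " + i);
            }
            dst[i] = v[shuffle[i]];
        }
        return dst;
    }
}
```

For example, with v1 = {10, 20, 30, 40} and v2 = {3, 0, 2, 1} the result is {40, 10, 30, 20}, while a lane value of 5 is wrapped to -3 and makes `rearrange` throw.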

Before:
movabs $0x751589fa8,%r10            ;   {oop([I{0x0000000751589fa8})}
vmovdqu 0x10(%r10),%xmm2
movabs $0x7515a0d08,%r10            ;   {oop([I{0x00000007515a0d08})}
vmovdqu 0x10(%r10),%xmm1
movabs $0x75158afb8,%r10            ;   {oop([I{0x000000075158afb8})}
vmovdqu 0x10(%r10),%xmm0
vpand  -0xddc12(%rip),%xmm0,%xmm0        # Stub::vector_int_to_byte_mask
                                                        ;   {external_word}
vpackusdw %xmm0,%xmm0,%xmm0
vpackuswb %xmm0,%xmm0,%xmm0
vpmovsxbd %xmm0,%xmm3
vpcmpgtd %xmm3,%xmm1,%xmm3
vtestps %xmm3,%xmm3
jne    0x00007fc2acb4e0d8
vpmovzxbd %xmm0,%xmm0
vpermd %ymm2,%ymm0,%ymm0
movabs $0x751588f98,%r10            ;   {oop([I{0x0000000751588f98})}
vmovdqu %xmm0,0x10(%r10)

After:
movabs $0x751589c78,%r10            ;   {oop([I{0x0000000751589c78})}
vmovdqu 0x10(%r10),%xmm1
movabs $0x75158ac88,%r10            ;   {oop([I{0x000000075158ac88})}
vmovdqu 0x10(%r10),%xmm2
vpxor  %xmm0,%xmm0,%xmm0
vpcmpgtd %xmm2,%xmm0,%xmm3
vtestps %xmm3,%xmm3
jne    0x00007fa818b27cb1
vpermd %ymm1,%ymm2,%ymm0
movabs $0x751588c68,%r10            ;   {oop([I{0x0000000751588c68})}
vmovdqu %xmm0,0x10(%r10)
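The range check in the "After" listing (vpxor, vpcmpgtd, vtestps, jne) can be read as the scalar function below. This is a hypothetical model with an invented name; the jne target is not shown in the listing, so treating it as the path that throws is an assumption. Because the stored indexes are partially wrapped, a negative lane is exactly an exceptional index, so a single signed compare against zero suffices and no widening is needed first.

```java
// Hypothetical scalar model of the range check in the "After" listing.
final class ExceptionalIndexCheck {

    static boolean anyExceptional(int[] idx) {
        // vpxor zeroes a register; vpcmpgtd then computes (0 > idx[i]) per lane,
        // producing all-ones (-1) for negative lanes and 0 otherwise.
        int[] laneMask = new int[idx.length];
        for (int i = 0; i < idx.length; i++) {
            laneMask[i] = (0 > idx[i]) ? -1 : 0;
        }
        // vtestps looks at the sign bit of each lane; jne is taken if any is set.
        for (int m : laneMask) {
            if (m != 0) {
                return true;
            }
        }
        return false;
    }
}
```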

Please take a look and leave reviews. Thanks a lot.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8304450: [vectorapi] Refactor VectorShuffle implementation

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093
$ git checkout pull/13093

Update a local copy of the PR:
$ git checkout pull/13093
$ git pull https://git.openjdk.org/jdk.git pull/13093/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13093

View PR using the GUI difftool:
$ git pr show -t 13093

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13093.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Mar 19, 2023

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 19, 2023
@openjdk

openjdk bot commented Mar 19, 2023

@merykitty The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Mar 19, 2023
@mlbridge

mlbridge bot commented Mar 19, 2023

Webrevs

@merykitty merykitty marked this pull request as draft March 19, 2023 13:40
@openjdk openjdk bot removed the rfr Pull request is ready for review label Mar 19, 2023
Member

@PaulSandoz PaulSandoz left a comment


That looks like a very good simplification and performance enhancement, and it removes a limitation: the byte[] representation. This should likely also help with Valhalla integration.

IIUC it has the same upper-bound limitation for vector lengths greater than the maximum index that can be represented as a lane element (although in practice there may not be any hardware where this can occur). That is fine; I am not suggesting we try to fix this.
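The bound can be made concrete with a small calculation. Assuming partially wrapped indexes must cover [-VLENGTH, VLENGTH - 1] and are stored in signed lanes of the vector's own element width, a lane of `laneBits` bits can hold them only while VLENGTH is at most 2^(laneBits - 1). `IndexRange.indicesFit` is a hypothetical helper.

```java
// Hypothetical check: can every partially wrapped index for a vector with
// vlength lanes, i.e. every value in [-vlength, vlength - 1], be stored in a
// signed lane of laneBits bits?
final class IndexRange {
    static boolean indicesFit(int laneBits, int vlength) {
        long min = -(1L << (laneBits - 1));     // e.g. -128 for byte lanes
        long max = (1L << (laneBits - 1)) - 1;  // e.g.  127 for byte lanes
        return -(long) vlength >= min && vlength - 1L <= max;
    }
}
```

For byte lanes the limit is 128 lanes, so a 2048-bit vector of bytes (256 lanes, the largest vector size the SVE architecture allows) would not fit, while a 2048-bit vector of short lanes (128 lanes) would.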

Perhaps it may be possible to move some methods from the concrete implementations to the abstract implementations as helper methods or template methods, thereby reducing the amount of generated code? It seems so in some cases, but I did not look very closely. It may require the introduction of an element-type-specific abstract shuffle, and if that's the case it may not be worth it.

--

Relatedly, I would be interested in your opinion on the following. One annoyance in the API which propagates down into the implementation is VectorShuffle<E> and VectorMask<E> have E that is the lane element type. But in theory they should not need E: any shuffle or mask with the same number of lanes as the vector being operated on should be compatible, and how its state is represented as a hardware register is an implementation detail of the shuffle/mask. However, I don't have a good sense of the implications this has for the current HotSpot implementation and whether it is feasible.

@merykitty
Member Author

Yes, I will try to polish the patch more after finding the cause of the failure on x86_32. The failure is strange, though: it does not occur on x86_64 for some reason.

One annoyance in the API which propagates down into the implementation is VectorShuffle<E> and VectorMask<E> have E that is the lane element type.

Yes, I agree. A shuffle merely contains the lane indices and a mask is an array of booleans, so it would be a good cleanup to remove E from the interface.

However, i don't have a good sense of the implications this has to the current HotSpot implementation and whether it is feasible.

Note that generics are erased, so from the VM point of view, a VectorMask<E> and a VectorMask are indistinguishable. As a result, removing the type parameter should not have any impact on the VM. Some details may have to change, though: once element types are removed, a mask or shuffle would only be validated against its length, and we need to insert a cast at use sites. The cast will be removed if the species is actually the same, so there is little concern regarding the emitted machine code.
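The erasure point can be checked in plain Java. In the sketch below, `Mask<E>` is a hypothetical stand-in, not the Vector API type: both parameterisations share one runtime class, and an unchecked cast between them performs no check on E. That is why, once E is gone from the API, validation can only look at state the object actually carries, such as its length.

```java
// Hypothetical demonstration of type erasure with a mask-like class.
final class ErasureDemo {

    static final class Mask<E> {
        final boolean[] lanes;

        Mask(boolean[] lanes) {
            this.lanes = lanes;
        }
    }

    // javac erases E, so both objects have exactly the same runtime class.
    static boolean sameRuntimeClass() {
        Mask<Integer> mi = new Mask<>(new boolean[] {true, false});
        Mask<Float> mf = new Mask<>(new boolean[] {false, true});
        return mi.getClass() == mf.getClass();
    }

    // An unchecked cast between parameterisations compiles to no runtime check
    // on E; the same object comes back, merely retyped.
    @SuppressWarnings("unchecked")
    static Mask<Float> uncheckedRetag(Mask<Integer> m) {
        Object o = m;
        return (Mask<Float>) o;
    }
}
```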

Thanks a lot.

@PaulSandoz
Member

Note that generics are erased, so from the VM point of view, a VectorMask<E> and a VectorMask is indifferent.

Yes, that's the easy bit :-) The mask implementation is specialized by the species of vectors it operates on, but does it have to be and can we make it independent of the species and bind to the lane count?

Then the user does not need to explicitly cast from and to species that have the same lane count, which means we can remove the VectorMask::cast method (since it already throws if the lane counts are not equal).
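A mask bound only to its lane count could look roughly like the following. `LaneMask` is a hypothetical class, not a proposed API: the same instance blends `int[]` and `float[]` data of equal length, and the only compatibility check is a lane-count comparison, so no species cast is needed. Following the Vector API's `blend`, lanes are taken from the second argument where the mask is set; both arrays are assumed to have the same length.

```java
// Hypothetical sketch: a mask bound only to its lane count, usable with any
// vector of the same length regardless of element type.
final class LaneMask {
    private final boolean[] bits;

    LaneMask(boolean... bits) {
        this.bits = bits.clone();
    }

    int length() {
        return bits.length;
    }

    // Compatibility is a lane-count comparison; no species is involved.
    private void checkLanes(int vectorLength) {
        if (bits.length != vectorLength) {
            throw new IllegalArgumentException(
                "mask has " + bits.length + " lanes, vector has " + vectorLength);
        }
    }

    // Takes b[i] where the mask is set, otherwise a[i].
    int[] blend(int[] a, int[] b) {
        checkLanes(a.length);
        int[] r = a.clone();
        for (int i = 0; i < r.length; i++) {
            if (bits[i]) {
                r[i] = b[i];
            }
        }
        return r;
    }

    // Same mask, different element type: only the lane count matters.
    float[] blend(float[] a, float[] b) {
        checkLanes(a.length);
        float[] r = a.clone();
        for (int i = 0; i < r.length; i++) {
            if (bits[i]) {
                r[i] = b[i];
            }
        }
        return r;
    }
}
```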

@merykitty
Member Author

I have moved most of the methods to AbstractVector and AbstractShuffle. I had to resort to raw types, though, since there seems to be no way to do the same with wildcards, and the generics mechanism is not powerful enough for things like Vector<E.integral>. The remaining failure seems to be related to JDK-8304676, so I think this patch is ready for review now.

The mask implementation is specialized by the species of vectors it operates on, but does it have to be

Apart from the mask implementation, the shuffle implementation definitely has to take the element type into account. However, this information does not have to be visible in the API: similar to how we currently handle the vector length, we can have class AbstractMask<E> implements VectorMask. As a result, the cast method would be useless and could be removed from the API, but our implementation would still use it, for example:

Vector<E> blend(Vector<E> v, VectorMask w) {
    AbstractMask<?> aw = (AbstractMask<?>) w;
    AbstractMask<E> tw = aw.cast(vspecies());
    return VectorSupport.blend(...);
}

Vector<E> rearrange(VectorShuffle s) {
    AbstractShuffle<?> as = (AbstractShuffle<?>) s;
    AbstractShuffle<E> ts = as.cast(vspecies());
    return VectorSupport.rearrangeOp(...);
}
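The interim design can be modelled end to end in plain Java. All names here (`Mask`, `Species`, `AbstractMask`, `blend`) are hypothetical stand-ins for `VectorMask`, `VectorSpecies`, the implementation's `AbstractMask` and the intrinsic call; the point is only the shape of the use-site cast, where the caller passes a mask built for any species and the implementation rebinds it after checking the lane count.

```java
// Hypothetical model: the public Mask type has no element parameter, while the
// implementation class keeps E and is cast at use sites.
final class InterimMaskDesign {

    // Stands in for VectorMask with the type parameter removed.
    interface Mask {
        int length();
    }

    // Stands in for a species: an element type plus a lane count.
    static final class Species<E> {
        final Class<E> elementType;
        final int length;

        Species(Class<E> elementType, int length) {
            this.elementType = elementType;
            this.length = length;
        }
    }

    // Implementation class that still carries E.
    static final class AbstractMask<E> implements Mask {
        final Species<E> species;
        final boolean[] bits;

        AbstractMask(Species<E> species, boolean... bits) {
            if (bits.length != species.length) {
                throw new IllegalArgumentException("bits do not match species length");
            }
            this.species = species;
            this.bits = bits.clone();
        }

        @Override
        public int length() {
            return bits.length;
        }

        // Rebinds the same lane bits to another species; only the lane count is checked.
        <F> AbstractMask<F> cast(Species<F> target) {
            if (target.length != bits.length) {
                throw new IllegalArgumentException("lane count mismatch");
            }
            return new AbstractMask<>(target, bits);
        }
    }

    // Use site modelled on blend(): accept any Mask, then cast to this species.
    static int[] blend(Species<Integer> vspecies, int[] a, int[] b, Mask w) {
        AbstractMask<?> aw = (AbstractMask<?>) w;
        AbstractMask<Integer> tw = aw.cast(vspecies);
        int[] r = a.clone();
        for (int i = 0; i < r.length; i++) {
            if (tw.bits[i]) {
                r[i] = b[i];
            }
        }
        return r;
    }
}
```

Here a mask built for a four-lane Float species is accepted by an Integer blend because the lane counts match; a species with a different lane count makes `cast` throw.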

What do you think?

@merykitty merykitty marked this pull request as ready for review March 21, 2023 16:11
@PaulSandoz
Member

I have moved most of the methods to AbstractVector and AbstractShuffle, I have to resort to raw types, though, since there seems to be no way to do the same with wild cards, and the generics mechanism is not powerful enough for things like Vector<E.integral>. The remaining failure seems to be related to JDK-8304676, so I think this patch is ready for review now.

The Java changes look good to me. I need to have another look, but will not be able to do so until next week.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 21, 2023
@merykitty
Member Author

/label hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 21, 2023
@openjdk

openjdk bot commented Mar 21, 2023

@merykitty
The hotspot-compiler label was successfully added.

@PaulSandoz
Member

PaulSandoz commented Mar 21, 2023

Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type.

Yes, the way you have implemented shuffle is tightly connected, that looks ok.

I am wondering if we can make the mask implementation more loosely coupled and modified such that it does not have to take into consideration the element type (or species) of the vector it operates on, and instead compatibility is based solely on the lane count.

Ideally it would be good to change the VectorMask::check method to just compare the lane counts and not require a cast in the implementation, which I presume requires some deeper changes in C2?

What you propose seems a possible interim step towards a more preferable API, if the performance is good.

@merykitty
Copy link
Member Author

@PaulSandoz As some hardware does differentiate masks based on element type, at some point we have to differentiate between them. From a design point of view, they are both implementation details, so there might be no consideration regarding the API. On the other hand, having more logic on the Java side seems more desirable, as it illustrates the operations more intuitively than the graph management in C2. Another important point I can think of is that having a constant shape for a Java class would help us implement the vector calling convention, as we can rely on the class information instead of some side channels. As a result, I think I prefer the current class hierarchy.
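To illustrate why the element type can matter at the hardware level, the sketch below builds the register image of a mask in the style of AVX2 compare results, where each lane is all-ones or all-zeros at the lane's own width. It is a simplified, hypothetical model: AVX-512 instead uses opmask registers with one bit per lane, and SVE predicates use one bit per byte of the vector, so the layout varies by platform; in the AVX2 style and on SVE, the encoding of the same logical mask depends on the lane width.

```java
import java.util.Arrays;

// Hypothetical model of an AVX2-style mask register image, one array element
// per byte: every lane is filled with 0xFF (set) or 0x00 (clear).
final class MaskLayout {
    static byte[] registerImage(boolean[] lanes, int laneBytes) {
        byte[] image = new byte[lanes.length * laneBytes];
        for (int i = 0; i < lanes.length; i++) {
            byte fill = lanes[i] ? (byte) 0xFF : (byte) 0;
            Arrays.fill(image, i * laneBytes, (i + 1) * laneBytes, fill);
        }
        return image;
    }
}
```

The same logical mask {true, false} yields an 8-byte image for int lanes and a 16-byte image for long lanes.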

Member

@PaulSandoz PaulSandoz left a comment


The changes look very good to me. It would be useful if @jatin-bhateja could also take a look.

@openjdk

openjdk bot commented Mar 31, 2023

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8304450: [vectorapi] Refactor VectorShuffle implementation

Reviewed-by: psandoz, xgong, jbhateja, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 155 new commits pushed to the master branch:

  • 475e9a7: 8305809: (fs) Review obsolete Linux kernel dependency on os.version (Unix kernel 2.6.39)
  • 1de772c: 8294806: jpackaged-app ignores splash screen from jar file
  • d9db906: 8305368: G1 remset chunk claiming may use relaxed memory ordering
  • c789d24: 8305370: Inconsistent use of for_young_only_phase parameter in G1 predictions
  • c6d7cf6: 8305663: Wrong iteration order of pause array in g1MMUTracker
  • ce4b995: 8305761: Resolve multiple definition of 'jvm' when statically linking with JDK native libraries
  • 12946f5: 8305419: JDK-8301995 broke building libgraal
  • 9486969: 8302696: Revert API signature changes made in JDK-8285504 and JDK-8285263
  • 628a3f1: 8304738: UnregisteredClassesTable_lock never created
  • 7a5597c: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
  • ... and 145 more: https://git.openjdk.org/jdk/compare/d063b8964fbdd6ca1d9527dabb40fed59bbc8ad7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 31, 2023
@XiaohongGong

Vector API tests pass on AArch64 platforms (NEON & SVE), so this looks good to me! Please do not forget to update the copyright headers of the two additional touched files, AbstractSpecies.java and VectorSpecies.java. Thanks!

@merykitty
Member Author

Thanks @PaulSandoz and @XiaohongGong for the reviews and testing.

@PaulSandoz
Member

Thanks @PaulSandoz and @XiaohongGong for the reviews and testing.

Running tier2/3 tests.


@XiaohongGong XiaohongGong left a comment


Looks good to me! Thanks!

@PaulSandoz
Member

Tier 2/3 tests passed.

@merykitty
Member Author

Thanks, may I integrate the changes now?

@PaulSandoz
Copy link
Member

Thanks, may I integrate the changes now?

You might need another HotSpot reviewer? @vnkozlov is that correct?

Contributor

@iwanowww iwanowww left a comment


Nice refactoring! Happy to see so much code gone.

Looks good.

@merykitty
Member Author

@jatin-bhateja @iwanowww Thanks a lot for your approvals; I will integrate the patch.

@merykitty
Member Author

/integrate

@openjdk

openjdk bot commented Apr 13, 2023

Going to push as commit e846a1d.
Since your change was applied there have been 167 commits pushed to the master branch:

  • 3f36dd8: 8305529: DefaultProxySelector.select(URI) in certain cases returns a List with null element
  • 425ef06: 8303923: ZipOutStream::putEntry should include an apiNote to indicate that the STORED compression method should be used when writing directory entries
  • 2bbbff2: 8305858: Resolve multiple definition of 'handleSocketError' when statically linking with JDK native libraries
  • bc15163: 8304834: Fix wrapper insertion in TestScaffold.parseArgs(String args[])
  • 19380d7: 8305324: C2: Wrong execution of vectorizing Interger.reverseBytes
  • 87017b5: 8295859: Update Manual Test Groups
  • 99a9dbc: 8305783: x86_64: Optimize AbsI and AbsL
  • d8af7a6: 8304725: AsyncGetCallTrace can cause SIGBUS on M1
  • b9bdbe9: 8305524: AArch64: Fix arraycopy issue on SVE caused by matching rule vmask_gen_sub
  • 82e8b03: 8305203: Simplify trimming operation in Region::Ideal
  • ... and 157 more: https://git.openjdk.org/jdk/compare/d063b8964fbdd6ca1d9527dabb40fed59bbc8ad7...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 13, 2023
@openjdk openjdk bot closed this Apr 13, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 13, 2023
@openjdk

openjdk bot commented Apr 13, 2023

@merykitty Pushed as commit e846a1d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@merykitty merykitty deleted the shufflerefactor branch April 27, 2023 03:00
Labels
core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
5 participants