8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen #874

Closed
wants to merge 2 commits into from

Conversation

@DamonFool
Member

@DamonFool DamonFool commented Oct 27, 2020

Hi all,

Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode.

It would be better to fix the following cases:

  1. instruct mul2L_reg
    The code-gen logic uses phaddd [2], which requires legacy mode here [3].
    This bug might be reproduced on AVX512 machines without avx512dq.

  2. instruct vmul4L_reg_avx
    The code-gen logic uses vphaddd [4], which requires legacy mode here [5].
    This bug might be reproduced on AVX512 machines without avx512dq.

  3. instruct reductionL
    For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6]
    vpmullq requires legacy mode [7] if avx512dq isn't supported.
    This bug might be reproduced on AVX512 machines without avx512dq.

  4. instruct reductionB
    For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8]
    pminsb requires legacy mode [9] if avx512bw isn't supported.
    This bug might be reproduced on AVX512 machines without avx512bw.


The bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq.
The bug in reductionB can only be reproduced on AVX512 machines without avx512bw.
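
To make the four cases easier to compare, here is a minimal Python sketch. It is a hypothetical model, not HotSpot code: the table `EVEX_FEATURE` and both function names are invented for illustration. It encodes only what is stated in this PR: phaddd and vphaddd always need legacy mode, vpmullq needs it when avx512dq is missing, and pminsb needs it when avx512bw is missing. It also assumes that legacy mode restricts operands to the lower register bank (xmm0 to xmm15).

```python
# Hypothetical model of the legacy-mode rule; not taken from x86.ad.
# Maps an instruction to the AVX-512 feature it needs before it may use
# the upper register bank. None means it always needs legacy mode.
EVEX_FEATURE = {
    "phaddd": None,
    "vphaddd": None,
    "vpmullq": "avx512dq",
    "pminsb": "avx512bw",
}


def requires_legacy_mode(instr, cpu_features):
    """Return True if `instr` must be emitted in legacy mode on this CPU."""
    feature = EVEX_FEATURE[instr]
    return feature is None or feature not in cpu_features


def allowed_registers(instr, cpu_features):
    """List the vector registers the allocator may assign to `instr`."""
    # Assumption: legacy mode means xmm0-xmm15 only, otherwise xmm0-xmm31.
    limit = 16 if requires_legacy_mode(instr, cpu_features) else 32
    return [f"xmm{i}" for i in range(limit)]
```

For example, on a CPU modeled as `{"avx512f"}`, `allowed_registers("vpmullq", {"avx512f"})` returns only xmm0 to xmm15, while adding `"avx512dq"` to the set returns all 32 registers.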

Unfortunately, we can't create reproducers, since our AVX512 platforms support both avx512dq and avx512bw.
However, it does make sense to fix these unexposed bugs, since Vector API code is sure to run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future.

The fix just changes vec to legVec, which is quite safe in theory.

As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something).

Testing:

  • jdk/incubator/vector on both AVX512 and AVX256 machines

Any comments?

Thanks a lot.
Best regards,
Jie

[1] #791 (comment)
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217
[4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497
[5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165
[6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521
[7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428
[8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482
[9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x32 Linux x64 Windows x64 macOS x64
Build ✔️ (1/1 passed) ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed) ✔️ (9/9 passed)

Issue

  • JDK-8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/874/head:pull/874
$ git checkout pull/874

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Oct 27, 2020

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@DamonFool
Member Author

@DamonFool DamonFool commented Oct 27, 2020

/issue add JDK-8255438
/test
/label add hotspot-compiler
/cc hotspot-compiler

@openjdk openjdk bot added the rfr label Oct 27, 2020
@openjdk

@openjdk openjdk bot commented Oct 27, 2020

@DamonFool This issue is referenced in the PR title - it will now be updated.

@openjdk

@openjdk openjdk bot commented Oct 27, 2020

@DamonFool
The hotspot-compiler label was successfully added.

@openjdk

@openjdk openjdk bot commented Oct 27, 2020

@DamonFool The hotspot-compiler label was already applied.

@mlbridge

@mlbridge mlbridge bot commented Oct 27, 2020

Webrevs

Contributor

@vnkozlov vnkozlov left a comment

Good. Thank you for cleaning this up.
Could someone at Oracle please run Mach5 testing with UseAVX=3?

@openjdk

@openjdk openjdk bot commented Oct 27, 2020

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen

Reviewed-by: kvn, vlivanov, azeemj

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 29 new commits pushed to the master branch:

  • 0425889: 8255429: Remove C2-based profiling
  • aaf4f69: 8255233: InterpreterRuntime::at_unwind should be a JRT_LEAF
  • bbf0a31: 8255397: x86: coalesce reference and int entry points into vtos bytecodes
  • 3bd5b80: 8243583: Change 'final' error checks to throw ICCE
  • 1f00c3b: 8255527: Shenandoah: Let ShenadoahGCStateResetter disable barriers
  • 3c4fc79: 8255299: Drop explicit zeroing at instantiation of Atomic* objects
  • 6b2d11b: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic
  • 591e7e2: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210
  • 2c9dfc7: 8255234: ZGC: Bulk allocate forwarding data structures
  • b7d483c: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer
  • ... and 19 more: https://git.openjdk.java.net/jdk/compare/d735f919195cd45d507cd228872b379d97072800...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 27, 2020
@DamonFool
Member Author

@DamonFool DamonFool commented Oct 27, 2020

Good. Thank you for cleaning this up.
Could someone at Oracle please run Mach5 testing with UseAVX=3?

Thanks @vnkozlov for your review.
I hope the experts from Intel (@sviswa7, @jatin-bhateja, etc.) can also take a look at this.
Thanks.

@iwanowww

@iwanowww iwanowww commented Oct 28, 2020

From a correctness perspective, the fix looks good.
The Xeon Phi CPU family doesn't support the BW/DQ extensions.

The only concern I have is that the fix completely disables the usage of the upper bank (16-31) registers for those operands irrespective of whether BW/DQ are present or not. It may lead to performance problems when vector register pressure is high.

@DamonFool
Member Author

@DamonFool DamonFool commented Oct 28, 2020

From a correctness perspective, the fix looks good.
The Xeon Phi CPU family doesn't support the BW/DQ extensions.

The only concern I have is that the fix completely disables the usage of the upper bank (16-31) registers for those operands irrespective of whether BW/DQ are present or not. It may lead to performance problems when vector register pressure is high.

Thanks @iwanowww for your review.

reductionL_avx512dq and reductionB_avx512bw have been added to address your concern.
Any comments?
Thanks.
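
A rough way to picture the updated fix is as pairs of rules guarded by mutually exclusive predicates. The Python sketch below is a hypothetical model, not the actual patch: the `Rule` class, the `family` field, and the predicates are assumptions made for illustration; only the four rule names and the vec/legVec operand classes come from this PR.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Rule:
    name: str            # instruct name as mentioned in this PR
    family: str          # hypothetical grouping key for the sketch
    operand_class: str   # "legVec" (lower bank only) or "vec" (full range)
    predicate: Callable[[FrozenSet[str]], bool]


# Assumed predicates: each pair is split on the presence of one feature.
RULES = [
    Rule("reductionL", "L", "legVec", lambda f: "avx512dq" not in f),
    Rule("reductionL_avx512dq", "L", "vec", lambda f: "avx512dq" in f),
    Rule("reductionB", "B", "legVec", lambda f: "avx512bw" not in f),
    Rule("reductionB_avx512bw", "B", "vec", lambda f: "avx512bw" in f),
]


def select_rule(family, cpu_features):
    """Pick the single rule of `family` whose predicate holds on this CPU."""
    features = frozenset(cpu_features)
    matches = [r for r in RULES if r.family == family and r.predicate(features)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one rule for family {family!r}")
    return matches[0]
```

With this split, a CPU that has avx512dq and avx512bw keeps the full-range `vec` operands, and only CPUs lacking the feature fall back to `legVec`.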


@iwanowww iwanowww left a comment

Looks good.

At some point, I thought about introducing new flavors of generic vector operands which would capture the dependency between legacy vectors and BW/DQ (by relying on dynamic register classes and dispatch between legacy and full-range register masks depending on the presence of BW and DQ respectively), but haven't had a chance to experiment with it. The main motivation was (and still is) to reduce redundant AD instructions which are kept solely to reify legacy register constraint.
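
The dynamic register class idea can be sketched in a few lines of Python. This is purely illustrative and assumes a 32-bit mask with one bit per vector register; the constants and function names are hypothetical and do not correspond to anything in HotSpot.

```python
# Hypothetical register masks: bit i set means xmm<i> may be allocated.
LEGACY_MASK = (1 << 16) - 1   # xmm0-xmm15
FULL_MASK = (1 << 32) - 1     # xmm0-xmm31


def dynamic_mask(required_feature, cpu_features):
    """Choose the mask for an operand tied to `required_feature`.

    One operand flavor per feature (BW or DQ) would replace a pair of
    otherwise identical rules that differ only in vec vs. legVec.
    """
    return FULL_MASK if required_feature in cpu_features else LEGACY_MASK


def registers_in(mask):
    """Expand a mask into the list of register indices it allows."""
    return [i for i in range(32) if (mask >> i) & 1]
```

Under this model a single rule per operation would suffice, since the register constraint is resolved from the CPU features rather than duplicated across rule variants.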

@DamonFool
Member Author

@DamonFool DamonFool commented Oct 28, 2020

Looks good.

At some point, I thought about introducing new flavors of generic vector operands which would capture the dependency between legacy vectors and BW/DQ (by relying on dynamic register classes and dispatch between legacy and full-range register masks depending on the presence of BW and DQ respectively), but haven't had a chance to experiment with it. The main motivation was (and still is) to reduce redundant AD instructions which are kept solely to reify legacy register constraint.

Sounds great!
Thanks @iwanowww .

@DamonFool
Member Author

@DamonFool DamonFool commented Oct 28, 2020

Hi @vnkozlov ,
Are you still OK with the updated fix?
Thanks.

@PaulSandoz
Member

@PaulSandoz PaulSandoz commented Oct 28, 2020

@vnkozlov @DamonFool I am running some tests and will report the results when done.

Contributor

@vnkozlov vnkozlov left a comment

Okay.

@PaulSandoz
Member

@PaulSandoz PaulSandoz commented Oct 28, 2020

@vnkozlov @DamonFool tests passed.

@DamonFool
Member Author

@DamonFool DamonFool commented Oct 28, 2020

Thanks @vnkozlov , @PaulSandoz and @AzeemJiva .
/integrate

@openjdk openjdk bot closed this Oct 28, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 28, 2020
@DamonFool DamonFool deleted the JDK-8255438 branch Oct 28, 2020
@openjdk

@openjdk openjdk bot commented Oct 28, 2020

@DamonFool Since your change was applied there have been 41 commits pushed to the master branch:

  • 1a5e6c9: 8253101: Clean up CallStaticJavaNode EA flags
  • a7595b2: 8250669: Running JMH micros is broken after JDK-8248135
  • edd1988: 8255530: Additional cleanup after JDK-8235710 (elliptic curve removal)
  • 790d6e2: 8255533: Incorrect javadoc in DateTimeFormatterBuilder.appendPattern() for 'uu'/'yy'
  • 3f20612: 8255555: Bad copyright headers in SocketChannelCompare.java SocketChannelConnectionSetup.java UnixSocketChannelReadWrite.java
  • 42fc158: 8253939: [TESTBUG] Increase coverage of the cgroups detection code
  • 01eb690: 8255554: Bad copyright header in AbstractFileSystemProvider.java
  • 1215b1a: 8255457: Shenandoah: cleanup ShenandoahMarkTask
  • af33e16: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s
  • 8ad7f38: 8255014: Record Classes javax.lang.model changes, follow-up
  • ... and 31 more: https://git.openjdk.java.net/jdk/compare/d735f919195cd45d507cd228872b379d97072800...master

Your commit was automatically rebased without conflicts.

Pushed as commit d82a6dc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

5 participants