
8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen #874


Closed
wants to merge 2 commits into from


DamonFool
Member

@DamonFool DamonFool commented Oct 27, 2020

Hi all,

Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode.

It would be better to fix the following cases:

  1. instruct mul2L_reg
    The code-gen logic uses phaddd [2], which requires legacy mode here [3].
    This bug may be reproducible on AVX512 machines without avx512dq.

  2. instruct vmul4L_reg_avx
    The code-gen logic uses vphaddd [4], which requires legacy mode here [5].
    This bug may be reproducible on AVX512 machines without avx512dq.

  3. instruct reductionL
    For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6]
    vpmullq requires legacy mode [7] if avx512dq isn't supported.
    This bug may be reproducible on AVX512 machines without avx512dq.

  4. instruct reductionB
    For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8]
    pminsb requires legacy mode [9] if avx512bw isn't supported.
    This bug may be reproducible on AVX512 machines without avx512bw.


The bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq,
and the bug in reductionB can only be reproduced on AVX512 machines without avx512bw.
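To make the conditions above concrete, here is a minimal C++ sketch of the rule, assuming the usual x86 encoding facts: xmm16-xmm31 can only be named through an EVEX prefix, so an instruction emitted in legacy (SSE/VEX) mode is limited to xmm0-xmm15, which is the range a legVec operand allows. The names CpuFeatures, EvexForm, needs_legacy_encoding and max_xmm_index are hypothetical and are not HotSpot code; the real decisions live in the assembler_x86.cpp lines linked in [3], [5], [7] and [9].

```cpp
// Hypothetical model of the legacy-mode constraint; NOT HotSpot code.

// CPU features relevant to this PR (illustrative names).
struct CpuFeatures {
  bool avx512f;
  bool avx512dq;
  bool avx512bw;
};

// Which AVX-512 feature, if any, the EVEX form of an instruction needs.
enum class EvexForm {
  None,  // no EVEX form at all, e.g. phaddd (SSSE3) or vphaddd (VEX only)
  F,     // EVEX form usable with AVX512F alone
  DQ,    // EVEX form needs AVX512DQ (per this PR, the vpmullq case)
  BW     // EVEX form needs AVX512BW (the byte-min pminsb case)
};

// True if the instruction must be emitted without an EVEX prefix.
constexpr bool needs_legacy_encoding(CpuFeatures cpu, EvexForm form) {
  switch (form) {
    case EvexForm::None: return true;
    case EvexForm::F:    return !cpu.avx512f;
    case EvexForm::DQ:   return !(cpu.avx512f && cpu.avx512dq);
    case EvexForm::BW:   return !(cpu.avx512f && cpu.avx512bw);
  }
  return true;
}

// Highest XMM register number the instruction's operands may use:
// legacy encodings reach xmm0..xmm15, EVEX reaches xmm0..xmm31.
constexpr int max_xmm_index(CpuFeatures cpu, EvexForm form) {
  return needs_legacy_encoding(cpu, form) ? 15 : 31;
}

// An AVX-512 machine without avx512dq/avx512bw: a plain vec operand could be
// given xmm16..xmm31, which these instructions cannot encode.
constexpr CpuFeatures kNoDqBw{true, false, false};
static_assert(max_xmm_index(kNoDqBw, EvexForm::DQ) == 15, "vpmullq-style operands must stay in xmm0..xmm15");
static_assert(max_xmm_index(kNoDqBw, EvexForm::None) == 15, "phaddd never reaches the upper bank");
```

On the test machines mentioned below, which support both avx512dq and avx512bw, only the EvexForm::None cases (phaddd, vphaddd) would be restricted in this model; the DQ/BW-dependent cases stay unrestricted, which is consistent with the reduction bugs staying hidden there.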

Unfortunately, we cannot create reproducers, since our AVX512 platforms support both avx512dq and avx512bw.
However, it does make sense to fix these latent bugs, since Vector API code will certainly run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future.

The fix simply changes the affected operands from vec to legVec, which is quite safe in theory.

As for the reduction patterns for Float and Double, I don't see any reason why they should use legacy mode (maybe I've missed something).

Testing:

  • jdk/incubator/vector on both AVX512 and AVX256 machines

Any comments?

Thanks a lot.
Best regards,
Jie

[1] #791 (comment)
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217
[4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497
[5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165
[6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521
[7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428
[8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482
[9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x32 Linux x64 Windows x64 macOS x64
Build ✔️ (1/1 passed) ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed) ✔️ (9/9 passed)

Issue

  • JDK-8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen


Download

$ git fetch https://git.openjdk.java.net/jdk pull/874/head:pull/874
$ git checkout pull/874

@bridgekeeper

bridgekeeper bot commented Oct 27, 2020

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@DamonFool
Member Author

/issue add JDK-8255438
/test
/label add hotspot-compiler
/cc hotspot-compiler

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 27, 2020
@openjdk

openjdk bot commented Oct 27, 2020

@DamonFool This issue is referenced in the PR title - it will now be updated.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Oct 27, 2020
@openjdk

openjdk bot commented Oct 27, 2020

@DamonFool
The hotspot-compiler label was successfully added.

@openjdk

openjdk bot commented Oct 27, 2020

@DamonFool The hotspot-compiler label was already applied.

@mlbridge

mlbridge bot commented Oct 27, 2020

Webrevs

Contributor

@vnkozlov vnkozlov left a comment


Good. Thank you for cleaning this up.
Could someone at Oracle please run Mach5 testing with UseAVX=3?

@openjdk

openjdk bot commented Oct 27, 2020

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen

Reviewed-by: kvn, vlivanov, azeemj

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 29 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 27, 2020
@DamonFool
Member Author

Thanks @vnkozlov for your review.
I hope experts from Intel (@sviswa7, @jatin-bhateja, etc.) can also take a look at this.
Thanks.

@iwanowww
Contributor

From a correctness perspective, the fix looks good.
The Xeon Phi CPU family doesn't support the BW/DQ extensions.

The only concern I have is that the fix completely disables the usage of the upper bank (16-31) registers for those operands irrespective of whether BW/DQ are present or not. It may lead to performance problems when vector register pressure is high.

@DamonFool
Member Author

Thanks @iwanowww for your review.

reductionL_avx512dq and reductionB_avx512bw have been added to address your concern.
Any comments?
Thanks.
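Why extra *_avx512dq / *_avx512bw instructs answer the register-pressure concern can be sketched as a hypothetical C++ model (not x86.ad code): two variants of the reduction are guarded by mutually exclusive predicates, and the full-range (vec) variant is matched whenever AVX512DQ is present, so only CPUs lacking DQ pay for the legVec restriction. select_reductionL, operand_mask and allocatable are illustrative names, and the masks assume one bit per XMM register.

```cpp
#include <cstdint>

// Hypothetical model of predicate-based instruct selection; NOT x86.ad code.

struct CpuFeatures { bool avx512f; bool avx512dq; bool avx512bw; };

// One bit per XMM register: bit i set means xmm<i> may hold the operand.
constexpr std::uint32_t kLegVecMask = 0x0000FFFFu;  // xmm0..xmm15 (legVec)
constexpr std::uint32_t kVecMask    = 0xFFFFFFFFu;  // xmm0..xmm31 (vec on an AVX-512 machine)

// The two variants discussed in this PR.
enum class Variant { ReductionL, ReductionL_avx512dq };

// Stand-in for the mutually exclusive predicates: exactly one variant matches.
constexpr Variant select_reductionL(CpuFeatures cpu) {
  return (cpu.avx512f && cpu.avx512dq) ? Variant::ReductionL_avx512dq
                                       : Variant::ReductionL;
}

// Register mask the allocator would see for the variant's vector operands.
constexpr std::uint32_t operand_mask(Variant v) {
  return v == Variant::ReductionL_avx512dq ? kVecMask : kLegVecMask;
}

// Number of allocatable registers in a mask (recursive popcount, C++11-friendly).
constexpr int allocatable(std::uint32_t mask) {
  return mask == 0 ? 0 : static_cast<int>(mask & 1u) + allocatable(mask >> 1);
}

// With DQ all 32 registers stay available; without DQ only 16 do.
static_assert(allocatable(operand_mask(select_reductionL({true, true, true}))) == 32, "");
static_assert(allocatable(operand_mask(select_reductionL({true, false, true}))) == 16, "");
```

The same shape would apply to reductionB and reductionB_avx512bw, with AVX512BW in place of AVX512DQ.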

Contributor

@iwanowww iwanowww left a comment


Looks good.

At some point, I thought about introducing new flavors of generic vector operands which would capture the dependency between legacy vectors and BW/DQ (by relying on dynamic register classes and dispatching between legacy and full-range register masks depending on the presence of BW and DQ, respectively), but I haven't had a chance to experiment with it. The main motivation was (and still is) to reduce the redundant AD instructions that are kept solely to reify the legacy register constraint.
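A rough sketch of the dynamic-register-class idea, assuming such an operand flavor would resolve its register mask from CPU features rather than having a vec/legVec pair of instructs selected by predicates. Dep, dynamic_mask and the mask constants are hypothetical, and this is a plain C++ model, not ADLC syntax.

```cpp
#include <cstdint>

// Hypothetical model of a "dynamic" vector operand; NOT ADLC syntax.

struct CpuFeatures { bool avx512f; bool avx512dq; bool avx512bw; };

// Extension an instruction needs before its operands may use EVEX encoding.
enum class Dep { DQ, BW };

constexpr std::uint32_t kLegacyMask = 0x0000FFFFu;  // xmm0..xmm15
constexpr std::uint32_t kFullMask   = 0xFFFFFFFFu;  // xmm0..xmm31

// Resolve the register mask from CPU features, so one instruct could
// replace a legVec variant plus a vec variant guarded by feature predicates.
constexpr std::uint32_t dynamic_mask(CpuFeatures cpu, Dep dep) {
  return (cpu.avx512f && (dep == Dep::DQ ? cpu.avx512dq : cpu.avx512bw))
             ? kFullMask
             : kLegacyMask;
}

// Example: a CPU with AVX512DQ but without AVX512BW.
constexpr CpuFeatures kDqOnly{true, true, false};
static_assert(dynamic_mask(kDqOnly, Dep::DQ) == kFullMask, "DQ-dependent operands get all 32 registers");
static_assert(dynamic_mask(kDqOnly, Dep::BW) == kLegacyMask, "BW-dependent operands stay in xmm0..xmm15");
```

In this model, reductionL and reductionL_avx512dq could collapse into a single instruct whose operands use the DQ-dependent flavor, which is the kind of redundancy the comment aims to remove.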

@DamonFool
Member Author

Sounds great!
Thanks @iwanowww.

@DamonFool
Member Author

Hi @vnkozlov,
Are you still OK with the updated fix?
Thanks.

@PaulSandoz
Member

@vnkozlov @DamonFool I am running some tests and will report the results when done.

Contributor

@vnkozlov vnkozlov left a comment


Okay.

@PaulSandoz
Member

@vnkozlov @DamonFool tests passed.

@DamonFool
Member Author

Thanks @vnkozlov, @PaulSandoz and @AzeemJiva.
/integrate

@openjdk openjdk bot closed this Oct 28, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 28, 2020
@DamonFool DamonFool deleted the JDK-8255438 branch October 28, 2020 23:03
@openjdk

openjdk bot commented Oct 28, 2020

@DamonFool Since your change was applied there have been 41 commits pushed to the master branch:

  • 1a5e6c9: 8253101: Clean up CallStaticJavaNode EA flags
  • a7595b2: 8250669: Running JMH micros is broken after JDK-8248135
  • edd1988: 8255530: Additional cleanup after JDK-8235710 (elliptic curve removal)
  • 790d6e2: 8255533: Incorrect javadoc in DateTimeFormatterBuilder.appendPattern() for 'uu'/'yy'
  • 3f20612: 8255555: Bad copyright headers in SocketChannelCompare.java SocketChannelConnectionSetup.java UnixSocketChannelReadWrite.java
  • 42fc158: 8253939: [TESTBUG] Increase coverage of the cgroups detection code
  • 01eb690: 8255554: Bad copyright header in AbstractFileSystemProvider.java
  • 1215b1a: 8255457: Shenandoah: cleanup ShenandoahMarkTask
  • af33e16: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s
  • 8ad7f38: 8255014: Record Classes javax.lang.model changes, follow-up
  • ... and 31 more: https://git.openjdk.java.net/jdk/compare/d735f919195cd45d507cd228872b379d97072800...master

Your commit was automatically rebased without conflicts.

Pushed as commit d82a6dc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

5 participants