8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen #874
👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into
/issue add JDK-8255438
@DamonFool This issue is referenced in the PR title - it will now be updated.
Good. Thank you for cleaning this up.
Could someone at Oracle please run Mach5 testing with UseAVX=3?
@DamonFool This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 29 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Thanks @vnkozlov for your review.
From a correctness perspective, the fix looks good. My only concern is that it completely disables use of the upper-bank registers (16-31) for those operands, irrespective of whether BW/DQ are present. That may lead to performance problems when vector register pressure is high.
Thanks @iwanowww for your review. reductionL_avx512dq and reductionB_avx512bw have been added to address your concerns.
Looks good.
At some point, I thought about introducing new flavors of generic vector operands which would capture the dependency between legacy vectors and BW/DQ (by relying on dynamic register classes and dispatching between legacy and full-range register masks depending on the presence of BW and DQ respectively), but I haven't had a chance to experiment with it. The main motivation was (and still is) to reduce redundant AD instructions which are kept solely to reify the legacy register constraint.
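The dispatch idea in the comment above can be illustrated with a minimal C++ sketch. This is a hypothetical model, not HotSpot code: `Avx512Features`, `Requires`, `allowed_registers` and the mask constants are names invented for illustration, and the sketch assumes an AVX-512 machine where registers 16-31 exist at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature flags; this is not HotSpot's VM_Version API.
struct Avx512Features {
    bool avx512dq;
    bool avx512bw;
};

// Which extension an operand's code-gen depends on (illustrative only).
enum class Requires { None, DQ, BW };

// One bit per vector register: bit i set means register i may be allocated.
constexpr std::uint32_t kLegacyMask = 0x0000FFFFu;  // registers 0-15 only
constexpr std::uint32_t kFullMask   = 0xFFFFFFFFu;  // registers 0-31

// Pick the register mask dynamically: the full range when the required
// extension is present, the legacy (lower-bank) range otherwise.
inline std::uint32_t allowed_registers(Requires r, Avx512Features cpu) {
    switch (r) {
        case Requires::DQ:   return cpu.avx512dq ? kFullMask : kLegacyMask;
        case Requires::BW:   return cpu.avx512bw ? kFullMask : kLegacyMask;
        case Requires::None: return kFullMask;
    }
    return kLegacyMask;  // conservative default for unexpected values
}

// Usage example: a CPU with BW but without DQ keeps DQ-dependent operands
// in the lower bank while BW-dependent operands may use all 32 registers.
inline void example_usage() {
    Avx512Features cpu{false, true};
    assert(allowed_registers(Requires::DQ, cpu) == kLegacyMask);
    assert(allowed_registers(Requires::BW, cpu) == kFullMask);
}
```

With a single operand flavor selecting its mask this way, separate instructs that exist only to pin operands to the lower bank would in principle no longer be needed, which is the redundancy the comment refers to.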
Sounds great!
Hi @vnkozlov,
@vnkozlov @DamonFool I am running some tests, and will report results when done.
Okay.
@vnkozlov @DamonFool tests passed.
Thanks @vnkozlov, @PaulSandoz and @AzeemJiva.
@DamonFool Since your change was applied there have been 41 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit d82a6dc. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Hi all,
Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode.
It would be better to fix the following cases:
instruct mul2L_reg
The code-gen logic uses phaddd [2], which requires legacy mode here [3].
This bug might be reproduced on AVX512 machines without avx512dq.
instruct vmul4L_reg_avx
The code-gen logic uses vphaddd [4], which requires legacy mode here [5].
This bug might be reproduced on AVX512 machines without avx512dq.
instruct reductionL
For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6]
vpmullq requires legacy mode [7] if avx512dq isn't supported.
This bug might be reproduced on AVX512 machines without avx512dq.
instruct reductionB
For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8]
pminsb requires legacy mode [9] if avx512bw isn't supported.
This bug might be reproduced on AVX512 machines without avx512bw.
The bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq.
And the bug in reductionB can only be reproduced on AVX512 machines without avx512bw.
Unfortunately, we cannot create reproducers, since our AVX512 platforms support both avx512dq and avx512bw.
However, it does make sense to fix these unexposed bugs, since Vector API code is sure to run on a variety of platforms (e.g., AVX512 machines without avx512dq/bw) in the future.
The fix just changes vec to legVec, which is quite safe in theory.
As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something).
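To make the conditions above concrete, here is a small C++ sketch that lists which of the four instructs would be exposed on a given CPU. It is a simplified model of the description only, not HotSpot code: `CpuFeatures`, `Needs`, `Instruct` and `exposed_instructs` are hypothetical names, and the mapping from instruct to extension is taken directly from the cases listed above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of CPU capabilities; not HotSpot's VM_Version API.
struct CpuFeatures {
    bool avx512f;   // AVX-512 foundation, i.e. registers 16-31 are available
    bool avx512dq;
    bool avx512bw;
};

enum class Needs { DQ, BW };

struct Instruct {
    std::string name;
    Needs needs;  // extension whose absence exposes the bug, per the description
};

// The four instructs named in the description.
const std::vector<Instruct> kInstructs = {
    {"mul2L_reg", Needs::DQ},
    {"vmul4L_reg_avx", Needs::DQ},
    {"reductionL", Needs::DQ},
    {"reductionB", Needs::BW},
};

// Returns the instructs that could receive an upper-bank register while
// their code-gen requires legacy mode on the given CPU.
std::vector<std::string> exposed_instructs(const CpuFeatures& cpu) {
    std::vector<std::string> out;
    if (!cpu.avx512f) return out;  // no upper bank, so nothing is exposed
    for (const auto& i : kInstructs) {
        const bool present = (i.needs == Needs::DQ) ? cpu.avx512dq : cpu.avx512bw;
        if (!present) out.push_back(i.name);
    }
    return out;
}

// Usage example: an AVX-512 CPU lacking only DQ exposes the three
// long-multiply instructs but not reductionB.
inline void example_usage() {
    CpuFeatures no_dq{true, false, true};
    assert(exposed_instructs(no_dq).size() == 3);
}
```

On a machine that supports both avx512dq and avx512bw the function returns an empty list, which matches the observation that no reproducer could be created on such platforms.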
Testing:
Any comments?
Thanks a lot.
Best regards,
Jie
[1] #791 (comment)
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217
[4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497
[5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165
[6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521
[7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428
[8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482
[9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475
Download
$ git fetch https://git.openjdk.java.net/jdk pull/874/head:pull/874
$ git checkout pull/874