-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms #9748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/label add hotspot-compiler-dev |
|
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
|
@jatin-bhateja |
Webrevs
|
|
I can run some testing in our system once you resolved the merge conflicts. |
|
Testing in our system did not show any failures but I see that there are SIGILL failures in the pre-submit testing. |
|
Could you please enable the compiler/vectorapi/VectorFPtoIntCastTest.java for AVX2 platforms? |
I have added missing casting cases AVX/AVX2 and AVX512 targets in existing comprehensive test for casting |
src/hotspot/cpu/x86/matcher_x86.hpp
Outdated
| return 0; | ||
| case Op_VectorCastF2X: // fall through | ||
| case Op_VectorCastD2X: { | ||
| return is_subword_type(ety) ? 75 : 70; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment here and in other cases explaining numbers. Is it size of instructions or elements or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value now matches the one for RoundV[FD] IR nodes, currently, its a rudimentary heuristic based on emitted code size for complex IR nodes used by unroll policy. Idea is to constrain unrolling factor and prevent generating bloated loop bodies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add this information/clarification to this method's comment at line 186.
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good. I will test it.
You need second review.
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My testing passed.
|
@jatin-bhateja This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 19 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
sviswa7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still going through the c2_MacroAssembler_x86.cpp changes. Hopefully early next week will finish the review.
src/hotspot/cpu/x86/matcher_x86.hpp
Outdated
| return 0; | ||
| case Op_VectorCastF2X: // fall through | ||
| case Op_VectorCastD2X: { | ||
| return is_subword_type(ety) ? 35 : 30; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be more selective. It is not that in all cases F2X and D2X need lot of instructions e.g. F2D, D2F are single instruction.
|
|
||
| void C2_MacroAssembler::vector_castF2L_evex(XMMRegister dst, XMMRegister src, XMMRegister xtmp1, XMMRegister xtmp2, | ||
| KRegister ktmp1, KRegister ktmp2, AddressLiteral double_sign_flip, | ||
| Register rscratch, int vec_enc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need an assert here:
assert(rscratch != noreg || always_reachable(double_sign_flip), "missing");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sviswa7, assertions are part of leaf level macro assembly routine which is vector_cast_float_to_long_special_cases_evex in this case.
|
@jatin-bhateja Rest of the changes look good to me. Mainly the vector_op_pre_select_sz_estimate() needs to be corrected. |
|
@jatin-bhateja, please merge latest JDK and I will start re-testing. |
Hi @kvn, kindly regress the changes. |
|
I started new testing |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My testing passed.
|
/integrate |
|
Going to push as commit 2ceb80c.
Your commit was automatically rebased without conflicts. |
|
@jatin-bhateja Pushed as commit 2ceb80c. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi All,
This patch extends conversion optimizations added with JDK-8287835 to optimize following floating point to integral conversions for X86 AVX2 targets:-
In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature.
Following are the JMH micro performance results with and without patch.
System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz)
Kindly review and share your feedback.
Best Regards,
Jatin
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9748/head:pull/9748$ git checkout pull/9748Update a local copy of the PR:
$ git checkout pull/9748$ git pull https://git.openjdk.org/jdk pull/9748/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9748View PR using the GUI difftool:
$ git pr show -t 9748Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9748.diff