-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8287835: Add support for additional float/double to integral conversion for x86 #9032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into |
|
/label remove core-libs |
|
@sviswa7 |
Webrevs
|
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it is support for "vector conversion".
Please, add IR framework test.
src/hotspot/cpu/x86/x86.ad
Outdated
| (is_subword_type(bt) || bt == T_LONG)) { | ||
| return false; | ||
| } | ||
| if ((bt == T_LONG) && !VM_Version::supports_avx512dq()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again overlapping conditions. So T_LONG requires both: AVX512, avx512vl and avx512dq?
What about T_INT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T_INT doesn't need AVX512dq. Float to long conversion (T_LONG) uses evcvttps2qq, which needs AVX512dq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I see that there are 2 instructions to support F2I by using avx or evex encoding. They cover all cases.
No you are introducing sub_integer and long types only for evex encoding.
You need comment that F2I is supported in all cases. For other integral types you need 512vl and additionally 512dq for T_LONG.
Note, you don't need to check (UseAVX <= 2) because avx512vl bit is cleaned in such case. It is the same for VectorCastD2X code.
In such case I suggest:
if (is_subword_type(bt) && !VM_Version::supports_avx512vl() ||
(bt == T_LONG) && !VM_Version::supports_avx512vldq()) {
return false;
}
src/hotspot/cpu/x86/x86.ad
Outdated
| predicate(((VM_Version::supports_avx512vl() || | ||
| Matcher::vector_length_in_bytes(n) == 64)) && | ||
| is_integral_type(Matcher::vector_element_basic_type(n))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need some of these conditions since you have them already in match_rule_supported_vector()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predicate is not correct for all types this instruction is used now: it says that if size is 64 bytes you don't need avx512vl support for all types. Is it true?
All this is very confusing. I suggest to keep original castFtoI_reg_evex() instruction as it was and use new castFtoX_reg_evex() only for T_LONG and sub_integer with new predicate (type != T_INT) and additional conditions if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it was needed to select between the rules. On platforms that don't support avx512vl, we use AVX512 instructions only for 512-bit vectors and AVX instructions for < 64 byte vectors.
src/hotspot/cpu/x86/x86.ad
Outdated
| if (((UseAVX <= 2) || (!VM_Version::supports_avx512vl())) && | ||
| (is_subword_type(bt) || bt == T_INT)) { | ||
| return false; | ||
| } | ||
| if (bt == T_LONG && !VM_Version::supports_avx512dq()) { | ||
| if (is_integral_type(bt) && !VM_Version::supports_avx512dq()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overlapping conditions for the same types are confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add comments and rephrase the checks to make it clearer.
src/hotspot/cpu/x86/x86.ad
Outdated
| predicate(((VM_Version::supports_avx512vl() || | ||
| Matcher::vector_length_in_bytes(n) == 64)) && | ||
| is_integral_type(Matcher::vector_element_basic_type(n))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predicate is not correct for all types this instruction is used now: it says that if size is 64 bytes you don't need avx512vl support for all types. Is it true?
All this is very confusing. I suggest to keep original castFtoI_reg_evex() instruction as it was and use new castFtoX_reg_evex() only for T_LONG and sub_integer with new predicate (type != T_INT) and additional conditions if needed.
src/hotspot/cpu/x86/x86.ad
Outdated
| break; | ||
| case Op_VectorCastD2X: | ||
| if (is_subword_type(bt) || bt == T_INT) { | ||
| if (((UseAVX <= 2) || (!VM_Version::supports_avx512vl())) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which asm instructions are required avx512vl? I don't see asserts in assembler_x86.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avx512vl support is needed only for vectors < 512 bit. I have corrected this in the predicate.
|
@vnkozlov I have implemented your review comments. The only item remaining is to add IR framework test. |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Will wait IR test before testing and approval.
|
@vnkozlov I have added the IR framework test case. Please take a look. |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good. I will start testing.
|
@sviswa7 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 86 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Results are good.
You need second review.
|
@vnkozlov Thanks a lot for the review and test. |
| if (to_elem_bt == T_SHORT) { | ||
| __ evpmovdw($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| } else { | ||
| assert(to_elem_bt == T_BYTE, "required"); | ||
| __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do support F2I cast on AVX2 and that can be extended for sub-word types using
signed saturated lane packing instructions (PACKSSDW and PACKSSWB).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will file a separate RFE for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to RFE: https://bugs.openjdk.org/browse/JDK-8288043
src/hotspot/cpu/x86/x86.ad
Outdated
| int vlen_enc = vector_length_encoding(this, $src); | ||
| __ vector_castD2L_evex($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister, | ||
| $xtmp2$$XMMRegister, $ktmp1$$KRegister, $ktmp2$$KRegister, | ||
| ExternalAddress(vector_double_signflip()), $scratch$$Register, vlen_enc); | ||
| BasicType to_elem_bt = Matcher::vector_element_basic_type(this); | ||
| if (to_elem_bt != T_LONG) { | ||
| switch(to_elem_bt) { | ||
| case T_INT: | ||
| __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| break; | ||
| case T_SHORT: | ||
| __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| __ evpmovdw($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| break; | ||
| case T_BYTE: | ||
| __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| break; | ||
| default: assert(false, "%s", type2name(to_elem_bt)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to a macro assembly routine named vector_castD2X_evex
src/hotspot/cpu/x86/x86.ad
Outdated
| switch(to_elem_bt) { | ||
| case T_INT: | ||
| __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| break; | ||
| case T_SHORT: | ||
| __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| __ evpmovdw($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| break; | ||
| case T_BYTE: | ||
| __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); | ||
| __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-word handling can be extended for AVX2 using packing instruction sequence similar to VectorStoreMask for quad ward lanes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D2X in general needs AVX 512 due to evcvttpd2qq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sviswa7 , for AVX we can use VCVTTPD2DQ to cast double precison lane to integer and subsequently to sub words lanes. For casting to long we do not have direct instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jatin-bhateja. I have updated the RFE (https://bugs.openjdk.org/browse/JDK-8288043) to include this.
| private static final VectorSpecies<Float> fspec512 = FloatVector.SPECIES_512; | ||
| private static final VectorSpecies<Double> dspec512 = DoubleVector.SPECIES_512; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused declarations.
|
|
||
| @Benchmark | ||
| public IntVector microFloat2Int() { | ||
| return (IntVector)fvec512.convertShape(VectorOperators.F2I, IntVector.SPECIES_512, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove explicit cast by setting return type to Vector
Applicable to all cases.
|
@jatin-bhateja I have implemented your review comments. Please take a look. |
|
Thanks @sviswa7 |
|
@vnkozlov Could I go ahead and integrate? There were some minor changes and code rearrangement after your last test. Please let me know. |
|
I submitted new testing. |
|
Thanks a lot Vladimir! |
|
@sviswa7 testing results are good. You can push. |
| for (int i = 0; i < COUNT; i++) { | ||
| float_arr[i] = ran.nextFloat(); | ||
| double_arr[i] = ran.nextDouble(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you kindly also add special floating point values NaN, +/-Inf, +/-0.0 to input array to cover your special handling code changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jatin-bhateja The test is only checking the IR node generation for x86.
The rest of the actual functionality test is already covered under the following including the special cases:
compiler/codegen/TestByteDoubleVect.java compiler/codegen/TestByteFloatVect.java compiler/codegen/TestShortFloatVect.java compiler/codegen/TestShortDoubleVect.java compiler/codegen/TestLongFloatVect.java compiler/codegen/TestIntDoubleVect.java compiler/codegen/TestIntFloatVect.java
The general idea of this PR was to complement x86 FP to integral conversion along with https://git.openjdk.org/jdk/pull/7806 from Fei Gao.
|
/integrate |
|
Going to push as commit 2cc40af.
Your commit was automatically rebased without conflicts. |
Currently the C2 JIT only supports float -> int and double -> long conversion for x86.
This PR adds the support for following conversions in the c2 JIT:
float -> long, short, byte
double -> int, short, byte
The performance gain is as follows.
Before the patch:
Benchmark Mode Cnt Score Error Units
VectorFPtoIntCastOperations.microDouble2Byte thrpt 3 32367.971 ± 6161.118 ops/ms
VectorFPtoIntCastOperations.microDouble2Int thrpt 3 25825.251 ± 5417.104 ops/ms
VectorFPtoIntCastOperations.microDouble2Long thrpt 3 59641.958 ± 17307.177 ops/ms
VectorFPtoIntCastOperations.microDouble2Short thrpt 3 29641.505 ± 12023.015 ops/ms
VectorFPtoIntCastOperations.microFloat2Byte thrpt 3 16271.224 ± 1523.083 ops/ms
VectorFPtoIntCastOperations.microFloat2Int thrpt 3 59199.994 ± 14357.959 ops/ms
VectorFPtoIntCastOperations.microFloat2Long thrpt 3 17169.197 ± 1738.273 ops/ms
VectorFPtoIntCastOperations.microFloat2Short thrpt 3 14934.139 ± 2329.253 ops/ms
After the patch:
Benchmark Mode Cnt Score Error Units
VectorFPtoIntCastOperations.microDouble2Byte thrpt 3 115436.659 ± 21282.364 ops/ms
VectorFPtoIntCastOperations.microDouble2Int thrpt 3 87194.395 ± 9443.106 ops/ms
VectorFPtoIntCastOperations.microDouble2Long thrpt 3 59652.356 ± 7240.721 ops/ms
VectorFPtoIntCastOperations.microDouble2Short thrpt 3 110570.719 ± 10401.620 ops/ms
VectorFPtoIntCastOperations.microFloat2Byte thrpt 3 110028.539 ± 11113.137 ops/ms
VectorFPtoIntCastOperations.microFloat2Int thrpt 3 59469.193 ± 18272.495 ops/ms
VectorFPtoIntCastOperations.microFloat2Long thrpt 3 59897.101 ± 7249.268 ops/ms
VectorFPtoIntCastOperations.microFloat2Short thrpt 3 86167.554 ± 8253.232 ops/ms
Please review.
Best Regards,
Sandhya
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/9032/head:pull/9032$ git checkout pull/9032Update a local copy of the PR:
$ git checkout pull/9032$ git pull https://git.openjdk.java.net/jdk pull/9032/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9032View PR using the GUI difftool:
$ git pr show -t 9032Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/9032.diff