8287835: Add support for additional float/double to integral conversion for x86 #9032

Closed
sviswa7 wants to merge 7 commits into master from fpconvert

sviswa7
@sviswa7 sviswa7 commented Jun 4, 2022

Currently the C2 JIT only supports float -> int and double -> long conversion for x86.
This PR adds support for the following conversions in the C2 JIT:
float -> long, short, byte
double -> int, short, byte
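For reference, the scalar Java semantics that a vectorized cast is expected to reproduce lane by lane are those of the JLS narrowing primitive conversion (JLS 5.1.3): conversion to long or int rounds toward zero and saturates, NaN becomes 0, and conversion to short or byte first goes through int and then keeps only the low-order bits. The sketch below just exercises plain Java casts; the class and method names are illustrative and not part of the PR.

```java
// Scalar reference semantics (JLS 5.1.3) for the conversions listed above.
// Illustrative helper only; not code from the PR.
final class FpToIntegralSemantics {
    // float -> long: round toward zero, NaN -> 0, saturate at Long.MIN_VALUE/MAX_VALUE.
    static long f2l(float f) { return (long) f; }

    // float -> short/byte: Java converts to int first (round toward zero, saturating),
    // then keeps only the low 16/8 bits of that int.
    static short f2s(float f) { return (short) f; }
    static byte f2b(float f) { return (byte) f; }

    // double -> int saturates; double -> short/byte also go through int first.
    static int d2i(double d) { return (int) d; }
    static short d2s(double d) { return (short) d; }
    static byte d2b(double d) { return (byte) d; }

    public static void main(String[] args) {
        System.out.println(f2l(-1.9f));    // -1: truncation toward zero
        System.out.println(f2s(70000.0f)); // 4464: int 70000 = 0x11170, low 16 bits 0x1170
        System.out.println(f2b(300.5f));   // 44: int 300 = 0x12C, low 8 bits 0x2C
        System.out.println(d2i(1e20));     // 2147483647: saturates to Integer.MAX_VALUE
        System.out.println(d2s(1e20));     // -1: low 16 bits of 0x7FFFFFFF
        System.out.println(d2b(-129.7));   // 127: int -129 = 0xFFFFFF7F, low 8 bits 0x7F
    }
}
```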

The performance gain is as follows.
Before the patch:
Benchmark                                       Mode  Cnt       Score       Error  Units
VectorFPtoIntCastOperations.microDouble2Byte   thrpt    3   32367.971 ±  6161.118  ops/ms
VectorFPtoIntCastOperations.microDouble2Int    thrpt    3   25825.251 ±  5417.104  ops/ms
VectorFPtoIntCastOperations.microDouble2Long   thrpt    3   59641.958 ± 17307.177  ops/ms
VectorFPtoIntCastOperations.microDouble2Short  thrpt    3   29641.505 ± 12023.015  ops/ms
VectorFPtoIntCastOperations.microFloat2Byte    thrpt    3   16271.224 ±  1523.083  ops/ms
VectorFPtoIntCastOperations.microFloat2Int     thrpt    3   59199.994 ± 14357.959  ops/ms
VectorFPtoIntCastOperations.microFloat2Long    thrpt    3   17169.197 ±  1738.273  ops/ms
VectorFPtoIntCastOperations.microFloat2Short   thrpt    3   14934.139 ±  2329.253  ops/ms

After the patch:
Benchmark                                       Mode  Cnt       Score       Error  Units
VectorFPtoIntCastOperations.microDouble2Byte   thrpt    3  115436.659 ± 21282.364  ops/ms
VectorFPtoIntCastOperations.microDouble2Int    thrpt    3   87194.395 ±  9443.106  ops/ms
VectorFPtoIntCastOperations.microDouble2Long   thrpt    3   59652.356 ±  7240.721  ops/ms
VectorFPtoIntCastOperations.microDouble2Short  thrpt    3  110570.719 ± 10401.620  ops/ms
VectorFPtoIntCastOperations.microFloat2Byte    thrpt    3  110028.539 ± 11113.137  ops/ms
VectorFPtoIntCastOperations.microFloat2Int     thrpt    3   59469.193 ± 18272.495  ops/ms
VectorFPtoIntCastOperations.microFloat2Long    thrpt    3   59897.101 ±  7249.268  ops/ms
VectorFPtoIntCastOperations.microFloat2Short   thrpt    3   86167.554 ±  8253.232  ops/ms
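A small sketch that turns the two tables into speedup ratios (after / before throughput). The scores are copied from the tables above; the error columns are ignored, and with only 3 measurement iterations per benchmark the ratios should be read as approximate. The class name is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Speedup (after / before) from the JMH throughput scores in the tables above.
final class CastSpeedups {
    static Map<String, Double> speedups() {
        // benchmark -> {score before, score after}, both in ops/ms
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("microDouble2Byte",  new double[] {32367.971, 115436.659});
        scores.put("microDouble2Int",   new double[] {25825.251, 87194.395});
        scores.put("microDouble2Long",  new double[] {59641.958, 59652.356});
        scores.put("microDouble2Short", new double[] {29641.505, 110570.719});
        scores.put("microFloat2Byte",   new double[] {16271.224, 110028.539});
        scores.put("microFloat2Int",    new double[] {59199.994, 59469.193});
        scores.put("microFloat2Long",   new double[] {17169.197, 59897.101});
        scores.put("microFloat2Short",  new double[] {14934.139, 86167.554});

        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            ratios.put(e.getKey(), s[1] / s[0]); // after / before
        }
        return ratios;
    }

    public static void main(String[] args) {
        speedups().forEach((name, ratio) -> System.out.printf("%-18s %.2fx%n", name, ratio));
    }
}
```

Float2Int and Double2Long, the two conversions that were already supported before the patch, come out at roughly 1.0x, while the newly supported conversions come out at roughly 3.4x to 6.8x.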

Please review.

Best Regards,
Sandhya


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8287835: Add support for additional float/double to integral conversion for x86

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/9032/head:pull/9032
$ git checkout pull/9032

Update a local copy of the PR:
$ git checkout pull/9032
$ git pull https://git.openjdk.java.net/jdk pull/9032/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9032

View PR using the GUI difftool:
$ git pr show -t 9032

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/9032.diff

@bridgekeeper
bridgekeeper bot commented Jun 4, 2022

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
openjdk bot commented Jun 4, 2022

@sviswa7 The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jun 4, 2022
@sviswa7 sviswa7 changed the title 8287835: Add support for float/double to integral conversion for x86 8287835: Add support for additional float/double to integral conversion for x86 Jun 4, 2022
@sviswa7
Author

sviswa7 commented Jun 4, 2022

/label remove core-libs

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Jun 4, 2022
@openjdk
openjdk bot commented Jun 4, 2022

@sviswa7
The core-libs label was successfully removed.

@sviswa7 sviswa7 marked this pull request as ready for review June 4, 2022 22:34
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 4, 2022
@mlbridge
mlbridge bot commented Jun 4, 2022

Webrevs

Contributor

@vnkozlov vnkozlov left a comment

I assume it is support for "vector conversion".

Please add an IR framework test.

(is_subword_type(bt) || bt == T_LONG)) {
return false;
}
if ((bt == T_LONG) && !VM_Version::supports_avx512dq()) {
Contributor

Again, overlapping conditions. So T_LONG requires all of AVX512, avx512vl and avx512dq?

What about T_INT?

Author

T_INT doesn't need AVX512dq. Float to long conversion (T_LONG) uses evcvttps2qq, which needs AVX512dq.

Contributor

Okay. I see that there are 2 instructions to support F2I, using either AVX or EVEX encoding. They cover all cases.
Now you are introducing subword and long types only for EVEX encoding.

You need a comment that F2I is supported in all cases. For the other integral types you need avx512vl, and additionally avx512dq for T_LONG.

Note, you don't need to check (UseAVX <= 2) because the avx512vl bit is cleared in that case. The same applies to the VectorCastD2X code.

In that case I suggest:

  if (is_subword_type(bt) && !VM_Version::supports_avx512vl() ||
      (bt == T_LONG) && !VM_Version::supports_avx512vldq()) {
    return false;
  }
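A hypothetical Java model of the check suggested above, useful for enumerating which element types would be accepted for each CPU feature combination. The enum and the boolean parameters stand in for HotSpot's BasicType and VM_Version::supports_avx512vl()/supports_avx512dq(); they are assumptions for illustration, not HotSpot code, and the model covers only this suggestion, not necessarily the check that was finally integrated.

```java
// Hypothetical model (not HotSpot code) of the support check suggested above for
// Op_VectorCastF2X in Matcher::match_rule_supported_vector().
final class CastF2XSupportModel {
    // Stand-in for the integral HotSpot BasicType values T_BYTE, T_SHORT, T_INT, T_LONG.
    enum ElemType { BYTE, SHORT, INT, LONG }

    static boolean isSubword(ElemType bt) {
        return bt == ElemType.BYTE || bt == ElemType.SHORT;
    }

    // avx512vl / avx512dq stand in for VM_Version::supports_avx512vl() / supports_avx512dq().
    static boolean supported(ElemType bt, boolean avx512vl, boolean avx512dq) {
        boolean avx512vldq = avx512vl && avx512dq;
        // Same shape as the suggestion: && binds tighter than ||, so this reads
        // (subword && !vl) || (long && !vldq).
        if (isSubword(bt) && !avx512vl || bt == ElemType.LONG && !avx512vldq) {
            return false;
        }
        return true; // INT (F2I) is not restricted by this check
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean vl : flags) {
            for (boolean dq : flags) {
                StringBuilder row = new StringBuilder("vl=" + vl + " dq=" + dq + ":");
                for (ElemType bt : ElemType.values()) {
                    row.append(' ').append(bt).append('=').append(supported(bt, vl, dq));
                }
                System.out.println(row);
            }
        }
    }
}
```

Because && binds tighter than || in both C++ and Java, the suggested condition needs no extra parentheses, although GCC's -Wparentheses warns about mixing the two without them.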

Comment on lines 7296 to 7298
predicate(((VM_Version::supports_avx512vl() ||
            Matcher::vector_length_in_bytes(n) == 64)) &&
          is_integral_type(Matcher::vector_element_basic_type(n)));
Contributor

Do we need some of these conditions since you have them already in match_rule_supported_vector()?

Contributor

The predicate is not correct for all the types this instruction is now used for: it says that if the size is 64 bytes, you don't need avx512vl support for any type. Is that true?

All this is very confusing. I suggest keeping the original castFtoI_reg_evex() instruction as it was and using the new castFtoX_reg_evex() only for T_LONG and subword types, with a new predicate (type != T_INT) and additional conditions if needed.

Author

Yes, it was needed to select between the rules. On platforms that don't support avx512vl, we use AVX-512 instructions only for 512-bit vectors and AVX instructions for vectors smaller than 64 bytes.
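A hypothetical sketch of the rule selection described in the reply above: the EVEX-encoded rule is chosen when avx512vl is available or when the vector fills a whole 64-byte register, and the AVX-encoded rule otherwise. The names are illustrative, not HotSpot APIs; which element types reach either rule at all is decided separately in match_rule_supported_vector().

```java
// Hypothetical model (not HotSpot code) of choosing between the EVEX-encoded and the
// AVX-encoded cast rule, following the explanation above.
final class CastRuleSelection {
    enum Rule { EVEX, AVX }

    // avx512vl stands in for VM_Version::supports_avx512vl(); vectorLengthInBytes for
    // Matcher::vector_length_in_bytes(n).
    static Rule select(boolean avx512vl, int vectorLengthInBytes) {
        // Without AVX512VL, EVEX instructions only operate on full 512-bit (64-byte)
        // registers, so shorter vectors fall back to the AVX rule.
        return (avx512vl || vectorLengthInBytes == 64) ? Rule.EVEX : Rule.AVX;
    }

    public static void main(String[] args) {
        int[] lengths = {8, 16, 32, 64};
        for (boolean vl : new boolean[] {false, true}) {
            for (int len : lengths) {
                System.out.println("avx512vl=" + vl + ", " + len + " bytes -> " + select(vl, len));
            }
        }
    }
}
```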

Comment on lines 1871 to 1877
if (((UseAVX <= 2) || (!VM_Version::supports_avx512vl())) &&
(is_subword_type(bt) || bt == T_INT)) {
return false;
}
if (bt == T_LONG && !VM_Version::supports_avx512dq()) {
if (is_integral_type(bt) && !VM_Version::supports_avx512dq()) {
return false;
}
Contributor

Overlapping conditions for the same types are confusing.

Author

I will add comments and rephrase the checks to make it clearer.

@@ -1868,10 +1868,11 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType
}
break;
case Op_VectorCastD2X:
if (is_subword_type(bt) || bt == T_INT) {
if (((UseAVX <= 2) || (!VM_Version::supports_avx512vl())) &&
Contributor

Which asm instructions require avx512vl? I don't see asserts in assembler_x86.cpp.

Author

avx512vl support is needed only for vectors < 512 bit. I have corrected this in the predicate.

@sviswa7
Author

sviswa7 commented Jun 6, 2022

@vnkozlov I have implemented your review comments. The only remaining item is to add an IR framework test.

Contributor

@vnkozlov vnkozlov left a comment

Looks good. I will wait for the IR test before testing and approval.

@sviswa7
Author

sviswa7 commented Jun 6, 2022

@vnkozlov I have added the IR framework test case. Please take a look.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jun 6, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 6, 2022
Contributor

@vnkozlov vnkozlov left a comment

Good. I will start testing.

@openjdk
openjdk bot commented Jun 6, 2022

@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8287835: Add support for additional float/double to integral conversion for x86

Reviewed-by: kvn, jbhateja

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 86 new commits pushed to the master branch:

  • e01cd7c: 8284780: Need methods to create pre-sized HashSet and LinkedHashSet
  • a941bc2: 8288082: Build failure due to clang_major is not defined after JDK-8214976
  • 65f0829: 8288068: Javadoc contains spurious reference to CLinker
  • 130ce7c: 8288052: Small logging clarification during failed heap shrinkage
  • b623398: 8287901: Loom: Failures with -XX:+VerifyStack
  • 04f02ac: 8214976: Warn about uses of functions replaced for portability
  • 024a240: 8287333: Clean up ParamTaglet and ThrowsTaglet
  • c8cff1b: 8202449: overflow handling in Random.doubles
  • c15e10f: 8233760: Result of BigDecimal.toString throws overflow exception on new BigDecimal(str)
  • b92ce26: 8281001: Class::forName(String) defaults to system class loader if the caller is null
  • ... and 76 more: https://git.openjdk.java.net/jdk/compare/a6fc485a22484b70daf170e981432c0856b9d93d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 6, 2022
Contributor

@vnkozlov vnkozlov left a comment

Results are good.
You need a second review.

@sviswa7
Author

sviswa7 commented Jun 7, 2022

@vnkozlov Thanks a lot for the review and test.
@jatin-bhateja Could you please review this PR. It is an extension of your earlier work.

Comment on lines +7344 to +7349
if (to_elem_bt == T_SHORT) {
  __ evpmovdw($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
} else {
  assert(to_elem_bt == T_BYTE, "required");
  __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
}
Member

We do support F2I cast on AVX2 and that can be extended for sub-word types using
signed saturated lane packing instructions (PACKSSDW and PACKSSWB).
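To illustrate the narrowing step in the snippet above, here is a lane-level Java model, assuming each lane has already been converted to int with Java (int) semantics. evpmovdw/evpmovdb (VPMOVDW/VPMOVDB) truncate each lane to its low 16/8 bits, which matches Java's (short)/(byte) casts, whereas signed-saturating packs such as PACKSSDW/PACKSSWB clamp into the target range; the two can differ for lanes outside that range (for example 70000). The cross-lane ordering of the pack instructions is not modeled, and the class and method names are illustrative.

```java
// Lane-level model (illustrative, not HotSpot code) of narrowing int lanes to short/byte.
final class NarrowingModel {
    // Truncating narrow, as VPMOVDW / VPMOVDB do: keep the low 16 / 8 bits.
    static short truncToShort(int v) { return (short) v; }
    static byte truncToByte(int v) { return (byte) v; }

    // Signed-saturating narrow, per lane, as PACKSSDW (int -> short) and
    // PACKSSDW followed by PACKSSWB (int -> byte) do: clamp into the target range.
    static short satToShort(int v) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
    }
    static byte satToByte(int v) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, v));
    }

    // F2S / F2B per lane: F2I with Java (int) semantics, then a truncating narrow.
    static short f2sLane(float f) { return truncToShort((int) f); }
    static byte f2bLane(float f) { return truncToByte((int) f); }

    public static void main(String[] args) {
        float[] inputs = {1.5f, -3.7f, 70000.0f, -40000.0f, 300.0f, 1e10f, Float.NaN};
        for (float f : inputs) {
            int i = (int) f;
            System.out.printf("%s: (short) cast=%d truncating=%d saturating=%d%n",
                    f, (short) f, f2sLane(f), satToShort(i));
        }
    }
}
```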

Author

I will file a separate RFE for this.

Comment on lines 7372 to 7391
int vlen_enc = vector_length_encoding(this, $src);
__ vector_castD2L_evex($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister,
                       $xtmp2$$XMMRegister, $ktmp1$$KRegister, $ktmp2$$KRegister,
                       ExternalAddress(vector_double_signflip()), $scratch$$Register, vlen_enc);
BasicType to_elem_bt = Matcher::vector_element_basic_type(this);
if (to_elem_bt != T_LONG) {
  switch(to_elem_bt) {
    case T_INT:
      __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
      break;
    case T_SHORT:
      __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
      __ evpmovdw($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
      break;
    case T_BYTE:
      __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
      __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
      break;
    default: assert(false, "%s", type2name(to_elem_bt));
  }
Member

Please move this to a macro assembly routine named vector_castD2X_evex
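A lane-level Java model of the D2X sequence quoted above, assuming vector_castD2L_evex yields Java (long) results per lane. evpmovsqd (VPMOVSQD) narrows long to int with signed saturation, and evpmovdw/evpmovdb (VPMOVDW/VPMOVDB) then truncate. The saturating first step is what makes an out-of-int-range input such as 3e9 come out as Integer.MAX_VALUE, as Java's (int) cast requires; a truncating long-to-int step would give -1294967296 instead. The class and method names are illustrative.

```java
// Lane-level model (illustrative, not HotSpot code) of D2L followed by narrowing.
final class CastD2XModel {
    // Assumed per-lane result of vector_castD2L_evex: Java (long) semantics.
    static long d2l(double d) { return (long) d; }

    // VPMOVSQD: signed saturation from 64-bit to 32-bit lanes.
    static int satLongToInt(long v) {
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, v));
    }

    static int d2i(double d) { return satLongToInt(d2l(d)); }
    static short d2s(double d) { return (short) d2i(d); } // VPMOVDW keeps the low 16 bits
    static byte d2b(double d) { return (byte) d2i(d); }   // VPMOVDB keeps the low 8 bits

    public static void main(String[] args) {
        double[] inputs = {0.0, -0.0, 2.9, -2.9, 70000.0, -129.5, 3e9, -3e9, 1e300,
                           Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (double d : inputs) {
            boolean ok = d2i(d) == (int) d && d2s(d) == (short) d && d2b(d) == (byte) d;
            System.out.println(d + " matches Java casts: " + ok);
        }
    }
}
```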

Comment on lines 7378 to 7388
switch(to_elem_bt) {
  case T_INT:
    __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
    break;
  case T_SHORT:
    __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
    __ evpmovdw($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
    break;
  case T_BYTE:
    __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
    __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
Member

Sub-word handling can be extended for AVX2 using a packing instruction sequence similar to VectorStoreMask for quadword lanes.

Author

D2X in general needs AVX 512 due to evcvttpd2qq.

Member

Thanks @sviswa7. For AVX we can use VCVTTPD2DQ to cast double-precision lanes to int and subsequently to subword lanes. For casting to long we do not have a direct instruction.

Author

Thanks @jatin-bhateja. I have updated the RFE (https://bugs.openjdk.org/browse/JDK-8288043) to include this.

Comment on lines 44 to 45
private static final VectorSpecies<Float> fspec512 = FloatVector.SPECIES_512;
private static final VectorSpecies<Double> dspec512 = DoubleVector.SPECIES_512;
Member

Unused declarations.


@Benchmark
public IntVector microFloat2Int() {
return (IntVector)fvec512.convertShape(VectorOperators.F2I, IntVector.SPECIES_512, 0);
Member

We can remove the explicit cast by setting the return type to Vector.

This applies to all cases.

@sviswa7
Author

sviswa7 commented Jun 8, 2022

@jatin-bhateja I have implemented your review comments. Please take a look.

@jatin-bhateja
Member

Thanks @sviswa7

@sviswa7
Author

sviswa7 commented Jun 9, 2022

@vnkozlov Could I go ahead and integrate? There were some minor changes and code rearrangement after your last test. Please let me know.

@vnkozlov
Contributor

vnkozlov commented Jun 9, 2022

I submitted new testing.

@sviswa7
Author

sviswa7 commented Jun 9, 2022

Thanks a lot Vladimir!

@vnkozlov
Contributor

@sviswa7 testing results are good. You can push.

Comment on lines +76 to +79
for (int i = 0; i < COUNT; i++) {
    float_arr[i] = ran.nextFloat();
    double_arr[i] = ran.nextDouble();
}
Member

Can you kindly also add the special floating-point values NaN, +/-Inf and +/-0.0 to the input array to cover your special-handling code changes?

Author

@jatin-bhateja The test only checks IR node generation for x86.
The actual functional testing, including the special cases, is already covered by the following tests:
  • compiler/codegen/TestByteDoubleVect.java
  • compiler/codegen/TestByteFloatVect.java
  • compiler/codegen/TestShortFloatVect.java
  • compiler/codegen/TestShortDoubleVect.java
  • compiler/codegen/TestLongFloatVect.java
  • compiler/codegen/TestIntDoubleVect.java
  • compiler/codegen/TestIntFloatVect.java
The general idea of this PR was to complement x86 FP-to-integral conversion along with https://git.openjdk.org/jdk/pull/7806 from Fei Gao.
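For reference, these are the Java results for the special inputs raised in the review (NaN, +/-Inf, +/-0.0), that is, the values a correct vectorized cast has to produce for such lanes. Note that +Inf maps to MAX_VALUE for int and long but to -1 for short and byte, because the subword casts keep the low bits of Integer.MAX_VALUE. The helper below is illustrative, not from the PR.

```java
import java.util.Arrays;

// Java results of the float/double -> integral casts for NaN, +Inf, -Inf, +0.0, -0.0.
final class SpecialValueCasts {
    // Results of the (int), (long), (short) and (byte) casts of a float, widened to long.
    static long[] castsOf(float f) {
        return new long[] {(int) f, (long) f, (short) f, (byte) f};
    }

    // The same four casts for a double.
    static long[] castsOf(double d) {
        return new long[] {(int) d, (long) d, (short) d, (byte) d};
    }

    public static void main(String[] args) {
        // NaN -> 0; +Inf -> MAX_VALUE (int/long) but -1 (short/byte);
        // -Inf -> MIN_VALUE (int/long) but 0 (short/byte); +/-0.0 -> 0.
        float[] floats = {Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, 0.0f, -0.0f};
        for (float f : floats) {
            System.out.println("float  " + f + " -> int/long/short/byte " + Arrays.toString(castsOf(f)));
        }
        double[] doubles = {Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 0.0, -0.0};
        for (double d : doubles) {
            System.out.println("double " + d + " -> int/long/short/byte " + Arrays.toString(castsOf(d)));
        }
    }
}
```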

@sviswa7
Author

sviswa7 commented Jun 10, 2022

/integrate

@openjdk
openjdk bot commented Jun 10, 2022

Going to push as commit 2cc40af.
Since your change was applied there have been 111 commits pushed to the master branch:

  • 3ee1e60: 8288132: Update test artifacts in QuoVadis CA interop tests
  • 512db0f: 8271838: AmazonCA.java interop test fails
  • fcb35ed: 8287743: javax/swing/text/CSSBorder/6796710/bug6796710.java failed
  • bdd64d6: 8288181: AArch64: clean up out-of-date comments
  • 5d0e8b6: 8288203: runtime/ClassUnload/UnloadTestWithVerifyDuringGC.java fails with release VMs
  • 975316e: 8287902: UnreadableRB case in MissingResourceCauseTest is not working reliably on Windows
  • 0901548: 8283724: Incorrect description for jtreg-failure-handler option
  • dae4c49: 8286197: C2: Optimize MemorySegment shape in int loop
  • 94b473e: 8280454: G1: ClassLoaderData verification keeps CLDs live that causes problems with VerifyDuringGC during Remark
  • 900d967: 8287924: Avoid redundant HashMap.containsKey call in EnvHelp.mapToHashtable
  • ... and 101 more: https://git.openjdk.org/jdk/compare/a6fc485a22484b70daf170e981432c0856b9d93d...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 10, 2022
@openjdk openjdk bot closed this Jun 10, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 10, 2022
@openjdk
openjdk bot commented Jun 10, 2022

@sviswa7 Pushed as commit 2cc40af.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@sviswa7 sviswa7 deleted the fpconvert branch June 3, 2024 21:40