8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms #9748

jatin-bhateja · 2022-08-04T16:20:10Z

Hi All,

This patch extends the conversion optimizations added with JDK-8287835 to the following floating-point to integral conversions on X86 AVX2 targets:

D2I, D2S, D2B, F2I, F2S, F2B
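For reference, these are the scalar Java semantics that any vectorized sequence has to reproduce (JLS §5.1.3): the floating-point value is first converted to int (NaN becomes 0, out-of-range values saturate to Integer.MIN_VALUE/MAX_VALUE), and for short/byte targets that int is then truncated to its low 16 or 8 bits. This is a minimal sketch; the class and method names are illustrative only. A float widens exactly to double, so the same methods also model F2I/F2S/F2B.

```java
// Illustrative reference model of Java's FP-to-integral narrowing (JLS 5.1.3).
// Class and method names are hypothetical, not HotSpot code.
final class FpNarrowing {
    // D2I / F2I: NaN -> 0, values beyond the int range saturate.
    static int d2i(double d) {
        if (Double.isNaN(d)) return 0;
        if (d >= Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (d <= Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) d; // in range: truncate toward zero
    }

    // D2S / F2S: convert to int first, then keep only the low 16 bits.
    static short d2s(double d) {
        return (short) d2i(d);
    }

    // D2B / F2B: convert to int first, then keep only the low 8 bits.
    static byte d2b(double d) {
        return (byte) d2i(d);
    }

    public static void main(String[] args) {
        double[] samples = {Double.NaN, 1e10, -1e10, 70000.5, -3.9};
        for (double d : samples) {
            // The explicit model must agree with the language's own casts.
            if (d2s(d) != (short) d || d2b(d) != (byte) d) {
                throw new AssertionError("mismatch for " + d);
            }
            System.out.println(d + " -> int " + d2i(d) + ", short " + d2s(d) + ", byte " + d2b(d));
        }
    }
}
```

Note how a large positive input such as 1e10 ends up as -1 after D2S: it saturates to 0x7FFFFFFF and only the low 16 bits survive.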

In addition, it optimizes the following wide-vector (64-byte) double to integer and sub-word conversions on AVX512 targets that do not support the AVX512DQ feature:

D2I, D2S, D2B
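One detail worth spelling out for the sub-word cases: x86 pack instructions such as PACKSSDW saturate, whereas Java's int-to-short step truncates. A common way to get truncation out of a saturating pack is to mask each lane to its low 16 bits first and then use the unsigned-saturating PACKUSDW, which can then no longer saturate. The scalar Java model below illustrates that general technique; it is not taken from the patch, and the method names are hypothetical stand-ins for single pack lanes.

```java
// Scalar model of narrowing int lanes to short with x86-style pack semantics.
// Method names are hypothetical stand-ins for one PACKSSDW / PACKUSDW lane.
final class PackNarrowing {
    // PACKSSDW lane: signed saturation of an int to [-32768, 32767].
    static short packSignedSat(int v) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
    }

    // PACKUSDW lane: unsigned saturation of an int to [0, 65535],
    // returned as the raw 16-bit pattern.
    static short packUnsignedSat(int v) {
        return (short) Math.max(0, Math.min(0xFFFF, v));
    }

    // Truncating narrowing: after masking to the low 16 bits the unsigned
    // pack never saturates, so it simply keeps those bits.
    static short truncateViaPack(int v) {
        return packUnsignedSat(v & 0xFFFF);
    }

    public static void main(String[] args) {
        int[] lanes = {70000, -1, Integer.MAX_VALUE, Integer.MIN_VALUE, -70000};
        for (int v : lanes) {
            System.out.println(v + ": saturating " + packSignedSat(v)
                    + ", masked " + truncateViaPack(v) + ", Java (short) " + (short) v);
        }
    }
}
```

For byte targets the same idea applies with a 0xFF mask before the packs, so that PACKUSWB also never saturates.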

The following are JMH micro-benchmark results with and without the patch.

System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz)

BENCHMARK	SIZE	BASELINE (ops/ms)	WITHOPT (ops/ms)	PERF GAIN FACTOR (WITHOPT / BASELINE)
VectorFPtoIntCastOperations.microDouble128ToByte128	1024	90.603	92.797	1.024215534
VectorFPtoIntCastOperations.microDouble128ToByte256	1024	81.909	82.3	1.00477359
VectorFPtoIntCastOperations.microDouble128ToByte512	1024	26.181	26.244	1.002406325
VectorFPtoIntCastOperations.microDouble128ToInteger128	1024	90.74	2537.958	27.96956138
VectorFPtoIntCastOperations.microDouble128ToInteger256	1024	81.586	2429.599	29.7796068
VectorFPtoIntCastOperations.microDouble128ToInteger512	1024	19.406	19.61	1.010512213
VectorFPtoIntCastOperations.microDouble128ToLong128	1024	91.723	90.754	0.989435583
VectorFPtoIntCastOperations.microDouble128ToShort128	1024	91.766	1984.577	21.62649565
VectorFPtoIntCastOperations.microDouble128ToShort256	1024	81.949	1940.599	23.68056962
VectorFPtoIntCastOperations.microDouble128ToShort512	1024	16.468	16.56	1.005586592
VectorFPtoIntCastOperations.microDouble256ToByte128	1024	163.331	3018.351	18.479964
VectorFPtoIntCastOperations.microDouble256ToByte256	1024	148.878	3082.034	20.70174237
VectorFPtoIntCastOperations.microDouble256ToByte512	1024	50.108	51.629	1.030354434
VectorFPtoIntCastOperations.microDouble256ToInteger128	1024	159.805	4619.421	28.90661118
VectorFPtoIntCastOperations.microDouble256ToInteger256	1024	143.876	4649.642	32.31700909
VectorFPtoIntCastOperations.microDouble256ToInteger512	1024	38.127	38.188	1.001599916
VectorFPtoIntCastOperations.microDouble256ToLong128	1024	160.322	162.442	1.013223388
VectorFPtoIntCastOperations.microDouble256ToLong256	1024	141.252	143.01	1.012445841
VectorFPtoIntCastOperations.microDouble256ToShort128	1024	157.717	3757.471	23.82413437
VectorFPtoIntCastOperations.microDouble256ToShort256	1024	143.876	3830.971	26.62689399
VectorFPtoIntCastOperations.microDouble256ToShort512	1024	32.061	32.911	1.026511962
VectorFPtoIntCastOperations.microFloat128ToByte128	1024	146.599	4002.967	27.30555461
VectorFPtoIntCastOperations.microFloat128ToByte256	1024	136.99	3938.799	28.75245638
VectorFPtoIntCastOperations.microFloat128ToByte512	1024	51.561	50.284	0.975233219
VectorFPtoIntCastOperations.microFloat128ToInteger128	1024	5933.565	5361.472	0.903583596
VectorFPtoIntCastOperations.microFloat128ToInteger256	1024	5079.564	5062.046	0.996551279
VectorFPtoIntCastOperations.microFloat128ToInteger512	1024	37.101	38.419	1.035524649
VectorFPtoIntCastOperations.microFloat128ToLong128	1024	145.863	145.362	0.99656527
VectorFPtoIntCastOperations.microFloat128ToLong256	1024	131.159	133.154	1.015210546
VectorFPtoIntCastOperations.microFloat128ToShort128	1024	145.966	4150.039	28.4315457
VectorFPtoIntCastOperations.microFloat128ToShort256	1024	134.703	4566.589	33.90116775
VectorFPtoIntCastOperations.microFloat128ToShort512	1024	31.878	30.867	0.968285338
VectorFPtoIntCastOperations.microFloat256ToByte128	1024	237.841	6292.051	26.4548627
VectorFPtoIntCastOperations.microFloat256ToByte256	1024	222.041	6292.748	28.34047766
VectorFPtoIntCastOperations.microFloat256ToByte512	1024	92.073	88.981	0.966417951
VectorFPtoIntCastOperations.microFloat256ToInteger128	1024	11471.121	10269.636	0.895260019
VectorFPtoIntCastOperations.microFloat256ToInteger256	1024	10729.816	10105.92	0.941853989
VectorFPtoIntCastOperations.microFloat256ToInteger512	1024	68.328	70.005	1.024543379
VectorFPtoIntCastOperations.microFloat256ToLong128	1024	247.101	248.571	1.005948984
VectorFPtoIntCastOperations.microFloat256ToLong256	1024	225.74	223.987	0.992234429
VectorFPtoIntCastOperations.microFloat256ToLong512	1024	76.39	76.187	0.997342584
VectorFPtoIntCastOperations.microFloat256ToShort128	1024	233.196	8202.179	35.17289748
VectorFPtoIntCastOperations.microFloat256ToShort256	1024	220.75	7781.073	35.24834881
VectorFPtoIntCastOperations.microFloat256ToShort512	1024	58.143	55.633	0.956830573
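The gain factor column is the ratio of the two throughput columns, WITHOPT / BASELINE, so values above 1.0 are speedups and values near 1.0 indicate no measurable change. A small helper that recomputes it for a few rows, with the inputs copied from the table:

```java
// Recomputes the "PERF GAIN FACTOR" column: WITHOPT throughput / BASELINE throughput.
final class GainFactor {
    static double gain(double baselineOpsPerMs, double withOptOpsPerMs) {
        return withOptOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // {baseline, withopt} pairs taken from the table rows named below.
        System.out.printf("microDouble128ToInteger128: %.4f%n", gain(90.74, 2537.958));
        System.out.printf("microFloat256ToShort256:    %.4f%n", gain(220.75, 7781.073));
        System.out.printf("microFloat128ToInteger128:  %.4f%n", gain(5933.565, 5361.472));
    }
}
```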

Kindly review and share your feedback.

Best Regards,
Jatin

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer)
Sandhya Viswanathan (@sviswa7 - Reviewer) ⚠️ Review applies to 5fb99e81

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9748/head:pull/9748
$ git checkout pull/9748

Update a local copy of the PR:
$ git checkout pull/9748
$ git pull https://git.openjdk.org/jdk pull/9748/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9748

View PR using the GUI difftool:
$ git pr show -t 9748

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9748.diff

jatin-bhateja · 2022-08-04T16:22:10Z

/label add hotspot-compiler-dev

bridgekeeper · 2022-08-04T16:38:45Z

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-08-04T16:42:30Z

@jatin-bhateja
The hotspot-compiler label was successfully added.

mlbridge · 2022-08-05T02:32:34Z

Webrevs

TobiHartmann · 2022-08-29T06:58:19Z

I can run some testing in our system once you resolved the merge conflicts.

TobiHartmann · 2022-09-08T06:24:26Z

Testing in our system did not show any failures but I see that there are SIGILL failures in the pre-submit testing.

sviswa7 · 2022-09-17T01:00:48Z

Could you please enable the compiler/vectorapi/VectorFPtoIntCastTest.java for AVX2 platforms?
Currently they are only run for AVX512DQ platforms.

jatin-bhateja · 2022-09-19T14:15:19Z

> Could you please enable the compiler/vectorapi/VectorFPtoIntCastTest.java for AVX2 platforms? Currently they are only run for AVX512DQ platforms.

I have added the missing casting cases for AVX/AVX2 and AVX512 targets to the existing comprehensive casting test.

vnkozlov · 2022-09-19T21:28:58Z

src/hotspot/cpu/x86/matcher_x86.hpp

        return 0;
+      case Op_VectorCastF2X: // fall through
+      case Op_VectorCastD2X: {
+        return is_subword_type(ety) ? 75 : 70;


Add a comment here and in the other cases explaining the numbers. Is it the size of instructions, the number of elements, or something else?

jatin-bhateja

The value now matches the one used for the RoundV[FD] IR nodes. Currently it is a rudimentary heuristic, based on the emitted code size of complex IR nodes, that the unroll policy uses. The idea is to constrain the unrolling factor and avoid generating bloated loop bodies.

vnkozlov

Please add this information/clarification to the method's comment at line 186.
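To make the heuristic described above concrete: if the unroll policy compares an estimated body size against a fixed budget, a large per-node cost for a cast leaves room for fewer copies of the loop body. The sketch below is a deliberately simplified, hypothetical model; the budget value and the doubling strategy are assumptions and are not C2's actual policy. Only the 70/75 cost pair comes from the first diff excerpt.

```java
// Hypothetical model of how a per-node size estimate can cap loop unrolling.
// BODY_BUDGET and the doubling strategy are illustrative assumptions only.
final class UnrollModel {
    static final int BODY_BUDGET = 600; // hypothetical size budget for the unrolled body

    // Per-node estimate in the spirit of the diff: 70 for casts, 75 for sub-word targets.
    static int castCost(boolean subwordTarget) {
        return subwordTarget ? 75 : 70;
    }

    // Keep doubling the unroll factor while the unrolled body still fits the budget.
    static int unrollFactor(int bodyCost) {
        int factor = 1;
        while (bodyCost * factor * 2 <= BODY_BUDGET) {
            factor *= 2;
        }
        return factor;
    }

    public static void main(String[] args) {
        int plainBody = 10;                        // a few cheap nodes (load, add, store)
        int castBody = plainBody + castCost(true); // same body plus one D2S-style cast
        System.out.println("cheap body unroll factor: " + unrollFactor(plainBody));
        System.out.println("cast body unroll factor:  " + unrollFactor(castBody));
    }
}
```

Under these assumed numbers the cheap body unrolls 32 times while the body containing the cast stops at 4, which is the kind of bloat control the comment above describes.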

vnkozlov

Good. I will test it.

You need a second review.

vnkozlov

My testing passed.

openjdk · 2022-09-21T16:28:06Z

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

Reviewed-by: kvn, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 19 new commits pushed to the master branch:

3a980b9: 8295168: Remove superfluous period in @throws tag description
9bb932c: 8295154: Documentation for RemoteExecutionControl.invoke(Method) inherits non-existent documentation
945950d: 8295069: [PPC64] Performance regression after JDK-8290025
d362e16: 8294689: The SA transported_core.html file needs quite a bit of work
07946aa: 8289552: Make intrinsic conversions between bit representations of half precision values and floats
2586b1a: 8295155: Incorrect javadoc of java.base module
e1a77cf: 8295163: Remove old hsdis Makefile
3c7ae12: 8294821: Class load improvement for AES crypto engine
619cd82: 8294702: BufferedInputStream uses undefined value range for markpos
9d0009e: 6777156: GTK L&F: JFileChooser can jump beyond root directory in combobox and selection textarea.
... and 9 more: https://git.openjdk.org/jdk/compare/9d116ec147a3182a9c831ffdce02c98da8c5031d...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

sviswa7

I am still going through the c2_MacroAssembler_x86.cpp changes. I hope to finish the review early next week.

sviswa7 · 2022-09-30T23:58:25Z

src/hotspot/cpu/x86/matcher_x86.hpp

        return 0;
+      case Op_VectorCastF2X: // fall through
+      case Op_VectorCastD2X: {
+        return is_subword_type(ety) ? 35 : 30;


This needs to be more selective. F2X and D2X do not need a lot of instructions in all cases; e.g., F2D and D2F are single instructions.
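A more selective estimate along these lines could key the cost on both the source and destination element types, so that single-instruction casts such as F2D and D2F stay cheap. This is a hypothetical sketch: the enum, method names, and cost values are chosen for illustration only (the 30/35 pair mirrors the excerpt above).

```java
// Hypothetical, selective cost estimate for FP-to-X vector casts.
// Types, names, and cost numbers are illustrative assumptions.
enum ElemType { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE }

final class CastCost {
    static boolean isSubword(ElemType t) {
        return t == ElemType.BYTE || t == ElemType.SHORT;
    }

    static int estimate(ElemType from, ElemType to) {
        boolean fpToFp = (from == ElemType.FLOAT && to == ElemType.DOUBLE)
                || (from == ElemType.DOUBLE && to == ElemType.FLOAT);
        if (fpToFp) {
            return 0; // F2D / D2F: a single conversion instruction, no extra weight
        }
        // FP-to-integral casts need special-case fix-ups; sub-word targets also need narrowing.
        return isSubword(to) ? 35 : 30;
    }

    public static void main(String[] args) {
        System.out.println("F2D: " + estimate(ElemType.FLOAT, ElemType.DOUBLE));
        System.out.println("D2I: " + estimate(ElemType.DOUBLE, ElemType.INT));
        System.out.println("F2B: " + estimate(ElemType.FLOAT, ElemType.BYTE));
    }
}
```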

sviswa7 · 2022-10-01T00:28:16Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

-
+void C2_MacroAssembler::vector_castF2L_evex(XMMRegister dst, XMMRegister src, XMMRegister xtmp1, XMMRegister xtmp2,
+                                            KRegister ktmp1, KRegister ktmp2, AddressLiteral double_sign_flip,
+                                            Register rscratch, int vec_enc) {


Need an assert here:
assert(rscratch != noreg || always_reachable(double_sign_flip), "missing");
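The requested assert expresses a simple contract: a scratch register may be omitted only when the constant's address can always be encoded directly. On x86-64 that typically means the target is within reach of a signed 32-bit RIP-relative displacement. A simplified Java model of the contract, assuming reachability reduces to a single displacement check from one pc (HotSpot's always_reachable reasons about the code cache as a whole); NO_REG and all names are hypothetical:

```java
// Simplified model of the "rscratch != noreg || always_reachable(addr)" contract.
// NO_REG, the single-pc reachability test, and all names are hypothetical.
final class ScratchContract {
    static final int NO_REG = -1;

    // A RIP-relative operand encodes a signed 32-bit displacement from the next instruction.
    static boolean reachable(long nextInstrPc, long target) {
        long disp = target - nextInstrPc;
        return disp >= Integer.MIN_VALUE && disp <= Integer.MAX_VALUE;
    }

    // Mirrors the assert: either a scratch register is supplied, or the constant is reachable.
    static void checkOperand(int rscratch, long nextInstrPc, long constantAddr) {
        if (rscratch == NO_REG && !reachable(nextInstrPc, constantAddr)) {
            throw new IllegalStateException("missing scratch register");
        }
    }

    public static void main(String[] args) {
        long pc = 0x7f00_0000_0000L;
        checkOperand(NO_REG, pc, pc + 0x1000);        // near constant: fine without scratch
        checkOperand(3, pc, pc + 0x1_0000_0000L);     // far constant: fine with a scratch register
        try {
            checkOperand(NO_REG, pc, pc + 0x1_0000_0000L); // far constant, no scratch
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```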

jatin-bhateja

Hi @sviswa7, the assertions are part of the leaf-level macro-assembly routine, which in this case is vector_cast_float_to_long_special_cases_evex.
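The "special cases" come from the x86 truncating conversions: CVTTPS2DQ/CVTTPD2DQ return the integer-indefinite value 0x80000000 for NaN and out-of-range inputs, while Java requires 0 for NaN and saturation otherwise. Below is a scalar Java model of such a fix-up, one lane at a time; the hardware result is emulated and the names are hypothetical. A vector implementation would make the same per-lane selection with compare masks rather than branches.

```java
// Scalar model of fixing up x86 truncating FP->int conversion results to Java semantics.
// emulateCvtt and fixUp are hypothetical names used for illustration.
final class CvttFixUp {
    static final int INDEFINITE = 0x80000000; // x86 "integer indefinite" result

    // Emulates one CVTTPS2DQ/CVTTPD2DQ lane: truncate, or INDEFINITE if NaN/out of range.
    static int emulateCvtt(double d) {
        if (Double.isNaN(d) || d >= 0x1p31 || d < -0x1p31) {
            return INDEFINITE;
        }
        return (int) d;
    }

    // Only lanes equal to INDEFINITE need attention; when no lane hits that
    // value, the fix-up can be skipped entirely.
    static int fixUp(double src, int hw) {
        if (hw != INDEFINITE) return hw;
        if (Double.isNaN(src)) return 0;       // Java: NaN -> 0
        if (src > 0) return Integer.MAX_VALUE; // positive overflow saturates
        return Integer.MIN_VALUE;              // negative overflow, or exactly -2^31
    }

    public static void main(String[] args) {
        double[] lanes = {1.5, -2.5, Double.NaN, 1e20, -1e20, -0x1p31};
        for (double d : lanes) {
            int java = fixUp(d, emulateCvtt(d));
            if (java != (int) d) throw new AssertionError("mismatch for " + d);
            System.out.println(d + " -> " + java);
        }
    }
}
```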

sviswa7 · 2022-10-07T00:06:20Z

@jatin-bhateja The rest of the changes look good to me. Mainly, vector_op_pre_select_sz_estimate() needs to be corrected.

vnkozlov · 2022-10-10T20:04:55Z

@jatin-bhateja, please merge latest JDK and I will start re-testing.

jatin-bhateja · 2022-10-11T12:27:11Z

> @jatin-bhateja, please merge latest JDK and I will start re-testing.

Hi @kvn, kindly run the regression tests on the changes.

vnkozlov · 2022-10-11T15:54:21Z

I started new testing

vnkozlov

My testing passed.

jatin-bhateja · 2022-10-12T01:04:35Z

Thanks @sviswa7 and @vnkozlov.

jatin-bhateja · 2022-10-12T01:04:46Z

/integrate

openjdk · 2022-10-12T01:05:53Z

Going to push as commit 2ceb80c.
Since your change was applied there have been 21 commits pushed to the master branch:

703a6ef: 8283699: Improve the peephole mechanism of hotspot
94a9b04: 8295013: OopStorage should derive from CHeapObjBase
3a980b9: 8295168: Remove superfluous period in @throws tag description
9bb932c: 8295154: Documentation for RemoteExecutionControl.invoke(Method) inherits non-existent documentation
945950d: 8295069: [PPC64] Performance regression after JDK-8290025
d362e16: 8294689: The SA transported_core.html file needs quite a bit of work
07946aa: 8289552: Make intrinsic conversions between bit representations of half precision values and floats
2586b1a: 8295155: Incorrect javadoc of java.base module
e1a77cf: 8295163: Remove old hsdis Makefile
3c7ae12: 8294821: Class load improvement for AES crypto engine
... and 11 more: https://git.openjdk.org/jdk/compare/9d116ec147a3182a9c831ffdce02c98da8c5031d...master

Your commit was automatically rebased without conflicts.

openjdk · 2022-10-12T01:06:14Z

@jatin-bhateja Pushed as commit 2ceb80c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Jatin Bhateja added 2 commits August 4, 2022 03:48

8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

089f28d

8288043: Changing file permission.

5998faf

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Aug 4, 2022

openjdk bot added the rfr Pull request is ready for review label Aug 5, 2022

Jatin Bhateja added 2 commits August 16, 2022 13:44

Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8288043

6087ad8

8288043: Adding a descriptive comment.

842b3d7

Jatin Bhateja and others added 3 commits September 5, 2022 12:53

Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8288043

2e5ecd9

8288043: Adding a descriptive comment for removing explicit scratch registers needed to load stub constants.

51de0e2

8288043: Some mainline merge related cleanups.

5cdfd68

8288043: Code re-factoring.

dce02fa

8288043: Extending existing regressions with more cases.

80f1ad7

vnkozlov reviewed Sep 19, 2022

Jatin Bhateja added 2 commits September 20, 2022 12:27

8288043: cost adjustments for loop body size estimation.

4dbc18c

8288043: Adding descriptive comments.

f54ea60

vnkozlov reviewed Sep 20, 2022

vnkozlov approved these changes Sep 21, 2022

openjdk bot added the ready Pull request is ready to be integrated label Sep 21, 2022

sviswa7 reviewed Oct 1, 2022

sviswa7 approved these changes Oct 10, 2022

Jatin Bhateja added 2 commits October 11, 2022 04:08

8288043: Review comments resolutions.

5fb99e8

Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8288043

6fb1a5d

vnkozlov approved these changes Oct 11, 2022

openjdk bot added the integrated Pull request has been integrated label Oct 12, 2022

openjdk bot closed this Oct 12, 2022

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 12, 2022

jatin-bhateja deleted the JDK-8288043 branch January 20, 2023 21:26
