8286197: C2: Optimize MemorySegment shape in int loop#8555
8286197: C2: Optimize MemorySegment shape in int loop#8555rwestrel wants to merge 7 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back roland! A progress list of the required criteria for merging this PR into |
Webrevs
|
vnkozlov
left a comment
There was a problem hiding this comment.
Good suggestion. I have comments.
src/hotspot/share/opto/loopopts.cpp
Outdated
| // Check for having no control input; not pinned. Allow | ||
| // Check for having no control input; not pinned. Allow |
There was a problem hiding this comment.
fixed in new commit
src/hotspot/share/opto/castnode.cpp
Outdated
| if (in1 != NULL && in1->Opcode() == Op_ConvI2L) { | ||
| const Type* t = Value(phase); | ||
| const Type* t_in = phase->type(in1); | ||
| if (t != Type::TOP && t_in != Type::TOP && t != t_in) { |
There was a problem hiding this comment.
t != t_in does not mean that type is narrower in general case. I think we need to check ranges (types meet?).
There was a problem hiding this comment.
Thanks for looking at this. t is the result of Value() which takes the type of its input into account so, AFAICT, there's no way t can be wider than t_in. Am I missing something? If not I could add an assert. What do you think?
There was a problem hiding this comment.
There is no specialized CastLLNode::Value() and ConstraintCastNode only calls filter_speculative() which do call join(). May be it is indeed enough. Yes, would be nice to have an assert to make sure we got it right.
There was a problem hiding this comment.
The new commit adds an assert
Thanks for reviewing this. |
vnkozlov
left a comment
There was a problem hiding this comment.
Good. I will start testing.
|
@rwestrel This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 588 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
Testing passed. It needs second review. |
|
Any one else for this one? |
TobiHartmann
left a comment
There was a problem hiding this comment.
Looks good to me but I'm wondering if we should delay this to JDK 20 as we are late for the JDK 19 release and had many issues with Cast/ConvNodes before.
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
….java Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>
Thanks for the review. I have no objection to delaying until JDK 20. |
|
/integrate |
|
Going to push as commit dae4c49.
Your commit was automatically rebased without conflicts. |
This is another small enhancement for a code shape that showed up in a
MemorySegment micro benchmark. The shape to optimize is the one from test1:
In that code shape, the loop iv is first scaled, result is then casted
to long, range checked and finally address of memory location is
computed.
The alignment check is transformed so the loop body has no check In
order to eliminate the range check, that loop is transformed into:
The address shape is (AddP base (CastLL (ConvI2L (LShiftI (AddI ...
In this case, the type of the ConvI2L is [min_jint, max_jint] and type
of CastLL is [0, max_jint] (the CastLL has a narrower type).
I propose transforming (CastLL (ConvI2L into (ConvI2L (CastII in that
case. The convI2L and CastII types can be set to [0, max_jint]. The
new address shape is then:
(AddP base (ConvI2L (CastII (LShiftI (AddI ...
which optimize well.
(LShiftI (AddI ...
is transformed into
(AddI (LShiftI ...
because one of the AddI input is loop invariant (i2) and we have:
(AddP base (ConvI2L (CastII (AddI (LShiftI ...
Then because the ConvI2L and CastII types are [0, max_jint], the AddI
is pushed through the ConvI2L and CastII:
(AddP base (AddL (ConvI2L (CastII (LShiftI ...
base and one of the inputs of the AddL are loop invariant so this
transformed into:
(AddP (AddP ...) (ConvI2L (CastII (LShiftI ...
The (AddP ...) is loop invariant so computed before entry. The
(ConvI2L ...) only depends on the loop iv.
The resulting address is a shift + an add. The address before
transformation requires 2 adds + a shift. Also after unrolling, the
adress of the second access in the loop is cheaper to compute as it
can be derived from the address of the first access.
For all of this to work:
I added a CastLL::Ideal transformation:
(CastLL (ConvI2L into (ConvI2l (CastII
I also had to prevent split if to transform (LShiftI (Phi for the
iv Phi of a counted loop.
test2 and test3 test 1) and 2) separately.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8555/head:pull/8555$ git checkout pull/8555Update a local copy of the PR:
$ git checkout pull/8555$ git pull https://git.openjdk.org/jdk pull/8555/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8555View PR using the GUI difftool:
$ git pr show -t 8555Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8555.diff