8291669: [REDO] Fix array range check hoisting for some scaled loop iv #9851

Closed
wants to merge 4 commits into from

@pfustc pfustc commented Aug 12, 2022

This is a REDO of JDK-8289996. In the previous patch, we deferred some
strength reductions in the Ideal functions of Mul[I|L]Node to the post
loop igvn phase to fix a range check hoisting issue. More about the
previous patch can be found in PR #9508, where we described some details
of the issue we would like to fix.

The previous patch was backed out because of some jtreg failures. We
have analyzed those failures one by one and found that one of them
exposes a real performance regression. Deferring some strength
reductions to the post loop igvn phase has too much impact: after that
change, some vector multiplications are no longer optimized to a vector
addition with vector shifts. So in this REDO we propose the range check
hoisting fix with a different approach.

In this new patch, we add some recursive pattern matches for scaled loop
iv in function PhaseIdealLoop::is_scaled_iv(). These include matching
a sum or a difference of two scaled iv expressions. With this, all kinds
of Ideal-transformed scaled iv expressions can still be recognized. This
new approach only touches loop transformation code and hence has a much
smaller impact. We have verified that this new approach applies to both
int range checks and long range checks.
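
The recursive matching described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Expr`, `Iv`, `Const`, `Bin` and `Op` types and the `scaleOf` helper are hypothetical names invented for illustration, not HotSpot classes, and the sketch ignores the offset, type and overflow handling that the real `PhaseIdealLoop::is_scaled_iv()` has to deal with.

```java
import java.util.OptionalLong;

// Hypothetical toy model; none of these classes exist in HotSpot.
final class ScaledIvSketch {
    enum Op { MUL, SHL, ADD, SUB }

    interface Expr {}
    static final class Iv implements Expr {}            // the loop induction variable
    static final class Const implements Expr {
        final long value;
        Const(long value) { this.value = value; }
    }
    static final class Bin implements Expr {
        final Op op;
        final Expr left;
        final Expr right;
        Bin(Op op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns K if the expression is equivalent to iv * K, otherwise empty. */
    static OptionalLong scaleOf(Expr e) {
        if (e instanceof Iv) {
            return OptionalLong.of(1L);                  // iv alone has scale 1
        }
        if (!(e instanceof Bin)) {
            return OptionalLong.empty();                 // a bare constant is not a scaled iv
        }
        Bin b = (Bin) e;
        OptionalLong l = scaleOf(b.left);
        if (!l.isPresent()) {
            return OptionalLong.empty();
        }
        if (b.op == Op.MUL || b.op == Op.SHL) {
            if (!(b.right instanceof Const)) {
                return OptionalLong.empty();
            }
            long k = ((Const) b.right).value;
            return OptionalLong.of(b.op == Op.MUL ? l.getAsLong() * k : l.getAsLong() << k);
        }
        // ADD or SUB: both operands must themselves be scaled iv expressions.
        OptionalLong r = scaleOf(b.right);
        if (!r.isPresent()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(b.op == Op.ADD ? l.getAsLong() + r.getAsLong()
                                              : l.getAsLong() - r.getAsLong());
    }

    public static void main(String[] args) {
        Expr iv = new Iv();
        // (iv << 2) + (iv << 1), a strength-reduced form of iv * 6
        Expr six = new Bin(Op.ADD, new Bin(Op.SHL, iv, new Const(2)),
                                   new Bin(Op.SHL, iv, new Const(1)));
        // (iv << 3) - iv, a strength-reduced form of iv * 7
        Expr seven = new Bin(Op.SUB, new Bin(Op.SHL, iv, new Const(3)), iv);
        System.out.println(scaleOf(six).getAsLong());    // prints 6
        System.out.println(scaleOf(seven).getAsLong());  // prints 7
    }
}
```

The point of the sketch is that a multiplication rewritten into shifts plus an add or subtract can still be reported as a single scale, so the matching does not depend on when strength reduction runs.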

The previously attached jtreg test fails on ppc64 because the Vector API
has no vector intrinsics on ppc64, so there is no long range check to
hoist. In this patch, we limit the test to x64 and AArch64.

Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8291669: [REDO] Fix array range check hoisting for some scaled loop iv

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9851/head:pull/9851
$ git checkout pull/9851

Update a local copy of the PR:
$ git checkout pull/9851
$ git pull https://git.openjdk.org/jdk pull/9851/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9851

View PR using the GUI difftool:
$ git pr show -t 9851

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9851.diff


bridgekeeper bot commented Aug 12, 2022

👋 Welcome back pli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 12, 2022

openjdk bot commented Aug 12, 2022

@pfustc The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Aug 12, 2022

mlbridge bot commented Aug 12, 2022

Webrevs

}
if (p_short_scale != NULL) {
// (ConvI2L (MulI iv K)) can be 64-bit linear if iv is kept small enough...
*p_short_scale = *p_short_scale || (exp_bt != bt && scale != 1);
*p_short_scale = short_scale_l || short_scale_r;
Contributor

I think it should be short_scale_l && short_scale_r here

Member Author

> I think it should be short_scale_l && short_scale_r here

Hi @rwestrel, may I ask you a question about this? From your comments, I see that short_scale reports whether a ConvI2L node is present, since it's used to protect against overflow. Does this mean that ConvI2L at this point only appears in long counted loops? I ask because, to my knowledge, array address computation in int loops also generates ConvI2L on 64-bit platforms.

Contributor

It's for loops like this one:
public static void testStridePosScalePosInIntLoop1(int start, int stop, long length, long offset) {
    final long scale = 2;
    final int stride = 1;

    // Same but with int loop
    for (int i = start; i < stop; i += stride) {
        Objects.checkIndex(scale * i + offset, length);
    }
}

It's an int loop, but because length is a long, there's an implicit cast of scale * i + offset to long (which is where the ConvI2L comes from). In the case of your change, an expression for the range check that would need to be optimized would be:
((long)i) * scale
with scale 6 for instance, so expressed by the compiler as (((long)i) << 2) + (((long)i) << 1)
and both calls to is_scaled_iv would return true for short_scale, which is why I think it should be short_scale_l && short_scale_r
You're right that address computation includes a ConvI2L on 64-bit platforms, but the range check doesn't in:
array[i] = val;

Member Author

Thanks for your detailed explanation! I have updated these.
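
The int-loop case discussed in this thread can be reproduced in plain Java. The following is a minimal sketch, assuming a scale of 6 and hypothetical helper names (`widenThenScale`, `widenThenShiftAdd`, `scaleThenWiden`). It only shows the arithmetic, not what C2 emits: scaling after the widening is exact for every int, while scaling in int before the widening (the `(ConvI2L (MulI iv K))` shape from the code comment) matches the 64-bit result only while iv stays small enough.

```java
final class ShortScaleSketch {
    // Scale applied after widening to long: exact for every int i.
    static long widenThenScale(int i) {
        return ((long) i) * 6L;
    }

    // The same value written as shifts plus an add. The parentheses matter:
    // in Java, + binds tighter than <<.
    static long widenThenShiftAdd(int i) {
        return (((long) i) << 2) + (((long) i) << 1);
    }

    // Scale applied in int arithmetic and widened afterwards: wraps for large i.
    static long scaleThenWiden(int i) {
        return (long) (i * 6);
    }

    public static void main(String[] args) {
        int small = 1000;
        int large = Integer.MAX_VALUE;
        System.out.println(widenThenScale(small) == widenThenShiftAdd(small)); // prints true
        System.out.println(scaleThenWiden(small) == widenThenScale(small));    // prints true
        System.out.println(widenThenScale(large));                             // prints 12884901882
        System.out.println(scaleThenWiden(large));                             // prints -6
    }
}
```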

Contributor

@rwestrel rwestrel left a comment

Looks good to me.


openjdk bot commented Sep 2, 2022

@pfustc This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8291669: [REDO] Fix array range check hoisting for some scaled loop iv

Reviewed-by: roland, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 371 new commits pushed to the master branch:

  • 7f3250d: 8293787: Linux aarch64 build fails after 8292591
  • 2a38791: 8292755: Non-default method in interface leads to a stack overflow in JShell
  • 8351b30: 8293771: runtime/handshake/SystemMembarHandshakeTransitionTest.java fails if MEMBARRIER_CMD_QUERY is unsupported
  • 91f9c0d: 8293774: Improve TraceOptoParse to dump the bytecode name
  • 1169a15: 8291657: Javac assertion when compiling a method call with switch expression as argument
  • 2baf251: 8293654: Improve SharedRuntime handling of continuation helper out-arguments
  • 60f59a4: 8293660: Fix frame::sender_for_compiled_frame frame size assert
  • b3461c1: 8293680: PPC64BE build failure after JDK-8293344
  • 7e02039: 8293647: Avoid unnecessary boxing in jdk.hotspot.agent
  • 9039022: 8287394: AArch64: Remove cbuf parameter from far_call/far_jump/trampoline_call
  • ... and 361 more: https://git.openjdk.org/jdk/compare/0c40128fec41cf69821dbf7f1b19600560e8ac12...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 2, 2022

pfustc commented Sep 5, 2022

May I have another review for this REDO? Perhaps @vnkozlov @TobiHartmann

Comment on lines +2775 to +2776
// as we use jlong to compute so do the check here. Long result may also
// overflow but that's fine because result wraps.
Member

But doesn't this mean that we bail out for integer overflows while not bailing out for long overflows?

Member Author

Yes, it does. If this inconsistency doesn't look good, I could also try adding long overflow checks like the ones in the utility function bool add_overflows(T x, T y).

Member

I'm just wondering if there's a good reason for bailing out for integer overflows and if the same applies to long overflows. @rwestrel, you added that check with JDK-8278296, do you remember why?

Contributor

The:
if (scale == min_signed_integer(exp_bt)) {
?
(It's from JDK-8259609)
The problem I think is for the expression: -min_jint * i
scale here is min_jint initially, stored in a long. It's then multiplied by -1. -min_jint = min_jint when stored in an int but not in a long. When scale is later transformed from a long to an int, some code finds that -(long)min_jint can't be stored in an int.
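
The wrap-around described here is easy to check in Java, whose int and long follow the same two's-complement rules as jint and jlong. This is a minimal sketch: the `fitsInInt` helper is a hypothetical name introduced for illustration and is not a HotSpot function.

```java
final class MinJintNegationSketch {
    // Hypothetical helper: true if the long value can be stored in an int unchanged.
    static boolean fitsInInt(long v) {
        return v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int negatedAsInt = -Integer.MIN_VALUE;            // wraps back to Integer.MIN_VALUE
        long negatedAsLong = -(long) Integer.MIN_VALUE;   // exact in 64 bits
        System.out.println(negatedAsInt == Integer.MIN_VALUE); // prints true
        System.out.println(negatedAsLong);                     // prints 2147483648
        System.out.println(fitsInInt(negatedAsLong));          // prints false
    }
}
```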

Member Author

Thanks for the explanation. In my understanding, min_jint is also a special point where bailing out is required. I should update the condition scale_sum < min_signed_integer(exp_bt) to scale_sum <= min_signed_integer(exp_bt), right?

Contributor

Yes, you must be right.

Member Author

Thanks, I have updated this.
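
The bail-out condition agreed on in this thread can be sketched as follows, for the int case only. This is an illustration rather than the patch itself: `sumIntScales` is a hypothetical name, and the upper-bound test is an assumption, since the thread only spells out the lower-bound comparison. The two scales are added in 64 bits, and the match is rejected when the sum is above the int maximum or at or below the int minimum.

```java
import java.util.OptionalInt;

final class ScaleSumSketch {
    /**
     * Adds two int scales using long arithmetic. Returns empty when the sum
     * does not fit in an int, or equals Integer.MIN_VALUE, which is treated
     * as a special point because its negation wraps.
     */
    static OptionalInt sumIntScales(int scaleL, int scaleR) {
        long scaleSum = (long) scaleL + (long) scaleR;
        if (scaleSum > Integer.MAX_VALUE || scaleSum <= Integer.MIN_VALUE) {
            return OptionalInt.empty();   // bail out
        }
        return OptionalInt.of((int) scaleSum);
    }

    public static void main(String[] args) {
        System.out.println(sumIntScales(4, 2).getAsInt());                   // prints 6
        System.out.println(sumIntScales(Integer.MAX_VALUE, 1).isPresent());  // prints false
        System.out.println(sumIntScales(Integer.MIN_VALUE, 0).isPresent());  // prints false
    }
}
```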

Member

@TobiHartmann TobiHartmann left a comment

Looks good to me.


pfustc commented Sep 14, 2022

Thanks for the review. I will integrate this.


pfustc commented Sep 14, 2022

/integrate


openjdk bot commented Sep 14, 2022

Going to push as commit 211fab8.
Since your change was applied there have been 371 commits pushed to the master branch:

  • 7f3250d: 8293787: Linux aarch64 build fails after 8292591
  • 2a38791: 8292755: Non-default method in interface leads to a stack overflow in JShell
  • 8351b30: 8293771: runtime/handshake/SystemMembarHandshakeTransitionTest.java fails if MEMBARRIER_CMD_QUERY is unsupported
  • 91f9c0d: 8293774: Improve TraceOptoParse to dump the bytecode name
  • 1169a15: 8291657: Javac assertion when compiling a method call with switch expression as argument
  • 2baf251: 8293654: Improve SharedRuntime handling of continuation helper out-arguments
  • 60f59a4: 8293660: Fix frame::sender_for_compiled_frame frame size assert
  • b3461c1: 8293680: PPC64BE build failure after JDK-8293344
  • 7e02039: 8293647: Avoid unnecessary boxing in jdk.hotspot.agent
  • 9039022: 8287394: AArch64: Remove cbuf parameter from far_call/far_jump/trampoline_call
  • ... and 361 more: https://git.openjdk.org/jdk/compare/0c40128fec41cf69821dbf7f1b19600560e8ac12...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 14, 2022
@openjdk openjdk bot closed this Sep 14, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 14, 2022

openjdk bot commented Sep 14, 2022

@pfustc Pushed as commit 211fab8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@pfustc pfustc deleted the rangecheck branch September 14, 2022 14:21
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
3 participants