Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8290432: C2 compilation fails with assert(node->_last_del == _last) failed: must have deleted the edge just produced #9695

Closed
wants to merge 6 commits into from

Conversation

y1yang0
Copy link
Member

@y1yang0 y1yang0 commented Jul 31, 2022

Hi, can I have a review for this patch? JDK-8273585 recognized the form of Phi->CastII->AddI as additional parallel induction variables. In the following program:

class Test {
    static int dontInline() {
        return 0;
    }

    static long test(int val, boolean b) {
        long ret = 0;
        long dArr[] = new long[100];
        for (int i = 15; 293 > i; ++i) {
            ret = val;
            int j = 1;
            while (++j < 6) {
                int k = (val--);
                for (long l = i; 1 > l; ) {
                    if (k != 0) {
                        ret += dontInline();
                    }
                }
                if (b) {
                    break;
                }
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            test(0, false);
        }
    }
}

val is incorrectly matched with the new parallel IV form:
image
And C2 further replaces it with newly added nodes, which finally leads the crash:
image

I think we can add more constraints to the new form. The form of Phi->CastXX->AddX appears when using Preconditions.checkIndex, and it would be recognized as additional IV when 1) Phi != phi2, 2) CastXX is controlled by RangeCheck(to reflect changes in Preconditions checkindex intrinsic)


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8290432: C2 compilation fails with assert(node->_last_del == _last) failed: must have deleted the edge just produced

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9695/head:pull/9695
$ git checkout pull/9695

Update a local copy of the PR:
$ git checkout pull/9695
$ git pull https://git.openjdk.org/jdk pull/9695/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9695

View PR using the GUI difftool:
$ git pr show -t 9695

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9695.diff

…ailed: must have deleted the edge just produced
@y1yang0 y1yang0 marked this pull request as ready for review August 1, 2022 02:18
@y1yang0
Copy link
Member Author

y1yang0 commented Aug 1, 2022

/rfr

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 1, 2022

👋 Welcome back yyang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 1, 2022
@openjdk
Copy link

openjdk bot commented Aug 1, 2022

@kelthuzadx Unknown command rfr - for a list of valid commands use /help.

@openjdk
Copy link

openjdk bot commented Aug 1, 2022

@kelthuzadx The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Aug 1, 2022
@mlbridge
Copy link

mlbridge bot commented Aug 1, 2022

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your test I assume internal loop for (long l= is processed. But val is updated outside it. Its AddI node should not be part of loop's body. Could you explain?

Comment on lines 3681 to 3682
if (incr2->in(1)->is_ConstraintCast()) {
if (!incr2->in(1)->in(0)->is_RangeCheck()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use && instead of nested condition.

Comment on lines 3678 to 3679
!incr2->in(2)->is_Con())
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you touch near code, please update style in these lines too to use {}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole function is formatted.

@y1yang0
Copy link
Member Author

y1yang0 commented Aug 5, 2022

Hi @vnkozlov, thanks for your review. Here is the (almost) complete IR before replacing additional parallel IVs:
image

Before that, some loop optimizations are applied: [15,293) and [2,6) are recognized as counted loops, and the latter loop is maximally unrolled. Loop unswitching is applied when C2 saw if (b). The transformed program looks like:

static long test(int val, boolean b) {
    long ret = 0;
    long dArr[] = new long[100];
    if (b) {
        for (int i = 15; 293 > i; ++i) {
            ret = val;
            val--;
        }
    } else {
        for (int i = 15; 293 > i; ++i) {
            ret = val;
            val-=4
        }
    }
    return ret;
}

Their corresponding IR are as follows:
image

Phi#564 and Phi#125 represent val in two unswitched loops, respectively. val is actually part of loop body.

Comment on lines 3687 to 3689
// The form of Phi->CastXX->AddX appears when using Preconditions.checkIndex, and it would
// be recognized as additional IV when 1) Phi != phi2, 2) CastXX is controlled by RangeCheck
// (to reflect changes in LibraryCallKit::inline_preconditions_checkIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating comment. But it still do not say which issue you are trying to avoid. Yes, this pattern can be generated by inline_preconditions_checkIndex.
Currently your check skip search for parallel IV if CastII is NOT controlled by RangeCheck!
But the comment in the BODY of check talks about using parallel IV.
It is contradicting.
Aslo 1) Phi != phi2 says nothing to me - your check does not have such condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just say that we should skip AddI->CastII->Phi case if CastII is not controlled by local RangeCheck.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just say that we should skip AddI->CastII->Phi case if CastII is not controlled by local RangeCheck.

Yes, this is exactly what I want to say. Maybe I should update comments.

@vnkozlov
Copy link
Contributor

vnkozlov commented Aug 5, 2022

About test's loops. I see that C2 recognized that for (long l = i; 1 > l; ) is not executed because i >= 15.
Then while (++j < 6) will be executed only once if b == true - C2 unswitch it with result of method code you showed.
Good.

BUT. val is long type. Why we are trying to use it as IV instead of existing int i? I think the issue is not in Cast node. There is mixture of long and int types here which confuses loop optimizer. What if val was int type?

@y1yang0
Copy link
Member Author

y1yang0 commented Aug 6, 2022

BUT. val is long type. Why we are trying to use it as IV instead of existing int i? I think the issue is not in Cast node. There is mixture of long and int types here which confuses loop optimizer. What if val was int type?

val is type of int! And we are not trying to use it instead of existing int i, we are using both of them as IV.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why it's an issue in this case if the control input of the CastII is not a range check?


if (incr2->in(1)->is_ConstraintCast() && !incr2->in(1)->in(0)->is_RangeCheck()) {
// Skip AddI->CastII->Phi case if CastII is not controlled by local RangeCheck
// to reflect changes in LibraryCallKit::inline_preconditions_checkIndex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the valid case, isn't the ConstraintCast control input incr2->in(1)->in(0) the IfTrue projection of the RangeCheck?

I would remove the second line because it's not clear which "changes" in LibraryCallKit::inline_preconditions_checkIndex it is referring to.

Suggested change
// to reflect changes in LibraryCallKit::inline_preconditions_checkIndex

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still not clear to me how the incr2->in(1)->in(0)->is_RangeCheck() condition can ever be true. How can the control input of a ConstraintCast be a RangeCheck? Shouldn't there be a projection node in-between?

Copy link
Member Author

@y1yang0 y1yang0 Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this comment before... incr2->in(1)->in(0)->in(0)->is_RangeCheck() is really what I want. I also double-checked this.

incr2->in(1)->in(0)->is_RangeCheck() prevents JDK-8273585 from optimization - 15s
incr2->in(1)->in(0)->in(0)->is_RangeCheck() allows JDK-8273585 to optimize further - 8s

@TobiHartmann
Copy link
Member

I think we should add an IR verification test for JDK-8273585.

@vnkozlov
Copy link
Contributor

vnkozlov commented Aug 9, 2022

BUT. val is long type. Why we are trying to use it as IV instead of existing int i? I think the issue is not in Cast node. There is mixture of long and int types here which confuses loop optimizer. What if val was int type?

val is type of int! And we are not trying to use it instead of existing int i, we are using both of them as IV.

Right. I mixed ret long type with val.

@vnkozlov
Copy link
Contributor

vnkozlov commented Aug 9, 2022

I still think the issue is in some other place. Your change just avoiding the case which triggers it. The java code is legal and compiling next code did not trigger nay issues:

    static long test(int val, boolean b) {
         long ret = 0;
         for (int i = 15; 293 > i; ++i) {
             ret = val;
             val--;
         }
         return ret;
     }

Actually I was not able to reproduce the issue with included test.

@y1yang0
Copy link
Member Author

y1yang0 commented Aug 24, 2022

I still think the issue is in some other place.

Why? Can you clarify more? The root cause seems clear: the unexpected Add->CastII->Phi pattern was matched due to changes in JDK-8273585 and finally leads the crash.

@y1yang0
Copy link
Member Author

y1yang0 commented Aug 24, 2022

I think we should add an IR verification test for JDK-8273585.

Yes, we need a verification test for it. I'll do this later.

@TobiHartmann
Copy link
Member

Why? Can you clarify more?

As I mentioned above, I don't understand how your newly added condition is supposed to work.

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 6, 2022

@kelthuzadx This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@TobiHartmann
Copy link
Member

Comment to keep this open. @kelthuzadx, any update on this?

@y1yang0
Copy link
Member Author

y1yang0 commented Nov 16, 2022

Why? Can you clarify more?

As I mentioned above, I don't understand how your newly added condition is supposed to work.

@vnkozlov @TobiHartmann I reviewed the above comments again. I think the proposed fix is good. According to my analysis at #9695 (comment) , we have a parallel IV which formed as Add->CastII->Phi, the birth of the form looks good, and this pattern is only legal after JDK-8273585. In that case, Add->CastII->Phi parallel IV was generated by Preconditions.checkIndex, CastII was controlled by RangeCheck. In the proposed fix, I'm trying to avoid to recognize similar pattern if CastII was not controlled by RangeCheck.

As far as this fix itself is concerned, I think the whole process is reasonable, i.e. we accept that form as parallel IV only when it was generated by library_call_preconditions_checkindex. The only thing I'm not sure about is whether this Add-CastII-Phi form itself is really reasonable as an IV, which is why I think adding an IR verification test for JDK-8273585 is reasonable, I will go back to investigate that again after this patch, what do you think?

/**
* @test
* @bug 8290432
* @summary Unexpected parallel induction variable pattern was recongized
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not reproduce the issue for me, whereas Test-2.java still works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reset to JDK-8273585, TestUnexpectedParallelIV.java reproduces the crash. Test-2.java can always reproduce without resetting commits. I added both of them as test cases.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 17, 2022
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version looks reasonable to me but please merge the two tests into one. There is also a jcheck whitespace error.

If you don't intend to add an IR verification test for JDK-8273585 with this fix, please file at least a follow-up enhancement.

@y1yang0
Copy link
Member Author

y1yang0 commented Nov 21, 2022

I filed adding IR verification test as https://bugs.openjdk.org/browse/JDK-8297307.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 21, 2022
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@openjdk
Copy link

openjdk bot commented Nov 21, 2022

@y1yang0 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8290432: C2 compilation fails with assert(node->_last_del == _last) failed: must have deleted the edge just produced

Reviewed-by: kvn, thartmann, chagedorn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1797 new commits pushed to the master branch:

  • 0d2a9ee: 8298142: Update internal comment on language features in SourceVersion
  • 203251f: 8297379: Enable the ByteBuffer path of Poly1305 optimizations
  • 1e46832: 8297602: Compiler crash with type annotation and generic record during pattern matching
  • b0e5432: 8297687: new URI(S,S,S,S) throws exception with incorrect index position reported in the error message
  • 2243646: 8298145: Remove ContiguousSpace::capacity
  • 84b927a: 8296024: Usage of DirectBuffer::address should be guarded
  • a9e6c62: 8297186: G1 triggers unnecessary full GCs when heap utilization is low
  • 4458de9: 8297172: Fix some issues of auto-vectorization of Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
  • a613998: 8297689: Fix incorrect result of Short.reverseBytes() call in loops
  • f8f4630: 8297963: Partially fix string expansion issues in UTIL_DEFUN_NAMED and related macros
  • ... and 1787 more: https://git.openjdk.org/jdk/compare/af86cd3d8c0f8a874d1b738ad0caeeb7cd4c61d0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 21, 2022
@y1yang0
Copy link
Member Author

y1yang0 commented Nov 21, 2022

@vnkozlov Can I have a second review? Thanks!

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may ask to do our internal performance testing for this change too before approval.

@y1yang0
Copy link
Member Author

y1yang0 commented Nov 29, 2022

I may ask to do our internal performance testing for this change too before approval.

Do you mean... oracle internal performance testing or opensource microbenchmark inside openjdk? For the former one, may I ask your help to do that? Thanks. (For the later one, I tested JDK-8273585 attached test, and it works well. After this patch, it takes 8s.)

@TobiHartmann
Copy link
Member

I'll run some performance testing in our system and report back once it passed.

@TobiHartmann
Copy link
Member

Performance results look good (no measurable difference). A second review would be good though.

@TobiHartmann
Copy link
Member

/reviewers 2

@openjdk
Copy link

openjdk bot commented Dec 1, 2022

@TobiHartmann
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 1, 2022
@chhagedorn
Copy link
Member

Do we really need this bailout? From your very first picture, it does not seem wrong to replace 125 Phi - the only problem is that 465 CastII also happens to have 432 CountedLoop as control input. This leads to the crash because we are removing two outputs of 432 CountedLoop at once which is not valid and previously unexpected (before JDK-8273585) . Couldn't we instead just update replace_parallel_iv() in such a way that it can handle this particular case of removing two outputs at once from the counted loop?

@y1yang0
Copy link
Member Author

y1yang0 commented Dec 1, 2022

it does not seem wrong to replace 125 Phi - the only problem is that 465 CastII also happens to have 432 CountedLoop as control input.

If replacing Phi#125 is acceptable, we need to reach a consensus that the form of Add->CastII->Phi can be considered as parallel IV, which is not valid before JDK-8273585. Prior to that, we only considered variables of the form X += constant(Add->Phi) to be IV.

I’m not sure whether we have such a consensus that the form of Add->CastII->Phi is considered a parallel IV beside a special case? If not, a conservative approach would be to reject it as a parallel IV at the beginning, in this way, subsequent Phi replacement will not occur.

Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If replacing Phi#125 is acceptable, we need to reach a consensus that the form of Add->CastII->Phi can be considered as parallel IV

I think it should be safe and we've already done that since JDK-8273585 and haven't seen a crash related to that idea. But given how close we are to the fork, I suggest to go with your bailout fix for JDK 20 which is safer (and performance testing done by Tobias looks good). In this way, we really only optimize the pattern originally intended in JDK-8273585.

For the general case of allowing any cast node, I suggest to file an RFE and investigate again if that is possible/correct. I have the feeling that it is but it would be better to defer that to JDK 21. What do you think?

Thanks,
Christian

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 1, 2022
@TobiHartmann
Copy link
Member

That makes sense, you can re-use/extend JDK-8297307 for that.

Vladimir should also have a look at this again.

@y1yang0
Copy link
Member Author

y1yang0 commented Dec 1, 2022

For the general case of allowing any cast node, I suggest to file an RFE and investigate again if that is possible/correct.

Thanks, Christian

I think it’s really reasonable. As Tobias said, we can reuse https://bugs.openjdk.org/browse/JDK-8297307 for this purpose. Thanks!

@vnkozlov
Copy link
Contributor

vnkozlov commented Dec 6, 2022

That makes sense, you can re-use/extend JDK-8297307 for that.

Vladimir should also have a look at this again.

I agree with this suggestion.

@y1yang0
Copy link
Member Author

y1yang0 commented Dec 7, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Dec 7, 2022

Going to push as commit acf96c6.
Since your change was applied there have been 1806 commits pushed to the master branch:

  • ce89673: 8297804: (tz) Update Timezone Data to 2022g
  • 62baff5: 8298221: Problem list gc/metaspace/CompressedClassSpaceSizeInJmapHeap.java on macosx-aarch64
  • 16a5901: 8298214: ProblemList java/util/concurrent/forkjoin/AsyncShutdownNow.java
  • b4da0ee: 8296507: GCM using more memory than necessary with in-place operations
  • cd2182a: 8295724: VirtualMachineError: Out of space in CodeCache for method handle intrinsic
  • 2cdc019: 8298178: Update to use jtreg 7.1.1
  • 79d163d: 8293412: Remove unnecessary java.security.egd overrides
  • ea83cb9: 8297450: ScaledTextFieldBorderTest.java fails when run with -show parameter
  • 336d230: 8297958: NMT: Display peak values
  • 0d2a9ee: 8298142: Update internal comment on language features in SourceVersion
  • ... and 1796 more: https://git.openjdk.org/jdk/compare/af86cd3d8c0f8a874d1b738ad0caeeb7cd4c61d0...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 7, 2022
@openjdk openjdk bot closed this Dec 7, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 7, 2022
@y1yang0 y1yang0 deleted the JDK-8290432 branch December 7, 2022 03:08
@openjdk
Copy link

openjdk bot commented Dec 7, 2022

@y1yang0 Pushed as commit acf96c6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants