8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL #13269

Closed
wants to merge 19 commits into from

@eme64 eme64 commented Mar 31, 2023

Context

During PhaseIdealLoop::do_unroll, we hack the loop-limit, and subtract stride from it. We have to prevent underflow on that subtract. Currently, we do this with a CMoveI. The problem with this: CMoveI is not smart enough to generate a precise type. For example, there are many cases where the input types get better, and underflow is not possible anymore. But the CMoveI does not detect this, and still has type min_jint..hi.

We have the same issue in PhaseIdealLoop::adjust_limit, where we use CMoveL to implement long max/min. The types are not as precise as they could and should be.

Problem

The imprecise type is used for the zero-trip-guard. It does not fold to false, even though the data-path into the post loop does constant fold to TOP. The graph breaks, and assert malformed control flow triggers.

Details: In these cases, we have the super-unrolled main-loop (SuperWord'ed, then further unrolled) directly leading to a vectorized post-loop. The effect is that there is no region/phi merging main-exit and main-zero-trip-guard. So the types are already more narrow here. It may be possible that the values are such that we find out that we should never enter the vectorized post-loop. But if data finds out and control does not, we get a broken graph.
Note: we have pre-loop. Then a main-loop and vectorized post loop. Then we merge the main-zero-trip-guard. And at the end we have the scalar post loop.

I recently fixed a bug around this CMoveI (5a4945c). I would now like to have a more satisfactory fix that properly propagates the types.

Solution

PhaseIdealLoop::adjust_limit already converts the limit from int to long, and does all computations in long, including taking max/min with a CMoveL. I now use the so far unused MaxL/MinL. I implemented some missing Value/Identity components for it. Since MaxL/MinL is not implemented in the backend, I just expand it in macro-expansion to a CMoveL. At that point the loop-opts are over, and it is most likely ok that we do not make the types more precise after this.

I take the same approach for PhaseIdealLoop::do_unroll: convert limits to long, do subtraction in long, take MinL/MaxL to clamp it to the int-range (prevent subtraction underflow).
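
As a rough illustration of the clamped subtraction described above, here is a minimal Java sketch. It is not the C2 code itself: the class and method names are hypothetical, and it assumes a positive stride (a loop counting upwards), so only underflow needs to be prevented.

```java
public class UnrollLimitSketch {
    // Hypothetical helper: computes limit - stride for a positive stride.
    // The subtraction is done in long, then clamped to Integer.MIN_VALUE,
    // mirroring the ConvI2L -> SubL -> MaxL -> ConvL2I shape.
    static int subtractNoUnderflow(int limit, int stride) {
        long diff = (long) limit - (long) stride;                  // cannot wrap in 64 bits
        long clamped = Math.max(diff, (long) Integer.MIN_VALUE);   // plays the role of MaxL
        return (int) clamped;                                      // value is within int range here
    }

    public static void main(String[] args) {
        System.out.println(subtractNoUnderflow(100, 8));
        // Close to the lower int bound, the result is clamped instead of wrapping.
        System.out.println(subtractNoUnderflow(Integer.MIN_VALUE + 3, 8));
        // For comparison: plain int subtraction wraps around to a large positive value.
        System.out.println((Integer.MIN_VALUE + 3) - 8);
    }
}
```

For a loop counting downwards, the symmetric case would clamp with a minimum against Integer.MAX_VALUE instead.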

Discussion

This solution seems much cleaner to me, and I hope that we will see fewer bugs caused by imprecise types in the limit computation. These were often due to the CMove not being smart enough to analyze all inputs: it would have to recognize a multitude of patterns, for the Cmp inputs and the direct inputs to the CMove. We currently do not do that, but just take the union of the input types, which is very imprecise.
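
To make the difference in precision concrete, here is a small Java sketch with a hypothetical `Range` class standing in for an integer type with bounds. It is only a model of the reasoning above, not the actual C2 type system: the union models what a CMove gets from its two data inputs, and `max` models what a dedicated max node could derive. The sample bounds are made up.

```java
public class LimitTypeSketch {
    // Hypothetical stand-in for a compiler integer type: the closed range [lo, hi].
    static final class Range {
        final long lo;
        final long hi;

        Range(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        // Conservative result for a CMove: either input may be selected,
        // so the result range is the hull of both input ranges.
        Range union(Range other) {
            return new Range(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }

        // Result range for max(a, b): at least the larger lower bound,
        // at most the larger upper bound.
        Range max(Range other) {
            return new Range(Math.max(lo, other.lo), Math.max(hi, other.hi));
        }

        @Override
        public String toString() {
            return "[" + lo + ", " + hi + "]";
        }
    }

    public static void main(String[] args) {
        Range diff = new Range(-8, 992);   // assumed range of limit - stride
        Range clamp = new Range(Integer.MIN_VALUE, Integer.MIN_VALUE);
        System.out.println("CMove-style union: " + diff.union(clamp));
        System.out.println("Max-style range:   " + diff.max(clamp));
    }
}
```

In this model the union keeps the lower bound at min_jint even though underflow is impossible, while the max-based range keeps the lower bound of the subtraction.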

There is a bit of an overhead here: We use longs even though we only want to have int values. But I think we should prefer a clean implementation here, with correct type computation. The performance impact is probably non-existent on 64-bit machines anyway.

Caveat

I found some cases with the same assert malformed control flow that are most likely skeleton/assertion predicate bugs (JDK-8288981). Some of those cases were new patterns, for example where we PreMainPost a main loop.

I hope that this fix here at least reduces the frequency of failures significantly.

Testing

I added 2 regression tests. Our fuzzer seems to spit out examples regularly, so that gives us extra coverage.

Tested up to tier5 and stress testing. Performance testing running...

Future Work

We should implement MaxL/MinL in the backend. We should also use them during parsing. This would also allow us to SuperWord the instruction on the platforms that support it.

Should we add such an assert during IGVN? I think after IGVN, we should never have a MultiBranchNode that does not have the required number of outputs, right? We could add it to VerifyIterativeGVN.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13269/head:pull/13269
$ git checkout pull/13269

Update a local copy of the PR:
$ git checkout pull/13269
$ git pull https://git.openjdk.org/jdk.git pull/13269/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13269

View PR using the GUI difftool:
$ git pr show -t 13269

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13269.diff


bridgekeeper bot commented Mar 31, 2023

👋 Welcome back epeter! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Mar 31, 2023

@eme64 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 31, 2023
@eme64 eme64 changed the title from "8303466: C2: failed: malformed control flow. Introducing SubINoUnderflowNode" to "8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL" on Apr 4, 2023
@eme64 eme64 marked this pull request as ready for review April 5, 2023 10:57
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 5, 2023

rwestrel commented Apr 6, 2023

That looks reasonable to me.
Is the PhaseIdealLoop::adjust_limit() change required or is it some cleanup?
Have you run performance testing to be safe?

eme64 commented Apr 6, 2023

@rwestrel

Is the PhaseIdealLoop::adjust_limit() change required or is it some cleanup?

At first I only fixed PhaseIdealLoop::do_unroll. That fixed my regression test examples. But once that fix was introduced, another test failed, because now the type of the CMove in PhaseIdealLoop::adjust_limit was not precise enough. So it seemed best to fix them together, since both have issues with CMove and both change the limit.

Have you run performance testing to be safe?

I'll do that, will report back.

@rwestrel rwestrel left a comment

Looks good to me.

openjdk bot commented Apr 6, 2023

@eme64 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8303466: C2: failed: malformed control flow. Limit type made precise with MaxL/MinL

Reviewed-by: roland, kvn, chagedorn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 6, 2023

eme64 commented Apr 10, 2023

@rwestrel thanks for the review!
The performance testing looks ok, I cannot see a significant runtime change.

@vnkozlov vnkozlov left a comment

If we have several unrolls we will have a chain of MaxL/MinL nodes. Will the chain be folded by IGVN?

Comment on lines 2295 to 2296
Node* stride_l = new ConvI2LNode(stride);
register_new_node(stride_l, get_ctrl(limit));

Can we make Long constant (_igvn.longcon(stride)) instead since stride is constant? Similar to underflow_clamp_l. My concern is you set control to constant which is not Root.

Reply (Contributor Author):

Good point, will replace it with constant. Yes, I had the ctrl wrong, it should be root.

@chhagedorn chhagedorn left a comment

That looks good to me and indeed much cleaner! I only have some minor code style comments.

There is a bit of an overhead here: We use longs even though we only want to have int values. But I think we should prefer a clean implementation here, with correct type computation. The performance impact is probably non-existent on 64-bit machines anyway.

I agree with that.

I recently fixed a bug around this CMoveI (5a4945c). I would now like to have a more satisfactory fix that properly propagates the types.

I had a feeling that we would be revisiting this at some point.

Should we add such an assert during IGVN? I think after IGVN, we should never have a MultiBranchNode that does not have the required number of outputs, right? We could add it to VerifyIterativeGVN.

That would be a good idea to investigate in an RFE - maybe also for other nodes to assert on well-known input/output patterns. We've had such problems before after CCP with If nodes with only one out projection.

eme64 commented Apr 11, 2023

If we have several unrolls we will have a chain of MaxL/MinL nodes. Will the chain be folded by IGVN?

@vnkozlov I fear it would not fold currently. The CMove would not fold before either, but with repeated unrolling, the CMove was reused, and so there was only ever a single CMove (unless some RC got in between).

I think in many cases, the type does not underflow, and the MaxL/MinL can be removed completely.
However, if that does not work, I think it now also fails to remove the repeated ConvI2L / ConvL2I. We would have to add more IGVN optimizations to fold things more.

I think the performance impact is now insignificant, if it does not fold. Because the limits are only calculated once per loop. We can still improve the folding, if you want. I can also do that in a follow-up RFE, and try to add some IR tests that target type-limit underflow, and count the MaxL/MinL nodes.

TLDR: @vnkozlov is it ok if I investigate & test MaxL/MinL and ConvI2L / ConvL2I folding in a follow-up RFE?

Thanks for the suggestions!

Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>

eme64 commented Apr 11, 2023

I added the idea about verifying out-proj of MultiBranch to this RFE JDK-8298951.

Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>

@TobiHartmann TobiHartmann left a comment

Looks good to me.

@vnkozlov

I think in many cases, the type does not underflow, and the MaxL/MinL can be removed completely.

I would disagree. In many cases the limit is variable. At least I want you to check the generated code for such a case and add it to your test. Even if the additional latency of several CMoves (to which you convert the Max/Min nodes) vs. one node is negligible, the bigger size of the generated code may affect inlining.

TLDR: @vnkozlov is it ok if I investigate & test MaxL/MinL and ConvI2L / ConvL2I folding in a follow-up RFE?

That depends on the result of investigating the generated code.

@jaskarth

However, if that does not work, I think it now also fails to remove the repeated ConvI2L / ConvL2I. We would have to add more IGVN optimizations to fold things more.

I think you're running into an issue where some nodes created by counted loop expansion aren't properly passed onto the IGVN worklist- I found the same thing while trying to investigate some strange code generation from small loops. If you make that follow-up RFE I would be happy to attach the cases that I found as well.

eme64 commented Apr 13, 2023

@jaskarth please send me those cases, if it is many then maybe better via email. I'm generally working on doing verification of that kind, see JDK-8298951.

@jaskarth

I only had a handful of cases so I've attached them in this gist.
I found these a few weeks ago, but looking back I think I have misremembered the problem chain. Examples 3-5 may be a different bug as they deal with ConvL2I->ConvI2L chains instead of ConvI2L->ConvL2I chains as you are seeing, as the latter has an Identity() transform defined while it seems the former does not- I apologize for the noise if the issues are unrelated. Examples 1 and 2 could perhaps still be useful in diagnosing the issue, as they describe cases where ideal transforms that do exist aren't taken. I tried looking into that bug myself a while ago but didn't get far.

JDK-8298951 is exciting, it'll make reasoning about middle-end optimizations a lot easier :)

eme64 commented Apr 17, 2023

@jaskarth I think your issues are not related, though I can look at them again once I get back to IGVN verification.

@vnkozlov I thought about it a bit more. With a simple example like Test::test, I get unrolling 2048, so we unroll 10-ish times. I see accordingly many ConvI2L, SubL, MaxL, ConvL2I nodes. Now, I can collapse the ConvL2I -> ConvI2L parts (the types guarantee that we never leave the int range, so conversion never clips anything), so it is only a chain of SubL -> MaxL nodes.

One idea would be to fold SubL -> MaxL -> SubL -> MaxL down to a single subtraction and maximum. Maybe that could be done, we just have to be very careful with the types. I'll give it a try, and it seems to work on a basic example.

The example:

./java -Xcomp -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:+TraceLoopOpts -XX:+PrintIdeal Test.java

public class Test {
    static int START = 0;
    static int FINISH = 512;
    static int RANGE = 512;

    public static void main(String args[]) {
        byte[] data = new byte[RANGE];
        test(data);
    }

    public static void test(byte[] data) {
        for (int j = START; j < FINISH; j++) {
            data[j] = (byte)(data[j] * 11);
        }
    }
}

What to do with this?

  • Performance testing did not show any difference. But maybe we do not trust that enough.
  • Before and now, the chain of unrolling-limits can be interrupted by range-check limits. We probably will just accept that this means that not all of the unrolling-limits can be folded together.

I have an alternative proposal:
Leave the MaxL/MinL node for the range-check limits, there are usually not that many RC-limits, and up to now we used a CMove node per such limit already anyway.

But for the unroll-limits, we introduce a SubINoUnderflow node, which does a safe (no-underflow) subtraction limit-stride.
These nodes can be folded together relatively easily.
I already had such an implementation before, and reverted it (f5fcf60).
I had already discussed this idea with @chhagedorn a while ago. But then decided against it once I also saw that I wanted a unified solution for RC-limits and unroll-limits. The downside is that it takes a new special node.

With this SubINoUnderflowNode idea, we would have a constant number of nodes added per RC-limit. And then for all the unroll limit adjustments together, we would only have one SubINoUnderflow node, as they would all collapse into one. At macro expansion, I can then expand it into a single CMove node.

But I think I can do the same with just collapsing SubL -> MaxL -> SubL -> MaxL to SubL -> MaxL. That may be cleaner.

@vnkozlov What do you think? Do you have any other ideas? What solution would you prefer?

openjdk bot commented Apr 17, 2023

⚠️ @eme64 This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@vnkozlov

But I think I can do the same with just collapsing SubL -> MaxL -> SubL -> MaxL to SubL -> MaxL. That may be cleaner.

I prefer this if you can do it. So you have sequence (after folding Conv nodes)

MaxL(SubL(MaxL(SubL(limit, stride), min_int), stride*2), min_int);

Yes, I think it can be collapsed to:

MaxL(SubL(limit, stride*3), min_int);

If at any point in the chain the limit becomes min_int, it will stay min_int (even if stride is max_int), because you use long arithmetic and we have a "small" limit on unrolling (16?).
If it does not hit min_int, the result is similar to SubL(SubL(limit, stride), stride*2).
So you just need to correctly collect stride*N values.
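
The argument above can be checked on a few sample values with a short Java sketch. The class and method names are hypothetical, and the sketch assumes non-negative strides whose sum stays well inside the long range; it models the node chain with plain `long` arithmetic rather than real C2 nodes.

```java
public class ClampChainSketch {
    static final long MIN_INT = Integer.MIN_VALUE;

    // One unrolling step: MaxL(SubL(limit, stride), min_int).
    static long step(long limit, long stride) {
        return Math.max(limit - stride, MIN_INT);
    }

    // Two steps chained, as produced by repeated unrolling.
    static long chained(long limit, long s1, long s2) {
        return step(step(limit, s1), s2);
    }

    // Collapsed form: a single subtraction of the accumulated stride.
    static long collapsed(long limit, long s1, long s2) {
        return step(limit, s1 + s2);
    }

    public static void main(String[] args) {
        long[] limits = {Integer.MIN_VALUE, Integer.MIN_VALUE + 5L, -1, 0, 100, Integer.MAX_VALUE};
        long[] strides = {1, 2, 8, 1024, Integer.MAX_VALUE};
        for (long limit : limits) {
            for (long s : strides) {
                // Same shape as the example above: stride, then stride*2.
                if (chained(limit, s, 2 * s) != collapsed(limit, s, 2 * s)) {
                    throw new AssertionError("mismatch for limit=" + limit + " stride=" + s);
                }
            }
        }
        System.out.println("chained and collapsed forms agree on all samples");
    }
}
```

The two forms agree because once the inner clamp hits min_int, subtracting a further non-negative stride and clamping again still yields min_int, which is also what the single accumulated subtraction produces.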

eme64 commented Apr 21, 2023

@vnkozlov I added some more IGVN optimizations that help to fold the SubL -> MaxL chains.

  1. fold_subI_no_underflow_pattern in MaxLNode::Ideal. Collapses SubL -> MaxL -> SubL -> MaxL to a simple SubL -> MaxL.

  2. ConvI2LNode::Identity can now convert I2L(L2I(x)) => x. We need this, so that the Casts are not in the way for the first optimization.
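
The second optimization relies on the value staying inside the int range, as noted earlier in this thread (the types guarantee that the conversion never clips anything). A minimal Java sketch of that condition, with hypothetical helper names and plain casts standing in for the ConvL2I / ConvI2L nodes:

```java
public class ConvRoundTripSketch {
    // I2L(L2I(x)): narrow to int, then widen back to long.
    static long roundTrip(long x) {
        return (long) (int) x;
    }

    // True if x is representable as an int, i.e. the narrowing loses nothing.
    static boolean fitsInInt(long x) {
        return x >= Integer.MIN_VALUE && x <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long[] samples = {0L, -1L, 12345L, Integer.MIN_VALUE, Integer.MAX_VALUE,
                          Integer.MAX_VALUE + 1L, 1L << 32};
        for (long x : samples) {
            // The round trip is the identity exactly when x fits in an int.
            System.out.println(x + " fits=" + fitsInInt(x)
                    + " identity=" + (roundTrip(x) == x));
        }
    }
}
```

For values outside the int range the narrowing drops the upper bits, so the round trip could only be replaced by x when the type of x is known to lie within the int range.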

I added verification, that these optimizations are really taken:

@Test
@Warmup(0)
@IR(counts = {IRNode.MAX_L, "> 0", IRNode.MAX_L, "<= 2"},
    phase = CompilePhase.PHASEIDEALLOOP_ITERATIONS)
public static void test1() {
    for (int j = START; j < FINISH; j++) {
        data1[j] = (byte)(data1[j] * 11);
    }
}

@Test
@Warmup(0)
@IR(counts = {IRNode.MIN_L, "> 0", IRNode.MIN_L, "<= 2"},
    phase = CompilePhase.PHASEIDEALLOOP_ITERATIONS)
public static void test2() {
    for (int j = FINISH-1; j >= START; j--) {
        data2[j] = (byte)(data2[j] * 11);
    }
}

Is this now ok?

@vnkozlov vnkozlov left a comment

Looks good. Just minor comments.

// Collapse the "addition with overflow-protection" pattern, and the symmetrical
// "subtraction with underflow-protection" pattern. These are created during the
// unrolling, when we have to adjust the limit by subtracting the stride, but want
// to protect against underflow: MaxL(SubL(limit, stride), min_jint).

Maybe add a note that the SubL node is replaced with AddL and a reversed stride (I assume that is what happened here).

// | |
// Max/MinL (n)
//
Node* fold_subI_no_underflow_pattern(Node* n, PhaseGVN* phase) {

Move this method and its comment before MaxLNode::add_ring so all MaxL and MinL methods stay together.

@TobiHartmann TobiHartmann left a comment

Still looks good to me.

Co-authored-by: Tobias Hartmann <tobias.hartmann@oracle.com>

@chhagedorn chhagedorn left a comment

Updates look good!

eme64 and others added 2 commits April 24, 2023 11:47
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>

eme64 commented Apr 26, 2023

Thanks @chhagedorn @TobiHartmann @vnkozlov @rwestrel for the reviews and suggestions!
The assert will still fail with the fuzzer occasionally because of the assertion / skeleton predicate bug that @chhagedorn has been working on for a while. But I hope this fix will drastically reduce the rate of fuzzer failures with this assert.
/integrate

openjdk bot commented Apr 26, 2023

Going to push as commit cc894d8.
Since your change was applied there have been 46 commits pushed to the master branch:

  • ed1ebd2: 8306652: Open source AWT MenuItem related tests
  • f3e8bd1: 8306755: Open source few Swing JComponent and AbstractButton tests
  • 1c1a73f: 8302908: RISC-V: Support masked vector arithmetic instructions for Vector API
  • adf62fe: 8304918: Remove unused decl field from AnnotatedType implementations
  • 00b1eac: 8306031: Update IANA Language Subtag Registry to Version 2023-04-13
  • 88d9ebf: 8306752: Open source several container and component AWT tests
  • 1c2dadc: 8306683: Open source several clipboard and color AWT tests
  • b372f28: 8306753: Open source several container AWT tests
  • e3ccaa6: 8306623: (bf) CharBuffer::allocate throws unexpected exception type with some CharSequences
  • d819deb: 8304423: Refactor FdLibm.java
  • ... and 36 more: https://git.openjdk.org/jdk/compare/62acc882bff32da287ac3ea22ebe43b90a724489...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 26, 2023
@openjdk openjdk bot closed this Apr 26, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 26, 2023

openjdk bot commented Apr 26, 2023

@eme64 Pushed as commit cc894d8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
6 participants