Skip to content
This repository has been archived by the owner on Sep 19, 2023. It is now read-only.
/ jdk19 Public archive

8289954: C2: Assert failed in PhaseCFG::verify() after JDK-8183390 #130

Closed
wants to merge 1 commit into from
Closed

Conversation

pfustc
Copy link
Member

@pfustc pfustc commented Jul 11, 2022

Fuzzer tests report an assertion failure issue in C2 global code motion
phase. Git bisection shows the problem starts after our fix of post loop
vectorization (JDK-8183390). After some narrowing down work, we find it
is caused by below change in that patch.

@@ -422,14 +404,7 @@
       cl->mark_passed_slp();
     }
     cl->mark_was_slp();
-    if (cl->is_main_loop()) {
-      cl->set_slp_max_unroll(local_loop_unroll_factor);
-    } else if (post_loop_allowed) {
-      if (!small_basic_type) {
-        // avoid replication context for small basic types in programmable masked loops
-        cl->set_slp_max_unroll(local_loop_unroll_factor);
-      }
-    }
+    cl->set_slp_max_unroll(local_loop_unroll_factor);
   }
 }

This change is in function SuperWord::unrolling_analysis(). AFAIK, it
helps find a loop's max unroll count via some analysis. In the original
code, we have loop type checks and the slp max unroll value is set for
only some types of loops. But in JDK-8183390, the check was removed by
mistake. In my current understanding, the slp max unroll value applies
to slp candidate loops only - either main loops or RCE'd post loops -
so that check shouldn't be removed. After restoring it we don't see the
assertion failure any more.

The new jtreg created in this patch can reproduce the failed assertion,
which checks def_block->dominates(block) - the domination relationship
of two blocks. But in the case, I found the blocks are in an unreachable
inner loop, which I think ought to be optimized away in some previous C2
phases. As I'm not quite familiar with the C2's global code motion, so
far I still don't understand how slp max unroll count eventually causes
that problem. This patch just restores the if condition which I removed
incorrectly in JDK-8183390. But I still suspect that there is another
hidden bug exists in C2. I would be glad if any reviewers can give me
some guidance or suggestions.

Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8289954: C2: Assert failed in PhaseCFG::verify() after JDK-8183390

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk19 pull/130/head:pull/130
$ git checkout pull/130

Update a local copy of the PR:
$ git checkout pull/130
$ git pull https://git.openjdk.org/jdk19 pull/130/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 130

View PR using the GUI difftool:
$ git pr show -t 130

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk19/pull/130.diff

Fuzzer tests report an assertion failure issue in C2 global code motion
phase. Git bisection shows the problem starts after our fix of post loop
vectorization (JDK-8183390). After some narrowing down work, we find it
is caused by below change in that patch.

@@ -422,14 +404,7 @@
       cl->mark_passed_slp();
     }
     cl->mark_was_slp();
-    if (cl->is_main_loop()) {
-      cl->set_slp_max_unroll(local_loop_unroll_factor);
-    } else if (post_loop_allowed) {
-      if (!small_basic_type) {
-        // avoid replication context for small basic types in programmable masked loops
-        cl->set_slp_max_unroll(local_loop_unroll_factor);
-      }
-    }
+    cl->set_slp_max_unroll(local_loop_unroll_factor);
   }
 }

This change is in function `SuperWord::unrolling_analysis()`. AFAIK, it
helps find a loop's max unroll count via some analysis. In the original
code, we have loop type checks and the slp max unroll value is set for
only some types of loops. But in JDK-8183390, the check was removed by
mistake. In my current understanding, the slp max unroll value applies
to slp candidate loops only - either main loops or RCE'd post loops -
so that check shouldn't be removed. After restoring it we don't see the
assertion failure any more.

The new jtreg created in this patch can reproduce the failed assertion,
which checks `def_block->dominates(block)` - the domination relationship
of two blocks. But in the case, I found the blocks are in an unreachable
inner loop, which I think ought to be optimized away in some previous C2
phases. As I'm not quite familiar with the C2's global code motion, so
far I still don't understand how slp max unroll count eventually causes
that problem. This patch just restores the if condition which I removed
incorrectly in JDK-8183390. But I still suspect that there is another
hidden bug exists in C2. I would be glad if any reviewers can give me
some guidance or suggestions.

Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.
@bridgekeeper
Copy link

bridgekeeper bot commented Jul 11, 2022

👋 Welcome back pli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 11, 2022
@openjdk
Copy link

openjdk bot commented Jul 11, 2022

@pfustc The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jul 11, 2022
@mlbridge
Copy link

mlbridge bot commented Jul 11, 2022

Webrevs

Copy link

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I will test it.

There could be code which use slp_max_unroll value as indicator of main loop.
Or setting slp_max_unroll to pre-/post-loop exposed a bug.
I suggest to go with your fix for JDK 19 and may be investigate the issue in JDK 20.

Copy link

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing passed.

@openjdk
Copy link

openjdk bot commented Jul 11, 2022

@pfustc This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8289954: C2: Assert failed in PhaseCFG::verify() after JDK-8183390

Reviewed-by: kvn, thartmann, roland

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • 4f3f74c: 8289127: Apache Lucene triggers: DEBUG MESSAGE: duplicated predicate failed which is impossible
  • 15d3329: 8281969: Bad result for the snippet @link tag if substring/regex consists of whitespace
  • c8e0315: 8290250: Shenandoah: disable Loom for iu mode
  • fb27ddc: 8290252: Add TEST.properties to java/nio/channels/FileChannel and move tests out of largeMemory sub-dir
  • fd89ab8: 8288112: C2: Error: ShouldNotReachHere() in Type::typerr()
  • 2bf6285: 8290209: jcup.md missing additional text
  • 73b83e0: 8290207: Missing notice in dom.md
  • ce36f6e: 8290203: ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java on linux-all
  • fff7f35: 8290201: ProblemList com/sun/jdi/InvokeHangTest.java on macosx-x64 in vthread mode
  • 128c6c6: 8290095: java/nio/channels/FileChannel/largeMemory/LargeGatheringWrite.java timed out
  • ... and 12 more: https://git.openjdk.org/jdk19/compare/b542bcba57a1ac79b9b7182dbf984b447754fafc...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 11, 2022
@pfustc
Copy link
Member Author

pfustc commented Jul 12, 2022

@vnkozlov Thanks for looking at this. I think a 2nd review is required, right?

@vnkozlov
Copy link

@vnkozlov Thanks for looking at this. I think a 2nd review is required, right?

yes

@pfustc
Copy link
Member Author

pfustc commented Jul 14, 2022

@dean-long Do you have any comments or suggestions on this? The failure was reported from your fuzzer test.

@dean-long
Copy link
Member

@pfustc Sorry, I'm not enough of an expert on SuperWord to review the fix. The test was generated automatically by Java Fuzzer.

@pfustc
Copy link
Member Author

pfustc commented Jul 14, 2022

@pfustc Sorry, I'm not enough of an expert on SuperWord to review the fix. The test was generated automatically by Java Fuzzer.

May I ask how do you generate and run the Fuzzer tests? Is there any instructions we can follow? Recently we see a couple of SuperWord issues reported by corner cases which are generated by the Fuzzer.

@dean-long
Copy link
Member

@pfustc take a look at https://github.com/AzulSystems/JavaFuzzer.

@pfustc
Copy link
Member Author

pfustc commented Jul 15, 2022

@pfustc take a look at https://github.com/AzulSystems/JavaFuzzer.

Thanks Dean. I will investigate that project.

@pfustc
Copy link
Member Author

pfustc commented Jul 15, 2022

@rwestrel @TobiHartmann Would you like to review this fix for jdk19? The RDP1 will end in one week.

@rwestrel
Copy link
Contributor

To have this fixed in jdk 19, you need to open a PR againsts jdk 19.
Isn't SuperWord::unrolling_analysis() only called for main and rce post loops anyway? What loop type doesn't the extra check exclude in the case of your test?

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me.

@TobiHartmann
Copy link
Member

To have this fixed in jdk 19, you need to open a PR againsts jdk 19.

But this is a PR against JDK 19, right?

@rwestrel
Copy link
Contributor

To have this fixed in jdk 19, you need to open a PR againsts jdk 19.

Sorry I missed this was indeed against jdk 19.

@pfustc
Copy link
Member Author

pfustc commented Jul 18, 2022

To have this fixed in jdk 19, you need to open a PR againsts jdk 19.

Yeah, this is indeed against jdk 19.

Isn't SuperWord::unrolling_analysis() only called for main and rce post loops anyway? What loop type doesn't the extra check exclude in the case of your test?

It's a normal loop. I find the check is called from IdealLoopTree::policy_unroll_slp_analysis(). And that function is called from IdealLoopTree::policy_unroll() which accepts normal loops. I see the assert in IdealLoopTree::policy_unroll() says

assert(cl->is_normal_loop() || cl->is_main_loop(), "");

We probably need another fix to avoid this. But to reduce risks, I still propose we just restore the incorrectly updated code in this PR for jdk 19 and do complete fix in jdk 20.

@rwestrel
Copy link
Contributor

We probably need another fix to avoid this. But to reduce risks, I still propose we just restore the incorrectly updated code in this PR for jdk 19 and do complete fix in jdk 20.

That's reasonable.

Copy link
Contributor

@rwestrel rwestrel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@pfustc
Copy link
Member Author

pfustc commented Jul 18, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Jul 18, 2022

Going to push as commit 2677dd6.
Since your change was applied there have been 22 commits pushed to the master branch:

  • 4f3f74c: 8289127: Apache Lucene triggers: DEBUG MESSAGE: duplicated predicate failed which is impossible
  • 15d3329: 8281969: Bad result for the snippet @link tag if substring/regex consists of whitespace
  • c8e0315: 8290250: Shenandoah: disable Loom for iu mode
  • fb27ddc: 8290252: Add TEST.properties to java/nio/channels/FileChannel and move tests out of largeMemory sub-dir
  • fd89ab8: 8288112: C2: Error: ShouldNotReachHere() in Type::typerr()
  • 2bf6285: 8290209: jcup.md missing additional text
  • 73b83e0: 8290207: Missing notice in dom.md
  • ce36f6e: 8290203: ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java on linux-all
  • fff7f35: 8290201: ProblemList com/sun/jdi/InvokeHangTest.java on macosx-x64 in vthread mode
  • 128c6c6: 8290095: java/nio/channels/FileChannel/largeMemory/LargeGatheringWrite.java timed out
  • ... and 12 more: https://git.openjdk.org/jdk19/compare/b542bcba57a1ac79b9b7182dbf984b447754fafc...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 18, 2022
@openjdk openjdk bot closed this Jul 18, 2022
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jul 18, 2022
@openjdk openjdk bot removed the rfr Pull request is ready for review label Jul 18, 2022
@openjdk
Copy link

openjdk bot commented Jul 18, 2022

@pfustc Pushed as commit 2677dd6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@pfustc pfustc deleted the 8289954 branch July 18, 2022 12:18
@RealFYang
Copy link
Member

Sorry, but seems that the same assertion failure is still happening when running the newly added test case with fastdebug build on linux-riscv64 platform. And I have attached the hs_err and reply files on the JBS issue. Please take another look.

@pfustc
Copy link
Member Author

pfustc commented Jul 26, 2022

Thanks @RealFYang for the information. I'm still investigating this in jdk 20 but so far I haven't got a clear clue. Just find if I ban SuperWord::unrolling_analysis() for normal loops, specifically with below change, that assertion fails again.

diff --git a/src/hotspot/share/opto/superword.cpp b/src/hotspot/share/opto/superword.cpp
index ef66840628f..997fae367b1 100644
--- a/src/hotspot/share/opto/superword.cpp
+++ b/src/hotspot/share/opto/superword.cpp
@@ -217,6 +217,7 @@ void SuperWord::unrolling_analysis(int &local_loop_unroll_factor) {
   int *ignored_loop_nodes = NEW_RESOURCE_ARRAY(int, ignored_size);
   Node_Stack nstack((int)ignored_size);
   CountedLoopNode *cl = lpt()->_head->as_CountedLoop();
+  if (cl->is_normal_loop()) return;
   Node *cl_exit = cl->loopexit_or_null();
   int rpo_idx = _post_block.length();

And I saw JDK-8275330 by Roland fixed the same assertion failure before. And the test case which causes the failure looks similar with this one - there is an inner dead loop which fails to be optimized away by C2.

Hi @rwestrel , may I ask if you have any ideas or hints about this?

@TobiHartmann
Copy link
Member

TobiHartmann commented Jul 26, 2022

@pfustc, @RealFYang Please file a new bug for the remaining issue. Thanks!

@pfustc
Copy link
Member Author

pfustc commented Jul 26, 2022

@RealFYang, I have created a new JBS: https://bugs.openjdk.org/browse/JDK-8291025 and attached your hs_err_* file. Feel free to edit it if you have something to add.

BTW: It may be helpful if you could provide the output with VM option -XX:+TraceSuperWordLoopUnrollAnalysis.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

6 participants