8289954: C2: Assert failed in PhaseCFG::verify() after JDK-8183390 #130
Fuzzer tests report an assertion failure in the C2 global code motion phase. Git bisection shows the problem starts after our fix for post loop vectorization (JDK-8183390). After some narrowing down, we found it is caused by the change below in that patch.

    @@ -422,14 +404,7 @@
           cl->mark_passed_slp();
         }
         cl->mark_was_slp();
    -    if (cl->is_main_loop()) {
    -      cl->set_slp_max_unroll(local_loop_unroll_factor);
    -    } else if (post_loop_allowed) {
    -      if (!small_basic_type) {
    -        // avoid replication context for small basic types in programmable masked loops
    -        cl->set_slp_max_unroll(local_loop_unroll_factor);
    -      }
    -    }
    +    cl->set_slp_max_unroll(local_loop_unroll_factor);
       }
     }

This change is in the function `SuperWord::unrolling_analysis()`. AFAIK, it helps find a loop's max unroll count via some analysis. The original code has loop type checks, and the slp max unroll value is set only for some types of loops. But in JDK-8183390, the check was removed by mistake. In my current understanding, the slp max unroll value applies to slp candidate loops only (either main loops or RCE'd post loops), so that check shouldn't have been removed. After restoring it we no longer see the assertion failure.

The new jtreg test created in this patch reproduces the failed assertion, which checks `def_block->dominates(block)`, the dominance relationship between two blocks. In this case, I found that the blocks are in an unreachable inner loop, which I think ought to have been optimized away in an earlier C2 phase. As I'm not very familiar with C2's global code motion, I still don't understand how the slp max unroll count eventually causes that problem. This patch just restores the if condition which I removed incorrectly in JDK-8183390. But I still suspect that there is another hidden bug in C2. I would be glad if any reviewers could give me some guidance or suggestions.

Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1.
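To make the restored condition easier to read on its own, here is a minimal standalone sketch of the decision it encodes. `LoopFlags` and `should_set_slp_max_unroll` are hypothetical names introduced for illustration only; they are not part of HotSpot. In the real code, `is_main_loop` comes from `cl->is_main_loop()`, while `post_loop_allowed` and `small_basic_type` are values computed inside `SuperWord::unrolling_analysis()`.

```cpp
#include <cassert>

// Hypothetical stand-in for the three conditions consulted by the
// restored check. This is NOT HotSpot's CountedLoopNode.
struct LoopFlags {
  bool is_main_loop;       // corresponds to cl->is_main_loop()
  bool post_loop_allowed;  // post loop vectorization is permitted
  bool small_basic_type;   // loop operates on a small basic type
};

// Returns true exactly when the restored code would call
// cl->set_slp_max_unroll(local_loop_unroll_factor).
bool should_set_slp_max_unroll(const LoopFlags& f) {
  if (f.is_main_loop) {
    return true;
  }
  if (f.post_loop_allowed) {
    // Small basic types are skipped to avoid the replication context
    // in programmable masked loops (per the original comment).
    return !f.small_basic_type;
  }
  return false;  // other loops (e.g. pre loops) are left untouched
}
```

With the check removed, the value was set unconditionally, i.e. this function would have returned true for every combination of flags.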
Good. I will test it.
There could be code which uses the slp_max_unroll value as an indicator of a main loop. Or setting slp_max_unroll on a pre-/post-loop exposed a bug.
I suggest going with your fix for JDK 19 and maybe investigating the issue in JDK 20.
Testing passed.
@vnkozlov Thanks for looking at this. I think a 2nd review is required, right?
yes
@dean-long Do you have any comments or suggestions on this? The failure was reported from your fuzzer test.
@pfustc Sorry, I'm not enough of an expert on SuperWord to review the fix. The test was generated automatically by Java Fuzzer.
May I ask how you generate and run the Fuzzer tests? Are there any instructions we can follow? Recently we have seen a couple of SuperWord issues reported from corner cases generated by the Fuzzer.
@pfustc take a look at https://github.com/AzulSystems/JavaFuzzer.
Thanks Dean. I will investigate that project.
@rwestrel @TobiHartmann Would you like to review this fix for jdk19? RDP1 will end in one week.
To have this fixed in jdk 19, you need to open a PR against jdk 19.
Looks reasonable to me.
But this is a PR against JDK 19, right?
Sorry, I missed that this was indeed against jdk 19.
Yeah, this is indeed against jdk 19.
It's a normal loop. I find the check is called from
We probably need another fix to avoid this. But to reduce risk, I still propose that we just restore the incorrectly updated code in this PR for jdk 19 and do a complete fix in jdk 20.
That's reasonable.
Looks good to me.
/integrate
Going to push as commit 2677dd6.
Your commit was automatically rebased without conflicts.
Sorry, but it seems that the same assertion failure still happens when running the newly added test case with a fastdebug build on the linux-riscv64 platform. I have attached the hs_err and replay files to the JBS issue. Please take another look.
Thanks @RealFYang for the information. I'm still investigating this in jdk 20 but so far I haven't got a clear clue. Just find if I ban
I also saw that JDK-8275330 by Roland fixed the same assertion failure before. The test case which caused that failure looks similar to this one: there is an inner dead loop which C2 fails to optimize away. Hi @rwestrel, may I ask if you have any ideas or hints about this?
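For readers unfamiliar with the failing check, the following is a simplified standalone sketch of a dominator-tree query in the spirit of `def_block->dominates(block)`. The `Block` struct and `dominates` function here are illustrative stand-ins, not HotSpot's actual classes. Representing an unreachable block as one with no immediate dominator is an assumption made purely for this example; it is not a diagnosis of the bug discussed in this PR.

```cpp
#include <cassert>

// Simplified, hypothetical basic block: only the dominator-tree fields.
struct Block {
  const Block* idom;  // immediate dominator; nullptr for the root or
                      // for a block never placed in the tree (assumption)
  int dom_depth;      // depth in the dominator tree (root is shallowest)
};

// True if def_block dominates use_block: walk up from use_block to
// def_block's depth and compare. A block dominates itself.
bool dominates(const Block* def_block, const Block* use_block) {
  int diff = def_block->dom_depth - use_block->dom_depth;
  if (diff > 0) {
    return false;  // a deeper block cannot dominate a shallower one
  }
  for (; diff < 0 && use_block != nullptr; ++diff) {
    use_block = use_block->idom;
  }
  return def_block == use_block;
}
```

In this toy model, a definition placed in a block that is detached from the dominator tree never dominates a reachable use, which is the kind of situation a verification pass would flag.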
@pfustc, @RealFYang Please file a new bug for the remaining issue. Thanks!
@RealFYang, I have created a new JBS: https://bugs.openjdk.org/browse/JDK-8291025 and attached your hs_err_* file. Feel free to edit it if you have something to add. BTW: It may be helpful if you could provide the output with VM option
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk19 pull/130/head:pull/130
$ git checkout pull/130
Update a local copy of the PR:
$ git checkout pull/130
$ git pull https://git.openjdk.org/jdk19 pull/130/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 130
View PR using the GUI difftool:
$ git pr show -t 130
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk19/pull/130.diff