-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8288204: GVN Crash: assert() failed: correct memory chain #9777
Conversation
👋 Welcome back yyang! A progress list of the required criteria for merging this PR into |
@kelthuzadx The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
With your fix, correctness still depends on the order in which nodes are processed by IGVN, right? Wouldn't this still reproduce with |
Hi @TobiHartmann , this patch works well with StressIGVN. There is an explicit dependency path jdk/src/hotspot/share/opto/memnode.cpp Lines 322 to 327 in b17a745
i.e. load node delayed its idealization until its memory input is processed. This means, MergeMem#1046 and its related node were always processed before processing load node. That's why we saw load->in(Addr) was changed from 969 to 473.
|
I just checked and I can still reproduce the issue with your fix. Simply run: So the root cause must be something else. |
@kelthuzadx This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
Comment to keep this open. @kelthuzadx, let me know if you need help with reproducing this. |
Yes, I'm able to reproduce this, I'm working on other stuff, I will be back at the end of this month |
You are right, in the current fix, the correctness of LoadB#971(Ctrl Mem Addr) depends on idealization orders, I need more investigating. I saw a similar crash scenario at https://bugs.openjdk.org/browse/JDK-8233164
In this case, we are not able to find precise _src_type(after fix) from MergeMem. The full story begins at the creation of LoadBNode in array_copy_forward:
We can not find alias index of 969 from current 338#ArrayCopy->in(Mem), so generated LoadB has base memory, i.e. 242#Proj of 337. Later, ArrayCopyNode::finish_transform replaces 242#Proj with 337#ArrayCopy->in(Mem)(i.e. 1046, originated from 466) when idealizing 337#ArrayCopy. Now we see the following LoadB at the first IGVN iteration:
The rest of the story is as mentioned in PR content: we lookup an alias index of 969 from 1046#MergeMem and got 1109 whose type is
In short, the problem is, once ArrayCopyNode is expanded(idealized), we are not able to know precise _src_type/_dest_type from LoadNode. I'm not an expert on this, my two thoughts at the moment are
@TobiHartmann Do you have any ideas or comments? Thanks. |
|
EA may incorrectly processing LoadB#971 when it splits unique memory slice for allocation instance. |
@vnkozlov LoadB#971 is not processed by EA, it was expanded by ArrayCopyNode#338(which is processed by EA) in arraycopy_forward:
jdk/src/hotspot/share/opto/arraycopynode.cpp Lines 383 to 408 in 78454b6
mem is Proj#242, mm(
Then, Proj#242 is replaced by MergeMem#466 when idealizing ArrayCopy#337
At the first idealization of LoadB#971, we lookup the alias type of AddP#969(
Later, AddP#969 is replaced by 473, it has alias type
Now Phi#1109 and AddP#473 have mismatched types, we finally hit the assertion. |
Thank you for providing additional information. I need to look on this more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue is that the chain of casts in MemNode::optimize_memory_chain()
can't convert bottom array type byte[int:>=0]:exact+any*
to instance array type byte[int:8]:NotNull:exact+any *,iid=177
. Mostly because it does not adjust array's parameters. This cast chain simply does not work for arrays.
You removed the cast chain. But you loose check for case when types are really not compatible. PhiNode::split_out_instance()
does not take into account the current Phi's type when it creates new Phi based on Address type: PhiNode *nphi = slice_memory(at);
Instead I think you need additional casts specifically for arrays (following code is simplified example) before comparing types:
if (t_opp->isa_aryptr() && t->isa_aryptr() && (t_opp->elem() == t->elem())) {
t->is_aryptr()->cast_to_size(t_opp->isa_aryptr()->size())->cast_to_stable(t_oop->is_stable());
}
Make sense. But I guess we should cast more beyond the array parameter, i.e. the offset should be cast as well? Because we have a Phi#1109(
|
@kelthuzadx Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the fix looks reasonable to me. I added some comments.
Please merge with master, currently the build fails with:
src/hotspot/share/opto/type.cpp:4920:16: error: no declaration matches 'const TypePtr* TypeAryPtr::with_offset(int) const'
[2022-12-08T10:34:55,711Z] 4920 | const TypePtr *TypeAryPtr::with_offset(int offset) const {
[2022-12-08T10:34:55,711Z] | ^~~~~~~~~~
[2022-12-08T10:34:55,711Z] /src/hotspot/share/opto/type.cpp:4892:19: note: candidate is: 'virtual const TypeAryPtr* TypeAryPtr::with_offset(intptr_t) const'
[2022-12-08T10:34:55,711Z] 4892 | const TypeAryPtr* TypeAryPtr::with_offset(intptr_t offset) const {
* @bug 8288204 | ||
* @summary GVN Crash: assert() failed: correct memory chain | ||
* | ||
* @run main/othervm -XX:CompileCommand=compileonly,compiler.c2.TestGVNCrash::test compiler.c2.TestGVNCrash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not reproduce the issue with latest JDK 20, please add -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN
and @key stress randomness
.
src/hotspot/share/opto/memnode.cpp
Outdated
mem_t = mem_t->is_aryptr() | ||
->cast_to_size(t_oop->is_aryptr()->size()) | ||
->with_offset(t_oop->is_aryptr()->offset()) | ||
->is_aryptr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need cast_to_stable
as well here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need this even if it does not appear in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes.
Several tests (for example, compiler/arraycopy/TestArrayCopyAsLoadsStores.java) are now failing with:
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (workspace/open/src/hotspot/share/opto/phaseX.cpp:843), pid=3983761, tid=3983777
# assert(i->_idx >= k->_idx) failed: Idealize should return new nodes, use Identity to return old nodes
#
# JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2022-12-23-0641545.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2022-12-23-0641545.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x179ad0c] PhaseGVN::transform_no_reclaim(Node*)+0xec
Current CompileTask:
C2: 2222 478 b 4 compiler.arraycopy.TestArrayCopyAsLoadsStores::m14 (9 bytes)
Stack: [0x00007f86dc7f5000,0x00007f86dc8f6000], sp=0x00007f86dc8f20e0, free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x179ad0c] PhaseGVN::transform_no_reclaim(Node*)+0xec (phaseX.cpp:843)
V [libjvm.so+0x141be0f] LibraryCallKit::inline_arraycopy()+0x71f (library_call.cpp:5289)
V [libjvm.so+0x1438712] LibraryIntrinsic::generate(JVMState*)+0x302 (library_call.cpp:115)
V [libjvm.so+0xcbfbe9] Parse::do_call()+0x389 (doCall.cpp:662)
V [libjvm.so+0x176c5f8] Parse::do_one_bytecode()+0x638 (parse2.cpp:2704)
V [libjvm.so+0x175a734] Parse::do_one_block()+0x844 (parse1.cpp:1555)
V [libjvm.so+0x175b697] Parse::do_all_blocks()+0x137 (parse1.cpp:707)
V [libjvm.so+0x176021d] Parse::Parse(JVMState*, ciMethod*, float)+0xb3d (parse1.cpp:614)
V [libjvm.so+0x918c40] ParseGenerator::generate(JVMState*)+0x110 (callGenerator.cpp:99)
V [libjvm.so+0xb0275d] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x168d (compile.cpp:760)
V [libjvm.so+0x916857] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x4e7 (c2compiler.cpp:113)
V [libjvm.so+0xb0fa2c] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa7c (compileBroker.cpp:2237)
V [libjvm.so+0xb107e8] CompileBroker::compiler_thread_loop()+0x5d8 (compileBroker.cpp:1916)
V [libjvm.so+0x107d066] JavaThread::thread_main_inner()+0x206 (javaThread.cpp:709)
V [libjvm.so+0x1a723c0] Thread::call_run()+0x100 (thread.cpp:224)
V [libjvm.so+0x1712553] thread_native_entry(Thread*)+0x103 (os_linux.cpp:739)
Commenting out transformation in array_copy_forward works now, all test under test/hotspot/jtreg/compiler passed except tests that always failed. PhaseGVN::transform_no_reclaim still crashes when reverting this patch and only adding mm->transform in array_copy_forward, so at least this is not related to this fix. Though, I don't see why it causes the crash at first glance.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Tests now pass (still running). Would still be good to know why the additional transform call is an issue.
/reviewers 2 |
@y1yang0 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 196 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@TobiHartmann |
|
I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last update looks good.
In general when you return new node in some |
/integrate |
Going to push as commit 0459159.
Your commit was automatically rebased without conflicts. |
Hi can I have a review for this fix? LoadBNode::Ideal crashes after performing GVN right after EA. The bad IR is as follows:
The memory input of Load#971 is Phi#1109 and the address input of Load#971 is AddP whose object base is CheckCastPP#335:
The type of Phi#1109 is
byte[int:>=0]:exact+any *
whilebyte[int:8]:NotNull:exact+any *,iid=177
is the type of CheckCastPP#335 due to EA, they have different alias index, that's why we hit the assertion at L226:jdk/src/hotspot/share/opto/memnode.cpp
Lines 207 to 226 in b17a745
(t is
byte[int:>=0]:exact+any *
, t_adr isbyte[int:8]:NotNull:exact+any *,iid=177
).There is a long story. In the beginning, LoadB#971 is generated at array_copy_forward, and GVN transformed it iteratively:
In this case, we get alias index 5 from address input AddP#969, and step it through MergeMem#1046, we found Phi#1109 then, that's why LoadB->in(Mem) is changed from MergeMem#1046 to Phi#1109 (Which finally leads to crash).
After applying this patch, some related nodes are pushed into the GVN worklist, before stepping through MergeMem#1046, the address input is already changed to AddP#473. i.e., we get alias index 32 from address input AddP#473, and step it through MergeMem#1046, we found StoreB#191 then,LoadB->in(Mem) is changed from MergeMem#1046 to StoreB#191.
The well-formed IR looks like this:

Thanks for your patience.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9777/head:pull/9777
$ git checkout pull/9777
Update a local copy of the PR:
$ git checkout pull/9777
$ git pull https://git.openjdk.org/jdk pull/9777/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9777
View PR using the GUI difftool:
$ git pr show -t 9777
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9777.diff