-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed #24570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
👋 Welcome back roland! A progress list of the required criteria for merging this PR into |
|
@rwestrel This change is no longer ready for integration - check the PR body for details. |
|
@rwestrel |
|
/contributor add @eme64 |
|
@rwestrel |
Webrevs
|
|
It would be great if we have union memory slices for this. |
Something like that would fix it but it would be trickier to get right that this point fix, I think. Do you see any other use for it? |
test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java
Outdated
Show resolved
Hide resolved
|
@rwestrel There are other places in C2 when the memory slices are not expressive enough, which leads us to wrapping those accesses in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel Thanks for looking into that. I vaguely remember working on #18265 and being quite confused about the way we treat memory slices 😅
I think this seems like a good step in the right direction. Though I am a little unsure about the effect on proj_out_or_null, as you can see below. Maybe we can have a simple query that just checks has_proj(TypeFunc::Memory)? But there are also some cases where you actually use the projection, and I'm not sure if that means you might be missing some if there are multiple?
Like @merykitty I wonder if there are other cases where we have similar issues with memory slices. But I guess we should tackle those separately.
| _adr_type = adr_type; | ||
| } | ||
| virtual int Opcode() const; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have these overridden? Just so you can print the _adr_type :)
#ifndef PRODUCT
virtual void dump_spec(outputStream *st) const;
virtual void dump_compact_spec(outputStream *st) const;
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, or does it already get printed from the adr_type() i.e. the virtual method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, no I don't think so... right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in new commit.
src/hotspot/share/opto/escape.cpp
Outdated
| } | ||
| result = result->in(MemNode::Memory); | ||
| } | ||
| if (!is_instance && result->Opcode() == Op_NarrowMemProj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems you are checking for NarrowMemProj and casting to it in multiple places. Why not enable the macro to do is_... and as_...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in new commit.
src/hotspot/share/opto/escape.cpp
Outdated
| if (!is_instance && result->Opcode() == Op_NarrowMemProj) { | ||
| // Memory for non known instance can safely skip over a known instance allocation (that memory state doesn't access | ||
| // the result of an allocation for a known instance). | ||
| assert(result->as_Proj()->_con == TypeFunc::Memory, "a NarrowMemProj can only be a memory projection"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we verify that already in the NarrowMemProj, i.e. its constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeFunc::Memory is now hardcoded in the NarrowMemProj constructor.
| void InitializeNode::replace_mem_projs_by(Node* mem, Compile* C) { | ||
| for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) { | ||
| ProjNode* proj = fast_out(i)->as_Proj(); | ||
| if (proj->_con == TypeFunc::Memory) { | ||
| C->gvn_replace_by(proj, mem); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| void InitializeNode::replace_mem_projs_by(Node* mem, PhaseIterGVN* igvn) { | ||
| for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) { | ||
| ProjNode* proj = fast_out(i)->as_Proj(); | ||
| if (proj->_con == TypeFunc::Memory) { | ||
| igvn->replace_node(proj, mem); | ||
| --i; --imax; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment to the code why we need both variants?
test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java
Outdated
Show resolved
Hide resolved
| * @run main/othervm -Xcomp -XX:-TieredCompilation | ||
| * -XX:CompileCommand=compileonly,compiler.macronodes.TestEliminationOfAllocationWithoutUse::test* | ||
| * compiler.macronodes.TestEliminationOfAllocationWithoutUse | ||
| * @run main/othervm -Xcomp | ||
| * -XX:CompileCommand=compileonly,compiler.macronodes.TestEliminationOfAllocationWithoutUse::test* | ||
| * compiler.macronodes.TestEliminationOfAllocationWithoutUse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a run without Xcomp make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some test methods run a loop for only a few iterations, not sure a run without -Xcomp makes much sense.
test/hotspot/jtreg/compiler/macronodes/TestInitializingStoreCapturing.java
Outdated
Show resolved
Hide resolved
|
@rwestrel You should update the bug title, it just sounds too generic. It is really helpful when the "blame" history gives you a helpful comment rather than I suggest |
…tionWithoutUse.java Co-authored-by: Andrey Turbanov <turbanoff@gmail.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
…pturing.java Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
…tionWithoutUse.java Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
src/hotspot/share/opto/multnode.cpp
Outdated
|
|
||
| void NarrowMemProjNode::dump_spec(outputStream *st) const { | ||
| ProjNode::dump_spec(st); | ||
| dump_adr_type(st); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to define a special version of NarrowMemProjNode::dump_adr_type or could we just have the same effect calling MemNode::dump_adr_type(this, _adr_type, st) here?
src/hotspot/share/opto/multnode.cpp
Outdated
|
|
||
| void NarrowMemProjNode::dump_compact_spec(outputStream *st) const { | ||
| ProjNode::dump_compact_spec(st); | ||
| dump_adr_type(st); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a package declaration (and make the corresponding class names fully qualified in the @run directives).
| analyzer.shouldContain("++++ Eliminated: 51 Allocate"); | ||
| analyzer.shouldContain("++++ Eliminated: 84 Allocate"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you analyze why there are more allocations removed than before in this test case? I did not expect this changeset to have an effect on the number of removed allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are not more allocations removed. The message is confusing.
"Eliminated: 84 Allocate" logs that node number 84 was eliminated (and not 84 nodes).
This patch changes the number of nodes required at allocations so it also has an impact on node numbering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. Expecting specific C2 node identifiers seems fragile. I understand it is a pre-existing issue, but since this changeset needs to address it anyway, please consider making it more robust by e.g. using regular expression matching. Here is a suggestion, feel free to incorporate it: 9fd6378. The ultimately improvement would be using the IR test framework, but that is out of scope here.
| // Memory Phi - most recent unique Phi split out | ||
| // from this Phi | ||
| // MemNode - new memory input for this node | ||
| // ChecCastPP - allocation that this is a cast of | ||
| // CheckCastPP - allocation that this is a cast of | ||
| // allocation - CheckCastPP of the allocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a new entry here explaining how _node_map is used for NarrowMemProjNode nodes.
…tionWithoutUse.java Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
|
@robcasloz thanks for the review. New commit addresses most of your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments, Roland. I have a couple of follow-up questions. I also realized that we need to adjust IGV's custom logic to schedule the new projection nodes more accurately and combine them into their parent nodes when using the "Condense graph" filter. Please consider incorporating the following patch into this changeset: 63a536a.
| // elimination. Simply add the MemBarStoreStore after object | ||
| // initialization. | ||
| MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxBot); | ||
| MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comment. A follow-up question: the full comment below makes the argument that re-ordering by the compiler can't happen by construction because a later Store that publishes the just allocated object reference is indirectly control dependent on the Initialize node. However, in this case, there may be no such Initialize node (init == nullptr || init->req() < InitializeNode::RawStores). I assume the memory barrier relaxation is still OK in this scenario because we cannot have later, publishing stores of the allocated object reference? That is, if there exists such a store then there must necessarily exist an Initialize node? Or is there any other reason I am missing? It would be good to clarify this point in the comment.
|
/contributor add @robcasloz |
|
@robcasloz Thanks for the patches. I added them. |
|
@rwestrel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for and for addressing all comments and questions, Roland. Looks good!
| // elimination. Simply add the MemBarStoreStore after object | ||
| // initialization. | ||
| MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxBot); | ||
| MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification and the pointer to the comment in initialize_object, that helps.
|
Hi @rwestrel, could you please have a look at the merge conflicts of this PR so that we can progress further with the review of this work? |
The conflict is caused by the integration of JDK-8360031, which relaxes the assertion in jdk/src/hotspot/share/opto/memnode.cpp Line 4232 in 430041d
|
I did a bit of testing and updating the asserted invariant to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel Thanks for your continued work on this, and you patience with the slow reviews 🙈
I stumbled a bit over the many overloads of apply_to_narrow_mem_projs, and left some comments for that below.
There are really only one use of already_has_narrow_mem_proj_with_adr_type and one of apply_to_narrow_mem_projs, and you have a whole lot of methods that implement a lot of abstractions that confuse me. Is there a plan behind this, or just an artefact of many refactorings?
Could we not just implement those 2 methods directly using apply_to_projs_any_iterator?
I'll continue reviewing other parts now.
src/hotspot/share/opto/escape.cpp
Outdated
| } | ||
| return MultiNode::CONTINUE; | ||
| }; | ||
| init->apply_to_narrow_mem_projs(i, process_narrow_proj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this is not a init->for_each_narrow_mem_proj(callback), that has an internal iterator?
Because with this API, I'm wondering: What would happen if I feed apply_to_narrow_mem_projs an iterator that does not belong to the init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was a plain for_each_narrow_mem_proj, your callback would also not have to return anything, you could remove this line: return MultiNode::CONTINUE;.
src/hotspot/share/opto/memnode.hpp
Outdated
| template <class Callback> ProjNode* apply_to_narrow_mem_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback) const { | ||
| return apply_to_narrow_mem_projs_any_iterator<Callback, UsesIteratorFast>(UsesIteratorFast(imax, i, this), callback); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any use for this method? I could not find any.
src/hotspot/share/opto/memnode.hpp
Outdated
| template <class Callback> ProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { | ||
| return apply_to_narrow_mem_projs_any_iterator<Callback, UsesIterator>(UsesIterator(i, this), callback); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to have a public API where we can pass in an iterator? What if someone uses it with an iterator for the wrong node?
The only use for it seems to be ConnectionGraph::split_unique_types with DUIterator.
Is there a reason you want to pass the iterator explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: why does it return anything? What kind of callback is exected here? I think the public API deserves a bit of documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use in ConnectionGraph::split_unique_types really does not need a return value, and also could work fine with a callback that return void.
src/hotspot/share/opto/escape.cpp
Outdated
| bool already_has_narrow_mem_proj_with_adr_type = init->already_has_narrow_mem_proj_with_adr_type(new_adr_type); | ||
| if (adr_type != new_adr_type && !already_has_narrow_mem_proj_with_adr_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could avoid the iteration inside already_has_narrow_mem_proj_with_adr_type if we did the adr_type != new_adr_type first. Feel free to ignore if you think this is a micro-optimization ;)
src/hotspot/share/opto/memnode.cpp
Outdated
| template<class Callback> ProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const { | ||
| auto filter = [&](NarrowMemProjNode* proj) { | ||
| if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj())) { | ||
| return BREAK_AND_RETURN_CURRENT_PROJ; | ||
| } | ||
| return CONTINUE; | ||
| }; | ||
| return apply_to_narrow_mem_projs(filter); | ||
| } | ||
|
|
||
| bool InitializeNode::already_has_narrow_mem_proj_with_adr_type(const TypePtr* adr_type) const { | ||
| auto find_proj = [](ProjNode* proj) { | ||
| return BREAK_AND_RETURN_CURRENT_PROJ; | ||
| }; | ||
| return apply_to_narrow_mem_projs(find_proj, adr_type) != nullptr; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I seeing this right: already_has_narrow_mem_proj_with_adr_type calls apply_to_narrow_mem_projs with callback = find_proj. find_proj returns BREAK_AND_RETURN_CURRENT_PROJ, which is an element from ApplyToProjs. That would mean that when we call the callback, we get an enum element, and not a boolean, right?
If that is the case, you should probably not do an implicit comparison on this line:
if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj())) {
Hotspot style guide does not like implicit conversion. You should use an explicit comparison.
I think it would also be more clear what is happening. Currently, I'm a bit confused.
All the overloadings of apply_to_narrow_mem_projs make it a bit hard to see what goes where :/
I wonder if we really need all that complexity.
src/hotspot/share/opto/memnode.hpp
Outdated
| // Iterate with i over all NarrowMemProj uses calling callback | ||
| template <class Callback, class Iterator> ProjNode* apply_to_narrow_mem_projs_any_iterator(Iterator i, Callback callback) const { | ||
| auto filter = [&](ProjNode* proj) { | ||
| if (proj->is_NarrowMemProj() && callback(proj->as_NarrowMemProj())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the callback return here? Are we sure this is not an implicit zero/null check, that the hotspot style guide would not be happy with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second batch of comments.
Will continue later at class NarrowMemProjNode.
| return MultiNode::CONTINUE; | ||
| }; | ||
| init->apply_to_projs(move_proj, TypeFunc::Memory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "for each" using callback with void return would create a little less noise.
src/hotspot/share/opto/macro.cpp
Outdated
| // See comment below in this if's other branch that explains why a raw memory MemBar is good enough. If init is | ||
| // null, this allocation does have an InitializeNode but this logic can't locate it (see comment in | ||
| // PhaseMacroExpand::initialize_object()). | ||
| MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be nicer to have the explanation here than below, and refer from below to here? Would help the reading flow ;)
src/hotspot/share/opto/macro.cpp
Outdated
| return MultiNode::CONTINUE; | ||
| }; | ||
| init->apply_to_projs(find_raw_mem, TypeFunc::Memory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "for each" with void return callback would reduce noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatives:
init->unique_raw_mem_proj()init->unique_out(raw_mem_proj_predicate)with boolean predicate.
Both of those would assert if we find multiple or none.
src/hotspot/share/opto/macro.cpp
Outdated
| transform_later(ctrl); | ||
| Node* mem = new ProjNode(init, TypeFunc::Memory); | ||
| transform_later(mem); | ||
| Node* existing_raw_mem_proj = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny suggestion:
existing_raw_mem_proj -> old_raw_mem_proj, to emphasize that it is old, and will be replaced.
| DUIterator_Fast imax, i = fast_outs(imax); | ||
| auto replace_proj = [&](ProjNode* proj) { | ||
| igvn->replace_node(proj, mem); | ||
| --i; --imax; | ||
| return CONTINUE; | ||
| }; | ||
| apply_to_projs(imax, i, replace_proj, TypeFunc::Memory); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouff, it's a little sad that we modify the iterator both outside and inside the method call. But I don't have a better solution now either. It may just be what we have to do.
src/hotspot/share/opto/multnode.cpp
Outdated
| auto find_proj = [&](ProjNode* proj) { | ||
| assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || proj->Opcode() == (which_proj ? Op_IfTrue : Op_IfFalse), | ||
| "incorrect projection node at If/RangeCheck: IfTrue on false path or IfFalse on true path"); | ||
| return BREAK_AND_RETURN_CURRENT_PROJ; | ||
| }; | ||
| return apply_to_projs(find_proj, which_proj); | ||
| } | ||
|
|
||
| ProjNode* MultiNode::proj_out_or_null(uint which_proj, bool is_io_use) const { | ||
| for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) { | ||
| ProjNode* proj = fast_out(i)->isa_Proj(); | ||
| if (proj != nullptr && (proj->_con == which_proj) && (proj->_is_io_use == is_io_use)) { | ||
| return proj; | ||
| assert(number_of_projs(which_proj, is_io_use) <= 1, "only when there's a single projection"); | ||
| auto find_proj = [](ProjNode* proj) { | ||
| return BREAK_AND_RETURN_CURRENT_PROJ; | ||
| }; | ||
| return apply_to_projs(find_proj, which_proj, is_io_use); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A find_first API with a boolean predicate could reduce some noise here.
src/hotspot/share/opto/multnode.cpp
Outdated
|
|
||
| template<class Callback> ProjNode* MultiNode::apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const { | ||
| auto filter = [&](ProjNode* proj) { | ||
| if (proj->_is_io_use == is_io_use && callback(proj)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implicit zero check?
src/hotspot/share/opto/multnode.cpp
Outdated
| auto count_projs = [&](ProjNode* proj) { | ||
| cnt++; | ||
| return CONTINUE; | ||
| }; | ||
| apply_to_projs(count_projs, which_proj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each pattern would have been nice here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last batch of comments for this round.
I'm quite happy with the solution @rwestrel , thanks for coming up with this 😊
I think really all my suggestions / requests are about the complexity / implementation around the apply_... methods. It seems to me that it makes sense to internally implement it with the apply_... functionality that take callbacks which return a ApplyToProjs. But I think many of the public API methods could have a simpler semantic, for example many could just be for_each_... calls that just take a callback with void return / no return.
| return sizeof(*this); | ||
| } | ||
| public: | ||
| NarrowMemProjNode(Node* src, const TypePtr* adr_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you feed it any other src than a InitializeNode*?
| NarrowMemProjNode(Node* src, const TypePtr* adr_type) | |
| NarrowMemProjNode(InitializeNode* src, const TypePtr* adr_type) |
| #endif | ||
| }; | ||
|
|
||
| template <class Callback> ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this not belong right after the MultiNode? Or even in multnode.cpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs the ProjNode declaration because it accesses proj->_con and can't be in multnode.cpp because of the template parameter.
src/hotspot/share/opto/multnode.hpp
Outdated
|
|
||
| template <class Callback> ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const { | ||
| auto filter = [&](ProjNode* proj) { | ||
| if (proj->_con == which_proj && callback(proj)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implicit zero check?
| auto enqueue_init_mem_projs = [&](ProjNode* proj) { | ||
| add_users_to_worklist0(proj, worklist); | ||
| return MultiNode::CONTINUE; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for_each call below would mean we would not need to return anything.
src/hotspot/share/opto/multnode.cpp
Outdated
| void NarrowMemProjNode::dump_spec(outputStream *st) const { | ||
| ProjNode::dump_spec(st); | ||
| MemNode::dump_adr_type(_adr_type, st); | ||
| } | ||
|
|
||
| void NarrowMemProjNode::dump_compact_spec(outputStream *st) const { | ||
| ProjNode::dump_compact_spec(st); | ||
| MemNode::dump_adr_type(_adr_type, st); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show us an example out put of dump? I'm just wondering if there maybe needs to be a space between the two, and if it is immediately readable :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, Node::dump already takes care of dumping the adr_type. So I removed the dump methods from NarrowMemProjNode. Here is an example output:
59 Initialize === 83 1 62 1 1 1 121 [[ 124 123 63 64 65 ]] !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)
TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)
63 NarrowMemProj === 59 [[ 15 ]] #2 Memory: @java/lang/Object *, idx=4; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)
64 NarrowMemProj === 59 [[ 15 ]] #2 Memory: @java/lang/Object+8 * [narrowklass], idx=5; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)
65 NarrowMemProj === 59 [[ 15 ]] #2 Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)
66 CheckCastPP === 125 86 [[ 79 ]] #float[int:1] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)
src/hotspot/share/opto/memnode.cpp
Outdated
| } | ||
|
|
||
|
|
||
| template<class Callback> ProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another nit: we will only ever return a NarrowMemProj, so you might as well make the return value more precise ;)
|
@eme64 I pushed a new commit which should address all your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel Thanks for the updates, it already looks better :)
I had a few minutes to look over the apply_.. solutions. I left a few comments, and hope that we can make the code just a little slicker still ;)
| // Run callback on all NarrowMem proj uses using passed iterator | ||
| template <class Callback> NarrowMemProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const { | ||
| return apply_to_narrow_mem_projs_any_iterator<Callback, UsesIterator>(UsesIterator(i, this), callback); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems that the upper two can be merged. Maybe all "overloadings" of apply_to_narrow_mem_projs can be merged, no? Or are there really multiple uses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically we could call apply_to_narrow_mem_projs_any_iterator directly from the two uses:
already_has_narrow_mem_proj_with_adr_typefor_each_narrow_mem_proj_with_new_uses
| template <class Callback> NarrowMemProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback) const { | ||
| DUIterator_Fast imax, i = fast_outs(imax); | ||
| return apply_to_narrow_mem_projs_any_iterator(UsesIteratorFast(imax, i, this), callback); | ||
| } | ||
|
|
||
|
|
||
| template<class Callback> NarrowMemProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const { | ||
| auto filter = [&](NarrowMemProjNode* proj) { | ||
| if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj()) == BREAK_AND_RETURN_CURRENT_PROJ) { | ||
| return BREAK_AND_RETURN_CURRENT_PROJ; | ||
| } | ||
| return CONTINUE; | ||
| }; | ||
| return apply_to_narrow_mem_projs(filter); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that the upper method is only used by the lower here. Why not just collapse them? It would also reduce the "overloading noise".
| // Run callback on projections with iterator passed as argument | ||
| template<class Callback> ProjNode* apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback) const { | ||
| return apply_to_projs_any_iterator<Callback, UsesIteratorFast>(UsesIteratorFast(imax, i, this), callback); | ||
| } | ||
|
|
||
| // Same but with default iterator | ||
| template<class Callback> ProjNode* apply_to_projs(Callback callback) const { | ||
| DUIterator_Fast imax, i = fast_outs(imax); | ||
| return apply_to_projs(imax, i, callback); | ||
| } | ||
|
|
||
| // Same but only for Proj node whose _con matches which_proj | ||
| template <class Callback> ProjNode* apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const; | ||
|
|
||
| // Same but with default iterator | ||
| template<class Callback> ProjNode* apply_to_projs(Callback callback, uint which_proj) const { | ||
| DUIterator_Fast imax, i = fast_outs(imax); | ||
| return apply_to_projs(imax, i, callback, which_proj); | ||
| } | ||
|
|
||
| // Same but for matching _con and _is_io_use | ||
| template <class Callback> ProjNode* apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be public? Or could they be protected, so they are only available to subtypes?
And do we really need all the variants of apply_to_projs, or could we collapse them a little?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just a lot of boilerplate, would be nice if it was a little slicker ;)
An
Initializenode for anAllocatenode is created with a memoryProjof adr type raw memory. In order for stores to be captured, thememory state out of the allocation is a
MergeMemwith slices for thevarious object fields/array element set to the raw memory
Projofthe
Initializenode. IfPhis need to be created during latertransformations from this memory state, The
Phifor a particularslice gets its adr type from the type of the
Projwhich is rawmemory. If during macro expansion, the
Allocateis found to have nouse and so can be removed, the
Projout of theInitializeisreplaced by the memory state on input to the
Allocate. APhiforsome slice for a field of an object will end up with the raw memory
state on input to the
Allocatenode. As a result, memory state atthe
Phiis incorrect and incorrect execution can happen.The fix I propose is, rather than have a single
Projfor the memorystate out of the
Initializewith adr type raw memory, to use oneProjper slice added to the memory state after theInitalize. Eachof the
Projshould return the right adr type for its slice. For thatI propose having a new type of
Proj:NarrowMemProjthat capturesthe right adr type.
Logic for the construction of the
Allocate/Initializesubgraph istweaked so the right adr type captured in is own
NarrowMemProjisadded to the memory sugraph. Code that removes an allocation or moves
it also has to be changed so it correctly takes the multiple memory
projections out of the
Initializenode into account.One tricky issue is that when EA split types for a scalar replaceable
Allocatenode:1- the adr type captured in the
NarrowMemProjbecomes out of syncwith the type of the slices for the allocation
2- before EA, the memory state for one particular field out of the
Initializenode can be used for aStoreto the just allocatedobject or some other. So we can have a chain of
Stores, some tothe newly allocated object, some to some other objects, all of them
using the state of
NarrowMemProjout of theInitialize. Aftersplit unique types, the
NarrowMemProjis for the slice of aparticular allocation. So
Stores to some other objects shouldn'tuse that memory state but the memory state before the
Allocate.For that, I added logic to update the adr type of
NarrowMemProjduring split unique types and update the memory input of
Stores thatdon't depend on the memory state out of the allocation site.
I also wrote a verification pass to check that, in the memory graph,
nodes on a particular slice have the right adr type. That verification
pass is not included here because it uncovered issues that are
unrelated to this particular issue so I intend to propose it (with
fixes for those other issues) separately.
I reused the test cases that Emanuel included in
#18265 and went over all issues
linked to this one, tried the test cases and added the ones that I
could reproduce.
/cc hotspot-compiler
Progress
Issue
Reviewers
Contributors
<epeter@openjdk.org><rcastanedalo@openjdk.org>Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570$ git checkout pull/24570Update a local copy of the PR:
$ git checkout pull/24570$ git pull https://git.openjdk.org/jdk.git pull/24570/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24570View PR using the GUI difftool:
$ git pr show -t 24570Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24570.diff
Using Webrev
Link to Webrev Comment