Conversation

@rwestrel
Contributor

@rwestrel rwestrel commented Apr 10, 2025

An Initialize node for an Allocate node is created with a memory
Proj of adr type raw memory. In order for stores to be captured, the
memory state out of the allocation is a MergeMem with slices for the
various object fields/array elements set to the raw memory Proj of
the Initialize node. If Phis need to be created from this memory
state during later transformations, the Phi for a particular slice
gets its adr type from the type of the Proj, which is raw memory. If,
during macro expansion, the Allocate is found to have no use and so
can be removed, the Proj out of the Initialize is replaced by the
memory state on input to the Allocate. A Phi for the slice of some
object field then ends up with the raw memory state on input to the
Allocate node. As a result, the memory state at the Phi is incorrect
and incorrect execution can happen.

The fix I propose is, rather than have a single Proj with adr type
raw memory for the memory state out of the Initialize, to use one
Proj per slice added to the memory state after the Initialize. Each
of these Projs should return the right adr type for its slice. For
that I propose a new type of Proj, NarrowMemProj, that captures the
right adr type.

Logic for the construction of the Allocate/Initialize subgraph is
tweaked so the right adr type, captured in its own NarrowMemProj, is
added to the memory subgraph. Code that removes an allocation or
moves it also has to be changed so it correctly takes the multiple
memory projections out of the Initialize node into account.

One tricky issue is that when EA splits unique types for a scalar
replaceable Allocate node:

1- the adr type captured in the NarrowMemProj becomes out of sync
with the type of the slices for the allocation

2- before EA, the memory state for one particular field out of the
Initialize node can be used for a Store to the just allocated
object or to some other object. So we can have a chain of Stores,
some to the newly allocated object, some to other objects, all of
them using the state of the NarrowMemProj out of the Initialize.
After split unique types, the NarrowMemProj is for the slice of a
particular allocation. So Stores to other objects shouldn't use
that memory state but the memory state before the Allocate.

For that, I added logic to update the adr type of the NarrowMemProj
during split unique types and to update the memory input of Stores
that don't depend on the memory state out of the allocation site.

I also wrote a verification pass to check that, in the memory graph,
nodes on a particular slice have the right adr type. That verification
pass is not included here because it uncovered issues that are
unrelated to this particular issue so I intend to propose it (with
fixes for those other issues) separately.

I reused the test cases that Emanuel included in
#18265 and went over all issues
linked to this one, tried the test cases and added the ones that I
could reproduce.

/cc hotspot-compiler


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed (Bug - P3)

Reviewers

Contributors

  • Emanuel Peter <epeter@openjdk.org>
  • Roberto Castañeda Lozano <rcastanedalo@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570
$ git checkout pull/24570

Update a local copy of the PR:
$ git checkout pull/24570
$ git pull https://git.openjdk.org/jdk.git pull/24570/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24570

View PR using the GUI difftool:
$ git pr show -t 24570

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24570.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 10, 2025

👋 Welcome back roland! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 10, 2025

@rwestrel This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 10, 2025
@openjdk

openjdk bot commented Apr 10, 2025

@rwestrel
The hotspot-compiler label was successfully added.

@rwestrel
Contributor Author

/contributor add @eme64

@openjdk

openjdk bot commented Apr 10, 2025

@rwestrel
Contributor Emanuel Peter <epeter@openjdk.org> successfully added.

@rwestrel rwestrel marked this pull request as ready for review April 10, 2025 11:48
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 10, 2025
@mlbridge

mlbridge bot commented Apr 10, 2025

@merykitty
Member

It would be great if we have union memory slices for this.

@rwestrel
Contributor Author

It would be great if we have union memory slices for this.

Something like that would fix it but it would be trickier to get right than this point fix, I think. Do you see any other use for it?

@merykitty
Member

@rwestrel There are other places in C2 when the memory slices are not expressive enough, which leads us to wrapping those accesses in CPUMemBar because otherwise the memory edges would be incorrect. This would become more prominent in Valhalla as atomic flat accesses try to write multiple fields at once.

https://github.com/openjdk/valhalla/blob/6b476fa7d67bf80591ed86d291b167d3d5ee5c28/src/hotspot/share/opto/inlinetypenode.cpp#L855

Contributor

@eme64 eme64 left a comment

@rwestrel Thanks for looking into that. I vaguely remember working on #18265 and being quite confused about the way we treat memory slices 😅

I think this seems like a good step in the right direction. Though I am a little unsure about the effect on proj_out_or_null, as you can see below. Maybe we can have a simple query that just checks has_proj(TypeFunc::Memory)? But there are also some cases where you actually use the projection, and I'm not sure if that means you might be missing some if there are multiple?

Like @merykitty I wonder if there are other cases where we have similar issues with memory slices. But I guess we should tackle those separately.

_adr_type = adr_type;
}
virtual int Opcode() const;
};
Contributor

Would it make sense to have these overridden? Just so you can print the _adr_type :)

#ifndef PRODUCT
  virtual void dump_spec(outputStream *st) const;
  virtual void dump_compact_spec(outputStream *st) const;
#endif

Contributor

Ah, or does it already get printed from the adr_type() i.e. the virtual method?

Contributor

Hmm, no I don't think so... right?

Contributor Author

Done in new commit.

}
result = result->in(MemNode::Memory);
}
if (!is_instance && result->Opcode() == Op_NarrowMemProj) {
Contributor

Seems you are checking for NarrowMemProj and casting to it in multiple places. Why not enable the macro to do is_... and as_...?

Contributor Author

Done in new commit.

if (!is_instance && result->Opcode() == Op_NarrowMemProj) {
// Memory for non known instance can safely skip over a known instance allocation (that memory state doesn't access
// the result of an allocation for a known instance).
assert(result->as_Proj()->_con == TypeFunc::Memory, "a NarrowMemProj can only be a memory projection");
Contributor

Can we verify that already in the NarrowMemProj, i.e. its constructor?

Contributor Author

TypeFunc::Memory is now hardcoded in the NarrowMemProj constructor.

Comment on lines 5454 to 5471
void InitializeNode::replace_mem_projs_by(Node* mem, Compile* C) {
for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) {
ProjNode* proj = fast_out(i)->as_Proj();
if (proj->_con == TypeFunc::Memory) {
C->gvn_replace_by(proj, mem);
}
}
}

void InitializeNode::replace_mem_projs_by(Node* mem, PhaseIterGVN* igvn) {
for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) {
ProjNode* proj = fast_out(i)->as_Proj();
if (proj->_con == TypeFunc::Memory) {
igvn->replace_node(proj, mem);
--i; --imax;
}
}
}
Contributor

Can you please add a comment to the code why we need both variants?

Comment on lines +34 to +39
* @run main/othervm -Xcomp -XX:-TieredCompilation
* -XX:CompileCommand=compileonly,compiler.macronodes.TestEliminationOfAllocationWithoutUse::test*
* compiler.macronodes.TestEliminationOfAllocationWithoutUse
* @run main/othervm -Xcomp
* -XX:CompileCommand=compileonly,compiler.macronodes.TestEliminationOfAllocationWithoutUse::test*
* compiler.macronodes.TestEliminationOfAllocationWithoutUse
Contributor

Would a run without Xcomp make sense?

Contributor Author

Some test methods run a loop for only a few iterations, not sure a run without -Xcomp makes much sense.

@eme64
Copy link
Contributor

eme64 commented Apr 23, 2025

@rwestrel You should update the bug title, it just sounds too generic. It is really helpful when the "blame" history gives you a helpful comment rather than Umbrella or incorrect result or some bug numbers.

I suggest Use NarrowMemProj to project / split memory slice after Initialize, but you probably have an even better idea ;)

rwestrel and others added 8 commits April 24, 2025 11:15
…tionWithoutUse.java

Co-authored-by: Andrey Turbanov <turbanoff@gmail.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
…pturing.java

Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
…tionWithoutUse.java

Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>

void NarrowMemProjNode::dump_spec(outputStream *st) const {
ProjNode::dump_spec(st);
dump_adr_type(st);
Contributor

Do we need to define a special version of NarrowMemProjNode::dump_adr_type or could we just have the same effect calling MemNode::dump_adr_type(this, _adr_type, st) here?


void NarrowMemProjNode::dump_compact_spec(outputStream *st) const {
ProjNode::dump_compact_spec(st);
dump_adr_type(st);
Contributor

Same here.

Contributor

Please add a package declaration (and make the corresponding class names fully qualified in the @run directives).

Comment on lines 52 to 53
analyzer.shouldContain("++++ Eliminated: 51 Allocate");
analyzer.shouldContain("++++ Eliminated: 84 Allocate");
Contributor

Did you analyze why there are more allocations removed than before in this test case? I did not expect this changeset to have an effect on the number of removed allocations.

Contributor Author

There are not more allocations removed. The message is confusing.
"Eliminated: 84 Allocate" logs that node number 84 was eliminated (and not 84 nodes).
This patch changes the number of nodes required at allocations so it also has an impact on node numbering.

Contributor

I see, thanks. Expecting specific C2 node identifiers seems fragile. I understand it is a pre-existing issue, but since this changeset needs to address it anyway, please consider making it more robust by e.g. using regular expression matching. Here is a suggestion, feel free to incorporate it: 9fd6378. The ultimate improvement would be using the IR test framework, but that is out of scope here.

Comment on lines 563 to 567
// Memory Phi - most recent unique Phi split out
// from this Phi
// MemNode - new memory input for this node
// ChecCastPP - allocation that this is a cast of
// CheckCastPP - allocation that this is a cast of
// allocation - CheckCastPP of the allocation
Contributor

Please add a new entry here explaining how _node_map is used for NarrowMemProjNode nodes.

rwestrel and others added 8 commits September 22, 2025 13:53
…tionWithoutUse.java

Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
Co-authored-by: Roberto Castañeda Lozano <robcasloz@users.noreply.github.com>
@rwestrel
Contributor Author

@robcasloz thanks for the review. New commit addresses most of your comments.

robcasloz added a commit to robcasloz/jdk that referenced this pull request Sep 24, 2025
Contributor

@robcasloz robcasloz left a comment

Thanks for addressing my comments, Roland. I have a couple of follow-up questions. I also realized that we need to adjust IGV's custom logic to schedule the new projection nodes more accurately and combine them into their parent nodes when using the "Condense graph" filter. Please consider incorporating the following patch into this changeset: 63a536a.

// elimination. Simply add the MemBarStoreStore after object
// initialization.
MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxBot);
MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw);
Contributor

Thanks for adding the comment. A follow-up question: the full comment below makes the argument that re-ordering by the compiler can't happen by construction because a later Store that publishes the just allocated object reference is indirectly control dependent on the Initialize node. However, in this case, there may be no such Initialize node (init == nullptr || init->req() < InitializeNode::RawStores). I assume the memory barrier relaxation is still OK in this scenario because we cannot have later, publishing stores of the allocated object reference? That is, if there exists such a store then there must necessarily exist an Initialize node? Or is there any other reason I am missing? It would be good to clarify this point in the comment.

@rwestrel
Contributor Author

/contributor add @robcasloz

@rwestrel
Contributor Author

@robcasloz Thanks for the patches. I added them.

@openjdk

openjdk bot commented Sep 29, 2025

@rwestrel
Contributor Roberto Castañeda Lozano <rcastanedalo@openjdk.org> successfully added.

Contributor

@robcasloz robcasloz left a comment

Thanks for addressing all comments and questions, Roland. Looks good!

// elimination. Simply add the MemBarStoreStore after object
// initialization.
MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxBot);
MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw);
Contributor

Thanks for the clarification and the pointer to the comment in initialize_object, that helps.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 29, 2025
@robcasloz
Contributor

Hi @rwestrel, could you please have a look at the merge conflicts of this PR so that we can progress further with the review of this work?

@robcasloz
Contributor

Hi @rwestrel, could you please have a look at the merge conflicts of this PR so that we can progress further with the review of this work?

The conflict is caused by the integration of JDK-8360031, which relaxes the assertion in

assert(outcnt() > 0 && outcnt() <= 2, "Only one or two out edges allowed");
which is also touched by this changeset. Is the current assertion in mainline (after JDK-8360031) still valid in the context of this changeset?

@robcasloz
Contributor

Is the current assertion in mainline (after JDK-8360031) still valid in the context of this changeset?

I did a bit of testing and updating the asserted invariant to (outcnt() > 0 && outcnt() <= 2) || Opcode() == Op_Initialize seems to work.

Contributor

@eme64 eme64 left a comment

@rwestrel Thanks for your continued work on this, and your patience with the slow reviews 🙈

I stumbled a bit over the many overloads of apply_to_narrow_mem_projs, and left some comments for that below.

There is really only one use of already_has_narrow_mem_proj_with_adr_type and one of apply_to_narrow_mem_projs, yet a whole lot of methods implementing abstractions that confuse me. Is there a plan behind this, or is it just an artefact of many refactorings?

Could we not just implement those 2 methods directly using apply_to_projs_any_iterator?

I'll continue reviewing other parts now.

}
return MultiNode::CONTINUE;
};
init->apply_to_narrow_mem_projs(i, process_narrow_proj);
Contributor

Is there a reason why this is not a init->for_each_narrow_mem_proj(callback), that has an internal iterator?

Because with this API, I'm wondering: What would happen if I feed apply_to_narrow_mem_projs an iterator that does not belong to the init?

Contributor

If it was a plain for_each_narrow_mem_proj, your callback would also not have to return anything, you could remove this line: return MultiNode::CONTINUE;.

Comment on lines 1426 to 1428
template <class Callback> ProjNode* apply_to_narrow_mem_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback) const {
return apply_to_narrow_mem_projs_any_iterator<Callback, UsesIteratorFast>(UsesIteratorFast(imax, i, this), callback);
}
Contributor

Is there any use for this method? I could not find any.

Comment on lines 1422 to 1424
template <class Callback> ProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const {
return apply_to_narrow_mem_projs_any_iterator<Callback, UsesIterator>(UsesIterator(i, this), callback);
}
Contributor

Do we really need to have a public API where we can pass in an iterator? What if someone uses it with an iterator for the wrong node?

The only use for it seems to be ConnectionGraph::split_unique_types with DUIterator.

Is there a reason you want to pass the iterator explicitly?

Contributor

Also: why does it return anything? What kind of callback is expected here? I think the public API deserves a bit of documentation.

Contributor

The use in ConnectionGraph::split_unique_types really does not need a return value, and also could work fine with a callback that return void.

Comment on lines 4456 to 4457
bool already_has_narrow_mem_proj_with_adr_type = init->already_has_narrow_mem_proj_with_adr_type(new_adr_type);
if (adr_type != new_adr_type && !already_has_narrow_mem_proj_with_adr_type) {
Contributor

Nit: we could avoid the iteration inside already_has_narrow_mem_proj_with_adr_type if we did the adr_type != new_adr_type first. Feel free to ignore if you think this is a micro-optimization ;)

Comment on lines 5476 to 5491
template<class Callback> ProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const {
auto filter = [&](NarrowMemProjNode* proj) {
if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj())) {
return BREAK_AND_RETURN_CURRENT_PROJ;
}
return CONTINUE;
};
return apply_to_narrow_mem_projs(filter);
}

bool InitializeNode::already_has_narrow_mem_proj_with_adr_type(const TypePtr* adr_type) const {
auto find_proj = [](ProjNode* proj) {
return BREAK_AND_RETURN_CURRENT_PROJ;
};
return apply_to_narrow_mem_projs(find_proj, adr_type) != nullptr;
}
Contributor

Am I seeing this right: already_has_narrow_mem_proj_with_adr_type calls apply_to_narrow_mem_projs with callback = find_proj. find_proj returns BREAK_AND_RETURN_CURRENT_PROJ, which is an element from ApplyToProjs. That would mean that when we call the callback, we get an enum element, and not a boolean, right?

If that is the case, you should probably not do an implicit comparison on this line:
if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj())) {
Hotspot style guide does not like implicit conversion. You should use an explicit comparison.

I think it would also be more clear what is happening. Currently, I'm a bit confused.

All the overloadings of apply_to_narrow_mem_projs make it a bit hard to see what goes where :/
I wonder if we really need all that complexity.

// Iterate with i over all NarrowMemProj uses calling callback
template <class Callback, class Iterator> ProjNode* apply_to_narrow_mem_projs_any_iterator(Iterator i, Callback callback) const {
auto filter = [&](ProjNode* proj) {
if (proj->is_NarrowMemProj() && callback(proj->as_NarrowMemProj())) {
Contributor

What does the callback return here? Are we sure this is not an implicit zero/null check, that the hotspot style guide would not be happy with?

Contributor

@eme64 eme64 left a comment

Second batch of comments.

Will continue later at class NarrowMemProjNode.

Comment on lines 5610 to 5612
return MultiNode::CONTINUE;
};
init->apply_to_projs(move_proj, TypeFunc::Memory);
Contributor

A "for each" using callback with void return would create a little less noise.

Comment on lines 1606 to 1609
// See comment below in this if's other branch that explains why a raw memory MemBar is good enough. If init is
// null, this allocation does have an InitializeNode but this logic can't locate it (see comment in
// PhaseMacroExpand::initialize_object()).
MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw);
Contributor

Would it not be nicer to have the explanation here than below, and refer from below to here? Would help the reading flow ;)

Comment on lines 1644 to 1646
return MultiNode::CONTINUE;
};
init->apply_to_projs(find_raw_mem, TypeFunc::Memory);
Contributor

A "for each" with void return callback would reduce noise.

Contributor

Alternatives:

  • init->unique_raw_mem_proj()
  • init->unique_out(raw_mem_proj_predicate) with boolean predicate.
    Both of those would assert if we find multiple or none.

transform_later(ctrl);
Node* mem = new ProjNode(init, TypeFunc::Memory);
transform_later(mem);
Node* existing_raw_mem_proj = nullptr;
Contributor

Tiny suggestion:
existing_raw_mem_proj -> old_raw_mem_proj, to emphasize that it is old, and will be replaced.

Comment on lines +5461 to +5468
DUIterator_Fast imax, i = fast_outs(imax);
auto replace_proj = [&](ProjNode* proj) {
igvn->replace_node(proj, mem);
--i; --imax;
return CONTINUE;
};
apply_to_projs(imax, i, replace_proj, TypeFunc::Memory);
}
Contributor

Ouff, it's a little sad that we modify the iterator both outside and inside the method call. But I don't have a better solution now either. It may just be what we have to do.

Comment on lines 49 to 63
auto find_proj = [&](ProjNode* proj) {
assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || proj->Opcode() == (which_proj ? Op_IfTrue : Op_IfFalse),
"incorrect projection node at If/RangeCheck: IfTrue on false path or IfFalse on true path");
return BREAK_AND_RETURN_CURRENT_PROJ;
};
return apply_to_projs(find_proj, which_proj);
}

ProjNode* MultiNode::proj_out_or_null(uint which_proj, bool is_io_use) const {
for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) {
ProjNode* proj = fast_out(i)->isa_Proj();
if (proj != nullptr && (proj->_con == which_proj) && (proj->_is_io_use == is_io_use)) {
return proj;
assert(number_of_projs(which_proj, is_io_use) <= 1, "only when there's a single projection");
auto find_proj = [](ProjNode* proj) {
return BREAK_AND_RETURN_CURRENT_PROJ;
};
return apply_to_projs(find_proj, which_proj, is_io_use);
}
Contributor

A find_first API with a boolean predicate could reduce some noise here.


template<class Callback> ProjNode* MultiNode::apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const {
auto filter = [&](ProjNode* proj) {
if (proj->_is_io_use == is_io_use && callback(proj)) {
Contributor

implicit zero check?

Comment on lines 77 to 81
auto count_projs = [&](ProjNode* proj) {
cnt++;
return CONTINUE;
};
apply_to_projs(count_projs, which_proj);
Contributor

for each pattern would have been nice here.

Contributor

@eme64 eme64 left a comment

Last batch of comments for this round.

I'm quite happy with the solution @rwestrel , thanks for coming up with this 😊

I think really all my suggestions / requests are about the complexity / implementation around the apply_... methods. It seems to me that it makes sense to internally implement it with the apply_... functionality that take callbacks which return a ApplyToProjs. But I think many of the public API methods could have a simpler semantic, for example many could just be for_each_... calls that just take a callback with void return / no return.

return sizeof(*this);
}
public:
NarrowMemProjNode(Node* src, const TypePtr* adr_type)
Contributor

Can you feed it any other src than an InitializeNode*?

Suggested change
NarrowMemProjNode(Node* src, const TypePtr* adr_type)
NarrowMemProjNode(InitializeNode* src, const TypePtr* adr_type)

#endif
};

template <class Callback> ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const {
Contributor

Does this not belong right after the MultiNode? Or even in multnode.cpp?

Contributor Author

It needs the ProjNode declaration because it accesses proj->_con and can't be in multnode.cpp because of the template parameter.


template <class Callback> ProjNode* MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const {
auto filter = [&](ProjNode* proj) {
if (proj->_con == which_proj && callback(proj)) {
Contributor

Implicit zero check?

Comment on lines 2595 to 2598
auto enqueue_init_mem_projs = [&](ProjNode* proj) {
add_users_to_worklist0(proj, worklist);
return MultiNode::CONTINUE;
};
Contributor

A for_each call below would mean we would not need to return anything.

Comment on lines 265 to 273
void NarrowMemProjNode::dump_spec(outputStream *st) const {
  ProjNode::dump_spec(st);
  MemNode::dump_adr_type(_adr_type, st);
}

void NarrowMemProjNode::dump_compact_spec(outputStream *st) const {
  ProjNode::dump_compact_spec(st);
  MemNode::dump_adr_type(_adr_type, st);
}
Contributor:

Can you show us an example output of dump? I'm just wondering whether there needs to be a space between the two, and whether it is immediately readable :)

Contributor Author:

Actually, Node::dump already takes care of dumping the adr_type. So I removed the dump methods from NarrowMemProjNode. Here is an example output:

 59  Initialize  === 83 1 62 1 1 1 121  [[ 124 123 63 64 65 ]]  !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)                                                                                                                                     
TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)                                    
  63  NarrowMemProj  === 59  [[ 15 ]] #2  Memory: @java/lang/Object *, idx=4; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)                                                                                                                        
  64  NarrowMemProj  === 59  [[ 15 ]] #2  Memory: @java/lang/Object+8 * [narrowklass], idx=5; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)                                                                                                        
  65  NarrowMemProj  === 59  [[ 15 ]] #2  Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=6; !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57)                                                                     
  66  CheckCastPP  === 125 86  [[ 79 ]]  #float[int:1] (java/lang/Cloneable,java/io/Serializable):NotNull:exact * !jvms: TestInitializingStoreCapturing::testInitializeArray @ bci:1 (line 57) 

}


template<class Callback> ProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const {
Contributor:

Another nit: we will only ever return a NarrowMemProj, so you might as well make the return value more precise ;)

Contributor Author (@rwestrel):

@eme64 I pushed a new commit which should address all your comments.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 28, 2025
Contributor @eme64 left a comment:

@rwestrel Thanks for the updates, it already looks better :)

I had a few minutes to look over the apply_.. solutions. I left a few comments, and hope that we can make the code just a little slicker still ;)

Comment on lines +1425 to +1428
// Run callback on all NarrowMem proj uses using passed iterator
template <class Callback> NarrowMemProjNode* apply_to_narrow_mem_projs(DUIterator& i, Callback callback) const {
  return apply_to_narrow_mem_projs_any_iterator<Callback, UsesIterator>(UsesIterator(i, this), callback);
}
Contributor:

Is this one still needed?

Contributor:

It also seems that the upper two can be merged. Maybe all overloads of apply_to_narrow_mem_projs can be merged, no? Or are there really multiple uses?

Contributor:

Basically we could call apply_to_narrow_mem_projs_any_iterator directly from the two uses:

  • already_has_narrow_mem_proj_with_adr_type
  • for_each_narrow_mem_proj_with_new_uses

Comment on lines +5470 to +5484
template <class Callback> NarrowMemProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback) const {
  DUIterator_Fast imax, i = fast_outs(imax);
  return apply_to_narrow_mem_projs_any_iterator(UsesIteratorFast(imax, i, this), callback);
}

template<class Callback> NarrowMemProjNode* InitializeNode::apply_to_narrow_mem_projs(Callback callback, const TypePtr* adr_type) const {
  auto filter = [&](NarrowMemProjNode* proj) {
    if (proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj()) == BREAK_AND_RETURN_CURRENT_PROJ) {
      return BREAK_AND_RETURN_CURRENT_PROJ;
    }
    return CONTINUE;
  };
  return apply_to_narrow_mem_projs(filter);
}
Contributor:

It seems to me that the upper method is only used by the lower here. Why not just collapse them? It would also reduce the "overloading noise".

Comment on lines +120 to +141
// Run callback on projections with iterator passed as argument
template<class Callback> ProjNode* apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback) const {
  return apply_to_projs_any_iterator<Callback, UsesIteratorFast>(UsesIteratorFast(imax, i, this), callback);
}

// Same but with default iterator
template<class Callback> ProjNode* apply_to_projs(Callback callback) const {
  DUIterator_Fast imax, i = fast_outs(imax);
  return apply_to_projs(imax, i, callback);
}

// Same but only for Proj nodes whose _con matches which_proj
template <class Callback> ProjNode* apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj) const;

// Same but with default iterator
template<class Callback> ProjNode* apply_to_projs(Callback callback, uint which_proj) const {
  DUIterator_Fast imax, i = fast_outs(imax);
  return apply_to_projs(imax, i, callback, which_proj);
}

// Same but for matching _con and _is_io_use
template <class Callback> ProjNode* apply_to_projs(Callback callback, uint which_proj, bool is_io_use) const;
Contributor:

Do these need to be public? Or could they be protected, so they are only available to subtypes?

And do we really need all the variants of apply_to_projs, or could we collapse them a little?

Contributor:

It is just a lot of boilerplate, would be nice if it was a little slicker ;)


Labels

graal graal-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review


8 participants