
8373591: C2: Fix the memory around some intrinsics nodes#28789

Open
merykitty wants to merge 9 commits into openjdk:master from merykitty:intrinsicsadrtype

@merykitty
Member

@merykitty merykitty commented Dec 12, 2025

Hi,

This is extracted from #28570. There are 2 issues here:

  • Some intrinsic nodes advertise an incorrect adr_type. For example, AryEqNode reports its adr_type as TypeAryPtr::BYTES (inherited from StrIntrinsicNode). This is incorrect, however, as it can accept char[] inputs, too. Another case is VectorizedHashCodeNode, which reports its adr_type as TypePtr::BOTTOM, but it actually extracts a single memory slice and does not consume the whole memory state.
  • For nodes such as StrInflatedCopyNode, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage.

Testing:

  • tier1-4,hs-precheckin-comp,hs-comp-stress

Please kindly review, thanks a lot.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8373591: C2: Fix the memory around some intrinsics nodes (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789
$ git checkout pull/28789

Update a local copy of the PR:
$ git checkout pull/28789
$ git pull https://git.openjdk.org/jdk.git pull/28789/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28789

View PR using the GUI difftool:
$ git pr show -t 28789

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28789.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Dec 12, 2025

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@merykitty merykitty changed the title from "8373591: C2: FIx the memory around some intrinsics nodes" to "8373591: C2: Fix the memory around some intrinsics nodes" Dec 12, 2025
@merykitty
Member Author

@eme64 I have extracted the fix for the memory around intrinsic nodes from the other PR into this PR and added a unit test for the potential issue.

@openjdk

openjdk bot commented Dec 12, 2025

@merykitty This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 12, 2025
@openjdk

openjdk bot commented Dec 12, 2025

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 12, 2025
@mlbridge

mlbridge bot commented Dec 12, 2025

Webrevs

Node* res_mem = _gvn.transform(new SCMemProjNode(_gvn.transform(str)));
set_memory(res_mem, TypeAryPtr::BYTES);
if (adr_type == TypePtr::BOTTOM) {
  set_all_memory(res_mem);
Contributor

I'm confused by this. Doesn't StrCompressedCopyNode only write to dst? So the only part of the memory state that it updates is the one for TypeAryPtr::BYTES?

Member Author

@merykitty merykitty Dec 12, 2025

It is because if a node consumes more memory than it produces, we need to compute its anti-dependencies. And since we do not compute anti-dependencies of these nodes, it is safer to make them kill all the memory they consume. What do you think?

Contributor

Could this be fixed by appending a MemBarCPUOrderNode on the slice of src?

Member Author

That's a really great idea! I have implemented it.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Dec 12, 2025
@openjdk

openjdk bot commented Dec 12, 2025

@merykitty hotspot, shenandoah have been added to this pull request based on files touched in new commit(s).

// dependency:
// StoreC -> MemBar -> MergeMem -> compress_string -> MergeMem -> CharMem
// -------------------------------->
Node* all_mem = reset_memory();
Contributor

@rwestrel rwestrel Dec 16, 2025

This code sequence is used several times. Would it make sense to factor it out in its own method?

Member Author

Done!

@merykitty
Member Author

/label remove shenandoah hotspot

@openjdk openjdk bot removed shenandoah shenandoah-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Dec 16, 2025
@openjdk

openjdk bot commented Dec 16, 2025

@merykitty
The shenandoah label was successfully removed.

The hotspot label was successfully removed.

Contributor

@rwestrel rwestrel left a comment

Looks good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 17, 2025
@openjdk

openjdk bot commented Jan 7, 2026

@merykitty this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout intrinsicsadrtype
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Jan 7, 2026
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jan 7, 2026
@dean-long
Member

So after looking at this PR I have learned that C2 can control reordering of memory operations in at least 3 ways: anti-dependencies, memory slices, or membars. Are there any rules of thumb on which is best to use? Using a membar seems the most conservative but probably allows fewer optimizations.

By the way, I see that LibraryCallKit::inline_encodeISOArray and the corresponding Java method do pretty much the same thing as compress. So I tried adding a test for it in TestAntiDependency.java. But to my surprise, it passes, even without the fixes in this PR. I would expect it to fail, because the existing code uses TypeAryPtr::BYTES, so how does it prevent the movement of a char[] store in the test?

@dean-long
Member

Dumb question: why are these intrinsic nodes not implemented as MemNodes?

> For nodes such as StrInflatedCopyNode, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case

Why is that?

> so we should fix it by making the nodes kill all the memory they consume.

Why can't we use MergeMem and memory slices/aliases like regular load and store?

@dean-long
Member

This may be unrelated, but I checked to see if we treat Op_EncodeISOArray the same as Op_StrCompressedCopy everywhere. In two places in ConnectionGraph::split_unique_types, we treat them differently. For both we look at in(MemNode::Memory), but for Op_EncodeISOArray we also look at use->in(3). I don't understand this code well enough to decide if this is a missing optimization or a correctness issue.

@merykitty
Member Author

@dean-long Thanks for taking a look.

> So I tried adding a test for it in TestAntiDependency.java. But to my surprise, it passes, even without the fixes in this PR

I have added a test for this method. If it does not fail, adding -XX:+StressGCM -XX:+StressLCM may help.

> Dumb question: why are these intrinsic nodes not implemented as MemNodes?

I think it is because only LoadNode and StoreNode are MemNodes; even LoadStoreNode does not extend MemNode.

> For nodes such as StrInflatedCopyNode, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case
>
> Why is that?

During PhaseIdealLoop::get_late_ctrl, we only check the anti-dependency when a node returns true for is_Load():

  if (n->is_Load() && LCA != early) {
    LCA = get_late_ctrl_with_anti_dep(n->as_Load(), early, LCA);
  }

During PhaseCFG::schedule_late, we only check the anti-dependency when a node has the flag Flag_needs_anti_dependence_check set.

bool Node::needs_anti_dependence_check() const {
  if (req() < 2 || (_flags & Flag_needs_anti_dependence_check) == 0) {
    return false;
  }
  return in(1)->bottom_type()->has_memory();
}

We could fix these places, but since it is really rare for a node to consume some memory and produce memory on a different slice, it is more reasonable to fix the graph at these nodes.

> so we should fix it by making the nodes kill all the memory they consume.
>
> Why can't we use MergeMem and memory slices/aliases like regular load and store?

Thanks to Roland's suggestion, it now kills only the 2 slices it is concerned with, not the whole memory state.

@merykitty
Member Author

> This may be unrelated, but I checked to see if we treat Op_EncodeISOArray the same as Op_StrCompressedCopy everywhere. In two places in ConnectionGraph::split_unique_types, we treat them differently. For both we look at in(MemNode::Memory), but for Op_EncodeISOArray we also look at use->in(3). I don't understand this code well enough to decide if this is a missing optimization or a correctness issue.

I believe it is because, before this change, EncodeISOArray did not consume the memory of its destination the way StrCompressedCopy does, so it could miss being pushed onto the worklist. Checking in(3) ensures the node is visited anyway. After this change, EncodeISOArray correctly consumes the memory of its destination, so that check becomes unnecessary.

@dean-long
Member

I'm still looking at this, but I'm getting confused by all the special cases. I wish C2 handling of memory was more uniform.

@dean-long
Member

It looks like we use SCMemProjNode for nodes that both read and write memory, so why is it not used for StrInflatedCopyNode in inflate_string?

@merykitty
Member Author

@dean-long SCMemProj is used for nodes that modify memory but still want to return a value. StrInflatedCopyNode does not return a value, so it does not need an SCMemProj; it can be the memory node itself.


Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review


3 participants