Conversation

@franferrax
Contributor

@franferrax franferrax commented Aug 6, 2025

Hi, this pull request is a second take on 1383fec: it updates the CmpUNode type to either TypeInt::CC_LE (case 1a) or TypeInt::CC_LT (case 1b) instead of updating the BoolNode type to TypeInt::ONE.

With this approach a56cd37 becomes unnecessary. Additionally, having the right type in CmpUNode could potentially enable further optimizations.

Testing

In order to evaluate the changes, the following testing has been performed:


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26666/head:pull/26666
$ git checkout pull/26666

Update a local copy of the PR:
$ git checkout pull/26666
$ git pull https://git.openjdk.org/jdk.git pull/26666/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26666

View PR using the GUI difftool:
$ git pr show -t 26666

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26666.diff

Using Webrev

Link to Webrev Comment

This partially reverts commit a56cd37,
which will no longer be needed after this change.
@bridgekeeper

bridgekeeper bot commented Aug 6, 2025

👋 Welcome back fferrari! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 6, 2025

@franferrax This change is no longer ready for integration - check the PR body for details.

@openjdk

openjdk bot commented Aug 6, 2025

@franferrax The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Aug 6, 2025
@franferrax franferrax marked this pull request as ready for review August 7, 2025 10:39
@franferrax
Contributor Author

@rwestrel / @TobiHartmann / @chhagedorn: this is my first contribution in C2 besides OJVG reviews and backports, please let me know if I should be testing something else.

@tabjy: as the original 1383fec author, I would greatly appreciate an additional review from you.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 7, 2025
@mlbridge

mlbridge bot commented Aug 7, 2025

Webrevs

Member

@chhagedorn chhagedorn left a comment

Thanks for improving this! I have some small suggestions, otherwise, it looks good to me!

Additionally, having the right type in CmpUNode could potentially enable further optimizations.

Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test.

I'll also give this a spin in our testing.

//
// (1a) and (1b) is covered by this method since we can directly return the corresponding TypeInt::CC_*
// while (2) is covered in BoolNode::Ideal since we create a new non-constant node (see [CMPU_MASK]).
const Type* CmpUNode::Value_cmpu_and_mask(PhaseValues* phase, const Node* in1, const Node* in2) {
Member

I suggest naming these directly:
in1 -> andI
in2 -> rhs

Then it's easier to follow the comments.

Contributor Author

Great, I was going to use similar names and later regretted not doing so. Suggestion accepted in 27ed1a3.

}

return _test.cc2logical(input_type);
return _test.cc2logical( phase->type( in(1) ) );
Member

Suggested change
- return _test.cc2logical( phase->type( in(1) ) );
+ return _test.cc2logical(phase->type(in(1)));

Contributor Author

Suggestion accepted in 27ed1a3.

// |
// Bool
//
void PhaseCCP::push_bool_with_cmpu_and_mask(Unique_Node_List& worklist, const Node* use) const {
Member

Needed to double-check but I think it's fine to remove the notification code since we already have push_cmpu() in place which looks through the AddI:

// CmpU nodes can get their type information from two nodes up in the graph (instead of from the nodes immediately
// above). Make sure they are added to the worklist if nodes they depend on are updated since they could be missed
// and get wrong types otherwise.
void PhaseCCP::push_cmpu(Unique_Node_List& worklist, const Node* use) const {
uint use_op = use->Opcode();
if (use_op == Op_AddI || use_op == Op_SubI) {
for (DUIterator_Fast imax, i = use->fast_outs(imax); i < imax; i++) {
Node* cmpu = use->fast_out(i);
const uint cmpu_opcode = cmpu->Opcode();
if (cmpu_opcode == Op_CmpU || cmpu_opcode == Op_CmpU3) {
// Got a CmpU or CmpU3 which might need the new type information from node n.
push_if_not_bottom_type(worklist, cmpu);
}
}
}
}

So, whenever m or 1 changes, we will re-add the CmpU to the CCP worklist with push_cmpu(). The x does not matter for the application of Value_cmpu_and_mask().

Contributor Author

Hmm, I was oversimplifying the problem. My way of thinking about it was the following:

m    x  m    1
 \  /    \  /
 AndI    AddI       grandparents
    \    /
     CmpU              parent
      |
     Bool            grandchild

"As we were updating a grandchild based on its grandparents, we needed an ad-hoc worklist push for the grandchild. Since we now update the type of CmpU based on its parents, the canonical parent-to-child propagations should work, and we don't need any ad-hoc grandparents-to-grandchild worklist push anymore."

But as you noted, non-immediate CmpU inputs such as m or 1 can change and should affect the CmpU type. Luckily, this was already the case for previous CmpU optimizations.


For case 1a, we don't need PhaseCCP::push_cmpu because m is also an immediate input of CmpU.

m    x
 \  /
 AndI     m
    \    /
     CmpU
      |
     Bool

I'm now realizing this was a very lucky situation. The AndI input isn't problematic even when PhaseCCP::push_cmpu doesn't handle the use_op == Op_AndI case, because:

  • x does not affect the application of Value_cmpu_and_mask()
  • In case 1a, m is a direct input of CmpU
  • In case 1b, the AddI input is handled in PhaseCCP::push_cmpu (use_op == Op_AddI)

Please let me know if you think we should add a comment in the code.
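The reasoning above can be sketched with a toy graph model. This is only an illustration, not HotSpot code: the `Node` class, the `Op` enum, and `notifiedWhenChanged` are hypothetical names, and the real `PhaseCCP::push_cmpu` also handles `SubI`, `CmpU3`, and skips nodes whose type is already bottom. The sketch builds the case 1b shape and lists which nodes get queued when `m`, `x`, or `1` changes.

```java
import java.util.ArrayList;
import java.util.List;

public class PushCmpuSketch {
    // Hypothetical, simplified opcodes for the case 1b shape.
    enum Op { PARM, CON, AND_I, ADD_I, CMP_U, BOOL }

    static final class Node {
        final Op op;
        final String name;
        final List<Node> outs = new ArrayList<>(); // def-use edges

        Node(Op op, String name, Node... inputs) {
            this.op = op;
            this.name = name;
            for (Node in : inputs) {
                in.outs.add(this);
            }
        }
    }

    static void push(List<Node> worklist, Node n) {
        if (!worklist.contains(n)) {
            worklist.add(n);
        }
    }

    // Toy counterpart of push_cmpu: look through an AddI to the CmpU below it.
    static void pushCmpu(List<Node> worklist, Node use) {
        if (use.op == Op.ADD_I) {
            for (Node out : use.outs) {
                if (out.op == Op.CMP_U) {
                    push(worklist, out);
                }
            }
        }
    }

    // Builds CmpU(AndI(m, x), AddI(m, 1)) -> Bool and returns the names of the
    // nodes queued when the named leaf ("m", "x" or "1") changes its type.
    static List<String> notifiedWhenChanged(String changedName) {
        Node m = new Node(Op.PARM, "m");
        Node x = new Node(Op.PARM, "x");
        Node one = new Node(Op.CON, "1");
        Node andI = new Node(Op.AND_I, "AndI", m, x);
        Node addI = new Node(Op.ADD_I, "AddI", m, one);
        Node cmpU = new Node(Op.CMP_U, "CmpU", andI, addI);
        new Node(Op.BOOL, "Bool", cmpU);

        Node changed = changedName.equals("m") ? m : changedName.equals("x") ? x : one;
        List<Node> worklist = new ArrayList<>();
        for (Node use : changed.outs) {
            push(worklist, use);     // canonical parent-to-child propagation
            pushCmpu(worklist, use); // extra grandparent-to-CmpU notification
        }
        List<String> names = new ArrayList<>();
        for (Node n : worklist) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("m changed: " + notifiedWhenChanged("m"));
        System.out.println("1 changed: " + notifiedWhenChanged("1"));
        System.out.println("x changed: " + notifiedWhenChanged("x"));
    }
}
```

In this model a change to `x` only queues the `AndI`, which matches the observation that `x` does not affect the application of `Value_cmpu_and_mask()`.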

Member

That's a good summary! Thanks for double-checking again. It's indeed only a problem for 1b, and that's handled by push_cmpu(). It probably would not hurt to add a comment that push_cmpu handles this case, just to be sure.

Contributor Author

Great, I added the comments in 32a7940.

//------------------------------CmpUNode---------------------------------------
// Compare 2 unsigned values (integer or pointer), returning condition codes (-1, 0 or 1).
class CmpUNode : public CmpNode {
static const Type* Value_cmpu_and_mask(PhaseValues*, const Node*, const Node*);
Member

We usually add matching parameter names as found in the source file:

  static const Type* Value_cmpu_and_mask(PhaseValues* phase, const Node* in1, const Node* in2);

or with the renaming above:

  static const Type* Value_cmpu_and_mask(PhaseValues* phase, const Node* andI, const Node* rhs);

Contributor Author

Suggestion accepted in 27ed1a3.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 11, 2025
Contributor

@galderz galderz left a comment

Thanks for the PR @franferrax. Did you consider adding an IR test or similar that would expose the inconsistent state? Would it be feasible?

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 12, 2025
@franferrax
Contributor Author

Hi @galderz,

I'm afraid I had to be a bit opaque because this was partially discussed in the VG.

I was just referring to the fact that the CmpUNode type was being kept as TypeInt::CC, while we know more than that:

  • 1a. (x & m) <=u m and (m & x) <=u m are always true, so CmpU(x & m, m) and CmpU(m & x, m) are known to be TypeInt::CC_LE
  • 1b. (x & m) <u m + 1 and (m & x) <u m + 1 are always true for m != -1, so CmpU(x & m, m + 1) and CmpU(m & x, m + 1) are known to be TypeInt::CC_LT (when m != -1)

In both cases the BoolNode type was being changed to TypeInt::ONE but we weren't updating the CmpUNode type. So the BoolNode was incorporating the 1a/1b knowledge while the CmpUNode was not (that is the "inconsistency").
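The two identities can be checked empirically with plain Java. The following is a minimal sketch, not part of the PR: the class and method names are made up for illustration, and it relies only on `Integer.compareUnsigned` from the standard library. It also shows why `m == -1` has to be excluded in case 1b, since `m + 1` wraps around to 0 and no value is unsigned-less-than 0.

```java
import java.util.Random;

public class CmpUMaskIdentities {
    // Case 1a: (x & m) <=u m holds for every x and m.
    static boolean case1a(int x, int m) {
        return Integer.compareUnsigned(x & m, m) <= 0;
    }

    // Case 1b: (x & m) <u m + 1 holds whenever m != -1.
    static boolean case1b(int x, int m) {
        return Integer.compareUnsigned(x & m, m + 1) < 0;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int x = rnd.nextInt();
            int m = rnd.nextInt();
            if (!case1a(x, m)) {
                throw new AssertionError("1a failed for x=" + x + ", m=" + m);
            }
            if (m != -1 && !case1b(x, m)) {
                throw new AssertionError("1b failed for x=" + x + ", m=" + m);
            }
        }
        // Counter-example for m == -1: m + 1 overflows to 0.
        System.out.println("case1b(5, -1) = " + case1b(5, -1));
    }
}
```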

Since a BoolNode with TypeInt::ONE is removed, I don't know if there is a way to check the mentioned inconsistency with an IR test. What we do have is the reproducer for a56cd37, feel free to reach me internally for further details.

Member

@chhagedorn chhagedorn left a comment

Update looks good, thanks! I'll run some testing and report back again.

Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test.

Just double-checking, were you able to find such a test which now improves the IR with the better type info and CmpU while we could not with the old code? Otherwise, you could also file a follow-up RFE.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 13, 2025
@franferrax
Contributor Author

@chhagedorn

Could you already find some examples, where this change gives us an improved IR? If so, you could also add it as IR test.

Just double-checking, were you able to find such a test which now improves the IR with the better type info and CmpU while we could not with the old code? Otherwise, you could also file a follow-up RFE.

Sorry for not replying to that, I'm working on it.

We were explicitly matching the BoolNode tests, so let's explore the tests we were previously discarding.

For case 1a, we were explicitly matching BoolTest::le, but now CmpUNode has TypeInt::CC_LE reflecting the fact that m & x ≤u m is always true, so:

| Test | Symbolic representation | Result | Improved IR |
|---|---|---|---|
| BoolTest::eq | m & x =u m | unknown | no |
| BoolTest::ne | m & x ≠u m | unknown | no |
| BoolTest::le | m & x ≤u m | true | no (old optimization) |
| BoolTest::ge | m & x ≥u m | unknown | no |
| BoolTest::lt | m & x <u m | unknown | no |
| BoolTest::gt | m & x >u m | false | yes |

For case 1b, we were explicitly matching BoolTest::lt, but now CmpUNode has TypeInt::CC_LT reflecting the fact that m & x <u m + 1 is always true (if m ≠ -1), so:

| Test | Symbolic representation | Result if m ≠ -1 | Improved IR |
|---|---|---|---|
| BoolTest::eq | m & x =u m + 1 | false | yes |
| BoolTest::ne | m & x ≠u m + 1 | true | yes |
| BoolTest::le | m & x ≤u m + 1 | true | yes |
| BoolTest::ge | m & x ≥u m + 1 | false | yes |
| BoolTest::lt | m & x <u m + 1 | true | no (old optimization) |
| BoolTest::gt | m & x >u m + 1 | false | yes |

I will work on adding IR tests for these cases.
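The "Result" columns of both tables can be reproduced mechanically. The sketch below is a simplified model and not the actual `BoolTest::cc2logical` code: it assumes that TypeInt::CC_LE is the condition-code range [-1, 0] and TypeInt::CC_LT is the range [-1, -1], and the `Cond`, `Result`, and `fold` names are hypothetical. A test folds to a constant only when it gives the same answer for every condition code in the range.

```java
public class Cc2LogicalSketch {
    enum Cond { EQ, NE, LE, GE, LT, GT }
    enum Result { TRUE, FALSE, UNKNOWN }

    // Does the test hold for one condition code cc in {-1, 0, 1}?
    static boolean holds(Cond c, int cc) {
        switch (c) {
            case EQ: return cc == 0;
            case NE: return cc != 0;
            case LE: return cc <= 0;
            case GE: return cc >= 0;
            case LT: return cc < 0;
            default: return cc > 0; // GT
        }
    }

    // Folds the test over the inclusive range [lo, hi] of possible condition codes.
    static Result fold(Cond c, int lo, int hi) {
        boolean anyTrue = false;
        boolean anyFalse = false;
        for (int cc = lo; cc <= hi; cc++) {
            if (holds(c, cc)) {
                anyTrue = true;
            } else {
                anyFalse = true;
            }
        }
        if (anyTrue && anyFalse) {
            return Result.UNKNOWN;
        }
        return anyTrue ? Result.TRUE : Result.FALSE;
    }

    public static void main(String[] args) {
        for (Cond c : Cond.values()) {
            // Case 1a uses CC_LE = [-1, 0]; case 1b uses CC_LT = [-1, -1].
            System.out.printf("%s  1a=%s  1b=%s%n", c, fold(c, -1, 0), fold(c, -1, -1));
        }
    }
}
```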

Regarding real-world use cases, we need to rule out BoolTest::lt, as it didn't improve for case 1a and was already optimized in old code for case 1b.

I've found some possible candidates but haven't fully analyzed them yet:

@tabjy
Member

tabjy commented Aug 13, 2025

Thank you @franferrax for catching and addressing the inconsistent state. I neglected that in my original PR. I think it would be beneficial to include your tables of the two cases in the comments too. Thank you for the hard work.

Add a comment with the BoolTest::cc2logical inferences tables, as
suggested by @tabjy.

Also, add a comment explaining how PhaseCCP::push_cmpu is handling
grandparent updates in the case 1b, as agreed with @chhagedorn.
According to my IGV observations, these inversions aren't necessarily
effective. Also, I assume it is safe to remove them because if I apply
this change to the master branch, the test still passes (tested at
f95af74).
I also checked the test is now failing in the master branch (at
f95af74).
@franferrax
Contributor Author

Thank you @franferrax for catching and addressing the inconsistent state. I neglected that in my original PR. I think it would be beneficial to include your tables of the two cases in the comments too. Thank you for the hard work.

@tabjy thanks for the suggestion, I added the tables in 32a7940.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 14, 2025
@franferrax
Contributor Author

franferrax commented Aug 14, 2025

Hi @chhagedorn,

I added the new tests in e6b1cb8. One problem I'm facing is that I'm unable to generate Bool nodes with arbitrary BoolTest values. Even if I try the assert inversions I removed in 10e1e3f, C2 has preference for BoolTest::ne, BoolTest::le and BoolTest::lt. Instead of using BoolTest::eq, BoolTest::gt or BoolTest::ge, it swaps what is put in IfTrue and IfFalse.

Even if javac generates an ifeq and an ifne with the same inputs, instead of a single CmpU with two Bools (BoolTest::eq and BoolTest::ne), I get a single Bool (BoolTest::ne) with two If (one of them swapping IfTrue with IfFalse). I guess this is some sort of canonicalization to enable further optimizations.

Do you know a way to influence the Bool's BoolTest value? Or @rwestrel do you?

This means the following 8 cases are not really testing what they claim, but repeating other cases with IfTrue and IfFalse swapped:

  • testCase1aOptimizeAsFalseForGT(xm|mx) (they should use BoolTest::gt, but use BoolTest::le)
  • testCase1bOptimizeAsFalseForEQ(xm|mx) (they should use BoolTest::eq, but use BoolTest::ne)
  • testCase1bOptimizeAsFalseForGE(xm|mx) (they should use BoolTest::ge, but use BoolTest::lt)
  • testCase1bOptimizeAsFalseForGT(xm|mx) (they should use BoolTest::gt, but use BoolTest::le)

Even if we don't find a way to influence the BoolTest, the cases are still valid and can be kept (just in case the described behaviour changes).
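The pairing in the list above follows from negating each test and swapping the branches. As a rough illustration, assuming only the behaviour described in this comment (a preference for ne, le and lt), the mapping could be modelled as below. The `Cond` and `Canonical` types are hypothetical and do not mirror the real C2 data structures.

```java
public class CanonicalBoolTestSketch {
    enum Cond {
        EQ, NE, LE, GE, LT, GT;

        Cond negate() {
            switch (this) {
                case EQ: return NE;
                case NE: return EQ;
                case LE: return GT;
                case GT: return LE;
                case LT: return GE;
                default: return LT; // GE
            }
        }

        // The tests C2 was observed to prefer in this discussion.
        boolean isPreferred() {
            return this == NE || this == LE || this == LT;
        }
    }

    // A test plus a flag saying whether IfTrue and IfFalse were swapped.
    static final class Canonical {
        final Cond cond;
        final boolean branchesSwapped;

        Canonical(Cond cond, boolean branchesSwapped) {
            this.cond = cond;
            this.branchesSwapped = branchesSwapped;
        }

        @Override
        public String toString() {
            return cond + (branchesSwapped ? " (branches swapped)" : "");
        }
    }

    static Canonical canonicalize(Cond c) {
        return c.isPreferred() ? new Canonical(c, false) : new Canonical(c.negate(), true);
    }

    public static void main(String[] args) {
        for (Cond c : Cond.values()) {
            System.out.println(c + " -> " + canonicalize(c));
        }
    }
}
```

Under this model GT, EQ and GE are rewritten to LE, NE and LT with swapped branches, which is why the eight listed test methods end up exercising the same Bool tests as their counterparts.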

Correctness of the tests with the following name format should be
checked in the TestFramework.run() JVM process, with the C2 compiled
version of these methods. TestFramework's warmup ensures this.

testCase(1a|1b)(OptimizeAsTrue|OptimizeAsFalse)For(EQ|NE|LE|GE|LT|GT)(xm|mx)
@franferrax
Contributor Author

Learning a bit more about the IR test framework, I noticed testCorrectness probably isn't doing what we want.

It should execute the compiled versions of the following @Test methods:

testCase(1a|1b)(OptimizeAsTrue|OptimizeAsFalse)For(EQ|NE|LE|GE|LT|GT)(xm|mx)

But the @Test methods warmup only occurs for TestFramework.run() and in a different JVM process.

So directly invoking testCorrectness outside TestFramework.run() is executed in the parent JVM without any warmup.

I think I fixed this in e2f8a43 by making testCorrectness a @Runner of those @Test methods, but please confirm.

@franferrax
Contributor Author

Absence note

Today is my last day before a ~2-week vacation, so my next working day is Monday, September 1st.

Please feel free to keep giving feedback and/or reviews, and I will continue when I'm back.

Cheers,
Francisco

rhs_m = rhs->in(1);
const TypeInt* rhs_m_type = phase->type(rhs_m)->isa_int();
// Exclude any case where m == -1 is possible.
if (rhs_m_type != nullptr && (rhs_m_type->_lo > -1 || rhs_m_type->_hi < -1)) {
Member

Please use !rhs_m_type->contains(-1)

Contributor Author

@franferrax franferrax Aug 14, 2025

Accepted in 25aa9d7. Simple smoke-test check: builds and the IR test passes.
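The two forms of the check are equivalent by De Morgan's law when the type is viewed as a plain signed interval. A minimal sketch under that assumption follows. `IntRange` is a hypothetical stand-in for the `_lo`/`_hi` pair, not the real `TypeInt` class, which may track more information than a signed range.

```java
public class ContainsSketch {
    // Hypothetical signed interval [lo, hi], standing in for a TypeInt's bounds.
    static final class IntRange {
        final int lo;
        final int hi;

        IntRange(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean contains(int v) {
            return lo <= v && v <= hi;
        }
    }

    // Original form of the check: the range lies entirely above or below -1.
    static boolean excludesMinusOneExplicit(IntRange r) {
        return r.lo > -1 || r.hi < -1;
    }

    // Suggested form of the check.
    static boolean excludesMinusOneViaContains(IntRange r) {
        return !r.contains(-1);
    }

    public static void main(String[] args) {
        IntRange[] samples = {
            new IntRange(0, 10), new IntRange(-5, -2), new IntRange(-1, -1),
            new IntRange(-3, 3), new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE)
        };
        for (IntRange r : samples) {
            System.out.printf("[%d, %d] explicit=%b contains=%b%n", r.lo, r.hi,
                    excludesMinusOneExplicit(r), excludesMinusOneViaContains(r));
        }
    }
}
```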

@chhagedorn
Member

Hi @franferrax, hope you had a good vacation!

Hi @chhagedorn,

I added the new tests in e6b1cb8. One problem I'm facing is that I'm unable to generate Bool nodes with arbitrary BoolTest values. Even if I try the assert inversions I removed in 10e1e3f, C2 has preference for BoolTest::ne, BoolTest::le and BoolTest::lt. Instead of using BoolTest::eq, BoolTest::gt or BoolTest::ge, it swaps what is put in IfTrue and IfFalse.

Even if javac generates an ifeq and an ifne with the same inputs, instead of a single CmpU with two Bools (BoolTest::eq and BoolTest::ne), I get a single Bool (BoolTest::ne) with two If (one of them swapping IfTrue with IfFalse). I guess this is some sort of canonicalization to enable further optimizations.

Do you know a way to influence the Bool's BoolTest value? Or @rwestrel do you?

This means the following 8 cases are not really testing what they claim, but repeating other cases with IfTrue and IfFalse swapped:

  • testCase1aOptimizeAsFalseForGT(xm|mx) (they should use BoolTest::gt, but use BoolTest::le)
  • testCase1bOptimizeAsFalseForEQ(xm|mx) (they should use BoolTest::eq, but use BoolTest::ne)
  • testCase1bOptimizeAsFalseForGE(xm|mx) (they should use BoolTest::ge, but use BoolTest::lt)
  • testCase1bOptimizeAsFalseForGT(xm|mx) (they should use BoolTest::gt, but use BoolTest::le)

Even if we don't find a way to influence the BoolTest, the cases are still valid and can be kept (just in case the described behaviour changes).

Hm, that's a good point. Parse::do_if() indeed always canonicalizes the Bool nodes... But I was sure we can still somehow end up with non-canonicalized versions again with some tricks. I was curious and played around with some examples and could indeed find test cases for gt, ge, and eq.

I was then also thinking about notification code in IGVN. We already concluded further up that it's not needed for CCP because CmpU nodes below AddI nodes are put to the worklist again. However, with IGVN, we could modify the graph above the AndI as well. We miss notification code for CmpU below AndI. I changed my test cases further to also run into such a missing optimization case. When run with -XX:VerifyIterativeGVN=1110, we indeed get such an assertion failure with the proposed patch (it also triggers an assertion failure already with mainline code). This could be easily fixed with:

diff --git a/src/hotspot/share/opto/phaseX.cpp b/src/hotspot/share/opto/phaseX.cpp
--- a/src/hotspot/share/opto/phaseX.cpp	(revision afa8e79ba1a76066cf969cb3b5f76ea804780872)
+++ b/src/hotspot/share/opto/phaseX.cpp	(date 1756472877934)
@@ -2553,7 +2553,7 @@
   if (use_op == Op_AndI || use_op == Op_AndL) {
     for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) {
       Node* u = use->fast_out(i2);
-      if (u->Opcode() == Op_RShiftI || u->Opcode() == Op_RShiftL) {
+      if (u->Opcode() == Op_RShiftI || u->Opcode() == Op_RShiftL || u->Opcode() == Op_CmpU) {
         worklist.push(u);
       }
     }

Here are the test cases with some further comments explaining how it works and how to run it:
Test.java

This will produce the following IR (at PhaseIdealLoop1):
[image: IR graph at PhaseIdealLoop1]
I guess you could easily transform these into IR tests and check that we have 4 CmoveI/CmpU nodes in PhaseIdealLoop1 and then no more in PhaseIdealLoop2. What do you think?

@franferrax
Contributor Author

Hi @chhagedorn, thank you for the additional work and your insights. This is much appreciated from a learner perspective.

I didn't fully analyze the Test.java you provided yet, but wanted to check if you are aiming to include the missing IGVN notification code as part of this issue (and its corresponding test). Or are you working on an independent issue?

My availability will be limited as the October CPU approaches, but I will try to find some timeboxes to make TestBoolNodeGVN.java emit the right test cases for gt, ge, and eq.

@chhagedorn
Member

Hi @chhagedorn, thank you for the additional work and your insights. This is much appreciated from a learner perspective.

Sure, you're welcome :-)

I didn't fully analyze the Test.java you provided yet, but wanted to check if you are aiming to include the missing IGVN notification code as part of this issue (and its corresponding test). Or are you working on an independent issue?

I think you could squeeze that in here as well. With mainline, you probably need different notification code because we need to add the Bool node instead of the CmpU node. But with this patch, we only require the CmpU. So, I guess it's not worth fixing it separately only to update it again with this patch.

My availability will be limited as the October CPU approaches, but I will try to find some timeboxes to make TestBoolNodeGVN.java emit the right test cases for gt, ge, and eq.

Sounds good, no hurry. Thanks for taking another look to improve the test!

@bridgekeeper

bridgekeeper bot commented Oct 2, 2025

@franferrax This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Oct 30, 2025

@franferrax This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Oct 30, 2025
@franferrax
Contributor Author

/open

@openjdk openjdk bot reopened this Nov 1, 2025
@openjdk

openjdk bot commented Nov 1, 2025

@franferrax This pull request is now open

@bridgekeeper

bridgekeeper bot commented Nov 30, 2025

@franferrax This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review

5 participants