Skip to content

Conversation

@rwestrel
Copy link
Contributor

@rwestrel rwestrel commented Jun 21, 2024

I propose removing PhaseIdealLoop::cast_incr_before_loop() and the
CastII nodes that it adds at counted loop heads.

They were added to prevent nodes to float above the zero trip guard
when the backedge of a counted loop is removed. In particular, when a
range check is hoisted by predication, pre/main/post loops are created
and if one of the main or post loops lose its backedge, an array load
that's control dependent on a predicate above the pre loop could float
above the zero trip guard of the main or post loop. That can no longer
happen AFAICT with changes related to assert predicates. The array
load is now updated to have a control dependency that's below the zero
trip guard.

The reason I'm revisiting this is that I noticed that
PhaseIdealLoop::cast_incr_before_loop() has a bug. When it adds the
CastII, it looks for the loop phi and picks input 1 of the phi it
finds as input to the CastII. To find the loop phi, it starts from
the loop incremement and loop for a use that's a phi and has the loop
head as control. It never checks that the phi it finds is the loop
phi. There can be more than one phi as uses of the increment at the
loop head and it can pick the wrong one. I tried to write a test case
where this would cause a bug but couldn't actually find any use for
the CastII anymore.

In my testing, the only issue when the CastII are not added is that
some IR tests for vectorization fails:

compiler/vectorization/TestPopulateIndex.java
compiler/vectorization/runner/ArrayShiftOpTest.java
compiler/vectorization/runner/LoopArrayIndexComputeTest.java

because removing the CastII causes split if to occur with some nodes
that take the loop phi as input. That then causes pattern matching
during superword to break. I added logic to prevent split if for those
cases.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop() (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19831/head:pull/19831
$ git checkout pull/19831

Update a local copy of the PR:
$ git checkout pull/19831
$ git pull https://git.openjdk.org/jdk.git pull/19831/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19831

View PR using the GUI difftool:
$ git pr show -t 19831

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19831.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 21, 2024

👋 Welcome back roland! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 21, 2024

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop()

Reviewed-by: chagedorn, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 45 new commits pushed to the master branch:

  • eb7ead5: 8336873: BasicSplitPaneDivider:oneTouchExpandableChanged() should mention that implementation depends on SplitPane.supportsOneTouchButtons property
  • 0ddcd70: 8335120: assert(!target->can_be_statically_bound() || target == cha_monomorphic_target) failed
  • 26e3d53: 8338716: Re-visit "interrupt handling" in jdk.internal.loader.Resource
  • 72a4900: 8338888: SystemDictionary::class_name_symbol has incorrect length check
  • a8ac287: 8339126: JNI exception pending in Inflater.c
  • d08b5bd: 8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small
  • d03ec7a: 8339030: frame::print_value_on(outputStream* st, JavaThread *thread) doesn't need thread argument
  • eff6d9c: 8339167: Remove AbstractPoolEntry.PrimitiveEntry to reduce boxing overheads
  • a98ecad: 8338897: Small startup regression remains after JDK-8309622 and JDK-8331932
  • 3d49fb8: 8338103: Stabilize and open source a Swing OGL ButtonResizeTest
  • ... and 35 more: https://git.openjdk.org/jdk/compare/ce83f6af64efd673b83c945765f68e8a3bf89774...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 21, 2024
@openjdk
Copy link

openjdk bot commented Jun 21, 2024

@rwestrel The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 21, 2024
@mlbridge
Copy link

mlbridge bot commented Jun 21, 2024

Webrevs

Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to do but could also potentially be a risk. However, I've run t1-7 testing without finding anything and we are early in the release for JDK 24. So, I guess we could take the risk.

// Check for aggressive application of 'split-if' optimization,
bool split_thru_phi_could_prevent_vectorization(Node* n, Node* n_blk);

// Check for aggressive application of 'split-if' optimization,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check for aggressive application of 'split-if' optimization,
// Check for aggressive application of 'split-if' optimization,

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 24, 2024
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@rwestrel
Copy link
Contributor Author

Christian ran performance testing with this change and reported 2 regressions:

Crypto-ChaCha20Poly1305.decrypt - Windows x64:
-1.97%, - 2.25%, -2.27%
openjdk.bench.javax.crypto.small.SecureRandomBench.nextBytes-algorithm:NativePRNG-dataSize:64-provider:-shared:true - MacOSX aarch64:
-7.63%, -6.37%, -12.27%

For the regression on windows:
Most of the time is spent in stub routines. Only about 20% of the time is spent in a compiled method javax.crypto.Cipher::doFinal. This patch doesn't affect the stub routines so javax.crypto.Cipher::doFinal have to be slowed down significantly (10%) to have a total slow down of 2%. Christian provided the profiler output of running the benchmark with and without the patch on the system where the regression was observed. The snippets of code where time is spent in javax.crypto.Cipher::doFinal are identical with or without the patch so it's hard to see how this patch could cause a 10% regression on that particular method.

For the regression on macos:
Running the benchmark on a similar system with or without the patch, I see the score can vary quite a bite from one run to the other (by ~10%). Running with a profiler, I see that ~50% of the time is spent in compiled code. I observed that while 50% of the time is spent in a compiled method, it's not always the same method. Inlining decisions change from one run to another. Christian ran performance testing to check the effect of inlining with:

-XX:CompileCommand=option,sun.security.provider.SecureRandom::engineNextBytes,inline

he reported a smaller regression (-2%)
with:

-XX:CompileCommand=option,sun.security.provider.SecureRandom::engineNextBytes,dontinline 

no regression anymore.
I looked at the snippet of compiled code that's reported as the hottest (a counted loop). While the code is largely similar it's a bit different (I see an extra spill and blocks are not laid out the same way) so I also looked at the IR for the same runs and I see no difference. So again, it's hard to see how the patch can cause the regression.

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 8, 2024

@rwestrel This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@rwestrel
Copy link
Contributor Author

@vnkozlov what do you think of the comment above about performance regressions? In your opinion, is it enough data to proceed with the change as is?

@vnkozlov
Copy link
Contributor

I am not comfortable with big regression on MacOSX aarch64 even if you can't reproduce it locally. We need to rerun that testing to make sure it is random as you said.

Please, merge latest JDK.

@rwestrel
Copy link
Contributor Author

I am not comfortable with big regression on MacOSX aarch64 even if you can't reproduce it locally. We need to rerun that testing to make sure it is random as you said.

Christian ran performance testing again (thanks Christian!) and there was no regression this time.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will run some testing again with latest master, just to be sure.

@chhagedorn
Copy link
Member

Testing passed!

@rwestrel
Copy link
Contributor Author

rwestrel commented Sep 3, 2024

@vnkozlov @chhagedorn thanks for the reviews and testing.

@rwestrel
Copy link
Contributor Author

rwestrel commented Sep 3, 2024

/integrate

@openjdk
Copy link

openjdk bot commented Sep 3, 2024

Going to push as commit 3a88fd4.
Since your change was applied there have been 80 commits pushed to the master branch:

  • 62dad3a: 8339351: Remove duplicate line in FileMapHeader::print
  • 0e6bb51: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only
  • b1163bc: 8256211: assert fired in java/net/httpclient/DependentPromiseActionsTest (infrequent)
  • 03ba37e: 8339169: Remove NaiveHuffman coder
  • a136a85: 8338768: Introduce runtime lookup to check for static builds
  • 9d7d85a: 8339298: Remove unused function declaration poll_for_safepoint
  • 92aafb4: 8338934: vmTestbase/nsk/jvmti/FieldWatch/TestDescription.java tests timeout intermittently
  • 392bdd5: 8339248: RISC-V: Remove li64 macro assembler routine and related code
  • 4f071ce: 8311163: Parallel: Improve large object handling during evacuation
  • b840b13: 8338882: Clarify matching order of MessageFormat subformat factory styles
  • ... and 70 more: https://git.openjdk.org/jdk/compare/ce83f6af64efd673b83c945765f68e8a3bf89774...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 3, 2024
@openjdk openjdk bot closed this Sep 3, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 3, 2024
@openjdk
Copy link

openjdk bot commented Sep 3, 2024

@rwestrel Pushed as commit 3a88fd4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants