Skip to content

Conversation

@merykitty
Copy link
Member

@merykitty merykitty commented Nov 6, 2023

Hi,

When transforming a Phi into a CMove, the threshold is set to be approximately BlockLayoutMinDiamondPercentage, the reason is given:

// BlockLayoutByFrequency optimization moves infrequent branch
// from hot path. No point in CMOV'ing in such case

This sets the default value of the threshold to be around 18%, which is too conservative. The reason also does not make a lot of sense since the important property which makes jumping expensive is not code layout. We should remove this.

Please kindly review, thank you very much.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8319451: PhaseIdealLoop::conditional_move is too conservative (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16524/head:pull/16524
$ git checkout pull/16524

Update a local copy of the PR:
$ git checkout pull/16524
$ git pull https://git.openjdk.org/jdk.git pull/16524/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16524

View PR using the GUI difftool:
$ git pr show -t 16524

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16524.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 6, 2023

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 6, 2023
@openjdk
Copy link

openjdk bot commented Nov 6, 2023

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 6, 2023
@mlbridge
Copy link

mlbridge bot commented Nov 6, 2023

Webrevs

@merykitty
Copy link
Member Author

This is the result of the benchmark running on my machine. The result is consistent with the common knowledge, as the CMove version seems to offer better performance when the uncommon path frequency is above 1%.

                                    Before             After
Benchmark  (freq)  Mode  Cnt    Score     Error    Score    Error  Units
CMove.run       3  avgt    5  179.402 ±   5.504  181.112 ±  8.658  us/op
CMove.run       6  avgt    5  191.884 ±  15.271  189.571 ± 11.785  us/op
CMove.run      10  avgt    5  196.144 ±   8.946  198.864 ±  1.440  us/op
CMove.run      20  avgt    5  287.090 ± 141.540  198.979 ±  4.015  us/op
CMove.run      30  avgt    5  353.721 ± 109.473  198.730 ±  2.302  us/op
CMove.run      60  avgt    5  489.541 ±   4.100  199.645 ±  4.552  us/op
CMove.run     100  avgt    5  675.543 ±  12.810  199.739 ±  2.202  us/op
CMove.run     200  avgt    5  199.176 ±   2.957  199.584 ±  2.111  us/op
CMove.run     300  avgt    5  198.941 ±   2.642  200.112 ±  3.551  us/op
CMove.run     600  avgt    5  199.266 ±   3.999  199.023 ±  1.202  us/op

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, especially the property that run-to-run variance drops in the 1-20% range.

@TobiHartmann told me he'll review and that he has queued up a batch of benchmarks to evaluate this change more widely.

Comment on lines 39 to 40
@Param({"3", "6", "10", "20", "30", "60", "100", "200", "300", "600"})
int freq;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That freq is expressed in "occurrences per thousand" was only obvious after reading the code - perhaps a probability between 0 and 1 (with the appropriate adjustment to r.nextFloat() < freq below) would be slightly more intuitive?

Suggested change
@Param({"3", "6", "10", "20", "30", "60", "100", "200", "300", "600"})
int freq;
@Param({"0.003", "0.006", "0.01", "0.02", "0.03", "0.06", "0.1", "0.2", "0.3", "0.6"})
float freq;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cl4es You are right, I have fixed that.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. All tests passed and performance results look neutral (no statistically significant improvements or regressions).

@TobiHartmann
Copy link
Member

/reviewers 2

@openjdk
Copy link

openjdk bot commented Nov 13, 2023

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8319451: PhaseIdealLoop::conditional_move is too conservative

Reviewed-by: redestad, thartmann, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 73 new commits pushed to the master branch:

  • fde5b16: 8321514: UTF16 string gets constructed incorrectly from codepoints if CompactStrings is not enabled
  • 45a9ade: 8202598: keytool -certreq output contains inconsistent line separators
  • 62b7c5e: 8319647: Few java/lang/System/LoggerFinder/modules tests ignore vm flags
  • 69014cd: 8320682: [AArch64] C1 compilation fails with "Field too big for insn"
  • 5a97dbf: 8322034: Parallel: Remove unused methods in PSAdaptiveSizePolicy
  • 2838a91: 8288989: Make tests not depend on the source code
  • d2ba3b1: 8312150: Remove -Xnoagent option
  • d632d74: 8321820: TestLoadNIdeal fails on 32-bit because -XX:+UseCompressedOops is not recognized
  • ddbbd36: 8320279: Link issues in java.xml module-info.java
  • c8ad7b7: 8321974: Crash in ciKlass::is_subtype_of because TypeAryPtr::_klass is not initialized
  • ... and 63 more: https://git.openjdk.org/jdk/compare/c42535f1110d60d1472080ad4fcadb8acbeb4c4b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 13, 2023
@openjdk
Copy link

openjdk bot commented Nov 13, 2023

@TobiHartmann
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 13, 2023
Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

Looking on history of this code and I added it to address JDK-7097546.

But later it was found not correct for some case and I even had similar fix prototype: JDK-8034833. There was additional changes proposed there: in block.hpp and .ad file.

Please, look on attached in that report test and additional code changes there. May be be we can improve more cmove. It could be done separately from this your fix if you want to spend more time on it.

@merykitty
Copy link
Member Author

@vnkozlov I have investigated a little bit.

For these kinds of loops

public static int test(int result, int limit, int mask) { // mask = 15
    for (int i = 0; i < limit; i++) {
      if ((i&mask) == 0) result++; // Non frequent
    }
    return result;
}

Since this loop is perfectly predictable, no threshold of CMove transformation may offer performance advantages. I don't think this predictable branch is common, though.

Regarding the register pressure relating to a CMove, the main issue is that our local code motion does not do much (it does some heuristics around calls; the other element is block-wise latency, which is kind of useless in LCM context), I have tried some heuristics but it is easy to find a case where it is insufficient. I think it is probably a good idea to reimplement LCM using a more optimal algorithm.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for additional investigation.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 7, 2023
@merykitty
Copy link
Member Author

@TobiHartmann @vnkozlov @cl4es Thanks a lot for your reviews and testings, if there is nothing that concerns you, I will integrate the patch,

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work on this and for picking up on my suggestions for the microbenchmark! Looking forward to see the effects of this on a wider selection of benchmarks.

@merykitty
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Dec 19, 2023

Going to push as commit ac968c3.
Since your change was applied there have been 113 commits pushed to the master branch:

  • 0ad6c9e: 8322255: Generational ZGC: ZPageSizeMedium should be set before MaxTenuringThreshold
  • fff2e58: 8322195: RISC-V: Minor improvement of MD5 instrinsic
  • 7b4d62c: 8322300: Remove redundant arg in PSAdaptiveSizePolicy::adjust_promo_for_pause_time
  • 76637c5: 8321648: Integral gather optimized mask computation.
  • 59073fa: 8322154: RISC-V: JDK-8315743 missed change in MacroAssembler::load_reserved
  • 808a039: 8321815: Shenandoah: gc state should be synchronized to java threads only once per safepoint
  • 459957f: 8322062: com/sun/jdi/JdwpAllowTest.java does not performs negative testing with prefix length
  • b98d13f: 8259637: java.io.File.getCanonicalPath() returns different values for same path
  • 4f3de09: 8321940: Improve CDSHeapVerifier in handling of interned strings
  • 1fde8b8: 8321933: TestCDSVMCrash.java spawns two processes
  • ... and 103 more: https://git.openjdk.org/jdk/compare/c42535f1110d60d1472080ad4fcadb8acbeb4c4b...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 19, 2023
@openjdk openjdk bot closed this Dec 19, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 19, 2023
@openjdk
Copy link

openjdk bot commented Dec 19, 2023

@merykitty Pushed as commit ac968c3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

4 participants