
8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 #705

Closed
ericcaspole wants to merge 1 commit into master


@ericcaspole commented Oct 16, 2020

We have seen some specific benefits from increasing InlineSmallCode from 2000 to 2500, and across the whole promo build perf test collection the change is neutral to slightly positive when the tests are run on modern OCI systems.

Passed tier1 testing, some ad-hoc perf testing, and the more compiler-related parts of the weekly promo performance set.

JBS: https://bugs.openjdk.java.net/browse/JDK-8254913
Thanks,
Eric


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8254913: Increase InlineSmallCode default from 2000 to 2500 for x64

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/705/head:pull/705
$ git checkout pull/705

@cl4es (Member) left a comment

LGTM!

@bridgekeeper bot added the oca label (Needs verification of OCA signatory status) Oct 16, 2020
@bridgekeeper bot commented Oct 16, 2020

Hi @ericcaspole, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as a Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user ericcaspole" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk bot commented Oct 16, 2020

@ericcaspole The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk bot added the hotspot-compiler label (hotspot-compiler-dev@openjdk.org) Oct 16, 2020
@AzeemJiva commented Oct 16, 2020

Do you have any results? Changing defaults without published results makes the process opaque.

@voitylov commented

It would be interesting to understand the impact on code cache size as well, and to see a definition of "OCI system" so that other members of the community could check other hardware.

@shipilev (Member) left a comment

Changing the global inlining defaults without providing the justifying performance data is not acceptable. @cl4es, please rescind your review if you can, so that change is not integrated by accident.

@bridgekeeper bot removed the oca label (Needs verification of OCA signatory status) Oct 19, 2020
@openjdk bot commented Oct 19, 2020

@ericcaspole This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254913: Increase InlineSmallCode default from 2000 to 2500 for x64

Reviewed-by: redestad, shade, azeemj

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 107 new commits pushed to the master branch:

  • 6020991: 8255068: [JVMCI] errors during compiler creation can be hidden
  • 8d9e6d0: 8255041: Zero: remove old JSR 292 support leftovers
  • 0efdde1: 8238669: Long.divideUnsigned is extremely slow for certain values (Needs to be Intrinsic)
  • 365f19c: 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics
  • f813a28: 8254692: (se) Clarify the behaviour of the non-abstract SelectorProvider::inheritedChannel
  • c9269bf: 8255036: Shenandoah: Reset GC state for root verifier
  • 839f01d: 8242068: Signed JAR support for RSASSA-PSS and EdDSA
  • e559bd2: 8254889: name_and_sig_as_C_string usages in frame coding without ResourceMark
  • da97ab5: 8253474: Javadoc clean up in HttpsExchange, HttpsParameters, and HttpsServer
  • 7e26404: 8255000: C2: Unify IGVN processing when loop opts are over
  • ... and 97 more: https://git.openjdk.java.net/jdk/compare/07ec35e2e50bde9d3cf9e35733837dfd377ef1ab...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk bot added the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Oct 19, 2020
@mlbridge bot commented Oct 19, 2020

Webrevs

@cl4es (Member) commented Oct 19, 2020

What constitutes a full promotion is proprietary, but I can verify no regressions across at least:

  • SPECjvm2008*
  • SPECjbb2005
  • SPECjbb2015**
  • Dacapo
  • Renaissance

Notable improvements on Renaissance-DecTree (6%), Renaissance-Reactors (5%) and Dacapo-lusearch (2%).
Various micros in the OpenJDK corpus improve significantly, e.g., AES_ECB_NoPadding (10%) and ChaCha20Poly1305 (7%). No regressions detected.

I agree that some more info about code cache utilization across a more complete range of workloads is a reasonable request, but we've detected no regressions in our sample of footprint tests. I'm sure @ericcaspole won't integrate this without first coming back with more data.

* not running the broken compiler sub-benchmarks
** interestingly most SPECjbb2015 publications in recent years have included -XX:InlineSmallCode=10k or more, see https://www.spec.org/jbb2015/results/

@shipilev (Member) left a comment

That is OK, looks good then.

@ericcaspole (Author) commented

The motivation for this change is, among other things, that we discovered a big regression in this microbenchmark in JDK 11 vs. JDK 8:

https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/NewInstance.java

jdk1.8.0_271:

Benchmark                                 Mode  Cnt    Score    Error  Units
NewInstance.threeDifferentProtected       avgt    5   61.413 ±  0.078  ns/op
NewInstance.threeDifferentPublic          avgt    5   56.705 ±  0.096  ns/op
NewInstance.threeDifferentPublicConstant  avgt    5   43.316 ±  0.094  ns/op
NewInstance.threeDifferentPublicFinal     avgt    5   56.586 ±  0.140  ns/op
NewInstance.threeSameProtected            avgt    5   46.714 ±  0.045  ns/op
NewInstance.threeSamePublic               avgt    5   44.565 ±  0.058  ns/op

jdk-11.0.9:

Benchmark                                 Mode  Cnt    Score    Error  Units
NewInstance.threeDifferentProtected       avgt    5  851.560 ± 44.299  ns/op
NewInstance.threeDifferentPublic          avgt    5  854.080 ±  7.619  ns/op
NewInstance.threeDifferentPublicConstant  avgt    5  911.749 ± 33.695  ns/op
NewInstance.threeDifferentPublicFinal     avgt    5  830.804 ± 21.219  ns/op
NewInstance.threeSameProtected            avgt    5  785.063 ±  2.580  ns/op
NewInstance.threeSamePublic               avgt    5  792.167 ±  0.538  ns/op

jdk-11.0.9 w/ -XX:InlineSmallCode=2500:

Benchmark                                 Mode  Cnt    Score    Error  Units
NewInstance.threeDifferentProtected       avgt    5   58.091 ±  0.012  ns/op
NewInstance.threeDifferentPublic          avgt    5   55.514 ±  0.062  ns/op
NewInstance.threeDifferentPublicConstant  avgt    5   43.233 ±  0.079  ns/op
NewInstance.threeDifferentPublicFinal     avgt    5   54.955 ±  0.103  ns/op
NewInstance.threeSameProtected            avgt    5   44.216 ±  0.013  ns/op
NewInstance.threeSamePublic               avgt    5   44.214 ±  0.009  ns/op

This regression carries on into JDK 14; other fixes in JDK 15 then make it bimodal, so the call may or may not get inlined from run to run.
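To put the three NewInstance tables in proportion, the slowdown ratios can be recomputed directly from the quoted averages (avgt mode, ns/op, lower is better). This is a small sketch that only does arithmetic on the numbers copied from the tables; nothing is re-measured, and the class name NewInstanceRatios is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Recomputes slowdown ratios from the NewInstance JMH averages quoted above
 * (avgt mode, ns/op, lower is better). Nothing is re-measured here.
 */
public class NewInstanceRatios {

    /** How many times slower the {@code slower} score is than the {@code faster} one. */
    public static double ratio(double slower, double faster) {
        return slower / faster;
    }

    public static void main(String[] args) {
        // Per benchmark: {jdk1.8.0_271, jdk-11.0.9 default, jdk-11.0.9 with -XX:InlineSmallCode=2500}
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("threeDifferentProtected",      new double[] {61.413, 851.560, 58.091});
        scores.put("threeDifferentPublic",         new double[] {56.705, 854.080, 55.514});
        scores.put("threeDifferentPublicConstant", new double[] {43.316, 911.749, 43.233});
        scores.put("threeDifferentPublicFinal",    new double[] {56.586, 830.804, 54.955});
        scores.put("threeSameProtected",           new double[] {46.714, 785.063, 44.216});
        scores.put("threeSamePublic",              new double[] {44.565, 792.167, 44.214});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            // Column 1: JDK 11 default vs JDK 8; column 2: JDK 11 default vs JDK 11 with 2500
            System.out.printf("%-28s 11 vs 8: %5.1fx slower   11 default vs 2500: %5.1fx slower%n",
                    e.getKey(), ratio(s[1], s[0]), ratio(s[1], s[2]));
        }
    }
}
```

On these numbers the JDK 11 default comes out roughly 14x to 21x slower than JDK 8, while every -XX:InlineSmallCode=2500 row is at or slightly below its JDK 8 score.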

Also, increasing InlineSmallCode for x64 to 2500 makes it equal to the ARM64 default, which we think is sensible.

Here are some results on a recent internal testing build of jdk-16, default vs. -XX:InlineSmallCode=2500.
All of these results were statistically insignificant.

Name                              Pct-Diff
SPECjvm2008-Compress-G1              0.889
SPECjvm2008-Crypto.aes-G1            2.884
SPECjvm2008-Crypto.rsa-G1            0.230
SPECjvm2008-Crypto.signverify-G1     0.824
SPECjvm2008-Derby-ParGC              0.878
SPECjvm2008-FFT.large-G1             1.626
SPECjvm2008-FFT.small-G1             2.023
SPECjvm2008-LU.large-ParGC         -14.624
SPECjvm2008-LU.small-ParGC          -0.633
SPECjvm2008-MPEG-ParGC               0.326
SPECjvm2008-MonteCarlo-G1            3.054
SPECjvm2008-MonteCarlo-ZGC          -2.247
SPECjvm2008-SOR.large-ParGC         10.729
SPECjvm2008-SOR.small-ParGC          0.127
SPECjvm2008-Serial-ParGC             0.909
SPECjvm2008-Sparse.large-G1          0.204
SPECjvm2008-Sparse.small-G1          0.594
SPECjvm2008-XML.transform-G1         0.349
SPECjvm2008-XML.validation-G1        0.378

These tests are run on the generally available OCI BM2.52 platform. See https://docs.cloud.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm

@AzeemJiva commented
Should this be backported to 11 then?

@ericcaspole (Author) commented

/integrate

@openjdk bot closed this Oct 21, 2020
@openjdk bot added the integrated label (Pull request has been integrated) and removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Oct 21, 2020
@openjdk bot commented Oct 21, 2020

@ericcaspole Since your change was applied there have been 109 commits pushed to the master branch:

  • 56ea490: 8233343: Deprecate -XX:+CriticalJNINatives flag which implements JavaCritical native functions
  • 615b759: 8255070: Shenandoah: Use single thread for concurrent CLD liveness test
  • 6020991: 8255068: [JVMCI] errors during compiler creation can be hidden
  • 8d9e6d0: 8255041: Zero: remove old JSR 292 support leftovers
  • 0efdde1: 8238669: Long.divideUnsigned is extremely slow for certain values (Needs to be Intrinsic)
  • 365f19c: 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics
  • f813a28: 8254692: (se) Clarify the behaviour of the non-abstract SelectorProvider::inheritedChannel
  • c9269bf: 8255036: Shenandoah: Reset GC state for root verifier
  • 839f01d: 8242068: Signed JAR support for RSASSA-PSS and EdDSA
  • e559bd2: 8254889: name_and_sig_as_C_string usages in frame coding without ResourceMark
  • ... and 99 more: https://git.openjdk.java.net/jdk/compare/07ec35e2e50bde9d3cf9e35733837dfd377ef1ab...master

Your commit was automatically rebased without conflicts.

Pushed as commit 85a8949.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@vnkozlov (Contributor) left a comment

Good.

@plokhotnyuk (Contributor) commented Oct 22, 2020

I see a ~4% regression after adding -XX:InlineSmallCode=2500 in some JSON-parsing benchmarks that exercise huge generated methods containing loops with long jumps.

Steps to reproduce:

  1. (Optional) Check that sbt is installed:
     sbt -version
  2. Clone a shallow copy of the jsoniter-scala repo:
     git clone --depth 1 git@github.com:plokhotnyuk/jsoniter-scala.git
  3. Run the benchmarks with default options:
     sbt -java-home /usr/lib/jvm/openjdk-16 'jsoniter-scala-benchmarkJVM/jmh:run -wi 10 -i 10 -f 5 TwitterAPIReading.jsoniterScala'
  4. Run the benchmarks with an additional -XX:InlineSmallCode=2500 option:
     sbt -java-home /usr/lib/jvm/openjdk-16 'jsoniter-scala-benchmarkJVM/jmh:run -wi 10 -i 10 -f 5 -jvmArgsAppend "-XX:InlineSmallCode=2500" TwitterAPIReading.jsoniterScala'
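When comparing runs like these, it can help to confirm which InlineSmallCode value the JVM actually runs with (for example, inside a JMH fork started with -jvmArgsAppend). `java -XX:+PrintFlagsFinal -version` lists it from the command line; from Java code, one option is HotSpot's diagnostic MXBean. A minimal sketch, assuming a HotSpot-based JDK that exposes the flag; the class name ShowInlineSmallCode is hypothetical. On an x64 JDK that includes this change, an unmodified VM would be expected to report 2500, though that is an assumption about the particular build.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/**
 * Prints the InlineSmallCode value the running JVM actually uses.
 * Assumes a HotSpot-based JDK; getVMOption throws IllegalArgumentException
 * if the VM has no option with that name.
 */
public class ShowInlineSmallCode {

    public static VMOption inlineSmallCode() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption("InlineSmallCode");
    }

    public static void main(String[] args) {
        VMOption option = inlineSmallCode();
        // getOrigin() reports where the value came from, e.g. DEFAULT, ERGONOMIC,
        // or VM_CREATION for a -XX:InlineSmallCode=... flag given on the command line.
        System.out.println("InlineSmallCode = " + option.getValue()
                + " (origin: " + option.getOrigin() + ")");
    }
}
```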

Below are results from my notebook:

[Screenshot: JMH results with default options]

[Screenshot: JMH results with -XX:InlineSmallCode=2500]

@mlbridge bot commented Oct 23, 2020

Mailing list message from eric.caspole at oracle.com on hotspot-compiler-dev:

Andriy,
OK thanks for the report.
Eric

On 10/22/20 4:05 AM, Andriy Plokhotnyuk wrote:

@mlbridge bot commented Oct 24, 2020

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 21/10/2020 20:45, Azeem Jiva wrote:

> Should this be backported to 11 then?

Probably not. It's not the sort of thing that should be backported
to a release without convincing reproducible evidence.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Labels
hotspot-compiler (hotspot-compiler-dev@openjdk.org), integrated (Pull request has been integrated)
7 participants