8272873: C2: Inlining should not depend on absolute call site counts #5238

Closed
wants to merge 1 commit into from

Conversation

veresov
Contributor

@veresov veresov commented Aug 24, 2021

C2 considers absolute call site counts in its inlining decisions (see InlineTree::should_inline()), which seems very wrong considering the asynchronous nature of profiling and background compilation. It causes substantial over-inlining, which, in the presence of depth-first inlining, can lead to an early cutoff. It is also inherently unstable. C2 already uses call frequency as an additional factor, and it's better to consider only that in the inlining heuristic. I did extensive benchmarking; it yielded almost no losses and single-digit wins (up to 5%) on some benchmarks. I think it's safe to remove/deprecate InlineFrequencyCount and continue using InlineFrequencyRatio instead. I found that converting the frequency computation to FP and setting InlineFrequencyRatio=0.25 (inline a method that is called at least 25% of the time) works best.
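
To make the instability argument concrete, here is a minimal Java sketch (not the HotSpot C++ code) that compares an absolute-count test with a frequency-only test. The class name, method names, the count threshold of 100, and the use of `>=` are assumptions for illustration; only the 0.25 ratio comes from the description above.

```java
// Hypothetical illustration only; this is not HotSpot source code.
public class InlineFrequencySketch {
    // Ratio proposed in this PR for InlineFrequencyRatio.
    static final double INLINE_FREQUENCY_RATIO = 0.25;

    // Call site executions per caller invocation, computed in floating point.
    static double frequency(long callSiteCount, long callerInvokeCount) {
        // Guard against a caller that has not been invoked yet.
        return (double) callSiteCount / Math.max(1L, callerInvokeCount);
    }

    // Frequency-only test: does not depend on how long profiling has run.
    static boolean frequentByRatio(long callSiteCount, long callerInvokeCount) {
        return frequency(callSiteCount, callerInvokeCount) >= INLINE_FREQUENCY_RATIO;
    }

    // Absolute-count test, the kind of check this PR argues against.
    static boolean frequentByCount(long callSiteCount, long countThreshold) {
        return callSiteCount >= countThreshold;
    }

    public static void main(String[] args) {
        long threshold = 100; // assumed value, for illustration only
        // The same call site (executed in 10% of caller invocations),
        // observed early and late in the profiling run.
        long[][] samples = { {50, 500}, {500, 5000} };
        for (long[] s : samples) {
            System.out.printf("count=%d invocations=%d byCount=%b byRatio=%b%n",
                s[0], s[1],
                frequentByCount(s[0], threshold),
                frequentByRatio(s[0], s[1]));
        }
    }
}
```

In this sketch the absolute-count test changes its answer for the same call site simply because profiling ran longer, while the ratio test gives the same answer for both samples.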


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8272873: C2: Inlining should not depend on absolute call site counts

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5238/head:pull/5238
$ git checkout pull/5238

Update a local copy of the PR:
$ git checkout pull/5238
$ git pull https://git.openjdk.java.net/jdk pull/5238/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5238

View PR using the GUI difftool:
$ git pr show -t 5238

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5238.diff

@bridgekeeper

bridgekeeper bot commented Aug 24, 2021

👋 Welcome back iveresov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 24, 2021
@openjdk

openjdk bot commented Aug 24, 2021

@veresov The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org labels Aug 24, 2021
@openjdk

openjdk bot commented Aug 24, 2021

@veresov
The hotspot-compiler label was successfully added.

@openjdk openjdk bot removed the hotspot hotspot-dev@openjdk.org label Aug 24, 2021
@openjdk

openjdk bot commented Aug 24, 2021

@veresov
The hotspot label was successfully removed.

@mlbridge

mlbridge bot commented Aug 24, 2021

Webrevs

@veresov veresov changed the title 8272873: Inlining should not depend on absolute call site counts 8272873: C2: Inlining should not depend on absolute call site counts Aug 24, 2021
Contributor

@vnkozlov vnkozlov left a comment

I don't see the InlineFrequencyCount flag in the java man page, so you don't need to fix it.

@@ -36,7 +36,7 @@
import jdk.test.lib.Platform;
public class IntxTest {
private static final String FLAG_NAME = "OnStackReplacePercentage";
-private static final String FLAG_DEBUG_NAME = "InlineFrequencyCount";
+private static final String FLAG_DEBUG_NAME = "BciProfileWidth";
Contributor

Why this addition for BciProfileWidth?

Contributor Author

This test requires any diagnostic intx flag. I just found the first available.

Contributor

Got it.

"Ratio of call site execution to caller method invocation") \
range(0, max_jint) \
\
product_pd(intx, InlineFrequencyCount, DIAGNOSTIC, \
Contributor

You need to file a CSR to remove a product flag.

Contributor Author

It's a diagnostic flag, so it doesn't need a CSR.

Contributor

Right, I missed that.

@vnkozlov
Contributor

Please run mach5 testing too.

Contributor

@vnkozlov vnkozlov left a comment

Good.
Do testing.

@openjdk

openjdk bot commented Aug 24, 2021

@veresov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8272873: C2: Inlining should not depend on absolute call site counts

Reviewed-by: kvn, vlivanov, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • 7212561: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah
  • e36cbd8: 8242847: G1 should not clear mark bitmaps with no marks
  • 2ef6871: 8272447: Remove 'native' ranked Mutex
  • 63e062f: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions
  • c5a2712: 8272850: Drop zapping values in the Zap* option descriptions
  • 1e3e333: 8272884: Make VoidClosure::do_void pure virtual
  • 0f428ca: 8272570: C2: crash in PhaseCFG::global_code_motion
  • b17b821: 8272639: jpackaged applications using microphone on mac
  • 0e7288f: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions
  • 6ace805: 8272856: DoubleFlagWithIntegerValue uses G1GC-only flag
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/22ef4f065315c1238216849ce9ce71b8207b43f8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 24, 2021
@veresov
Contributor Author

veresov commented Aug 24, 2021

@vnkozlov I ran mach5 with this before. But let me do it again to make sure there are no unpleasant surprises.

@iwanowww

Nice finding, Igor!

It looks like 2 unrelated changes (InlineFrequencyCount removal and InlineFrequencyRatio adjustment) are merged into one patch. Does it make sense to handle them separately?

@veresov
Contributor Author

veresov commented Aug 24, 2021

Well, it (the change to the inlining heuristic) removes the last mention of InlineFrequencyCount and needs to change the type of InlineFrequencyRatio, so yes, I'd like them to be in one change.

@dean-long
Member

Are there any performance regressions worth mentioning?
Do the counters involved in InlineFrequencyRatio decay so that we favor recent history over ancient history?

@veresov
Contributor Author

veresov commented Aug 24, 2021

No, no real regressions. The counters don't decay; both numbers essentially come from an MDO, which currently doesn't decay.

@veresov
Contributor Author

veresov commented Aug 24, 2021

@vnkozlov The testing is squeaky clean.

@dean-long
Member

It looks like we still have UseCounterDecay, which used to trigger decay in older compilation policies. Is that flag obsolete now and replaced by something better? It seems like that's part of the reason InlineFrequencyCount is now less useful, because the counts no longer decay.

@veresov
Contributor Author

veresov commented Aug 24, 2021

Yeah, UseCounterDecay does nothing and needs to be removed. Decay would actually work in an interesting way: it would cause you to inline less in general into a method that hasn't been invoked a lot. So it would, in essence, reduce the quality of methods that are mildly hot, possibly saving on compilation resources. It's an interesting effect, but I doubt that it was the intention behind InlineFrequencyCount.
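
The effect described here can be illustrated with a purely hypothetical Java model; it is not how HotSpot counters are implemented. The sketch assumes that a call site counter is halved at the end of each profiling period and that an absolute threshold of 100 decides whether the call site counts as hot. The halving rule, the period structure, the threshold, and all names are assumptions.

```java
// Hypothetical model of counter decay combined with an absolute threshold.
public class CounterDecaySketch {
    // Count at a call site after `periods` profiling periods, each adding
    // `callsPerPeriod` executions. With decay, the count is halved
    // (integer division) at the end of every period.
    static long countAfter(int periods, long callsPerPeriod, boolean decay) {
        long count = 0;
        for (int p = 0; p < periods; p++) {
            count += callsPerPeriod;
            if (decay) {
                count /= 2;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long threshold = 100; // assumed absolute threshold
        int periods = 10;
        long[] rates = { 40, 400 }; // a mildly hot and a very hot call site
        for (long rate : rates) {
            long plain = countAfter(periods, rate, false);
            long decayed = countAfter(periods, rate, true);
            System.out.printf("rate=%d noDecay=%d (hot=%b) decay=%d (hot=%b)%n",
                rate, plain, plain >= threshold, decayed, decayed >= threshold);
        }
    }
}
```

Under these assumptions the decayed count levels off just below the per-period rate, so a mildly hot call site never reaches the absolute threshold, while a very hot one still does.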

@veresov
Contributor Author

veresov commented Aug 24, 2021

Btw, counters in an MDO never decayed. What did decay were main invocation counters, and that was primarily a part of the compilation / code cache management policy.

@veresov
Contributor Author

veresov commented Aug 25, 2021

@iwanowww & @dean-long Good to go with this?

@dean-long
Member

How did the old value of InlineFrequencyRatio=20 ever make sense? Unless the call site is virtual and is calling a different method, it seems like the ratio would be <= 1, and a ratio > 1 would mean we are inlining the wrong method.

@veresov
Contributor Author

veresov commented Aug 25, 2021

A > 1 ratio means the call site is evaluated more than once per invocation - it's in a loop. So 20 means that the call site should be a loop with more than 20 iterations. It kind of made sense.

The current value of 0.25 means that we'll try to inline call sites that are evaluated in at least 25% of the invocations of the caller.
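
As a rough illustration of how the two threshold values read, the following Java sketch counts executions of two call sites inside a simulated caller: one inside a loop and one behind a branch. It is not HotSpot code; the class and method names, the loop length of 32, and the branch taken on every third invocation are all assumptions chosen for the example.

```java
// Hypothetical illustration of call-site-to-caller-invocation ratios.
public class CallSiteRatioSketch {
    // Simulates profiling a caller and returns
    // {callerInvokeCount, loopCallSiteCount, branchCallSiteCount}.
    static long[] profile(int invocations, int loopIterations, int branchPeriod) {
        long invokeCount = 0;
        long loopSite = 0;
        long branchSite = 0;
        for (int i = 0; i < invocations; i++) {
            invokeCount++;
            for (int j = 0; j < loopIterations; j++) {
                loopSite++; // call site inside a loop
            }
            if (i % branchPeriod == 0) {
                branchSite++; // call site behind a branch
            }
        }
        return new long[] { invokeCount, loopSite, branchSite };
    }

    // Call site executions per caller invocation.
    static double ratio(long callSiteCount, long invokeCount) {
        return (double) callSiteCount / invokeCount;
    }

    public static void main(String[] args) {
        long[] p = profile(1200, 32, 3);
        double loopRatio = ratio(p[1], p[0]);
        double branchRatio = ratio(p[2], p[0]);
        System.out.printf("loop site ratio   = %.2f%n", loopRatio);
        System.out.printf("branch site ratio = %.2f%n", branchRatio);
        // Old default of 20 versus the value of 0.25 chosen in this PR.
        System.out.printf("old (20):   loop=%b branch=%b%n",
            loopRatio >= 20.0, branchRatio >= 20.0);
        System.out.printf("new (0.25): loop=%b branch=%b%n",
            loopRatio >= 0.25, branchRatio >= 0.25);
    }
}
```

With the old value only the call site in the loop qualifies; with 0.25 the call site reached in about a third of the invocations qualifies as well.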

@veresov
Contributor Author

veresov commented Aug 26, 2021

Thanks, guys, for the reviews!

@veresov
Contributor Author

veresov commented Aug 26, 2021

/integrate

@openjdk

openjdk bot commented Aug 26, 2021

Going to push as commit 673ce7e.
Since your change was applied there have been 22 commits pushed to the master branch:

  • 7212561: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah
  • e36cbd8: 8242847: G1 should not clear mark bitmaps with no marks
  • 2ef6871: 8272447: Remove 'native' ranked Mutex
  • 63e062f: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions
  • c5a2712: 8272850: Drop zapping values in the Zap* option descriptions
  • 1e3e333: 8272884: Make VoidClosure::do_void pure virtual
  • 0f428ca: 8272570: C2: crash in PhaseCFG::global_code_motion
  • b17b821: 8272639: jpackaged applications using microphone on mac
  • 0e7288f: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions
  • 6ace805: 8272856: DoubleFlagWithIntegerValue uses G1GC-only flag
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/22ef4f065315c1238216849ce9ce71b8207b43f8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Aug 26, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 26, 2021
@openjdk

openjdk bot commented Aug 26, 2021

@veresov Pushed as commit 673ce7e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dean-long
Member

@veresov That makes sense now. I incorrectly assumed invoke_count was for the callee, not the caller.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants