
8267532: C2: Profile and prune untaken exception handlers #16416

Closed
wants to merge 44 commits into from


JornVernee
Member

@JornVernee JornVernee commented Oct 30, 2023

The issue is essentially that, for the Java try-with-resources construct, javac generates multiple calls to close() the resource. One of those calls is inside the hidden exception handler of the try block. Typically that exception handler is never entered (since no exception is thrown), but we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the close() call in the handler due to its low call site frequency. As a result, the receiver of that call escapes and cannot be scalar replaced, which leads to a loss in performance.
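A toy C++ model of the reasoning above, offered as a sketch under stated assumptions: the inlining rule is a hypothetical call-site-frequency threshold (C2's real policy weighs many more inputs), and `CallSite`, `should_inline`, and `receiver_escapes` are illustrative names, not HotSpot APIs. It shows that a single never-reached, non-inlined close() site is enough to make the resource escape, and that treating unreached sites as pruned removes that escape.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model, not HotSpot code: a call to close() with its profiled reach count.
struct CallSite {
  std::string where;  // e.g. "normal path" or "exception handler"
  long count;         // how many times profiling saw this site reached
};

// Hypothetical frequency-based inlining rule: inline a site only if it is
// reached in at least `min_ratio` of the method's invocations.
bool should_inline(const CallSite& site, long invocations, double min_ratio) {
  if (invocations <= 0) return false;
  return static_cast<double>(site.count) / static_cast<double>(invocations) >= min_ratio;
}

// The resource can be scalar replaced only if every close() call it is passed
// to gets inlined; one remaining real call makes the receiver escape.
bool receiver_escapes(const std::vector<CallSite>& sites, long invocations,
                      double min_ratio, bool prune_unreached) {
  for (const CallSite& s : sites) {
    // With pruning, a never-reached block becomes an uncommon trap, not a call.
    if (prune_unreached && s.count == 0) continue;
    if (!should_inline(s, invocations, min_ratio)) return true;
  }
  return false;
}
```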

There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks, and another suggestion that partial escape analysis could help here by preventing the object from escaping. But I think there are other benefits to being able to prune dead catch blocks, such as a general reduction in code size, and other optimizations becoming possible once dead code is eliminated. So, I've implemented catch block profiling + pruning in this patch.

The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each exception handler of a method in the MethodData for that method (which holds all the profiling data). Then, when looking up the exception handler after an exception is thrown, we mark the exception handler as entered. When C2 parses the exception handler block and sees that it has never been entered, it emits an uncommon trap instead.
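The mechanism above can be sketched in C++ as one "entered" bit per handler plus a compile-time decision. This is a hedged illustration, not the patch's code: `ExceptionHandlerProfile`, `HandlerCode`, and `code_for_handler` are hypothetical names, and the real bits live inside MethodData rather than a `std::vector<bool>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-method profiling data holding one
// "entered" bit for each exception handler of the method.
class ExceptionHandlerProfile {
 public:
  explicit ExceptionHandlerProfile(std::size_t handler_count)
      : entered_(handler_count, false) {}

  // Runtime side: called when an exception is dispatched to handler `index`.
  void mark_entered(std::size_t index) {
    assert(index < entered_.size());
    entered_[index] = true;
  }

  bool entered(std::size_t index) const { return entered_.at(index); }

 private:
  std::vector<bool> entered_;
};

enum class HandlerCode { ParseHandler, UncommonTrap };

// Compiler side: what gets emitted for a handler block.
HandlerCode code_for_handler(const ExceptionHandlerProfile& profile,
                             std::size_t index, bool prune_enabled) {
  if (prune_enabled && !profile.entered(index)) {
    return HandlerCode::UncommonTrap;  // never seen taken: deoptimize if it ever is
  }
  return HandlerCode::ParseHandler;
}
```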

I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed as the total data size minus the parameter data size, which doesn't work once we add an additional section for exception handler data. I've rewritten the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track the offsets of the parameter data and the exception handler data, and the size of each data section is derived from the offsets.
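A small C++ sketch of why deriving section sizes from offsets is more robust than locating the parameter data relative to the end. The layout and the names `ProfileLayout` and `end_relative_parameters_offset` are hypothetical; MethodData's real layout has more parts.

```cpp
#include <cassert>

// Hypothetical layout of a profiling-data area:
//   [ per-bytecode data | parameter data | exception handler data ]
struct ProfileLayout {
  int data_size;                  // total bytes in the area
  int parameters_offset;          // where parameter data starts
  int exception_handlers_offset;  // where exception handler data starts

  // Sizes are derived from adjacent offsets, so appending a new section
  // cannot silently change where an earlier section is thought to start.
  int parameters_size() const { return exception_handlers_offset - parameters_offset; }
  int exception_handlers_size() const { return data_size - exception_handlers_offset; }

  bool well_formed() const {
    return 0 <= parameters_offset && parameters_offset <= exception_handlers_offset &&
           exception_handlers_offset <= data_size;
  }
};

// The fragile style: assumes parameter data is the last section in the area.
int end_relative_parameters_offset(int data_size, int parameters_size) {
  return data_size - parameters_size;
}
```

With a made-up layout of 100 bytes, parameters at 60 and handler data at 80, the end-relative computation yields 80 and lands on the exception handler section instead of the parameter data.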

I also ran into an assert firing in freeze_internal in continuationFreezeThaw.cpp:

assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0),
     "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count());

This assert relies on has_monitors being set for a method, which in turn relies on monitorenter and monitorexit being parsed. However, if we prune untaken exception handlers, we might not see any monitorexit. That is a problem for OSR compilations, since there we might also not see any monitorenter. After some investigation, it turns out that ciMethod already tracks whether monitor bytecodes are used, so we can piggyback on that instead of relying on monitorenter or monitorexit being parsed. I've followed the existing pattern for how has_reserved_stack_access is tracked. See a484206, a33a905, and d727df7.
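A toy C++ contrast between parse-driven and method-level monitor tracking, under simplifying assumptions: basic blocks are given as already-decoded opcode lists (real bytecode has variable-length instructions), and `has_monitors_from_parsed` / `has_monitors_method_level` are hypothetical names. The opcode values 0xc2 (monitorenter) and 0xc3 (monitorexit) are the JVM's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kMonitorEnter = 0xc2;  // JVM opcode: monitorenter
constexpr std::uint8_t kMonitorExit = 0xc3;   // JVM opcode: monitorexit

// Assumption: a basic block is a list of already-decoded opcodes.
using Block = std::vector<std::uint8_t>;

static bool is_monitor_op(std::uint8_t op) {
  return op == kMonitorEnter || op == kMonitorExit;
}

// Parse-driven: only blocks this particular compilation parses are seen.
bool has_monitors_from_parsed(const std::vector<Block>& blocks,
                              const std::vector<bool>& parsed) {
  assert(blocks.size() == parsed.size());
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    if (!parsed[i]) continue;
    for (std::uint8_t op : blocks[i]) {
      if (is_monitor_op(op)) return true;
    }
  }
  return false;
}

// Method-level: computed over the whole method, independent of OSR entry
// points or pruned handlers (analogous to the flag ciMethod keeps).
bool has_monitors_method_level(const std::vector<Block>& blocks) {
  for (const Block& b : blocks) {
    for (std::uint8_t op : b) {
      if (is_monitor_op(op)) return true;
    }
  }
  return false;
}
```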

Benchmark with -XX:-PruneDeadExceptionHandlers:

Benchmark                                                      Mode  Cnt      Score     Error   Units
ResourceScopeCloseMin.confined_close                           avgt   30     10.458 ±   0.070   ns/op
ResourceScopeCloseMin.confined_close:gc.alloc.rate             avgt   30   9480.988 ±  63.335  MB/sec
ResourceScopeCloseMin.confined_close:gc.alloc.rate.norm        avgt   30    104.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close:gc.count                  avgt   30    119.000            counts
ResourceScopeCloseMin.confined_close:gc.time                   avgt   30     94.000                ms
ResourceScopeCloseMin.confined_close_notry                     avgt   30      4.691 ±   0.063   ns/op
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate       avgt   30  11383.693 ± 151.145  MB/sec
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate.norm  avgt   30     56.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close_notry:gc.count            avgt   30    120.000            counts
ResourceScopeCloseMin.confined_close_notry:gc.time             avgt   30    104.000                ms

Benchmark with -XX:+PruneDeadExceptionHandlers:

Benchmark                                                      Mode  Cnt      Score     Error   Units
ResourceScopeCloseMin.confined_close                           avgt   30      4.563 ±   0.043   ns/op
ResourceScopeCloseMin.confined_close:gc.alloc.rate             avgt   30  11702.868 ± 108.816  MB/sec
ResourceScopeCloseMin.confined_close:gc.alloc.rate.norm        avgt   30     56.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close:gc.count                  avgt   30    121.000            counts
ResourceScopeCloseMin.confined_close:gc.time                   avgt   30     93.000                ms
ResourceScopeCloseMin.confined_close_notry                     avgt   30      4.601 ±   0.054   ns/op
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate       avgt   30  11605.391 ± 134.000  MB/sec
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate.norm  avgt   30     56.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close_notry:gc.count            avgt   30    121.000            counts
ResourceScopeCloseMin.confined_close_notry:gc.time             avgt   30    101.000                ms

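Note that with the optimization turned on, the timing and gc.alloc.rate.norm of confined_close are roughly equal to those of confined_close_notry (4.563 vs 4.601 ns/op, 56 vs 56 B/op), down from 10.458 ns/op and 104 B/op with the optimization turned off.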

I also noticed through other experiments that C2's ability to inline improves: inline_instructions_size is reduced for methods with untaken exception handlers, which might bring the size under InlineSmallCode and allow the method to be inlined again.
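A hedged C++ sketch of the inlining effect described above. `rejected_as_big_compiled_method` is a simplification of one C2 filter (an already-compiled callee is rejected when its code is larger than InlineSmallCode), and `compiled_size` is a made-up cost model in which a pruned handler is replaced by a trap of fixed size; neither is HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified filter: reject an already-compiled callee whose code size
// exceeds the InlineSmallCode threshold.
bool rejected_as_big_compiled_method(int inline_instructions_size, int inline_small_code) {
  return inline_instructions_size > inline_small_code;
}

// Toy size model: main-path code plus each kept handler; a pruned handler is
// assumed to cost `trap_size` (the uncommon trap) instead of its full body.
int compiled_size(int main_path_size, const std::vector<int>& handler_sizes,
                  const std::vector<bool>& entered, bool prune, int trap_size) {
  assert(entered.size() >= handler_sizes.size());
  int size = main_path_size;
  for (std::size_t i = 0; i < handler_sizes.size(); ++i) {
    size += (prune && !entered[i]) ? trap_size : handler_sizes[i];
  }
  return size;
}
```

For example, with made-up numbers (a 1800-byte main path, one untaken 600-byte handler, a 50-byte trap, and a threshold of 2000), the unpruned method is 2400 bytes and is rejected, while the pruned one is 1850 bytes and passes.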

Finally, I've changed all the foreign benchmarks that were working around this issue to use try-with-resources, and verified that allocations go down when turning on the optimization.

Testing: tier 1-6, plus a local run of the hotspot_compiler suite with -XX:+DeoptimizeALot and with -XX:+StressPrunedExceptionHandlers.

Special thanks to Tobias Hartmann and Vladimir Ivanov for the discussion during the design process of this patch.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issues

  • JDK-8267532: C2: Profile and prune untaken exception handlers (Enhancement - P3)
  • JDK-8310011: Arena with try-with-resources is slower than it should be (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416
$ git checkout pull/16416

Update a local copy of the PR:
$ git checkout pull/16416
$ git pull https://git.openjdk.org/jdk.git pull/16416/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16416

View PR using the GUI difftool:
$ git pr show -t 16416

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16416.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 30, 2023

👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 30, 2023

@JornVernee The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Oct 30, 2023
@JornVernee
Member Author

/label remove core-libs

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Oct 30, 2023
@openjdk

openjdk bot commented Oct 30, 2023

@JornVernee
The core-libs label was successfully removed.

@iwanowww

The issue occurs for OSR compilations where the monitorenter is before the loop (outside of the compiled code) ...

Thanks for the clarifications.

Spotted another inconsistency: C1 doesn't set has_monitor on monitorexit while C2 does. So, seems like C1 is also affected (even without branch pruning).

Please, file a separate bug for it. The code in question is used in verification code and available only in debug VM. If the failures block this patch, you can comment out relevant asserts. They'll be re-enabled as part of the proper fix.

@dean-long
Copy link
Member

Spotted another inconsistency: C1 doesn't set has_monitor on monitorexit while C2 does. So, seems like C1 is also affected (even without branch pruning).

Is it possible for C1 to see a monitorexit but not the earlier monitorenter?

@iwanowww

Is it possible for C1 to see a monitorexit but not the earlier monitorenter?

Hm, good question, Dean. I was under the impression that there are no guarantees C1 visits all reachable bytecodes of a method during parsing, so monitorenter can be missed. But after reexamining the GraphBuilder implementation, I don't think it is possible.

@dean-long
Copy link
Member

I had a similar concern and also convinced myself it's not a problem for C1. If it was, we should have seen an assert in the loom code that depends on has_monitor().

@JornVernee
Copy link
Member Author

Please, file an RFE to explore pruning of unreached call sites.

Filed: https://bugs.openjdk.org/browse/JDK-8320271

@JornVernee
Member Author

JornVernee commented Nov 17, 2023

I've removed the fix for the has_monitors issue [1] and filed: https://bugs.openjdk.org/browse/JDK-8320310

I've also added checks using too_many_traps as suggested [2].

P.S. It looks like the too_many_traps check was too strong, as it returns true if there are too many traps with the same deopt reason anywhere in the compiled nmethod. I've removed it again for now.
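A toy C++ illustration of the granularity problem described above: a per-method count keyed only by deopt reason can be tripped by traps at unrelated bytecodes, while a per-(bci, reason) count cannot. `TrapHistory` and the `DeoptReason` values are hypothetical, not HotSpot's actual trap-history structures or reason list.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical deopt reasons, not HotSpot's actual enumeration.
enum class DeoptReason { kReasonA, kReasonB };

// Toy trap history that records traps both per method and per (bci, reason).
class TrapHistory {
 public:
  void record(int bci, DeoptReason reason) {
    ++per_method_[reason];
    ++per_bci_[std::make_pair(bci, reason)];
  }

  // Coarse: traps with this reason anywhere in the method count.
  bool too_many_traps_in_method(DeoptReason reason, int limit) const {
    auto it = per_method_.find(reason);
    return it != per_method_.end() && it->second >= limit;
  }

  // Fine-grained: only traps at this bytecode index count.
  bool too_many_traps_at(int bci, DeoptReason reason, int limit) const {
    auto it = per_bci_.find(std::make_pair(bci, reason));
    return it != per_bci_.end() && it->second >= limit;
  }

 private:
  std::map<DeoptReason, int> per_method_;
  std::map<std::pair<int, DeoptReason>, int> per_bci_;
};
```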

@JornVernee
Copy link
Member Author

Missing profiling would be bad, as in that case we'd always try to prune the exception handler. i.e. it's not just a missed optimization.

Yes, pathological recompilation is another scenario to consider. You can sprinkle Compile::too_many_traps checks (both as asserts and product checks) to ensure profiling information is up-to-date.

I also realize that what I said here is not quite true, as we mark the handler as entered in the deopt code. So, if we deopt once, we won't get an uncommon trap again. (There's a test case for that as well.)
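A minimal C++ simulation of the point above, assuming a method with a single exception handler: because the deoptimization path sets the "entered" bit, the next compilation keeps the handler, so repeated exceptions cause at most one deopt. `ToyMethod`, `compile`, and `exception_reaches_handler` are hypothetical, and recompiling immediately after the trap is a simplification.

```cpp
#include <cassert>

// Toy model of one method with a single exception handler.
struct ToyMethod {
  bool handler_entered = false;     // the profiling bit
  bool compiled_with_trap = false;  // compiled code has a trap instead of the handler
  int deopt_count = 0;
};

// Compile: prune the handler only if the profile never saw it entered.
void compile(ToyMethod& m) { m.compiled_with_trap = !m.handler_entered; }

// An exception reaches the handler while compiled code is running.
void exception_reaches_handler(ToyMethod& m) {
  if (m.compiled_with_trap) {
    ++m.deopt_count;
    m.handler_entered = true;  // set on the deoptimization path
    compile(m);                // simplification: recompile right away
  }
  // Otherwise the compiled handler just runs; nothing changes in this model.
}
```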


@iwanowww iwanowww left a comment


Overall, looks very good.

I have been thinking about the following choices made in this PR:

  • amount of profiling data: binary (seen vs not seen) vs integral (branch count)
  • deoptimization action: reinterpret vs made_not_entrant
  • place where uncommon trap is inserted (Parse vs ciTypeFlow)

I haven't come up with strong arguments to change any of these choices, so I'm fine with the patch as it is now. We can adjust them later as follow-up enhancements if we decide to do so.

On naming: ex_handler is used only once - GraphKit::has_ex_handler(). Everywhere else in the code base exception_handler is used. Please, align the naming. Feel free to adjust GraphKit::has_ex_handler().

The tests are very nice! Can you, please, point me to the test case which covers profiling in interpreter?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 23, 2023
@JornVernee
Member Author

On naming: ex_handler is used only once - GraphKit::has_ex_handler(). Everywhere else in the code base exception_handler is used. Please, align the naming. Feel free to adjust GraphKit::has_ex_handler().

Done.

The tests are very nice! Can you, please, point me to the test case which covers profiling in interpreter?

I've added 2 more test cases that target interpreter profiling specifically. See: dfd5da1

@JornVernee
Member Author

Another round of tier 1 - 8 testing came back clean. I'm planning to integrate the patch tomorrow.

@JornVernee
Member Author

/integrate

@openjdk

openjdk bot commented Nov 28, 2023

Going to push as commit a5ccd3b.
Since your change was applied there have been 120 commits pushed to the master branch:

  • 464dc3d: 8319633: runtime/posixSig/TestPosixSig.java intermittent timeouts on UNIX
  • efc3922: 8319048: Monitor deflation unlink phase prolongs time to safepoint
  • debf0ec: 8313355: javax/management/remote/mandatory/notif/ListenerScaleTest.java failed with "Exception: Failed: ratio=792.2791601423487"
  • 20aae3c: 8320533: Adjust capstone integration for v6 changes
  • 0678253: 8320602: Lock contention in SchemaDVFactory.getInstance()
  • f1a24f6: 8318599: HttpURLConnection cache issues leading to crashes in JGSS w/ native GSS introduced by 8303809
  • 7848ed7: 8301856: Generated .spec file for RPM installers uninstalls desktop launcher on update
  • 726f854: 8320706: RuntimePackageTest.testUsrInstallDir test fails on Linux
  • 1bb250c: 8261837: SIGSEGV in ciVirtualCallTypeData::translate_from
  • 5f7f2c4: 8320249: tools/jpackage/share/AddLauncherTest.java#id1 fails intermittently on Windows in verifyDescription
  • ... and 110 more: https://git.openjdk.org/jdk/compare/8ec6b8de3bb3d7aeebdcb45d761b18cce3bab75e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 28, 2023
@openjdk openjdk bot closed this Nov 28, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 28, 2023
@openjdk

openjdk bot commented Nov 28, 2023

@JornVernee Pushed as commit a5ccd3b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@JornVernee JornVernee deleted the PruneDeadCatchBlocks branch November 28, 2023 10:18
@mlbridge

mlbridge bot commented Dec 1, 2023

Mailing list message from Vitaly Provodin on hotspot-dev:

Hi all,

With the latest changes I got the following error

=======================8<----------------------
./src/hotspot/share/ci/ciMethodData.cpp:477:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
}
^
1 error generated.
make[3]: *** [/opt/teamcity-agent/work/602288ed8ca22f30/build/macosx-aarch64-server-release/hotspot/variant-server/libjvm/objs/ciMethodData.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [hotspot-server-libs] Error 2
make[2]: *** Waiting for unfinished jobs....
=======================8<----------------------

Here is my build environment

Configuration summary:
* Name: macosx-aarch64-server-release
* Debug level: release
* HS debug level: product
* JVM variants: server
* JVM features: server: 'cds compiler1 compiler2 dtrace epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
* OpenJDK target: OS: macosx, CPU architecture: aarch64, address length: 64
* Version string: 22+9-b1917 (22)
* Source date: 1701334649 (2023-11-30T08:57:29Z)

Tools summary:
* Boot JDK: openjdk version "22" 2024-03-19 OpenJDK Runtime Environment JBR-22+9-1795-nomod (build 22+9-b1795) OpenJDK 64-Bit Server VM JBR-22+9-1795-nomod (build 22+9-b1795, mixed mode)
* Toolchain: clang (clang/LLVM from Xcode 12.2)
* Sysroot: /Applications/Xcode_12.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk
* C Compiler: Version 12.0.0 (at /usr/bin/clang)
* C++ Compiler: Version 12.0.0 (at /usr/bin/clang++)

Could you please clarify how to overcome this issue?

Thanks,
Vitaly


@mlbridge

mlbridge bot commented Dec 1, 2023

Mailing list message from Vladimir Kozlov on hotspot-dev:

I hit it too on my Mac; I filed the following bug and assigned it to Jorn.

https://bugs.openjdk.org/browse/JDK-8321141

Note, I checked and all testing passed when 8267532 was reviewed.
Maybe it has something to do with the old Xcode used to compile, or something else.

Thanks,
Vladimir K


@mlbridge

mlbridge bot commented Dec 1, 2023

Mailing list message from Jorn Vernee on hotspot-dev:

Hello,

It seems to be an issue with Xcode 12.2 not supporting the [[noreturn]] attribute. Note that the build guide recommends at least Xcode 14 [1], so you may want to upgrade Xcode to see if that helps.

Jorn

[1]: https://github.com/openjdk/jdk/blob/master/doc/building.md#macos
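For readers unfamiliar with this error, a minimal C++ sketch of the code shape involved, assuming (as suggested above) that the function ends in a call to a noreturn helper. `fatal_unreachable`, `Section`, and `section_index` are hypothetical names, not the code at ciMethodData.cpp:477. A compiler that honors [[noreturn]] knows control cannot fall off the end of `section_index`; one that does not treat the call as noreturn reports -Wreturn-type, which -Werror turns into a build failure.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical "should never get here" helper; [[noreturn]] tells the
// compiler control never comes back from this call.
[[noreturn]] void fatal_unreachable(const char* what) {
  std::fprintf(stderr, "unreachable: %s\n", what);
  std::abort();
}

enum class Section { Parameters, ExceptionHandlers };

// Every path either returns a value or calls a [[noreturn]] function, so
// there is no path that falls off the end without returning.
int section_index(Section s) {
  switch (s) {
    case Section::Parameters:
      return 0;
    case Section::ExceptionHandlers:
      return 1;
  }
  fatal_unreachable("bad Section value");
}
```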


Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
9 participants