@ruben-arm
Contributor

@ruben-arm ruben-arm commented Aug 7, 2025

The C2 exception handler stub code is only a trampoline to the generated exception handler blob. This change removes the extra step on the way to the generated blob.

According to some comments in the source code, the exception handler stub code used to be patched upon deoptimization. These comments are presumably outdated, as patching upon deoptimization happens for post-call NOPs only.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365047: Remove exception handler stub code in C2 (Enhancement - P4)

Reviewers

Contributors

  • Martin Doerr <mdoerr@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26678/head:pull/26678
$ git checkout pull/26678

Update a local copy of the PR:
$ git checkout pull/26678
$ git pull https://git.openjdk.org/jdk.git pull/26678/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26678

View PR using the GUI difftool:
$ git pr show -t 26678

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26678.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 7, 2025

👋 Welcome back ruben-arm! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 7, 2025

@ruben-arm This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8365047: Remove exception handler stub code in C2

Co-authored-by: Martin Doerr <mdoerr@openjdk.org>
Reviewed-by: mdoerr, dlong, dfenacci, adinn, fyang, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 38 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dean-long, @dafedafe, @adinn, @RealFYang, @TheRealMDoerr, @theRealAph) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Aug 7, 2025

@ruben-arm The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Aug 7, 2025
@ruben-arm ruben-arm marked this pull request as ready for review August 8, 2025 11:09
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 8, 2025
@mlbridge

mlbridge bot commented Aug 8, 2025

@dean-long
Member

This looks good. How much testing have you done?

Maybe we can get rid of CodeOffsets::DeoptMH next.

Member

@dean-long dean-long left a comment

Test results from Oracle are good. You need one more review.

@openjdk

openjdk bot commented Aug 9, 2025

⚠️ @ruben-arm the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout pr-8365047
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 9, 2025
Contributor

@dafedafe dafedafe left a comment

Thanks for "cleaning" this @ruben-arm.
Did you run some testing (on the touched platforms)?
I doubt that there is anything perceivable but, out of curiosity, did you notice any performance change?

@ruben-arm
Contributor Author

Thank you for the reviews, @dean-long, @dafedafe,

How much testing have you done?
Did you run some testing (on the touched platforms)?

I've run tier1-tier3 tests, however only on AArch64 and x86-64.
I can run more tests on these platforms if that might be useful.

I doubt that there is anything perceivable but, out of curiosity, did you notice any performance change?

I've not measured the performance impact of this patch separately. It is part of a bigger effort to reduce the memory footprint of compiled code. The cumulative effect of this and similar patches is expected to be a noticeable performance improvement, however I wouldn't expect a significant observable effect from this patch alone.

The decrease for C2-compiled code on AArch64 should be 4-12 bytes per nmethod (depending on the size of the code cache), however the nmethod's alignment requirement might hide some of these improvements. I'm also exploring the possibility of reducing the footprint of the deoptimization handler stub code.
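
The point about alignment hiding part of the saving can be illustrated with a small sketch. The helper names, the 16-byte alignment and the sizes used here are hypothetical values chosen for illustration; they are not HotSpot's actual nmethod layout parameters.

```cpp
#include <cassert>
#include <cstddef>

// Round `size` up to the next multiple of `alignment` (a power of two).
inline std::size_t align_up(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Bytes actually saved, after alignment, when `removed` bytes are dropped
// from a region that originally occupied `size` bytes.
inline std::size_t effective_saving(std::size_t size, std::size_t removed,
                                    std::size_t alignment) {
  assert(removed <= size);
  return align_up(size, alignment) - align_up(size - removed, alignment);
}
```

With the assumed 16-byte alignment, removing 4 bytes from a 110-byte region saves nothing after rounding, while removing 4 bytes from a 100-byte region saves a full 16 bytes: the saving becomes visible only when the removed bytes cross an alignment boundary.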

Maybe we can get rid of CodeOffsets::DeoptMH next.

I will look into this. I have a proof-of-concept patch that reduces each deoptimization handler stub code to 1 instruction on AArch64: that instruction traps, and the rest of the deoptimization handler logic continues in the signal handler. However, I agree it would be preferable to remove the stub code completely if possible.

Contributor

@adinn adinn left a comment

Looks ok to me.

@dean-long
Member

Re: CodeOffsets::DeoptMH, what I meant is that it is just an exact copy of the CodeOffsets::Deopt stub code. Apparently at some point in the JSR 292 evolution, we needed to distinguish between the two, but it looks to me that we no longer need to do that. So we should be able to get rid of DeoptMH and related code, like nmethod::is_deopt_mh_entry().

@dafedafe
Contributor

I've run tier1-tier3 tests, however only on AArch64 and x86-64.
I can run more tests on these platforms if that might be useful.

As you've touched a few other platforms' files it might be a good idea (to be on the safe-side).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 14, 2025
@ruben-arm
Contributor Author

Thanks for the feedback, I've updated the change.

I've run tier1-tier3 tests, however only on AArch64 and x86-64.
I can run more tests on these platforms if that might be useful.

As you've touched a few other platforms' files it might be a good idea (to be on the safe-side).

To clarify my earlier comment: when I said "I can run more tests on these platforms if that might be useful.", I meant more testing on AArch64 and x86-64 only. I don't have any other platform available for testing.

Does the testing infrastructure on the Github/CI provide a way to run the tests on other platforms by any chance?

Otherwise, to stay on the safe side, I could revert the change for the platforms other than AArch64 and x86-64.

Please let me know which path you prefer.

@dafedafe
Contributor

I don't have any other platform available for testing.

Sorry, I misread your comment.

Does the testing infrastructure on the Github/CI provide a way to run the tests on other platforms by any chance?

Not that I'm aware of but you could for instance ask port maintainers to run a few tiers for you: https://wiki.openjdk.org/display/HotSpot/Ports

@ruben-arm
Contributor Author

@TheRealMDoerr, @RealFYang, @offamitkumar,
Could you run tier1-tier3 tests on your platforms for this patch please?

Member

@RealFYang RealFYang left a comment

@ruben-arm : Thanks for the ping. Changes LGTM. Tier1-3 test good on linux-riscv64.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 16, 2025
@offamitkumar
Member

@ruben-arm I don't see any test failure on s390x in tier1. Code change looks good to me. CC: @RealLucy

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Tier 1 has passed on linux ppc64le. Nice cleanup!

@ruben-arm
Contributor Author

Thank you for running the tests @TheRealMDoerr, @RealFYang, @offamitkumar.

@dafedafe, could you approve if the patch looks good, please? (there have been no code changes since your last comment)

Contributor

@dafedafe dafedafe left a comment

Looks good! Thanks @ruben-arm!

@ruben-arm
Contributor Author

ruben-arm commented Aug 28, 2025

Looks good! Thanks @ruben-arm!

Thank you @dafedafe.

In the meantime, I realized the AArch32 tests weren't run during my testing earlier - and attempted running them. I found that many of the tests are failing without this patch. Nevertheless, I noticed that more tests are failing with this patch. I've just identified the root cause - described below. This issue is caused by the current version of the patch - at least for AArch32.

Once the exception handler stub code is removed, the deoptimization handler stub code can become adjacent to the main code. Occasionally the main code ends with a BL which is never meant to return. That BL - if adjacent to the stub code - writes the address of the deoptimization stub code into LR, causing an issue for subsequent frame processing, as the design assumption is: if a return address points to the deoptimization stub code, then deoptimization is in progress.
For this to apply, there should be no call instructions right before the deoptimization stub code.

Presumably, the most straightforward fix could be to emit a NOP at the end of the main code if otherwise a BL/BLR would be the last instruction there. I'd appreciate feedback on whether this approach is acceptable.
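
The failure mode and the proposed NOP padding can be sketched with a toy model. Everything below is a hypothetical simplification (fixed 4-byte instructions, byte offsets instead of real addresses, invented type and function names); it is not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: fixed 4-byte instructions, with the
// deoptimization handler stub placed directly after the main code.
enum class Insn { Other, Call, Nop };

constexpr std::uint32_t kInsnSize = 4;

struct CodeLayout {
  std::vector<Insn> main_code;

  // Byte offset at which the deoptimization handler stub starts.
  std::uint32_t deopt_stub_offset() const {
    return static_cast<std::uint32_t>(main_code.size()) * kInsnSize;
  }
};

// Return address recorded by a call located at instruction index `i`.
inline std::uint32_t return_address_of(std::size_t i) {
  return static_cast<std::uint32_t>(i + 1) * kInsnSize;
}

// The unwinder's assumption from the discussion: a return address that
// points at the deopt stub means deoptimization is in progress.
inline bool looks_like_deopt(const CodeLayout& c, std::uint32_t ret_addr) {
  return ret_addr == c.deopt_stub_offset();
}

// True if an ordinary call in the main code would be misread as a deopt.
inline bool has_ambiguous_call(const CodeLayout& c) {
  for (std::size_t i = 0; i < c.main_code.size(); ++i) {
    if (c.main_code[i] == Insn::Call &&
        looks_like_deopt(c, return_address_of(i))) {
      return true;
    }
  }
  return false;
}

// Proposed fix: pad with a NOP when the main code ends with a call.
inline void pad_trailing_call(CodeLayout& c) {
  if (!c.main_code.empty() && c.main_code.back() == Insn::Call) {
    c.main_code.push_back(Insn::Nop);
  }
}
```

In this model only a call in the very last slot of the main code is ambiguous, which is why a single padding instruction is enough to separate its return address from the start of the stub.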

@adinn
Contributor

adinn commented Aug 28, 2025

Presumably, the most straightforward fix could be to emit a NOP at the end of the main code if otherwise a BL/BLR would be the last instruction there. I'd appreciate feedback on whether this approach is acceptable.

Wow, that's a bizarre bug. Maybe better might be to generate an illegal instruction rather than a nop but a nop would probably do.

@dean-long
Member

Re: SafeFetch, it is probably OK to make NativePostCallNop_at slightly slower for uses like make_deoptimized(), but the oopmap optimizations like CodeCache::find_blob_and_oopmap() were highly optimized to make loom/VirtualThread performance reasonable. Adding a SafeFetch here might cause a regression.

@theRealAph
Contributor

We shouldn't leave such fragile code in once we've noticed it. IMO it's a false economy to avoid SafeFetch on efficiency grounds. If needs be, there are ways to make it faster.

@theRealAph, I agree - there should be a mechanism to ensure the function can't cause a crash connected to reading outside the code blob.

Would it be suitable if this is handled as a separate issue and PR dedicated to resolving it?

Perhaps. I'm not sure it needs it, though.

One aspect I'm still unsure about, when considering the SafeFetch-based approach: as far as I'm aware, there is no guarantee that code outside a code blob can never contain a NOP+MOVK sequence (though of course the chances of that are very low) and, certainly, there is no guarantee that arbitrary data outside a code blob would not match the pattern.

No, but when we have stack corruption it's better that we don't crash while unwinding. All we're doing by using SafeFetch is adding a little robustness. We're not making things worse.

[ As an aside, I believe we should use SafeFetch during unwinding a lot more than we do at present. A corrupted stack often results in a crash during printing a diagnostic trace, and it's good to catch such errors sooner than later. ]

If it is possible for the check to happen for a location outside code blob and that would happen to result in a false-positive match, then the retrieved information would be unreliable and might lead to further issues including assumption that an arbitrary instruction sequence or data is interpreted as a CodeBlob.

Indeed so. One error would lead to another.

There is a special case of potentially having a call instruction right at the end of the code blob. As far as I understand, it is not currently possible for platforms for which continuations are enabled because every call is followed by the post-call NOP sequence.

If you step through the stack unwinding code you will encounter calls without post-call NOPs.

In case it actually might happen in some case, the SafeFetch would guarantee there is no crash, however in my understanding the code might still be fragile: if there instead happens to be another CodeBlob next to the current one, the check will try to interpret the header data as a post-call NOP sequence.

It's possible, but it's far more likely that we'll get a usable stack trace in a crash dump.

There can be a false-positive match, as there is no guarantee that the data in the header will never match the post-call NOP sequence pattern. In that case, similarly to the more generic case, arbitrary data or code could be interpreted as a CodeBlob, leading to unpredictable behaviour.

All of this is true, but you are almost implying that we shouldn't make this thing over here robust because there is this other thing over there that needs fixing too.

I'm reminded of the Hubble Space Telescope. The testing error that led to the focussing problem was so gross - about a millimetre - that it could have been detected with a school ruler. It wasn't measured, perhaps because the tolerances to which it was supposed to have been built were much tighter than that.

I'd like to solve the issue with fragility of the post-call NOP check, however it does appear to me that only adding SafeFetch might not fully resolve the concern: it would prevent faults but wouldn't address the false positive matches. If, however, another guarantee is added that post-call NOP check never looks beyond the code blob of the call site, then SafeFetch would not be required. What do you think about this perspective?

It may not be possible. Post-call NOPS exist only to make stack unwinding as fast as possible, and if I recall correctly there's already a debug-mode consistency check.
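
The alternative raised in this exchange, a check that never looks beyond the code blob of the call site, could look roughly like the sketch below. The function name, its parameters, and the idea of passing the blob end explicitly are assumptions made for illustration; this is neither the existing post-call NOP check nor a SafeFetch-based one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: test whether the 8 bytes at `pc` match a
// two-instruction pattern, but refuse to read at or beyond `blob_end`.
inline bool matches_within_blob(const std::uint8_t* pc,
                                const std::uint8_t* blob_end,
                                std::uint64_t mask, std::uint64_t value) {
  if (pc > blob_end ||
      static_cast<std::size_t>(blob_end - pc) < sizeof(std::uint64_t)) {
    return false;  // the pattern cannot fit inside this blob
  }
  std::uint64_t insns;
  std::memcpy(&insns, pc, sizeof(insns));  // alignment-safe 8-byte load
  return (insns & mask) == value;
}
```

A bound like this avoids both the fault and the neighbouring-blob false positive, but only when the caller already knows which blob the address belongs to. It does not help in the stack-corruption case, where the address itself cannot be trusted.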

@theRealAph
Contributor

But I'm not going to push SafeFetch any further. I've said my piece.

@theRealAph
Contributor

Re: SafeFetch, it is probably OK to make NativePostCallNop_at slightly slower for uses like make_deoptimized(), but the oopmap optimizations like CodeCache::find_blob_and_oopmap() were highly optimized to make loom/VirtualThread performance reasonable. Adding a SafeFetch here might cause a regression.

Sure, but 2 things:
  • Loom doesn't need post-call NOPs as much as it used to.
  • We could fairly easily make SafeFetch much faster than it is, if needs be.
But anyway, I approved this patch.

@openjdk

openjdk bot commented Nov 1, 2025

@ruben-arm this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout pr-8365047
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Nov 1, 2025
@ruben-arm
Contributor Author

Thank you for the detailed advice, @theRealAph,

I now see how SafeFetch can be valuable independently of whether false-positive matches with the post-call NOP pattern can happen during normal execution. I hadn't considered the stack corruption use case before.

Reviewing the SafeFetch implementation, I believe that in the general case it relies on sigsetjmp on POSIX systems and on exceptions on Windows.
However, for AArch64, SafeFetch32 has an optimized implementation that avoids the setjmp or exception overhead.
On the fast path it performs just one load, so any extra performance cost would come from the fact that this path cannot currently be inlined.

There indeed seems to be a way to have it inlined, at least on Linux - via creating an extra ELF section containing addresses of all inlined SafeFetch loads and corresponding continuation points, which the signal handler can iterate through. I've not prototyped this, but if feasible, it could make the performance impact of using SafeFetch negligible.
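
The lookup that a signal handler would perform over such a table can be sketched as follows. The struct, its field names and the function are hypothetical, and the sketch only models the table walk; it does not create an ELF section or install a signal handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical table entry: the address of one inlined SafeFetch load and
// the address at which execution should resume if that load faults.
struct SafeFetchSite {
  std::uintptr_t load_pc;
  std::uintptr_t continuation_pc;
};

// What a signal handler could do with the table: return the continuation
// address for `fault_pc`, or 0 if the fault did not come from a
// registered load.
inline std::uintptr_t find_continuation(const SafeFetchSite* sites,
                                        std::size_t count,
                                        std::uintptr_t fault_pc) {
  for (std::size_t i = 0; i < count; ++i) {
    if (sites[i].load_pc == fault_pc) {
      return sites[i].continuation_pc;
    }
  }
  return 0;
}
```

The walk runs only on the fault path, so a linear scan would leave the fast path at a single inlined load; a sorted table with binary search would be a natural refinement if the number of sites grew large.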

Since there isn't necessarily a consensus at this stage on whether SafeFetch should be added in this PR, I'd propose opening a separate JBS ticket for it to avoid blocking merge of the exception handler stub code cleanup.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Nov 4, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 4, 2025
@ruben-arm
Contributor Author

Thank you all for reviewing the PR and helping with testing.

A separate JBS issue has been opened for SafeFetch: https://bugs.openjdk.org/browse/JDK-8371204.

I plan to wait until tomorrow before issuing the /integrate request. Please let me know if you think this should wait longer.

@ruben-arm
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Nov 5, 2025
@openjdk

openjdk bot commented Nov 5, 2025

@ruben-arm
Your change (at version 359c2f1) is now ready to be sponsored by a Committer.

@eastig
Member

eastig commented Nov 5, 2025

/sponsor

@openjdk

openjdk bot commented Nov 5, 2025

Going to push as commit 3e3822a.
Since your change was applied there have been 38 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 5, 2025
@openjdk openjdk bot closed this Nov 5, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Nov 5, 2025
@openjdk

openjdk bot commented Nov 5, 2025

@eastig @ruben-arm Pushed as commit 3e3822a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dean-long
Member

We are seeing some new crashes (JDK-8371388) trying to access a PC that is just past the end of the nmethod and the page is unmapped because it also happens to be the last page of the CodeHeap. Could it be related to the changes in this PR?

@ruben-arm
Contributor Author

We are seeing some new crashes (JDK-8371388) trying to access a PC that is just past the end of the nmethod and the page is unmapped because it also happens to be the last page of the CodeHeap. Could it be related to the changes in this PR?

Yes, I think it could be similar to the case fixed for AArch64 post-call NOP check earlier:

bool check() const { return int_at(0) == 0x841f0f; }
reads a 32-bit integer from the perceived call site. In the case of the deoptimization handler, which is potentially located at the end of the code blob, the read would extend past the end of the code blob, which might cause an access to an unmapped page.

It could be replaced with a two-step comparison: first a comparison matching the size of the jmp instruction (I believe that's 2 bytes), and then, if that succeeds, a comparison of the third byte. Alternatively, the deoptimization stub code could be extended by a nop in emit_deopt_handler.

Would either of these options be suitable?
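
A minimal sketch of the first option, the two-step comparison, is shown below. It assumes a little-endian target, where the 32-bit value 0x841f0f is stored as the bytes 0f 1f 84 00; the function name and the `bytes_read` out-parameter are hypothetical and exist only to make the read width observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two-step variant of a 32-bit compare against 0x841f0f, assuming a
// little-endian target where that value is stored as bytes 0f 1f 84 00.
// `bytes_read` (optional) reports an upper bound on the bytes examined.
inline bool is_post_call_nop_two_step(const std::uint8_t* pc,
                                      std::size_t* bytes_read = nullptr) {
  std::size_t examined = 2;
  bool match = pc[0] == 0x0f && pc[1] == 0x1f;  // step one: first two bytes
  if (match) {
    examined = 3;
    match = pc[2] == 0x84;                      // step two: third byte
  }
  if (bytes_read != nullptr) {
    *bytes_read = examined;
  }
  return match;
}
```

When the bytes at the call site are a 2-byte jmp, step one fails and nothing past the second byte is touched. Unlike the single 32-bit compare, this sketch does not verify that the fourth byte is zero; it follows the two-step description above rather than reproducing the original check exactly.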

@ruben-arm
Contributor Author

Indeed, the jmp size is 2 - I had incorrectly assumed it is 5 as specified here

instruction_size = 5,
however that's for a different case. The value 10 given as the size of the deopt handler stub code at
return 10;
is not correct either - it should be 7.

@TobiHartmann
Member

Backing out with #28187.


Labels

hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
