
8321509: False positive in get_trampoline fast path causes crash #19796

Closed
wants to merge 6 commits into from


dean-long
Member

@dean-long dean-long commented Jun 19, 2024


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8321509: False positive in get_trampoline fast path causes crash (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19796/head:pull/19796
$ git checkout pull/19796

Update a local copy of the PR:
$ git checkout pull/19796
$ git pull https://git.openjdk.org/jdk.git pull/19796/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19796

View PR using the GUI difftool:
$ git pr show -t 19796

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19796.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 19, 2024

👋 Welcome back dlong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 19, 2024

@dean-long This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8321509: False positive in get_trampoline fast path causes crash

Reviewed-by: kvn, adinn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 195 new commits pushed to the master branch:

  • b32e4a6: 8335356: Shenandoah: Improve concurrent cleanup locking
  • 62cbf70: 8336085: Fix simple -Wzero-as-null-pointer-constant warnings in CDS code
  • 2928753: 8324966: Allow selecting jtreg test case by ID from make
  • 1772a92: 8334457: Test javax/swing/JTabbedPane/bug4666224.java fail on macOS with because pressing the ‘C’ key does not switch the layout to WRAP_TAB_LAYOUT
  • b7d0eff: 8207908: JMXStatusTest.java fails assertion intermittently
  • cf940e1: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
  • b363de8: 8335946: DTrace code snippets should be generated when DTrace flags are enabled
  • d6c6847: 8335743: jhsdb jstack cannot print some information on the waiting thread
  • cad68e0: 8335935: Chained builders not sending transformed models to next transforms
  • 242f113: 8334481: [JVMCI] add LINK_TO_NATIVE to MethodHandleAccessProvider.IntrinsicMethod
  • ... and 185 more: https://git.openjdk.org/jdk/compare/974dca80df71c5cbe492d1e8ca5cee76bcc79358...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Jun 19, 2024

@dean-long The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 19, 2024
@dean-long dean-long marked this pull request as ready for review June 25, 2024 06:17
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 25, 2024
@mlbridge

mlbridge bot commented Jun 25, 2024

Webrevs

@dean-long
Member Author

AArch64 binds some trampoline call-sites early, thanks to its is_always_within_branch_range() check. This allows a false positive match with a trampoline stub during code buffer expansion in rare situations. To fix this, this PR makes the following changes:

  1. Do not call get_trampoline() in Relocation::pd_call_destination or pd_set_call_destination, because the destination they use cannot be trusted during fixup.
  2. Restrict NativeCall::get_trampoline() to only operate on nmethods, not CodeBuffers (or BufferBlob)
  3. Fixup trampoline stub "owners" (call sites) as late as possible, in new trampoline_stub_Relocation::pd_fix_owner_after_move(), and only if destination is an nmethod.
  4. Avoid calling NativeCall::set_destination_mt_safe() during CodeBuffer fixup, which allows assert_lock to also go away
  5. Detect self-calls in NativeCall::destination() to avoid unnecessary call to find_blob()
  6. Add NativeCall fast paths for pd_call_destination/pd_set_call_destination
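To make the failure mode concrete, here is a toy model (not HotSpot code) of a pattern-matching fast path versus a relocation-driven slow path. It assumes code is a vector of 32-bit words, a call stores a word offset to its target, and a trampoline stub is recognized by a single made-up word `kStubPattern`; `CallSite`, `StubReloc`, and both lookup functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Arbitrary placeholder for "the first word of a trampoline stub".
constexpr std::uint32_t kStubPattern = 0xABCD0001u;

struct CallSite {
  std::size_t site;     // index of the call instruction
  std::int64_t offset;  // word offset encoded in the call
};

struct StubReloc {
  std::size_t stub;   // index of a trampoline stub
  std::size_t owner;  // index of the call that owns it
};

// Fast path: trust any call target whose first word looks like a stub.
// After a buffer move, an early-bound direct call can land on a stub it
// does not own, which is the false positive.
std::optional<std::size_t> get_trampoline_fast(const std::vector<std::uint32_t>& code,
                                               const CallSite& c) {
  std::int64_t target = static_cast<std::int64_t>(c.site) + c.offset;
  if (target >= 0 && static_cast<std::size_t>(target) < code.size() &&
      code[static_cast<std::size_t>(target)] == kStubPattern) {
    return static_cast<std::size_t>(target);
  }
  return std::nullopt;
}

// Slow path: only a stub whose relocation names this call as owner counts.
std::optional<std::size_t> get_trampoline_slow(const std::vector<StubReloc>& relocs,
                                               const CallSite& c) {
  for (const StubReloc& r : relocs) {
    if (r.owner == c.site) {
      return r.stub;
    }
  }
  return std::nullopt;
}
```

In this model, a direct call at index 2 with offset 3 lands on the stub at index 5 that belongs to the call at index 1: the fast path reports a trampoline, while the slow path correctly reports none.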

@dean-long dean-long changed the title 8321509: false positive in get_trampoline fast path causes crash 8321509: False positive in get_trampoline fast path causes crash Jun 25, 2024
Contributor

@vnkozlov vnkozlov left a comment


Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 26, 2024
@dean-long
Member Author

Thanks Vladimir.

@dean-long
Member Author

I am hoping an AArch64 expert can take a look at this. @theRealAph maybe?

Member

@TobiHartmann TobiHartmann left a comment


I'm not an AArch64 expert but this fix looks good to me.

@adinn
Contributor

adinn commented Jul 2, 2024

This solution looks ok to me as far as jdk mainline is concerned. However, I think there is a problem as far as Leyden is concerned.

The code changes Relocation::pd_call_destination to always expect its associated call to be embedded within an nmethod when orig_addr is null (i.e. when it is called with no args as reloc.pd_call_destination()). This is where the problem arises.

Currently, Leyden calls Relocation::pd_call_destination() from CallRelocation::destination() (and also from trampoline_stub_Relocation::destination()) when storing an nmethod to the CDS code cache. It needs to do this in order to be able to track relocs of type virtual_call_type, opt_virtual_call_type, static_call_type and runtime_call_type (also trampoline_stub_type). That is because all these relocs need their call destination to be adjusted when the nmethod is restored from the CDS code cache.

However, we already have prototype code in Leyden to store generated blobs to the CDS code cache. These blobs may legitimately include runtime_call_type relocs which also need tracking and adjusting at restore. For example, shared runtime or compiler stubs may call out to the JVM. Likewise, stubs in a stub generator blob may need to call out to the JVM or to a stub in some earlier generated blob. So, Leyden will need to call CallRelocation::destination() in cases where the associated call is embedded in a non-nmethod. Note that these calls will never employ trampolines.

The obvious fix is to modify Relocation::pd_call_destination so that it drops through to call MacroAssembler::pd_call_destination if the incoming blob is not an nmethod.
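A sketch of that fallback, assuming a simplified model in which only nmethods can own trampoline stubs (consistent with the note above that calls in generated blobs never use trampolines). `BlobKind`, `Blob`, and `call_destination` are hypothetical names, and the `stubs` table stands in for reading the destination out of a real trampoline stub.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using address = std::uintptr_t;

enum class BlobKind { kNMethod, kRuntimeStub, kBufferBlob };

// Hypothetical blob model: `stubs` maps a trampoline stub address to the
// final destination stored in that stub.
struct Blob {
  BlobKind kind;
  std::unordered_map<address, address> stubs;
};

// Only nmethods may route a call through a trampoline; every other blob
// (or a call with no blob at all) gets the raw pc-relative target.
address call_destination(const Blob* blob, address pc, std::intptr_t disp) {
  address raw = pc + static_cast<address>(disp);  // modular add handles negative disp
  if (blob == nullptr || blob->kind != BlobKind::kNMethod) {
    return raw;
  }
  auto it = blob->stubs.find(raw);
  return it != blob->stubs.end() ? it->second : raw;
}
```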

@vnkozlov
Contributor

vnkozlov commented Jul 2, 2024

@adinn is right. I thought that it mostly affected code during CodeBlob expansion, but that is not the case.
I applied the patch to the Leyden/premain repo and hit an assert when we generate AOT code, because we are still processing the CodeBuffer before the nmethod is created:

#  Internal Error (/work/leyden/open/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:59), pid=15382, tid=28419
#  assert(cb != nullptr && cb->is_nmethod()) failed: nmethod expected
#
# JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-06-26-1746082.vkozlov...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-06-26-1746082.vkozlov..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
...
Current CompileTask:
C1:  F9322 C0 Q0 S6270 2840    b    2       java.util.zip.InflaterInputStream::close (34 bytes)

Stack: [0x000000016cdec000,0x000000016cfef000],  sp=0x000000016cfed1d0,  free space=2052k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x11dd564]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x544  (nativeInst_aarch64.cpp:59)
V  [libjvm.dylib+0x11ddd14]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V  [libjvm.dylib+0x58f220]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xe3ddb4]  NativeCallTrampolineStub::destination(nmethod*) const+0x0
V  [libjvm.dylib+0x12ac4]  SCCache::write_relocations(CodeBuffer*, unsigned int&)+0x34c

@dean-long
Member Author

To fix Leyden premain, I'd suggest to change the "nmethod expected" assert in NativeCall::destination() into conditional code that returns the "raw" destination if it is not an nmethod, and optionally restore the following performance optimization (with a comment as suggested by Vladimir):

// Performance optimization: no need to call find_blob() if it is a self-call
if (destination == addr) {
  return destination;
}

But I don't have a strong opinion on whether it should be fixed here or only in Leyden.
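A self-contained model of NativeCall::destination() with both suggestions applied: the self-call shortcut quoted above, plus returning the raw destination instead of asserting when the call is not in an nmethod. `FindBlob` is an injected stand-in for the real blob lookup, and `BlobInfo::trampoline_target` is a hypothetical hook for the nmethod-only trampoline resolution; counting lookups shows what the shortcut saves.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using address = std::uintptr_t;

// Hypothetical view of the blob that contains a call.
struct BlobInfo {
  bool is_nmethod;
  std::function<address(address)> trampoline_target;  // resolves a stub to its final target
};

using FindBlob = std::function<const BlobInfo*(address)>;

address native_call_destination(address addr, std::intptr_t displacement,
                                const FindBlob& find_blob) {
  address destination = addr + static_cast<address>(displacement);
  // Performance optimization: no need to call find_blob() if it is a self-call.
  if (destination == addr) {
    return destination;
  }
  const BlobInfo* cb = find_blob(addr);
  if (cb == nullptr || !cb->is_nmethod) {
    return destination;  // "raw" destination instead of asserting "nmethod expected"
  }
  return cb->trampoline_target(destination);
}
```

In this model a self-call never triggers the lookup, and a call outside any nmethod (for example, still in a CodeBuffer) simply yields its encoded target.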

@adinn
Contributor

adinn commented Jul 3, 2024

To fix Leyden premain, I'd suggest to change the "nmethod expected" assert in NativeCall::destination() into conditional code that returns the "raw" destination if it is not an nmethod . . .
But I don't have a strong opinion on whether it should be fixed here or only in Leyden.

Yes, I agree that will work as a solution.

I would recommend making this change in main. I think it is reasonable to expect NativeCall::destination() to be able to access the target for any instruction that can be viewed as a NativeCall, irrespective of whether it is embedded in an nmethod or some other blob. Clearly, the assert confirms that the current mainline code does not use it for anything other than an nmethod but there is nothing to say that it has to remain that way. Leyden is just one potential case where we might want to use it for some other blob.

@adinn
Contributor

adinn commented Jul 3, 2024

@adinn is right. I thought that it mostly affected code during CodeBlob expansion, but that is not the case.

@vnkozlov I was more right than I even realized! I was only concerned about generated stubs but, as you point out, we will also call NativeCall::destination() when processing a native call from JITted method code while it is still residing in a CodeBuffer.

@vnkozlov
Contributor

vnkozlov commented Jul 3, 2024

Restoring the performance check in NativeCall::destination() was enough for Leyden to work (PetClinic with one-step and five-step workflows):

  address NativeCall::destination() const {
    address addr = instruction_address();
    address destination = addr + displacement();

+   // Performance optimization: no need to call find_blob() if it is a self-call
+   if (destination == addr) {
+     return destination;
+   }

@dean-long, I think it should be added to your changes.

@adinn, I suggest you test these changes with the Leyden changes for stubs.

@adinn
Contributor

adinn commented Jul 8, 2024

  address NativeCall::destination() const {
    address addr = instruction_address();
    address destination = addr + displacement();

+   // Performance optimization: no need to call find_blob() if it is a self-call
+   if (destination == addr) {
+     return destination;
+   }

. . .

@adinn, I suggest you test these changes with the Leyden changes for stubs.

@vnkozlov I applied @dean-long's patch to my Leyden premain repo that saves and restores generated stubs. Without the above extra patch it crashes. With it everything works fine.

So, @dean-long, assuming the above tweak is applied, I believe it is good to go.

@dean-long
Member Author

Unfortunately, adding the shortcut for self-calls is not enough for Leyden. Trampoline calls to always-reachable targets are bound early to their destination, so there can be NativeCalls that are not self-calls. To see this in a debug build, this line needs to be adjusted:

static const uint64_t branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M);
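For context, 128 * M matches the reach of an AArch64 BL instruction, which encodes a signed 26-bit word offset (about ±128 MiB); the DEBUG_ONLY(2 * M) value shrinks the assumed range, presumably so that debug builds exercise the trampoline paths more often. A minimal encode/decode sketch of that offset field (BL opcode 0x94000000); `encode_bl` and `decode_bl_offset` are illustrative helpers, not HotSpot functions. A self-call, the placeholder used for unbound trampoline calls in this thread, is simply an offset of 0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::int64_t kM = 1024 * 1024;

// Encode "bl pc+offset". Returns nullopt if offset is not 4-byte aligned
// or falls outside the signed 26-bit word range [-128 MiB, 128 MiB - 4].
std::optional<std::uint32_t> encode_bl(std::int64_t offset) {
  if (offset % 4 != 0 || offset < -128 * kM || offset >= 128 * kM) {
    return std::nullopt;
  }
  std::uint32_t imm26 = static_cast<std::uint32_t>(offset / 4) & 0x03FFFFFFu;
  return 0x94000000u | imm26;
}

// Recover the byte offset from a BL instruction word.
std::int64_t decode_bl_offset(std::uint32_t insn) {
  std::int64_t imm26 = insn & 0x03FFFFFFu;
  if (imm26 & 0x02000000) {
    imm26 -= 0x04000000;  // sign-extend from bit 25
  }
  return imm26 * 4;
}
```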

@vnkozlov
Contributor

vnkozlov commented Jul 8, 2024

Do we generate trampolines for "always-reachable targets"?
Can you clarify how "branch_range" should be adjusted to trigger the issue for Leyden?

@dean-long
Member Author

Do we generate trampolines for "always-reachable targets"?

No, there's no trampoline stub. But we still call destination().

Can you clarify how "branch_range" should be adjusted to trigger the issue for Leyden?

  static const uint64_t branch_range = 128 * M;

@dean-long
Member Author

Looks like for Leyden (in the Leyden branch) we need to avoid binding calls even if the destination is reachable, so that we only have the destination == addr case for trampoline calls when we process the CodeBuffer.

Any destination == addr call needs a trampoline stub to store the final destination. The benefit of early binding for always reachable calls is we can avoid creating a trampoline stub. An alternative would be to always store the destination in the CallRelocation.
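To illustrate the trade-off, a toy model (hypothetical names throughout) of how a call's final target can be recovered while code is still in a buffer: an early-bound call carries its real displacement, a self-call (destination == addr) defers to its trampoline stub, and the alternative of storing the destination in the CallRelocation lets the target be recovered with no stub at all.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

using address = std::uintptr_t;

// Hypothetical record for one call site in a code buffer.
struct CallReloc {
  address addr;                     // address of the call instruction
  std::intptr_t displacement;       // offset encoded in the instruction
  std::optional<address> recorded;  // alternative: target kept in the relocation
};

// `stub_target_by_owner` models the final destination stored in each
// trampoline stub, keyed by the call that owns the stub.
address resolve_target(const CallReloc& r,
                       const std::unordered_map<address, address>& stub_target_by_owner) {
  address raw = r.addr + static_cast<address>(r.displacement);
  if (r.recorded) {
    return *r.recorded;  // no stub needed
  }
  if (raw != r.addr) {
    return raw;  // early-bound direct call
  }
  auto it = stub_target_by_owner.find(r.addr);  // self-call: ask the stub
  return it != stub_target_by_owner.end() ? it->second : raw;
}
```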

@vnkozlov
Contributor

We should be pessimistic in Leyden. When we load AOT code there is no guarantee that the destination is reachable.
x86 uses the ForceUnreachable flag, which we set to true in Leyden. AArch64 does not use this flag, so we have to find all places where there are assumptions about reachability.
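A sketch of what being pessimistic could look like on AArch64, assuming a hypothetical `generating_aot` switch playing the role ForceUnreachable plays on x86. The range check treats a target as always reachable only if a BL from either end of the code cache [lo, hi) can reach it; this is a simplification for illustration, not the actual is_always_within_branch_range() logic.

```cpp
#include <cassert>
#include <cstdint>

// True if a BL placed anywhere in [lo, hi) can reach `target`. Checking
// both ends is enough because the offset changes linearly with the pc.
bool always_within_range(std::uint64_t lo, std::uint64_t hi, std::uint64_t target,
                         std::int64_t range) {
  auto in_range = [range](std::int64_t off) { return off >= -range && off < range; };
  std::int64_t from_lo = static_cast<std::int64_t>(target - lo);
  std::int64_t from_hi = static_cast<std::int64_t>(target - (hi - 4));
  return in_range(from_lo) && in_range(from_hi);
}

// Pessimistic policy: code that will be saved and reloaded may run with a
// different code cache layout, so never bind such calls early.
bool may_bind_early(bool generating_aot, std::uint64_t lo, std::uint64_t hi,
                    std::uint64_t target, std::int64_t range) {
  if (generating_aot) {
    return false;  // keep the trampoline path
  }
  return always_within_range(lo, hi, target, range);
}
```

The same target can be "always reachable" for a 100 MiB code cache and not for a 240 MiB one, which is why a dump-time answer cannot be trusted at load time.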

@dean-long
Member Author

So for Leyden it sounds like you need to change is_always_within_branch_range().

@adinn
Contributor

adinn commented Jul 10, 2024

So for Leyden it sounds like you need to change is_always_within_branch_range().

Or perhaps just adapt MacroAssembler::far_branches(). It returns false if the code cache max range exceeds branch_range. In Leyden we can make it return false when we are generating AOT code.

@adinn
Contributor

adinn commented Jul 10, 2024

Oops, sorry, I got that the wrong way round. We need to change is_always_within_branch_range() as @dean-long suggested.

@eastig
Member

eastig commented Jul 10, 2024

Hi @dean-long,
Could you please check if my understanding of the bug is correct?

C2 generates code into a CodeBuffer. Some calls have targets that are always within branch range, so direct BL instructions are generated for them. Such calls don't have a trampoline_stub_Relocation. When the current CodeBuffer is not large enough, we create a bigger CodeBuffer ("expand") and move the current code into it. Whilst moving the code, we patch instructions. Direct BLs use offsets. In some cases, after "expanding" the CodeBuffer, the code at such an offset can be a trampoline: is_NativeCallTrampolineStub_at == true. This invalidates the call because the fast path in get_trampoline is used, and the fast path does not iterate over relocations. If the slow path had been taken, we would have patched the instruction correctly.

My current knowledge of the area:

  • In a CodeBuffer, trampolined BLs call themselves. This means their offsets are zero. CodeBuffer::expand breaks this because CallRelocation::fix_relocation_after_move will finalize call sites.
  • Non-trampolined BLs have non-zero offsets which point outside of the current CodeBuffer.
  • In code moved to its final location, direct BLs must have non-zero offsets. Offsets within the CodeBlob mean trampolined calls. Offsets outside the CodeBlob mean non-trampolined calls.

IMO we should fix CodeBuffer::expand. It should go through relocations and fix only those which are not trampolines.

I don't think is_always_within_branch_range() needs any changes. As I wrote its return value is based on static CodeCache information.
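The fixup described above can be sketched with plain address arithmetic. This assumes the convention from the bullets: a trampolined call in a CodeBuffer is a self-call (displacement 0) and should stay that way, while a direct call must keep its absolute target, so its displacement changes by old_pc - new_pc. `fix_displacement_after_move` is a hypothetical helper, not CallRelocation::fix_relocation_after_move.

```cpp
#include <cassert>
#include <cstdint>

// New displacement for a pc-relative call after its code moves from
// old_pc to new_pc.
std::int64_t fix_displacement_after_move(std::uint64_t old_pc, std::uint64_t new_pc,
                                         std::int64_t disp) {
  if (disp == 0) {
    return 0;  // unbound trampolined call: leave it as a self-call
  }
  std::uint64_t absolute_target = old_pc + static_cast<std::uint64_t>(disp);
  return static_cast<std::int64_t>(absolute_target - new_pc);  // keep the same target
}
```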

@eastig
Member

eastig commented Jul 10, 2024

We should also somehow guard CodeBuffer::relocate_code_to so that it can only work with finalized CodeBuffers.

@vnkozlov
Contributor

I don't think is_always_within_branch_range() needs any changes. As I wrote its return value is based on static CodeCache information.

For a runtime call inside the CodeCache, reachable_from_branch_at() can give a different answer when loading AOT code. There is no guarantee that the CodeCache size will be the same.

@eastig
Member

eastig commented Jul 10, 2024

I don't think is_always_within_branch_range() needs any changes. As I wrote its return value is based on static CodeCache information.

For a runtime call inside the CodeCache, reachable_from_branch_at() can give a different answer when loading AOT code. There is no guarantee that the CodeCache size will be the same.

With my limited knowledge of AOT code, I think we should always generate trampoline-based code for AArch64. is_always_within_branch_range() should either not be used for AOT or should always return false. Trampoline calls are optimized to direct calls, if possible, when code is moved into the CodeCache.

@eastig
Member

eastig commented Jul 10, 2024

So for Leyden it sounds like you need to change is_always_within_branch_range().

Or perhaps just adapt MacroAssembler::far_branches(). It returns false if the code cache max range exceeds branch_range. In Leyden we can make it return false when we are generating AOT code.

We might need to adapt target_needs_far_branch for AOT code generation.

@vnkozlov
Contributor

We already return true from Leyden's target_needs_far_branch() for AOT code generation.

While testing this PR with Leyden, I also found that we need to do the same in codestub_branch_needs_far_jump().

And now in is_always_within_branch_range() too.

@dean-long
Member Author

If there is other code calling Assembler::reachable_from_branch_at() directly then you might need to change that function too.

@vnkozlov
Contributor

If there is other code calling Assembler::reachable_from_branch_at() directly then you might need to change that function too.

Yes, I will. But this should not prevent you from pushing your changes. I only ask that you add the "optimization" check (destination == addr) to NativeCall::destination().

@dean-long
Member Author

@eastig, your understanding is correct.

IMO we should fix CodeBuffer::expand. It should go through relocations and fix only those which are not trampolines.

That's roughly what this patch does. I detect expand by checking dest->blob() and orig_addr. However, I don't see an easy way to detect trampoline vs non trampoline calls in the shared code iterator. Instead, I removed the fast-path trampoline lookup during expand and find the trampoline call-sites by iterating their stubs to find owners.

We should also somehow guard CodeBuffer::relocate_code_to so that it can only work with finalized CodeBuffers.

It is used by expand(). But maybe you meant copy_code_to(). I would like to keep additional changes to a minimum, to make back-ports easier. I suggest a separate RFE for further improvements.
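A rough model of the owner-driven approach: each trampoline stub relocation records the offset of the call that owns it, so owners are found from the stubs instead of by pattern-matching call targets, and the fixup runs only when the destination blob is an nmethod. Binding an owner directly when its final target is within BL range follows the earlier remark in this thread that trampoline calls are turned into direct calls when possible. `TrampolineStub` and `fix_owners` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TrampolineStub {
  std::uint32_t stub_offset;   // byte offset of the stub in the blob
  std::uint32_t owner_offset;  // byte offset of the call that uses it
  std::uint64_t destination;   // final target stored in the stub
};

// Returns the new displacement for each owner call, or nothing if the
// destination blob is not an nmethod (no trampoline fixup in that case).
std::vector<std::int64_t> fix_owners(std::uint64_t blob_base,
                                     const std::vector<TrampolineStub>& stubs,
                                     std::int64_t range, bool blob_is_nmethod) {
  std::vector<std::int64_t> disps;
  if (!blob_is_nmethod) {
    return disps;
  }
  for (const TrampolineStub& s : stubs) {
    std::uint64_t call_pc = blob_base + s.owner_offset;
    std::int64_t direct = static_cast<std::int64_t>(s.destination - call_pc);
    if (direct >= -range && direct < range) {
      disps.push_back(direct);  // reachable: bypass the stub
    } else {
      disps.push_back(static_cast<std::int64_t>(s.stub_offset) -
                      static_cast<std::int64_t>(s.owner_offset));  // go through the stub
    }
  }
  return disps;
}
```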

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jul 10, 2024
@dean-long dean-long requested a review from vnkozlov July 10, 2024 21:38
Contributor

@vnkozlov vnkozlov left a comment


Good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 10, 2024
@dean-long
Member Author

I'll wait until tomorrow to push, in case there are still concerns/questions from @eastig or @adinn .

@adinn
Contributor

adinn commented Jul 11, 2024

@dean-long I'm ok with Vladimir's suggestion of just including the "optimization" check. This fixes the problem with processing relocations when saving/restoring AOT code, including in generated stub routines.

n.b. unlike nmethods, generated stub code can contain direct pc-rel branches within the buffer which do not target a trampoline. This happens in the arraycopy stub as one example. However, I don't believe this invalidates your assumptions as to how to handle buffer resize events because buffers used for stubs are pre-allocated large enough to avoid the need for resizing.

@dean-long
Member Author

generated stub code can contain direct pc-rel branches within the buffer which do not target a trampoline

That sounds fine. In fact, they probably don't need to use a Relocation at all (except maybe in Leyden). If a forward reference needs a fixup, it can use a Label.

What would invalidate current assumptions is trying to support trampoline stubs in non-nmethods. We can cross that bridge when we get to it.
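For intra-buffer branches like these, a Label resolves forward references without any Relocation. The following is a deliberately tiny model of that idea (hypothetical `Label`/`Assembler` types storing word displacements, not HotSpot's classes): branches to an unbound label are recorded and patched when the label is bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Label {
  std::int64_t pos = -1;             // bound position (word index), -1 if unbound
  std::vector<std::size_t> pending;  // branch sites waiting for bind()
};

// Each code word holds a branch displacement in words (0 for non-branches).
struct Assembler {
  std::vector<std::int64_t> code;

  void nop() { code.push_back(0); }

  void branch(Label& l) {
    std::size_t site = code.size();
    if (l.pos >= 0) {
      code.push_back(l.pos - static_cast<std::int64_t>(site));  // backward: known now
    } else {
      code.push_back(0);          // placeholder
      l.pending.push_back(site);  // forward: patched in bind()
    }
  }

  void bind(Label& l) {
    assert(l.pos < 0 && "label bound twice");
    l.pos = static_cast<std::int64_t>(code.size());
    for (std::size_t site : l.pending) {
      code[site] = l.pos - static_cast<std::int64_t>(site);
    }
    l.pending.clear();
  }
};
```

Because both the branch and its target live in the same buffer, the displacement stays valid when the whole buffer moves, so no fixup is needed on expansion.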

@dean-long
Member Author

/integrate

@openjdk

openjdk bot commented Jul 11, 2024

Going to push as commit 73e3e0e.
Since your change was applied there have been 202 commits pushed to the master branch:

  • 9eb611e: 8334055: Unhelpful 'required: reference' diagnostics after JDK-8043226
  • 5100303: 8335668: NumberFormat integer only parsing should throw exception for edge case
  • 58c9842: 8336021: Doccheck: valign not allowed for HTML5 in java.xml
  • d06d79c: 8325369: @sealedGraph: Bad link to image for tag on nested classes
  • dea9274: 8332125: [nmt] Totals in diff report should print out total malloc and mmap diffs
  • 5c612c2: 8332689: RISC-V: Use load instead of trampolines
  • 6fcd49f: 8336239: Fix javadoc markup in java.lang.Process
  • b32e4a6: 8335356: Shenandoah: Improve concurrent cleanup locking
  • 62cbf70: 8336085: Fix simple -Wzero-as-null-pointer-constant warnings in CDS code
  • 2928753: 8324966: Allow selecting jtreg test case by ID from make
  • ... and 192 more: https://git.openjdk.org/jdk/compare/974dca80df71c5cbe492d1e8ca5cee76bcc79358...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 11, 2024
@openjdk openjdk bot closed this Jul 11, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 11, 2024
@openjdk

openjdk bot commented Jul 11, 2024

@dean-long Pushed as commit 73e3e0e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants