Skip to content

8286058: AArch64: clarify types of calls#8564

Closed
eastig wants to merge 6 commits intoopenjdk:masterfrom
eastig:JDK-8286058
Closed

8286058: AArch64: clarify types of calls#8564
eastig wants to merge 6 commits intoopenjdk:masterfrom
eastig:JDK-8286058

Conversation

@eastig
Copy link
Member

@eastig eastig commented May 5, 2022

The PR clarifies the types of calls AArch64 OpenJDK uses. It cleans up far_call, far_jump and trampoline_call. It removes trampoline_call1 because its use cases are now supported by trampoline_call.
Tested a fastdebug build:

  • gtest: Passed
  • tier1...tier3: Passed

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8564/head:pull/8564
$ git checkout pull/8564

Update a local copy of the PR:
$ git checkout pull/8564
$ git pull https://git.openjdk.java.net/jdk pull/8564/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8564

View PR using the GUI difftool:
$ git pr show -t 8564

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8564.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 5, 2022

👋 Welcome back eastig! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 5, 2022
@openjdk
Copy link

openjdk bot commented May 5, 2022

@eastig The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 5, 2022
@mlbridge
Copy link

mlbridge bot commented May 5, 2022

Webrevs

@eastig
Copy link
Member Author

eastig commented May 5, 2022

@vnkozlov, @theRealAph
Could you please review?

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

@vnkozlov
Copy link
Contributor

vnkozlov commented May 6, 2022

I will run tests too to make sure we don't hit asserts.

@openjdk
Copy link

openjdk bot commented May 6, 2022

@eastig This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8286058: AArch64: clarify types of calls

Reviewed-by: kvn, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 135 new commits pushed to the master branch:

  • 6a33974: 8286737: Test vmTestbase/gc/gctests/WeakReference/weak006/weak006.java fails: Last soft reference has not been cleared
  • 295be6f: 8287285: Avoid redundant HashMap.containsKey call in java.util.zip.ZipFile.Source.get
  • 7cb368b: 8286709: (fc) FileChannel/FileChannelImpl cleanup
  • 7eb1559: 8286045: Use ForceGC for cleaner test cases
  • e44465d: 8287336: GHA: Workflows break on patch versions
  • c10749a: 8287187: Utilize HashMap.newHashMap() in CLDRConverter
  • f235955: 8287246: DSAKeyValue should check for missing params instead of relying on KeyFactory provider
  • f58c9a6: 8287244: Add bound check in indexed memory access var handle
  • f710393: 8257810: Only First page are printed in JTable.scrollRectToVisible
  • 704b9a6: 8287107: CgroupSubsystemFactory.setCgroupV2Path asserts with freezer controller
  • ... and 125 more: https://git.openjdk.java.net/jdk/compare/81e4bdbe1358b7feced08ba758ddb66415968036...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @theRealAph) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 6, 2022
// with the call of the target.
//
// Trampoline_call is most suitable for calls of Java methods. Java calls callees can be changed
// to the interpreter or different versions of a compiled method. Those callees can be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. Trampoline calls are needed for calls to the runtime too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? What are the cases? This is not clear from sources.

// The code for runtime calls can also be generated with far_call. For possible far-distant callees
// far_call does not use the stub code section for additional code. It inserts the code at a call site.
// This prevents the call from optimization to a direct call when the code is copied to CodeCache.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't understand any of this. As far as I can tell it is nonsense.
It introduces pointless confusing terminology such as "far-distant" and "near-distant".
"This prevents the call from optimization to a direct cal" is completely wrongl

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This prevents the call from optimization to a direct cal" is completely wrongl

If I am wrong, could you please point me at the place where 'adrp, add, bl' is optimized?

As far as I can tell it is nonsense.

Could you please be more constructive?

It introduces pointless confusing terminology such as "far-distant" and "near-distant".

Please, offer your variant. There is no standard terminology:

We can use: short/long like GCC. What do you think?

//
// If a mark of the generated call BL is needed, a pointer to CodeBuffer keeping the generated code
// must be provided.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "If a mark of the generated call BL is needed" mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is done in the function:

address MacroAssembler::trampoline_call(Address entry, CodeBuffer* cbuf = NULL) {
...
  if (cbuf) cbuf->set_insts_mark();
  relocate(entry.rspec());
  if (!far_branches()) {
    bl(entry.target());
  } else {
    bl(pc());
  }

And most of the time NULL is passed.
I don't know how this code passed review. It smells badly.
What written is based on what I see: how the function is used. For example, can cbuf be any CodeBuffer? If not, how is it connected to the current CodeBuffer? If it is the same, why we pass a pointer but not a flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't realize what "mark of the generated call" meant.

Good point. I think it is likely that the inst_mark is no longer needed by the AArch64 back end. When we were experimenting with trampoline calls we did a lot of experiments, and I believe that inst_mark was needed at one time. However, I do not think that any current user of trampoline_call() uses the insts_mark for anything, and it could usefully be removed as part of a cleanup.

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commentary is garbled, misleading, and very confusing. I suspect that anyone confused about how trampoline calls work will be even more confused after trying to read this.
Can you please explain what aspects of trampolines you're trying to clarify? Let's have a conversation about how to document this.

@eastig
Copy link
Member Author

eastig commented May 6, 2022

This commentary is garbled, misleading, and very confusing.

Could you please be more neutral in your words?

I suspect that anyone confused about how trampoline calls work will be even more confused after trying to read this.

This is why we have the review process to make things better.

Can you please explain what aspects of trampolines you're trying to clarify? Let's have a conversation about how to document this.

I am trying to clarify why we have two ways for long calls, pros/cons, which of two can be optimized. This mechanism of long call is very important part but it is not documented.

@theRealAph
Copy link
Contributor

theRealAph commented May 6, 2022 via email

@mlbridge
Copy link

mlbridge bot commented May 6, 2022

Mailing list message from Andrew Haley on hotspot-dev:

On 5/6/22 11:09, Evgeny Astigeevich wrote:

On Fri, 6 May 2022 08:08:48 GMT, Andrew Haley <aph at openjdk.org> wrote:

Can you please explain what aspects of trampolines you're trying to clarify? Let's have a conversation about how to document this.

I am trying to clarify why we have two ways for long calls, pros/cons, which of two can be optimized. This mechanism of long call is very important part but it is not documented.

OK. Please see my other response, which explains the details of how and why
the three types of calls are used. If there's anything missing from there
please ask, and I'll explain more.

@theRealAph
Copy link
Contributor

Could you please be more neutral in your words?

I apologize for my use of words. It was an inappropriate thing to say, and I wish I hadn't said it.

If you would like to continue, perhaps we could work on this together, and come up with something we both like. Would you be happy to try that?

I think it could look something like a list. Here's a rough sketch of what I think might work.


AArch64 OpenJDK uses four different types of call

direct call: bl address

  This is the shortest and the fastest, but only has a reach of ...

far call: adrp; add; bl reg

  This is longer and slower than a direct call, but can reach anywhere
  in the code cache...

trampoline call:

  This is only available in C2-compiled code. It is a combination of a
  direct call, which is used if the destination of a call is in range,
  and a register-indirect call. It has the advantages that it can
  reach anywhere in the AArch64 address space and that it is patchable
  at runtime when the generated code is being executed by other
  threads.

     bl trampoline

   Stub:

   trampoline:
     ldr reg, pc + 8
     br reg
     <64-bit destination address>   

indirect call: move reg, address; blr reg

   This too can reach anywhere in the address space, but it cannot be
   patch while code is running, so it must be used at a safepoint.

@eastig
Copy link
Member Author

eastig commented May 12, 2022

Hi Andrew,

If you would like to continue, perhaps we could work on this together, and come up with something we both like. Would you be happy to try that?

Sorry for the late response. I am on a business trip. Yes, I am keen to finish this.
BTW, we have a new trampoline_call1: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L1101. This is a surprise from the project Loom. I'll try to handle it as well.

I apologize for my use of words. It was an inappropriate thing to say, and I wish I hadn't said it.

Thank you. Apologies accepted.

Thank you for the details. They helped a lot. Most of them are aligned with what I've read in sources and seen in a debugger.

trampoline call:
...
It has the advantages that it can
reach anywhere in the AArch64 address space.

I created a bug: https://bugs.openjdk.java.net/browse/JDK-8286314. With small CodeCache, trampolines are not created for out of range targets which are outside CodeCache.

Ah, I think I see. Are you saying that a far call is not
converted to a direct call when the code is moved into the code
cache, even if a direct call might reach its target? And that far
calls do not need a trampoline.

I wanted to say that trampoline calls support link-time optimization: replacing a trampoline call by a direct one. Link-time optimization are not applied to far calls at the moment. Could I write in this way?

BTW, if we are not going to relocate code (this is true for non-nmethod), we can patch far calls as well. During copying code to CodeCache, we can:

  • keep adrp;add but replace blr with bl
  • replace adrp;add;blr with nop;nop;bl

What do you think?

@theRealAph
Copy link
Contributor

Hi,

Thank you for the details. They helped a lot. Most of them are aligned with what I've read in sources and seen in a debugger.

trampoline call:
...
It has the advantages that it can
reach anywhere in the AArch64 address space.

I created a bug: https://bugs.openjdk.java.net/browse/JDK-8286314. With small CodeCache, trampolines are not created for out of range targets which are outside CodeCache.

Oh! Thank you. I wonder why we never saw that one before. I guess because we don't normally have a small-enough CodeCache.

I wanted to say that trampoline calls support link-time optimization: replacing a trampoline call by a direct one. Link-time optimization are not applied to far calls at the moment. Could I write in this way?

Sounds good.

BTW, if we are not going to relocate code (this is true for non-nmethod), we can patch far calls as well. During copying code to CodeCache, we can:

* keep `adrp;add` but replace `blr` with `bl`

* replace `adrp;add;blr` with  `nop;nop;bl`

What do you think?

Either sounds fine. I guess the latter will be a bit more efficient.

@eastig
Copy link
Member Author

eastig commented May 19, 2022

Hi Andrew,

JFYI

However, I do not think that any current user of trampoline_call() uses the insts_mark for anything, and it could usefully be removed as part of a cleanup.

I have removed this. There were assert crashes in fastdebug because trampoline_call must be connected with CompiledStaticCall::emit_to_interp_stub to have correct relocInfo records. I fixed this.

BTW, we have a new trampoline_call1: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L1101. This is a surprise from the project Loom. I'll try to handle it as well.

I am currently removing trampoline_call1.

@eastig eastig changed the title 8286058: AArch64: clarify uses of MacroAssembler::trampoline_call 8286058: AArch64: clarify uses of calls May 19, 2022
@eastig eastig changed the title 8286058: AArch64: clarify uses of calls 8286058: AArch64: clarify types of calls May 19, 2022
@eastig
Copy link
Member Author

eastig commented May 24, 2022

Hi Andrew,

The final version is here. Could you please review it?

}

// Far_call and far_jump generate a call of/jump to the provided address.
// Emit a direct call/jump if the entry address is always in range,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Emit a direct call/jump if the entry address is always in range,
// Emit a direct call/jump if the entry address will always in be range,

@theRealAph
Copy link
Contributor

Hi Andrew,

The final version is here. Could you please review it?

It might be a good idea to handle the code changes in another patch later. I don't think we should push minor code changes at this point before JDK 19 rampdown, and this is a doc patch.

@eastig
Copy link
Member Author

eastig commented May 25, 2022

Hi Andrew,
The final version is here. Could you please review it?

It might be a good idea to handle the code changes in another patch later. I don't think we should push minor code changes at this point before JDK 19 rampdown, and this is a doc patch.

Completely agree. I'll move them into separate PRs:

  • Remove trampoline_call1
  • Remove the cbuf argument from far_call/far_jump/trampoline_call.

@eastig
Copy link
Member Author

eastig commented May 26, 2022

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 26, 2022
@openjdk
Copy link

openjdk bot commented May 26, 2022

@eastig
Your change (at version 0f1fb9b) is now ready to be sponsored by a Committer.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and understandable.

@eastig
Copy link
Member Author

eastig commented May 27, 2022

  • Remove trampoline_call1

https://bugs.openjdk.java.net/browse/JDK-8287393

  • Remove the cbuf argument from far_call/far_jump/trampoline_call.

https://bugs.openjdk.java.net/browse/JDK-8287394

@eastig
Copy link
Member Author

eastig commented May 27, 2022

@vnkozlov, @theRealAph Thank you for reviewing.
Could you sponsor it?

@vnkozlov
Copy link
Contributor

/sponsor

@openjdk
Copy link

openjdk bot commented May 27, 2022

Going to push as commit 140419f.
Since your change was applied there have been 135 commits pushed to the master branch:

  • 6a33974: 8286737: Test vmTestbase/gc/gctests/WeakReference/weak006/weak006.java fails: Last soft reference has not been cleared
  • 295be6f: 8287285: Avoid redundant HashMap.containsKey call in java.util.zip.ZipFile.Source.get
  • 7cb368b: 8286709: (fc) FileChannel/FileChannelImpl cleanup
  • 7eb1559: 8286045: Use ForceGC for cleaner test cases
  • e44465d: 8287336: GHA: Workflows break on patch versions
  • c10749a: 8287187: Utilize HashMap.newHashMap() in CLDRConverter
  • f235955: 8287246: DSAKeyValue should check for missing params instead of relying on KeyFactory provider
  • f58c9a6: 8287244: Add bound check in indexed memory access var handle
  • f710393: 8257810: Only First page are printed in JTable.scrollRectToVisible
  • 704b9a6: 8287107: CgroupSubsystemFactory.setCgroupV2Path asserts with freezer controller
  • ... and 125 more: https://git.openjdk.java.net/jdk/compare/81e4bdbe1358b7feced08ba758ddb66415968036...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 27, 2022
@openjdk openjdk bot closed this May 27, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 27, 2022
@openjdk
Copy link

openjdk bot commented May 27, 2022

@vnkozlov @eastig Pushed as commit 140419f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

shqking added a commit to shqking/jdk that referenced this pull request Nov 17, 2022
…c_call

1) After the fix of JDK-8287394, there is no need for clear_inst_mark
after trampoline_call. See the discussion in [1].

2) MacroAssembler::ic_call has trampoline_call as the last call.

Hence, clear_inst_mark after MacroAssembler::ic_call can be removed.
There is such a case in aarch64_enc_java_dynamic_call. We conduct the
cleanup in this patch.

Testing: tier1~3 passed with no new failures on Linux/AArch64 platform.

[1] openjdk#8564 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants