8286058: AArch64: clarify types of calls #8564
@vnkozlov, @theRealAph
I will run tests too to make sure we don't hit asserts.
@eastig This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 135 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @theRealAph) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
// with the call of the target.
//
// Trampoline_call is most suitable for calls of Java methods. Java calls callees can be changed
// to the interpreter or different versions of a compiled method. Those callees can be
This is not true. Trampoline calls are needed for calls to the runtime too.
Why? What are the cases? This is not clear from the sources.
// The code for runtime calls can also be generated with far_call. For possible far-distant callees
// far_call does not use the stub code section for additional code. It inserts the code at a call site.
// This prevents the call from optimization to a direct call when the code is copied to CodeCache.
//
I can't understand any of this. As far as I can tell it is nonsense.
It introduces pointless confusing terminology such as "far-distant" and "near-distant".
"This prevents the call from optimization to a direct call" is completely wrong.
> "This prevents the call from optimization to a direct call" is completely wrong.
If I am wrong, could you please point me at the place where 'adrp, add, bl' is optimized?
> As far as I can tell it is nonsense.
Could you please be more constructive?
> It introduces pointless confusing terminology such as "far-distant" and "near-distant".
Please offer your variant. There is no standard terminology:
- GCC uses: long/short https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/ARM-Function-Attributes.html
- TI uses: relative/absolute/far away/near call https://downloads.ti.com/docs/esd/SPNU118/generate-far-call-trampolines-trampolines-option-stdz0755016.html
- ARM always calls trampolines veneers: https://developer.arm.com/documentation/dui0803/d/pge1406301797482
We can use: short/long like GCC. What do you think?
//
// If a mark of the generated call BL is needed, a pointer to CodeBuffer keeping the generated code
// must be provided.
//
What does "If a mark of the generated call BL is needed" mean?
This is what is done in the function:
address MacroAssembler::trampoline_call(Address entry, CodeBuffer* cbuf = NULL) {
  ...
  if (cbuf) cbuf->set_insts_mark();
  relocate(entry.rspec());
  if (!far_branches()) {
    bl(entry.target());
  } else {
    bl(pc());
  }
And most of the time NULL is passed.
I don't know how this code passed review. It smells bad.
What I wrote is based on what I see: how the function is used. For example, can cbuf be any CodeBuffer? If not, how is it connected to the current CodeBuffer? If it is the same, why do we pass a pointer rather than a flag?
Sorry, I didn't realize what "mark of the generated call" meant.
Good point. I think it is likely that the inst_mark is no longer needed by the AArch64 back end. When we were experimenting with trampoline calls we did a lot of experiments, and I believe that inst_mark was needed at one time. However, I do not think that any current user of trampoline_call() uses the insts_mark for anything, and it could usefully be removed as part of a cleanup.
theRealAph left a comment
This commentary is garbled, misleading, and very confusing. I suspect that anyone confused about how trampoline calls work will be even more confused after trying to read this.
Can you please explain what aspects of trampolines you're trying to clarify? Let's have a conversation about how to document this.
Could you please be more neutral in your words?
This is why we have the review process to make things better.
I am trying to clarify why we have two ways of making long calls, their pros and cons, and which of the two can be optimized. This long-call mechanism is a very important part, but it is not documented.
On 5/6/22 10:42, Evgeny Astigeevich wrote:
***@***.**** commented on this pull request.
--------
In src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp <#8564 (comment)>:
> + // If the distance to the address can exceed the branch range
+ // (128M for the release build, 2M for the debug build; see branch_range definition)
+ // for direct calls(BL), a special code with BR (trampoline) is put in the stub code section.
+ // The call is redirected to it. When CodeBuffer is copied to CodeCache, the distance to
+ // callee's address is checked to bypass the trampoline by replacing the call of the trampoline
+ // with the call of the target.
+ //
+ // Trampoline_call is most suitable for calls of Java methods. Java calls callees can be changed
+ // to the interpreter or different versions of a compiled method. Those callees can be
+ // near-distant or far-distant. Trampoline_call supports switching between near-distant callees
+ // and far-distant callees by having a reserved trampoline. The trampoline is only used if needed.
+ //
+ // The code for runtime calls can also be generated with far_call. For possible far-distant callees
+ // far_call does not use the stub code section for additional code. It inserts the code at a call site.
+ // This prevents the call from optimization to a direct call when the code is copied to CodeCache.
+ //
> "This prevents the call from optimization to a direct call" is completely wrong.
> If I am wrong, could you please point me at the place where 'adrp, add, bl' is optimized?
Ah, I think I see. Are you saying that a far call is not
converted to a direct call when the code is moved into the code
cache, even if a direct call might reach its target? And that far
calls do not need a trampoline. OK.
> As far as I can tell it is nonsense.
> Could you please be more constructive?
I did not understand what you were saying. And I wrote all of the
code you're trying to describe.
> It introduces pointless confusing terminology such as "far-distant" and "near-distant".
> Please, offer your variant. There is no standard terminology:
The names we already use are direct call, far call, and
trampoline call. Direct calls can reach +/- 128M, so unless our
code cache is < 128M we can't always use them. Far calls have a
range of 4G, so they can reach anything in the code cache, but not
C++ code in a shared library. Trampoline calls can reach anywhere
in the address space.
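The three reaches described above can be modelled in a few lines. This is a minimal C++ sketch, not HotSpot code: `CallKind`, `reachable_by_bl`, `reachable_by_adrp` and `cheapest_call` are hypothetical names, and the sketch assumes a far call is ADRP-based (about ±4 GiB, counted in 4 KiB pages) while a trampoline loads a full 64-bit address and so can reach anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of call kinds; not HotSpot's API.
enum class CallKind { Direct, Far, Trampoline };

// BL encodes a signed 26-bit word offset: reach is [-128 MiB, +128 MiB - 4].
constexpr int64_t kBranchRange = int64_t(128) << 20;

// ADRP encodes a signed 21-bit offset in 4 KiB pages: reach is about +/-4 GiB.
constexpr int64_t kPageSize = 4096;
constexpr int64_t kAdrpRange = int64_t(4) << 30;

bool reachable_by_bl(uint64_t pc, uint64_t target) {
  int64_t offset = static_cast<int64_t>(target - pc);
  return offset >= -kBranchRange && offset < kBranchRange;
}

bool reachable_by_adrp(uint64_t pc, uint64_t target) {
  // ADRP works on page addresses, so compare the 4 KiB-aligned pages.
  const uint64_t page_mask = ~uint64_t(kPageSize - 1);
  int64_t page_offset = static_cast<int64_t>((target & page_mask) - (pc & page_mask));
  return page_offset >= -kAdrpRange && page_offset < kAdrpRange;
}

// Pick the cheapest kind of call that can reach `target` from `pc`.
CallKind cheapest_call(uint64_t pc, uint64_t target) {
  assert(pc % 4 == 0 && target % 4 == 0);  // A64 instructions are 4-byte aligned
  if (reachable_by_bl(pc, target)) return CallKind::Direct;
  if (reachable_by_adrp(pc, target)) return CallKind::Far;
  // A trampoline loads a 64-bit address and branches through a register.
  return CallKind::Trampoline;
}
```

For example, in this model a callee 64 MiB away can take a direct BL, one 1 GiB away needs a far call, and a callee in a shared library mapped far from the code cache needs a trampoline.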
When we're generating code in C2, we can always generate a
trampoline, and trampolines can reach anywhere. Outside C2 we may
not have a stub section so we can't generate a trampoline.
However, we can generate a far call, which can reach anywhere in
the code cache, but not the entire address space.
When patching code at runtime we are restricted by the rules in
the Arm ARM. We can replace a call with another call, but we
can't, for example, replace an ADRP with a CALL, or a CALL with
an ADRP.
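Part of why call-to-call patching is the easy case shows up in the BL encoding: retargeting a BL rewrites one aligned 32-bit word, and the result is still a BL. The sketch below assumes the standard A64 encoding (bits [31:26] = 0b100101, signed word offset in imm26); `encode_bl`, `bl_target` and `retarget_bl` are hypothetical helpers, and the sketch does not model cache maintenance or the other concurrent-modification rules of the Arm ARM.

```cpp
#include <cassert>
#include <cstdint>

// A64 BL: bits [31:26] = 0b100101, bits [25:0] = imm26 (byte offset / 4).
constexpr uint32_t kBlOpcode = 0x94000000u;
constexpr uint32_t kBlOpcodeMask = 0xFC000000u;
constexpr uint32_t kImm26Mask = 0x03FFFFFFu;

bool is_bl(uint32_t insn) { return (insn & kBlOpcodeMask) == kBlOpcode; }

uint32_t encode_bl(uint64_t pc, uint64_t target) {
  int64_t offset = static_cast<int64_t>(target - pc);
  assert(offset % 4 == 0);
  assert(offset >= -(int64_t(1) << 27) && offset < (int64_t(1) << 27));
  return kBlOpcode | (static_cast<uint32_t>(offset >> 2) & kImm26Mask);
}

uint64_t bl_target(uint64_t pc, uint32_t insn) {
  assert(is_bl(insn));
  // Sign-extend imm26: move it to the top of a 32-bit word, then shift back.
  int32_t imm26 = static_cast<int32_t>((insn & kImm26Mask) << 6) >> 6;
  return pc + static_cast<int64_t>(imm26) * 4;
}

// Retargeting rewrites a single 4-byte word, and the word stays a BL:
// the "replace a call with another call" case described above.
void retarget_bl(uint32_t* insn, uint64_t pc, uint64_t new_target) {
  assert(is_bl(*insn));
  *insn = encode_bl(pc, new_target);
}
```

By contrast, turning an ADRP-based sequence into a BL (or the reverse) would change the kind of instruction at the patched address, which is the case the text says the Arm ARM rules do not allow.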
Mailing list message from Andrew Haley on hotspot-dev: On 5/6/22 11:09, Evgeny Astigeevich wrote:
OK. Please see my other response, which explains the details of how and why
I apologize for my use of words. It was an inappropriate thing to say, and I wish I hadn't said it. If you would like to continue, perhaps we could work on this together, and come up with something we both like. Would you be happy to try that? I think it could look something like a list. Here's a rough sketch of what I think might work.
Hi Andrew,
Sorry for the late response. I am on a business trip. Yes, I am keen to finish this.
Thank you. Apologies accepted. Thank you for the details. They helped a lot. Most of them are aligned with what I've read in sources and seen in a debugger.
I created a bug: https://bugs.openjdk.java.net/browse/JDK-8286314. With a small CodeCache, trampolines are not created for out-of-range targets that are outside the CodeCache.
I wanted to say that trampoline calls support link-time optimization: replacing a trampoline call with a direct one. Link-time optimization is not applied to far calls at the moment. Could I write it this way? BTW, if we are not going to relocate code (this is true for non-nmethods), we can patch far calls as well. During copying code to CodeCache, we can:
What do you think?
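The copy-time optimization of trampoline calls discussed here can be sketched with a toy model: one BL plus a reserved stub holding a 64-bit destination. `TrampolineCall` and `fix_up_call` are hypothetical and stand in for HotSpot's relocation processing, which this sketch does not reproduce; it only shows the decision to bypass the stub when the BL at its final address can reach the destination, and otherwise to branch to the stub.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not HotSpot's data structures.
struct TrampolineCall {
  uint64_t call_pc;           // final address of the BL in the code cache
  uint64_t stub_pc;           // final address of the reserved trampoline stub
  uint64_t stub_destination;  // 64-bit destination stored in the stub
  uint64_t bl_target;         // where the BL currently branches
};

constexpr int64_t kBranchRange = int64_t(128) << 20;  // BL reach: +/-128 MiB

bool in_bl_range(uint64_t from, uint64_t to) {
  int64_t offset = static_cast<int64_t>(to - from);
  return offset >= -kBranchRange && offset < kBranchRange;
}

// Runs once the call has its final address, e.g. after copying into the code
// cache. Returns true when the trampoline is bypassed.
bool fix_up_call(TrampolineCall& call, uint64_t destination) {
  call.stub_destination = destination;  // the stub stays usable either way
  if (in_bl_range(call.call_pc, destination)) {
    call.bl_target = destination;  // direct call; the stub is kept but unused
    return true;
  }
  // The stub sits in the same blob's stub section, so the BL must reach it.
  assert(in_bl_range(call.call_pc, call.stub_pc));
  call.bl_target = call.stub_pc;  // branch to the stub, which can jump anywhere
  return false;
}
```

Keeping the stub even when it is bypassed is what lets a later re-patch switch to a far-away callee. A far call has no such stub, so in this model its ADRP-based sequence would simply stay as it is.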
Hi,
Oh! Thank you. I wonder why we never saw that one before. I guess because we don't normally have a small-enough CodeCache.
Sounds good.
Either sounds fine. I guess the latter will be a bit more efficient.
Hi Andrew, JFYI
I have removed this. There were assert crashes in fastdebug because trampoline_call must be connected with CompiledStaticCall::emit_to_interp_stub to have correct relocInfo records. I fixed this.
I am currently removing
Hi Andrew, The final version is here. Could you please review it?
// Far_call and far_jump generate a call of/jump to the provided address.
// Emit a direct call/jump if the entry address is always in range,
Suggested change
- // Emit a direct call/jump if the entry address is always in range,
+ // Emit a direct call/jump if the entry address will always be in range,
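One way to read "will always be in range": if the whole reserved code cache fits within BL reach, any call from one code cache address to another can be a single BL. The sketch below assumes that size-based reading (the trampoline_call excerpt quoted in this review tests far_branches(), whose definition is not shown here) and uses the release-build 128 MiB range; `needs_far_branches` and `far_call_sequence` are hypothetical, and the returned strings are only illustrative assembly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Release-build BL reach assumed by this sketch (128 MiB).
constexpr uint64_t kBranchRange = uint64_t(128) << 20;

// Hypothetical check: when the reserved code cache is no larger than the BL
// reach, every code cache address can reach every other one with a single BL.
bool needs_far_branches(uint64_t reserved_code_cache_size) {
  assert(reserved_code_cache_size > 0);
  return reserved_code_cache_size > kBranchRange;
}

// Illustrative instruction sequence for a far call to a target that is
// assumed to lie inside the code cache.
std::vector<std::string> far_call_sequence(uint64_t reserved_code_cache_size) {
  if (!needs_far_branches(reserved_code_cache_size)) {
    return {"bl target"};  // the target is always in BL range
  }
  // ADRP forms the 4 KiB page address (about +/-4 GiB reach), ADD supplies the
  // low 12 bits, and BLR calls through the scratch register.
  return {"adrp rscratch1, target",
          "add rscratch1, rscratch1, :lo12:target",
          "blr rscratch1"};
}
```

The decision in this model is static: it depends on the code cache size, not on the particular call site, which is why the emitted call can be "always" in range.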
It might be a good idea to handle the code changes in another patch later. I don't think we should push minor code changes at this point before JDK 19 rampdown, and this is a doc patch.
Completely agree. I'll move them into separate PRs:
/integrate
vnkozlov left a comment
Looks good and understandable.
https://bugs.openjdk.java.net/browse/JDK-8287393
@vnkozlov, @theRealAph Thank you for reviewing.
/sponsor
Going to push as commit 140419f.
Your commit was automatically rebased without conflicts.
…c_call
1) After the fix of JDK-8287394, there is no need for clear_inst_mark after trampoline_call. See the discussion in [1].
2) MacroAssembler::ic_call has trampoline_call as the last call. Hence, clear_inst_mark after MacroAssembler::ic_call can be removed. There is such a case in aarch64_enc_java_dynamic_call.
We conduct the cleanup in this patch.
Testing: tier1~3 passed with no new failures on Linux/AArch64 platform.
[1] openjdk#8564 (comment)
The PR clarifies the types of calls AArch64 OpenJDK uses. It cleans up far_call, far_jump and trampoline_call. It removes trampoline_call1 because its use cases are now supported by trampoline_call.
Tested a fastdebug build:
- gtest: Passed
- tier1...tier3: Passed

Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8564/head:pull/8564
$ git checkout pull/8564
Update a local copy of the PR:
$ git checkout pull/8564
$ git pull https://git.openjdk.java.net/jdk pull/8564/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8564
View PR using the GUI difftool:
$ git pr show -t 8564
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8564.diff