New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JDK-8274795: AArch64: avoid spilling and restoring r18 in macro assembler #5828
Conversation
r18 should not be used as it is reserved as platform register. Linux is fine with userspace using it, but Windows and also recently macOS ( openjdk/jdk11u-dev#301 (comment) ) are actually using it on the kernel side. The macro assembler uses the bit pattern `0x7fffffff` (== `r0-r30`) to specify which registers to spill; fortunately this helper is only used here: https://github.com/openjdk/jdk/blob/c05dc268acaf87236f30cf700ea3ac778e3b20e5/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp#L1400-L1404 I haven't seen causing this particular instance any issues in practice _yet_, presumably because it looks hard to align the stars in order to trigger a problem (between stp and ldp of r18 a transition to kernel space must happen *and* the kernel needs to do something with r18). But jdk11u-dev has more usages of the `::pusha`/`::popa` macro and that causes troubles as explained in the link above. Output of `-XX:+PrintInterpreter` before this change: ``` ---------------------------------------------------------------------- method entry point (kind = native) [0x0000000138809b00, 0x000000013880a280] 1920 bytes -------------------------------------------------------------------------------- 0x0000000138809b00: ldr x2, [x12, openjdk#16] 0x0000000138809b04: ldrh w2, [x2, openjdk#44] 0x0000000138809b08: add x24, x20, x2, uxtx openjdk#3 0x0000000138809b0c: sub x24, x24, #0x8 [...] 0x0000000138809fa4: stp x16, x17, [sp, openjdk#128] 0x0000000138809fa8: stp x18, x19, [sp, openjdk#144] 0x0000000138809fac: stp x20, x21, [sp, openjdk#160] [...] 0x0000000138809fc0: stp x30, xzr, [sp, openjdk#240] 0x0000000138809fc4: mov x0, x28 ;; 0x10864ACCC 0x0000000138809fc8: mov x9, #0xaccc // #44236 0x0000000138809fcc: movk x9, #0x864, lsl openjdk#16 0x0000000138809fd0: movk x9, #0x1, lsl openjdk#32 0x0000000138809fd4: blr x9 0x0000000138809fd8: ldp x2, x3, [sp, openjdk#16] [...] 0x0000000138809ff4: ldp x16, x17, [sp, openjdk#128] 0x0000000138809ff8: ldp x18, x19, [sp, openjdk#144] 0x0000000138809ffc: ldp x20, x21, [sp, openjdk#160] ``` After: ``` ---------------------------------------------------------------------- method entry point (kind = native) [0x0000000108e4db00, 0x0000000108e4e280] 1920 bytes -------------------------------------------------------------------------------- 0x0000000108e4db00: ldr x2, [x12, openjdk#16] 0x0000000108e4db04: ldrh w2, [x2, openjdk#44] 0x0000000108e4db08: add x24, x20, x2, uxtx openjdk#3 0x0000000108e4db0c: sub x24, x24, #0x8 [...] 0x0000000108e4dfa4: stp x16, x17, [sp, openjdk#128] 0x0000000108e4dfa8: stp x19, x20, [sp, openjdk#144] 0x0000000108e4dfac: stp x21, x22, [sp, openjdk#160] [...] 0x0000000108e4dfbc: stp x29, x30, [sp, openjdk#224] 0x0000000108e4dfc0: mov x0, x28 ;; 0x107E4A06C 0x0000000108e4dfc4: mov x9, #0xa06c // #41068 0x0000000108e4dfc8: movk x9, #0x7e4, lsl openjdk#16 0x0000000108e4dfcc: movk x9, #0x1, lsl openjdk#32 0x0000000108e4dfd0: blr x9 0x0000000108e4dfd4: ldp x2, x3, [sp, openjdk#16] [...] 0x0000000108e4dff0: ldp x16, x17, [sp, openjdk#128] 0x0000000108e4dff4: ldp x19, x20, [sp, openjdk#144] 0x0000000108e4dff8: ldp x21, x22, [sp, openjdk#160] [...] ```
|
Webrevs
|
Skip printing R18 in MacroAssembler::debug64() too |
Better to delete |
is it possible then to predict what to print in debug64 call ? |
Please leave register printing alone.
This is explicit, and doesn't touch anything unnecessarily. |
it's not, right? This PR changes jdk/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp Lines 599 to 610 in df125f6
That said, I think |
Restore looks like this now: ``` 0x0000000106e4dfcc: movk x9, #0x5e4, lsl openjdk#16 0x0000000106e4dfd0: movk x9, #0x1, lsl openjdk#32 0x0000000106e4dfd4: blr x9 0x0000000106e4dfd8: ldp x2, x3, [sp, openjdk#16] 0x0000000106e4dfdc: ldp x4, x5, [sp, openjdk#32] 0x0000000106e4dfe0: ldp x6, x7, [sp, openjdk#48] 0x0000000106e4dfe4: ldp x8, x9, [sp, openjdk#64] 0x0000000106e4dfe8: ldp x10, x11, [sp, openjdk#80] 0x0000000106e4dfec: ldp x12, x13, [sp, openjdk#96] 0x0000000106e4dff0: ldp x14, x15, [sp, openjdk#112] 0x0000000106e4dff4: ldp x16, x17, [sp, openjdk#128] 0x0000000106e4dff8: ldp x0, x1, [sp], openjdk#144 0x0000000106e4dffc: ldp xzr, x19, [sp], openjdk#16 0x0000000106e4e000: ldp x22, x23, [sp, openjdk#16] 0x0000000106e4e004: ldp x24, x25, [sp, openjdk#32] 0x0000000106e4e008: ldp x26, x27, [sp, openjdk#48] 0x0000000106e4e00c: ldp x28, x29, [sp, openjdk#64] 0x0000000106e4e010: ldp x30, xzr, [sp, openjdk#80] 0x0000000106e4e014: ldp x20, x21, [sp], openjdk#96 0x0000000106e4e018: ldur x12, [x29, #-24] 0x0000000106e4e01c: ldr x22, [x12, openjdk#16] 0x0000000106e4e020: add x22, x22, #0x30 0x0000000106e4e024: ldr x8, [x28, openjdk#8] ```
Thanks for the review Andrew. Restore looks like this now:
|
Sorry, stupid mistake: the pop is done in reverse, of course! Please try this (still untested) code.
|
hmm, I think it's fine. Checkout the full listing:
In terms of testing, it seems hard to trigger, so I forced the code path via: diff --git a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp b/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
index d069345b9c0..046c325faa7 100644
--- a/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
@@ -1392,11 +1392,14 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
{
Label no_reguard;
+ Label skip_check;
+ __ b(skip_check);
__ lea(rscratch1, Address(rthread, in_bytes(JavaThread::stack_guard_state_offset())));
__ ldrw(rscratch1, Address(rscratch1));
__ cmp(rscratch1, (u1)StackOverflow::stack_guard_yellow_reserved_disabled);
__ br(Assembler::NE, no_reguard);
+ __ bind(skip_check);
__ push(RegSet::range(r0, r30), sp);
__ mov(c_rarg0, rthread);
__ mov(rscratch2, CAST_FROM_FN_PTR(address, SharedRuntime::reguard_yellow_pages)); which crashes with your most recent suggestion btw, but it's fine with what's in this PR right now. Any better suggestion on how to test it? |
You're right, it looks fine. It's a bit odd but it's really OK. I'd check it simply by stopping at the start of the push, printing all registers, and stepping through. |
So we have fixed the only use of |
Thanks for the suggestion @theRealAph. By looking through the code I discovered another buggy set of helpers, Does it look fine to you, or should the code around |
Sure, so we can do the same thing there, if we stil need it.
Yes, very much so. |
src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
Outdated
Show resolved
Hide resolved
src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
Outdated
Show resolved
Hide resolved
Co-authored-by: Andrew Haley <aph-open@littlepinkcloud.com>
done, thanks! |
@lewurm This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 103 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph) but any other Committer may sponsor as well.
|
/integrate |
@theRealAph would you mind to sponsor this change? |
/sponsor |
Going to push as commit ede3f4e.
Your commit was automatically rebased without conflicts. |
@theRealAph @lewurm Pushed as commit ede3f4e. |
Thank you Andrew! |
r18 should not be used as it is reserved as platform register. Linux is fine with userspace using it, but Windows and also recently macOS (openjdk/jdk11u-dev#301 (comment) ) are actually using it on the kernel side.
The macro assembler uses the bit pattern
0x7fff_ffff
(==r0-r30
) to specify which registers to spill; fortunately this helper is only used here:jdk/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp
Lines 1400 to 1404 in c05dc26
I haven't seen causing this particular instance any issues in practice yet, presumably because it looks hard to align the stars in order to trigger a problem (between
stp
andldp
ofr18
a transition to kernel space must happen and the kernel needs to do something withr18
). But jdk11u-dev has more usages of the::pusha
/::popa
macro and that causes troubles as explained in the link above.Output of
-XX:+PrintInterpreter
before this change:After:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5828/head:pull/5828
$ git checkout pull/5828
Update a local copy of the PR:
$ git checkout pull/5828
$ git pull https://git.openjdk.java.net/jdk pull/5828/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5828
View PR using the GUI difftool:
$ git pr show -t 5828
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5828.diff