8260355: AArch64: deoptimization stub should save vector registers #2279

nick-arm wants to merge 4 commits into openjdk:master
Conversation
This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
doesn't save vector registers on x86". The problem is that a vector
produced by the Vector API may be stored in a register when the deopt
blob is called. Because the deopt blob only stores the lower half of
vector registers, the full vector object cannot be rematerialized during
deoptimization. So the following will crash on AArch64 with current JDK:
make test TEST="jdk/incubator/vector" \
JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
The fix is to store the full vector registers by passing
save_vectors=true to save_live_registers() in the deopt blob. Because
save_live_registers() places the integer registers above the floating
registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
to calculate the SP offset based on whether full vectors were saved, and
whether those vectors were NEON or SVE, rather than using a static
offset as it does currently.
The change to VectorSupport::allocate_vector_payload_helper() is
required because we only store the lowest VMReg slot in the oop map.
However unlike x86 the vector registers are always saved in a contiguous
region of memory, so we can calculate the address of each vector element
as an offset from the address of the first slot. X86 handles this in
RegisterMap::pd_location() but that won't work on AArch64 because with
SVE there isn't a unique VMReg corresponding to each four-byte physical
slot in the vector (there are always exactly eight logical VMRegs
regardless of the actual vector length).
Tested hotspot_all_no_apps and jdk_core.
👋 Welcome back ngasson! A progress list of the required criteria for merging this PR into the master branch has been added to the pull request.

Webrevs
Mailing list message from Andrew Haley on hotspot-dev: On 1/28/21 8:31 AM, Nick Gasson wrote:

It seems to me that save_vectors is only set here:

    bool save_vectors = COMPILER2_OR_JVMCI != 0;

which means that save_vectors is a static property of a build, not something […] Also, I'm wondering how much all of this complexity gains us for the sake […]
for (int i = 0; i < num_elem; i++) {
    int vslot = (i * elem_size) / VMRegImpl::stack_slot_size;
    int off = (i * elem_size) % VMRegImpl::stack_slot_size;
bool contiguous = X86_ONLY(false) NOT_X86(true);
I don't like this change. It's not x86-specific, but SVE-specific code. What is broken here is VMReg::next() doesn't work properly for VecA registers. And, as a result, it makes RegisterMap::location(VMReg) unusable as well.
So, a proper fix should address that instead. If there's no way to save VMReg::next() and RegisterMap::location(VMReg), then a new cross-platform API should be introduced and VectorSupport::allocate_vector_payload_helper() migrated to it.
For Arm NEON (and PPC) we don't set VMReg::next() in the oop map either, and their vector slots are contiguous, so is that x86-specific? But yes, NEON could also generate a correct full oop map since it has a fixed vector size. For SVE, I see no way to provide proper VMReg::next() support, so Nick's solution looks good to me. Regarding introducing a new cross-platform API, which API do you mean? If we could have some better API, that would be perfect. Currently, allocate_vector_payload_helper() is the only vector-related caller of RegisterMap::location() I can see.
Probably, x86 is unique in using a non-contiguous representation for vector values, but that doesn't make the code in question x86-specific. AArch64 is the only user of VecA, and VecA is the only register type that has a mismatch in size between its in-memory and RegMask representations. So, I conclude it is AArch64/SVE-specific.
On x86 RegisterMap isn't fully populated for vector registers either, but there's RegisterMap::pd_location() to cover that.
Regarding new API, I mean the alternative to VMReg::next()/RegisterMap::location(VMReg) which is able to handle VecA case well. As Nick pointed out earlier, the problem with VecA is that there's no VMReg representation for all the slots which comprise the register value.
Either enhancing VMReg::next(int) to produce special values for the VecA case or introducing RegisterMap::location(VMReg base_reg, int slot) would be a better way to handle the problem.
@iwanowww please take a look at the latest set of changes and let me know what you think. There's now a RegisterMap::location(VMReg base_reg, int slot) method as you suggest. That in turn uses a new method VMReg::is_expressible(int slot_delta) which is true if offsetting a VMReg by slot_delta slots gives another valid VMReg which is also a slot of the same physical register (i.e. reg->next(slot_delta) is valid). We can use this to fall back to pd_location if a slot of a vector is not expressible as a VMReg (i.e. for SVE). Unfortunately it touches a lot of files but that seems unavoidable.
Mailing list message from Nick Gasson on hotspot-dev: On 01/28/21 17:48 pm, Andrew Haley wrote:
RegisterSaver is also used by generate_resolve_blob (which never saves […])

    RegisterSaver reg_save(COMPILER2_OR_JVMCI != 0 /* save_vectors */);

Which avoids passing save_vectors around everywhere.

For NEON the difference is 768 bytes vs 512, but SVE could be a lot […]

    83 // FIXME -- this is used by C1

Do you remember what this is referring to? That it's duplicating […]
Mailing list message from Andrew Haley on hotspot-dev: On 1/29/21 7:53 AM, Nick Gasson wrote:
That sounds like a great improvement.
OK, so it probably wouldn't be worth doing on NEON. But A64FX vectors are 64 […]

Probably, yes.
Much better, thanks. I suggest the following changes:
Or, as an alternative (since all the registers are stored contiguously on AArch64 anyway):
I've changed it as suggested. This way seems much simpler, thanks.
iwanowww left a comment:
RegisterMap-related changes look good.
@nick-arm This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be: […] You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 107 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@theRealAph are the sharedRuntime_aarch64.cpp changes ok?
Mailing list message from Andrew Haley on hotspot-dev: On 2/3/21 7:02 AM, Nick Gasson wrote:
I guess so, but the code changes are so complex and delicate it's extremely […] What have you done about stress testing? I guess we need some code that's […]
Mailing list message from Nick Gasson on hotspot-dev: On 02/03/21 17:36 pm, Andrew Haley wrote:
I tried make bootcycle-images as you suggest with -XX:+DeoptimizeALot […] I've also previously run the tier1 and java/incubator/vector/* tests […]
Mailing list message from Andrew Haley on hotspot-dev: On 2/4/21 7:21 AM, Nick Gasson wrote:
Yeah. The problem here is that safepoints with live vectors aren't so […]
Mailing list message from Nick Gasson on hotspot-dev: On 02/04/21 16:18 pm, Andrew Haley wrote:
You can test that situation quite readily with:

    make test TEST="jdk/incubator/vector" \
        JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"

Which will segfault with current JDK. I guess the difficulty is showing […]
Mailing list message from Vladimir Ivanov on hotspot-dev:
FTR jdk/java/incubator/vector tests w/ -XX:+DeoptimizeALot are very good […]

Best regards,
Mailing list message from Andrew Haley on hotspot-dev: On 2/4/21 10:03 AM, Vladimir Ivanov wrote:
Great, thanks.
@theRealAph Is this one ok to push now?
/integrate
@nick-arm Since your change was applied there have been 123 commits pushed to the master branch.

Your commit was automatically rebased without conflicts. Pushed as commit 5183d8a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Download
    $ git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279
    $ git checkout pull/2279