-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support macOS Aarch64 ABI for compiled wrappers. #3
Support macOS Aarch64 ABI for compiled wrappers. #3
Conversation
This PR adds support for Apple's aarch64 C ABI, that requires stack arguments to be sized according to its data type, and considering padding properly. https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms
👋 Welcome back gonzalolarralde! A progress list of the required criteria for merging this PR into |
@gonzalolarralde The label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello.
I think you need to get rid of macros
handle_padding(bytes_used, 1);
advance_offset(bytes_used, 1);
you just put platform dependent size of type on stack in generic code
handle_padding can be replaced with openjdk's align_up() function
and advance_offset is simple bytes_used += size of type on stack;
size_of_type_on_stack can be made under #ifdefs in the beginning of the file
check example in our interp patch - 368f06e
_stack_offset = align_up(_stack_offset, nativeDoubleSpace);
__ ldr(r0, src);
__ str(r0, Address(to(), _stack_offset));
_stack_offset += nativeDoubleSpace;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I think there is a correctness issue that I would like to discuss
// abi-dependant move for required types | ||
#ifdef __APPLE__ | ||
# define abi_dep_word_move(masm, src, dst, size) move_merge_and_shift(masm, src, dst, size); | ||
# define abi_dep_float_move(masm, src, dst, size) move_merge_and_shift(masm, src, dst, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move_merge_and_shift
is unable to load to a float register (i.e. in stack->reg move), only to a general purpose one. I wonder why this was not caught by tests, but it looks like a bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I overlooked that. it seems that test cases might not be causing a displacement of float parameters, therefore never making this evaluate true
: (src.first() != dst.first())
. I will try to force the error and add a test if I confirm this to be the case.
// On type size dependant ABIs types smaller than 8 bytes will use the second word | ||
// to store the position relative to the doubleword where the value needs to be | ||
// written. This allows padding to be pre-calculated. As a result of this it is | ||
// to be expected that some contiguous arguments share the same stack offset. | ||
int SharedRuntime::c_calling_convention(const BasicType *sig_bt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c_calling_convention
is used by the shared VM code, which may be confused by non-VMRegs in regs. Although I assume shared code asks convention for short signatures only, some form of assert will be useful anyway, so we'll be sure shared code won't produce unexpected result.
For example, it is possible to make a private c_calling_convention, which will be called only from the aarch64 code, which can deal with unaliged offsets on stack. And to create another "public" c_calling_convention that does not try to encode anything unusual in regs. As a side benefit, the first "private" function can encode unaligned offsets in a more straightforward way, like a a separate bitmap or array argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense, will address this as you're describing. thanks!
@@ -856,7 +906,10 @@ int SharedRuntime::c_calling_convention(const BasicType *sig_bt, | |||
} | |||
} | |||
|
|||
return stk_args; | |||
// Ensure padding when finished parsing arguments | |||
bytes_used += bytes_used % 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
align_up
will be a bit more straightforward
// Ensure padding when finished parsing arguments | ||
bytes_used += bytes_used % 8; | ||
|
||
return bytes_used / 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider using VMRegImpl::stack_slot_size (4, a fixed size slot equal sizeof(jint)
), wordSize (8, native word size) and VMRegImpl::slots_per_word (wordSize / stack_slot_size)
if (src.first()->is_stack()) { | ||
__ ldr(rscratch1, Address(rfp, reg2offset_in(src.first()))); | ||
} else { | ||
__ mov(rscratch1, src.first()->as_Register()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, it's better operate directly on the src.first()->as_Register()
__ andr(rscratch1, rscratch1, (1L << bit_size) - 1); | ||
__ lsl(rscratch1, rscratch1, bit_offset); | ||
|
||
// load destination register | ||
__ ldr(rscratch2, Address(sp, reg2offset_out(dst.first()))); | ||
|
||
// clear bits to be overwritten by shift | ||
if (bit_offset > 0) { | ||
__ ror(rscratch2, rscratch2, bit_offset); | ||
} | ||
__ lsr(rscratch2, rscratch2, bit_size); | ||
__ ror(rscratch2, rscratch2, 64 - bit_size - bit_offset); | ||
|
||
// combine | ||
__ orr(rscratch1, rscratch1, rscratch2); | ||
|
||
// save | ||
__ str(rscratch1, Address(sp, reg2offset_out(dst.first()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be replaced with just lsr
and strw/strh/strb
?
default: | ||
move32_64(masm, in_regs[i], out_regs[c_arg]); | ||
int_args++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we have processed all necessary types above, shouldn't the default case be ShouldNotReachHere()
?
Thanks for the comments @AntonKozlov, I'll try to rebuild using the latest changes on the branch that add support for MAP_JIT and get back with the feedback addressed. Due to the lack of a workaround my build + test runs were blocked, but now that those patches are public that should unblock me. |
@gonzalolarralde This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
This PR adds support for Apple's aarch64 C ABI, that requires stack arguments to be sized according to its data type, and considering padding properly.
Reference: https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms
/cc @AntonKozlov
Ref: JDK-8253818
Progress
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/aarch64-port.git pull/3/head:pull/3
$ git checkout pull/3
Update a local copy of the PR:
$ git checkout pull/3
$ git pull https://git.openjdk.org/aarch64-port.git pull/3/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3
View PR using the GUI difftool:
$ git pr show -t 3
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/aarch64-port/pull/3.diff