clang ABI mismatch with Arm64 MSVC (HVA rules) #62223
Yes, the ABI rules are different on Windows vs. Linux. See https://reviews.llvm.org/D134688 . Why do you think clang is behaving incorrectly? |
Thanks. Please see https://godbolt.org/z/hPEfhc1n7 also for MSVC code gen using the latest MSVC. The code generation has been the same for all Arm64 MSVC versions since 16.9. |
CC @rnk @dwblaikie |
Is this reduced example still representative: https://godbolt.org/z/e79z3vq8j ? I don't know enough about the intrinsics to do more here, probably - I wrote the non-intrinsic version to see whether that was relevant, but it looks like we produce matching code in that case. So I guess maybe this is about how the float32x4_t is passed, rather than how the vint4 is returned. I'd need to figure out how to read/write these values (ideally as directly/simply as possible) to further explore where the bug is, etc. @zmodem got someone interested in this? |
Slightly modified testcase:
#include <arm_neon.h>
template<typename T> struct wrap {
#ifdef EXPLICIT_CTOR
  wrap(T a) { m = a; }
#endif
  T m;
  static wrap dowrap(T a, T b) { return wrap{b}; }
};
template wrap<int> wrap<int>::dowrap(int a, int b);
template wrap<double> wrap<double>::dowrap(double a, double b);
template wrap<int32x4_t> wrap<int32x4_t>::dowrap(int32x4_t a, int32x4_t b);
template wrap<int32x4x2_t> wrap<int32x4x2_t>::dowrap(int32x4x2_t a, int32x4x2_t b);
If |
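The EXPLICIT_CTOR switch in the testcase above changes whether wrap<T> is an aggregate, which seems to be the property the return-value rules key on. The following is a minimal host-independent sketch of that effect. It assumes C++17's std::is_aggregate is a reasonable proxy for the compiler's internal check (it may not match it exactly), and the names fake_int32x4, wrap_plain and wrap_ctor are made up for illustration; fake_int32x4 only stands in for int32x4_t so the snippet builds without <arm_neon.h>.

```cpp
#include <type_traits>

// Hypothetical stand-in for a NEON vector type such as int32x4_t.
struct alignas(16) fake_int32x4 { int lanes[4]; };

// Same shape as wrap<T> without EXPLICIT_CTOR: no user-provided constructor.
template <typename T>
struct wrap_plain {
    T m;
};

// Same shape as wrap<T> with EXPLICIT_CTOR: the user-provided constructor
// means the type is no longer an aggregate.
template <typename T>
struct wrap_ctor {
    wrap_ctor(T a) { m = a; }
    T m;
};

static_assert(std::is_aggregate_v<wrap_plain<fake_int32x4>>,
              "no user-provided constructor: still an aggregate");
static_assert(!std::is_aggregate_v<wrap_ctor<fake_int32x4>>,
              "user-provided constructor: not an aggregate");
static_assert(!std::is_aggregate_v<wrap_ctor<double>>,
              "the element type does not matter for aggregate-ness");
```

The check says nothing about which register or memory slot is used for the return value; it only shows why the two variants of the testcase can fall on different sides of an aggregate-based rule.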
https://godbolt.org/z/65vYW989f (godbolt of the last comment, #62223 (comment) ) Fascinating. Does the ABI spec/Windows/anyone have wording that covers this? I guess the non-Windows ARM ABI as implemented by clang at least passes all these cases in registers... so that's another thing entirely. Anyway. So if I add another member to the struct - if that member is the same (another T), then the results are the same, but if it's an int, then the results tip back to clang/MSVC agreeing and everything indirect. So I guess it's if all the members are NEON this behavior needs to change? Wonder if it's some other more general/different rule. I think MSVC published some things about their ABI somewhere - maybe there's some documentation out there about what rule we've missed? |
The ABI is supposedly described at https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170#return-values . It doesn't distinguish between HFAs and HVAs, same as the ARM ABI document. The implementation just has additional undocumented rules, apparently. |
This appears to be a bug in the initial implementation of the calling convention for ARM64 in MSVC, which at this point is a documentation bug instead. The behavior for HFAs is what the documentation describes. I've opened a pull request to correct the documentation: MicrosoftDocs/cpp-docs#4527 |
It would be good to close this issue if it is a bug with MSVC rather than LLVM. |
I don't think @sigatrev is suggesting this is an MSVC bug, as such, but that it /was/ a bug, but now it's part of the MSVC ABI (ie: they aren't likely to go back and fix it to match the documentation) - so they've filed a bug to update the documentation to match their current implementation. So ultimately we'd end up needing to change clang to match the MSVC behavior - easier once it's documented, but either way (documented or not) we probably need to make a clang change to match MSVC. Is that right @sigatrev ? |
That is exactly right @dwblaikie. The current behavior is not what we intended, but it is now very much part of the MSVC ARM64 ABI, and it is not going to be changing. The documentation has been updated to reflect this behavior. The difference applies only to HVAs, and not to HFAs. If there are additional questions about MSVC's handling in specific situations, feel free to ping me. |
I see, thanks for clearing up my misunderstanding. |
I am investigating this issue and intend to start looking into a fix in a few weeks' time. If someone wants to fix it before I start, feel free to assign it to yourself. |
Posted https://reviews.llvm.org/D153179 . Found #63360 while trying to make sure I understood all the relevant edge cases. |
MSVC normally has a bunch of restrictions on returning values directly which don't apply to passing values directly. (This roughly corresponds to the definition of a C++14 aggregate.) However, these restrictions don't apply to HVAs; make sure we check for that. Fixes llvm/llvm-project#62223 Differential Revision: https://reviews.llvm.org/D153179
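The rule stated in the commit message can be sketched as a small decision function. This is only a toy model under stated assumptions, not clang's or MSVC's actual code: RecordTraits, Homogeneous and both functions are hypothetical names, "aggregate_like" stands for whatever aggregate-style checks the compiler really performs, and the ordinary size and register-availability rules are not modelled at all.

```cpp
// Toy model of the rule in the commit message; all names are hypothetical.
enum class Homogeneous {
    No,              // mixed or non-FP/vector members
    FloatAggregate,  // HFA: all members are the same floating-point type
    VectorAggregate  // HVA: all members are the same short-vector type
};

struct RecordTraits {
    bool aggregate_like;      // roughly: passes the C++14-aggregate-style checks
    Homogeneous homogeneous;  // HFA / HVA classification
};

// The aggregate-style return restrictions are skipped for HVAs only.
constexpr bool return_restrictions_apply(const RecordTraits& r) {
    return r.homogeneous != Homogeneous::VectorAggregate;
}

// True when the aggregate-style restrictions alone force an indirect return.
// False does not guarantee a direct return; other ABI rules still apply.
constexpr bool blocked_from_direct_return(const RecordTraits& r) {
    return return_restrictions_apply(r) && !r.aggregate_like;
}

// wrap<int32x4_t> with EXPLICIT_CTOR: not an aggregate, but an HVA.
static_assert(!blocked_from_direct_return({false, Homogeneous::VectorAggregate}),
              "HVAs are exempt from the restrictions");
// wrap<double> with EXPLICIT_CTOR: not an aggregate, and only an HFA.
static_assert(blocked_from_direct_return({false, Homogeneous::FloatAggregate}),
              "HFAs are still subject to the restrictions");
```

Keeping the exemption in its own predicate mirrors the wording of the commit message: the restrictions themselves are unchanged, and only the set of types they apply to shrinks.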
The code generation difference can be seen using the simple program below (or https://godbolt.org/z/rrjvjfcz9).
With -O2 --target=aarch64-none-windows, the code gen is
The corresponding bitcode for the function is below. Note that it returns void, which means that the compiler doesn't treat vint4 as a vector.
While with -O2 --target=aarch64-none-linux, the code is instead
And the corresponding bitcode for the function is the following, which returns %struct.vint4 %8.