Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang ABI mismatch with Arm64 MSVC (HVA rules) #62223

Closed
mcfi opened this issue Apr 18, 2023 · 16 comments
Closed

clang ABI mismatch with Arm64 MSVC (HVA rules) #62223

mcfi opened this issue Apr 18, 2023 · 16 comments
Assignees
Labels
ABI Application Binary Interface clang:codegen platform:windows

Comments

@mcfi
Copy link

mcfi commented Apr 18, 2023

The code generation difference can be seen using this simple program below. (or https://godbolt.org/z/rrjvjfcz9)

#include <arm_neon.h>

struct vfloat4
{
  inline vfloat4() = default;

  inline explicit vfloat4(float a)
 {
  m = vdupq_n_f32(a);
 }

  inline explicit vfloat4(float32x4_t a)
 {
  m = a;
 }

 float32x4_t m;
};

struct vint4
{
  inline vint4() = default;

  inline explicit vint4(int a)
 {
  m = vdupq_n_s32(a);
 }

  inline explicit vint4(int32x4_t a)
 {
  m = a;
 }

 int32x4_t m;
};

vint4 float_as_int(vfloat4 a)
{
 return vint4(vreinterpretq_s32_f32(a.m));
}

vfloat4 float_as_float(vfloat4 a)
{
 return a;
}

With -O2 --target=aarch64-none-windows, the code gen is

"?float_as_int@@YA?AUvint4@@Uvfloat4@@@Z": // @"?float_as_int@@YA?AUvint4@@Uvfloat4@@@Z"
        str     q0, [x0]
        ret

The corresponding bitcode for the function is below. Note that it returns void, which means that the compiler doesn't treat vint4 as a vector.

; Function Attrs: mustprogress noinline optnone uwtable
define dso_local void @"?float_as_int@@YA?AUvint4@@Uvfloat4@@@Z"(ptr inreg noalias sret(%struct.vint4) align 16 %0, [1 x <4 x float>] %1) #0 {
  %3 = alloca ptr, align 8
  %4 = alloca %struct.vfloat4, align 16
  store ptr %0, ptr %3, align 8
  %5 = getelementptr inbounds %struct.vfloat4, ptr %4, i32 0, i32 0
  store [1 x <4 x float>] %1, ptr %5, align 16
  %6 = getelementptr inbounds %struct.vfloat4, ptr %4, i32 0, i32 0
  %7 = load <4 x float>, ptr %6, align 16
  %8 = call noundef <4 x i32> @"?vreinterpretq_s32_f32@@YA?AT?$__vector@H$03@__clang@@T?$__vector@M$03@2@@Z"(<4 x float> noundef %7)
  %9 = call noundef ptr @"??0vint4@@QEAA@T?$__vector@H$03@__clang@@@Z"(ptr noundef nonnull align 16 dereferenceable(16) %0, <4 x i32> noundef %8)
  ret void
}

while with -O2 --target=aarch64-none-linux, the code is instead

float_as_int(vfloat4):               // @float_as_int(vfloat4)
        ret

And the corresponding bitcode for the function is the following, which returns %struct.vint4 %8.

define dso_local %struct.vint4 @_Z12float_as_int7vfloat4([1 x <4 x float>] %0) #0 {
  %2 = alloca %struct.vint4, align 16
  %3 = alloca %struct.vfloat4, align 16
  %4 = getelementptr inbounds %struct.vfloat4, ptr %3, i32 0, i32 0
  store [1 x <4 x float>] %0, ptr %4, align 16
  %5 = getelementptr inbounds %struct.vfloat4, ptr %3, i32 0, i32 0
  %6 = load <4 x float>, ptr %5, align 16
  %7 = call noundef <4 x i32> @_ZL21vreinterpretq_s32_f3213__Float32x4_t(<4 x float> noundef %6)
  call void @_ZN5vint4C2E11__Int32x4_t(ptr noundef nonnull align 16 dereferenceable(16) %2, <4 x i32> noundef %7)
  %8 = load %struct.vint4, ptr %2, align 16
  ret %struct.vint4 %8
}
@mcfi mcfi changed the title clang generates different code when targeting Windows and Linux clang generates different code when targeting Windows and Linux on Arm64 Apr 18, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2023

@llvm/issue-subscribers-backend-aarch64

@efriedma-quic
Copy link
Collaborator

Yes, the ABI rules are different on Windows vs. Linux. See https://reviews.llvm.org/D134688 . Why do you think clang is behaving incorrectly?

@mcfi
Copy link
Author

mcfi commented Apr 19, 2023

Thanks. Please see https://godbolt.org/z/hPEfhc1n7 also for MSVC code gen using the latest MSVC. The code generation has been the same for all Arm64 MSVC versions since 16.9

|vint4 float_as_int(vfloat4)| PROC      ; float_as_int
        ret

@efriedma-quic efriedma-quic changed the title clang generates different code when targeting Windows and Linux on Arm64 clang ABI mismatch with Arm64 MSVC (HVA rules) Apr 19, 2023
@efriedma-quic
Copy link
Collaborator

CC @rnk @dwblaikie

@EugeneZelenko EugeneZelenko added the ABI Application Binary Interface label Apr 19, 2023
@dwblaikie
Copy link
Collaborator

Is this reduced example still representative: https://godbolt.org/z/e79z3vq8j ?

I don't know enough about the intrinsics to do more here, probably - I wrote the non-intrinsic version to see whether that was relevant, but it looks like we produce matching code in that case. So I guess maybe this is about how the float32x4_t is passed, rather than how the vint4 is returned. I'd need to figure out how to read/write these values (ideally as directly/simply as possible) to further explore where the bug is, etc.

@zmodem got someone interested in this?

@efriedma-quic
Copy link
Collaborator

efriedma-quic commented Apr 19, 2023

Slightly modified testcase:

#include <arm_neon.h>
template<typename T> struct wrap {
#ifdef EXPLICIT_CTOR
    wrap(T a) { m = a; }
#endif
    T m;
    static wrap dowrap(T a, T b) { return wrap{b}; }
};
template wrap<int> wrap<int>::dowrap(int a, int b);
template wrap<double> wrap<double>::dowrap(double a, double b);
template wrap<int32x4_t> wrap<int32x4_t>::dowrap(int32x4_t a, int32x4_t b);
template wrap<int32x4x2_t> wrap<int32x4x2_t>::dowrap(int32x4x2_t a, int32x4x2_t b);

If EXPLICIT_CTOR is isn't defined, all the values are returned in registers. If EXPLICIT_CTOR is defined, only the classes with NEON member types are returned in registers.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 19, 2023

@llvm/issue-subscribers-clang-codegen

@dwblaikie
Copy link
Collaborator

https://godbolt.org/z/65vYW989f (godbolt of the last comment, #62223 (comment) )

Fascinating. Does the ABI spec/Windows/anyone have wording that covers this?

I guess the non-windows ARM ABI as implemented by clang at least passes all these cases in registers... so that's another thing entirely. Anyway.

So if I add another member to the struct - if that member is the same (another T), then the results are the same, but if it's an int, then the results tip back to clang/msvc agreenig and everything indirect. So I guess it's if all the members are neon this behavior needs to change? Wonder if it's some other more general/different rule.

I think MSVC published some things about their ABI somewhere - maybe there's some documentation out there about what rule we've missed?

@efriedma-quic
Copy link
Collaborator

The ABI is supposedly described at https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170#return-values . It doesn't distinguish between HFAs and HVAs, same as the ARM ABI document. The implementation just has additional undocumented rules, apparently.

@sigatrev
Copy link

This appears to be a bug in the initial implementation of the calling convention for ARM64 in MSVC, which at this point is a documentation bug instead.

The behavior for HVAs is what the documentation describes.
The behavior for HFAs differs from the current documentation when it comes to return values, specifically depending on whether or not the type is an aggregate by the C++14 standard definition.

I've opened a pull request to correct the documentation: MicrosoftDocs/cpp-docs#4527

@SamTebbs33
Copy link
Collaborator

It would be good to close this issue if it is a bug with MSVC rather than LLVM.

@dwblaikie
Copy link
Collaborator

It would be good to close this issue if it is a bug with MSVC rather than LLVM.

I don't think @sigatrev is suggesting this is an MSVC bug, as such, but that it /was/ a bug, but now it's part of the MSVC ABI (ie: they aren't likely to go back and fix it to match the documentation) - so they've filed a bug to update the documentation to match their current implementation.

So ultimately we'd end up needing to change clang to match the MSVC behavior - easier once it's documented, but either way (documented or not) we probably need to make a clang change to match MSVC.

Is that right @sigatrev ?

@sigatrev
Copy link

sigatrev commented May 9, 2023

That is exactly right @dwblaikie. The current behavior is not what we intended, but it is now very much part of the MSVC ARM64 ABI, and it is not going to be changing.

The documentation has been updated to reflect this behavior. The difference applies only to HFAs, and not to HVAs.

If there are additional questions about MSVC's handling in specific situations, feel free to ping me.

@SamTebbs33
Copy link
Collaborator

I see, thanks for clearing up my misunderstanding.

@omjavaid omjavaid self-assigned this May 26, 2023
@omjavaid
Copy link
Contributor

I am investigating this issue and I intend to start looking into a fix for this issue in few weeks time. If someone wants to fix it before I start feel free to assign it to yourself.
In any case I will notify once I start working on a fix.

@efriedma-quic
Copy link
Collaborator

Posted https://reviews.llvm.org/D153179 . Found #63360 while trying to make sure I understood all the relevant edge cases.

Chenyang-L pushed a commit to intel/llvm that referenced this issue Jul 11, 2023
MSVC normally has a bunch of restrictions on returning values directly
which don't apply to passing values directly.  (This roughly corresponds
to the definition of a C++14 aggregate.)  However, these restrictions
don't apply to HVAs; make sure we check for that.

Fixes llvm/llvm-project#62223

Differential Revision: https://reviews.llvm.org/D153179
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface clang:codegen platform:windows
Projects
None yet
Development

No branches or pull requests

8 participants