Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUDA][HIP] Fix record layout on Windows #87651

Merged
merged 1 commit into from
Apr 18, 2024
Merged

Conversation

yxsamliu
Copy link
Collaborator

@yxsamliu yxsamliu commented Apr 4, 2024

On windows, record layout should be consistent with host side, otherwise host code is not able to access fields of the record correctly.

Fixes: #51031

Fixes: SWDEV-446010

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 4, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 4, 2024

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

On windows, record layout should be consistent with host side, otherwise host code is not able to access fields of the record correctly.

Fixes: #51031

Fixes: SWDEV-446010


Full diff: https://github.com/llvm/llvm-project/pull/87651.diff

2 Files Affected:

  • (modified) clang/lib/AST/RecordLayoutBuilder.cpp (+5)
  • (added) clang/test/CodeGenCUDA/record-layout.cu (+65)
diff --git a/clang/lib/AST/RecordLayoutBuilder.cpp b/clang/lib/AST/RecordLayoutBuilder.cpp
index a3b7431f7ffd6d..d9bf62c2bbb04a 100644
--- a/clang/lib/AST/RecordLayoutBuilder.cpp
+++ b/clang/lib/AST/RecordLayoutBuilder.cpp
@@ -2458,6 +2458,11 @@ static bool mustSkipTailPadding(TargetCXXABI ABI, const CXXRecordDecl *RD) {
 }
 
 static bool isMsLayout(const ASTContext &Context) {
+  // Check if it's CUDA device compilation; ensure layout consistency with host.
+  if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice &&
+      Context.getAuxTargetInfo())
+    return Context.getAuxTargetInfo()->getCXXABI().isMicrosoft();
+
   return Context.getTargetInfo().getCXXABI().isMicrosoft();
 }
 
diff --git a/clang/test/CodeGenCUDA/record-layout.cu b/clang/test/CodeGenCUDA/record-layout.cu
new file mode 100644
index 00000000000000..8c8fc368b6b72f
--- /dev/null
+++ b/clang/test/CodeGenCUDA/record-layout.cu
@@ -0,0 +1,65 @@
+// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fdump-record-layouts \
+// RUN:   -emit-llvm -o %t -xhip %s 2>&1 | FileCheck %s --check-prefix=AST
+// RUN: cat %t | FileCheck %s
+// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -target-cpu gfx1100 \
+// RUN:   -emit-llvm -fdump-record-layouts -aux-triple x86_64-pc-windows-msvc \
+// RUN:   -o %t -xhip %s | FileCheck %s --check-prefix=AST
+// RUN: cat %t | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// AST: *** Dumping AST Record Layout
+// AST-LABEL:         0 | struct C
+// AST-NEXT:          0 |   struct A (base) (empty)
+// AST-NEXT:          1 |   struct B (base) (empty)
+// AST-NEXT:          4 |   int i
+// AST-NEXT:            | [sizeof=8, align=4,
+// AST-NEXT:            |  nvsize=8, nvalign=4]
+
+// CHECK: %struct.C = type { [4 x i8], i32 }
+
+struct A {};
+struct B {};
+struct C : A, B {
+    int i;
+};
+
+__device__ C c;
+__global__ void test_C(C c)
+{}
+ 
+// AST: *** Dumping AST Record Layout
+// AST-LABEL:          0 | struct I
+// AST-NEXT:           0 |   (I vftable pointer)
+// AST-NEXT:           8 |   int i
+// AST-NEXT:             | [sizeof=16, align=8,
+// AST-NEXT:             |  nvsize=16, nvalign=8]
+
+// AST: *** Dumping AST Record Layout
+// AST-LABEL:          0 | struct J
+// AST-NEXT:           0 |   struct I (primary base)
+// AST-NEXT:           0 |     (I vftable pointer)
+// AST-NEXT:           8 |     int i
+// AST-NEXT:          16 |   int j
+// AST-NEXT:             | [sizeof=24, align=8,
+// AST-NEXT:             |  nvsize=24, nvalign=8]
+
+// CHECK: %struct.J = type { %struct.I, i32 }
+// CHECK: %struct.I = type { ptr, i32 }
+
+struct I {
+    virtual void f() = 0;
+    int i;
+};
+struct J : I {
+    void f() override {}
+    int j;
+};
+
+__global__ void test_J(J j)
+{}
+
+void test(C c, J j) {
+  test_C<<<1, 1>>>(c);
+  test_J<<<1, 1>>>(j); 
+}

@Artem-B
Copy link
Member

Artem-B commented Apr 4, 2024

Keeping layout in sync makes sense to me, but I'm completely unfamiliar with the windows side.

@rnk is there anything else we need to worry about?

@rnk
Copy link
Collaborator

rnk commented Apr 4, 2024

Keeping layout in sync makes sense to me, but I'm completely unfamiliar with the windows side.

@rnk is there anything else we need to worry about?

I checked, and I think this routes everything over to the MS record layout builder, so it should be comprehensive:

if (isMsLayout(*this)) {

I would augment the test a bit, but otherwise this looks good to me.

@rnk
Copy link
Collaborator

rnk commented Apr 4, 2024

In general, having different C++ ABIs between the host and device seems like an ongoing source of tension and bugs.

@yxsamliu
Copy link
Collaborator Author

yxsamliu commented Apr 5, 2024

Keeping layout in sync makes sense to me, but I'm completely unfamiliar with the windows side.
@rnk is there anything else we need to worry about?

I checked, and I think this routes everything over to the MS record layout builder, so it should be comprehensive:

if (isMsLayout(*this)) {

I would augment the test a bit, but otherwise this looks good to me.

will add more tests about field access and virtual function calls

On windows, record layout should be consistent with
host side, otherwise host code is no able to access
fields of the record correctly.

Fixes: llvm#51031

Fixes: SWDEV-446010
Change-Id: Id590a7d3bc0b6fd0ea745cf2a049e1f89ae134fa
@yxsamliu
Copy link
Collaborator Author

In general, having different C++ ABIs between the host and device seems like an ongoing source of tension and bugs.

I agree. However completely switching to Microsoft ABI on device side does not work with existing device libraries since they assume Itanium mangling. Therefore I only changes record layout to be compatible with host, in the hope that the generated LLVM IR is correct for such a combination.

I added more tests about member accessing and virtual function calls. It seems the IR is correct. I think clang codegen is generic enough to handle Itanium ABI with Microsoft record layout.

@yxsamliu
Copy link
Collaborator Author

ping

It passes our internal Windows CI.

@yxsamliu yxsamliu merged commit 748ef7e into llvm:main Apr 18, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn if mismatching struct layout used on CUDA host / device boundary on Windows.
4 participants