[AMDGPU] Allow any linkage for dynlds #84742

Pierre-vh · 2024-03-11T10:48:43Z

Solves SWDEV-449592

llvmbot · 2024-03-11T10:49:17Z

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Solves SWDEV-449592

Full diff: https://github.com/llvm/llvm-project/pull/84742.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp (+2-1)
(added) llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll (+31)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
index b85cb26fdc9565..621a1fe20ed467 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
@@ -1398,8 +1398,9 @@ class AMDGPULowerModuleLDS {
           LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
 
       for (GlobalVariable *GV : Sorted) {
+        unsigned Size = DL.getTypeAllocSize(GV->getValueType());
         OptimizedStructLayoutField F(GV,
-                                     DL.getTypeAllocSize(GV->getValueType()),
+                                     Size ? Size : 1,
                                      AMDGPU::getAlign(DL, GV));
         LayoutFields.emplace_back(F);
       }
diff --git a/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll b/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll
new file mode 100644
index 00000000000000..db5ba1154acca8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
+; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
+
+; Check that we don't crash when lowering an internal zero-sized LDS GV.
+; This specifically tests the LDS module lowering path, not the
+
+@var0 = internal addrspace(3) global [0 x float] poison, align 4
+
+define void @func() {
+; CHECK-LABEL: define void @func() {
+; CHECK-NEXT:    [[IDX:%.*]] = getelementptr inbounds [0 x float], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1
+; CHECK-NEXT:    [[V:%.*]] = load float, ptr addrspace(3) [[IDX]], align 4
+; CHECK-NEXT:    ret void
+;
+  %idx = getelementptr inbounds [0 x float], ptr addrspace(3) @var0, i32 0, i32 1
+  %v = load float, ptr addrspace(3) %idx
+  ret void
+}
+
+define amdgpu_kernel void @kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @kernel() {
+; CHECK-NEXT:    call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
+; CHECK-NEXT:    call void @func()
+; CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4
+; CHECK-NEXT:    ret void
+;
+  call void @func()
+  load float, ptr addrspace(3) @var0
+  ret void
+}

github-actions · 2024-03-11T10:51:03Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

JonChesterfield · 2024-03-12T15:39:52Z

It's not obvious to me that a zero sized type is necessarily the same thing as a dynamic LDS allocation.

I don't think variable length arrays can be globals. Tried a few variations with clang and they were all rejected..

The default stance is usually that we're compiling something like C in which case the address of different objects need to be different. Mapping different things onto the same address is bad. The extern array[] thing from cuda gets special cased.

I think the baseline is that an unsized static is a nonsense and we should reject it at sema, and fatal_error on one if it makes it to LDS lowering.

If there are some useful semantics to give one then we need to work out what those semantics are before choosing a lowering.

arsenm · 2024-03-12T16:41:15Z

The default stance is usually that we're compiling something like C in which case the address of different objects need to be different.

This isn't the case with 0 sized allocations, which I believe are all required to get the same address

Pierre-vh · 2024-03-13T11:57:46Z

The default stance is usually that we're compiling something like C in which case the address of different objects need to be different.

This isn't the case with 0 sized allocations, which I believe are all required to get the same address

I think zero-sized allocations are an extension so they're not really required to do anything, especially for address spaces like LDS. Rejecting it should be fine I think.
The original ticket just complained about the error (an assert failure) being cryptic. We could just fatal instead and if we have an actual request to support it with a good example later, we can revisit it.

Internal zero-sized-types in LDS seem tricky to handle. When they're external we can probably justify giving them all the same address, but when they're internal it's expected that they're different objects, but then the question is: how do you support more than one variable-sized LDS variable?

arsenm · 2024-03-14T04:49:28Z

Internal zero-sized-types in LDS seem tricky to handle. When they're external we can probably justify giving them all the same address, but when they're internal it's expected that they're different objects,

I don't think this is the case. I think all 0 sized allocations (within the same address space) should always get the same address regardless of the linkage

Pierre-vh · 2024-03-20T09:16:35Z

I just added a fatal error for now while we figure this out. I think it's better to just avoid crashing in the meantime.

arsenm · 2024-03-20T10:20:21Z

I just added a fatal error for now while we figure this out. I think it's better to just avoid crashing in the meantime.

If this appeared in the wild, making this an error is a lateral move. I don't think there's anything to figure out. Just ignore adding it to the OptimizedStructLayout, record there was a 0 sized object, and then handle it along with the external case

Pierre-vh · 2024-03-25T08:46:03Z

The issue reporter said that this was a coding error, their issue wasn't about it not being handled but about the fact that the compiler crashed in such cases. Reporting an error here is good enough. It should ideally be reported in Clang but I'd need help from a more knowledgeable contributor to understand why Clang doesn't report this yet and where the fix should be

arsenm · 2024-03-26T09:11:19Z

The issue reporter said that this was a coding error, their issue wasn't about it not being handled but about the fact that the compiler crashed in such cases. Reporting an error here is good enough. It should ideally be reported in Clang but I'd need help from a more knowledgeable contributor to understand why Clang doesn't report this yet and where the fix should be

If we're going to make this an error, it should be a proper diagnostic in the context instead of report_fatal_error. But we already handle this situation in the external case, so I don't see the difficulty in just extending that to ignore the linkage, which shouldn't be a factor

Pierre-vh · 2024-03-27T10:55:34Z

@JonChesterfield Do you remember why dynlds needs external linkage? Can we remove that requirement?

JonChesterfield · 2024-04-04T07:39:01Z

Dynamically allocated LDS that can't be accessed outside of the current module is probably a programmer or compiler error. I haven't been able to guess a reproducer that gets past sema.

Cuda rejects it, as I think it should. "error: incomplete type is not allowed"

__shared__
static
// extern 
double nonsense[];
__device__ double * wot(unsigned idx) {
    return &nonsense[idx];
}

HIP is really fighting me on type annotations but ultimately raises a similar error, https://godbolt.org/z/5z35qx5bz "definition of variable with array type needs an explicit size or an initializer". Pierre pointed at a repro which has a division in the array definition but doesn't have any of the context.

However, we've already implemented "zero size type means absolute symbol at end of allocated region" and left the question of whether the aliasing metadata reflects language semantics to the front end.

Let's say "any zero sized addrspace(3) value gets an abs symbol pointing to the end of the non-zero sized addrspace(3) values. To the extent that this means anything for a given language, it's on the front end to annotate or diagnose. As far as the backend lowering goes, all these things alias anyway which seems independent of visibility.

Which is to say, let's put static zero sized LDS variables at the same place as all other zero sized LDS variables, and it's on HIP to diagnose that if they want to. That meets Matt's belief that all zero sized objects go at the same place and my belief that zero sized objects are an extension anyway.

llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-size-arr.ll

Solves SWDEV-449592

Pierre-vh requested review from arsenm and JonChesterfield March 11, 2024 10:48

llvmbot added the backend:AMDGPU label Mar 11, 2024

arsenm reviewed Mar 11, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp Outdated Show resolved Hide resolved

Pierre-vh force-pushed the fix-lowerlds-zerotype branch from ca66dda to 7700e6c Compare March 20, 2024 09:10

Pierre-vh changed the title ~~[AMDGPU][LowerModuleLDS] Handle non-dynlds zero sized type~~ [AMDGPU][LowerModuleLDS] Avoid crash on zero-sized types Mar 20, 2024

Pierre-vh requested a review from arsenm March 20, 2024 09:43

Pierre-vh force-pushed the fix-lowerlds-zerotype branch from 7700e6c to dd3bdff Compare April 4, 2024 08:31

Pierre-vh changed the title ~~[AMDGPU][LowerModuleLDS] Avoid crash on zero-sized types~~ [AMDGPU] Allow any linkage for dynlds Apr 4, 2024

arsenm reviewed Apr 6, 2024

View reviewed changes

llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-size-arr.ll Outdated Show resolved Hide resolved

Pierre-vh requested a review from arsenm April 8, 2024 06:06

arsenm approved these changes Apr 17, 2024

View reviewed changes

arsenm reviewed Apr 17, 2024

View reviewed changes

llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-size-arr.ll Outdated Show resolved Hide resolved

Pierre-vh added 3 commits April 19, 2024 10:50

[AMDGPU] Allow any linkage for dynlds

763931b

Solves SWDEV-449592

undef -> poison

fa8974b

fix test

5cacaad

Pierre-vh force-pushed the fix-lowerlds-zerotype branch from a52b1cf to 5cacaad Compare April 19, 2024 08:50

Pierre-vh merged commit 7c7704c into llvm:main Apr 19, 2024
3 of 4 checks passed

Pierre-vh deleted the fix-lowerlds-zerotype branch April 19, 2024 09:48

aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024

[AMDGPU] Allow any linkage for dynlds (llvm#84742)

87afab4

Solves SWDEV-449592

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] Allow any linkage for dynlds #84742

[AMDGPU] Allow any linkage for dynlds #84742

Pierre-vh commented Mar 11, 2024

llvmbot commented Mar 11, 2024

github-actions bot commented Mar 11, 2024 •

edited

JonChesterfield commented Mar 12, 2024 •

edited

arsenm commented Mar 12, 2024

Pierre-vh commented Mar 13, 2024

arsenm commented Mar 14, 2024

Pierre-vh commented Mar 20, 2024

arsenm commented Mar 20, 2024

Pierre-vh commented Mar 25, 2024

arsenm commented Mar 26, 2024

Pierre-vh commented Mar 27, 2024

JonChesterfield commented Apr 4, 2024

[AMDGPU] Allow any linkage for dynlds #84742

[AMDGPU] Allow any linkage for dynlds #84742

Conversation

Pierre-vh commented Mar 11, 2024

llvmbot commented Mar 11, 2024

github-actions bot commented Mar 11, 2024 • edited

JonChesterfield commented Mar 12, 2024 • edited

arsenm commented Mar 12, 2024

Pierre-vh commented Mar 13, 2024

arsenm commented Mar 14, 2024

Pierre-vh commented Mar 20, 2024

arsenm commented Mar 20, 2024

Pierre-vh commented Mar 25, 2024

arsenm commented Mar 26, 2024

Pierre-vh commented Mar 27, 2024

JonChesterfield commented Apr 4, 2024

github-actions bot commented Mar 11, 2024 •

edited

JonChesterfield commented Mar 12, 2024 •

edited