-
Notifications
You must be signed in to change notification settings - Fork 10.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Allow any linkage for dynlds #84742
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Pierre van Houtryve (Pierre-vh) ChangesSolves SWDEV-449592 Full diff: https://github.com/llvm/llvm-project/pull/84742.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
index b85cb26fdc9565..621a1fe20ed467 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
@@ -1398,8 +1398,9 @@ class AMDGPULowerModuleLDS {
LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
for (GlobalVariable *GV : Sorted) {
+ unsigned Size = DL.getTypeAllocSize(GV->getValueType());
OptimizedStructLayoutField F(GV,
- DL.getTypeAllocSize(GV->getValueType()),
+ Size ? Size : 1,
AMDGPU::getAlign(DL, GV));
LayoutFields.emplace_back(F);
}
diff --git a/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll b/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll
new file mode 100644
index 00000000000000..db5ba1154acca8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
+; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
+
+; Check that we don't crash when lowering an internal zero-sized LDS GV.
+; This specifically tests the LDS module lowering path, not the
+
+@var0 = internal addrspace(3) global [0 x float] poison, align 4
+
+define void @func() {
+; CHECK-LABEL: define void @func() {
+; CHECK-NEXT: [[IDX:%.*]] = getelementptr inbounds [0 x float], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1
+; CHECK-NEXT: [[V:%.*]] = load float, ptr addrspace(3) [[IDX]], align 4
+; CHECK-NEXT: ret void
+;
+ %idx = getelementptr inbounds [0 x float], ptr addrspace(3) @var0, i32 0, i32 1
+ %v = load float, ptr addrspace(3) %idx
+ ret void
+}
+
+define amdgpu_kernel void @kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @kernel() {
+; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
+; CHECK-NEXT: call void @func()
+; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4
+; CHECK-NEXT: ret void
+;
+ call void @func()
+ load float, ptr addrspace(3) @var0
+ ret void
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
It's not obvious to me that a zero sized type is necessarily the same thing as a dynamic LDS allocation. I don't think variable length arrays can be globals. Tried a few variations with clang and they were all rejected.. The default stance is usually that we're compiling something like C in which case the address of different objects need to be different. Mapping different things onto the same address is bad. The extern array[] thing from cuda gets special cased. I think the baseline is that an unsized static is a nonsense and we should reject it at sema, and fatal_error on one if it makes it to LDS lowering. If there are some useful semantics to give one then we need to work out what those semantics are before choosing a lowering. |
This isn't the case with 0 sized allocations, which I believe are all required to get the same address |
I think zero-sized allocations are an extension so they're not really required to do anything, especially for address spaces like LDS. Rejecting it should be fine I think. Internal zero-sized-types in LDS seem tricky to handle. When they're external we can probably justify giving them all the same address, but when they're internal it's expected that they're different objects, but then the question is: how do you support more than one variable-sized LDS variable? |
I don't think this is the case. I think all 0 sized allocations (within the same address space) should always get the same address regardless of the linkage |
ca66dda
to
7700e6c
Compare
I just added a fatal error for now while we figure this out. I think it's better to just avoid crashing in the meantime. |
If this appeared in the wild, making this an error is a lateral move. I don't think there's anything to figure out. Just ignore adding it to the OptimizedStructLayout, record there was a 0 sized object, and then handle it along with the external case |
The issue reporter said that this was a coding error, their issue wasn't about it not being handled but about the fact that the compiler crashed in such cases. Reporting an error here is good enough. It should ideally be reported in Clang but I'd need help from a more knowledgeable contributor to understand why Clang doesn't report this yet and where the fix should be |
If we're going to make this an error, it should be a proper diagnostic in the context instead of report_fatal_error. But we already handle this situation in the external case, so I don't see the difficulty in just extending that to ignore the linkage, which shouldn't be a factor |
@JonChesterfield Do you remember why dynlds needs external linkage? Can we remove that requirement? |
Dynamically allocated LDS that can't be accessed outside of the current module is probably a programmer or compiler error. I haven't been able to guess a reproducer that gets past sema. Cuda rejects it, as I think it should. "error: incomplete type is not allowed"
HIP is really fighting me on type annotations but ultimately raises a similar error, https://godbolt.org/z/5z35qx5bz "definition of variable with array type needs an explicit size or an initializer". Pierre pointed at a repro which has a division in the array definition but doesn't have any of the context. However, we've already implemented "zero size type means absolute symbol at end of allocated region" and left the question of whether the aliasing metadata reflects language semantics to the front end. Let's say "any zero sized addrspace(3) value gets an abs symbol pointing to the end of the non-zero sized addrspace(3) values. To the extent that this means anything for a given language, it's on the front end to annotate or diagnose. As far as the backend lowering goes, all these things alias anyway which seems independent of visibility. Which is to say, let's put static zero sized LDS variables at the same place as all other zero sized LDS variables, and it's on HIP to diagnose that if they want to. That meets Matt's belief that all zero sized objects go at the same place and my belief that zero sized objects are an extension anyway. |
7700e6c
to
dd3bdff
Compare
Solves SWDEV-449592
a52b1cf
to
5cacaad
Compare
Solves SWDEV-449592
Solves SWDEV-449592