Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Allow any linkage for dynlds #84742

Merged
merged 3 commits into from
Apr 19, 2024

Conversation

Pierre-vh
Copy link
Contributor

Solves SWDEV-449592

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 11, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Solves SWDEV-449592


Full diff: https://github.com/llvm/llvm-project/pull/84742.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp (+2-1)
  • (added) llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll (+31)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
index b85cb26fdc9565..621a1fe20ed467 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
@@ -1398,8 +1398,9 @@ class AMDGPULowerModuleLDS {
           LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
 
       for (GlobalVariable *GV : Sorted) {
+        unsigned Size = DL.getTypeAllocSize(GV->getValueType());
         OptimizedStructLayoutField F(GV,
-                                     DL.getTypeAllocSize(GV->getValueType()),
+                                     Size ? Size : 1,
                                      AMDGPU::getAlign(DL, GV));
         LayoutFields.emplace_back(F);
       }
diff --git a/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll b/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll
new file mode 100644
index 00000000000000..db5ba1154acca8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lower-module-lds-zero-sized-type.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
+; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
+
+; Check that we don't crash when lowering an internal zero-sized LDS GV.
+; This specifically tests the LDS module lowering path, not the
+
+@var0 = internal addrspace(3) global [0 x float] poison, align 4
+
+define void @func() {
+; CHECK-LABEL: define void @func() {
+; CHECK-NEXT:    [[IDX:%.*]] = getelementptr inbounds [0 x float], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1
+; CHECK-NEXT:    [[V:%.*]] = load float, ptr addrspace(3) [[IDX]], align 4
+; CHECK-NEXT:    ret void
+;
+  %idx = getelementptr inbounds [0 x float], ptr addrspace(3) @var0, i32 0, i32 1
+  %v = load float, ptr addrspace(3) %idx
+  ret void
+}
+
+define amdgpu_kernel void @kernel() {
+; CHECK-LABEL: define amdgpu_kernel void @kernel() {
+; CHECK-NEXT:    call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
+; CHECK-NEXT:    call void @func()
+; CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4
+; CHECK-NEXT:    ret void
+;
+  call void @func()
+  load float, ptr addrspace(3) @var0
+  ret void
+}

Copy link

github-actions bot commented Mar 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@JonChesterfield
Copy link
Collaborator

JonChesterfield commented Mar 12, 2024

It's not obvious to me that a zero sized type is necessarily the same thing as a dynamic LDS allocation.

I don't think variable length arrays can be globals. Tried a few variations with clang and they were all rejected..

The default stance is usually that we're compiling something like C in which case the address of different objects need to be different. Mapping different things onto the same address is bad. The extern array[] thing from cuda gets special cased.

I think the baseline is that an unsized static is a nonsense and we should reject it at sema, and fatal_error on one if it makes it to LDS lowering.

If there are some useful semantics to give one then we need to work out what those semantics are before choosing a lowering.

@arsenm
Copy link
Contributor

arsenm commented Mar 12, 2024

The default stance is usually that we're compiling something like C in which case the address of different objects need to be different.

This isn't the case with 0 sized allocations, which I believe are all required to get the same address

@Pierre-vh
Copy link
Contributor Author

The default stance is usually that we're compiling something like C in which case the address of different objects need to be different.

This isn't the case with 0 sized allocations, which I believe are all required to get the same address

I think zero-sized allocations are an extension so they're not really required to do anything, especially for address spaces like LDS. Rejecting it should be fine I think.
The original ticket just complained about the error (an assert failure) being cryptic. We could just fatal instead and if we have an actual request to support it with a good example later, we can revisit it.

Internal zero-sized-types in LDS seem tricky to handle. When they're external we can probably justify giving them all the same address, but when they're internal it's expected that they're different objects, but then the question is: how do you support more than one variable-sized LDS variable?

@arsenm
Copy link
Contributor

arsenm commented Mar 14, 2024

Internal zero-sized-types in LDS seem tricky to handle. When they're external we can probably justify giving them all the same address, but when they're internal it's expected that they're different objects,

I don't think this is the case. I think all 0 sized allocations (within the same address space) should always get the same address regardless of the linkage

@Pierre-vh Pierre-vh changed the title [AMDGPU][LowerModuleLDS] Handle non-dynlds zero sized type [AMDGPU][LowerModuleLDS] Avoid crash on zero-sized types Mar 20, 2024
@Pierre-vh
Copy link
Contributor Author

I just added a fatal error for now while we figure this out. I think it's better to just avoid crashing in the meantime.

@Pierre-vh Pierre-vh requested a review from arsenm March 20, 2024 09:43
@arsenm
Copy link
Contributor

arsenm commented Mar 20, 2024

I just added a fatal error for now while we figure this out. I think it's better to just avoid crashing in the meantime.

If this appeared in the wild, making this an error is a lateral move. I don't think there's anything to figure out. Just ignore adding it to the OptimizedStructLayout, record there was a 0 sized object, and then handle it along with the external case

@Pierre-vh
Copy link
Contributor Author

The issue reporter said that this was a coding error, their issue wasn't about it not being handled but about the fact that the compiler crashed in such cases. Reporting an error here is good enough. It should ideally be reported in Clang but I'd need help from a more knowledgeable contributor to understand why Clang doesn't report this yet and where the fix should be

@arsenm
Copy link
Contributor

arsenm commented Mar 26, 2024

The issue reporter said that this was a coding error, their issue wasn't about it not being handled but about the fact that the compiler crashed in such cases. Reporting an error here is good enough. It should ideally be reported in Clang but I'd need help from a more knowledgeable contributor to understand why Clang doesn't report this yet and where the fix should be

If we're going to make this an error, it should be a proper diagnostic in the context instead of report_fatal_error. But we already handle this situation in the external case, so I don't see the difficulty in just extending that to ignore the linkage, which shouldn't be a factor

@Pierre-vh
Copy link
Contributor Author

@JonChesterfield Do you remember why dynlds needs external linkage? Can we remove that requirement?

@JonChesterfield
Copy link
Collaborator

Dynamically allocated LDS that can't be accessed outside of the current module is probably a programmer or compiler error. I haven't been able to guess a reproducer that gets past sema.

Cuda rejects it, as I think it should. "error: incomplete type is not allowed"

__shared__
static
// extern 
double nonsense[];
__device__ double * wot(unsigned idx) {
    return &nonsense[idx];
}

HIP is really fighting me on type annotations but ultimately raises a similar error, https://godbolt.org/z/5z35qx5bz "definition of variable with array type needs an explicit size or an initializer". Pierre pointed at a repro which has a division in the array definition but doesn't have any of the context.

However, we've already implemented "zero size type means absolute symbol at end of allocated region" and left the question of whether the aliasing metadata reflects language semantics to the front end.

Let's say "any zero sized addrspace(3) value gets an abs symbol pointing to the end of the non-zero sized addrspace(3) values. To the extent that this means anything for a given language, it's on the front end to annotate or diagnose. As far as the backend lowering goes, all these things alias anyway which seems independent of visibility.

Which is to say, let's put static zero sized LDS variables at the same place as all other zero sized LDS variables, and it's on HIP to diagnose that if they want to. That meets Matt's belief that all zero sized objects go at the same place and my belief that zero sized objects are an extension anyway.

@Pierre-vh Pierre-vh changed the title [AMDGPU][LowerModuleLDS] Avoid crash on zero-sized types [AMDGPU] Allow any linkage for dynlds Apr 4, 2024
@Pierre-vh Pierre-vh requested a review from arsenm April 8, 2024 06:06
@Pierre-vh Pierre-vh merged commit 7c7704c into llvm:main Apr 19, 2024
3 of 4 checks passed
@Pierre-vh Pierre-vh deleted the fix-lowerlds-zerotype branch April 19, 2024 09:48
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants