[AMDGPU] Add module flag to control TBUFFER combining #156454
Conversation
This is not a robust feature design; it is hacking out a singular transformation. You need to define a semantic meaning and have the default be conservatively correct.
Hi Matt, I don't know how to define a semantic meaning. Could you please give me some suggestions?
Is this the same as relaxed-buffer-oob-mode?
I think it is not the same as relaxed-buffer-oob-mode.
I mean that what you proposed sounds like the opposite of relaxed-buffer-oob-mode, i.e. you don't need something new; the pass is just bugged and needs to check whether relaxed-buffer-oob-mode is enabled before performing the merge.
Agreed that we should be correct by default, so tbuffer merges should not happen unless a certain function attribute is set that allows it. The use case is slightly different from relaxed-buffer-oob-mode, though. In this case it is due to a recent Vulkan spec hardening extension, https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_maintenance9.html, which says: "Add a property to indicate the implementation will return (0,0,0,0) or (0,0,0,1) to vertex shaders that read unassigned attributes." The driver property that drives this is different from the property that drives relaxed-buffer-oob-mode, so there should be two different mechanisms to avoid code pessimization. Also note, the middle-end merging never happens for tbuffers, because they are intrinsics all the way - at no point are they regular LLVM loads. Having said that, I agree si-load-store-optimizer should also be updated to check for relaxed-buffer-oob-mode to disallow buffer merging, but this is a separate change.
We should fix that. We should not have any of this MI-level merging.
@llvm/pr-subscribers-backend-amdgpu

Author: Harrison Hao (harrisonGPU)

Changes

Some graphics features do not allow TBUFFER combining.

When vertex attribute robustness is enabled, unbound vertex attributes must return (0,0,0,0) or (0,0,0,1). For typed buffers, if any component of a thread is out of bounds, the entire thread is considered out of bounds and must return zero.

Therefore, TBUFFER combining must be disabled in this case.

Reference: https://registry.khronos.org/vulkan/specs/latest/man/html/VK_EXT_vertex_attribute_robustness.html

Full diff: https://github.com/llvm/llvm-project/pull/156454.diff

4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
index 6f2ea8ad1ff01..740e574bf70c9 100644
--- a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
@@ -62,6 +62,7 @@
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIDefines.h"
+#include "SIMachineFunctionInfo.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/InitializePasses.h"
@@ -2456,6 +2457,14 @@ SILoadStoreOptimizer::collectMergeableInsts(
LLVM_DEBUG(dbgs() << "Skip tbuffer with unknown format: " << MI);
continue;
}
+
+ const MachineFunction *MF = MI.getParent()->getParent();
+ const auto *MFI = MF->getInfo<SIMachineFunctionInfo>();
+ if (MFI->isTBufferCombineDisabled()) {
+ LLVM_DEBUG(
+ dbgs() << "Skip TBUFFER combine: disabled by function attribute\n");
+ continue;
+ }
}
CombineInfo CI;
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index 8a1120321af9f..717517e83adfd 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -195,6 +195,9 @@ SIMachineFunctionInfo::SIMachineFunctionInfo(const Function &F,
VGPRForAGPRCopy =
AMDGPU::VGPR_32RegClass.getRegister(ST.getMaxNumVGPRs(F) - 1);
}
+
+ if (F.hasFnAttribute("amdgpu-disable-tbuffer-combine"))
+ setDisableTBufferCombine(true);
}
MachineFunctionInfo *SIMachineFunctionInfo::clone(
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
index ca8f8033a2d54..0e8a0b75c0491 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
@@ -293,6 +293,8 @@ struct SIMachineFunctionInfo final : public yaml::MachineFunctionInfo {
unsigned PSInputEnable = 0;
unsigned MaxMemoryClusterDWords = DefaultMemoryClusterDWordsLimit;
+ bool DisableTBufferCombine = false;
+
SIMode Mode;
std::optional<FrameIndex> ScavengeFI;
StringValue VGPRForAGPRCopy;
@@ -525,6 +527,9 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
// scheduler stage.
unsigned MaxMemoryClusterDWords = DefaultMemoryClusterDWordsLimit;
+ // Disable combining of TBUFFER instructions.
+ bool DisableTBufferCombine = false;
+
MCPhysReg getNextUserSGPR() const;
MCPhysReg getNextSystemSGPR() const;
@@ -1207,6 +1212,11 @@ class SIMachineFunctionInfo final : public AMDGPUMachineFunction,
unsigned getMaxNumWorkGroupsX() const { return MaxNumWorkGroups[0]; }
unsigned getMaxNumWorkGroupsY() const { return MaxNumWorkGroups[1]; }
unsigned getMaxNumWorkGroupsZ() const { return MaxNumWorkGroups[2]; }
+
+ bool isTBufferCombineDisabled() const { return DisableTBufferCombine; }
+ void setDisableTBufferCombine(bool IsDisabled) {
+ DisableTBufferCombine = IsDisabled;
+ }
};
} // end namespace llvm
diff --git a/llvm/test/CodeGen/AMDGPU/tbuffer-combine-disable-attr.mir b/llvm/test/CodeGen/AMDGPU/tbuffer-combine-disable-attr.mir
new file mode 100644
index 0000000000000..cd119aacf7496
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/tbuffer-combine-disable-attr.mir
@@ -0,0 +1,65 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -run-pass=si-load-store-opt -verify-machineinstrs %s -o - | FileCheck %s
+
+--- |
+ target triple = "amdgcn"
+
+ define float @disable-tbuffer-combine(<4 x i32> %vec, i32 %index) #0 {
+ %1 = call i32 @llvm.amdgcn.struct.tbuffer.load.i32(<4 x i32> %vec, i32 %index, i32 0, i32 0, i32 22, i32 0)
+ %2 = call i32 @llvm.amdgcn.struct.tbuffer.load.i32(<4 x i32> %vec, i32 %index, i32 4, i32 0, i32 22, i32 0)
+ %3 = call i32 @llvm.amdgcn.struct.tbuffer.load.i32(<4 x i32> %vec, i32 %index, i32 8, i32 0, i32 22, i32 0)
+ %4 = call i32 @llvm.amdgcn.struct.tbuffer.load.i32(<4 x i32> %vec, i32 %index, i32 12, i32 0, i32 22, i32 0)
+ %5 = bitcast i32 %1 to float
+ %6 = bitcast i32 %2 to float
+ %7 = bitcast i32 %3 to float
+ %8 = bitcast i32 %4 to float
+ %add = fadd float %5, %6
+ %mul = fmul float %7, %8
+ %res = fadd float %add, %mul
+ ret float %res
+ }
+
+ attributes #0 = {"amdgpu-disable-tbuffer-combine"}
+...
+---
+name: disable-tbuffer-combine
+body: |
+ bb.0 (%ir-block.0):
+ liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
+
+ ; CHECK-LABEL: name: disable-tbuffer-combine
+ ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr4
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr3
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY4]], %subreg.sub0, [[COPY3]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[COPY1]], %subreg.sub3
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+ ; CHECK-NEXT: [[TBUFFER_LOAD_FORMAT_X_IDXEN:%[0-9]+]]:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN [[COPY]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ ; CHECK-NEXT: [[TBUFFER_LOAD_FORMAT_X_IDXEN1:%[0-9]+]]:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN [[COPY]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 4, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ ; CHECK-NEXT: [[TBUFFER_LOAD_FORMAT_X_IDXEN2:%[0-9]+]]:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN [[COPY]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 8, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ ; CHECK-NEXT: [[TBUFFER_LOAD_FORMAT_X_IDXEN3:%[0-9]+]]:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN [[COPY]], [[REG_SEQUENCE]], [[S_MOV_B32_]], 12, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ ; CHECK-NEXT: [[V_ADD_F32_e64_:%[0-9]+]]:vgpr_32 = nofpexcept V_ADD_F32_e64 0, killed [[TBUFFER_LOAD_FORMAT_X_IDXEN]], 0, killed [[TBUFFER_LOAD_FORMAT_X_IDXEN1]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MUL_F32_e64_:%[0-9]+]]:vgpr_32 = nofpexcept V_MUL_F32_e64 0, killed [[TBUFFER_LOAD_FORMAT_X_IDXEN2]], 0, killed [[TBUFFER_LOAD_FORMAT_X_IDXEN3]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_F32_e64_1:%[0-9]+]]:vgpr_32 = nofpexcept V_ADD_F32_e64 0, killed [[V_ADD_F32_e64_]], 0, killed [[V_MUL_F32_e64_]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: $vgpr0 = COPY [[V_ADD_F32_e64_1]]
+ ; CHECK-NEXT: SI_RETURN implicit $vgpr0
+ %12:vgpr_32 = COPY $vgpr4
+ %11:vgpr_32 = COPY $vgpr3
+ %10:vgpr_32 = COPY $vgpr2
+ %9:vgpr_32 = COPY $vgpr1
+ %8:vgpr_32 = COPY $vgpr0
+ %13:sgpr_128 = REG_SEQUENCE %8, %subreg.sub0, %9, %subreg.sub1, %10, %subreg.sub2, %11, %subreg.sub3
+ %14:sreg_32 = S_MOV_B32 0
+ %15:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN %12, %13, %14, 0, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ %16:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN %12, %13, %14, 4, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ %17:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN %12, %13, %14, 8, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ %18:vgpr_32 = TBUFFER_LOAD_FORMAT_X_IDXEN %12, %13, %14, 12, 22, 0, 0, implicit $exec :: (dereferenceable load (s32), align 1, addrspace 8)
+ %19:vgpr_32 = nofpexcept V_ADD_F32_e64 0, killed %15, 0, killed %16, 0, 0, implicit $mode, implicit $exec
+ %20:vgpr_32 = nofpexcept V_MUL_F32_e64 0, killed %17, 0, killed %18, 0, 0, implicit $mode, implicit $exec
+ %21:vgpr_32 = nofpexcept V_ADD_F32_e64 0, killed %19, 0, killed %20, 0, 0, implicit $mode, implicit $exec
+ $vgpr0 = COPY %21
+ SI_RETURN implicit $vgpr0
+...
I'd also like to take the opportunity to flag that relaxed-buffer-oob-mode should either be a module-level or a function-level property ... and should actually have its default flipped (to strict-oob-mode on a module, for example - or maybe that'd be an HSA-specific thing).
Also, should this be a function-level or module-level property? It feels module-level, in that your entire program either does or doesn't want these semantics?
Is the suggestion to add an extra middle-end pass that merges tbuffers? The load-store-vectorizer works on regular loads/stores, so it is not easily possible to get it working with the tbuffer representation, which carries extra information (the format) on the instruction.
Yes, this is more like a module-level property. But a function attribute is a standard way to pass this kind of information from the front end to the backend. Is there a better option?
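For context, a minimal sketch of that front-end-to-backend channel using LLVM's C++ API; the attribute name is the one from this patch, while the helper function itself is hypothetical:

```cpp
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical front-end helper: attach a string attribute that the
// backend later queries with F.hasFnAttribute(...), as in the patch's
// SIMachineFunctionInfo change quoted below.
static void markNoTBufferCombine(Function &F) {
  F.addFnAttr("amdgpu-disable-tbuffer-combine");
}
```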
if (F.hasFnAttribute("amdgpu-disable-tbuffer-combine"))
  setDisableTBufferCombine(true);
I think Matt's suggestion was that the conservative behaviour should be the default, so the tbuffer merging should be disabled by default, and only enabled when an attribute is set.
I think we should enable tbuffer combine by default, because its logic is generally correct, and only certain features disallow it. What do you think?
I see where you're coming from on this, and I had similar thoughts in #115479. But being always correct by default has some clear advantages. The price to pay for this is to always set a certain flag to enable relaxed mode.
And I have another question: if we disable tbuffer combine by default, will many lit tests need to be updated? Does that mean all the 8-bit, 16-bit, and 32-bit tests would need to change?
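If it helps, the opt-in variant the reviewers describe would be a small inversion of the patch's hunk; a sketch, with a hypothetical attribute name:

```cpp
// Sketch of the conservative default: merging stays disabled unless the
// front end explicitly opts in. "amdgpu-allow-tbuffer-combine" is a
// hypothetical name, not the one used in this patch.
if (!F.hasFnAttribute("amdgpu-allow-tbuffer-combine"))
  setDisableTBufferCombine(true);
```

Under this form, the lit tests that currently expect merged loads would indeed need either the opt-in attribute added or their checks regenerated.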
Module flags? (Which, well, relaxed-oob-mode - or its inverse - should also be)
Yeah - and you might want the same for buffer_load/buffer_store more generally, in theory?
If this is a module-level thing, please just use a module flag. We already have something "bad" that is supposed to be a module-level config but ends up being a function-level attribute, which causes a lot of trouble.
Making it module level turns it into ABI, which is bad. You should be able to perform one-way linking from strict to relaxed use contexts.
That is, I think it's possible - and correct - to make the strict behavior infectious when you do a module-to-module link at the LLVM IR level. At the ...
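For what it's worth, LLVM's module-flag merge behaviors can already express this kind of one-way propagation at IR link time; a sketch, assuming strict is encoded as 0 and relaxed as 1 (the flag name and encoding are assumptions, not part of this patch):

```cpp
#include "llvm/IR/Module.h"
using namespace llvm;

// With the Min merge behavior, linking a strict module (0) with a relaxed
// one (1) resolves to strict (0), so strictness is infectious across
// IR-level links.
static void setOOBMode(Module &M, bool Relaxed) {
  M.addModuleFlag(Module::Min, "amdgpu.oob.mode", Relaxed ? 1 : 0);
}
```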
It appears the only remaining point of contention is whether to use a module flag or a function attribute. I agree with Matt that making it module level could introduce new problems. I prefer using a function attribute, which is also more flexible if we decide to set it on a per-function basis in the future.
I've switched to a function attribute to control TBUFFER combining and renamed it to ...
I am not completely up to speed on this, but I thought combining dword loads into multi-dword loads was always OK, because bounds checking checks each dword individually. Hopefully @piotrAMD will correct me if I am wrong.
Can I get more clarity on what the problem with a module-level flag is?
I think the only issue with a module flag would be if you care about having different entry points in the same module with different behavior. If this is an attribute, it needs more work to properly handle inlining with mixed modes. Also, I'm still not clear how this differs from the relaxed-oob-mode feature. Can these be merged into one? In any case, I'd like to see those use identical mechanisms before introducing another knob.
Typed buffer loads are not range-checked per dword; they are range-checked "all or nothing".
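To make the difference concrete, here is a purely illustrative C++ model of the two bounds-checking behaviors described above (not hardware code; the buffer and sizes are hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four separate x-format loads: each dword is bounds-checked on its own,
// so only the out-of-bounds components read back as zero.
std::array<uint32_t, 4> loadFourX(const uint32_t *Buf, size_t NumDwords,
                                  size_t First) {
  std::array<uint32_t, 4> R{};
  for (size_t I = 0; I < 4; ++I)
    R[I] = (First + I < NumDwords) ? Buf[First + I] : 0;
  return R;
}

// One merged xyzw typed load: checked "all or nothing" - if any component
// is out of bounds, every component reads back as zero.
std::array<uint32_t, 4> loadMergedXYZW(const uint32_t *Buf, size_t NumDwords,
                                       size_t First) {
  std::array<uint32_t, 4> R{};
  if (First + 4 <= NumDwords)
    for (size_t I = 0; I < 4; ++I)
      R[I] = Buf[First + I];
  return R;
}
```

With NumDwords = 3 and First = 0, the first version returns {Buf[0], Buf[1], Buf[2], 0} while the merged version returns {0, 0, 0, 0} - which is exactly why the merge is unsafe under these semantics.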
This is similar to the existing relaxed-oob-mode feature in that it controls OOB merging behaviour. The behavior for untyped buffer merging (in LLVM IR) and this tbuffer merging in MIR are driven by different language features, so we need to be able to select them individually. I guess having a single mechanism like a multistate function attribute/module flag would be OK, but there is this problem that the function that uses relaxed-oob-mode currently (...
@piotrAMD More fundamental question: is the strictness/relaxedness something that's enabled per compilation (that is, per module) or, at the SPIR-V level, can it differ per function? If you can't have functions with different strictness requirements in one program ... both of these should be module flags, and ...
Hi, I've switched to using a module flag to control TBUFFER combining.
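A sketch of how a backend could query such a flag; the value encoding here (nonzero meaning relaxed) is an assumption, not taken from the patch:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Assumed encoding: nonzero means relaxed OOB mode; an absent flag or a
// zero value means strict, keeping the conservative behavior the default.
static bool isRelaxedOOBMode(const Module &M) {
  if (auto *V = mdconst::extract_or_null<ConstantInt>(
          M.getModuleFlag("amdgpu.oob.mode")))
    return V->getZExtValue() != 0;
  return false;
}
```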
Proposing a change to the existing OOB mode (relaxed-buffer-oob-mode) to use a module flag instead of a subtarget feature - just a draft to canvass opinions: #160922.
Some graphics APIs require stricter out-of-bounds (OOB) semantics that are
incompatible with TBUFFER instruction combining.
When vertex attribute robustness is enabled, unbound vertex attributes must
return (0,0,0,0) or (0,0,0,1). For typed buffers, if any component of a thread
is out of bounds, the entire thread is considered out of bounds and must
return zero. If multiple TBUFFER loads are merged into a wider load, a single
OOB component would incorrectly force all components to return zero.
This patch introduces a module flag, amdgpu.oob.mode, to control this.

Reference: https://registry.khronos.org/vulkan/specs/latest/man/html/VK_EXT_vertex_attribute_robustness.html