Skip to content

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Oct 3, 2025

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Oct 3, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/161838.diff

1 Files Affected:

  • (modified) llvm/test/CodeGen/AMDGPU/wait-xcnt.mir (+40)
diff --git a/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir b/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
index 6fe99d895c7bb..1b8e126f19ae1 100644
--- a/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
+++ b/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
@@ -945,6 +945,46 @@ body: |
     $vgpr0 = V_MOV_B32_e32 0, implicit $exec
 ...
 
+# FIXME: Missing S_WAIT_XCNT before overwriting vgpr0.
+---
+name: wait_kmcnt_with_outstanding_vmem_2
+tracksRegLiveness: true
+machineFunctionInfo:
+  isEntryFunction: true
+body: |
+  ; GCN-LABEL: name: wait_kmcnt_with_outstanding_vmem_2
+  ; GCN: bb.0:
+  ; GCN-NEXT:   successors: %bb.2(0x40000000), %bb.1(0x40000000)
+  ; GCN-NEXT:   liveins: $vgpr0_vgpr1, $sgpr0_sgpr1, $scc
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
+  ; GCN-NEXT:   S_CBRANCH_SCC1 %bb.2, implicit $scc
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.1:
+  ; GCN-NEXT:   successors: %bb.2(0x80000000)
+  ; GCN-NEXT:   liveins: $vgpr0_vgpr1, $sgpr2
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.2:
+  ; GCN-NEXT:   liveins: $sgpr2
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   S_WAIT_KMCNT 0
+  ; GCN-NEXT:   $sgpr2 = S_MOV_B32 $sgpr2
+  ; GCN-NEXT:   $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+  bb.0:
+    liveins: $vgpr0_vgpr1, $sgpr0_sgpr1, $scc
+    $sgpr2 = S_LOAD_DWORD_IMM $sgpr0_sgpr1, 0, 0
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+  bb.1:
+    liveins: $vgpr0_vgpr1, $sgpr2
+    $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+  bb.2:
+    liveins: $sgpr2
+    $sgpr2 = S_MOV_B32 $sgpr2
+    $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+...
+
 ---
 name: wait_loadcnt_with_outstanding_smem
 tracksRegLiveness: true

@jayfoad jayfoad merged commit 0ce7cbd into llvm:main Oct 4, 2025
11 checks passed
@jayfoad jayfoad deleted the test-xcnt-merge branch October 4, 2025 06:18
bb.1:
liveins: $vgpr0_vgpr1, $sgpr2
$vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
bb.2:
Copy link
Contributor

@easyonaadit easyonaadit Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the hardware insert an implicit Xcnt between VMEM and SMEM which span basic blocks?
In this case, does the GLOBAL_LOAD imply an implicit XCNT for the S_LOAD has been inserted?
What about cases where control flow doesn't end up in bb.1? Should the compiler be conservative and insert an Xcnt at the start of a basic block with outstanding VMEM/SMEM events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardware doesn't know anything about basic blocks, and I think even branches don't have any effect on xcnt insertion.

Yes the compiler has to be conservative and assume that control flow may or may not have gone through bb.1. The way to handle this is for the merged state at bb.2 to have both SMEM_GROUP and VMEM_GROUP pending - even though a normal unmerged state would never have both event pending at the same time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants