[AMDGPU] Check MIR after SIMemoryLegalizer instead of final ISA #90601

Closed
jayfoad wants to merge 1 commit

jayfoad
Contributor

@jayfoad jayfoad commented Apr 30, 2024

Since #72830 the memory legalizer tests have not shown s_waitcnt
instructions inserted by SIMemoryLegalizer because they have mostly been
removed by SIInsertWaitcnts.

Checking the MIR immediately after SIMemoryLegalizer runs fixes this so
you can see exactly what the pass has inserted.
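
The RUN-line half of this change is mechanical: each `llc` invocation gains `-stop-after=si-memory-legalizer` right after the `-mtriple` argument, and the checks are then regenerated with `utils/update_mir_test_checks.py` instead of `utils/update_llc_test_checks.py`. As a rough sketch of that rewrite (the helper name `add_stop_after` is hypothetical and not part of this patch; the actual edit may well have been done by hand or with sed):

```python
import re

# Flag taken from the updated RUN lines in this patch.
STOP_FLAG = "-stop-after=si-memory-legalizer"


def add_stop_after(run_line: str) -> str:
    """Insert STOP_FLAG right after the -mtriple argument of an llc RUN line.

    Hypothetical helper for illustration only. Lines that do not invoke llc,
    or that already carry the flag, are returned unchanged.
    """
    if "llc" not in run_line or STOP_FLAG in run_line:
        return run_line
    # Only the first -mtriple occurrence is touched.
    return re.sub(r"(-mtriple=\S+)", r"\1 " + STOP_FLAG, run_line, count=1)


before = ("; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 < %s "
          "| FileCheck --check-prefixes=GFX6 %s")
print(add_stop_after(before))
```

With the flag in place, `llc` emits MIR as of the end of SIMemoryLegalizer rather than final ISA, which is why the check lines change from `s_endpgm` to `S_ENDPGM 0`.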

@llvmbot
Collaborator

llvmbot commented Apr 30, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Since #72830 the memory legalizer tests have not shown s_waitcnt
instructions inserted by SIMemoryLegalizer because they have mostly been
removed by SIInsertWaitcnts.

Checking the MIR immediately after SIMemoryLegalizer runs fixes this so
you can see exactly what the pass has inserted.


Patch is 29.32 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90601.diff

27 Files Affected:

  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll (+2824-2324)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll (+16777-14466)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-lastuse.ll (+48-47)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll (+804-806)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-singlethread.ll (+13169-12705)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll (+17029-14718)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll (+691-649)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-wavefront.ll (+13014-12550)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-workgroup.ll (+14373-12888)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll (+17099-15030)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-lastuse.ll (+42-47)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-nontemporal.ll (+766-789)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll (+13610-13834)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll (+16535-14620)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-volatile.ll (+714-693)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll (+13610-13834)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll (+15232-14206)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-agent.ll (+14316-12924)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll (+803-853)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-singlethread.ll (+13202-12578)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-system.ll (+14316-12924)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-volatile.ll (+682-665)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-wavefront.ll (+13202-12578)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-local-workgroup.ll (+14316-12924)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-lastuse.ll (+46-54)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll (+940-973)
  • (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-volatile.ll (+590-580)
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
index e13542f61474e2..a169753f532b57 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
@@ -1,2563 +1,3063 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx600 < %s | FileCheck --check-prefixes=GFX6 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck --check-prefixes=GFX7 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck --check-prefixes=GFX10-WGP %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX10-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx700 -amdgcn-skip-cache-invalidations < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+tgsplit < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -mattr=+tgsplit < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 < %s | FileCheck --check-prefixes=GFX11-WGP %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx600 < %s | FileCheck --check-prefixes=GFX6 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx700 < %s | FileCheck --check-prefixes=GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx1010 < %s | FileCheck --check-prefixes=GFX10-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx1010 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX10-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -stop-after=si-memory-legalizer -mcpu=gfx700 -amdgcn-skip-cache-invalidations < %s | FileCheck --check-prefixes=SKIP-CACHE-INV %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX90A-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx90a -mattr=+tgsplit < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx940 < %s | FileCheck -check-prefixes=GFX940-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx940 -mattr=+tgsplit < %s | FileCheck -check-prefixes=GFX940-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx1100 < %s | FileCheck --check-prefixes=GFX11-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx1100 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX11-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=si-memory-legalizer -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
 
 define amdgpu_kernel void @singlethread_acquire_fence() {
-; GFX6-LABEL: singlethread_acquire_fence:
-; GFX6:       ; %bb.0: ; %entry
-; GFX6-NEXT:    s_endpgm
-;
-; GFX7-LABEL: singlethread_acquire_fence:
-; GFX7:       ; %bb.0: ; %entry
-; GFX7-NEXT:    s_endpgm
-;
-; GFX10-WGP-LABEL: singlethread_acquire_fence:
-; GFX10-WGP:       ; %bb.0: ; %entry
-; GFX10-WGP-NEXT:    s_endpgm
-;
-; GFX10-CU-LABEL: singlethread_acquire_fence:
-; GFX10-CU:       ; %bb.0: ; %entry
-; GFX10-CU-NEXT:    s_endpgm
-;
-; SKIP-CACHE-INV-LABEL: singlethread_acquire_fence:
-; SKIP-CACHE-INV:       ; %bb.0: ; %entry
-; SKIP-CACHE-INV-NEXT:    s_endpgm
-;
-; GFX90A-NOTTGSPLIT-LABEL: singlethread_acquire_fence:
-; GFX90A-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX90A-TGSPLIT-LABEL: singlethread_acquire_fence:
-; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-NOTTGSPLIT-LABEL: singlethread_acquire_fence:
-; GFX940-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX940-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-TGSPLIT-LABEL: singlethread_acquire_fence:
-; GFX940-TGSPLIT:       ; %bb.0: ; %entry
-; GFX940-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX11-WGP-LABEL: singlethread_acquire_fence:
-; GFX11-WGP:       ; %bb.0: ; %entry
-; GFX11-WGP-NEXT:    s_endpgm
-;
-; GFX11-CU-LABEL: singlethread_acquire_fence:
-; GFX11-CU:       ; %bb.0: ; %entry
-; GFX11-CU-NEXT:    s_endpgm
-;
-; GFX12-WGP-LABEL: singlethread_acquire_fence:
-; GFX12-WGP:       ; %bb.0: ; %entry
-; GFX12-WGP-NEXT:    s_endpgm
-;
-; GFX12-CU-LABEL: singlethread_acquire_fence:
-; GFX12-CU:       ; %bb.0: ; %entry
-; GFX12-CU-NEXT:    s_endpgm
+  ; GFX6-LABEL: name: singlethread_acquire_fence
+  ; GFX6: bb.0.entry:
+  ; GFX6-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX7-LABEL: name: singlethread_acquire_fence
+  ; GFX7: bb.0.entry:
+  ; GFX7-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-WGP-LABEL: name: singlethread_acquire_fence
+  ; GFX10-WGP: bb.0.entry:
+  ; GFX10-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-CU-LABEL: name: singlethread_acquire_fence
+  ; GFX10-CU: bb.0.entry:
+  ; GFX10-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; SKIP-CACHE-INV-LABEL: name: singlethread_acquire_fence
+  ; SKIP-CACHE-INV: bb.0.entry:
+  ; SKIP-CACHE-INV-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-NOTTGSPLIT-LABEL: name: singlethread_acquire_fence
+  ; GFX90A-NOTTGSPLIT: bb.0.entry:
+  ; GFX90A-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-TGSPLIT-LABEL: name: singlethread_acquire_fence
+  ; GFX90A-TGSPLIT: bb.0.entry:
+  ; GFX90A-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-NOTTGSPLIT-LABEL: name: singlethread_acquire_fence
+  ; GFX940-NOTTGSPLIT: bb.0.entry:
+  ; GFX940-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-TGSPLIT-LABEL: name: singlethread_acquire_fence
+  ; GFX940-TGSPLIT: bb.0.entry:
+  ; GFX940-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-WGP-LABEL: name: singlethread_acquire_fence
+  ; GFX11-WGP: bb.0.entry:
+  ; GFX11-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-CU-LABEL: name: singlethread_acquire_fence
+  ; GFX11-CU: bb.0.entry:
+  ; GFX11-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-WGP-LABEL: name: singlethread_acquire_fence
+  ; GFX12-WGP: bb.0.entry:
+  ; GFX12-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-CU-LABEL: name: singlethread_acquire_fence
+  ; GFX12-CU: bb.0.entry:
+  ; GFX12-CU-NEXT:   S_ENDPGM 0
 entry:
   fence syncscope("singlethread") acquire
   ret void
 }
 
 define amdgpu_kernel void @singlethread_release_fence() {
-; GFX6-LABEL: singlethread_release_fence:
-; GFX6:       ; %bb.0: ; %entry
-; GFX6-NEXT:    s_endpgm
-;
-; GFX7-LABEL: singlethread_release_fence:
-; GFX7:       ; %bb.0: ; %entry
-; GFX7-NEXT:    s_endpgm
-;
-; GFX10-WGP-LABEL: singlethread_release_fence:
-; GFX10-WGP:       ; %bb.0: ; %entry
-; GFX10-WGP-NEXT:    s_endpgm
-;
-; GFX10-CU-LABEL: singlethread_release_fence:
-; GFX10-CU:       ; %bb.0: ; %entry
-; GFX10-CU-NEXT:    s_endpgm
-;
-; SKIP-CACHE-INV-LABEL: singlethread_release_fence:
-; SKIP-CACHE-INV:       ; %bb.0: ; %entry
-; SKIP-CACHE-INV-NEXT:    s_endpgm
-;
-; GFX90A-NOTTGSPLIT-LABEL: singlethread_release_fence:
-; GFX90A-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX90A-TGSPLIT-LABEL: singlethread_release_fence:
-; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-NOTTGSPLIT-LABEL: singlethread_release_fence:
-; GFX940-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX940-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-TGSPLIT-LABEL: singlethread_release_fence:
-; GFX940-TGSPLIT:       ; %bb.0: ; %entry
-; GFX940-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX11-WGP-LABEL: singlethread_release_fence:
-; GFX11-WGP:       ; %bb.0: ; %entry
-; GFX11-WGP-NEXT:    s_endpgm
-;
-; GFX11-CU-LABEL: singlethread_release_fence:
-; GFX11-CU:       ; %bb.0: ; %entry
-; GFX11-CU-NEXT:    s_endpgm
-;
-; GFX12-WGP-LABEL: singlethread_release_fence:
-; GFX12-WGP:       ; %bb.0: ; %entry
-; GFX12-WGP-NEXT:    s_endpgm
-;
-; GFX12-CU-LABEL: singlethread_release_fence:
-; GFX12-CU:       ; %bb.0: ; %entry
-; GFX12-CU-NEXT:    s_endpgm
+  ; GFX6-LABEL: name: singlethread_release_fence
+  ; GFX6: bb.0.entry:
+  ; GFX6-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX7-LABEL: name: singlethread_release_fence
+  ; GFX7: bb.0.entry:
+  ; GFX7-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-WGP-LABEL: name: singlethread_release_fence
+  ; GFX10-WGP: bb.0.entry:
+  ; GFX10-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-CU-LABEL: name: singlethread_release_fence
+  ; GFX10-CU: bb.0.entry:
+  ; GFX10-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; SKIP-CACHE-INV-LABEL: name: singlethread_release_fence
+  ; SKIP-CACHE-INV: bb.0.entry:
+  ; SKIP-CACHE-INV-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-NOTTGSPLIT-LABEL: name: singlethread_release_fence
+  ; GFX90A-NOTTGSPLIT: bb.0.entry:
+  ; GFX90A-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-TGSPLIT-LABEL: name: singlethread_release_fence
+  ; GFX90A-TGSPLIT: bb.0.entry:
+  ; GFX90A-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-NOTTGSPLIT-LABEL: name: singlethread_release_fence
+  ; GFX940-NOTTGSPLIT: bb.0.entry:
+  ; GFX940-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-TGSPLIT-LABEL: name: singlethread_release_fence
+  ; GFX940-TGSPLIT: bb.0.entry:
+  ; GFX940-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-WGP-LABEL: name: singlethread_release_fence
+  ; GFX11-WGP: bb.0.entry:
+  ; GFX11-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-CU-LABEL: name: singlethread_release_fence
+  ; GFX11-CU: bb.0.entry:
+  ; GFX11-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-WGP-LABEL: name: singlethread_release_fence
+  ; GFX12-WGP: bb.0.entry:
+  ; GFX12-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-CU-LABEL: name: singlethread_release_fence
+  ; GFX12-CU: bb.0.entry:
+  ; GFX12-CU-NEXT:   S_ENDPGM 0
 entry:
   fence syncscope("singlethread") release
   ret void
 }
 
 define amdgpu_kernel void @singlethread_acq_rel_fence() {
-; GFX6-LABEL: singlethread_acq_rel_fence:
-; GFX6:       ; %bb.0: ; %entry
-; GFX6-NEXT:    s_endpgm
-;
-; GFX7-LABEL: singlethread_acq_rel_fence:
-; GFX7:       ; %bb.0: ; %entry
-; GFX7-NEXT:    s_endpgm
-;
-; GFX10-WGP-LABEL: singlethread_acq_rel_fence:
-; GFX10-WGP:       ; %bb.0: ; %entry
-; GFX10-WGP-NEXT:    s_endpgm
-;
-; GFX10-CU-LABEL: singlethread_acq_rel_fence:
-; GFX10-CU:       ; %bb.0: ; %entry
-; GFX10-CU-NEXT:    s_endpgm
-;
-; SKIP-CACHE-INV-LABEL: singlethread_acq_rel_fence:
-; SKIP-CACHE-INV:       ; %bb.0: ; %entry
-; SKIP-CACHE-INV-NEXT:    s_endpgm
-;
-; GFX90A-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:
-; GFX90A-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX90A-TGSPLIT-LABEL: singlethread_acq_rel_fence:
-; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:
-; GFX940-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX940-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-TGSPLIT-LABEL: singlethread_acq_rel_fence:
-; GFX940-TGSPLIT:       ; %bb.0: ; %entry
-; GFX940-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX11-WGP-LABEL: singlethread_acq_rel_fence:
-; GFX11-WGP:       ; %bb.0: ; %entry
-; GFX11-WGP-NEXT:    s_endpgm
-;
-; GFX11-CU-LABEL: singlethread_acq_rel_fence:
-; GFX11-CU:       ; %bb.0: ; %entry
-; GFX11-CU-NEXT:    s_endpgm
-;
-; GFX12-WGP-LABEL: singlethread_acq_rel_fence:
-; GFX12-WGP:       ; %bb.0: ; %entry
-; GFX12-WGP-NEXT:    s_endpgm
-;
-; GFX12-CU-LABEL: singlethread_acq_rel_fence:
-; GFX12-CU:       ; %bb.0: ; %entry
-; GFX12-CU-NEXT:    s_endpgm
+  ; GFX6-LABEL: name: singlethread_acq_rel_fence
+  ; GFX6: bb.0.entry:
+  ; GFX6-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX7-LABEL: name: singlethread_acq_rel_fence
+  ; GFX7: bb.0.entry:
+  ; GFX7-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-WGP-LABEL: name: singlethread_acq_rel_fence
+  ; GFX10-WGP: bb.0.entry:
+  ; GFX10-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-CU-LABEL: name: singlethread_acq_rel_fence
+  ; GFX10-CU: bb.0.entry:
+  ; GFX10-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; SKIP-CACHE-INV-LABEL: name: singlethread_acq_rel_fence
+  ; SKIP-CACHE-INV: bb.0.entry:
+  ; SKIP-CACHE-INV-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-NOTTGSPLIT-LABEL: name: singlethread_acq_rel_fence
+  ; GFX90A-NOTTGSPLIT: bb.0.entry:
+  ; GFX90A-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-TGSPLIT-LABEL: name: singlethread_acq_rel_fence
+  ; GFX90A-TGSPLIT: bb.0.entry:
+  ; GFX90A-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-NOTTGSPLIT-LABEL: name: singlethread_acq_rel_fence
+  ; GFX940-NOTTGSPLIT: bb.0.entry:
+  ; GFX940-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-TGSPLIT-LABEL: name: singlethread_acq_rel_fence
+  ; GFX940-TGSPLIT: bb.0.entry:
+  ; GFX940-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-WGP-LABEL: name: singlethread_acq_rel_fence
+  ; GFX11-WGP: bb.0.entry:
+  ; GFX11-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-CU-LABEL: name: singlethread_acq_rel_fence
+  ; GFX11-CU: bb.0.entry:
+  ; GFX11-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-WGP-LABEL: name: singlethread_acq_rel_fence
+  ; GFX12-WGP: bb.0.entry:
+  ; GFX12-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-CU-LABEL: name: singlethread_acq_rel_fence
+  ; GFX12-CU: bb.0.entry:
+  ; GFX12-CU-NEXT:   S_ENDPGM 0
 entry:
   fence syncscope("singlethread") acq_rel
   ret void
 }
 
 define amdgpu_kernel void @singlethread_seq_cst_fence() {
-; GFX6-LABEL: singlethread_seq_cst_fence:
-; GFX6:       ; %bb.0: ; %entry
-; GFX6-NEXT:    s_endpgm
-;
-; GFX7-LABEL: singlethread_seq_cst_fence:
-; GFX7:       ; %bb.0: ; %entry
-; GFX7-NEXT:    s_endpgm
-;
-; GFX10-WGP-LABEL: singlethread_seq_cst_fence:
-; GFX10-WGP:       ; %bb.0: ; %entry
-; GFX10-WGP-NEXT:    s_endpgm
-;
-; GFX10-CU-LABEL: singlethread_seq_cst_fence:
-; GFX10-CU:       ; %bb.0: ; %entry
-; GFX10-CU-NEXT:    s_endpgm
-;
-; SKIP-CACHE-INV-LABEL: singlethread_seq_cst_fence:
-; SKIP-CACHE-INV:       ; %bb.0: ; %entry
-; SKIP-CACHE-INV-NEXT:    s_endpgm
-;
-; GFX90A-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:
-; GFX90A-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX90A-TGSPLIT-LABEL: singlethread_seq_cst_fence:
-; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:
-; GFX940-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX940-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-TGSPLIT-LABEL: singlethread_seq_cst_fence:
-; GFX940-TGSPLIT:       ; %bb.0: ; %entry
-; GFX940-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX11-WGP-LABEL: singlethread_seq_cst_fence:
-; GFX11-WGP:       ; %bb.0: ; %entry
-; GFX11-WGP-NEXT:    s_endpgm
-;
-; GFX11-CU-LABEL: singlethread_seq_cst_fence:
-; GFX11-CU:       ; %bb.0: ; %entry
-; GFX11-CU-NEXT:    s_endpgm
-;
-; GFX12-WGP-LABEL: singlethread_seq_cst_fence:
-; GFX12-WGP:       ; %bb.0: ; %entry
-; GFX12-WGP-NEXT:    s_endpgm
-;
-; GFX12-CU-LABEL: singlethread_seq_cst_fence:
-; GFX12-CU:       ; %bb.0: ; %entry
-; GFX12-CU-NEXT:    s_endpgm
+  ; GFX6-LABEL: name: singlethread_seq_cst_fence
+  ; GFX6: bb.0.entry:
+  ; GFX6-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX7-LABEL: name: singlethread_seq_cst_fence
+  ; GFX7: bb.0.entry:
+  ; GFX7-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-WGP-LABEL: name: singlethread_seq_cst_fence
+  ; GFX10-WGP: bb.0.entry:
+  ; GFX10-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-CU-LABEL: name: singlethread_seq_cst_fence
+  ; GFX10-CU: bb.0.entry:
+  ; GFX10-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; SKIP-CACHE-INV-LABEL: name: singlethread_seq_cst_fence
+  ; SKIP-CACHE-INV: bb.0.entry:
+  ; SKIP-CACHE-INV-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-NOTTGSPLIT-LABEL: name: singlethread_seq_cst_fence
+  ; GFX90A-NOTTGSPLIT: bb.0.entry:
+  ; GFX90A-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-TGSPLIT-LABEL: name: singlethread_seq_cst_fence
+  ; GFX90A-TGSPLIT: bb.0.entry:
+  ; GFX90A-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-NOTTGSPLIT-LABEL: name: singlethread_seq_cst_fence
+  ; GFX940-NOTTGSPLIT: bb.0.entry:
+  ; GFX940-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-TGSPLIT-LABEL: name: singlethread_seq_cst_fence
+  ; GFX940-TGSPLIT: bb.0.entry:
+  ; GFX940-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-WGP-LABEL: name: singlethread_seq_cst_fence
+  ; GFX11-WGP: bb.0.entry:
+  ; GFX11-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-CU-LABEL: name: singlethread_seq_cst_fence
+  ; GFX11-CU: bb.0.entry:
+  ; GFX11-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-WGP-LABEL: name: singlethread_seq_cst_fence
+  ; GFX12-WGP: bb.0.entry:
+  ; GFX12-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-CU-LABEL: name: singlethread_seq_cst_fence
+  ; GFX12-CU: bb.0.entry:
+  ; GFX12-CU-NEXT:   S_ENDPGM 0
 entry:
   fence syncscope("singlethread") seq_cst
   ret void
 }
 
 define amdgpu_kernel void @singlethread_one_as_acquire_fence() {
-; GFX6-LABEL: singlethread_one_as_acquire_fence:
-; GFX6:       ; %bb.0: ; %entry
-; GFX6-NEXT:    s_endpgm
-;
-; GFX7-LABEL: singlethread_one_as_acquire_fence:
-; GFX7:       ; %bb.0: ; %entry
-; GFX7-NEXT:    s_endpgm
-;
-; GFX10-WGP-LABEL: singlethread_one_as_acquire_fence:
-; GFX10-WGP:       ; %bb.0: ; %entry
-; GFX10-WGP-NEXT:    s_endpgm
-;
-; GFX10-CU-LABEL: singlethread_one_as_acquire_fence:
-; GFX10-CU:       ; %bb.0: ; %entry
-; GFX10-CU-NEXT:    s_endpgm
-;
-; SKIP-CACHE-INV-LABEL: singlethread_one_as_acquire_fence:
-; SKIP-CACHE-INV:       ; %bb.0: ; %entry
-; SKIP-CACHE-INV-NEXT:    s_endpgm
-;
-; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:
-; GFX90A-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
-; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
-; GFX90A-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:
-; GFX940-NOTTGSPLIT:       ; %bb.0: ; %entry
-; GFX940-NOTTGSPLIT-NEXT:    s_endpgm
-;
-; GFX940-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
-; GFX940-TGSPLIT:       ; %bb.0: ; %entry
-; GFX940-TGSPLIT-NEXT:    s_endpgm
-;
-; GFX11-WGP-LABEL: singlethread_one_as_acquire_fence:
-; GFX11-WGP:       ; %bb.0: ; %entry
-; GFX11-WGP-NEXT:    s_endpgm
-;
-; GFX11-CU-LABEL: singlethread_one_as_acquire_fence:
-; GFX11-CU:       ; %bb.0: ; %entry
-; GFX11-CU-NEXT:    s_endpgm
-;
-; GFX12-WGP-LABEL: singlethread_one_as_acquire_fence:
-; GFX12-WGP:       ; %bb.0: ; %entry
-; GFX12-WGP-NEXT:    s_endpgm
-;
-; GFX12-CU-LABEL: singlethread_one_as_acquire_fence:
-; GFX12-CU:       ; %bb.0: ; %entry
-; GFX12-CU-NEXT:    s_endpgm
+  ; GFX6-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX6: bb.0.entry:
+  ; GFX6-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX7-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX7: bb.0.entry:
+  ; GFX7-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-WGP-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX10-WGP: bb.0.entry:
+  ; GFX10-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX10-CU-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX10-CU: bb.0.entry:
+  ; GFX10-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; SKIP-CACHE-INV-LABEL: name: singlethread_one_as_acquire_fence
+  ; SKIP-CACHE-INV: bb.0.entry:
+  ; SKIP-CACHE-INV-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-NOTTGSPLIT-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX90A-NOTTGSPLIT: bb.0.entry:
+  ; GFX90A-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX90A-TGSPLIT-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX90A-TGSPLIT: bb.0.entry:
+  ; GFX90A-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-NOTTGSPLIT-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX940-NOTTGSPLIT: bb.0.entry:
+  ; GFX940-NOTTGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX940-TGSPLIT-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX940-TGSPLIT: bb.0.entry:
+  ; GFX940-TGSPLIT-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-WGP-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX11-WGP: bb.0.entry:
+  ; GFX11-WGP-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX11-CU-LABEL: name: singlethread_one_as_acquire_fence
+  ; GFX11-CU: bb.0.entry:
+  ; GFX11-CU-NEXT:   S_ENDPGM 0
+  ;
+  ; GFX12-WGP-...
[truncated]

; GFX12-CU-NEXT: s_endpgm
; GFX6-LABEL: name: workgroup_acquire_fence
; GFX6: bb.0.entry:
; GFX6-NEXT: S_WAITCNT_soft 127
Contributor Author

Good example here of where all the waitcnts were removed.

@jayfoad jayfoad requested a review from rampitec April 30, 2024 12:56
@rampitec
Collaborator

I am not a fan of converting all memory legalizer tests to MIR. Maybe you can add some specific MIR tests instead?

@jayfoad
Contributor Author

jayfoad commented Apr 30, 2024

> I am not a fan of converting all memory legalizer tests to MIR. Maybe you can add some specific MIR tests instead?

Well, all of memory-legalizer-fence.ll seems pretty useless at the moment since it doesn't show any of the waitcnts that were inserted. (OK, it does still show cache invalidate instructions.)

As an alternative, how about changing SIInsertWaitcnts so that it does not remove redundant waitcnts at -O0 and using that for all the memory legalizer tests?
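
To make the proposal concrete, here is a deliberately simplified toy model of the idea. It is not how SIInsertWaitcnts is implemented (the real pass tracks several hardware counters); the function name, the opcode-prefix test, and the single "outstanding" flag are all assumptions made for illustration.

```python
# Toy model only: a wait is treated as redundant when no memory operation
# has been issued since the previous wait. Opcode prefixes are illustrative.
MEMORY_PREFIXES = ("BUFFER_", "FLAT_", "GLOBAL_", "DS_")


def drop_redundant_waits(instrs, opt_level):
    """Return instrs with redundant S_WAITCNT* entries removed.

    At opt_level 0 nothing is removed, mirroring the proposal that -O0
    output should keep every wait the memory legalizer inserted.
    """
    if opt_level == 0:
        return list(instrs)
    kept, outstanding = [], False
    for ins in instrs:
        if ins.startswith("S_WAITCNT"):
            if not outstanding:
                continue  # nothing to wait for, so drop the wait
            outstanding = False  # this wait covers the pending operations
        elif ins.startswith(MEMORY_PREFIXES):
            outstanding = True
        kept.append(ins)
    return kept


# Shape of the workgroup_acquire_fence case: a lone soft wait, then the end.
mir = ["S_WAITCNT_soft 127", "S_ENDPGM 0"]
print(drop_redundant_waits(mir, opt_level=2))
print(drop_redundant_waits(mir, opt_level=0))
```

In this model the optimized run loses the wait entirely, which is the situation the ISA checks are in today, while the `opt_level=0` run keeps it visible.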

@jayfoad jayfoad requested a review from arsenm April 30, 2024 19:33
@rampitec
Collaborator

> > I am not a fan of converting all memory legalizer tests to MIR. Maybe you can add some specific MIR tests instead?
>
> Well, all of memory-legalizer-fence.ll seems pretty useless at the moment since it doesn't show any of the waitcnts that were inserted. (OK, it does still show cache invalidate instructions.)
>
> As an alternative, how about changing SIInsertWaitcnts so that it does not remove redundant waitcnts at -O0 and using that for all the memory legalizer tests?

I like this much more: we will see what is produced without decoding, and it should run faster too. These tests are currently very slow.

@jayfoad jayfoad closed this May 1, 2024
@jayfoad
Contributor Author

jayfoad commented May 1, 2024

> > > I am not a fan of converting all memory legalizer tests to MIR. Maybe you can add some specific MIR tests instead?
> >
> > Well, all of memory-legalizer-fence.ll seems pretty useless at the moment since it doesn't show any of the waitcnts that were inserted. (OK, it does still show cache invalidate instructions.)
> >
> > As an alternative, how about changing SIInsertWaitcnts so that it does not remove redundant waitcnts at -O0 and using that for all the memory legalizer tests?
>
> I like this much more: we will see what is produced without decoding, and it should run faster too. These tests are currently very slow.

#90716
