Skip to content

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Sep 17, 2025

Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.

Copy link
Contributor Author

arsenm commented Sep 17, 2025

@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.

This breaks the assembler test which enables both, but this
behavior is not really useful. Maybe it's better to just delete
the test.


Patch is 24.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159234.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp (+14-2)
  • (modified) llvm/test/MC/AMDGPU/wave_any.s (+62-60)
  • (added) llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s (+23)
  • (added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s (+13)
  • (added) llvm/test/MC/Disassembler/AMDGPU/gfx9_wave32_feature.txt (+13)
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index f2e2d0ed3f8a6..0ea5ad7ccaea4 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -82,20 +82,32 @@ createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
   MCSubtargetInfo *STI =
       createAMDGPUMCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
 
+  bool IsWave64 = STI->hasFeature(AMDGPU::FeatureWavefrontSize64);
+  bool IsWave32 = STI->hasFeature(AMDGPU::FeatureWavefrontSize32);
+
   // FIXME: We should error for the default target.
   if (STI->getFeatureBits().none())
     STI->ToggleFeature(AMDGPU::FeatureSouthernIslands);
 
-  if (!STI->hasFeature(AMDGPU::FeatureWavefrontSize64) &&
-      !STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) {
+  if (!IsWave64 && !IsWave32) {
     // If there is no default wave size it must be a generation before gfx10,
     // these have FeatureWavefrontSize64 in their definition already. For gfx10+
     // set wave32 as a default.
     STI->ToggleFeature(AMDGPU::isGFX10Plus(*STI)
                            ? AMDGPU::FeatureWavefrontSize32
                            : AMDGPU::FeatureWavefrontSize64);
+  } else if (IsWave64 && IsWave32) {
+    // The wave size is mutually exclusive. If both somehow end up set, wave64
+    // wins.
+    //
+    // FIXME: This should really just be an error.
+    STI->ToggleFeature(AMDGPU::FeatureWavefrontSize32);
   }
 
+  assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
+          STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
+         "wavesize features are mutually exclusive");
+
   return STI;
 }
 
diff --git a/llvm/test/MC/AMDGPU/wave_any.s b/llvm/test/MC/AMDGPU/wave_any.s
index 27502eff89bfc..15b235a92d68e 100644
--- a/llvm/test/MC/AMDGPU/wave_any.s
+++ b/llvm/test/MC/AMDGPU/wave_any.s
@@ -1,229 +1,231 @@
-// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 6
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck --check-prefixes=GFX10 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -filetype=null %s 2>&1 | FileCheck -implicit-check-not=error: --check-prefixes=GFX10-ERR %s
 
 v_cmp_ge_i32_e32 s0, v0
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x0c,0x7d]
+// GFX10: v_cmp_ge_i32_e32 vcc, s0, v0            ; encoding: [0x00,0x00,0x0c,0x7d]
 
 v_cmp_ge_i32_e32 vcc_lo, s0, v1
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v1 ; encoding: [0x00,0x02,0x0c,0x7d]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_cmp_ge_i32_e32 vcc, s0, v2
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v2 ; encoding: [0x00,0x04,0x0c,0x7d]
+// GFX10: v_cmp_ge_i32_e32 vcc, s0, v2            ; encoding: [0x00,0x04,0x0c,0x7d]
 
 v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD
-// GFX10: v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06]
+// GFX10-ERR: :[[@LINE-1]]:19: error: invalid operand for instruction
 
 v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD
 // GFX10: v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06]
 
 v_cmp_class_f32_e32 vcc_lo, s0, v0
-// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_cmp_class_f32_e32 vcc, s0, v0
-// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
+// GFX10: v_cmp_class_f32_e32 vcc, s0, v0         ; encoding: [0x00,0x00,0x10,0x7d]
 
 v_cmp_class_f16_sdwa vcc_lo, v1, v2 src0_sel:DWORD src1_sel:DWORD
-// GFX10: v_cmp_class_f16_sdwa vcc_lo, v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x00,0x06,0x06]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
 
 v_cmp_class_f16_sdwa vcc, v1, v2 src0_sel:DWORD src1_sel:DWORD
 // GFX10: v_cmp_class_f16_sdwa vcc, v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x00,0x06,0x06]
 
 v_cmp_class_f16_sdwa s0, v1, v2 src0_sel:DWORD src1_sel:DWORD
-// GFX10: v_cmp_class_f16_sdwa s0, v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x80,0x06,0x06]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
 
 v_cmp_class_f16_sdwa s[0:1], v1, v2 src0_sel:DWORD src1_sel:DWORD
 // GFX10: v_cmp_class_f16_sdwa s[0:1], v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x80,0x06,0x06]
 
 v_cndmask_b32_e32 v1, v2, v3,
-// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc_lo ; encoding: [0x02,0x07,0x02,0x02]
+// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc       ; encoding: [0x02,0x07,0x02,0x02]
 
 v_cndmask_b32_e32 v1, v2, v3, vcc_lo
-// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc_lo ; encoding: [0x02,0x07,0x02,0x02]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_cndmask_b32_e32 v1, v2, v3, vcc
-// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc_lo ; encoding: [0x02,0x07,0x02,0x02]
+// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc       ; encoding: [0x02,0x07,0x02,0x02]
 
 v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo
-// GFX10: v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x50]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_add_co_ci_u32_e32 v3, vcc, v3, v4, vcc
-// GFX10: v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x50]
+// GFX10: v_add_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x50]
 
 v_add_co_ci_u32_e32 v3, v3, v4
-// GFX10: v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x50]
+// GFX10: v_add_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x50]
 
 v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo
-// GFX10: v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x52]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_sub_co_ci_u32_e32 v3, vcc, v3, v4, vcc
-// GFX10: v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x52]
+// GFX10: v_sub_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x52]
 
 v_sub_co_ci_u32_e32 v3, v3, v4
-// GFX10: v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x52]
+// GFX10: v_sub_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x52]
 
 v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
-// GFX10: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo ; encoding: [0x80,0x02,0x02,0x54]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_subrev_co_ci_u32_e32 v1, vcc, 0, v1, vcc
-// GFX10: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo ; encoding: [0x80,0x02,0x02,0x54]
+// GFX10: v_subrev_co_ci_u32_e32 v1, vcc, 0, v1, vcc ; encoding: [0x80,0x02,0x02,0x54]
 
 v_subrev_co_ci_u32_e32 v1, 0, v1
-// GFX10: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo ; encoding: [0x80,0x02,0x02,0x54]
+// GFX10: v_subrev_co_ci_u32_e32 v1, vcc, 0, v1, vcc ; encoding: [0x80,0x02,0x02,0x54]
 
 v_add_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_add_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 // GFX10: v_add_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
 
 v_add_co_ci_u32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
+// GFX10: v_add_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
 
 v_sub_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_sub_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_sub_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 // GFX10: v_sub_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
 
 v_sub_co_ci_u32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_sub_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
+// GFX10: v_sub_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
 
 v_subrev_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_subrev_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 // GFX10: v_subrev_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
 
 v_subrev_co_ci_u32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
+// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
 
 v_add_co_ci_u32 v1, sext(v1), sext(v4) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, sext(v1), sext(v4), vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
+// GFX10: v_add_co_ci_u32_sdwa v1, vcc, sext(v1), sext(v4), vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
 
 v_add_co_ci_u32_sdwa v1, vcc_lo, sext(v1), sext(v4), vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, sext(v1), sext(v4), vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_add_co_ci_u32_sdwa v1, vcc, sext(v1), sext(v4), vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 // GFX10: v_add_co_ci_u32_sdwa v1, vcc, sext(v1), sext(v4), vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
 
 v_add_co_ci_u32_dpp v5, v1, v2 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_add_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
+// GFX10: v_add_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
 
 v_add_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_add_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_add_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
 // GFX10: v_add_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
 
 v_sub_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_sub_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x52,0x01,0xe4,0x00,0x00]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_sub_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
 // GFX10: v_sub_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x52,0x01,0xe4,0x00,0x00]
 
 v_subrev_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_subrev_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x54,0x01,0xe4,0x00,0x00]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
 
 v_subrev_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
 // GFX10: v_subrev_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x54,0x01,0xe4,0x00,0x00]
 
 v_add_co_u32 v0, s0, v0, v2
-// GFX10: v_add_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:18: error: invalid operand for instruction
 
 v_add_co_u32_e64 v0, s0, v0, v2
-// GFX10: v_add_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
 
 v_add_co_ci_u32_e64 v4, s0, v1, v5, s2
-// GFX10: v_add_co_ci_u32_e64 v4, s0, v1, v5, s2 ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_sub_co_u32 v0, s0, v0, v2
-// GFX10: v_sub_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:18: error: invalid operand for instruction
 
 v_sub_co_u32_e64 v0, s0, v0, v2
-// GFX10: v_sub_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
 
 v_sub_co_ci_u32_e64 v4, s0, v1, v5, s2
-// GFX10: v_sub_co_ci_u32_e64 v4, s0, v1, v5, s2 ; encoding: [0x04,0x00,0x29,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_subrev_co_u32 v0, s0, v0, v2
-// GFX10: v_subrev_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:21: error: invalid operand for instruction
 
 v_subrev_co_u32_e64 v0, s0, v0, v2
-// GFX10: v_subrev_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_subrev_co_ci_u32_e64 v4, s0, v1, v5, s2
-// GFX10: v_subrev_co_ci_u32_e64 v4, s0, v1, v5, s2 ; encoding: [0x04,0x00,0x2a,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:28: error: invalid operand for instruction
 
 v_add_co_u32 v0, s[0:1], v0, v2
-// GFX10: v_add_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_add_co_u32 v0, s[0:1], v0, v2         ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
 
 v_add_co_u32 v0, exec, v0, v2
-// GFX10: v_add_co_u32 v0, exec, v0, v2 ; encoding: [0x00,0x7e,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_add_co_u32 v0, exec, v0, v2           ; encoding: [0x00,0x7e,0x0f,0xd7,0x00,0x05,0x02,0x00]
 
 v_add_co_u32 v0, exec_lo, v0, v2
-// GFX10: v_add_co_u32 v0, exec_lo, v0, v2 ; encoding: [0x00,0x7e,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:18: error: invalid operand for instruction
 
 v_add_co_u32_e64 v0, s[0:1], v0, v2
-// GFX10: v_add_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_add_co_u32 v0, s[0:1], v0, v2         ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
 
 v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3]
 // GFX10: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3] ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0x0a,0x00]
 
 v_sub_co_u32 v0, s[0:1], v0, v2
-// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2         ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
 
 v_sub_co_u32_e64 v0, s[0:1], v0, v2
-// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2         ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
 
 v_sub_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3]
 // GFX10: v_sub_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3] ; encoding: [0x04,0x00,0x29,0xd5,0x01,0x0b,0x0a,0x00]
 
 v_subrev_co_u32 v0, s[0:1], v0, v2
-// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2      ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
 
 v_subrev_co_u32_e64 v0, s[0:1], v0, v2
-// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2      ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
 
 v_subrev_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3]
 // GFX10: v_subrev_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3] ; encoding: [0x04,0x00,0x2a,0xd5,0x01,0x0b,0x0a,0x00]
 
 v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s2
-// GFX10: v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s2 ; encoding: [0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s[2:3]
-// GFX10: v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s[2:3] ; encoding: [0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo
-// GFX10: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc
 // GFX10: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]
 
 v_div_scale_f32 v2, s2, v0, v0, v2
-// GFX10: v_div_scale_f32 v2, s2, v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]
+// GFX10-ERR: :[[@LINE-1]]:21: error: invalid operand for instruction
 
 v_div_scale_f32 v2, s[2:3], v0, v0, v2
-// GFX10: v_div_scale_f32 v2, s[2:3], v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]
+// GFX10: v_div_scale_f32 v2, s[2:3], v0, v0, v2  ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]
 
 v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3]
-// GFX10: v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
 
 v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3]
 // GFX10: v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]
 
 v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]
-// GFX10: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]
+// GFX10-ERR: :[[@LINE-1]]:23: error: invalid operand for instruction
 
 v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]
 // GFX10: v_mad_i64_...
[truncated]

@arsenm arsenm marked this pull request as ready for review September 17, 2025 03:08
Base automatically changed from users/arsenm/amdgpu/remove-subtarget-hacking-asmparser to main September 17, 2025 03:09
@kosarev kosarev requested a review from Flakebi September 17, 2025 07:14
@jayfoad
Copy link
Contributor

jayfoad commented Sep 17, 2025

Does this make GCNSubtarget::checkSubtargetFeatures redundant?

@jayfoad
Copy link
Contributor

jayfoad commented Sep 17, 2025

This breaks the assembler test which enables both, but this
behavior is not really useful.

It's useful for assembling a single source file containing wave32 and wave64 code. This is common in graphics where you can have a single pipeline containing two shaders, like a vertex shader using wave32 and a pixel shader using wave64.

@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

It's useful for assembling a single source file containing wave32 and wave64 code. This is common in graphics where you can have a single pipeline containing two shaders, like a vertex shader using wave32 and a pixel shader using wave64.

To actually support this you would need to have MCSubtargetInfo specified per function, and not globally. Just turning on both modes globally is not the same thing

@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

This is common in graphics where you can have a single pipeline containing two shaders, like a vertex shader using wave32 and a pixel shader using wave64.

Also do you really need to do that? We could have a much simpler ABI if we just banned this and required those to inhabit separate modules, with different triple arches

@jayfoad
Copy link
Contributor

jayfoad commented Sep 17, 2025

This is common in graphics where you can have a single pipeline containing two shaders, like a vertex shader using wave32 and a pixel shader using wave64.

Also do you really need to do that? We could have a much simpler ABI if we just banned this and required those to inhabit separate modules, with different triple arches

@nhaehnle @trenouf

All I know is we have been doing it pretty much forever in graphics, and we do rely on being able to disassemble and reassemble these modules.

Copy link
Collaborator

@kosarev kosarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In amdgpu-dis we used to hint an llvm-mc command line with -mattr=+wavefrontsize32,+wavefrontsize64 as part of it to accept both variants of instructions, which in the review of the change adding that -mattr option was said to be useful. Tagging @Flakebi.

Comment on lines 107 to 113
assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
"wavesize features are mutually exclusive");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
"wavesize features are mutually exclusive");
assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) !=
STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
"wavesize features are mutually exclusive");

@trenouf
Copy link
Collaborator

trenouf commented Sep 17, 2025

I agree we do use both waveness in the same module in PAL graphics, and we do not want to change that.

To actually support this you would need to have MCSubtargetInfo specified per function, and not globally. Just turning on both modes globally is not the same thing

Why is that? How come we have been able to use this? Is it maybe limited to compute and the way you write a kernel header or info with wave size? In PAL, we do not do that; instead, the wave size for each graphics shader stage is in the PAL metadata.

How about only adding this assert when not in PAL? And changing the test for both together to specify PAL?

Or making the assert only fire if the two functions with different wave size have the same calling convention?

@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

I agree we do use both waveness in the same module in PAL graphics, and we do not want to change that.

I would like to change your mind about that. This is unrelated to this particular PR, but I actively want this to be changed. Or at least some firm reasons that this system is desirable, and we should continue with something resembling the status quo. I'm working towards supporting libcalls in the compiler, and want to make a definitive determination that this is how wavesize will work forever. For the compiler to support library calls, the status quo mandates shipping 2 copies of every symbol, which is not ideal for toolchain complexity or linking times.

To actually support this you would need to have MCSubtargetInfo specified per function, and not globally. Just turning on both modes globally is not the same thing

Why is that? How come we have been able to use this? Is it maybe limited to compute and the way you write a kernel header or info with wave size? In PAL, we do not do that; instead, the wave size for each graphics shader stage is in the PAL metadata.

In the end the encoding is the same, the register value is just the low half of the register tuple. For assembly, enabling both modes is unduly permissive would permit wave64 and wave32 usage anywhere in the module.

How about only adding this assert when not in PAL?

Definitely not. The compiler should never key core behaviors off of the target like that. Plus that will not address my immediate issue which requires these to be mutually exclusive states.

Or making the assert only fire if the two functions with different wave size have the same calling convention?

The assembler has no knowledge of the function (and there's often no function, like this test is just random instructions). MC does not have function level subtargets like codegen does, which is pretty broken. In theory we could have some kind of directive to switch the wavesize

In amdgpu-dis we used to hint an llvm-mc command line with -mattr=+wavefrontsize32,+wavefrontsize64 as part of it to accept both variants of instructions

There's no reason to specify both modes for disassembly. One will definitely win (and it appears wave32 wins).

we do rely on being able to disassemble and reassemble these modules.

If it's disassemble and reassemble, and not assemble there shouldn't be an issue. Disassembly implicitly ignores wave64 for the wave32 targets

@jayfoad
Copy link
Contributor

jayfoad commented Sep 17, 2025

In theory we could have some kind of directive to switch the wavesize

Maybe a job for mapping symbols?

In amdgpu-dis we used to hint an llvm-mc command line with -mattr=+wavefrontsize32,+wavefrontsize64 as part of it to accept both variants of instructions

There's no reason to specify both modes for disassembly. One will definitely win (and it appears wave32 wins).

we do rely on being able to disassemble and reassemble these modules.

If it's disassemble and reassemble, and not assemble there shouldn't be an issue. Disassembly implicitly ignores wave64 for the wave32 targets

You're right, it was not necessary for pure disassembly + reassembly. It only became an issue for disassembly + hand-editing + reassembly. People would naturally write code like this:

// ... in a wave64 shader:
v_cmp_gt_u32 vcc, v0, v1
// ... in a wave32 shader:
v_cmp_gt_u32 vcc_lo, v0, v1

which you can only reassemble with -mattr=+wavefrontsize32,+wavefrontsize64.

@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

Maybe a job for mapping symbols?

Yes, this is essentially the same problem

@arsenm arsenm force-pushed the users/arsenm/gfx1250-wave64-error branch from 50bdbb4 to 0524124 Compare September 17, 2025 11:04
@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

I found a way to hack in the old assembler behavior with both wave sizes. We're really not handling assembler predicates correctly. Essentially none of the manual validation code is correct, and shouldn't be necessary

@arsenm
Copy link
Contributor Author

arsenm commented Sep 24, 2025

ping

Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.

This breaks the assembler test which enables both, but this
behavior is not really useful. Maybe it's better to just delete
the test.
@jayfoad
Copy link
Contributor

jayfoad commented Sep 25, 2025

This breaks the assembler test

Description needs updating.

@arsenm arsenm force-pushed the users/arsenm/gfx1250-wave64-error branch from 8860688 to 8e65760 Compare September 25, 2025 08:26
Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comment fix.

// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+wavefrontsize64 -o - %s | FileCheck -check-prefix=GFX1250 %s
// RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -mattr=+wavefrontsize32 -o - %s | FileCheck -check-prefix=GFX900 %s

// Both sure setting both modes is supported at the same time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Both sure setting"???

@arsenm arsenm enabled auto-merge (squash) September 25, 2025 09:16
@arsenm arsenm merged commit 0a80631 into main Sep 25, 2025
9 checks passed
@arsenm arsenm deleted the users/arsenm/gfx1250-wave64-error branch September 25, 2025 09:46
ckoparkar added a commit to ckoparkar/llvm-project that referenced this pull request Sep 25, 2025
* main: (502 commits)
  GlobalISel: Adjust insert point when expanding G_[SU]DIVREM  (llvm#160683)
  [LV] Add coverage for fixing-up scalar resume values (llvm#160492)
  AMDGPU: Convert wave_any test to use update_mc_test_checks
  [LV] Add partial reduction tests multiplying extend with constants.
  Revert "[MLIR] Implement remark emitting policies in MLIR" (llvm#160681)
  [NFC][InstSimplify] Refactor fminmax-folds.ll test (llvm#160504)
  [LoongArch] Pre-commit tests for [x]vldi instructions with special constant splats (llvm#159228)
  [BOLT] Fix dwarf5-dwoid-no-dwoname.s (llvm#160676)
  [lldb][test] Refactor and expand TestMemoryRegionDirtyPages.py (llvm#156035)
  [gn build] Port 833d5f0
  AMDGPU: Ensure both wavesize features are not set (llvm#159234)
  [LoopInterchange] Bail out when finding a dependency with all `*` elements (llvm#149049)
  [libc++] Avoid constructing additional objects when using map::at (llvm#157866)
  [lldb][test] Make hex prefix optional in DWARF union types test
  [X86] Add missing prefixes to trunc-sat tests (llvm#160662)
  [AMDGPU] Fix vector legalization for bf16 valu ops (llvm#158439)
  [LoongArch][NFC] Pre-commit tests for `[x]vadda.{b/h/w/d}`
  [mlir][tosa] Relax constraint on matmul verifier requiring equal operand types (llvm#155799)
  [clang][Sema] Accept gnu format attributes (llvm#160255)
  [LoongArch][NFC] Add tests for element extraction from binary add operation (llvm#159725)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants