AMDGPU: Ensure both wavesize features are not set #159234
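@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor.
This breaks the assembler test which enables both, but this behavior is not really useful. Maybe it's better to just delete the test.
Patch is 24.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159234.diff
5 Files Affected: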
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
index f2e2d0ed3f8a6..0ea5ad7ccaea4 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
@@ -82,20 +82,32 @@ createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
MCSubtargetInfo *STI =
createAMDGPUMCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS);
+ bool IsWave64 = STI->hasFeature(AMDGPU::FeatureWavefrontSize64);
+ bool IsWave32 = STI->hasFeature(AMDGPU::FeatureWavefrontSize32);
+
// FIXME: We should error for the default target.
if (STI->getFeatureBits().none())
STI->ToggleFeature(AMDGPU::FeatureSouthernIslands);
- if (!STI->hasFeature(AMDGPU::FeatureWavefrontSize64) &&
- !STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) {
+ if (!IsWave64 && !IsWave32) {
// If there is no default wave size it must be a generation before gfx10,
// these have FeatureWavefrontSize64 in their definition already. For gfx10+
// set wave32 as a default.
STI->ToggleFeature(AMDGPU::isGFX10Plus(*STI)
? AMDGPU::FeatureWavefrontSize32
: AMDGPU::FeatureWavefrontSize64);
+ } else if (IsWave64 && IsWave32) {
+ // The wave size is mutually exclusive. If both somehow end up set, wave64
+ // wins.
+ //
+ // FIXME: This should really just be an error.
+ STI->ToggleFeature(AMDGPU::FeatureWavefrontSize32);
}
+ assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
+ STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
+ "wavesize features are mutually exclusive");
+
return STI;
}
diff --git a/llvm/test/MC/AMDGPU/wave_any.s b/llvm/test/MC/AMDGPU/wave_any.s
index 27502eff89bfc..15b235a92d68e 100644
--- a/llvm/test/MC/AMDGPU/wave_any.s
+++ b/llvm/test/MC/AMDGPU/wave_any.s
@@ -1,229 +1,231 @@
-// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck --check-prefix=GFX10 %s
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 6
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck --check-prefixes=GFX10 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -filetype=null %s 2>&1 | FileCheck -implicit-check-not=error: --check-prefixes=GFX10-ERR %s
v_cmp_ge_i32_e32 s0, v0
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x0c,0x7d]
+// GFX10: v_cmp_ge_i32_e32 vcc, s0, v0 ; encoding: [0x00,0x00,0x0c,0x7d]
v_cmp_ge_i32_e32 vcc_lo, s0, v1
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v1 ; encoding: [0x00,0x02,0x0c,0x7d]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_cmp_ge_i32_e32 vcc, s0, v2
-// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v2 ; encoding: [0x00,0x04,0x0c,0x7d]
+// GFX10: v_cmp_ge_i32_e32 vcc, s0, v2 ; encoding: [0x00,0x04,0x0c,0x7d]
v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD
-// GFX10: v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06]
+// GFX10-ERR: :[[@LINE-1]]:19: error: invalid operand for instruction
v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD
// GFX10: v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06]
v_cmp_class_f32_e32 vcc_lo, s0, v0
-// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_cmp_class_f32_e32 vcc, s0, v0
-// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
+// GFX10: v_cmp_class_f32_e32 vcc, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d]
v_cmp_class_f16_sdwa vcc_lo, v1, v2 src0_sel:DWORD src1_sel:DWORD
-// GFX10: v_cmp_class_f16_sdwa vcc_lo, v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x00,0x06,0x06]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
v_cmp_class_f16_sdwa vcc, v1, v2 src0_sel:DWORD src1_sel:DWORD
// GFX10: v_cmp_class_f16_sdwa vcc, v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x00,0x06,0x06]
v_cmp_class_f16_sdwa s0, v1, v2 src0_sel:DWORD src1_sel:DWORD
-// GFX10: v_cmp_class_f16_sdwa s0, v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x80,0x06,0x06]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
v_cmp_class_f16_sdwa s[0:1], v1, v2 src0_sel:DWORD src1_sel:DWORD
// GFX10: v_cmp_class_f16_sdwa s[0:1], v1, v2 src0_sel:DWORD src1_sel:DWORD ; encoding: [0xf9,0x04,0x1e,0x7d,0x01,0x80,0x06,0x06]
v_cndmask_b32_e32 v1, v2, v3,
-// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc_lo ; encoding: [0x02,0x07,0x02,0x02]
+// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x02]
v_cndmask_b32_e32 v1, v2, v3, vcc_lo
-// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc_lo ; encoding: [0x02,0x07,0x02,0x02]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_cndmask_b32_e32 v1, v2, v3, vcc
-// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc_lo ; encoding: [0x02,0x07,0x02,0x02]
+// GFX10: v_cndmask_b32_e32 v1, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x02]
v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo
-// GFX10: v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x50]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_add_co_ci_u32_e32 v3, vcc, v3, v4, vcc
-// GFX10: v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x50]
+// GFX10: v_add_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x50]
v_add_co_ci_u32_e32 v3, v3, v4
-// GFX10: v_add_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x50]
+// GFX10: v_add_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x50]
v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo
-// GFX10: v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x52]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_sub_co_ci_u32_e32 v3, vcc, v3, v4, vcc
-// GFX10: v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x52]
+// GFX10: v_sub_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x52]
v_sub_co_ci_u32_e32 v3, v3, v4
-// GFX10: v_sub_co_ci_u32_e32 v3, vcc_lo, v3, v4, vcc_lo ; encoding: [0x03,0x09,0x06,0x52]
+// GFX10: v_sub_co_ci_u32_e32 v3, vcc, v3, v4, vcc ; encoding: [0x03,0x09,0x06,0x52]
v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
-// GFX10: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo ; encoding: [0x80,0x02,0x02,0x54]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_subrev_co_ci_u32_e32 v1, vcc, 0, v1, vcc
-// GFX10: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo ; encoding: [0x80,0x02,0x02,0x54]
+// GFX10: v_subrev_co_ci_u32_e32 v1, vcc, 0, v1, vcc ; encoding: [0x80,0x02,0x02,0x54]
v_subrev_co_ci_u32_e32 v1, 0, v1
-// GFX10: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo ; encoding: [0x80,0x02,0x02,0x54]
+// GFX10: v_subrev_co_ci_u32_e32 v1, vcc, 0, v1, vcc ; encoding: [0x80,0x02,0x02,0x54]
v_add_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_add_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
// GFX10: v_add_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
v_add_co_ci_u32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
+// GFX10: v_add_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x00,0x06]
v_sub_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_sub_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_sub_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
// GFX10: v_sub_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
v_sub_co_ci_u32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_sub_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
+// GFX10: v_sub_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x52,0x01,0x06,0x00,0x06]
v_subrev_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_subrev_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
v_subrev_co_ci_u32_sdwa v1, v1, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc_lo, v1, v4, vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
+// GFX10: v_subrev_co_ci_u32_sdwa v1, vcc, v1, v4, vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x54,0x01,0x06,0x00,0x06]
v_add_co_ci_u32 v1, sext(v1), sext(v4) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, sext(v1), sext(v4), vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
+// GFX10: v_add_co_ci_u32_sdwa v1, vcc, sext(v1), sext(v4), vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
v_add_co_ci_u32_sdwa v1, vcc_lo, sext(v1), sext(v4), vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
-// GFX10: v_add_co_ci_u32_sdwa v1, vcc_lo, sext(v1), sext(v4), vcc_lo dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_add_co_ci_u32_sdwa v1, vcc, sext(v1), sext(v4), vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
// GFX10: v_add_co_ci_u32_sdwa v1, vcc, sext(v1), sext(v4), vcc dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD ; encoding: [0xf9,0x08,0x02,0x50,0x01,0x06,0x08,0x0e]
v_add_co_ci_u32_dpp v5, v1, v2 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_add_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
+// GFX10: v_add_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
v_add_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_add_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_add_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
// GFX10: v_add_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x50,0x01,0xe4,0x00,0x00]
v_sub_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_sub_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x52,0x01,0xe4,0x00,0x00]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_sub_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
// GFX10: v_sub_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x52,0x01,0xe4,0x00,0x00]
v_subrev_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
-// GFX10: v_subrev_co_ci_u32_dpp v5, vcc_lo, v1, v2, vcc_lo quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x54,0x01,0xe4,0x00,0x00]
+// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode
v_subrev_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
// GFX10: v_subrev_co_ci_u32_dpp v5, vcc, v1, v2, vcc quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 ; encoding: [0xfa,0x04,0x0a,0x54,0x01,0xe4,0x00,0x00]
v_add_co_u32 v0, s0, v0, v2
-// GFX10: v_add_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:18: error: invalid operand for instruction
v_add_co_u32_e64 v0, s0, v0, v2
-// GFX10: v_add_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
v_add_co_ci_u32_e64 v4, s0, v1, v5, s2
-// GFX10: v_add_co_ci_u32_e64 v4, s0, v1, v5, s2 ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_sub_co_u32 v0, s0, v0, v2
-// GFX10: v_sub_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:18: error: invalid operand for instruction
v_sub_co_u32_e64 v0, s0, v0, v2
-// GFX10: v_sub_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:22: error: invalid operand for instruction
v_sub_co_ci_u32_e64 v4, s0, v1, v5, s2
-// GFX10: v_sub_co_ci_u32_e64 v4, s0, v1, v5, s2 ; encoding: [0x04,0x00,0x29,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_subrev_co_u32 v0, s0, v0, v2
-// GFX10: v_subrev_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:21: error: invalid operand for instruction
v_subrev_co_u32_e64 v0, s0, v0, v2
-// GFX10: v_subrev_co_u32 v0, s0, v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_subrev_co_ci_u32_e64 v4, s0, v1, v5, s2
-// GFX10: v_subrev_co_ci_u32_e64 v4, s0, v1, v5, s2 ; encoding: [0x04,0x00,0x2a,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:28: error: invalid operand for instruction
v_add_co_u32 v0, s[0:1], v0, v2
-// GFX10: v_add_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_add_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
v_add_co_u32 v0, exec, v0, v2
-// GFX10: v_add_co_u32 v0, exec, v0, v2 ; encoding: [0x00,0x7e,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_add_co_u32 v0, exec, v0, v2 ; encoding: [0x00,0x7e,0x0f,0xd7,0x00,0x05,0x02,0x00]
v_add_co_u32 v0, exec_lo, v0, v2
-// GFX10: v_add_co_u32 v0, exec_lo, v0, v2 ; encoding: [0x00,0x7e,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10-ERR: :[[@LINE-1]]:18: error: invalid operand for instruction
v_add_co_u32_e64 v0, s[0:1], v0, v2
-// GFX10: v_add_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_add_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x0f,0xd7,0x00,0x05,0x02,0x00]
v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3]
// GFX10: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3] ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0x0a,0x00]
v_sub_co_u32 v0, s[0:1], v0, v2
-// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
v_sub_co_u32_e64 v0, s[0:1], v0, v2
-// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_sub_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x10,0xd7,0x00,0x05,0x02,0x00]
v_sub_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3]
// GFX10: v_sub_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3] ; encoding: [0x04,0x00,0x29,0xd5,0x01,0x0b,0x0a,0x00]
v_subrev_co_u32 v0, s[0:1], v0, v2
-// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
v_subrev_co_u32_e64 v0, s[0:1], v0, v2
-// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
+// GFX10: v_subrev_co_u32 v0, s[0:1], v0, v2 ; encoding: [0x00,0x00,0x19,0xd7,0x00,0x05,0x02,0x00]
v_subrev_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3]
// GFX10: v_subrev_co_ci_u32_e64 v4, s[0:1], v1, v5, s[2:3] ; encoding: [0x04,0x00,0x2a,0xd5,0x01,0x0b,0x0a,0x00]
v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s2
-// GFX10: v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s2 ; encoding: [0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s[2:3]
-// GFX10: v_add_co_ci_u32_e64 v4, vcc_lo, v1, v5, s[2:3] ; encoding: [0x04,0x6a,0x28,0xd5,0x01,0x0b,0x0a,0x00]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo
-// GFX10: v_add_co_ci_u32_e64 v4, s0, v1, v5, vcc_lo ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc
// GFX10: v_add_co_ci_u32_e64 v4, s[0:1], v1, v5, vcc ; encoding: [0x04,0x00,0x28,0xd5,0x01,0x0b,0xaa,0x01]
v_div_scale_f32 v2, s2, v0, v0, v2
-// GFX10: v_div_scale_f32 v2, s2, v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]
+// GFX10-ERR: :[[@LINE-1]]:21: error: invalid operand for instruction
v_div_scale_f32 v2, s[2:3], v0, v0, v2
-// GFX10: v_div_scale_f32 v2, s[2:3], v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]
+// GFX10: v_div_scale_f32 v2, s[2:3], v0, v0, v2 ; encoding: [0x02,0x02,0x6d,0xd5,0x00,0x01,0x0a,0x04]
v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3]
-// GFX10: v_div_scale_f64 v[2:3], s2, v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]
+// GFX10-ERR: :[[@LINE-1]]:25: error: invalid operand for instruction
v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3]
// GFX10: v_div_scale_f64 v[2:3], s[2:3], v[0:1], v[0:1], v[2:3] ; encoding: [0x02,0x02,0x6e,0xd5,0x00,0x01,0x0a,0x04]
v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3]
-// GFX10: v_mad_i64_i32 v[0:1], s6, v0, v1, v[2:3] ; encoding: [0x00,0x06,0x77,0xd5,0x00,0x03,0x0a,0x04]
+// GFX10-ERR: :[[@LINE-1]]:23: error: invalid operand for instruction
v_mad_i64_i32 v[0:1], s[6:7], v0, v1, v[2:3]
// GFX10: v_mad_i64_...
[truncated]
Does this make |
It's useful for assembling a single source file containing wave32 and wave64 code. This is common in graphics where you can have a single pipeline containing two shaders, like a vertex shader using wave32 and a pixel shader using wave64.
To actually support this you would need to have MCSubtargetInfo specified per function, and not globally. Just turning on both modes globally is not the same thing.
Also do you really need to do that? We could have a much simpler ABI if we just banned this and required those to inhabit separate modules, with different triple arches.
All I know is we have been doing it pretty much forever in graphics, and we do rely on being able to disassemble and reassemble these modules.
In amdgpu-dis we used to print a hint containing an llvm-mc command line with -mattr=+wavefrontsize32,+wavefrontsize64 so that both variants of the instructions would be accepted. The review of the change that added that -mattr option said this was useful. Tagging @Flakebi.
assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
        STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
       "wavesize features are mutually exclusive");
- assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^
-         STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
-        "wavesize features are mutually exclusive");
+ assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) !=
+         STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) &&
+        "wavesize features are mutually exclusive");
I agree we do use both wave sizes in the same module in PAL graphics, and we do not want to change that.
Why is that? How come we have been able to use this? Is it maybe limited to compute and the way you write a kernel header or info with wave size? In PAL, we do not do that; instead, the wave size for each graphics shader stage is in the PAL metadata. How about only adding this assert when not in PAL? And changing the test for both together to specify PAL? Or making the assert only fire if the two functions with different wave size have the same calling convention?
I would like to change your mind about that. This is unrelated to this particular PR, but I actively want this to be changed. Or at least some firm reasons that this system is desirable, and we should continue with something resembling the status quo. I'm working towards supporting libcalls in the compiler, and want to make a definitive determination that this is how wavesize will work forever. For the compiler to support library calls, the status quo mandates shipping 2 copies of every symbol, which is not ideal for toolchain complexity or linking times.
In the end the encoding is the same; the register value is just the low half of the register tuple. For assembly, enabling both modes is unduly permissive: it would permit wave64 and wave32 usage anywhere in the module.
Definitely not. The compiler should never key core behaviors off of the target like that. Plus that will not address my immediate issue which requires these to be mutually exclusive states.
The assembler has no knowledge of the function (and there's often no function; this test, for example, is just random instructions). MC does not have function level subtargets like codegen does, which is pretty broken. In theory we could have some kind of directive to switch the wavesize.
There's no reason to specify both modes for disassembly. One will definitely win (and it appears wave32 wins).
If it's disassemble and reassemble, and not assemble, there shouldn't be an issue. Disassembly implicitly ignores wave64 for the wave32 targets.
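The claim above that the encoding is the same for both spellings can be checked against the pre-patch check lines in wave_any.s. The following is only a sketch: the byte values are copied from the GFX10 lines for `v_add_co_u32 v0, s0, v0, v2` and `v_add_co_u32 v0, s[0:1], v0, v2` quoted in the diff, while the `Encoding` alias and the function names are made up for illustration and are not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Eight-byte encodings copied from the GFX10 check lines in wave_any.s as
// quoted in the diff. All names below are hypothetical.
using Encoding = std::array<std::uint8_t, 8>;

// v_add_co_u32 v0, s0, v0, v2 (carry-out written as a 32-bit register)
inline Encoding addCoWithS0() {
  return {{0x00, 0x00, 0x0f, 0xd7, 0x00, 0x05, 0x02, 0x00}};
}

// v_add_co_u32 v0, s[0:1], v0, v2 (carry-out written as a register pair)
inline Encoding addCoWithS0S1() {
  return {{0x00, 0x00, 0x0f, 0xd7, 0x00, 0x05, 0x02, 0x00}};
}

// True when both spellings assemble to identical bytes.
inline bool sameEncoding() { return addCoWithS0() == addCoWithS0S1(); }
```

Since the bytes match, the operand spelling is a property of the assembly text only, which is why the mode matters for parsing and printing rather than for the emitted code.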
Maybe a job for mapping symbols?
You're right, it was not necessary for pure disassembly + reassembly. It only became an issue for disassembly + hand-editing + reassembly. People would naturally write code like this:
which you can only reassemble with |
Yes, this is essentially the same problem.
I found a way to hack in the old assembler behavior with both wave sizes. We're really not handling assembler predicates correctly. Essentially none of the manual validation code is correct, and shouldn't be necessary.
ping
Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor. This breaks the assembler test which enables both, but this behavior is not really useful. Maybe it's better to just delete the test.
Description needs updating.
LGTM with comment fix.
// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+wavefrontsize64 -o - %s | FileCheck -check-prefix=GFX1250 %s
// RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -mattr=+wavefrontsize32 -o - %s | FileCheck -check-prefix=GFX900 %s

// Both sure setting both modes is supported at the same time.
"Both sure setting"???
Make sure we cannot be in a mode with both wavesizes. This
prevents assertions in a future change. This should probably
just be an error, but we do not have a good way to report
errors from the MCSubtargetInfo constructor.
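As a rough illustration of the resolution order in the truncated diff quoted earlier (the merged revision may differ), here is a minimal standalone model. It assumes a plain struct in place of the real feature bits: `WaveFeatures` and `resolveWaveSize` are hypothetical names, not LLVM API, and the `IsGFX10Plus` flag stands in for the `AMDGPU::isGFX10Plus(*STI)` query. The `static_assert` lines also show why the review suggestion to spell the check with `!=` instead of `^` does not change behavior for `bool` operands.

```cpp
#include <cassert>

// Hypothetical stand-in for the two subtarget feature bits; not the LLVM API.
struct WaveFeatures {
  bool Wave64;
  bool Wave32;
};

// For bool operands, `^` and `!=` agree on every input combination.
static_assert(static_cast<bool>(false ^ false) == (false != false), "");
static_assert(static_cast<bool>(false ^ true) == (false != true), "");
static_assert(static_cast<bool>(true ^ false) == (true != false), "");
static_assert(static_cast<bool>(true ^ true) == (true != true), "");

// Models the order of checks in the diff: pick a default when neither wave
// size is requested, and let wave64 win when both are requested.
inline WaveFeatures resolveWaveSize(WaveFeatures F, bool IsGFX10Plus) {
  if (!F.Wave64 && !F.Wave32) {
    // No explicit wave size: gfx10+ defaults to wave32, older to wave64.
    if (IsGFX10Plus)
      F.Wave32 = true;
    else
      F.Wave64 = true;
  } else if (F.Wave64 && F.Wave32) {
    // Both requested: wave64 wins, so wave32 is cleared.
    F.Wave32 = false;
  }
  assert((F.Wave64 != F.Wave32) && "wavesize features are mutually exclusive");
  return F;
}
```

With this model, a request for both wave sizes on a gfx10-class target resolves to wave64 only, which matches the updated check lines in the diff that print `vcc` rather than `vcc_lo`.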