[AMDGPU] Don't form sext/abs/neg fp8 cvt #83843

Pierre-vh · 2024-03-04T13:39:10Z

gfx940 does not allow the abs, sext, or neg source modifiers on v_cvt_f32_fp8, v_cvt_f32_bf8, or their pk variants.

Fixes SWDEV-447468

The verifier should also have caught this; I will add that check in a separate patch.

llvmbot · 2024-03-04T13:39:41Z

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants.
Verifier should have also caught this, I will add that in a separate patch.

Fixes SWDEV-447468

Full diff: https://github.com/llvm/llvm-project/pull/83843.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp (+9)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll (+96)

diff --git a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
index afc380b4203457..1fadd8ce45b1f5 100644
--- a/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
@@ -338,6 +338,15 @@ MachineInstr *SDWASrcOperand::potentialToConvert(const SIInstrInfo *TII) {
 }
 
 bool SDWASrcOperand::convertToSDWA(MachineInstr &MI, const SIInstrInfo *TII) {
+  switch (MI.getOpcode()) {
+  case AMDGPU::V_CVT_F32_FP8_sdwa:
+  case AMDGPU::V_CVT_F32_BF8_sdwa:
+  case AMDGPU::V_CVT_PK_F32_FP8_sdwa:
+  case AMDGPU::V_CVT_PK_F32_BF8_sdwa:
+    // Does not support input modifiers: noabs, noneg, nosext.
+    return false;
+  }
+
   // Find operand in instruction that matches source operand and replace it with
   // target operand. Set corresponding src_sel
   bool IsPreserveSrc = false;
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
index fc4b663b85a61b..9b8fdf90170458 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
@@ -534,3 +534,99 @@ define i32 @test_cvt_sr_fp8_f32_byte3(float %x, i32 %r, i32 %old) {
   %ret = tail call i32 @llvm.amdgcn.cvt.sr.fp8.f32(float %x, i32 %r, i32 %old, i32 3)
   ret i32 %ret
 }
+
+define float @test_sext_cvt_f32_fp8(i16 %a) {
+; GFX940-LABEL: test_sext_cvt_f32_fp8:
+; GFX940:       ; %bb.0:
+; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX940-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX940-NEXT:    v_cvt_f32_fp8_sdwa v0, v0 src0_sel:BYTE_1
+; GFX940-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_sext_cvt_f32_fp8:
+; GFX12:       ; %bb.0:
+; GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-NEXT:    s_wait_expcnt 0x0
+; GFX12-NEXT:    s_wait_samplecnt 0x0
+; GFX12-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT:    v_cvt_f32_fp8_e64 v0, v0 op_sel:[0,1]
+; GFX12-NEXT:    s_setpc_b64 s[30:31]
+  %a.sext = sext i16 %a to i32
+  %ret = tail call float @llvm.amdgcn.cvt.f32.fp8(i32 %a.sext, i32 1)
+  ret float %ret
+}
+
+define float @test_sext_cvt_f32_bf8(i16 %a) {
+; GFX940-LABEL: test_sext_cvt_f32_bf8:
+; GFX940:       ; %bb.0:
+; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX940-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX940-NEXT:    v_cvt_f32_bf8_sdwa v0, v0 src0_sel:BYTE_1
+; GFX940-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_sext_cvt_f32_bf8:
+; GFX12:       ; %bb.0:
+; GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-NEXT:    s_wait_expcnt 0x0
+; GFX12-NEXT:    s_wait_samplecnt 0x0
+; GFX12-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT:    v_cvt_f32_bf8_e64 v0, v0 op_sel:[0,1]
+; GFX12-NEXT:    s_setpc_b64 s[30:31]
+  %a.sext = sext i16 %a to i32
+  %ret = tail call float @llvm.amdgcn.cvt.f32.bf8(i32 %a.sext, i32 1)
+  ret float %ret
+}
+
+define <2 x float> @test_sext_cvt_pk_f32_bf8_word1(i16 %a) {
+; GFX940-LABEL: test_sext_cvt_pk_f32_bf8_word1:
+; GFX940:       ; %bb.0:
+; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX940-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX940-NEXT:    v_cvt_pk_f32_bf8_sdwa v[0:1], v0 src0_sel:WORD_1
+; GFX940-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_sext_cvt_pk_f32_bf8_word1:
+; GFX12:       ; %bb.0:
+; GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-NEXT:    s_wait_expcnt 0x0
+; GFX12-NEXT:    s_wait_samplecnt 0x0
+; GFX12-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT:    v_cvt_pk_f32_bf8_e64 v[0:1], v0 op_sel:[1,0]
+; GFX12-NEXT:    s_setpc_b64 s[30:31]
+  %a.sext = sext i16 %a to i32
+  %ret = tail call <2 x float> @llvm.amdgcn.cvt.pk.f32.bf8(i32 %a.sext, i1 true)
+  ret <2 x float> %ret
+}
+
+define <2 x float> @test_sext_cvt_pk_f32_fp8_word0(i16 %a) {
+; GFX940-LABEL: test_sext_cvt_pk_f32_fp8_word0:
+; GFX940:       ; %bb.0:
+; GFX940-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX940-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX940-NEXT:    v_cvt_pk_f32_fp8_e32 v[0:1], v0
+; GFX940-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-LABEL: test_sext_cvt_pk_f32_fp8_word0:
+; GFX12:       ; %bb.0:
+; GFX12-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-NEXT:    s_wait_expcnt 0x0
+; GFX12-NEXT:    s_wait_samplecnt 0x0
+; GFX12-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT:    v_cvt_pk_f32_fp8_e32 v[0:1], v0
+; GFX12-NEXT:    s_setpc_b64 s[30:31]
+  %a.sext = sext i16 %a to i32
+  %ret = tail call <2 x float> @llvm.amdgcn.cvt.pk.f32.fp8(i32 %a.sext, i1 false)
+  ret <2 x float> %ret
+}
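
The early return added to `SDWASrcOperand::convertToSDWA` in the diff above can be modeled in isolation. The following is a minimal standalone sketch, not LLVM code: the `Opcode` enum, the `OtherSdwa` placeholder, and the helper name `mayFoldSrcOperand` are all hypothetical and exist only for illustration. It assumes the guard's sole job is to reject the four fp8/bf8 conversion opcodes before any operand matching happens.

```cpp
#include <cstdint>

// Hypothetical stand-in for the AMDGPU opcode enum. The real opcodes come
// from generated LLVM headers and are not reproduced here.
enum class Opcode : std::uint8_t {
  V_CVT_F32_FP8_sdwa,
  V_CVT_F32_BF8_sdwa,
  V_CVT_PK_F32_FP8_sdwa,
  V_CVT_PK_F32_BF8_sdwa,
  OtherSdwa, // placeholder for any SDWA opcode not named in the patch
};

// Hypothetical helper mirroring the switch added by the patch.
// Returns false for the fp8/bf8 conversions, which do not accept the
// abs, neg, or sext input modifiers. A result of true only means this
// guard does not reject the opcode; the real convertToSDWA then goes on
// to match the operand and may still decline the conversion.
inline bool mayFoldSrcOperand(Opcode Op) {
  switch (Op) {
  case Opcode::V_CVT_F32_FP8_sdwa:
  case Opcode::V_CVT_F32_BF8_sdwa:
  case Opcode::V_CVT_PK_F32_FP8_sdwa:
  case Opcode::V_CVT_PK_F32_BF8_sdwa:
    return false;
  default:
    return true;
  }
}
```

Under these assumptions, the sign extension stays as a separate `v_bfe_i32` ahead of the conversion, which matches the GFX940 check lines in the new tests.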

[AMDGPU] Don't form sext/abs/neg fp8 cvt on gfx940

596979a

Verifier should have also caught this, I will add that in a separate patch. Fixes SWDEV-447468

Pierre-vh requested review from jayfoad and arsenm March 4, 2024 13:39

llvmbot added the backend:AMDGPU label Mar 4, 2024

add verifier

55b8a4d

arsenm requested changes Mar 5, 2024

Review comment on llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (resolved)

add verifier test

ea62559

Pierre-vh requested a review from arsenm March 6, 2024 07:24

arsenm approved these changes Mar 6, 2024

Pierre-vh merged commit 52d5b8e into llvm:main Mar 6, 2024
4 checks passed

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 11, 2024

[AMDGPU] Don't form sext/abs/neg fp8 cvt (llvm#83843)

8761205

gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants. Fixes SWDEV-447468 Change-Id: I960c5182f226b500d4c325226b3574ecebcd670d

rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request May 8, 2024

[AMDGPU] Don't form sext/abs/neg fp8 cvt (llvm#83843)

122eb3d

gfx940 does not allow abs/sext/neg on v_cvt_fp8/bf8 & pk variants. Fixes SWDEV-447468 Change-Id: I818c4e029b04728bbf0fe15c5fff96c3727a7e97
