[AMDGPU] Encode unused VALU src0/1/2 fields as inline 0 on GFX10+ #175753
Conversation
…eudos This is less convoluted since VOP1/2/C and VOP3 Reals share the same Pseudo, and we only need the PostEncoderMethod for the VOP3 Reals.
This has no functional effect since these source fields are unused, but it can avoid some conservative stalls due to these instructions looking like they read from an SGPR, since 0 is the encoding for s0. Using 0x80 is more benign since it is the encoding for inline immediate 0.
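As a minimal Python sketch of the idea (not LLVM code — it mirrors the `postEncodeVOP3` logic added in this patch, using the same bit positions as the shifts in the diff: the 9-bit src0/src1/src2 fields start at bits 32, 41 and 50 of the 64-bit VOP3 encoding, 0 encodes s0, and 0x80 encodes inline immediate 0):

```python
# Field positions taken from the shifts in the patch's postEncodeVOP3.
SRC0_SHIFT, SRC1_SHIFT, SRC2_SHIFT = 32, 41, 50
INLINE_IMM_0 = 0x80  # operand encoding for inline immediate 0 (0 would be s0)

def post_encode_vop3(encoded, has_src0, has_src1, has_src2):
    """Force each unused 9-bit source field to the inline-immediate-0 encoding."""
    if not has_src0:
        encoded |= INLINE_IMM_0 << SRC0_SHIFT
    if not has_src1:
        encoded |= INLINE_IMM_0 << SRC1_SHIFT
    if not has_src2:
        encoded |= INLINE_IMM_0 << SRC2_SHIFT
    return encoded

# An instruction with only src0/src1 in use: a zeroed src2 field would
# otherwise read as s0 and could trigger a conservative SGPR-hazard stall.
inst = post_encode_vop3(0, has_src0=True, has_src1=True, has_src2=False)
assert (inst >> SRC2_SHIFT) & 0x1FF == INLINE_IMM_0
```

The shift values and the 0x80 encoding come from the diff; the 9-bit field width is an assumption for the illustration.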
This PR includes #175182 as its first commit.
Actually this patch only does it for VOP3* encodings. We could do the same for src0 of VOP1/2/C encodings, but in practice that would probably only affect v_nop and v_pipeflush. As a follow-up cleanup, we could reimplement #167777 using the approach in this patch.
kosarev
left a comment
Maybe replace the '0x80' with 'inline 0' in the title?
> // Set unused source fields in VOP3 encodings to inline immediate 0 to avoid
> // hardware conservatively assuming the instruction reads SGPRs.
Does the hardware actually make that assumption?
Yes, we have definitely seen some unnecessary stalls caused by this on some hardware.
Sure, done - I was just trying to keep it as short as possible.
rampitec
left a comment
This unfortunately disagrees with SP3, but I see the point. LGTM.
@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

This has no functional effect since these source fields are unused, but it can avoid some conservative stalls due to these instructions looking like they read from an SGPR, since 0 is the encoding for s0. Using 0x80 is more benign since it is the encoding for inline immediate 0.

Fixes: SWDEV-574953

Patch is 24.07 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175753.diff

128 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
index 49e94183202bd..598d6e46207b1 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
@@ -96,6 +96,10 @@ class AMDGPUMCCodeEmitter : public MCCodeEmitter {
APInt &Inst, APInt &Scratch,
const MCSubtargetInfo &STI) const;
+ template <bool HasSrc0, bool HasSrc1, bool HasSrc2>
+ APInt postEncodeVOP3(const MCInst &MI, APInt EncodedValue,
+ const MCSubtargetInfo &STI) const;
+
APInt postEncodeVOPCX(const MCInst &MI, APInt EncodedValue,
const MCSubtargetInfo &STI) const;
};
@@ -719,6 +723,23 @@ void AMDGPUMCCodeEmitter::getMachineOpValueCommon(
llvm_unreachable("Encoding of this operand type is not supported yet.");
}
+template <bool HasSrc0, bool HasSrc1, bool HasSrc2>
+APInt AMDGPUMCCodeEmitter::postEncodeVOP3(const MCInst &MI, APInt EncodedValue,
+ const MCSubtargetInfo &STI) const {
+ if (!AMDGPU::isGFX10Plus(STI))
+ return EncodedValue;
+ // Set unused source fields in VOP3 encodings to inline immediate 0 to avoid
+ // hardware conservatively assuming the instruction reads SGPRs.
+ constexpr uint64_t InlineImmediate0 = 0x80;
+ if (!HasSrc0)
+ EncodedValue |= InlineImmediate0 << 32;
+ if (!HasSrc1)
+ EncodedValue |= InlineImmediate0 << 41;
+ if (!HasSrc2)
+ EncodedValue |= InlineImmediate0 << 50;
+ return EncodedValue;
+}
+
APInt AMDGPUMCCodeEmitter::postEncodeVOPCX(const MCInst &MI, APInt EncodedValue,
const MCSubtargetInfo &STI) const {
// GFX10+ v_cmpx opcodes promoted to VOP3 have implied dst=EXEC.
@@ -732,7 +753,7 @@ APInt AMDGPUMCCodeEmitter::postEncodeVOPCX(const MCInst &MI, APInt EncodedValue,
Desc.hasImplicitDefOfPhysReg(AMDGPU::EXEC));
EncodedValue |= MRI.getEncodingValue(AMDGPU::EXEC_LO) &
AMDGPU::HWEncoding::LO256_REG_IDX_MASK;
- return EncodedValue;
+ return postEncodeVOP3<true, true, false>(MI, EncodedValue, STI);
}
#include "AMDGPUGenMCCodeEmitter.inc"
diff --git a/llvm/lib/Target/AMDGPU/VOP3PInstructions.td b/llvm/lib/Target/AMDGPU/VOP3PInstructions.td
index a96d54a8210c3..cb6a413f993bc 100644
--- a/llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3PInstructions.td
@@ -2478,6 +2478,7 @@ multiclass VOP3P_Real_LD_SCALE_gfx1250<bits<8> op> {
VOP3P_Real_Gen<ps, GFX1250Gen, ps.Mnemonic>,
VOP3Pe_gfx11_gfx12<op, ps.Pfl> {
let Inst{58-50} = 0x100; // scale src2 = vgpr0 (dummy)
+ let PostEncoderMethod = "";
}
}
diff --git a/llvm/lib/Target/AMDGPU/VOPCInstructions.td b/llvm/lib/Target/AMDGPU/VOPCInstructions.td
index 95e40dd8e99d9..989181b21f3e9 100644
--- a/llvm/lib/Target/AMDGPU/VOPCInstructions.td
+++ b/llvm/lib/Target/AMDGPU/VOPCInstructions.td
@@ -422,7 +422,6 @@ multiclass VOPC_Pseudos <string opName,
}
-let PostEncoderMethod = "postEncodeVOPCX" in
multiclass VOPCX_Pseudos <string opName,
VOPC_Profile P, VOPC_Profile P_NoSDst,
SDPatternOperator cond = COND_NULL,
@@ -1120,7 +1119,6 @@ multiclass VOPC_Class_Pseudos <string opName, VOPC_Profile p, bit DefExec,
} // end SubtargetPredicate = isGFX11Plus
}
-let PostEncoderMethod = "postEncodeVOPCX" in
multiclass VOPCX_Class_Pseudos <string opName,
VOPC_Profile P,
VOPC_Profile P_NoSDst> :
@@ -1536,7 +1534,8 @@ class VOPC64_DPP<VOP_DPP_Pseudo ps, string opName = ps.OpName>
let Uses = ps.Uses;
let OtherPredicates = ps.OtherPredicates;
let Constraints = ps.Constraints;
- let PostEncoderMethod = ps.PostEncoderMethod;
+
+ let PostEncoderMethod = !if(!empty(ps.Defs), "", "postEncodeVOPCX");
}
class VOPC64_DPP16_Dst<bits<10> op, VOP_DPP_Pseudo ps,
@@ -1577,7 +1576,8 @@ class VOPC64_DPP8<VOP_Pseudo ps, string opName = ps.OpName>
let Uses = ps.Uses;
let OtherPredicates = ps.OtherPredicates;
let True16Predicate = ps.True16Predicate;
- let PostEncoderMethod = ps.PostEncoderMethod;
+
+ let PostEncoderMethod = !if(!empty(ps.Defs), "", "postEncodeVOPCX");
}
class VOPC64_DPP8_Dst<bits<10> op, VOP_Pseudo ps, string opName = ps.OpName>
@@ -1780,6 +1780,7 @@ multiclass VOPCX_Real<GFXGen Gen, bits<9> op> {
let Inst{7-0} = ?; // sdst
let AsmString = !subst("_nosdst", "", ps64.Mnemonic)
# "{_e64} " # ps64.AsmOperands;
+ let PostEncoderMethod = "postEncodeVOPCX";
}
defm : VOPCXInstAliases<NAME, !substr(Gen.Suffix, 1)>;
@@ -1841,6 +1842,7 @@ multiclass VOPCX_Real_with_name<GFXGen Gen, bits<9> op, string OpName,
let Inst{7-0} = ?; // sdst
let Inst{14} = 0;
let AsmString = asm_name # "{_e64} " # ps64.AsmOperands;
+ let PostEncoderMethod = "postEncodeVOPCX";
}
} else {
def _e64#Gen.Suffix
@@ -1848,6 +1850,7 @@ multiclass VOPCX_Real_with_name<GFXGen Gen, bits<9> op, string OpName,
VOP3a_gfx11_gfx12<{0, op}, ps64.Pfl> {
let Inst{7-0} = ?; // sdst
let AsmString = asm_name # "{_e64} " # ps64.AsmOperands;
+ let PostEncoderMethod = "postEncodeVOPCX";
}
}
@@ -2189,6 +2192,7 @@ let AssemblerPredicate = isGFX10Only, DecoderNamespace = "GFX10" in {
let Inst{7-0} = ?; // sdst
let AsmString = !subst("_nosdst", "", !cast<VOP3_Pseudo>(NAME#"_nosdst_e64").Mnemonic)
# "{_e64} " # !cast<VOP3_Pseudo>(NAME#"_nosdst_e64").AsmOperands;
+ let PostEncoderMethod = "postEncodeVOPCX";
}
if !cast<VOPC_Pseudo>(NAME#"_nosdst_e32").Pfl.HasExtSDWA9 then
diff --git a/llvm/lib/Target/AMDGPU/VOPInstructions.td b/llvm/lib/Target/AMDGPU/VOPInstructions.td
index 6056cd22875a7..39a202dc9c379 100644
--- a/llvm/lib/Target/AMDGPU/VOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/VOPInstructions.td
@@ -199,9 +199,10 @@ class VOP3_Real <VOP_Pseudo ps, int EncodingFamily, string asm_name = ps.Mnemoni
let mayStore = ps.mayStore;
let TRANS = ps.TRANS;
let isConvergent = ps.isConvergent;
- let PostEncoderMethod = ps.PostEncoderMethod;
VOPProfile Pfl = ps.Pfl;
+
+ let PostEncoderMethod = !if(!and(Pfl.HasSrc0, Pfl.HasSrc1, Pfl.HasSrc2), "", "postEncodeVOP3<"#Pfl.HasSrc0#","#Pfl.HasSrc1#","#Pfl.HasSrc2#">");
}
class VOP3_Real_Gen <VOP_Pseudo ps, GFXGen Gen, string asm_name = ps.Mnemonic> :
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate-gfx1250.ll b/llvm/test/CodeGen/AMDGPU/code-size-estimate-gfx1250.ll
index fcbf7efdaa653..75af0b5f15306 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate-gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate-gfx1250.ll
@@ -6,7 +6,7 @@ define i16 @cvt_pk_bf8_f16_v(ptr addrspace(1) %out) {
; GFX1250: ; %bb.0:
; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0 ; encoding: [0x00,0x00,0xc8,0xbf]
; GFX1250-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1250-NEXT: v_cvt_pk_bf8_f16 v0, 0x38003800 ; encoding: [0x00,0x00,0x73,0xd7,0xff,0x00,0x00,0x00,0x00,0x38,0x00,0x38]
+; GFX1250-NEXT: v_cvt_pk_bf8_f16 v0, 0x38003800 ; encoding: [0x00,0x00,0x73,0xd7,0xff,0x00,0x01,0x02,0x00,0x38,0x00,0x38]
; GFX1250-NEXT: s_set_pc_i64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
%cvt = tail call i16 @llvm.amdgcn.cvt.pk.bf8.f16(<2 x half> <half 0xH3800, half 0xH3800>)
ret i16 %cvt
@@ -19,7 +19,7 @@ define i16 @cvt_pk_fp8_f16_v(ptr addrspace(1) %out) {
; GFX1250: ; %bb.0:
; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0 ; encoding: [0x00,0x00,0xc8,0xbf]
; GFX1250-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1250-NEXT: v_cvt_pk_fp8_f16 v0, 0x3800 ; encoding: [0x00,0x00,0x72,0xd7,0xff,0x00,0x00,0x00,0x00,0x38,0x00,0x00]
+; GFX1250-NEXT: v_cvt_pk_fp8_f16 v0, 0x3800 ; encoding: [0x00,0x00,0x72,0xd7,0xff,0x00,0x01,0x02,0x00,0x38,0x00,0x00]
; GFX1250-NEXT: s_set_pc_i64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
%cvt = tail call i16 @llvm.amdgcn.cvt.pk.fp8.f16(<2 x half> <half 0xH3800, half 0xH0>)
ret i16 %cvt
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.ll b/llvm/test/CodeGen/AMDGPU/code-size-estimate.ll
index 00da5613820cd..f7c6db60678f9 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.ll
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.ll
@@ -145,13 +145,13 @@ define float @v_mul_f32_vop3_src_mods(float %x, float %y) {
; GFX10-LABEL: v_mul_f32_vop3_src_mods:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
-; GFX10-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x00]
+; GFX10-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x02]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
; GFX11-LABEL: v_mul_f32_vop3_src_mods:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x89,0xbf]
-; GFX11-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x00]
+; GFX11-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x02]
; GFX11-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1200-LABEL: v_mul_f32_vop3_src_mods:
@@ -161,14 +161,14 @@ define float @v_mul_f32_vop3_src_mods(float %x, float %y) {
; GFX1200-NEXT: s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
; GFX1200-NEXT: s_wait_bvhcnt 0x0 ; encoding: [0x00,0x00,0xc3,0xbf]
; GFX1200-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1200-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x00]
+; GFX1200-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x02]
; GFX1200-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1250-LABEL: v_mul_f32_vop3_src_mods:
; GFX1250: ; %bb.0:
; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0 ; encoding: [0x00,0x00,0xc8,0xbf]
; GFX1250-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1250-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x00]
+; GFX1250-NEXT: v_mul_f32_e64 v0, |v0|, v1 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0x03,0x02,0x02]
; GFX1250-NEXT: s_set_pc_i64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
%fabs.x = call float @llvm.fabs.f32(float %x)
%mul = fmul float %fabs.x, %y
@@ -188,13 +188,13 @@ define float @v_mul_f32_vop3_src_mods_inline_imm(float %x, float %y) {
; GFX10-LABEL: v_mul_f32_vop3_src_mods_inline_imm:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
-; GFX10-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x00]
+; GFX10-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x02]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
; GFX11-LABEL: v_mul_f32_vop3_src_mods_inline_imm:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x89,0xbf]
-; GFX11-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x00]
+; GFX11-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x02]
; GFX11-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1200-LABEL: v_mul_f32_vop3_src_mods_inline_imm:
@@ -204,14 +204,14 @@ define float @v_mul_f32_vop3_src_mods_inline_imm(float %x, float %y) {
; GFX1200-NEXT: s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
; GFX1200-NEXT: s_wait_bvhcnt 0x0 ; encoding: [0x00,0x00,0xc3,0xbf]
; GFX1200-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1200-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x00]
+; GFX1200-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x02]
; GFX1200-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1250-LABEL: v_mul_f32_vop3_src_mods_inline_imm:
; GFX1250: ; %bb.0:
; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0 ; encoding: [0x00,0x00,0xc8,0xbf]
; GFX1250-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1250-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x00]
+; GFX1250-NEXT: v_mul_f32_e64 v0, |v0|, 4.0 ; encoding: [0x00,0x01,0x08,0xd5,0x00,0xed,0x01,0x02]
; GFX1250-NEXT: s_set_pc_i64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
%fabs.x = call float @llvm.fabs.f32(float %x)
%mul = fmul float %fabs.x, 4.0
@@ -233,13 +233,13 @@ define float @v_mul_f32_vop3_src_mods_literal(float %x, float %y) {
; GFX10-LABEL: v_mul_f32_vop3_src_mods_literal:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
-; GFX10-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x00,0x00,0x00,0xf6,0x42]
+; GFX10-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x02,0x00,0x00,0xf6,0x42]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
; GFX11-LABEL: v_mul_f32_vop3_src_mods_literal:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x89,0xbf]
-; GFX11-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x00,0x00,0x00,0xf6,0x42]
+; GFX11-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x02,0x00,0x00,0xf6,0x42]
; GFX11-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1200-LABEL: v_mul_f32_vop3_src_mods_literal:
@@ -249,14 +249,14 @@ define float @v_mul_f32_vop3_src_mods_literal(float %x, float %y) {
; GFX1200-NEXT: s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
; GFX1200-NEXT: s_wait_bvhcnt 0x0 ; encoding: [0x00,0x00,0xc3,0xbf]
; GFX1200-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1200-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x00,0x00,0x00,0xf6,0x42]
+; GFX1200-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x02,0x00,0x00,0xf6,0x42]
; GFX1200-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1250-LABEL: v_mul_f32_vop3_src_mods_literal:
; GFX1250: ; %bb.0:
; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0 ; encoding: [0x00,0x00,0xc8,0xbf]
; GFX1250-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1250-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x00,0x00,0x00,0xf6,0x42]
+; GFX1250-NEXT: v_mul_f32_e64 v0, 0x42f60000, |v0| ; encoding: [0x00,0x02,0x08,0xd5,0xff,0x00,0x02,0x02,0x00,0x00,0xf6,0x42]
; GFX1250-NEXT: s_set_pc_i64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
%fabs.x = call float @llvm.fabs.f32(float %x)
%mul = fmul float %fabs.x, 123.0
@@ -280,7 +280,7 @@ define float @v_mul_f32_vop2_frame_index(float %x) {
; GFX10-LABEL: v_mul_f32_vop2_frame_index:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
-; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s32 ; encoding: [0x01,0x00,0x16,0xd5,0x85,0x40,0x00,0x00]
+; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s32 ; encoding: [0x01,0x00,0x16,0xd5,0x85,0x40,0x00,0x02]
; GFX10-NEXT: v_mul_f32_e32 v0, v1, v0 ; encoding: [0x01,0x01,0x00,0x10]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
@@ -609,13 +609,13 @@ define double @v_mul_f64_vop2_literal_32(double %x) {
; GFX10-LABEL: v_mul_f64_vop2_literal_32:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
-; GFX10-NEXT: v_mul_f64 v[0:1], 0x405ec000, v[0:1] ; encoding: [0x00,0x00,0x65,0xd5,0xff,0x00,0x02,0x00,0x00,0xc0,0x5e,0x40]
+; GFX10-NEXT: v_mul_f64 v[0:1], 0x405ec000, v[0:1] ; encoding: [0x00,0x00,0x65,0xd5,0xff,0x00,0x02,0x02,0x00,0xc0,0x5e,0x40]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
; GFX11-LABEL: v_mul_f64_vop2_literal_32:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x89,0xbf]
-; GFX11-NEXT: v_mul_f64 v[0:1], 0x405ec000, v[0:1] ; encoding: [0x00,0x00,0x28,0xd7,0xff,0x00,0x02,0x00,0x00,0xc0,0x5e,0x40]
+; GFX11-NEXT: v_mul_f64 v[0:1], 0x405ec000, v[0:1] ; encoding: [0x00,0x00,0x28,0xd7,0xff,0x00,0x02,0x02,0x00,0xc0,0x5e,0x40]
; GFX11-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1200-LABEL: v_mul_f64_vop2_literal_32:
@@ -658,7 +658,7 @@ define double @v_mul_f64_vop2_literal_64(double %x) {
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
; GFX10-NEXT: s_mov_b32 s4, 0x66666666 ; encoding: [0xff,0x03,0x84,0xbe,0x66,0x66,0x66,0x66]
; GFX10-NEXT: s_mov_b32 s5, 0x405ec666 ; encoding: [0xff,0x03,0x85,0xbe,0x66,0xc6,0x5e,0x40]
-; GFX10-NEXT: v_mul_f64 v[0:1], v[0:1], s[4:5] ; encoding: [0x00,0x00,0x65,0xd5,0x00,0x09,0x00,0x00]
+; GFX10-NEXT: v_mul_f64 v[0:1], v[0:1], s[4:5] ; encoding: [0x00,0x00,0x65,0xd5,0x00,0x09,0x00,0x02]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
; GFX11-LABEL: v_mul_f64_vop2_literal_64:
@@ -667,7 +667,7 @@ define double @v_mul_f64_vop2_literal_64(double %x) {
; GFX11-NEXT: s_mov_b32 s0, 0x66666666 ; encoding: [0xff,0x00,0x80,0xbe,0x66,0x66,0x66,0x66]
; GFX11-NEXT: s_mov_b32 s1, 0x405ec666 ; encoding: [0xff,0x00,0x81,0xbe,0x66,0xc6,0x5e,0x40]
; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) ; encoding: [0x09,0x00,0x87,0xbf]
-; GFX11-NEXT: v_mul_f64 v[0:1], v[0:1], s[0:1] ; encoding: [0x00,0x00,0x28,0xd7,0x00,0x01,0x00,0x00]
+; GFX11-NEXT: v_mul_f64 v[0:1], v[0:1], s[0:1] ; encoding: [0x00,0x00,0x28,0xd7,0x00,0x01,0x00,0x02]
; GFX11-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
;
; GFX1200-LABEL: v_mul_f64_vop2_literal_64:
@@ -710,14 +710,14 @@ define i64 @v_add_u64_vop2_literal_32(i64 %x) {
; GFX10-LABEL: v_add_u64_vop2_literal_32:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
-; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x7b, v0 ; encoding: [0x00,0x6a,0x0f,0xd7,0xff,0x00,0x02,0x00,0x7b,0x00,0x00,0x00]
+; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x7b, v0 ; encoding: [0x00,0x6a,0x0f,0xd7,0xff,0x00,0x02,0x02,0x7b,0x00,0x00,0x00]
; GFX10-NEXT: v_add_co_ci_u32_e64 v1, null, 0, v1, vcc_lo ; encoding: [0x01,0x7d,0x28,0xd5,0x80,0x02,0xaa,0x01]
; GFX10-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x20,0x80,0xbe]
;
; GFX11-LABEL: v_add_u64_vop2_literal_32:
; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x89,0xbf]
-; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, 0x7b, v0 ; encoding: [0x00,0x6a,0x00,0xd7,0xff,0x00,0x02,0x00,0x7b,0x00,0x00,0x00]
+; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, 0x7b, v0 ; encoding: [0x00,0x6a,0x00,0xd7,0xff,0x00,0x02,0x02,0x7b,0x00,0x00,0x00]
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) ; encoding: [0x01,0x00,0x87,0xbf]
; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, 0, v1, vcc_lo ; encoding: [0x01,0x7c,0x20,0xd5,0x80,0x02,0xaa,0x01]
; GFX11-NEXT: s_setpc_b64 s[30:31] ; encoding: [0x1e,0x48,0x80,0xbe]
@@ -729,7 +729,7 @@ define i64 @v_add_u64_vop2_literal_32(i64 %x) {
; GFX1200-NEXT: s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
; GFX1200-NEXT: s_wait_bvhcnt 0x0 ; encoding: [0x00,0x00,0xc3,0xbf]
; GFX1200-NEXT: s_wait_kmcnt 0x0 ; encoding: [0x00,0x00,0xc7,0xbf]
-; GFX1200-NEXT: v_add_co_u32 v0, vcc_lo, 0x7b, v0 ; encoding: [0x00,0x6a,0x00,0xd7,0xff,0x00,0x02,0x00,0x7b,0x00,0x00,0x00]
+; GFX1200-NEXT: v_add_co_u32 v0, vcc_lo, 0x7b, v0 ; encoding: [0x00,0x6a,0x00,0xd7,0xff,0x00,0x02,0x02,0x7b,0x00,0x00,0x00]
; GFX1200-NEXT: s_wait_alu depctr_va_vcc(0) ; encoding: [0x9d,0xff,0x88,0xbf]
; GFX1200-NEXT: v_add_co_c...
[truncated]
Reimplement llvm#167777 using PostEncoderMethod. This has the advantage that the disassembler will tolerate any value in the unused scale_src2 field and it is more consistent with how other unused VALU source fields are handled since llvm#175753.
…vm#175753) This has no functional effect since these source fields are unused, but it can avoid some conservative stalls due to these instructions looking like they read from an SGPR, since 0 is the encoding for s0. Using 0x80 is more benign since it is the encoding for inline immediate 0. Fixes: SWDEV-574953
This changes the fix from llvm#167777 to use the encoding for literal 0 instead of the encoding for vgpr0, to match new SP3 behaviour and for consistency with all other unused VALU source operands since llvm#175753.
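To illustrate why tolerating any value in an unused field matters on the disassembler side, here is a hedged Python sketch (an assumption about the approach, not the actual LLVM disassembler): before matching, an unused source field is masked to a canonical value, so any encoder choice — s0, vgpr0, or inline immediate 0 — round-trips to the same instruction.

```python
# Hypothetical decoder-side canonicalization: the 9-bit src2 field sits
# at bit 50 of the 64-bit encoding (as in the patch's shifts).
SRC2_SHIFT = 50
SRC2_MASK = 0x1FF << SRC2_SHIFT

def canonicalize(encoded, src2_used):
    """Ignore whatever the encoder put in an unused src2 field."""
    return encoded if src2_used else (encoded & ~SRC2_MASK)

a = canonicalize(0x80 << SRC2_SHIFT, src2_used=False)   # inline 0 in src2
b = canonicalize(0x100 << SRC2_SHIFT, src2_used=False)  # vgpr0 in src2
assert a == b  # both decode identically once the unused field is masked
```

The function name and masking scheme are illustrative only; field positions follow the shifts visible in the diff.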
I think there might be some additional cases for VOPC?
The part that ignores unused fields in the disassembler should work for all encodings. The part that encodes them as literal 0 (in the assembler/codegen) probably does not cover all encodings; I have been meaning to do some systematic checking to find cases that I missed.
Sorry - I made a mistake. I suspect it is all fine. |