Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Fix decoder for BF16 inline constants #82276

Merged
merged 4 commits into from
Feb 19, 2024

Conversation

rampitec
Copy link
Collaborator

@rampitec rampitec commented Feb 19, 2024

Fix #82039.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 19, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-mc

Author: Stanislav Mekhanoshin (rampitec)

Changes

Patch is 27.34 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82276.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+70-30)
  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h (+13-8)
  • (modified) llvm/lib/Target/AMDGPU/SIDefines.h (+9)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+60-50)
  • (modified) llvm/test/MC/Disassembler/AMDGPU/bf16_imm.txt (+3-5)
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index aece4402b44d90..3526e2c1ed7105 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -154,11 +154,12 @@ static DecodeStatus decodeSrcOp(MCInst &Inst, unsigned EncSize,
                                 AMDGPUDisassembler::OpWidthTy OpWidth,
                                 unsigned Imm, unsigned EncImm,
                                 bool MandatoryLiteral, unsigned ImmWidth,
+                                AMDGPU::OperandSemantics Sema,
                                 const MCDisassembler *Decoder) {
   assert(Imm < (1U << EncSize) && "Operand doesn't fit encoding!");
   auto DAsm = static_cast<const AMDGPUDisassembler *>(Decoder);
-  return addOperand(
-      Inst, DAsm->decodeSrcOp(OpWidth, EncImm, MandatoryLiteral, ImmWidth));
+  return addOperand(Inst, DAsm->decodeSrcOp(OpWidth, EncImm, MandatoryLiteral,
+                                            ImmWidth, Sema));
 }
 
 // Decoder for registers. Imm(7-bit) is number of register, uses decodeSrcOp to
@@ -174,7 +175,7 @@ template <AMDGPUDisassembler::OpWidthTy OpWidth>
 static DecodeStatus decodeAV10(MCInst &Inst, unsigned Imm, uint64_t /* Addr */,
                                const MCDisassembler *Decoder) {
   return decodeSrcOp(Inst, 10, OpWidth, Imm, Imm | AMDGPU::EncValues::IS_VGPR,
-                     false, 0, Decoder);
+                     false, 0, AMDGPU::OperandSemantics::INT, Decoder);
 }
 
 // Decoder for Src(9-bit encoding) registers only.
@@ -182,7 +183,8 @@ template <AMDGPUDisassembler::OpWidthTy OpWidth>
 static DecodeStatus decodeSrcReg9(MCInst &Inst, unsigned Imm,
                                   uint64_t /* Addr */,
                                   const MCDisassembler *Decoder) {
-  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm, false, 0, Decoder);
+  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm, false, 0,
+                     AMDGPU::OperandSemantics::INT, Decoder);
 }
 
 // Decoder for Src(9-bit encoding) AGPR, register number encoded in 9bits, set
@@ -191,7 +193,8 @@ static DecodeStatus decodeSrcReg9(MCInst &Inst, unsigned Imm,
 template <AMDGPUDisassembler::OpWidthTy OpWidth>
 static DecodeStatus decodeSrcA9(MCInst &Inst, unsigned Imm, uint64_t /* Addr */,
                                 const MCDisassembler *Decoder) {
-  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm | 512, false, 0, Decoder);
+  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm | 512, false, 0,
+                     AMDGPU::OperandSemantics::INT, Decoder);
 }
 
 // Decoder for 'enum10' from decodeSrcOp, Imm{0-8} is 9-bit Src encoding
@@ -200,7 +203,8 @@ template <AMDGPUDisassembler::OpWidthTy OpWidth>
 static DecodeStatus decodeSrcAV10(MCInst &Inst, unsigned Imm,
                                   uint64_t /* Addr */,
                                   const MCDisassembler *Decoder) {
-  return decodeSrcOp(Inst, 10, OpWidth, Imm, Imm, false, 0, Decoder);
+  return decodeSrcOp(Inst, 10, OpWidth, Imm, Imm, false, 0,
+                     AMDGPU::OperandSemantics::INT, Decoder);
 }
 
 // Decoder for RegisterOperands using 9-bit Src encoding. Operand can be
@@ -208,11 +212,13 @@ static DecodeStatus decodeSrcAV10(MCInst &Inst, unsigned Imm,
 // will be decoded and InstPrinter will report warning. Immediate will be
 // decoded into constant of size ImmWidth, should match width of immediate used
 // by OperandType (important for floating point types).
-template <AMDGPUDisassembler::OpWidthTy OpWidth, unsigned ImmWidth>
+template <AMDGPUDisassembler::OpWidthTy OpWidth, unsigned ImmWidth,
+          unsigned OperandSemantics>
 static DecodeStatus decodeSrcRegOrImm9(MCInst &Inst, unsigned Imm,
                                        uint64_t /* Addr */,
                                        const MCDisassembler *Decoder) {
-  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm, false, ImmWidth, Decoder);
+  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm, false, ImmWidth,
+                     (AMDGPU::OperandSemantics)OperandSemantics, Decoder);
 }
 
 // Decoder for Src(9-bit encoding) AGPR or immediate. Set Imm{9} to 1 (set acc)
@@ -222,14 +228,15 @@ static DecodeStatus decodeSrcRegOrImmA9(MCInst &Inst, unsigned Imm,
                                         uint64_t /* Addr */,
                                         const MCDisassembler *Decoder) {
   return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm | 512, false, ImmWidth,
-                     Decoder);
+                     AMDGPU::OperandSemantics::INT, Decoder);
 }
 
 template <AMDGPUDisassembler::OpWidthTy OpWidth, unsigned ImmWidth>
 static DecodeStatus decodeSrcRegOrImmDeferred9(MCInst &Inst, unsigned Imm,
                                                uint64_t /* Addr */,
                                                const MCDisassembler *Decoder) {
-  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm, true, ImmWidth, Decoder);
+  return decodeSrcOp(Inst, 9, OpWidth, Imm, Imm, true, ImmWidth,
+                     AMDGPU::OperandSemantics::INT, Decoder);
 }
 
 // Default decoders generated by tablegen: 'Decode<RegClass>RegisterClass'
@@ -394,8 +401,9 @@ static DecodeStatus decodeOperand_VSrc_f64(MCInst &Inst, unsigned Imm,
                                            const MCDisassembler *Decoder) {
   assert(Imm < (1 << 9) && "9-bit encoding");
   auto DAsm = static_cast<const AMDGPUDisassembler *>(Decoder);
-  return addOperand(
-      Inst, DAsm->decodeSrcOp(AMDGPUDisassembler::OPW64, Imm, false, 64, true));
+  return addOperand(Inst,
+                    DAsm->decodeSrcOp(AMDGPUDisassembler::OPW64, Imm, false, 64,
+                                      AMDGPU::OperandSemantics::FP64));
 }
 
 #define DECODE_SDWA(DecName) \
@@ -1414,7 +1422,7 @@ static int64_t getInlineImmVal64(unsigned Imm) {
   }
 }
 
-static int64_t getInlineImmVal16(unsigned Imm) {
+static int64_t getInlineImmValF16(unsigned Imm) {
   switch (Imm) {
   case 240:
     return 0x3800;
@@ -1439,9 +1447,40 @@ static int64_t getInlineImmVal16(unsigned Imm) {
   }
 }
 
-MCOperand AMDGPUDisassembler::decodeFPImmed(unsigned ImmWidth, unsigned Imm) {
-  assert(Imm >= AMDGPU::EncValues::INLINE_FLOATING_C_MIN
-      && Imm <= AMDGPU::EncValues::INLINE_FLOATING_C_MAX);
+static int64_t getInlineImmValBF16(unsigned Imm) {
+  switch (Imm) {
+  case 240:
+    return 0x3F00;
+  case 241:
+    return 0xBF00;
+  case 242:
+    return 0x3F80;
+  case 243:
+    return 0xBF80;
+  case 244:
+    return 0x4000;
+  case 245:
+    return 0xC000;
+  case 246:
+    return 0x4080;
+  case 247:
+    return 0xC080;
+  case 248: // 1 / (2 * PI)
+    return 0x3E22;
+  default:
+    llvm_unreachable("invalid fp inline imm");
+  }
+}
+
+static int64_t getInlineImmVal16(unsigned Imm, AMDGPU::OperandSemantics Sema) {
+  return (Sema == AMDGPU::OperandSemantics::BF16) ? getInlineImmValBF16(Imm)
+                                                  : getInlineImmValF16(Imm);
+}
+
+MCOperand AMDGPUDisassembler::decodeFPImmed(unsigned ImmWidth, unsigned Imm,
+                                            AMDGPU::OperandSemantics Sema) {
+  assert(Imm >= AMDGPU::EncValues::INLINE_FLOATING_C_MIN &&
+         Imm <= AMDGPU::EncValues::INLINE_FLOATING_C_MAX);
 
   // ToDo: case 248: 1/(2*PI) - is allowed only on VI
   // ImmWidth 0 is a default case where operand should not allow immediates.
@@ -1454,7 +1493,7 @@ MCOperand AMDGPUDisassembler::decodeFPImmed(unsigned ImmWidth, unsigned Imm) {
   case 64:
     return MCOperand::createImm(getInlineImmVal64(Imm));
   case 16:
-    return MCOperand::createImm(getInlineImmVal16(Imm));
+    return MCOperand::createImm(getInlineImmVal16(Imm, Sema));
   default:
     llvm_unreachable("implement me");
   }
@@ -1568,7 +1607,8 @@ int AMDGPUDisassembler::getTTmpIdx(unsigned Val) const {
 
 MCOperand AMDGPUDisassembler::decodeSrcOp(const OpWidthTy Width, unsigned Val,
                                           bool MandatoryLiteral,
-                                          unsigned ImmWidth, bool IsFP) const {
+                                          unsigned ImmWidth,
+                                          AMDGPU::OperandSemantics Sema) const {
   using namespace AMDGPU::EncValues;
 
   assert(Val < 1024); // enum10
@@ -1581,14 +1621,13 @@ MCOperand AMDGPUDisassembler::decodeSrcOp(const OpWidthTy Width, unsigned Val,
                                    : getVgprClassId(Width), Val - VGPR_MIN);
   }
   return decodeNonVGPRSrcOp(Width, Val & 0xFF, MandatoryLiteral, ImmWidth,
-                            IsFP);
+                            Sema);
 }
 
-MCOperand AMDGPUDisassembler::decodeNonVGPRSrcOp(const OpWidthTy Width,
-                                                 unsigned Val,
-                                                 bool MandatoryLiteral,
-                                                 unsigned ImmWidth,
-                                                 bool IsFP) const {
+MCOperand
+AMDGPUDisassembler::decodeNonVGPRSrcOp(const OpWidthTy Width, unsigned Val,
+                                       bool MandatoryLiteral, unsigned ImmWidth,
+                                       AMDGPU::OperandSemantics Sema) const {
   // Cases when Val{8} is 1 (vgpr, agpr or true 16 vgpr) should have been
   // decoded earlier.
   assert(Val < (1 << 8) && "9-bit Src encoding when Val{8} is 0");
@@ -1609,14 +1648,14 @@ MCOperand AMDGPUDisassembler::decodeNonVGPRSrcOp(const OpWidthTy Width,
     return decodeIntImmed(Val);
 
   if (INLINE_FLOATING_C_MIN <= Val && Val <= INLINE_FLOATING_C_MAX)
-    return decodeFPImmed(ImmWidth, Val);
+    return decodeFPImmed(ImmWidth, Val, Sema);
 
   if (Val == LITERAL_CONST) {
     if (MandatoryLiteral)
       // Keep a sentinel value for deferred setting
       return MCOperand::createImm(LITERAL_CONST);
     else
-      return decodeLiteralConstant(IsFP && ImmWidth == 64);
+      return decodeLiteralConstant(Sema == AMDGPU::OperandSemantics::FP64);
   }
 
   switch (Width) {
@@ -1715,7 +1754,8 @@ MCOperand AMDGPUDisassembler::decodeSpecialReg64(unsigned Val) const {
 
 MCOperand AMDGPUDisassembler::decodeSDWASrc(const OpWidthTy Width,
                                             const unsigned Val,
-                                            unsigned ImmWidth) const {
+                                            unsigned ImmWidth,
+                                            AMDGPU::OperandSemantics Sema) const {
   using namespace AMDGPU::SDWA;
   using namespace AMDGPU::EncValues;
 
@@ -1746,7 +1786,7 @@ MCOperand AMDGPUDisassembler::decodeSDWASrc(const OpWidthTy Width,
       return decodeIntImmed(SVal);
 
     if (INLINE_FLOATING_C_MIN <= SVal && SVal <= INLINE_FLOATING_C_MAX)
-      return decodeFPImmed(ImmWidth, SVal);
+      return decodeFPImmed(ImmWidth, SVal, Sema);
 
     return decodeSpecialReg32(SVal);
   } else if (STI.hasFeature(AMDGPU::FeatureVolcanicIslands)) {
@@ -1756,11 +1796,11 @@ MCOperand AMDGPUDisassembler::decodeSDWASrc(const OpWidthTy Width,
 }
 
 MCOperand AMDGPUDisassembler::decodeSDWASrc16(unsigned Val) const {
-  return decodeSDWASrc(OPW16, Val, 16);
+  return decodeSDWASrc(OPW16, Val, 16, AMDGPU::OperandSemantics::FP16);
 }
 
 MCOperand AMDGPUDisassembler::decodeSDWASrc32(unsigned Val) const {
-  return decodeSDWASrc(OPW32, Val, 32);
+  return decodeSDWASrc(OPW32, Val, 32, AMDGPU::OperandSemantics::FP32);
 }
 
 MCOperand AMDGPUDisassembler::decodeSDWAVopcDst(unsigned Val) const {
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index 02feaf553c0c45..3142b8a14a4dd5 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_LIB_TARGET_AMDGPU_DISASSEMBLER_AMDGPUDISASSEMBLER_H
 #define LLVM_LIB_TARGET_AMDGPU_DISASSEMBLER_AMDGPUDISASSEMBLER_H
 
+#include "SIDefines.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/MC/MCDisassembler/MCDisassembler.h"
@@ -231,25 +232,29 @@ class AMDGPUDisassembler : public MCDisassembler {
   unsigned getTtmpClassId(const OpWidthTy Width) const;
 
   static MCOperand decodeIntImmed(unsigned Imm);
-  static MCOperand decodeFPImmed(unsigned ImmWidth, unsigned Imm);
+  static MCOperand decodeFPImmed(unsigned ImmWidth, unsigned Imm,
+                                 AMDGPU::OperandSemantics Sema);
 
   MCOperand decodeMandatoryLiteralConstant(unsigned Imm) const;
   MCOperand decodeLiteralConstant(bool ExtendFP64) const;
 
-  MCOperand decodeSrcOp(const OpWidthTy Width, unsigned Val,
-                        bool MandatoryLiteral = false, unsigned ImmWidth = 0,
-                        bool IsFP = false) const;
+  MCOperand decodeSrcOp(
+      const OpWidthTy Width, unsigned Val, bool MandatoryLiteral = false,
+      unsigned ImmWidth = 0,
+      AMDGPU::OperandSemantics Sema = AMDGPU::OperandSemantics::INT) const;
 
-  MCOperand decodeNonVGPRSrcOp(const OpWidthTy Width, unsigned Val,
-                               bool MandatoryLiteral = false,
-                               unsigned ImmWidth = 0, bool IsFP = false) const;
+  MCOperand decodeNonVGPRSrcOp(
+      const OpWidthTy Width, unsigned Val, bool MandatoryLiteral = false,
+      unsigned ImmWidth = 0,
+      AMDGPU::OperandSemantics Sema = AMDGPU::OperandSemantics::INT) const;
 
   MCOperand decodeVOPDDstYOp(MCInst &Inst, unsigned Val) const;
   MCOperand decodeSpecialReg32(unsigned Val) const;
   MCOperand decodeSpecialReg64(unsigned Val) const;
 
   MCOperand decodeSDWASrc(const OpWidthTy Width, unsigned Val,
-                          unsigned ImmWidth = 0) const;
+                          unsigned ImmWidth,
+                          AMDGPU::OperandSemantics Sema) const;
   MCOperand decodeSDWASrc16(unsigned Val) const;
   MCOperand decodeSDWASrc32(unsigned Val) const;
   MCOperand decodeSDWAVopcDst(unsigned Val) const;
diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index c4dd0d41219de0..f8b2dd825ac7d9 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -269,6 +269,15 @@ enum OperandType : unsigned {
   OPERAND_KIMM_LAST = OPERAND_KIMM16
 
 };
+
+// Should be in sync with the OperandSematics defined in SIRegisterInfo.td
+enum OperandSemantics : unsigned {
+  INT = 0,
+  FP16 = 1,
+  BF16 = 2,
+  FP32 = 3,
+  FP64 = 4,
+};
 }
 
 // Input operand modifiers bit-masks
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index a19cc3c7dae407..af7994ad6f6c68 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -1108,24 +1108,34 @@ class RegOrImmOperand <RegisterClass RegClass, string OperandTypeName>
     let ParserMatchClass = RegImmMatcher<!subst("_Deferred", "", NAME)>;
 }
 
+// Should be in sync with the OperandSematics defined in SIDefines.h
+def OperandSematics {
+  int INT = 0;
+  int FP16 = 1;
+  int BF16 = 2;
+  int FP32 = 3;
+  int FP64 = 4;
+}
+
 //===----------------------------------------------------------------------===//
 //  SSrc_* Operands with an SGPR or a 32-bit immediate
 //===----------------------------------------------------------------------===//
 
 class SrcRegOrImm9<RegisterClass regClass, string opWidth, string operandType,
-                   int immWidth> : RegOrImmOperand<regClass, operandType> {
+                   int immWidth, int OperandSematics>
+    : RegOrImmOperand<regClass, operandType> {
   let DecoderMethod = "decodeSrcRegOrImm9<AMDGPUDisassembler::" # opWidth #
-                      ", " # immWidth # ">";
+                      ", " # immWidth # ", " # OperandSematics # ">";
 }
 
-def SSrc_b16 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT16", 16>;
-def SSrc_bf16: SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_BF16", 16>;
-def SSrc_f16 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_FP16", 16>;
-def SSrc_b32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT32", 32>;
-def SSrc_f32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_FP32", 32>;
-def SSrc_b64 : SrcRegOrImm9 <SReg_64, "OPW64", "OPERAND_REG_IMM_INT64", 64>;
+def SSrc_b16 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT16", 16, OperandSematics.INT>;
+def SSrc_bf16: SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_BF16", 16, OperandSematics.BF16>;
+def SSrc_f16 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_FP16", 16, OperandSematics.FP16>;
+def SSrc_b32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT32", 32, OperandSematics.INT>;
+def SSrc_f32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_FP32", 32, OperandSematics.FP32>;
+def SSrc_b64 : SrcRegOrImm9 <SReg_64, "OPW64", "OPERAND_REG_IMM_INT64", 64, OperandSematics.INT>;
 
-def SSrcOrLds_b32 : SrcRegOrImm9 <SRegOrLds_32, "OPW32", "OPERAND_REG_IMM_INT32", 32>;
+def SSrcOrLds_b32 : SrcRegOrImm9 <SRegOrLds_32, "OPW32", "OPERAND_REG_IMM_INT32", 32, OperandSematics.INT>;
 
 //===----------------------------------------------------------------------===//
 //  SSrc_32_Deferred Operands with an SGPR or a 32-bit immediate for use with
@@ -1145,17 +1155,17 @@ def SSrc_f32_Deferred : SrcRegOrImmDeferred9<SReg_32, "OPW32", "OPERAND_REG_IMM_
 //  SCSrc_* Operands with an SGPR or a inline constant
 //===----------------------------------------------------------------------===//
 
-def SCSrc_b32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_INLINE_C_INT32", 32>;
-def SCSrc_b64 : SrcRegOrImm9 <SReg_64, "OPW64", "OPERAND_REG_INLINE_C_INT64", 64>;
+def SCSrc_b32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_INLINE_C_INT32", 32, OperandSematics.INT>;
+def SCSrc_b64 : SrcRegOrImm9 <SReg_64, "OPW64", "OPERAND_REG_INLINE_C_INT64", 64, OperandSematics.INT>;
 
 //===----------------------------------------------------------------------===//
 //  VSrc_* Operands with an SGPR, VGPR or a 32-bit immediate
 //===----------------------------------------------------------------------===//
 
 // The current and temporary future default used case for VOP3.
-def VSrc_b16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_INT16", 16>;
-def VSrc_bf16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_BF16", 16>;
-def VSrc_f16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_FP16", 16>;
+def VSrc_b16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_INT16", 16, OperandSematics.INT>;
+def VSrc_bf16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_BF16", 16, OperandSematics.BF16>;
+def VSrc_f16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_FP16", 16, OperandSematics.FP16>;
 
 // True16 VOP3 operands.
 def VSrcT_b16 : RegOrImmOperand <VS_16, "OPERAND_REG_IMM_INT16"> {
@@ -1187,21 +1197,21 @@ def VSrcT_f16_Lo128 : RegOrImmOperand <VS_16_Lo128, "OPERAND_REG_IMM_FP16"> {
 
 // The current and temporary future default used case for fake VOP1/2/C.
 // For VOP1,2,C True16 instructions. _Lo128 use first 128 32-bit VGPRs only.
-def VSrcFake16_b16_Lo128 : SrcRegOrImm9 <VS_32_Lo128, "OPW16", "OPERAND_REG_IMM_INT16", 16>;
-def VSrcFake16_bf16_Lo128 : SrcRegOrImm9 <VS_32_Lo128, "OPW16", "OPERAND_REG_IMM_BF16", 16>;
-def VSrcFake16_f16_Lo128 : SrcRegOrImm9 <VS_32_Lo128, "OPW16", "OPERAND_REG_IMM_FP16", 16>;
-
-def VSrc_b32 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_INT32", 32>;
-def VSrc_f32 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_FP32", 32>;
-def VSrc_v2b16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_V2INT16", 32>;
-def VSrc_v2bf16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_V2BF16", 16>;
-def VSrc_v2f16 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_V2FP16", 16>;
-def VSrc_b64 : SrcRegOrImm9 <VS_64, "OPW64", "OPERAND_REG_IMM_INT64", 64>;
-def VSrc_f64 : SrcRegOrImm9 <VS_64, "OPW64", "OPERAND_REG_IMM_FP64", 64> {
+def VSrcFake16_b16_Lo128 : SrcRegOrImm9 <VS_32_Lo128, "OPW16", "OPERAND_REG_IMM_INT16", 16, OperandSematics.INT>;
+def VSrcFake16_bf16_Lo128 : SrcRegOrImm9 <VS_32_Lo128, "OPW16", "OPERAND_REG_IMM_BF16", 16, OperandSematics.BF16>;
+def VSrcFake16_f16_Lo128 : SrcRegOrImm9 <VS_32_Lo128, "OPW16", "OPERAND_REG_IMM_FP16", 16, OperandSematics.FP16>;
+
+def VSrc_b32 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_INT32", 32, OperandSematics.INT>;
+def VSrc_f32 : SrcRegOrImm9 <VS_32, "OPW32", "OPERAND_REG_IMM_FP32", 32, OperandSematics.FP32>;
+def VSrc_v2b16 : SrcRegOrImm9 <VS_32, "OPW32", "OPE...
[truncated]

@rampitec
Copy link
Collaborator Author

There are more operand types to handle this way and all inline constants need tests, but this fixes immediate regression.

Copy link

github-actions bot commented Feb 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

These are NCF, at the moment at least. There are currently no such
instructions. It is here for completeness.
pi/2 seems to be incorrectly rounded in the inst printer,
but this is another patch.
@rampitec
Copy link
Collaborator Author

I believe this patch is ready and functional.

def SSrc_b32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT32", 32>;
def SSrc_f32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_FP32", 32>;
def SSrc_b64 : SrcRegOrImm9 <SReg_64, "OPW64", "OPERAND_REG_IMM_INT64", 64>;
def SSrc_b16 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT16", 16, OperandSematics.INT>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentionally that you don't infer the format from the operand type, OPERAND_REG_IMM_INT16, which seems possible?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. All ints have the same semantics - 2-complement. No need to create more template specializations. The real problem is to distinguish f16 and bf16.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said it may have sense to drop immwidth and the rest, and keep only OperandSemantics, extending it to all scalar types, possibly renaming the whole thing. I didn't explore it, but I guess it may have its pluses and minuses. It is a little iffy when it goes beyond scalar types. This would be really a different change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

llvm/lib/Target/AMDGPU/SIRegisterInfo.td Outdated Show resolved Hide resolved
Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

int FP16 = 1;
int BF16 = 2;
int FP32 = 3;
int FP64 = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe for now it only matters for 16-bit literal but we can keep them just in case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but really semantics of fp32 and fp64 is different. I also think it may allow some further simplifications in the asm parser and decoder.

@rampitec rampitec merged commit 13e6495 into llvm:main Feb 19, 2024
4 checks passed
@rampitec rampitec deleted the bf16-decoder branch February 19, 2024 21:45
v_dot2_bf16_bf16 v2, v0, -4.0, v2
// CHECK: v_dot2_bf16_bf16 v2, v0, -4.0, v2 ; encoding: [0x02,0x00,0x67,0xd6,0x00,0xef,0x09,0x04]

// FIXME: pi/2 rounded value is incorrect in the inst printer.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT this value: 0.15915494 cannot be represented as bf16. The actual hex value is 0x3e22 which is 0.158203125. The next representable value is 0x3e23 which is 0.1591796875.

We can accept non-representable FP32 value of 0.15915494 out of courtesy, but that is a separate patch.

int FP16 = 1;
int BF16 = 2;
int FP32 = 3;
int FP64 = 4;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but really semantics of fp32 and fp64 is different. I also think it may allow some further simplifications in the asm parser and decoder.

def SSrc_b32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT32", 32>;
def SSrc_f32 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_FP32", 32>;
def SSrc_b64 : SrcRegOrImm9 <SReg_64, "OPW64", "OPERAND_REG_IMM_INT64", 64>;
def SSrc_b16 : SrcRegOrImm9 <SReg_32, "OPW32", "OPERAND_REG_IMM_INT16", 16, OperandSematics.INT>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said it may have sense to drop immwidth and the rest, and keep only OperandSemantics, extending it to all scalar types, possibly renaming the whole thing. I didn't explore it, but I guess it may have its pluses and minuses. It is a little iffy when it goes beyond scalar types. This would be really a different change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AMDGPU][Disassembler] getInlineImmVal16 can't tell BF16 and FP16 apart when it comes to inline literal
4 participants