Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Change the representation of double literals in operands #68740

Merged
merged 17 commits into from
Oct 12, 2023

Conversation

rampitec
Copy link
Collaborator

A 64-bit literal can be used as a 32-bit zero or sign extended operand. In case of double zeroes are added to the low 32 bits. Currently asm parser stores only high 32 bits of a double into an operand. To support codegen as requested by the #67781 we need to change the representation to store a full 64-bit value so that codegen can simply add immediates to an instruction.

There is some code to support compatibility with existing tests and asm kernels. We allow to use short hex strings to represent only a high 32 bit of a double value as a valid literal.

A 64-bit literal can be used as a 32-bit zero or sign extended
operand. In case of double zeroes are added to the low 32 bits.
Currently asm parser stores only high 32 bits of a double into
an operand. To support codegen as requested by the
llvm#67781 we need to
change the representation to store a full 64-bit value so that
codegen can simply add immediates to an instruction.

There is some code to support compatibility with existing tests
and asm kernels. We allow to use short hex strings to represent
only a high 32 bit of a double value as a valid literal.
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 10, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)

Changes

A 64-bit literal can be used as a 32-bit zero or sign extended operand. In case of double zeroes are added to the low 32 bits. Currently asm parser stores only high 32 bits of a double into an operand. To support codegen as requested by the #67781 we need to change the representation to store a full 64-bit value so that codegen can simply add immediates to an instruction.

There is some code to support compatibility with existing tests and asm kernels. We allow to use short hex strings to represent only a high 32 bit of a double value as a valid literal.


Full diff: https://github.com/llvm/llvm-project/pull/68740.diff

9 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+18-3)
  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+21-7)
  • (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h (+6-3)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp (+8-4)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp (+3)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+3-1)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+7)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+3)
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 35656bcaea1af7f..0553d3f20b21c56 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -2140,9 +2140,10 @@ void AMDGPUOperand::addLiteralImmOperand(MCInst &Inst, int64_t Val, bool ApplyMo
           const_cast<AMDGPUAsmParser *>(AsmParser)->Warning(Inst.getLoc(),
           "Can't encode literal as exact 64-bit floating-point operand. "
           "Low 32-bits will be set to zero");
+          Val &= 0xffffffff00000000u;
         }
 
-        Inst.addOperand(MCOperand::createImm(Literal.lshr(32).getZExtValue()));
+        Inst.addOperand(MCOperand::createImm(Val));
         setImmKindLiteral();
         return;
       }
@@ -2241,7 +2242,10 @@ void AMDGPUOperand::addLiteralImmOperand(MCInst &Inst, int64_t Val, bool ApplyMo
       return;
     }
 
-    Inst.addOperand(MCOperand::createImm(Lo_32(Val)));
+    if (isInt<32>(Val) || isUInt<32>(Val))
+      Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);
+
+    Inst.addOperand(MCOperand::createImm(Val));
     setImmKindLiteral();
     return;
 
@@ -4297,7 +4301,18 @@ bool AMDGPUAsmParser::validateVOPLiteral(const MCInst &Inst,
       continue;
 
     if (MO.isImm() && !isInlineConstant(Inst, OpIdx)) {
-      uint32_t Value = static_cast<uint32_t>(MO.getImm());
+      uint64_t Value = static_cast<uint64_t>(MO.getImm());
+      bool IsFP = AMDGPU::isSISrcFPOperand(Desc, OpIdx);
+      bool IsValid32Op = AMDGPU::isValid32BitLiteral(Value, IsFP);
+
+      if (!IsValid32Op && !isInt<32>(Value) && !isUInt<32>(Value)) {
+        Error(getLitLoc(Operands), "invalid operand for instruction");
+        return false;
+      }
+
+      if (IsFP && IsValid32Op)
+        Value = Hi_32(Value);
+
       if (NumLiterals == 0 || LiteralValue != Value) {
         LiteralValue = Value;
         ++NumLiterals;
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 439762bc6caf786..8c49c9a9c87772e 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -378,6 +378,15 @@ static DecodeStatus decodeOperand_AVLdSt_Any(MCInst &Inst, unsigned Imm,
   return addOperand(Inst, DAsm->decodeSrcOp(Opw, Imm | 256));
 }
 
+static DecodeStatus
+decodeOperand_VSrc_f64(MCInst &Inst, unsigned Imm, uint64_t Addr,
+                       const MCDisassembler *Decoder) {
+  assert(Imm < (1 << 9) && "9-bit encoding");
+  auto DAsm = static_cast<const AMDGPUDisassembler *>(Decoder);
+  return addOperand(Inst, DAsm->decodeSrcOp(AMDGPUDisassembler::OPW64, Imm,
+                                            false, 64, true));
+}
+
 static DecodeStatus
 DecodeAVLdSt_32RegisterClass(MCInst &Inst, unsigned Imm, uint64_t Addr,
                              const MCDisassembler *Decoder) {
@@ -1218,7 +1227,7 @@ AMDGPUDisassembler::decodeMandatoryLiteralConstant(unsigned Val) const {
   return MCOperand::createImm(Literal);
 }
 
-MCOperand AMDGPUDisassembler::decodeLiteralConstant() const {
+MCOperand AMDGPUDisassembler::decodeLiteralConstant(bool ExtendFP64) const {
   // For now all literal constants are supposed to be unsigned integer
   // ToDo: deal with signed/unsigned 64-bit integer constants
   // ToDo: deal with float/double constants
@@ -1228,9 +1237,11 @@ MCOperand AMDGPUDisassembler::decodeLiteralConstant() const {
                         Twine(Bytes.size()));
     }
     HasLiteral = true;
-    Literal = eatBytes<uint32_t>(Bytes);
+    Literal = Literal64 = eatBytes<uint32_t>(Bytes);
+    if (ExtendFP64)
+      Literal64 <<= 32;
   }
-  return MCOperand::createImm(Literal);
+  return MCOperand::createImm(ExtendFP64 ? Literal64 : Literal);
 }
 
 MCOperand AMDGPUDisassembler::decodeIntImmed(unsigned Imm) {
@@ -1447,7 +1458,8 @@ int AMDGPUDisassembler::getTTmpIdx(unsigned Val) const {
 
 MCOperand AMDGPUDisassembler::decodeSrcOp(const OpWidthTy Width, unsigned Val,
                                           bool MandatoryLiteral,
-                                          unsigned ImmWidth) const {
+                                          unsigned ImmWidth,
+                                          bool IsFP) const {
   using namespace AMDGPU::EncValues;
 
   assert(Val < 1024); // enum10
@@ -1459,13 +1471,15 @@ MCOperand AMDGPUDisassembler::decodeSrcOp(const OpWidthTy Width, unsigned Val,
     return createRegOperand(IsAGPR ? getAgprClassId(Width)
                                    : getVgprClassId(Width), Val - VGPR_MIN);
   }
-  return decodeNonVGPRSrcOp(Width, Val & 0xFF, MandatoryLiteral, ImmWidth);
+  return decodeNonVGPRSrcOp(Width, Val & 0xFF, MandatoryLiteral, ImmWidth,
+                            IsFP);
 }
 
 MCOperand AMDGPUDisassembler::decodeNonVGPRSrcOp(const OpWidthTy Width,
                                                  unsigned Val,
                                                  bool MandatoryLiteral,
-                                                 unsigned ImmWidth) const {
+                                                 unsigned ImmWidth,
+                                                 bool IsFP) const {
   // Cases when Val{8} is 1 (vgpr, agpr or true 16 vgpr) should have been
   // decoded earlier.
   assert(Val < (1 << 8) && "9-bit Src encoding when Val{8} is 0");
@@ -1493,7 +1507,7 @@ MCOperand AMDGPUDisassembler::decodeNonVGPRSrcOp(const OpWidthTy Width,
       // Keep a sentinel value for deferred setting
       return MCOperand::createImm(LITERAL_CONST);
     else
-      return decodeLiteralConstant();
+      return decodeLiteralConstant(IsFP && ImmWidth == 64);
   }
 
   switch (Width) {
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index 5f3b277d577ff7c..865db2b26307b43 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -97,6 +97,7 @@ class AMDGPUDisassembler : public MCDisassembler {
   const unsigned TargetMaxInstBytes;
   mutable ArrayRef<uint8_t> Bytes;
   mutable uint32_t Literal;
+  mutable uint64_t Literal64;
   mutable bool HasLiteral;
   mutable std::optional<bool> EnableWavefrontSize32;
 
@@ -229,15 +230,17 @@ class AMDGPUDisassembler : public MCDisassembler {
   static MCOperand decodeFPImmed(unsigned ImmWidth, unsigned Imm);
 
   MCOperand decodeMandatoryLiteralConstant(unsigned Imm) const;
-  MCOperand decodeLiteralConstant() const;
+  MCOperand decodeLiteralConstant(bool ExtendFP64) const;
 
   MCOperand decodeSrcOp(const OpWidthTy Width, unsigned Val,
                         bool MandatoryLiteral = false,
-                        unsigned ImmWidth = 0) const;
+                        unsigned ImmWidth = 0,
+                        bool IsFP = false) const;
 
   MCOperand decodeNonVGPRSrcOp(const OpWidthTy Width, unsigned Val,
                                bool MandatoryLiteral = false,
-                               unsigned ImmWidth = 0) const;
+                               unsigned ImmWidth = 0,
+                               bool IsFP = false) const;
 
   MCOperand decodeVOPDDstYOp(MCInst &Inst, unsigned Val) const;
   MCOperand decodeSpecialReg32(unsigned Val) const;
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index ad4c48a8d65581a..40e92f00a9e52a6 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -426,7 +426,7 @@ void AMDGPUInstPrinter::printImmediate32(uint32_t Imm,
 
 void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
                                          const MCSubtargetInfo &STI,
-                                         raw_ostream &O) {
+                                         raw_ostream &O, bool IsFP) {
   int64_t SImm = static_cast<int64_t>(Imm);
   if (SImm >= -16 && SImm <= 64) {
     O << SImm;
@@ -454,6 +454,8 @@ void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
   else if (Imm == 0x3fc45f306dc9c882 &&
            STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm))
     O << "0.15915494309189532";
+  else if (IsFP && AMDGPU::isValid32BitLiteral(Imm, true))
+    O << formatHex(static_cast<uint64_t>(Hi_32(Imm)));
   else {
     assert(isUInt<32>(Imm) || isInt<32>(Imm));
 
@@ -605,11 +607,13 @@ void AMDGPUInstPrinter::printRegularOperand(const MCInst *MI, unsigned OpNo,
       printImmediate32(Op.getImm(), STI, O);
       break;
     case AMDGPU::OPERAND_REG_IMM_INT64:
-    case AMDGPU::OPERAND_REG_IMM_FP64:
     case AMDGPU::OPERAND_REG_INLINE_C_INT64:
+      printImmediate64(Op.getImm(), STI, O, false);
+      break;
+    case AMDGPU::OPERAND_REG_IMM_FP64:
     case AMDGPU::OPERAND_REG_INLINE_C_FP64:
     case AMDGPU::OPERAND_REG_INLINE_AC_FP64:
-      printImmediate64(Op.getImm(), STI, O);
+      printImmediate64(Op.getImm(), STI, O, true);
       break;
     case AMDGPU::OPERAND_REG_INLINE_C_INT16:
     case AMDGPU::OPERAND_REG_INLINE_AC_INT16:
@@ -671,7 +675,7 @@ void AMDGPUInstPrinter::printRegularOperand(const MCInst *MI, unsigned OpNo,
       if (RCBits == 32)
         printImmediate32(llvm::bit_cast<uint32_t>((float)Value), STI, O);
       else if (RCBits == 64)
-        printImmediate64(llvm::bit_cast<uint64_t>(Value), STI, O);
+        printImmediate64(llvm::bit_cast<uint64_t>(Value), STI, O, true);
       else
         llvm_unreachable("Invalid register class size");
     }
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 3b14faab136b35a..dc83547a4afe049 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -91,7 +91,7 @@ class AMDGPUInstPrinter : public MCInstPrinter {
   void printImmediate32(uint32_t Imm, const MCSubtargetInfo &STI,
                         raw_ostream &O);
   void printImmediate64(uint64_t Imm, const MCSubtargetInfo &STI,
-                        raw_ostream &O);
+                        raw_ostream &O, bool IsFP);
   void printOperand(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                     raw_ostream &O);
   void printRegularOperand(const MCInst *MI, unsigned OpNo,
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
index 57ccb523c70eee6..d93f747bf6f0a64 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
@@ -411,6 +411,9 @@ void AMDGPUMCCodeEmitter::encodeInstruction(const MCInst &MI,
     } else if (!Op.isExpr()) // Exprs will be replaced with a fixup value.
       llvm_unreachable("Must be immediate or expr");
 
+    if (Desc.operands()[i].OperandType == AMDGPU::OPERAND_REG_IMM_FP64)
+      Imm = Hi_32(Imm);
+
     support::endian::write<uint32_t>(CB, Imm, support::endianness::little);
 
     // Only one literal value allowed
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index c3c5bfae405aa45..ea06e85fb400c1b 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -1263,7 +1263,9 @@ def VSrc_f32 : RegOrF32 <"VS_32", "OPERAND_REG_IMM">;
 def VSrc_v2b16 : RegOrV2B16 <"VS_32", "OPERAND_REG_IMM">;
 def VSrc_v2f16 : RegOrV2F16 <"VS_32", "OPERAND_REG_IMM">;
 def VSrc_b64 : RegOrB64 <"VS_64", "OPERAND_REG_IMM">;
-def VSrc_f64 : RegOrF64 <"VS_64", "OPERAND_REG_IMM">;
+def VSrc_f64 : RegOrF64 <"VS_64", "OPERAND_REG_IMM"> {
+  let DecoderMethod = "decodeOperand_VSrc_f64";
+}
 def VSrc_v2b32 : RegOrV2B32 <"VS_64", "OPERAND_REG_IMM">;
 def VSrc_v2f32 : RegOrV2F32 <"VS_64", "OPERAND_REG_IMM">;
 
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 6d0ad763d9e6cc1..e7907b28abedf9d 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -2519,6 +2519,13 @@ bool isFoldableLiteralV216(int32_t Literal, bool HasInv2Pi) {
   return Lo16 == Hi16;
 }
 
+bool isValid32BitLiteral(uint64_t Val, bool IsFP) {
+  if (IsFP)
+    return !(Val & 0xffffffffu);
+
+  return isUInt<32>(Val) || isInt<32>(Val);
+}
+
 bool isArgPassedInSGPR(const Argument *A) {
   const Function *F = A->getParent();
 
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 297a69f54d63721..fbe9adfd74fa9c6 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -1290,6 +1290,9 @@ bool isInlinableIntLiteralV216(int32_t Literal);
 LLVM_READNONE
 bool isFoldableLiteralV216(int32_t Literal, bool HasInv2Pi);
 
+LLVM_READNONE
+bool isValid32BitLiteral(uint64_t Val, bool IsFP);
+
 bool isArgPassedInSGPR(const Argument *Arg);
 
 bool isArgPassedInSGPR(const CallBase *CB, unsigned ArgNo);

@github-actions
Copy link

github-actions bot commented Oct 10, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 2245 to 2246
if (isInt<32>(Val) || isUInt<32>(Val))
Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fully make sense to me. Ignoring the FP case, it simplifies to:

  if (isInt<32>(Val) || isUInt<32>(Val))
    Val = Lo_32(Val);

Why would we only do the Lo_32 if Val is already a (signed or unsigned) 32 bit value???

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To mask potential 0xffffffff in the high32 bits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't there three interesting cases?

  if (!AMDGPU::isSISrcFPOperand(InstDesc, OpNum)) // integer operand
    Val = Lo_32(Val);
  else if (isInt<32>(Val) || isUInt<32>(Val))) // small hex literal for double operand
    Val = Val << 32;
  else // large hex literal for double operand
    do nothing;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lo_32 for integers is used because inst printer would just use O << formatHex(static_cast<uint64_t>(Imm)); and since we have full 64-bits there the output will look like this if the value is unmasked:

llvm-mc -arch=amdgcn -mcpu=tonga -show-encoding  <<< 's_mov_b64_e32 s[0:1], -54321'
        s_mov_b64 s[0:1], 0xffffffffffff2bcf    ; encoding: [0xff,0x01,0x80,0xbe,0xcf,0x2b,0xff,0xff]

This is technically correct but inconvenient and not what we are expecting in our tests too:

// SICI: s_mov_b64 s[0:1], 0xffff2bcf ; encoding: [0xff,0x04,0x80,0xbe,0xcf,0x2b,0xff,0xff]
// GFX89: s_mov_b64 s[0:1], 0xffff2bcf ; encoding: [0xff,0x01,0x80,0xbe,0xcf,0x2b,0xff,0xff]
s_mov_b64_e32 s[0:1], -54321

For the large FP literals we do the validation:

llvm-mc -arch=amdgcn -mcpu=gfx1030 -show-encoding  <<< 'v_add_f64 v[0:1], 100.1, v[0:1]'
<unknown>:0: warning: Can't encode literal as exact 64-bit floating-point operand. Low 32-bits will be set to zero
        v_add_f64 v[0:1], 0x40590666, v[0:1]    ; encoding: [0x00,0x00,0x64,0xd5,0xff,0x00,0x02,0x00,0x66,0x06,0x59,0x40]

But we need to truncate the the value as in the warning. If we don't there will be an error following the warning and this would change current behavior:

<unknown>:0: warning: Can't encode literal as exact 64-bit floating-point operand. Low 32-bits will be set to zero
<stdin>:1:19: error: invalid operand for instruction
v_add_f64 v[0:1], 100.1, v[0:1]

This is for the FP literal. For the hex string the situation just does not happen and we are getting the error instead because the constant does not pass isLiteralImm() check:

llvm-mc -arch=amdgcn -mcpu=gfx1030 -show-encoding  <<< 'v_add_f64 v[0:1], 0xa12345678, v[0:1]'
<stdin>:1:19: error: invalid operand for instruction
v_add_f64 v[0:1], 0xa12345678, v[0:1]
                  ^

Note that I have no intent to introduce any visible changes, just to change the internal operand representation and keep all existing tests passing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find the Int/UInt check confusing here. How about:

Suggested change
if (isInt<32>(Val) || isUInt<32>(Val))
Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);
if (AMDGPU::isSISrcFPOperand(InstDesc, OpNum))
Val <<= 32;
else
Val = Lo_32(Val);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change

I literally hate gitbub. I cannot even get a clue what file it is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find the Int/UInt check confusing here. How about:

Done.

@@ -4297,7 +4301,18 @@ bool AMDGPUAsmParser::validateVOPLiteral(const MCInst &Inst,
continue;

if (MO.isImm() && !isInlineConstant(Inst, OpIdx)) {
uint32_t Value = static_cast<uint32_t>(MO.getImm());
uint64_t Value = static_cast<uint64_t>(MO.getImm());
bool IsFP = AMDGPU::isSISrcFPOperand(Desc, OpIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this either. Are you trying to allow cases where the same 32-bit literal is used both as the high 32 bits of a double, and as a normal 32 bit integer, in the same instruction?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is allowed. Encoded literal is 32-bit and can be used as both integer and double. For example in the gfx11_asm_vop3.s:

v_trig_preop_f64 v[254:255], 0xaf123456, 0xaf123456 clamp div:2
// GFX11: encoding: [0xfe,0x80,0x2f,0xd7,0xff,0xfe,0x01,0x18,0x56,0x34,0x12,0xaf]

Without this piece of code it would give error:

llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding  <<< 'v_trig_preop_f64 v[254:255], 0xaf123456, 0xaf123456 clamp div:2'
<stdin>:1:42: error: only one unique literal operand is allowed
v_trig_preop_f64 v[254:255], 0xaf123456, 0xaf123456 clamp div:2
                                         ^

@jayfoad
Copy link
Contributor

jayfoad commented Oct 11, 2023

we need to change the representation to store a full 64-bit value

I think this is clearly a Good Thing. There are just some details of the implementation that I don't understand.

A 64-bit literal can be used as a 32-bit zero or sign extended
operand. In case of double zeroes are added to the low 32 bits.
Currently asm parser stores only high 32 bits of a double into
an operand. To support codegen as requested by the
llvm#67781 we need to
change the representation to store a full 64-bit value so that
codegen can simply add immediates to an instruction.

There is some code to support compatibility with existing tests
and asm kernels. We allow to use short hex strings to represent
only a high 32 bit of a double value as a valid literal.
@rampitec rampitec requested review from JDevlieghere, nikic and a team as code owners October 12, 2023 07:51
@rampitec
Copy link
Collaborator Author

Sorry for the noise, rebase and merge seems to add everyone committed in between to the reviewers.

@nikic nikic removed request for a team and nikic October 12, 2023 07:58
@rampitec rampitec removed the request for review from JDevlieghere October 12, 2023 08:00
@@ -454,6 +454,8 @@ void AMDGPUInstPrinter::printImmediate64(uint64_t Imm,
else if (Imm == 0x3fc45f306dc9c882 &&
STI.hasFeature(AMDGPU::FeatureInv2PiInlineImm))
O << "0.15915494309189532";
else if (IsFP && AMDGPU::isValid32BitLiteral(Imm, true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be:

Suggested change
else if (IsFP && AMDGPU::isValid32BitLiteral(Imm, true))
else if (IsFP) {
assert(AMDGPU::isValid32BitLiteral(Imm, true));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can, given that lo32 are cleared by the parser. Changed.

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo one remaining comment about validateVOPLiteral.

A 64-bit literal can be used as a 32-bit zero or sign extended
operand. In case of double zeroes are added to the low 32 bits.
Currently asm parser stores only high 32 bits of a double into
an operand. To support codegen as requested by the
llvm#67781 we need to
change the representation to store a full 64-bit value so that
codegen can simply add immediates to an instruction.

There is some code to support compatibility with existing tests
and asm kernels. We allow to use short hex strings to represent
only a high 32 bit of a double value as a valid literal.
@rampitec
Copy link
Collaborator Author

LGTM modulo one remaining comment about validateVOPLiteral.

Done.

@rampitec rampitec merged commit ab6c3d5 into llvm:main Oct 12, 2023
3 checks passed
@rampitec rampitec deleted the lit64-mc-enc branch October 12, 2023 21:47
@mikaelholmen
Copy link
Collaborator

Hi @rampitec

With UBSan built binaries the
MC/AMDGPU/literals.s
testcase fails and triggers UB like

07:33:04 ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:2246:59: runtime error: left shift of negative value -54321
07:33:04 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:2246:59 

It's this new shift that it complains on:

    Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);

@rampitec
Copy link
Collaborator Author

Hi @rampitec

With UBSan built binaries the MC/AMDGPU/literals.s testcase fails and triggers UB like

07:33:04 ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:2246:59: runtime error: left shift of negative value -54321
07:33:04 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:2246:59 

It's this new shift that it complains on:

    Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);

I am happy to fix it, but I do not understand how a LEFT shift can be a UB? Low bits will be zero, right? High bits will be disposed. I really do not understand.

@rampitec
Copy link
Collaborator Author

Hi @rampitec
With UBSan built binaries the MC/AMDGPU/literals.s testcase fails and triggers UB like

07:33:04 ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:2246:59: runtime error: left shift of negative value -54321
07:33:04 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:2246:59 

It's this new shift that it complains on:

    Val = AMDGPU::isSISrcFPOperand(InstDesc, OpNum) ? Val << 32 : Lo_32(Val);

I am happy to fix it, but I do not understand how a LEFT shift can be a UB? Low bits will be zero, right? High bits will be disposed. I really do not understand.

Building ubsan now...

@rampitec
Copy link
Collaborator Author

I have a small little problem that I cannot build tip now:

FAILED: tools/llvm-remarkutil/CMakeFiles/llvm-remarkutil.dir/RemarkCounter.cpp.o
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/stas/work/llvm/ubsan/tools/llvm-remarkutil -I/home/stas/work/llvm/llvm/tools/llvm-remarkutil -I/home/stas/work/llvm/ubsan/include -I/home/stas/work/llvm/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -g  -fno-exceptions -funwind-tables -fno-rtti -std=c++17 -MD -MT tools/llvm-remarkutil/CMakeFiles/llvm-remarkutil.dir/RemarkCounter.cpp.o -MF tools/llvm-remarkutil/CMakeFiles/llvm-remarkutil.dir/RemarkCounter.cpp.o.d -o tools/llvm-remarkutil/CMakeFiles/llvm-remarkutil.dir/RemarkCounter.cpp.o -c /home/stas/work/llvm/llvm/tools/llvm-remarkutil/RemarkCounter.cpp
In file included from /home/stas/work/llvm/llvm/tools/llvm-remarkutil/RemarkCounter.cpp:13:
/home/stas/work/llvm/llvm/tools/llvm-remarkutil/RemarkCounter.h:90:14: error: call to deleted constructor of 'llvm::Error'
      return E;
             ^

@mikaelholmen
Copy link
Collaborator

I suppose left shift of negative values is undefined because if you shift out the sign bit you can overflow and get a positive value.

@rampitec
Copy link
Collaborator Author

I suppose left shift of negative values is undefined because if you shift out the sign bit you can overflow and get a positive value.

Sounds like BS. It is defined. Unexpected maybe.

@rampitec
Copy link
Collaborator Author

I suppose left shift of negative values is undefined because if you shift out the sign bit you can overflow and get a positive value.

#68959

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants