[LoopIdiomRecognizer] Implement CRC recognition #79295

joe-img · 2024-01-24T14:28:04Z

Recognizes CRC byte loops and replaces them with a table lookup.

Current limitations:

Only works on byte loops
CRC size can be any, but the data is limited to one byte. i.e. a loop with iteration count 8.
Only works on single-block loops
most CRC loops would have been flattened to one block with select instructions this far into the pipeline.

Both limitations were in effort to reduce complexity, especially for a first patch. The code can be fairly easily extended to overcome these limitations.

Implementation details:

Check if the loop looks like CRC and extract some useful information
Execute one iteration of the instruction of the loop to see what happens to our output value
Ensure the output value is predicated on the value of the M/LSB of our input
Construct an expected output value of one iteration of CRC using the extracted information from step one and compare
Construct a lookup table and replace the output value with a lookup

github-actions · 2024-01-24T14:28:21Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-01-24T14:28:50Z

@llvm/pr-subscribers-llvm-transforms

Author: Joe Faulls (joe-img)

Changes

Recognizes CRC byte loops and replaces them with a table lookup.

Current limitations:

Only works on byte loops
CRC size can be any, but the data is limited to one byte. i.e. a loop with iteration count 8.
Only works on single-block loops
most CRC loops would have been flattened to one block with select instructions this far into the pipeline.

Both limitations were in effort to reduce complexity, especially for a first patch. The code can be fairly easily extended to overcome these limitations.

Implementation details:

Check if the loop looks like CRC and extract some useful information
Execute one iteration of the instruction of the loop to see what happens to our output value
Ensure the output value is predicated on the value of the M/LSB of our input
Construct an expected output value of one iteration of CRC using the extracted information from step one and compare
Construct a lookup table and replace the output value with a lookup

Patch is 46.11 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79295.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (+910-9)
(added) llvm/test/Transforms/LoopIdiom/crc/crc.ll (+195)
(added) llvm/test/Transforms/LoopIdiom/crc/not-crc.ll (+113)

diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 3721564890ddb4..f20947daaed8d5 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -90,6 +90,8 @@
 #include <algorithm>
 #include <cassert>
 #include <cstdint>
+#include <map>
+#include <sstream>
 #include <utility>
 #include <vector>
 
@@ -135,6 +137,9 @@ static cl::opt<bool> UseLIRCodeSizeHeurs(
              "with -Os/-Oz"),
     cl::init(true), cl::Hidden);
 
+static cl::opt<bool> CRCRecognize("recognize-crc", cl::desc("CRC RECOGNIZE"),
+                                  cl::init(false), cl::Hidden);
+
 namespace {
 
 class LoopIdiomRecognize {
@@ -186,6 +191,15 @@ class LoopIdiomRecognize {
             // handling.
   };
 
+  struct CRCInfo {
+    Value *CRCInput;
+    Value *CRCOutput;
+    Value *DataInput;
+    uint64_t Width;
+    uint64_t Polynomial;
+    bool BitReversed;
+  };
+
   /// \name Countable Loop Idiom Handling
   /// @{
 
@@ -242,6 +256,9 @@ class LoopIdiomRecognize {
 
   bool recognizeShiftUntilBitTest();
   bool recognizeShiftUntilZero();
+  std::optional<CRCInfo> looksLikeCRC(const SCEV *BECount);
+  bool recognizeCRC(const SCEV *BECount);
+  void writeTableBasedCRCOneByte(CRCInfo &CRC);
 
   /// @}
 };
@@ -298,13 +315,8 @@ bool LoopIdiomRecognize::runOnLoop(Loop *L) {
   ApplyCodeSizeHeuristics =
       L->getHeader()->getParent()->hasOptSize() && UseLIRCodeSizeHeurs;
 
-  HasMemset = TLI->has(LibFunc_memset);
-  HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
-  HasMemcpy = TLI->has(LibFunc_memcpy);
-
-  if (HasMemset || HasMemsetPattern || HasMemcpy)
-    if (SE->hasLoopInvariantBackedgeTakenCount(L))
-      return runOnCountableLoop();
+  if (SE->hasLoopInvariantBackedgeTakenCount(L))
+    return runOnCountableLoop();
 
   return runOnNoncountableLoop();
 }
@@ -329,6 +341,17 @@ bool LoopIdiomRecognize::runOnCountableLoop() {
                     << "] Countable Loop %" << CurLoop->getHeader()->getName()
                     << "\n");
 
+  bool MadeChange = false;
+  if (CRCRecognize)
+    MadeChange |= recognizeCRC(BECount);
+
+  HasMemset = TLI->has(LibFunc_memset);
+  HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
+  HasMemcpy = TLI->has(LibFunc_memcpy);
+
+  if (!(HasMemset || HasMemsetPattern || HasMemcpy))
+    return MadeChange;
+
   // The following transforms hoist stores/memsets into the loop pre-header.
   // Give up if the loop has instructions that may throw.
   SimpleLoopSafetyInfo SafetyInfo;
@@ -336,8 +359,6 @@ bool LoopIdiomRecognize::runOnCountableLoop() {
   if (SafetyInfo.anyBlockMayThrow())
     return false;
 
-  bool MadeChange = false;
-
   // Scan all the blocks in the loop that are not in subloops.
   for (auto *BB : CurLoop->getBlocks()) {
     // Ignore blocks in subloops.
@@ -2868,3 +2889,883 @@ bool LoopIdiomRecognize::recognizeShiftUntilZero() {
   ++NumShiftUntilZero;
   return MadeChange;
 }
+
+static uint64_t reverseBits(uint64_t Num, unsigned NumBits) {
+  uint64_t Reversed = 0;
+  for (unsigned i = 1; i <= NumBits; i++) {
+    Reversed |= (Num & 1) << (NumBits - i);
+    Num >>= 1;
+  }
+  return Reversed;
+}
+
+class ValueBits {
+  // This is a representation of a value's bits in terms of references to
+  // other values' bits, or 1/0 if the bit is known. This allows symbolic
+  // execution of bitwise instructions without knowing the exact values.
+  //
+  // Example:
+  //
+  // LLVM IR Value i8 %x:
+  // [%x[7], %x[6], %x[5], %x[4], %x[3], %x[2], %x[1], %x[0]]
+  //
+  // %shr = lshr i8 %x, 2
+  // [ 0, 0, %x[7], %x[6], %x[5], %x[4], %x[3], %x[2]]
+  //
+  // %shl = shl i8 %shr, 1
+  // [ 0, %x[7], %x[6], %x[5], %x[4], %x[3], %x[2], 0]
+  //
+  // %xor = xor i8 %shl, 0xb
+  // [ 0, %x[7], %x[6], %x[5], %x[4]^1, %x[3], %x[2]^1, 1]
+public:
+  class ValueBit {
+  public:
+    enum BitType { ONE, ZERO, REF, XOR };
+
+  private:
+    BitType _Type;
+    std::pair<Value *, uint64_t> _BitRef;
+    ValueBit *_LHS;
+    ValueBit *_RHS;
+
+    ValueBit(BitType Type) : _Type(Type) {}
+    ValueBit(BitType Type, std::pair<Value *, uint64_t> BitRef)
+        : _Type(Type), _BitRef(BitRef) {}
+    ValueBit(BitType Type, ValueBit *LHS, ValueBit *RHS)
+        : _Type(Type), _LHS(LHS), _RHS(RHS) {}
+
+  public:
+    static ValueBit *CreateOneBit() { return new ValueBit(BitType::ONE); }
+    static ValueBit *CreateZeroBit() { return new ValueBit(BitType::ZERO); }
+    static ValueBit *CreateRefBit(Value *Ref, uint64_t Offset) {
+      return new ValueBit(BitType::REF, std::make_pair(Ref, Offset));
+    }
+    static ValueBit *CreateXORBit(ValueBit *LHS, ValueBit *RHS) {
+      return new ValueBit(BitType::XOR, LHS, RHS);
+    }
+    inline BitType getType() { return _Type; }
+    bool equals(ValueBit *RHS) {
+      if (_Type != RHS->getType())
+        return false;
+      switch (_Type) {
+      case BitType::ONE:
+      case BitType::ZERO:
+        return true;
+      case BitType::REF:
+        return _BitRef == RHS->_BitRef;
+      case BitType::XOR:
+        return (_LHS->equals(RHS->_LHS) && _RHS->equals(RHS->_RHS)) ||
+               (_LHS->equals(RHS->_RHS) && _RHS->equals(RHS->_LHS));
+      }
+      return false;
+    }
+
+    void print(raw_ostream &OS) {
+      switch (_Type) {
+      case BitType::ONE:
+        OS << "1";
+        break;
+      case BitType::ZERO:
+        OS << "0";
+        break;
+      case BitType::REF:
+        OS << _BitRef.first->getNameOrAsOperand() << "[" << _BitRef.second
+           << "]";
+        break;
+      case BitType::XOR:
+        _LHS->print(OS);
+        OS << "^";
+        _RHS->print(OS);
+        break;
+      }
+    }
+  };
+
+private:
+  uint64_t Size;
+  std::vector<ValueBit *> Bits;
+
+  virtual void _Shl(uint64_t N) {
+    for (; N > 0; N--) {
+      Bits.insert(Bits.begin(), ValueBit::CreateZeroBit());
+      Bits.pop_back();
+    }
+  }
+  virtual void _LShr(uint64_t N) {
+    for (; N > 0; N--) {
+      Bits.insert(Bits.end(), ValueBit::CreateZeroBit());
+      Bits.erase(Bits.begin());
+    }
+  }
+  virtual void _Xor(ValueBits *RHS) {
+    assert(Size == RHS->getSize());
+    for (unsigned I = 0; I < Size; I++) {
+      auto It = Bits.begin() + I;
+      ValueBit *RHSBit = RHS->getBit(I);
+      if (RHSBit->getType() == ValueBit::BitType::ONE) {
+        Bits.erase(It);
+        if ((*It)->getType() == ValueBit::BitType::ZERO) {
+          Bits.insert(It, ValueBit::CreateOneBit());
+        } else if ((*It)->getType() == ValueBit::BitType::ONE) {
+          Bits.insert(It, ValueBit::CreateZeroBit());
+        } else {
+          ValueBit *One = ValueBit::CreateOneBit();
+          Bits.insert(It, ValueBit::CreateXORBit(*It, One));
+        }
+      } else if (RHSBit->getType() != ValueBit::BitType::ZERO) {
+        if ((*It)->getType() == ValueBit::BitType::ZERO) {
+          Bits.erase(It);
+          ValueBit *BitRef = new ValueBit(*RHSBit);
+          Bits.insert(It, BitRef);
+        } else {
+          ValueBit *ItVB = *It;
+          Bits.erase(It);
+          Bits.insert(It, ValueBit::CreateXORBit(ItVB, RHSBit));
+        }
+      }
+    }
+  }
+  virtual void _ZExt(uint64_t ToSize) {
+    assert(ToSize > Size);
+    for (uint64_t I = 0; I < ToSize - Size; I++)
+      Bits.push_back(ValueBit::CreateZeroBit());
+    Size = ToSize;
+  }
+  virtual void _Trunc(uint64_t ToSize) {
+    assert(ToSize < Size);
+    Bits.erase(Bits.begin() + ToSize, Bits.end());
+    Size = ToSize;
+  }
+  virtual void _And(uint64_t RHS) {
+    for (unsigned I = 0; I < Size; I++) {
+      if (!(RHS & 1)) {
+        auto It = Bits.begin() + I;
+        Bits.erase(It);
+        Bits.insert(It, ValueBit::CreateZeroBit());
+      }
+      RHS >>= 1;
+    }
+  }
+
+protected:
+  ValueBits() {}
+
+public:
+  ValueBits(Value *InitialVal, uint64_t BitLength) : Size(BitLength) {
+    for (unsigned i = 0; i < BitLength; i++)
+      Bits.push_back(ValueBit::CreateRefBit(InitialVal, i));
+  }
+  ValueBits(uint64_t InitialVal, uint64_t BitLength) : Size(BitLength) {
+    for (unsigned i = 0; i < BitLength; i++) {
+      if (InitialVal & 0x1)
+        Bits.push_back(ValueBit::CreateOneBit());
+      else
+        Bits.push_back(ValueBit::CreateZeroBit());
+      InitialVal >>= 1;
+    }
+  }
+  uint64_t getSize() { return Size; }
+  ValueBit *getBit(unsigned i) { return Bits[i]; }
+
+  virtual ValueBits *copyBits() { return new ValueBits(*this); }
+
+  static ValueBits *Shl(ValueBits *LHS, uint64_t N) {
+    ValueBits *Shifted = LHS->copyBits();
+    Shifted->_Shl(N);
+    return Shifted;
+  }
+  static ValueBits *LShr(ValueBits *LHS, uint64_t N) {
+    ValueBits *Shifted = LHS->copyBits();
+    Shifted->_LShr(N);
+    return Shifted;
+  }
+  static ValueBits *Xor(ValueBits *LHS, ValueBits *RHS) {
+    ValueBits *Xord = LHS->copyBits();
+    Xord->_Xor(RHS);
+    return Xord;
+  }
+  static ValueBits *ZExt(ValueBits *LHS, uint64_t ToSize) {
+    ValueBits *Zexted = LHS->copyBits();
+    Zexted->_ZExt(ToSize);
+    return Zexted;
+  }
+  static ValueBits *Trunc(ValueBits *LHS, uint64_t N) {
+    ValueBits *Trunced = LHS->copyBits();
+    Trunced->_Trunc(N);
+    return Trunced;
+  }
+  static ValueBits *And(ValueBits *LHS, uint64_t RHS) {
+    ValueBits *Anded = LHS->copyBits();
+    Anded->_And(RHS);
+    return Anded;
+  }
+
+  virtual bool isPredicated() { return false; }
+
+  virtual bool equals(ValueBits *RHS) {
+    if (Size != RHS->getSize())
+      return false;
+
+    for (unsigned I = 0; I < Size; I++)
+      if (!getBit(I)->equals(RHS->getBit(I)))
+        return false;
+
+    return true;
+  }
+
+  virtual void print(raw_ostream &OS) {
+    assert(Size != 0);
+    OS << "[";
+    Bits[Size - 1]->print(OS);
+    for (int i = Size - 2; i >= 0; i--) {
+      OS << " | ";
+      Bits[i]->print(OS);
+    }
+    OS << "]\n";
+  }
+};
+
+inline raw_ostream &operator<<(raw_ostream &OS, ValueBits &VBS) {
+  VBS.print(OS);
+  return OS;
+}
+
+inline raw_ostream &operator<<(raw_ostream &OS, ValueBits::ValueBit &VB) {
+  VB.print(OS);
+  return OS;
+}
+class PredicatedValueBits : public ValueBits {
+  // This would be representitive of select or phi instructions where the bits
+  // would depend on an icmp.
+private:
+  ICmpInst *_Predicate;
+  ValueBits *_IfTrue;
+  ValueBits *_IfFalse;
+
+  void _Shl(uint64_t N) override {
+    _IfTrue = ValueBits::Shl(_IfTrue, N);
+    _IfFalse = ValueBits::Shl(_IfFalse, N);
+  }
+  void _LShr(uint64_t N) override {
+    _IfTrue = ValueBits::LShr(_IfTrue, N);
+    _IfFalse = ValueBits::LShr(_IfFalse, N);
+  }
+  void _ZExt(uint64_t N) override {
+    _IfTrue = ValueBits::ZExt(_IfTrue, N);
+    _IfFalse = ValueBits::ZExt(_IfFalse, N);
+  }
+  void _And(uint64_t N) override {
+    _IfTrue = ValueBits::And(_IfTrue, N);
+    _IfFalse = ValueBits::And(_IfFalse, N);
+  }
+  void _Xor(ValueBits *RHS) override {
+    _IfTrue = ValueBits::Xor(_IfTrue, RHS);
+    _IfFalse = ValueBits::Xor(_IfFalse, RHS);
+  }
+  void _Trunc(uint64_t N) override {
+    _IfTrue = ValueBits::Trunc(_IfTrue, N);
+    _IfFalse = ValueBits::Trunc(_IfFalse, N);
+  }
+
+public:
+  PredicatedValueBits(ICmpInst *Predicate, ValueBits *IfTrue,
+                      ValueBits *IfFalse)
+      : _Predicate(Predicate), _IfTrue(IfTrue), _IfFalse(IfFalse) {}
+
+  ValueBits *copyBits() override { return new PredicatedValueBits(*this); }
+  bool isPredicated() override { return true; }
+  ValueBits *getIfTrue() { return _IfTrue; }
+  ValueBits *getIfFalse() { return _IfFalse; }
+  ICmpInst *getPredicate() { return _Predicate; }
+
+  virtual void print(raw_ostream &OS) override {
+    OS << "Predicate: " << *_Predicate << "\nIf True:\n"
+       << *_IfTrue << "If False:\n"
+       << *_IfFalse;
+  }
+};
+
+// Execute the instructions in a basic block whilst mapping out Values to
+// ValueBits
+static bool symbolicallyExecute(BasicBlock *BB,
+                                std::map<Value *, ValueBits *> &ValueMap) {
+
+  auto getConstantOperand = [](Instruction *I, uint8_t Operand) {
+    ConstantInt *CI = dyn_cast<ConstantInt>(I->getOperand(Operand));
+    if (!CI) {
+      LLVM_DEBUG(dbgs() << DEBUG_TYPE " CRCRegonize: Do not know how to"
+                        << " handle this operation with non-constant operand "
+                        << Operand << ":\n"
+                        << *I << "\n");
+    }
+    return CI;
+  };
+
+  auto getOrCreateValueBits = [&ValueMap](Value *Val) {
+    auto Result = ValueMap.find(Val);
+    ValueBits *LHSBits = nullptr;
+    if (Result == ValueMap.end()) {
+      ConstantInt *CI = dyn_cast<ConstantInt>(Val);
+      if (CI) {
+        LHSBits = new ValueBits(CI->getSExtValue(),
+                                Val->getType()->getScalarSizeInBits());
+      } else {
+        LHSBits = new ValueBits(Val, Val->getType()->getScalarSizeInBits());
+      }
+    } else
+      LHSBits = Result->second;
+    return LHSBits;
+  };
+
+  for (Instruction &I : *BB) {
+    uint64_t BitSize = I.getType()->getScalarSizeInBits();
+    switch (I.getOpcode()) {
+    case Instruction::PHI: {
+      PHINode *PHI = dyn_cast<PHINode>(&I);
+      const BasicBlock *IncomingBlock = nullptr;
+      for (const BasicBlock *Incoming : PHI->blocks()) {
+        if (Incoming != BB) {
+          if (IncomingBlock) {
+            LLVM_DEBUG(dbgs()
+                       << DEBUG_TYPE " CRCRegonize: Do not know how to"
+                       << " handle loop with multiple entries" << I << "\n");
+            return false;
+          }
+          IncomingBlock = Incoming;
+        }
+      }
+      assert(IncomingBlock);
+      ValueMap[&I] =
+          getOrCreateValueBits(PHI->getIncomingValueForBlock(IncomingBlock));
+    } break;
+    case Instruction::Shl: {
+      ConstantInt *CI = getConstantOperand(&I, 1);
+      if (!CI)
+        return false;
+      Value *LHSVal = I.getOperand(0);
+      ValueBits *LHSBits = getOrCreateValueBits(LHSVal);
+      ValueMap[&I] = ValueBits::Shl(LHSBits, CI->getSExtValue());
+    } break;
+    case Instruction::LShr: {
+      ConstantInt *CI = getConstantOperand(&I, 1);
+      if (!CI)
+        return false;
+      Value *LHSVal = I.getOperand(0);
+      ValueBits *LHSBits = getOrCreateValueBits(LHSVal);
+      ValueMap[&I] = ValueBits::LShr(LHSBits, CI->getSExtValue());
+    } break;
+    case Instruction::And: {
+      ConstantInt *CI = getConstantOperand(&I, 1);
+      if (!CI)
+        return false;
+      Value *LHSVal = I.getOperand(0);
+      ValueBits *LHSBits = getOrCreateValueBits(LHSVal);
+      ValueMap[&I] = ValueBits::And(LHSBits, CI->getSExtValue());
+    } break;
+    case Instruction::Xor: {
+      ValueBits *LHSBits = getOrCreateValueBits(I.getOperand(0));
+      ValueBits *RHSBits = getOrCreateValueBits(I.getOperand(1));
+      ValueMap[&I] = ValueBits::Xor(LHSBits, RHSBits);
+    } break;
+    case Instruction::ZExt: {
+      ValueBits *LHSBits = getOrCreateValueBits(I.getOperand(0));
+      ValueMap[&I] = ValueBits::ZExt(LHSBits, BitSize);
+    } break;
+    case Instruction::Trunc: {
+      ValueBits *LHSBits = getOrCreateValueBits(I.getOperand(0));
+      ValueMap[&I] = ValueBits::Trunc(LHSBits, BitSize);
+    } break;
+    case Instruction::Select: {
+      SelectInst *Select = cast<SelectInst>(&I);
+      ICmpInst *Cond = dyn_cast<ICmpInst>(Select->getCondition());
+      if (!Cond) {
+        LLVM_DEBUG(dbgs() << DEBUG_TYPE " CRCRegonize: Do not know how to"
+                          << " handle SelectInst with non-icmp condition: " << I
+                          << "\n");
+        return false;
+      }
+      ValueBits *IfTrue = getOrCreateValueBits(Select->getTrueValue());
+      ValueBits *IfFalse = getOrCreateValueBits(Select->getFalseValue());
+      ValueMap[&I] = new PredicatedValueBits(Cond, IfTrue, IfFalse);
+    } break;
+    default:
+      // If this instruction is not recognized, then just continue. This is
+      // okay because users of this will just reference it by value, which is
+      // conservative.
+      break;
+    }
+  }
+  return true;
+}
+
+void LoopIdiomRecognize::writeTableBasedCRCOneByte(CRCInfo &CRC) {
+  BasicBlock *ExitBB = CurLoop->getExitBlock();
+  IRBuilder<> Builder(ExitBB);
+  Builder.SetInsertPoint(ExitBB->getFirstNonPHI());
+  Type *CRCType = CRC.CRCInput->getType();
+  uint64_t CRCSize = CRCType->getScalarSizeInBits();
+
+  // Construct the CRC table
+  uint64_t CRCTable[256];
+  uint64_t Polynomial = CRC.Polynomial;
+  uint64_t SB = CRC.BitReversed ? 0x1 : (0x1 << (CRCSize - 1));
+  if (CRC.BitReversed)
+    Polynomial = reverseBits(Polynomial, CRCSize);
+  for (uint64_t Dividend = 0; Dividend < 256; Dividend++) {
+    uint64_t CurByte = Dividend;
+    if (!CRC.BitReversed)
+      CurByte <<= CRCSize - 8;
+    for (uint8_t Bit = 0; Bit < 8; Bit++) {
+      if ((CurByte & SB) != 0) {
+        CurByte = CRC.BitReversed ? CurByte >> 1 : CurByte << 1;
+        CurByte = CurByte ^ Polynomial;
+      } else {
+        CurByte = CRC.BitReversed ? CurByte >> 1 : CurByte << 1;
+      }
+    }
+    CRCTable[Dividend] = CurByte;
+  }
+  // To construct a global data array, we need the raw data in bytes.
+  // The calculated table array is an array of 64bit values because we can't
+  // dynamically type it, so we need to truncate the values to the crc size
+  // to avoid padded zeros. Do this by allocating a byte array (of slightly more
+  // than we need to account for overflow) and copying the 64bit values across
+  // aligned correctly
+  uint64_t CRCNumBytes = CRCSize / 8;
+  char *CRCTableData = (char *)malloc(CRCNumBytes * 260);
+  for (int I = 0; I < 256; I++) {
+    *((uint64_t *)(CRCTableData + I * CRCNumBytes)) = CRCTable[I];
+  }
+
+  // Construct and add the table as a global variable
+  ArrayType *TableType = ArrayType::get(CRCType, 256);
+  Constant *ConstantArr = ConstantDataArray::getRaw(
+      StringRef(CRCTableData, CRCNumBytes * 256), 256, CRCType);
+  std::stringstream TableNameSS;
+  TableNameSS << "crctable.i" << CRCSize << "." << CRC.Polynomial;
+  if (CRC.BitReversed)
+    TableNameSS << ".reversed";
+  GlobalVariable *CRCTableGlobal = new GlobalVariable(
+      TableType, true, GlobalVariable::LinkageTypes::PrivateLinkage,
+      ConstantArr, TableNameSS.str());
+  ExitBB->getModule()->insertGlobalVariable(CRCTableGlobal);
+  free(CRCTableData);
+
+  // Construct the IR to load from this table
+  Value *CRCOffset = CRC.CRCInput;
+  if (CRCSize > 8) {
+    // Get the next byte into position and truncate
+    if (!CRC.BitReversed)
+      CRCOffset = Builder.CreateLShr(CRCOffset, CRCSize - 8);
+    CRCOffset = Builder.CreateTrunc(CRCOffset, Builder.getInt8Ty());
+  }
+  if (CRC.DataInput) {
+    // Data size can be more than 8 due to extending
+    Value *Data = CRC.DataInput;
+    if (CRC.DataInput->getType()->getScalarSizeInBits() > 8) {
+      Data = Builder.CreateTrunc(Data, Builder.getInt8Ty());
+    }
+    // Xor the data, offset into the table and load
+    CRCOffset = Builder.CreateXor(CRCOffset, Data);
+  }
+
+  CRCOffset = Builder.CreateZExt(CRCOffset, Builder.getInt32Ty());
+  Value *Gep = Builder.CreateInBoundsGEP(CRCType, CRCTableGlobal, {CRCOffset});
+  Value *CRCRes = Builder.CreateLoad(CRCType, Gep);
+  if (CRCSize > 8) {
+    // Shift out SB used for division and Xor the rest of the crc back in
+    Value *RestOfCRC = CRC.CRCInput;
+    if (CRC.BitReversed)
+      RestOfCRC = Builder.CreateLShr(CRC.CRCInput, 8);
+    else
+      RestOfCRC = Builder.CreateShl(CRC.CRCInput, 8);
+    CRCRes = Builder.CreateXor(RestOfCRC, CRCRes);
+  }
+  for (PHINode &ExitPhi : CurLoop->getExitBlock()->phis()) {
+    if (ExitPhi.getNumIncomingValues() == 1 &&
+        ExitPhi.getIncomingValue(0) == CRC.CRCOutput)
+      ExitPhi.replaceAllUsesWith(CRCRes);
+  }
+}
+
+bool LoopIdiomRecognize::recognizeCRC(const SCEV *BECount) {
+  // Step one: Check if the loop looks like crc, and extract some useful
+  // information for us to check
+  std::optional<CRCInfo> MaybeCRC = looksLikeCRC(BECount);
+  if (!MaybeCRC)
+    return false;
+  CRCInfo CRC = *MaybeCRC;
+
+  uint64_t CRCSize = CRC.CRCInput->getType()->getScalarSizeInBits();
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE " CRCRegonize: Found potential CRCLoop "
+                    << *CurLoop << "\n"
+                    << "Input CRC: " << *CRC.CRCInput << "\n"
+                    << "Output CRC: " << *CRC.CRCOutput << "\n"
+           ...
[truncated]

joe-img · 2024-01-24T14:30:33Z

RFC link: https://discourse.llvm.org/t/rfc-crc-recognition-in-loopidiomrecognizer/76491

mgudim · 2024-02-01T07:58:41Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+  // Symbolically execute one iteration of the loop to populate a map of
+  // Value's to their ValueBits, aka a representation of their bits in terms of
+  // 1's, 0's and references to other values' bits. If these match pre-computed
+  // crc values, then we can say it's doing crc.


Why is executing one iteration enough?

Updated. Does it make sense?

mgudim · 2024-02-01T08:01:49Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+    return false;
+  }
+
+  // If there exists a Data input, ensure the check bit is crc^data.


this comment is too specific to coremark. Can you make it more general?

Looking at many CRC implementations, I disagree that this is specific to CoreMark. The data is almost always xor'd into the crc. Half of these, coremark included, decide to xor inside the loop.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

mgudim · 2024-02-07T22:00:18Z

@joe-img I am going to review this properly, but now I am pretty busy with other work. Please ping me in a 2 weeks, if I haven't reviewed before, sorry for the long wait.

joe-img · 2024-02-21T11:03:56Z

@joe-img I am going to review this properly, but now I am pretty busy with other work. Please ping me in a 2 weeks, if I haven't reviewed before, sorry for the long wait.

Ping : )

mgudim · 2024-02-22T02:16:32Z

@joe-img I am going to review this properly, but now I am pretty busy with other work. Please ping me in a 2 weeks, if I haven't reviewed before, sorry for the long wait.

Ping : )

Thanks for the reminder. I'm in the process.

topperc · 2024-02-22T03:26:28Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

@@ -135,6 +137,9 @@ static cl::opt<bool> UseLIRCodeSizeHeurs(
             "with -Os/-Oz"),
    cl::init(true), cl::Hidden);

+static cl::opt<bool> CRCRecognize("recognize-crc", cl::desc("CRC RECOGNIZE"),


Description should not be in all caps

topperc · 2024-02-22T03:33:03Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+    static ValueBit *CreateOneBit() { return new ValueBit(BitType::ONE); }
+    static ValueBit *CreateZeroBit() { return new ValueBit(BitType::ZERO); }
+    static ValueBit *CreateRefBit(Value *Ref, uint64_t Offset) {
+      return new ValueBit(BitType::REF, std::make_pair(Ref, Offset));


Is this memory ever freed? A quick search doesn't show any calls to delete or unique_ptr

topperc · 2024-02-22T03:33:48Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+        if (Incoming != BB) {
+          if (IncomingBlock) {
+            LLVM_DEBUG(dbgs()
+                       << DEBUG_TYPE " CRCRegonize: Do not know how to"


Recognize*. This spelling error is repeated in many messages.

topperc · 2024-02-22T03:35:43Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+  // than we need to account for overflow) and copying the 64bit values across
+  // aligned correctly
+  uint64_t CRCNumBytes = CRCSize / 8;
+  char *CRCTableData = (char *)malloc(CRCNumBytes * 260);


Please use array new and delete.

And use uint8_t or int8_t instead of char. The signedness of char is platform dependent

topperc · 2024-02-22T03:41:03Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+    CRCOffset = Builder.CreateXor(CRCOffset, Data);
+  }
+
+  CRCOffset = Builder.CreateZExt(CRCOffset, Builder.getInt32Ty());


You should probably get the index type from DataLayout and not hardcode i32

topperc · 2024-02-22T03:42:27Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+  // (MSB): crc < 0
+  // (MSB): crc <= -1
+  // And decide whether the check is checking for existence of 1 or 0
+  bool checkZero = false;


Capitalize variable name

topperc · 2024-02-22T03:50:20Z

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

+    return std::nullopt;
+
+  // Check iteration count is 8
+  const SCEV *TripCountSCEV =


It's less code to use ScalarEvolution::getSmallConstantTripCount, but we already have the backedge count so I don't know what's better.

mgudim · 2024-02-24T04:19:28Z

llvm/test/Transforms/LoopIdiom/crc/crc.ll

+define dso_local zeroext i8 @crc8_loop(ptr noundef %data, i32 noundef %length) {
+entry:
+  br label %for.cond
+


Please minimize the tests so that they only show the loop involved without all the extra code.

mgudim · 2024-02-24T06:02:03Z

We should think through the high-level plan of how the implementation should work eventually for targets supporting this. At the same time, we can divide this into smaller pieces which has many advantages.

I can suggest the following, but I really would appreciate feedback / approval of others (@kparzysz):

(1) Create new (overloaded) intrinsics for performing a number of steps in polynomial division. Something like:
pdivstep(i8 dividend, i8 divisor, i32 nsteps). This intrinsic returns the result of performing nsteps steps in the polynomial divison of dividend by the divisor. Here it is assumed that the highest degree of polynomial corresponds to the MSB of the number representing it. Actual loop could be using reverse representation, but we can express it by combining with bitreverse intrinsics.

(2) Add functions to PreISelIntrinsicLowering to expand the pdivstep intrinsic. We can expand back into a loop (or unrolled loop) like expandMemCpyAsLoop. Or we can emit a table lookup (like in this MR), or we can expand using clmul intrinsics.
For testing purposes, we can introduce flags to always (if possible) do specified type of expansion. We can start with "expand into a loop" and a "table lookup". We can add expansion using clmuls later.

(3) Implement the recognition part (and generation of pdivstep intrinsic) in LoopIdiomRecognize. As in this MR, the generation of the intrinsic should be controlled by a flag. Also, the idiom should be "polynomial division", CRC is just one application of it. I don't know any other though :-) If the whole plan is approved, we can merge this part first.

(4) Recognize identities on pdiv intrinsics. I think in coremark, the 8-bit ones combine eventually into larger one. This should be after inlining, so probably we can do it in PreISelIntrinsicLowering?

(5) Create a way for targets to specify if pdiv intrinsics should be generated. If yes, how should they be expanded (during PreISelLowering)?

joe-img · 2024-02-29T10:44:30Z

Hi both and thanks for the comments. Sorry, I am working on this PR, I've just had some higher priority items.

pdivstep(i8 dividend, i8 divisor, i32 nsteps). This intrinsic returns the result of performing nsteps steps in the polynomial divison of dividend by the divisor

So the return value is the quotient? Doesn't that complicate the expression of CRC? What we need for CRC is surely something like premstep instead?

Actual loop could be using reverse representation, but we can express it by combining with bitreverse intrinsics.

Would it be easier to have a bit reversed variation of the intrinsic? We'd have to identify these bitreverse intrinsics upon expansion anyway. It would just avoid having to insert and then recognize it.

In general I think this is a good plan, and was the original vision. If this whole plan is approved, would the first MR be just recognition and creation of the division intrinsic? I.E strip the table based lookup from this MR and emit intrinsic but would error in the backend.

mgudim · 2024-03-07T05:34:10Z

So the return value is the quotient? Doesn't that complicate the expression of CRC? What we need for CRC is surely something like premstep instead?

~~Yes, you're right.~~ Actually, maybe we need both? In coremark after we emit crc8 intinsic, after inlining they combine to a larger crc (32 I think). To recognize that, we need to see that after doing 8-bit crc we continue with the quotient that we got, right? My memory is vague on this, so please correct me if I am wrong.

Would it be easier to have a bit reversed variation of the intrinsic? We'd have to identify these bitreverse intrinsics upon expansion anyway. It would just avoid having to insert and then recognize it.

I am OK with that too. I just didn't want to introduce more intrinsics than we need.

would the first MR be just recognition and creation of the division intrinsic?

I think first MR should be just defining the intrinsics.

would the first MR be just recognition and creation of the division intrinsic? I.E strip the table based lookup from this MR

yes.

but would error in the backend.

Only emit intrinsic if a flag is passed so it doesn't break anything. Name of the flag could indicate that at this point the code gen doesn't work yet for the intrinsic. But at the same time you should be able to add tests to your MR.

mgudim · 2024-03-07T07:33:24Z

So the return value is the quotient? Doesn't that complicate the expression of CRC? What we need for CRC is surely something like premstep instead?

Yes, you're right. Actually, maybe we need both? In coremark after we emit crc8 intinsic, after inlining they combine to a larger crc (32 I think). To recognize that, we need to see that after doing 8-bit crc we continue with the quotient that we got, right? My memory is vague on this, so please correct me if I am wrong.

Sorry for all the noise. Looks like we only need the remainder after all. The identity is something like:

prem((prem(x, d, k_1) << s) || y), d, k_2) = prem((x << s ) || y, d, k_1 + k_2)

github-actions · 2024-03-20T17:37:29Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 182ab1c7034b951433fb8831b67e7758fe61d4e8 9ebb437002f79e51f1ece7e17c06bd821f6ae7b7 -- llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index b4bc8d71db..8bec76d5d3 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -2955,7 +2955,7 @@ public:
         _RHS = new ValueBit(*VB._RHS);
       }
     }
-    ValueBit& operator=(const ValueBit &VB) {
+    ValueBit &operator=(const ValueBit &VB) {
       _Type = VB._Type;
       if (_Type == BitType::REF)
         _BitRef = VB._BitRef;
@@ -3272,8 +3272,8 @@ symbolicallyExecute(BasicBlock *BB,
         }
       }
       assert(IncomingBlock);
-      ValueMap.insert({&I,
-          getOrCreateValueBits(PHI->getIncomingValueForBlock(IncomingBlock))});
+      ValueMap.insert({&I, getOrCreateValueBits(
+                               PHI->getIncomingValueForBlock(IncomingBlock))});
     } break;
     case Instruction::Shl: {
       ConstantInt *CI = getConstantOperand(&I, 1);
@@ -3320,8 +3320,8 @@ symbolicallyExecute(BasicBlock *BB,
       }
       auto IfTrue = getOrCreateValueBits(Select->getTrueValue());
       auto IfFalse = getOrCreateValueBits(Select->getFalseValue());
-      ValueMap.insert({&I,
-          std::make_shared<PredicatedValueBits>(Cond, IfTrue, IfFalse)});
+      ValueMap.insert(
+          {&I, std::make_shared<PredicatedValueBits>(Cond, IfTrue, IfFalse)});
     } break;
     default:
       // If this instruction is not recognized, then just continue. This is
@@ -3456,7 +3456,7 @@ bool LoopIdiomRecognize::recognizeCRC(const SCEV *BECount) {
   // any unexpected loop variant operations happening, e.g. additional select
   // logic or shifts, then this will be captured in the ValueBits.
   std::map<Value *, std::shared_ptr<ValueBits>> ValueMap;
-    
+
   if (!symbolicallyExecute(CurLoop->getHeader(), ValueMap))
     return false;
 
@@ -3487,8 +3487,8 @@ bool LoopIdiomRecognize::recognizeCRC(const SCEV *BECount) {
   // whether this is bit reversed CRC
   ICmpInst *ICmp = CRCOutBitsPred->getPredicate();
   CmpInst::Predicate Pred = ICmp->getPredicate();
-  LLVM_DEBUG(dbgs() << DEBUG_TYPE << " CRCRecognize checking to see if " << *ICmp
-                    << " is checking the "
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE << " CRCRecognize checking to see if "
+                    << *ICmp << " is checking the "
                     << (CRC.BitReversed ? "LSB\n" : "MSB\n"));
 
   // Firstly check the LHS is in our map, and RHS is a constant
@@ -3620,7 +3620,8 @@ bool LoopIdiomRecognize::recognizeCRC(const SCEV *BECount) {
   }
   uint64_t GeneratorPolynomial =
       CRC.BitReversed ? reverseBits(CRC.Polynomial, CRCSize) : CRC.Polynomial;
-  PValueBits Polynomial = std::make_shared<ValueBits>(GeneratorPolynomial, CRCSize);
+  PValueBits Polynomial =
+      std::make_shared<ValueBits>(GeneratorPolynomial, CRCSize);
 
   // Case where the MSB/LSB of the data is 0
   PValueBits IfZero = CRC.BitReversed ? ValueBits::LShr(*CRCValueBits, 1)

joe-img · 2024-03-21T17:53:29Z

So the return value is the quotient? Doesn't that complicate the expression of CRC? What we need for CRC is surely something like premstep instead?

Yes, you're right. Actually, maybe we need both? In coremark after we emit crc8 intinsic, after inlining they combine to a larger crc (32 I think). To recognize that, we need to see that after doing 8-bit crc we continue with the quotient that we got, right? My memory is vague on this, so please correct me if I am wrong.

Sorry for all the noise. Looks like we only need the remainder after all. The identity is something like:
prem((prem(x, d, k_1) << s) || y), d, k_2) = prem((x << s ) || y, d, k_1 + k_2)

I'm a bit confused with all those arguments to prem. What do they represent?

Also, what is considered a 'step' in polynomial division?

In CoreMark, the helper function ee_u16 crcu8(ee_u8 data, ee_u16 crc ) is used, but this is still crc16, as the generator polynomial and remainder are 16-bits wide. I think it's called 'crcu8' because it xor's in 8 bits of data.

If we're using a prem intrinsic, then surely it makes most sense for it to be something like prem(divisor, dividend), where it returns the remainder after polynomial division? With this, I see two issues:
Firstly, for crcn, prem would have to take a divisor of bitsize 2n. CRC implementations avoid this by using a linear feedback shift register.

Secondly, I think this complicates the expression of crc. For example, I'm pretty sure that coremark's crcu8 will be represented like:

(crc << 8) ^ prem((((crc & 0xff00) >> 8) ^ (data)) << 16), poly | (1 << 16))

Although crc is the remainder of polynomial division accross some stream of data, the way it's implemented in crc, I don't think matches up very well with a polynomial remainder intrinsic.

We can expand back into a loop (or unrolled loop) like expandMemCpyAsLoop

Is this necessary? If in LoopIdiomRecognize we're recognizing the loop, can we just avoid emitting the intrinsic using the same logic as would be in PreISelIntrinsicLowering to decide whether to lower to loop?

mgudim · 2024-04-09T07:45:54Z

Hi @joe-img, I am really sorry, I am still too busy with other things (and I don't know for how long).

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

This will check to see if a loop looks like CRC, not necessarily guaranteeing that it is CRC.

This is a representation of a value's bits in terms of references to other values' bits, or 1/0 if the bit is known. This allows symbolic execution of bitwise instructions without knowing the exact values. Example: LLVM IR Value i8 %x: [%x[7], %x[6], %x[5], %x[4], %x[3], %x[2], %x[1], %x[0]] %shr = lshr i8 %x, 2 [ 0, 0, %x[7], %x[6], %x[5], %x[4], %x[3], %x[2]] %shl = shl i8 %shr, 1 [ 0, %x[7], %x[6], %x[5], %x[4], %x[3], %x[2], 0] %xor = xor i8 %shl, 0xb [ 0, %x[7], %x[6], %x[5], %x[4]^1, %x[3], %x[2]^1, 1]

These would be representitive of select or phi instructions where the bits would depend on an icmp.

The result is a map between llvm Values and their bit representations as ValueBits.

- Use shared pointers for factory methods I think the use of shared pointers in `symbolicallyExecute` is the correct decision due to the filling of a map. I tried for a while to make this a map of objects, but due to [object slicing](https://en.wikipedia.org/wiki/Object_slicing) it would not be possible to store a PredicatedValueBits in this map. (This would require PredicatedValueBits to be redesigned into ValueBits class, but then the internal methods get messy. Virtual functions was a clean answer to this) Additionally, using `unique_ptr` proves difficult due to having to store it in a map. And unique pointers cannot be copied. As `symbolicallyExecute` is the primary user of the factory methods, it made sense for the factory methods to return a shared pointer instead of a raw pointer or object and then wrapping this in a shared pointer within symbolicallyExecute. - Have XOR bits own their LHS/RHS Having them as member objects instead of pointers I don't think makes much sense due to them not being needed for most ValueBit objects, but they would have to be initialized for all ValueBit instantiations. - Use object values instead of pointers for bit representation IMO the cleanest way. They're very lightweight anyway.

getNameOrAsOperand is only available in debug mode, so causes failures when built in release.

llvmbot added the llvm:transforms label Jan 24, 2024

mgudim reviewed Feb 1, 2024

View reviewed changes

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Show resolved Hide resolved

topperc reviewed Feb 22, 2024

View reviewed changes

mgudim reviewed Feb 24, 2024

View reviewed changes

zqb-all reviewed Apr 24, 2024

View reviewed changes

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Show resolved Hide resolved

joe-img added 10 commits April 24, 2024 16:02

[LoopIdiomRecognize] Implement function to extract CRC data from loops

5626c07

This will check to see if a loop looks like CRC, not necessarily guaranteeing that it is CRC.

[LoopIdiomRecognize] Implement PredicatedValueBits

cd8aaf7

These would be representitive of select or phi instructions where the bits would depend on an icmp.

[LoopIdiomRecognize] Add function to symbolically execute basic block

180c398

The result is a map between llvm Values and their bit representations as ValueBits.

[LoopIdiomRecognize] Check result of symbolic execution matches CRC

8939111

[LoopIdiomRecognize] Write lookup table based CRC for one-byte data

bb3a6c7

[LoopIdiomRecognize] Add unit tests for CRC idiom recognizer

6831e2c

Clarify some comments

4eb1497

Review changes that aren't memory

7438109

Format code correctly

09510f5

joe-img force-pushed the upstream-crc-recognize branch from 9ebb437 to 09510f5 Compare April 24, 2024 16:05

Surround ValueBit(s) printing with NDEBUG

e781f66

getNameOrAsOperand is only available in debug mode, so causes failures when built in release.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LoopIdiomRecognizer] Implement CRC recognition #79295

[LoopIdiomRecognizer] Implement CRC recognition #79295

joe-img commented Jan 24, 2024

github-actions bot commented Jan 24, 2024

llvmbot commented Jan 24, 2024

joe-img commented Jan 24, 2024

mgudim Feb 1, 2024

joe-img Feb 1, 2024

mgudim Feb 1, 2024

joe-img Feb 1, 2024

mgudim commented Feb 7, 2024

joe-img commented Feb 21, 2024

mgudim commented Feb 22, 2024

topperc Feb 22, 2024

topperc Feb 22, 2024

topperc Feb 22, 2024

topperc Feb 22, 2024

topperc Feb 22, 2024

topperc Feb 22, 2024

topperc Feb 22, 2024

mgudim Feb 24, 2024

mgudim commented Feb 24, 2024 •

edited

joe-img commented Feb 29, 2024

mgudim commented Mar 7, 2024 •

edited

mgudim commented Mar 7, 2024 •

edited

github-actions bot commented Mar 20, 2024

joe-img commented Mar 21, 2024

mgudim commented Apr 9, 2024

[LoopIdiomRecognizer] Implement CRC recognition #79295

Are you sure you want to change the base?

[LoopIdiomRecognizer] Implement CRC recognition #79295

Conversation

joe-img commented Jan 24, 2024

github-actions bot commented Jan 24, 2024

llvmbot commented Jan 24, 2024

joe-img commented Jan 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgudim commented Feb 7, 2024

joe-img commented Feb 21, 2024

mgudim commented Feb 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgudim commented Feb 24, 2024 • edited

joe-img commented Feb 29, 2024

mgudim commented Mar 7, 2024 • edited

mgudim commented Mar 7, 2024 • edited

github-actions bot commented Mar 20, 2024

joe-img commented Mar 21, 2024

mgudim commented Apr 9, 2024

mgudim commented Feb 24, 2024 •

edited

mgudim commented Mar 7, 2024 •

edited

mgudim commented Mar 7, 2024 •

edited