[RISCV] Implement `EmitTargetCodeForStrcmp` for unaligned case. #86645

mgudim · 2024-03-26T09:19:58Z

When the strings are unaligned and one of the arguments is a known constant
string, we specialize the `strcmp` function.

First, we check the above two conditions in `EmitTargetCodeForStrcmp`,
and if they are satisfied we emit the target node `RISCVISD::STRCMP`. The
node has an additional argument indicating which of the strings (first or
second) is constant.
During `ISel` we match it to the pseudo instruction `PseudoSTRCMPI`.
Finally, during `FinalizeLowering` we expand the pseudo into code.
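
To make the intended result of the expansion concrete, here is a minimal plain C++ model of the byte-by-byte comparison, using the usual `strcmp` sign convention. It is a sketch, not a transcription of the emitted machine code: `compareAgainstConstant` and `modelSpecializedStrcmp` are hypothetical names that do not appear in the patch, and the exact magnitudes returned are an assumption (only the sign is meaningful for `strcmp`).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not in the patch): result of strcmp(other, constant),
// i.e. with the constant string as the second argument. `constant` holds the
// known string without its terminating NUL.
inline int compareAgainstConstant(const unsigned char *other,
                                  std::string_view constant) {
  for (std::size_t i = 0; i < constant.size(); ++i) {
    int loaded = other[i]; // one unsigned byte load per constant byte
    int expected = static_cast<unsigned char>(constant[i]);
    if (loaded == 0)
      return -1; // `other` ended first, so it compares less
    if (loaded != expected)
      return loaded - expected; // first differing byte decides
  }
  // All constant bytes matched: equal if `other` ends here, otherwise
  // `other` is longer and compares greater.
  return other[constant.size()];
}

// Hypothetical wrapper: when the constant string is the first argument,
// compute the constant-second result and negate it.
inline int modelSpecializedStrcmp(const char *other, std::string_view constant,
                                  bool constantIsFirst) {
  // Assumption: empty constant strings are never specialized.
  assert(!constant.empty());
  int result = compareAgainstConstant(
      reinterpret_cast<const unsigned char *>(other), constant);
  return constantIsFirst ? -result : result;
}
```

For example, `modelSpecializedStrcmp("ab", "a", false)` is positive and `modelSpecializedStrcmp("ab", "a", true)` is negative; the constant-first case is handled purely by negating the constant-second result.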

This optimization is triggered about 2000 times on the C/C++ SPEC2017
benchmarks, but unfortunately it has no noticeable impact
on the dynamic instruction count. This optimization is off by default.
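
The conditions under which the specialization fires can be summarized with a small standalone model. This is a sketch based on a reading of the checks in `EmitTargetCodeForStrcmp` and the `-riscv-max-strcmp-specialize-length` option from this patch; the `StrcmpCandidate` struct and `shouldSpecializeStrcmp` function are hypothetical and use plain integers in place of LLVM's `Align` type.

```cpp
#include <cstdint>

// Hypothetical plain-data stand-in for what the hook inspects.
struct StrcmpCandidate {
  std::uint64_t constantLength; // length of the constant string, excluding NUL
  std::uint64_t constantAlign;  // provable alignment of the constant string
  std::uint64_t otherAlign;     // provable alignment of the other argument
};

// Returns true when this model would specialize the call.
// maxLength models the value of -riscv-max-strcmp-specialize-length;
// xlenBytes is the register width in bytes (8 on riscv64).
inline bool shouldSpecializeStrcmp(const StrcmpCandidate &c,
                                   std::uint64_t maxLength,
                                   std::uint64_t xlenBytes) {
  // A limit of 0 is the default and turns the optimization off.
  if (maxLength == 0)
    return false;
  // At least one argument must not be provably XLEN-aligned.
  if (c.constantAlign >= xlenBytes && c.otherAlign >= xlenBytes)
    return false;
  // Assumption: empty constant strings are rejected. Strings whose length
  // is greater than or equal to the limit are rejected as in the patch.
  if (c.constantLength == 0 || c.constantLength >= maxLength)
    return false;
  // A sufficiently aligned constant string is left to the library call.
  if (c.constantAlign >= xlenBytes)
    return false;
  return true;
}
```

With the limit set to 10 as in the test file's `RUN` line, a one-byte string with alignment 1 qualifies in this model, while the same string with alignment 8 (like `@str4`) does not.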

Note that gcc already does this.

github-actions · 2024-03-26T09:23:48Z

✅ With the latest revision this PR passed the C/C++ code formatter.

topperc · 2024-03-26T20:38:00Z

llvm/lib/Target/RISCV/RISCVInstrInfo.td

@@ -137,6 +137,7 @@ def SPMem : MemOperand<SP>;

 def GPRCMem : MemOperand<GPRC>;

+


Stray change

fixed, thanks

llvmbot · 2024-04-02T04:47:20Z

@llvm/pr-subscribers-backend-risc-v

Author: Mikhail Gudim (mgudim)

Changes

When the strings are unaligned and one of the arguments is a known constant
string, we specialize the `strcmp` function.

First, we check the above two conditions in `EmitTargetCodeForStrcmp`,
and if they are satisfied we emit the target node `RISCVISD::STRCMP`. The
node has an additional argument indicating which of the strings (first or
second) is constant.
During `ISel` we match it to the pseudo instruction `PseudoSTRCMPI`.
Finally, during `FinalizeLowering` we expand the pseudo into code.

This optimization is triggered about 2000 times on the C/C++ SPEC2017
benchmarks, but unfortunately it has no noticeable impact
on the dynamic instruction count. This optimization is off by default.

Note that gcc already does this.

Patch is 32.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86645.diff

9 Files Affected:

(modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+164)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+1)
(modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+23)
(added) llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.cpp (+127)
(added) llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.h (+33)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+1)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+2-1)
(added) llvm/test/CodeGen/RISCV/specialize-strcmp.ll (+371)

diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 8715403f3839a6..6229029106ae2a 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -52,6 +52,7 @@ add_llvm_target(RISCVCodeGen
   RISCVPushPopOptimizer.cpp
   RISCVRegisterInfo.cpp
   RISCVSubtarget.cpp
+  RISCVSelectionDAGTargetInfo.cpp
   RISCVTargetMachine.cpp
   RISCVTargetObjectFile.cpp
   RISCVTargetTransformInfo.cpp
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index e647f56416bfa6..5ae7f536bfb968 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17655,6 +17655,167 @@ static MachineBasicBlock *emitFROUND(MachineInstr &MI, MachineBasicBlock *MBB,
   return DoneMBB;
 }
 
+static MachineBasicBlock *emitSTRCMPI(MachineInstr &MI, MachineBasicBlock *MBB,
+                                      const RISCVSubtarget &Subtarget) {
+
+  const RISCVInstrInfo &TII = *Subtarget.getInstrInfo();
+  MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();
+  MachineFunction &MF = *MI.getParent()->getParent();
+  DebugLoc DL = MI.getDebugLoc();
+
+  const GlobalVariable *GV = cast<GlobalVariable>(MI.getOperand(2).getGlobal());
+  StringRef Str = cast<ConstantDataArray>(GV->getInitializer())->getAsCString();
+  int NumOfBytes = Str.str().length();
+  const BasicBlock *LLVM_BB = MBB->getBasicBlock();
+  MachineFunction::iterator MBBI = ++MBB->getIterator();
+
+  MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(LLVM_BB);
+  MF.insert(MBBI, ExitMBB);
+  ExitMBB->splice(ExitMBB->end(), MBB, std::next(MI.getIterator()), MBB->end());
+  ExitMBB->transferSuccessorsAndUpdatePHIs(MBB);
+  MBBI = ExitMBB->getIterator();
+
+  // In the code below we assume that the constant string is second argument
+  // and negate the result if needed.
+  bool NeedToNegateResult = MI.getOperand(3).getImm() == 0;
+  Register PHIReg = NeedToNegateResult
+                        ? MRI.createVirtualRegister(&RISCV::GPRRegClass)
+                        : MI.getOperand(0).getReg();
+  MachineInstrBuilder PHI_MIB =
+      BuildMI(*ExitMBB, ExitMBB->begin(), DL, TII.get(RISCV::PHI), PHIReg);
+  if (NeedToNegateResult) {
+    BuildMI(*ExitMBB, ++ExitMBB->begin(), DL, TII.get(RISCV::SUB),
+            MI.getOperand(0).getReg())
+        .addReg(RISCV::X0)
+        .addReg(PHIReg);
+  }
+
+  MachineBasicBlock *ReturnEarlyNullByteMBB =
+      MF.CreateMachineBasicBlock(LLVM_BB);
+  MF.insert(MBBI, ReturnEarlyNullByteMBB);
+  Register NegReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  BuildMI(*ReturnEarlyNullByteMBB, ReturnEarlyNullByteMBB->end(), DL,
+          TII.get(RISCV::ADDI), NegReg)
+      .addReg(RISCV::X0)
+      .addImm(-1);
+  ReturnEarlyNullByteMBB->addSuccessor(ExitMBB);
+  PHI_MIB.addReg(NegReg).addMBB(ReturnEarlyNullByteMBB);
+  MBBI = ReturnEarlyNullByteMBB->getIterator();
+
+  Register BaseReg = MI.getOperand(1).getReg();
+  MachineMemOperand &MMO = *MI.memoperands()[0];
+
+  MachineBasicBlock *CheckNullByteMBB = MF.CreateMachineBasicBlock(LLVM_BB);
+  MF.insert(MBBI, CheckNullByteMBB);
+  Register LoadedLastByteReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  MachineInstr &LoadLastByteMI =
+      *BuildMI(*CheckNullByteMBB, CheckNullByteMBB->end(), DL,
+               TII.get(RISCV::LBU), LoadedLastByteReg)
+           .addReg(BaseReg)
+           .addImm(NumOfBytes)
+           .cloneMemRefs(MI)
+           .getInstr();
+  MachineMemOperand *NewMMO = MF.getMachineMemOperand(
+      MMO.getPointerInfo(), MachineMemOperand::MOLoad, LLT(MVT::i8), Align(1));
+  LoadLastByteMI.setMemRefs(MF, {NewMMO});
+  LoadLastByteMI.memoperands()[0]->setOffset(NumOfBytes);
+
+  Register NegLoadedLastByteReg =
+      MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  BuildMI(*CheckNullByteMBB, CheckNullByteMBB->end(), DL, TII.get(RISCV::SUB),
+          NegLoadedLastByteReg)
+      .addReg(RISCV::X0)
+      .addReg(LoadedLastByteReg);
+  BuildMI(*CheckNullByteMBB, CheckNullByteMBB->end(), DL,
+          TII.get(RISCV::PseudoBR))
+      .addMBB(ExitMBB);
+  CheckNullByteMBB->addSuccessor(ExitMBB);
+  PHI_MIB.addReg(NegLoadedLastByteReg).addMBB(CheckNullByteMBB);
+  MBBI = CheckNullByteMBB->getIterator();
+
+  // First byte will be processed in the original MBB.
+  // Create NewMBBs for all other (non-null) bytes.
+  MachineFunction::iterator NewMBBI = MBBI;
+  SmallVector<MachineBasicBlock *> NewMBBs(NumOfBytes);
+  for (int i = NumOfBytes - 2; i >= 0; --i) {
+    MachineBasicBlock *NewMBB = MF.CreateMachineBasicBlock(LLVM_BB);
+    NewMBBs[i] = NewMBB;
+    MF.insert(NewMBBI, NewMBB);
+    NewMBBI = NewMBB->getIterator();
+  }
+  // The CheckNullByteMBB will be a fall-through successor
+  // of the block checking last non-null byte.
+  NewMBBs[NumOfBytes - 1] = CheckNullByteMBB;
+
+  int64_t Offset = 0;
+  char Byte = Str[0];
+  MachineBasicBlock::iterator MII = std::next(MI.getIterator());
+  MachineBasicBlock *CurrMBB = MBB;
+  MachineBasicBlock *NextMBB = NewMBBs[0];
+
+  auto emitCodeToCheckOneByteEquality = [&] {
+    Register LoadedByteReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+    MachineInstr &LoadByteMI =
+        *BuildMI(*CurrMBB, MII, DL, TII.get(RISCV::LBU), LoadedByteReg)
+             .addReg(BaseReg)
+             .addImm(Offset)
+             .cloneMemRefs(MI)
+             .getInstr();
+    MachineMemOperand *NewMMO =
+        MF.getMachineMemOperand(MMO.getPointerInfo(), MachineMemOperand::MOLoad,
+                                LLT(MVT::i8), Align(1));
+    LoadByteMI.setMemRefs(MF, {NewMMO});
+    LoadByteMI.memoperands()[0]->setOffset(Offset);
+
+    BuildMI(*CurrMBB, MII, DL, TII.get(RISCV::BEQ))
+        .addReg(LoadedByteReg)
+        .addReg(RISCV::X0)
+        .addMBB(ReturnEarlyNullByteMBB);
+
+    MBBI = NextMBB->getIterator();
+    MachineBasicBlock *CheckBytesEqualMBB = MF.CreateMachineBasicBlock(LLVM_BB);
+    MF.insert(MBBI, CheckBytesEqualMBB);
+    CurrMBB->addSuccessor(ReturnEarlyNullByteMBB);
+    CurrMBB->addSuccessor(CheckBytesEqualMBB);
+
+    MachineBasicBlock::iterator CheckBytesEqualMMBI =
+        CheckBytesEqualMBB->begin();
+    Register DiffReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+    BuildMI(*CheckBytesEqualMBB, CheckBytesEqualMMBI, DL, TII.get(RISCV::ADDI),
+            DiffReg)
+        .addReg(LoadedByteReg)
+        .addImm(-Byte);
+
+    BuildMI(*CheckBytesEqualMBB, CheckBytesEqualMMBI, DL, TII.get(RISCV::BNE))
+        .addReg(DiffReg)
+        .addReg(RISCV::X0)
+        .addMBB(ExitMBB);
+
+    CheckBytesEqualMBB->addSuccessor(ExitMBB);
+    PHI_MIB.addReg(DiffReg).addMBB(CheckBytesEqualMBB);
+    CheckBytesEqualMBB->addSuccessor(NextMBB);
+  };
+
+  // Check the first byte.
+  emitCodeToCheckOneByteEquality();
+
+  for (int i = 0; i < NumOfBytes - 1; ++i) {
+    ++Offset;
+    Byte = Str[i + 1];
+    CurrMBB = NewMBBs[i];
+    MII = CurrMBB->begin();
+    NextMBB = NewMBBs[i + 1];
+    // Check all other non-null bytes.
+    // On the last iteration of this loop,
+    // NextMBB is CheckNullByteMBB, so it will become
+    // a fall-through successor of basic block checking last non-null byte.
+    emitCodeToCheckOneByteEquality();
+  }
+
+  MI.eraseFromParent();
+  return ExitMBB;
+}
+
 MachineBasicBlock *
 RISCVTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
                                                  MachineBasicBlock *BB) const {
@@ -17737,6 +17898,8 @@ RISCVTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
   case RISCV::PseudoFROUND_D_INX:
   case RISCV::PseudoFROUND_D_IN32X:
     return emitFROUND(MI, BB, Subtarget);
+  case RISCV::PseudoSTRCMPI:
+    return emitSTRCMPI(MI, BB, Subtarget);
   case TargetOpcode::STATEPOINT:
   case TargetOpcode::STACKMAP:
   case TargetOpcode::PATCHPOINT:
@@ -19512,6 +19675,7 @@ const char *RISCVTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(SWAP_CSR)
   NODE_NAME_CASE(CZERO_EQZ)
   NODE_NAME_CASE(CZERO_NEZ)
+  NODE_NAME_CASE(STRCMP)
   NODE_NAME_CASE(SF_VC_XV_SE)
   NODE_NAME_CASE(SF_VC_IV_SE)
   NODE_NAME_CASE(SF_VC_VV_SE)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index a38463f810270a..52dda10a56a666 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -456,6 +456,7 @@ enum NodeType : unsigned {
   TH_LDD,
   TH_SWD,
   TH_SDD,
+  STRCMP
 };
 // clang-format on
 } // namespace RISCVISD
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index e753c1f1add0c6..209a6380a88092 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -1952,6 +1952,29 @@ def : Pat<(shl (zext GPR:$rs), uimm5:$shamt),
           (SRLI (i64 (SLLI GPR:$rs, 32)), (ImmSubFrom32 uimm5:$shamt))>;
 }
 
+def riscv_strcmp : SDNode<
+  "RISCVISD::STRCMP",
+  SDTypeProfile<1, 2, [SDTCisPtrTy<0>, SDTCisPtrTy<1>]>,
+  [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]
+>;
+
+let usesCustomInserter = 1, mayLoad = 1, mayStore = 0, hasSideEffects = 0 in
+def PseudoSTRCMPI : Pseudo<
+  (outs GPR:$rd),
+  (ins GPR:$str1, i64imm:$str2, i64imm:$constant_str_idx),
+  []
+>;
+
+def : Pat<
+  (XLenVT (riscv_strcmp tglobaladdr:$str1, iPTR:$str2)),
+  (PseudoSTRCMPI GPR:$str2, tglobaladdr:$str1, 0)
+>;
+
+def : Pat<
+  (XLenVT (riscv_strcmp iPTR:$str1, tglobaladdr:$str2)),
+  (PseudoSTRCMPI GPR:$str1, tglobaladdr:$str2, 1)
+>;
+
 //===----------------------------------------------------------------------===//
 // Standard extensions
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.cpp b/llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.cpp
new file mode 100644
index 00000000000000..12112103fadee2
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.cpp
@@ -0,0 +1,127 @@
+//===-- RISCVSelectionDAGTargetInfo.cpp - RISCV SelectionDAG Info
+//-----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the RISCVSelectionDAGTargetInfo class.
+//
+//===----------------------------------------------------------------------===//
+
+#include "RISCVSelectionDAGTargetInfo.h"
+#include "RISCVSubtarget.h"
+#include "llvm/CodeGen/SelectionDAG.h"
+#include "llvm/IR/GlobalValue.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/Type.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-selectiondag-target-info"
+
+static cl::opt<unsigned> MaxStrcmpSpecializeLength(
+    "riscv-max-strcmp-specialize-length", cl::Hidden,
+    cl::desc("Do not specialize strcmp if the length of constant string is "
+             "greater or equal to this parameter"),
+    cl::init(0));
+
+static bool canSpecializeStrcmp(const GlobalAddressSDNode *GA) {
+  const GlobalVariable *GV = dyn_cast<GlobalVariable>(GA->getGlobal());
+  if (!GV || !GV->isConstant() || !GV->hasInitializer())
+    return false;
+  // NOTE: this doesn't work for empty strings
+  const ConstantDataArray *CDA =
+      dyn_cast<ConstantDataArray>(GV->getInitializer());
+  if (!CDA || !CDA->isCString())
+    return false;
+
+  StringRef CString = CDA->getAsCString();
+  if (CString.str().length() >= MaxStrcmpSpecializeLength)
+    return false;
+
+  return true;
+}
+
+std::pair<SDValue, SDValue>
+RISCVSelectionDAGTargetInfo::EmitTargetCodeForStrcmp(
+    SelectionDAG &DAG, const SDLoc &DL, SDValue Chain, SDValue Src1,
+    SDValue Src2, MachinePointerInfo Op1PtrInfo,
+    MachinePointerInfo Op2PtrInfo) const {
+  // This is the default setting, so exit early if the optimization is turned
+  // off.
+  if (MaxStrcmpSpecializeLength == 0)
+    return std::make_pair(SDValue(), Chain);
+
+  const RISCVSubtarget &Subtarget =
+      DAG.getMachineFunction().getSubtarget<RISCVSubtarget>();
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  MVT XLenVT = Subtarget.getXLenVT();
+  const DataLayout &DLayout = DAG.getDataLayout();
+
+  Align NeededAlignment = Align(XLenVT.getSizeInBits() / 8);
+  Align Src1Align;
+  Align Src2Align;
+  if (const Value *Src1V = dyn_cast_if_present<const Value *>(Op1PtrInfo.V)) {
+    Src1Align = Src1V->getPointerAlignment(DLayout);
+  }
+  if (const Value *Src2V = dyn_cast_if_present<const Value *>(Op2PtrInfo.V)) {
+    Src2Align = Src2V->getPointerAlignment(DLayout);
+  }
+  if (!(Src1Align < NeededAlignment || Src2Align < NeededAlignment))
+    return std::make_pair(SDValue(), Chain);
+
+  const GlobalAddressSDNode *CStringGA = nullptr;
+  SDValue Other;
+  MachinePointerInfo MPI;
+  bool ConstantStringIsSecond = false;
+
+  const GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Src1);
+  if (GA && canSpecializeStrcmp(GA)) {
+    CStringGA = GA;
+    Other = Src2;
+    MPI = Op2PtrInfo;
+  }
+  if (!CStringGA) {
+    GA = dyn_cast<GlobalAddressSDNode>(Src2);
+    if (GA && canSpecializeStrcmp(GA)) {
+      ConstantStringIsSecond = true;
+      CStringGA = GA;
+      Other = Src1;
+      MPI = Op1PtrInfo;
+    }
+  }
+
+  if (!CStringGA)
+    return std::make_pair(SDValue(), Chain);
+
+  // It could be that the non-constant string is actually aligned, but
+  // we can't prove it, so getPointerAlignment will return Align(1).
+  // In this case, if the constant string is sufficiently aligned, It is better
+  // to call to libc's strcmp?
+  Align ConstantStrAlignment = ConstantStringIsSecond ? Src2Align : Src1Align;
+  if (ConstantStrAlignment >= NeededAlignment)
+    return std::make_pair(SDValue(), Chain);
+
+  SDValue TGA = DAG.getTargetGlobalAddress(CStringGA->getGlobal(), DL,
+                                           TLI.getPointerTy(DLayout), 0,
+                                           CStringGA->getTargetFlags());
+
+  SDValue Str1 = TGA;
+  SDValue Str2 = Other;
+  if (ConstantStringIsSecond)
+    std::swap(Str1, Str2);
+
+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineMemOperand *MMO = MF.getMachineMemOperand(
+      MPI, MachineMemOperand::MOLoad, LLT(MVT::i8), Align(1));
+  // TODO: what should be the MemVT?
+  SDValue STRCMPNode = DAG.getMemIntrinsicNode(
+      RISCVISD::STRCMP, DL, DAG.getVTList(XLenVT, MVT::Other),
+      {Chain, Str1, Str2}, MVT::i8, MMO);
+
+  SDValue ChainOut = STRCMPNode.getValue(1);
+  return std::make_pair(STRCMPNode, ChainOut);
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.h b/llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.h
new file mode 100644
index 00000000000000..1b95ff0e81a5a1
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.h
@@ -0,0 +1,33 @@
+//===-- RISCVSelectionDAGTargetInfo.h - RISCV SelectionDAG Info ---*- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the RISCV subclass for SelectionDAGTargetInfo.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_TARGET_RISCV_RISCVSELECTIONDAGINFO_H
+#define LLVM_LIB_TARGET_RISCV_RISCVSELECTIONDAGINFO_H
+
+#include "llvm/CodeGen/SelectionDAGTargetInfo.h"
+
+namespace llvm {
+
+class RISCVSelectionDAGTargetInfo : public SelectionDAGTargetInfo {
+public:
+  explicit RISCVSelectionDAGTargetInfo() = default;
+  std::pair<SDValue, SDValue>
+  EmitTargetCodeForStrcmp(SelectionDAG &DAG, const SDLoc &DL, SDValue Chain,
+                          SDValue Src1, SDValue Src2,
+                          MachinePointerInfo Op1PtrInfo,
+                          MachinePointerInfo Op2PtrInfo) const override;
+};
+
+} // end namespace llvm
+
+#endif
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index d3236bb07d56d5..00ec619b760fa6 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -16,6 +16,7 @@
 #include "GISel/RISCVRegisterBankInfo.h"
 #include "RISCV.h"
 #include "RISCVFrameLowering.h"
+#include "RISCVSelectionDAGTargetInfo.h"
 #include "RISCVTargetMachine.h"
 #include "llvm/CodeGen/MacroFusion.h"
 #include "llvm/CodeGen/ScheduleDAGMutation.h"
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index ba108912d93400..e4ad26d70c933f 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -17,6 +17,7 @@
 #include "RISCVFrameLowering.h"
 #include "RISCVISelLowering.h"
 #include "RISCVInstrInfo.h"
+#include "RISCVSelectionDAGTargetInfo.h"
 #include "llvm/CodeGen/GlobalISel/CallLowering.h"
 #include "llvm/CodeGen/GlobalISel/InstructionSelector.h"
 #include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
@@ -86,7 +87,7 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   RISCVInstrInfo InstrInfo;
   RISCVRegisterInfo RegInfo;
   RISCVTargetLowering TLInfo;
-  SelectionDAGTargetInfo TSInfo;
+  RISCVSelectionDAGTargetInfo TSInfo;
 
   /// Initializes using the passed in CPU and feature strings so that we can
   /// use initializer lists for subtarget initialization.
diff --git a/llvm/test/CodeGen/RISCV/specialize-strcmp.ll b/llvm/test/CodeGen/RISCV/specialize-strcmp.ll
new file mode 100644
index 00000000000000..8cbe641b1fe7e3
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/specialize-strcmp.ll
@@ -0,0 +1,371 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=riscv64 -riscv-max-strcmp-specialize-length=10 -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s
+
+target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
+target triple = "riscv64-unknown-linux-gnu"
+
+@str1 = private unnamed_addr constant [2 x i8] c"a\00", align 1
+@str2 = private unnamed_addr constant [3 x i8] c"ab\00", align 1
+@str3 = private unnamed_addr constant [4 x i8] c"abc\00", align 1
+@str4 = private unnamed_addr constant [2 x i8] c"a\00", align 8
+
+define i32 @test_1(ptr %x) {
+  ; CHECK-LABEL: name: test_1
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   successors: %bb.2(0x40000000), %bb.4(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gpr = COPY $x10
+  ; CHECK-NEXT:   [[LBU:%[0-9]+]]:gpr = LBU [[COPY]], 0 :: (load (s8) from %ir.x)
+  ; CHECK-NEXT:   BEQ [[LBU]], $x0, %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4.entry:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.3(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[ADDI:%[0-9]+]]:gpr = ADDI [[LBU]], -97
+  ; CHECK-NEXT:   BNE [[ADDI]], $x0, %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3.entry:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[LBU1:%[0-9]+]]:gpr = LBU [[COPY]], 1 :: (load (s8) from %ir.x + 1)
+  ; CHECK-NEXT:   [[SUB:%[0-9]+]]:gpr = SUB $x0, [[LBU1]]
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2.entry:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[ADDI1:%[0-9]+]]:gpr = ADDI $x0, -1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1.entry:
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:gpr = PHI [[ADDI1]], %bb.2, [[SUB]], %bb.3, [[ADDI]], %bb.4
+  ; CHECK-NEXT:   [[SUB1:%[0-9]+]]:gpr = SUB $x0, [[PHI]]
+  ; CHECK-NEXT:   $x10 = COPY [[SUB1]]
+  ; CHECK-NEXT:   PseudoRET implicit $x10
+entry:
+  %call = call i32 @strcmp(ptr @str1, ptr %x)
+  ret i32 %call
+}
+
+define i32 @test_2(ptr %x) {
+  ; CHECK-LABEL: name: test_2
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   successors: %bb.2(0x40000000), %bb.4(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gpr = COPY $x10
+  ; CHECK-NEXT:   [[LBU:%[0-9]+]]:gpr = LBU [[COPY]], 0 :: (load (s8) from %ir.x)
+  ; CHECK-NEXT:   BEQ [[LBU]], $x0, %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4.entry:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.3(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK...
[truncated]

mgudim · 2024-04-02T04:51:46Z

llvm/lib/Target/RISCV/RISCVSelectionDAGTargetInfo.cpp

+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineMemOperand *MMO = MF.getMachineMemOperand(
+      MPI, MachineMemOperand::MOLoad, LLT(MVT::i8), Align(1));
+  // TODO: what should be the MemVT?


What is the proper way to indicate that this will read a known number of bytes from its argument?

mgudim · 2024-04-02T05:59:56Z

Some possible further work:

- align all constant strings to 8 bytes
- vector version of this

mgudim · 2024-05-09T14:44:56Z

This was implemented in #89371 recently.
Closing.

mgudim requested review from topperc and dybv-sc March 26, 2024 09:19

topperc reviewed Mar 26, 2024

mgudim force-pushed the strcmp branch 5 times, most recently from 44462ca to e934e31 Compare April 2, 2024 04:43

mgudim changed the title ~~[RISCV][WIP] Emit code for strcmp for unaligned strings when one stri…~~ [RISCV] Implement EmitTargetCodeForStrcmp for unaligned case. Apr 2, 2024

mgudim marked this pull request as ready for review April 2, 2024 04:46

llvmbot added the backend:RISC-V label Apr 2, 2024

mgudim force-pushed the strcmp branch from e934e31 to f68dd9d Compare April 2, 2024 05:58

mgudim closed this May 9, 2024
