
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes #86149

Open · wants to merge 11 commits into base: main
Conversation

SLTozer
Contributor

@SLTozer SLTozer commented Mar 21, 2024

This patch is part of a set of patches that add an -fextend-lifetimes flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable -fextend-lifetimes, and an -fextend-this-ptr flag which functions as -fextend-lifetimes for the this pointer only. All changes and tests in these patches were written by @wolfy1961, though I will be managing these reviews and addressing any comments raised. The extend-lifetimes flag is intended eventually to be enabled by -Og, as discussed in the RFC here.


This patch implements a new intrinsic instruction in LLVM, llvm.fake.use in IR and FAKE_USE in MIR, that takes a single operand and has no effect other than "using" its operand, ensuring that the operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, so that for each variable (or this) a call to fake.use is inserted at the end of that variable's scope, taking the variable's value as its operand. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA).
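As an illustrative sketch (hypothetical IR, not taken from the patch), the frontend-emitted form might look like this:

```llvm
define i32 @foo(i32 %param) {
entry:
  %x = add i32 %param, 1
  %result = mul i32 %x, 2
  ; Inserted at the end of %x's scope: the call has no semantic effect,
  ; but it counts as a use, so %x is kept live until this point.
  call void (...) @llvm.fake.use(i32 %x)
  ret i32 %result
}

; Declared variadic so a single operand of any type can be passed.
declare void @llvm.fake.use(...)
```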

@llvmbot
Collaborator

llvmbot commented Mar 21, 2024

@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-ir

Author: Stephen Tozer (SLTozer)

Changes



Patch is 75.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86149.diff

46 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+36)
  • (modified) llvm/include/llvm/Analysis/PtrUseVisitor.h (+1)
  • (modified) llvm/include/llvm/CodeGen/ISDOpcodes.h (+5)
  • (modified) llvm/include/llvm/CodeGen/MachineInstr.h (+2)
  • (modified) llvm/include/llvm/CodeGen/SelectionDAGISel.h (+1)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+3)
  • (modified) llvm/include/llvm/Support/TargetOpcodes.def (+3)
  • (modified) llvm/include/llvm/Target/Target.td (+8)
  • (modified) llvm/lib/CodeGen/Analysis.cpp (+2-1)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+43)
  • (modified) llvm/lib/CodeGen/CodeGenPrepare.cpp (+40-2)
  • (modified) llvm/lib/CodeGen/DeadMachineInstructionElim.cpp (+2-1)
  • (modified) llvm/lib/CodeGen/MachineCSE.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+2-1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/FastISel.cpp (+3)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp (+20)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (+19)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h (+7)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp (+11)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+36-1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+18)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp (+2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp (+64)
  • (modified) llvm/lib/IR/Instruction.cpp (+4-1)
  • (modified) llvm/lib/IR/Verifier.cpp (+1)
  • (modified) llvm/lib/Target/X86/X86FloatingPoint.cpp (+32)
  • (modified) llvm/lib/Transforms/Scalar/SROA.cpp (+30)
  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+6)
  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+3)
  • (modified) llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp (+2-1)
  • (added) llvm/test/CodeGen/MIR/X86/fake-use-phi.mir (+95)
  • (added) llvm/test/CodeGen/MIR/X86/fake-use-scheduler.mir (+115)
  • (added) llvm/test/CodeGen/MIR/X86/fake-use-tailcall.mir (+106)
  • (added) llvm/test/CodeGen/MIR/X86/fake-use-zero-length.ll (+40)
  • (added) llvm/test/CodeGen/X86/fake-use-hpfloat.ll (+17)
  • (added) llvm/test/CodeGen/X86/fake-use-ld.ll (+51)
  • (added) llvm/test/CodeGen/X86/fake-use-simple-tail-call.ll (+33)
  • (added) llvm/test/CodeGen/X86/fake-use-split-ret.ll (+53)
  • (added) llvm/test/CodeGen/X86/fake-use-sroa.ll (+54)
  • (added) llvm/test/CodeGen/X86/fake-use-suppress-load.ll (+23)
  • (added) llvm/test/CodeGen/X86/fake-use-tailcall.ll (+23)
  • (added) llvm/test/CodeGen/X86/fake-use-vector.ll (+45)
  • (added) llvm/test/CodeGen/X86/fake-use-vector2.ll (+33)
  • (added) llvm/test/DebugInfo/X86/Inputs/check-fake-use.py (+100)
  • (added) llvm/test/DebugInfo/X86/fake-use.ll (+98)
  • (added) llvm/test/Transforms/GVN/fake-use-constprop.ll (+69)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8bc1cab01bf0a6..e7b8af3a2116c7 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -27896,6 +27896,42 @@ execution, but is unknown at compile time.
 If the result value does not fit in the result type, then the result is
 a :ref:`poison value <poisonvalues>`.
 
+.. _llvm_fake_use:
+
+'``llvm.fake.use``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare void @llvm.fake.use(...)
+
+Overview:
+"""""""""
+
+The ``llvm.fake.use`` intrinsic is a no-op. It takes a single
+value as an operand and is treated as a use of that operand, to force the
+optimizer to preserve that value prior to the fake use. This is used for
+extending the lifetimes of variables, where this intrinsic placed at the end of
+a variable's scope helps prevent that variable from being optimized out.
+
+Arguments:
+""""""""""
+
+The ``llvm.fake.use`` intrinsic takes one argument, which may be any
+function-local SSA value. Note that the signature is variadic so that the
+intrinsic can take any type of argument, but passing more than one argument will
+result in an error.
+
+Semantics:
+""""""""""
+
+This intrinsic does nothing, but optimizers must consider it a use of its single
+operand and should try to preserve the intrinsic and its position in the
+function.
+
 
 Stack Map Intrinsics
 --------------------
diff --git a/llvm/include/llvm/Analysis/PtrUseVisitor.h b/llvm/include/llvm/Analysis/PtrUseVisitor.h
index 86206b2d5e9f88..4e46d441fce17e 100644
--- a/llvm/include/llvm/Analysis/PtrUseVisitor.h
+++ b/llvm/include/llvm/Analysis/PtrUseVisitor.h
@@ -279,6 +279,7 @@ class PtrUseVisitor : protected InstVisitor<DerivedT>,
     default:
       return Base::visitIntrinsicInst(II);
 
+    case Intrinsic::fake_use:
     case Intrinsic::lifetime_start:
     case Intrinsic::lifetime_end:
       return; // No-op intrinsics.
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index 49d51a27e3c0f6..43aa02b5b03762 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1304,6 +1304,11 @@ enum NodeType {
   LIFETIME_START,
   LIFETIME_END,
 
+  /// FAKE_USE represents a use of the operand but does not do anything.
+  /// Its purpose is the extension of the operand's lifetime mainly for
+  /// debugging purposes.
+  FAKE_USE,
+
   /// GC_TRANSITION_START/GC_TRANSITION_END - These operators mark the
   /// beginning and end of GC transition  sequence, and carry arbitrary
   /// information that target might need for lowering.  The first operand is
diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index fcdd73d8b65fdd..163bcf6aa5a1a2 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h
@@ -1401,6 +1401,8 @@ class MachineInstr
     return getOpcode() == TargetOpcode::EXTRACT_SUBREG;
   }
 
+  bool isFakeUse() const { return getOpcode() == TargetOpcode::FAKE_USE; }
+
   /// Return true if the instruction behaves like a copy.
   /// This does not include native copy instructions.
   bool isCopyLike() const {
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGISel.h b/llvm/include/llvm/CodeGen/SelectionDAGISel.h
index 837f8bf7263ea9..cb8dec554a57b6 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGISel.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGISel.h
@@ -453,6 +453,7 @@ class SelectionDAGISel : public MachineFunctionPass {
   void Select_READ_REGISTER(SDNode *Op);
   void Select_WRITE_REGISTER(SDNode *Op);
   void Select_UNDEF(SDNode *N);
+  void Select_FAKE_USE(SDNode *N);
   void CannotYetSelect(SDNode *N);
 
   void Select_FREEZE(SDNode *N);
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 091f9b38107989..23a9f586056267 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1778,6 +1778,9 @@ def int_is_constant : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_any_ty],
                                 [IntrNoMem, IntrWillReturn, IntrConvergent],
                                 "llvm.is.constant">;
 
+// Introduce a use of the argument without generating any code.
+def int_fake_use : Intrinsic<[], [llvm_vararg_ty]>;
+
 // Intrinsic to mask out bits of a pointer.
 // First argument must be pointer or vector of pointer. This is checked by the
 // verifier.
diff --git a/llvm/include/llvm/Support/TargetOpcodes.def b/llvm/include/llvm/Support/TargetOpcodes.def
index 899eaad5842ae0..9797bbe02a9b5c 100644
--- a/llvm/include/llvm/Support/TargetOpcodes.def
+++ b/llvm/include/llvm/Support/TargetOpcodes.def
@@ -217,6 +217,9 @@ HANDLE_TARGET_OPCODE(PATCHABLE_TYPED_EVENT_CALL)
 
 HANDLE_TARGET_OPCODE(ICALL_BRANCH_FUNNEL)
 
+/// Represents a use of the operand but generates no code.
+HANDLE_TARGET_OPCODE(FAKE_USE)
+
 // This is a fence with the singlethread scope. It represents a compiler memory
 // barrier, but does not correspond to any generated instruction.
 HANDLE_TARGET_OPCODE(MEMBARRIER)
diff --git a/llvm/include/llvm/Target/Target.td b/llvm/include/llvm/Target/Target.td
index cb1c0ed2513d45..2f5d9a5b8ea877 100644
--- a/llvm/include/llvm/Target/Target.td
+++ b/llvm/include/llvm/Target/Target.td
@@ -1401,6 +1401,14 @@ def FAULTING_OP : StandardPseudoInstruction {
   let isTerminator = true;
   let isBranch = true;
 }
+def FAKE_USE : StandardPseudoInstruction {
+  // An instruction that uses its operands but does nothing.
+  let OutOperandList = (outs);
+  let InOperandList = (ins variable_ops);
+  let AsmString = "FAKE_USE";
+  let hasSideEffects = 0;
+  let isMeta = true;
+}
 def PATCHABLE_OP : StandardPseudoInstruction {
   let OutOperandList = (outs);
   let InOperandList = (ins variable_ops);
diff --git a/llvm/lib/CodeGen/Analysis.cpp b/llvm/lib/CodeGen/Analysis.cpp
index af7643d93591f7..05840c963ea526 100644
--- a/llvm/lib/CodeGen/Analysis.cpp
+++ b/llvm/lib/CodeGen/Analysis.cpp
@@ -566,7 +566,8 @@ bool llvm::isInTailCallPosition(const CallBase &Call, const TargetMachine &TM) {
     if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(BBI))
       if (II->getIntrinsicID() == Intrinsic::lifetime_end ||
           II->getIntrinsicID() == Intrinsic::assume ||
-          II->getIntrinsicID() == Intrinsic::experimental_noalias_scope_decl)
+          II->getIntrinsicID() == Intrinsic::experimental_noalias_scope_decl ||
+          II->getIntrinsicID() == Intrinsic::fake_use)
         continue;
     if (BBI->mayHaveSideEffects() || BBI->mayReadFromMemory() ||
         !isSafeToSpeculativelyExecute(&*BBI))
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index a15538755d73b3..1e680630e6cb58 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1099,11 +1099,46 @@ void AsmPrinter::emitFunctionEntryLabel() {
   }
 }
 
+// Recognize cases where a spilled register is reloaded solely to feed into a
+// FAKE_USE.
+static bool isLoadFeedingIntoFakeUse(const MachineInstr &MI) {
+  const MachineFunction *MF = MI.getMF();
+  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
+
+  // If the restore size is std::nullopt then we are not dealing with a reload
+  // of a spilled register.
+  if (!MI.getRestoreSize(TII))
+    return false;
+
+  // Check if this register is the operand of a FAKE_USE and
+  // does it have the kill flag set there.
+  auto NextI = std::next(MI.getIterator());
+  if (NextI == MI.getParent()->end() || !NextI->isFakeUse())
+    return false;
+
+  unsigned Reg = MI.getOperand(0).getReg();
+  for (const MachineOperand &MO : NextI->operands()) {
+    // Return true if we came across the register from the
+    // previous spill instruction that is killed in NextI.
+    if (MO.isReg() && MO.isUse() && MO.isKill() && MO.getReg() == Reg)
+      return true;
+  }
+
+  return false;
+}
+
 /// emitComments - Pretty-print comments for instructions.
 static void emitComments(const MachineInstr &MI, raw_ostream &CommentOS) {
   const MachineFunction *MF = MI.getMF();
   const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
 
+  // If this is a reload of a spilled register that only feeds into a FAKE_USE
+  // instruction, meaning the load value has no effect on the program and has
+  // only been kept alive for debugging; since it is still available on the
+  // stack, we can skip the load itself.
+  if (isLoadFeedingIntoFakeUse(MI))
+    return;
+
   // Check for spills and reloads
 
   // We assume a single instruction only has a spill or reload, not
@@ -1828,6 +1863,8 @@ void AsmPrinter::emitFunctionBody() {
       case TargetOpcode::KILL:
         if (isVerbose()) emitKill(&MI, *this);
         break;
+      case TargetOpcode::FAKE_USE:
+        break;
       case TargetOpcode::PSEUDO_PROBE:
         emitPseudoProbe(MI);
         break;
@@ -1843,6 +1880,12 @@ void AsmPrinter::emitFunctionBody() {
         // purely meta information.
         break;
       default:
+        // If this is a reload of a spilled register that only feeds into a
+        // FAKE_USE instruction, meaning the load value has no effect on the
+        // program and has only been kept alive for debugging; since it is
+        // still available on the stack, we can skip the load itself.
+        if (isLoadFeedingIntoFakeUse(MI))
+          break;
         emitInstruction(&MI);
         if (CanDoExtraAnalysis) {
           MCInst MCI;
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 9f99bb7e693f7e..a2314779c8e687 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -2663,12 +2663,34 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
     return false;
   };
 
+  SmallVector<const IntrinsicInst *, 4> FakeUses;
+
+  auto isFakeUse = [&FakeUses](const Instruction *Inst) {
+    if (auto *II = dyn_cast<IntrinsicInst>(Inst);
+        II && II->getIntrinsicID() == Intrinsic::fake_use) {
+      // Record the instruction so it can be preserved when the exit block is
+      // removed. Do not preserve the fake use that uses the result of the
+      // PHI instruction.
+      // Do not copy fake uses that use the result of a PHI node.
+      // FIXME: If we do want to copy the fake use into the return blocks, we
+      // have to figure out which of the PHI node operands to use for each
+      // copy.
+      if (!isa<PHINode>(II->getOperand(0))) {
+        FakeUses.push_back(II);
+      }
+      return true;
+    }
+
+    return false;
+  };
+
   // Make sure there are no instructions between the first instruction
   // and return.
   const Instruction *BI = BB->getFirstNonPHI();
   // Skip over debug and the bitcast.
   while (isa<DbgInfoIntrinsic>(BI) || BI == BCI || BI == EVI ||
-         isa<PseudoProbeInst>(BI) || isLifetimeEndOrBitCastFor(BI))
+         isa<PseudoProbeInst>(BI) || isLifetimeEndOrBitCastFor(BI) ||
+         isFakeUse(BI))
     BI = BI->getNextNode();
   if (BI != RetI)
     return false;
@@ -2677,6 +2699,7 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
   /// call.
   const Function *F = BB->getParent();
   SmallVector<BasicBlock *, 4> TailCallBBs;
+  SmallVector<CallInst *, 4> CallInsts;
   if (PN) {
     for (unsigned I = 0, E = PN->getNumIncomingValues(); I != E; ++I) {
       // Look through bitcasts.
@@ -2710,6 +2733,9 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
             attributesPermitTailCall(F, CI, RetI, *TLI))
           TailCallBBs.push_back(PredBB);
       }
+      // Record the call instruction so we can insert any fake uses
+      // that need to be preserved before it.
+      CallInsts.push_back(CI);
     }
   } else {
     SmallPtrSet<BasicBlock *, 4> VisitedBBs;
@@ -2726,6 +2752,9 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
               (isIntrinsicOrLFToBeTailCalled(TLInfo, CI) &&
                V == CI->getArgOperand(0))) {
             TailCallBBs.push_back(Pred);
+            // Record the call instruction so we can insert any fake uses
+            // that need to be preserved before it.
+            CallInsts.push_back(CI);
           }
         }
       }
@@ -2752,8 +2781,17 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB,
   }
 
   // If we eliminated all predecessors of the block, delete the block now.
-  if (Changed && !BB->hasAddressTaken() && pred_empty(BB))
+  if (Changed && !BB->hasAddressTaken() && pred_empty(BB)) {
+    // Copy the fake uses found in the original return block to all blocks
+    // that contain tail calls.
+    for (auto *CI : CallInsts) {
+      for (auto const *FakeUse : FakeUses) {
+        auto *ClonedInst = FakeUse->clone();
+        ClonedInst->insertBefore(CI);
+      }
+    }
     BB->eraseFromParent();
+  }
 
   return Changed;
 }
diff --git a/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp b/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
index facc01452d2f12..468949fd4cf498 100644
--- a/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
+++ b/llvm/lib/CodeGen/DeadMachineInstructionElim.cpp
@@ -87,7 +87,8 @@ bool DeadMachineInstructionElimImpl::isDead(const MachineInstr *MI) const {
     return false;
 
   // Don't delete frame allocation labels.
-  if (MI->getOpcode() == TargetOpcode::LOCAL_ESCAPE)
+  if (MI->getOpcode() == TargetOpcode::LOCAL_ESCAPE ||
+      MI->getOpcode() == TargetOpcode::FAKE_USE)
     return false;
 
   // Don't delete instructions with side effects.
diff --git a/llvm/lib/CodeGen/MachineCSE.cpp b/llvm/lib/CodeGen/MachineCSE.cpp
index 26a8d00e662651..c3c7f48677d5c2 100644
--- a/llvm/lib/CodeGen/MachineCSE.cpp
+++ b/llvm/lib/CodeGen/MachineCSE.cpp
@@ -406,7 +406,7 @@ bool MachineCSE::PhysRegDefsReach(MachineInstr *CSMI, MachineInstr *MI,
 
 bool MachineCSE::isCSECandidate(MachineInstr *MI) {
   if (MI->isPosition() || MI->isPHI() || MI->isImplicitDef() || MI->isKill() ||
-      MI->isInlineAsm() || MI->isDebugInstr() || MI->isJumpTableDebugInfo())
+      MI->isInlineAsm() || MI->isDebugInstr() || MI->isJumpTableDebugInfo() || MI->isFakeUse())
     return false;
 
   // Ignore copies.
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 0d5bf329938781..1cc611107e5984 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -524,7 +524,8 @@ static bool isSchedBoundary(MachineBasicBlock::iterator MI,
                             MachineBasicBlock *MBB,
                             MachineFunction *MF,
                             const TargetInstrInfo *TII) {
-  return MI->isCall() || TII->isSchedulingBoundary(*MI, MBB, *MF);
+  return MI->isCall() || TII->isSchedulingBoundary(*MI, MBB, *MF) ||
+         MI->isFakeUse();
 }
 
 /// A region of an MBB for scheduling.
diff --git a/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp b/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
index 27b8472ddb73d8..5451abcdfcf9d2 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp
@@ -1461,6 +1461,9 @@ bool FastISel::selectIntrinsicCall(const IntrinsicInst *II) {
     updateValueMap(II, ResultReg);
     return true;
   }
+  case Intrinsic::fake_use:
+    // At -O0, we don't need fake use, so just ignore it.
+    return true;
   case Intrinsic::experimental_stackmap:
     return selectStackmap(II);
   case Intrinsic::experimental_patchpoint_void:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index 3332c02ec72358..ac1242ff59f164 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -2229,6 +2229,9 @@ bool DAGTypeLegalizer::PromoteFloatOperand(SDNode *N, unsigned OpNo) {
       report_fatal_error("Do not know how to promote this operator's operand!");
 
     case ISD::BITCAST:    R = PromoteFloatOp_BITCAST(N, OpNo); break;
+    case ISD::FAKE_USE:
+      R = PromoteFloatOp_FAKE_USE(N, OpNo);
+      break;
     case ISD::FCOPYSIGN:  R = PromoteFloatOp_FCOPYSIGN(N, OpNo); break;
     case ISD::FP_TO_SINT:
     case ISD::FP_TO_UINT:
@@ -2268,6 +2271,13 @@ SDValue DAGTypeLegalizer::PromoteFloatOp_BITCAST(SDNode *N, unsigned OpNo) {
   return DAG.getBitcast(N->getValueType(0), Convert);
 }
 
+SDValue DAGTypeLegalizer::PromoteFloatOp_FAKE_USE(SDNode *N, unsigned OpNo) {
+  assert(OpNo == 1 && "Only Operand 1 must need promotion here");
+  SDValue Op = GetPromotedFloat(N->getOperand(OpNo));
+  return DAG.getNode(N->getOpcode(), SDLoc(N), MVT::Other, N->getOperand(0),
+                     Op);
+}
+
 // Promote Operand 1 of FCOPYSIGN.  Operand 0 ought to be handled by
 // PromoteFloatRes_FCOPYSIGN.
 SDValue DAGTypeLegalizer::PromoteFloatOp_FCOPYSIGN(SDNode *N, unsigned OpNo) {
@@ -3138,6 +3148,9 @@ bool DAGTypeLegalizer::SoftPromoteHalfOperand(SDNode *N, unsigned OpNo) {
                        "operand!");
 
   case ISD::BITCAST:    Res = SoftPromoteHalfOp_BITCAST(N); break;
+  case ISD::FAKE_USE:
+    Res = SoftPromoteHalfOp_FAKE_USE(N, OpNo);
+    break;
   case ISD::FCOPYSIGN:  Res = SoftPromoteHalfOp_FCOPYSIGN(N, OpNo); break;
   case ISD::FP_TO_SINT:
   case ISD::FP_TO_UINT: Res = SoftPromoteHalfOp_FP_TO_XINT(N); break;
@@ -3175,6 +3188,13 @@ SDValue DAGTypeLegalizer::SoftPromoteHalfOp_BITCAST(SDNode *N) {
   return DAG.getNode(ISD::BITCAST, SDLoc(N), N->getValueType(0), Op0);
 }
 
+SDValue DAGTypeLegalizer::SoftPromoteHalfOp_FAKE_USE(SDNode *N, unsigned OpNo) {
+  assert(OpNo == 1 && "Only Operand 1 must need promotion here");
+  SDValue Op = GetSoftPromotedHalf(N->getOperand(OpNo));
+  return DAG.getNode(N->getOpcode(), SDLoc(N), MVT::Other, N->getOperand(0),
+                     Op);
+}
+
 SDValue DAGTypeLegalizer::SoftPromoteHalfOp_FCOPYSIGN(SDNode *N,
                                                       unsigned OpNo) {
   assert(OpNo == 1 && "Only Operand 1 must need promotion here");
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 93ce9c22af5525..022cb9a4a82cdc 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -1818,6 +1818,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
   case ISD::BUILD_VECTOR: Res = PromoteIntOp_BUILD_VECTOR(N); break;
   case ISD::CONCAT_VECTORS: Res = PromoteIntOp_CONCAT_VECTORS(N); break;
   case ISD::EXTRACT_VECTOR_ELT: Res = PromoteIntOp_EXTRACT_VECTOR_ELT(N); break;
+  case ISD::FAKE_USE:
+    Res = PromoteIntOp_FAKE_USE(N);
+    break;
   case ISD::INSERT_VECTOR_ELT:
     Res = PromoteIntOp_INSERT_VECTOR_ELT(N, OpNo);
     break;
@@ -5124,6 +5127,9 @@ bool DAGTypeLegalizer::ExpandIntegerOperand(SDNode *N, unsigned OpNo) {
   case ISD::BR_CC:             Res = ExpandIntOp_BR_CC(N); break;
   case ISD::BUILD_VECTOR:      Res = ExpandOp_BUILD_VECTOR(N); break;
   case ISD::EXTRACT_ELEMENT:   Res = ExpandOp_EXTRACT_ELEMENT(N); break;
+  case ISD::FAKE_USE:
+    Res = ExpandOp_FAKE_USE(N);
+    break;
   case ISD::INSERT_VECTOR_ELT: Res = ExpandOp_INSERT_VECTOR_ELT(N); break;
   case ISD::SCALAR_TO_VECTOR:  Res = ExpandOp_SCALAR_TO_VECTOR(N); break;
   case ISD::SPLAT_VECTOR:      Res = ExpandIntOp_SPLAT_VECTOR(N); break;
@@ -5931,6 +5937,19 @@ SDValue DAGTypeLegalizer::PromoteIntOp_INSERT_SUBVECTOR(SDNode *N) {
   return DAG.getAnyExtOrTrunc(Ext, dl, N->getValueType(0));
 }
 
+// FIXME: We wouldn't need this if clang could promote short integers
+// that are arguments to FAKE_USE.
+SDValue DAGTypeLegalizer::PromoteIntOp_FAKE_USE(SDNode *N) {
+  SDLoc dl(N);
+  SDValue V0 = N->getOperand(0);
+  SDValue V1 = N->getOperand(1);
+  EVT InVT1 = V1.getValueType();
+  SDValue VPromoted =
+      DAG.getNode(ISD::ANY_EXTEND, dl,
+                  TLI.getTypeToTransformTo(*DAG.getContext(), InVT1), V1);
+  return DAG.getNode(N->getOpcode(), dl, N->getValueType(0), V0, VPromoted);
+}
+
 SDValue DAGTypeLegalizer::PromoteIntOp_EXTRACT_SUBVECTOR(SDNode *N) {
   SDLoc dl(N);
   SDValue V0 = GetPromotedInteger(N->getOperand(0));
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index e08acd36b41d4e..6a3f76ae4b6ded 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeG...
[truncated]


github-actions bot commented Mar 21, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff f5c90f3000bc75a344bf01bd4e0401e3fb7f9453 c8b6aa4c5733eab712314eff67043ca3c1ee9431 -- llvm/include/llvm/Analysis/PtrUseVisitor.h llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/MachineInstr.h llvm/include/llvm/CodeGen/SelectionDAGISel.h llvm/lib/CodeGen/Analysis.cpp llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/DeadMachineInstructionElim.cpp llvm/lib/CodeGen/MachineCSE.cpp llvm/lib/CodeGen/MachineScheduler.cpp llvm/lib/CodeGen/SelectionDAG/FastISel.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp llvm/lib/IR/Instruction.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Target/X86/X86FloatingPoint.cpp llvm/lib/Transforms/Scalar/SROA.cpp llvm/lib/Transforms/Utils/CloneFunction.cpp llvm/lib/Transforms/Utils/Local.cpp llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/MachineCSE.cpp b/llvm/lib/CodeGen/MachineCSE.cpp
index c3c7f48677..a1e33d4b67 100644
--- a/llvm/lib/CodeGen/MachineCSE.cpp
+++ b/llvm/lib/CodeGen/MachineCSE.cpp
@@ -406,7 +406,8 @@ bool MachineCSE::PhysRegDefsReach(MachineInstr *CSMI, MachineInstr *MI,
 
 bool MachineCSE::isCSECandidate(MachineInstr *MI) {
   if (MI->isPosition() || MI->isPHI() || MI->isImplicitDef() || MI->isKill() ||
-      MI->isInlineAsm() || MI->isDebugInstr() || MI->isJumpTableDebugInfo() || MI->isFakeUse())
+      MI->isInlineAsm() || MI->isDebugInstr() || MI->isJumpTableDebugInfo() ||
+      MI->isFakeUse())
     return false;
 
   // Ignore copies.

@felipepiovezan
Contributor

I only skimmed the diff, but noticed there is no global isel lowering here. Were there any challenges in doing that or was it not considered?

@pogo59
Collaborator

pogo59 commented Mar 21, 2024

I only skimmed the diff, but noticed there is no global isel lowering here. Were there any challenges in doing that or was it not considered?

Our target of interest is X86, so I believe it was not considered. @wolfy1961 can correct me if I'm misremembering.

@SLTozer
Contributor Author

SLTozer commented Mar 21, 2024

I only skimmed the diff, but noticed there is no global isel lowering here. Were there any challenges in doing that or was it not considered?

I wasn't the original patch writer, but I think it would be the latter. I'm not familiar with the internals of GlobalISel, so I'm not sure how confident I'd be implementing support for fake use there myself - though FWIW, since it's a no-op, it's mostly just a matter of filling in the boilerplate, with a few more complex cases (such as moving fake uses before tail calls).
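For illustration, a minimal MIR sketch of what this boils down to (hypothetical, not taken from the patch's tests): FAKE_USE is a no-op pseudo whose only job is to keep its operand live, and AsmPrinter emits no code for it:

```
name: demo
body: |
  bb.0:
    liveins: $edi
    ; Copy the argument; after this, $edi has no real uses left.
    $eax = MOV32rr $edi
    ; Keeps $edi live (and therefore describable by debug info) until
    ; this point; no machine code is generated for this instruction.
    FAKE_USE killed $edi
    RET64 $eax
```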

@felipepiovezan
Contributor

felipepiovezan commented Mar 21, 2024

I only skimmed the diff, but noticed there is no global isel lowering here. Were there any challenges in doing that or was it not considered?

I wasn't the original patch writer, but I think it would be the latter. I'm not familiar with the internals of GlobalISel, so I'm not sure how confident I'd be implementing support for fake use there myself - though FWIW, since it's a no-op it's mostly just filling in the boilerplate, with a few more complex cases (such as moving fake uses before tail calls).

The reason I asked is -- and I could be wrong -- that GlobalISel will give up and fall back to SelectionDAG when it sees something it doesn't know how to lower. That usually happens in IRTranslator.cpp:

        if (translate(Inst))
          continue;

        ...
        reportTranslationError(*MF, *TPC, *ORE, R);
        return false;
      }

So, depending on how/when these intrinsics are generated, this patch could affect a few different targets.

edit: not this patch, but the one that would generate the intrinsics

@adrian-prantl
Collaborator

So, depending on how/when these intrinsics are generated, this patch could affect a few different targets.

And it would introduce a situation we usually try to avoid: the presence of debug info could significantly change code generation.

@wolfy1961
Collaborator

I only skimmed the diff, but noticed there is no global isel lowering here. Were there any challenges in doing that or was it not considered?

IIRC, the original patch predated GlobalISel, or GlobalISel was in a very early stage at the time, so it was simply not considered.

Our target of interest is X86, so I believe it was not considered. @wolfy1961 can correct me if I'm mis-remembering.

The patch wasn't meant to be specific to any target, although our efforts were concentrating on X86.

@SLTozer
Contributor Author

SLTozer commented Mar 21, 2024

So, depending on how/when these intrinsics are generated, this patch could affect a few different targets.

Interesting; I think falling back on SelectionDAG would produce a relatively normal result (emitting FAKE_USE), but it's hard to say whether there would be any knock-on effects. In the worst-case scenario we could just drop fake uses in GlobalISel - that should be safe, since the intrinsics themselves are no-ops; it just means the values would no longer be protected from the start of ISel onwards.

And it would introduce a situation we usually try to avoid: the presence of debug info could significantly change code generation.

This intrinsic has no technical relation to debug info; the purpose is to improve debug info quality when enabled, but the codegen effects of this flag are unrelated to whether debug info is emitted or not; if you use -fextend-lifetimes without -g, you'll prevent optimizations that would have harmed debug info even when that debug info isn't present.
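For concreteness, the IR shape this enables looks roughly like the following (a hand-written sketch in the spirit of this patch, not output from it; the function and callee names are invented):

```llvm
define void @f(i32 %x) {
entry:
  %sum = add i32 %x, 1
  call void @use(i32 %sum)
  ; Without the fake use, %x is dead after the add and its register or
  ; stack slot can be reused; the fake use keeps %x live until the end
  ; of its scope, whether or not -g is also passed.
  notail call void (...) @llvm.fake.use(i32 %x)
  ret void
}

declare void @use(i32)
declare void @llvm.fake.use(...)
```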

@wolfy1961
Collaborator

So, depending on how/when these intrinsics are generated, this patch could affect a few different targets.

And it would introduce a situation we usually try to avoid: the presence of debug info could significantly change code generation.

Conceptually, though, extend-lifetimes is purely a codegen change. It could stand on its own, even though it would only be useful for improving debuggability. I think it's important to separate the two concepts: debug info generation by itself does not trigger extend-lifetimes; extend-lifetimes has to be requested explicitly.

@felipepiovezan
Contributor

Interesting; I think falling back on SelectionDAG would produce a relatively normal result (emitting FAKE_USE),

To clarify, the entire function would be skipped by GlobalISel, not just the lowering of one instruction.

@felipepiovezan
Contributor

Conceptually, though, extend-lifetimes is purely a codegen change. It could stand on its own, even though it would only be useful to improve debuggability. I think it's important to separate the 2 concepts. Debug info generation by itself does not trigger extend-lifetimes unless we specifically also request extend-lifetimes as well.

Isn't the following statement contradicting the above?

The extend lifetimes flag is intended to eventually be set on by -Og

@SLTozer
Contributor Author

SLTozer commented Mar 21, 2024

To clarify, the entire function would be skipped by GlobalISel, not just the lowering of one instruction.

Ah, I misunderstood - in which case forcing it to be dropped until a complete implementation is present would be preferable.

Isn't the following statement contradicting the above?

-Og is also purely a codegen change, it doesn't control whether or not debug info is emitted; in both cases, we're telling the compiler to change codegen to be better for debugging.

@arsenm
Contributor

arsenm commented Mar 22, 2024

Passing a pseudo through GlobalISel should be trivial; it will just need legalization handling approximately equivalent to whatever every target does for G_IMPLICIT_DEF.

@pogo59
Collaborator

pogo59 commented Apr 25, 2024

There was also a lightning talk at the 2017 US Dev Meeting. Our users have been happy with this feature for years.

@jryans
Member

jryans commented Apr 25, 2024

I hope to provide an additional review of this work today / tomorrow.

Member

@jryans jryans left a comment


Similar to @OCHyams, this looks good to me overall, though it does touch some parts I am not familiar with (like the legalizer). Sony's multi-year usage of this downstream adds confidence in the work. The feature as a whole has received positive feedback in various discussions, so it would be great to move forward here.

I believe the only major missing component is GlobalISel support. Once that has been added, I think this would be ready to go.

Comment on lines 1104 to 1107
static bool isLoadFeedingIntoFakeUse(const MachineInstr &MI) {
const MachineFunction *MF = MI.getMF();
const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();

Contributor

Should probably just pass in TII instead of doing the 4 level dance to get to it

Comment on lines 1120 to 1125
for (const MachineOperand &MO : NextI->operands()) {
// Return true if we came across the register from the
// previous spill instruction that is killed in NextI.
if (MO.isReg() && MO.isUse() && MO.isKill() && MO.getReg() == Reg)
return true;
}
Contributor

This doesn't really work correctly for register aliases. Is this the same as MI.killsRegister?

Also it would be best to avoid relying on kill flags. Hopefully we will eliminate them someday

Contributor Author

Do you have a suggestion for what to use instead? The intent here is to determine whether the only reason this load exists is for the immediately following fake_use; if we just removed the check for MO.isKill(), this would presumably lead to us deleting loads that are actually being used. Is there some alternative information, such as an analysis, that can determine whether any other instruction is using the loaded value?

Contributor

The preferred method is to maintain LiveRegUnits while walking from the end of the block (which is the wrong direction for the asmprinter...)

@SLTozer
Contributor Author

SLTozer commented Jun 4, 2024

Apologies for the delay on updating this patch, my attention has been elsewhere for a while. I've added GlobalISel support that seems to work - in the new and updated check-fake-use.py, we test that we can successfully extend the lifetimes of variables for --global-isel=1 -mtriple=aarch64--linux-gnu.

Re: check-fake-use.py, I agree that it's not ideal to need a script for this. Having looked a little more at it, I've changed the script so that instead of checking that the variable's location range covers from DW_AT_low_pc to within 16 bytes of DW_AT_high_pc, we check that the variable's location range covers at least from prologue_end† to epilogue_begin. I personally think testing for this instead of testing for exact byte offsets is more logical and less brittle. We could also get rid of the python script if we currently had a measurement in LLVM for "scope bytes covered excluding prologue/epilogue", but we don't, so for now I think this ought to stay. Thoughts @OCHyams @dwblaikie ?

† It might make sense to use DW_AT_low_pc rather than prologue_end when talking about parameter lifetimes, but for the AArch64 GlobalISel version the prologue precedes the first DBG_VALUE instruction (and this is correct by IR semantics imo!) so I chose prologue_end in this case.
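The core of the check described above can be sketched independently of the dwarfdump parsing. This is a simplified illustration of the idea, not code from check-fake-use.py; the helper name and the interval representation are invented:

```python
def covers_interval(loc_ranges, prologue_end, epilogue_begin):
    """Return True if the union of half-open [lo, hi) location-list
    ranges covers every address in [prologue_end, epilogue_begin)."""
    covered_up_to = prologue_end
    for lo, hi in sorted(loc_ranges):
        if lo > covered_up_to:
            return False  # gap: some address in the interval has no location
        covered_up_to = max(covered_up_to, hi)
        if covered_up_to >= epilogue_begin:
            return True
    return covered_up_to >= epilogue_begin

# A contiguous DW_AT_location list passes; one with a gap fails.
assert covers_interval([(0x0, 0x8), (0x8, 0x30)], 0x4, 0x2C)
assert not covers_interval([(0x0, 0x8), (0x10, 0x30)], 0x4, 0x2C)
```

Testing for coverage of [prologue_end, epilogue_begin) rather than exact byte offsets is what makes the check robust against unrelated codegen changes.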

@dwblaikie
Collaborator

A prologue begin/end solution only applies to top level locals in a function, right? For smaller scopes they won't extend that far and would need to be measured relative to the high/low/ranges bytes covered by the scope the variable is within (& of course that'll always neglect the issues of variables starting their lifetime part-way through their scope ({ foo(); int x; foo(); } won't be live for the first foo call, etc))

Though if llvm-dwarfdump statistics need improvement, I'd hope that could be done rather than building a statistic in a separate script, by the sounds of it?

@SLTozer
Contributor Author

SLTozer commented Jun 5, 2024

A prologue begin/end solution only applies to top level locals in a function, right?

Yes, the main point here is about what's needed to test this specific case more than a general point about llvm-dwarfdump --statistics, although there are several things that could be improved about that measurement as you've suggested. I think to summarize, there is a larger scale problem of how to measure debug scope coverage, and a smaller problem of how to determine whether -fextend-lifetimes has worked, which is conceptually simple (does the variable have a DW_AT_location that covers [prologue_end, epilogue_begin)), but for which there is no existing LLVM component that tells us that.

Though, if using a python script is a major blocker there are a few alternatives I think: we could implement this statistic in llvm-dwarfdump (though I'm not sure whether that would be a justifiable change); we could make this a Dexter test in cross-project-tests (at the cost of involving extra components that aren't really related to the test); or we could extract this information some other way, e.g. using some series of sed commands.

@tru
Collaborator

tru commented Jun 5, 2024

Since python is already a dependency for running the tests I don't see a problem with it. While I am not sure we have tests implemented in python for clang, lldb has a lot of them so it's not unprecedented. I am interested in hearing what the objections would be in this case.

Contributor

@OCHyams OCHyams left a comment

Since python is already a dependency for running the tests I don't see a problem with it. While I am not sure we have tests implemented in python for clang, lldb has a lot of them so it's not unprecedented. I am interested in hearing what the objections would be in this case.

I can't speak for @dwblaikie but my objection was based on the fact that I would personally rather see a more brittle test that is simpler. It's not an objection against using python, just a question of whether it's really needed here.

That said, I don't feel strongly enough to insist and can see the benefit in having a slightly fuzzy test. Especially if it's going to be fiddly due to different debug info emission coming out of SelectionDAG/GlobalISel.

Does the script handle exprloc DW_AT_locations (i.e., single location that implicitly covers the scope with no explicit range)?

; RUN: %llc_dwarf -O2 -debugger-tune=sce --global-isel=1 -mtriple=aarch64--linux-gnu -filetype=obj -dwarf-linkage-names=Abstract < %s | llvm-dwarfdump --debug-info --debug-line -v - -o %t
; RUN: %python %p/Inputs/check-fake-use.py %t
; RUN: sed -e 's,call void (...) @llvm.fake.use,;,' %s | %llc_dwarf - -O2 -debugger-tune=sce -filetype=obj -dwarf-linkage-names=Abstract | llvm-dwarfdump --debug-info --debug-line -v - -o %t
; RUN: not %python %p/Inputs/check-fake-use.py %t
Contributor

check-fake-use.py has a few different fail states - perhaps we should FileCheck for Location list for 'b' does not cover the whole function here and below. wdyt?

Contributor Author

Perhaps - it didn't occur to me because we probably shouldn't see any other fail state emerge as a result of just turning off extend-lifetimes, but no harm in making it more specific!

@SLTozer
Contributor Author

SLTozer commented Jun 6, 2024

Does the script handle exprloc DW_AT_locations (i.e., single location that implicitly covers the scope with no explicit range)?

No, although I think that shouldn't be possible? I think we only emit that for constants or declares.

notail call void (...) @llvm.fake.use(i32 %v)
ret void
}

Contributor

Test a case where the value produced by the call isn't used by the fake use, and is the return value?

Comment on lines 17 to 23

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 9.0.0"}
!2 = !{i32 -2147471544}
Contributor

Don't need the metadata

@@ -0,0 +1,54 @@
; RUN: opt -S -passes=sroa %s | FileCheck %s
Contributor

This should go in test/Transforms/SROA, not codegen

@@ -0,0 +1,53 @@
; RUN: opt -mtriple=x86_64-unknown-unknown -S -codegenprepare <%s -o - | FileCheck %s
Contributor

Move to test/Transforms/CodeGenPrepare/

Comment on lines 18 to 21
; ModuleID = 'test.cpp'
source_filename = "test.cpp"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-unknown"
Contributor

Don't need this

Comment on lines 48 to 53
!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"wchar_size", i32 2}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{!"clang version 7.0.0"}
Contributor

Don't need the metadata

@@ -0,0 +1,17 @@
; assert in DAGlegalizer with fake use of half precision float.
; Changes to half float promotion.
; RUN: llc -O2 -stop-after=finalize-isel -filetype=asm -o - %s | FileCheck %s
Contributor

-O2 and -filetype=asm are the defaults


!0 = !{i32 1, !"wchar_size", i32 2}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{!"clang version 10.0.0"}
Contributor

Missing newline at end of file

@@ -0,0 +1,40 @@
; RUN: llc < %s -stop-after=finalize-isel | FileCheck %s --implicit-check-not=FAKE_USE
Contributor

test/CodeGen/MIR is for testing of the MIR printer/parser, not tests that check MIR output. This should go to test/CodeGen/X86

@@ -0,0 +1,95 @@
# RUN: llc < %s -x mir -run-pass=codegenprepare | FileCheck %s --implicit-check-not="llvm.fake.use"
Contributor

This test doesn't make sense, and I'm surprised this doesn't error on you. You're trying to run an IR pass on MIR, and the IR is supposed to just be a constant here and the MIR should not exist yet. You should move this to test/CodeGenPrepare and run it with opt

@OCHyams
Contributor

OCHyams commented Jun 6, 2024

I said
Does the script handle exprloc DW_AT_locations (i.e., single location that implicitly covers the scope with no explicit range)?

@SLTozer said
No, although I think that shouldn't be possible? I think we only emit that for constants or declares.

It's definitely possible:

$ cat test.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-unknown"

define i32 @f(i32 %a) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %a, metadata !8, metadata !DIExpression()), !dbg !10
  ret i32 0, !dbg !10
}

declare void @llvm.dbg.value(metadata, metadata, metadata)

!llvm.dbg.cu = !{!0}
!llvm.debugify = !{!2, !2}
!llvm.module.flags = !{!3}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "test.ll", directory: "/")
!2 = !{i32 1}
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !5, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !7)
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !{!8}
!8 = !DILocalVariable(name: "1", scope: !4, file: !1, line: 1, type: !9)
!9 = !DIBasicType(name: "ty32", size: 32, encoding: DW_ATE_unsigned)
!10 = !DILocation(line: 1, column: 1, scope: !4)
$ llc test.ll --filetype=obj -o - | llvm-dwarfdump -
<...>
0x00000043:     DW_TAG_variable
                  DW_AT_location        (DW_OP_reg5 RDI)
                  DW_AT_name    ("1")
                  DW_AT_decl_file       ("/test.ll")
                  DW_AT_decl_line       (1)
                  DW_AT_type    (0x00000051 "ty32")

@SLTozer
Contributor Author

SLTozer commented Jun 6, 2024

It's definitely possible:

Ah that's true, if it's going to be live at a single location for every instruction in the function then it'll emit with a fixed location. That shouldn't happen in this test since we have the big register kill operation, but no point in making assumptions when we can add coverage instead - will update.

Comment on lines 1134 to 1137
// If this is a reload of a spilled register that only feeds into a FAKE_USE
// instruction, meaning the load value has no effect on the program and has
// only been kept alive for debugging; since it is still available on the
// stack, we can skip the load itself.
Contributor

I don't understand the comment in this context. The load isn't being skipped, it's just skipping the comments about the instruction

attributes #1 = { nocallback nofree nosync nounwind willreturn }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
Contributor

Can drop this

%0:gr64 = COPY $rdi
%1:gr64_with_sub_8bit = MOVSX64rm32 $rip, 1, $noreg, @glb, $noreg :: (dereferenceable load (s32) from @glb, align 16, !tbaa !2)
MOV32mr %0, 1, $noreg, 0, $noreg, %1.sub_32bit :: (store (s32) into %ir.p, !tbaa !2)
%3:gr64_with_sub_8bit = MOVSX64rm32 $rip, 1, $noreg, @glb + 4, $noreg :: (dereferenceable load (s32) from `ptr getelementptr inbounds ([100 x i32], ptr @glb, i64 0, i64 1)`, !tbaa !2)
Contributor

Drop the IR references?

liveins:
- { reg: '$rdi', virtual-reg: '%0' }
frameInfo:
isFrameAddressTaken: false
Contributor

Can remove most of this, -simplify-mir helps

@@ -0,0 +1,223 @@
# Prevent the machine scheduler from moving instructions past FAKE_USE.
# RUN: llc -run-pass machine-scheduler -debug-only=machine-scheduler 2>&1 -o - %s | FileCheck %s
Contributor

This belongs in test/CodeGen/X86, not under MIR as it is not a MIR support test

Comment on lines 1120 to 1125
for (const MachineOperand &MO : NextI->operands()) {
// Return true if we came across the register from the
// previous spill instruction that is killed in NextI.
if (MO.isReg() && MO.isUse() && MO.isKill() && MO.getReg() == Reg)
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred method is to maintain LiveRegUnits while walking from the end of the block (which is the wrong direction for the asmprinter...)

Comment on lines 1887 to 1888
if (isLoadFeedingIntoFakeUse(MI, TII))
break;
Contributor

It's pretty inappropriate for an instruction to disappear in the AsmPrinter. Can you handle this optimization the same way we do for kill?

@dwblaikie
Collaborator

Since python is already a dependency for running the tests I don't see a problem with it. While I am not sure we have tests implemented in python for clang, lldb has a lot of them so it's not unprecedented. I am interested in hearing what the objections would be in this case.

I can't speak for @dwblaikie but my objection was based on the fact that I would personally rather see a more brittle test that is simpler. It's not an objection against using python, just a question of whether it's really needed here.

Yeah, I jumped in the middle a bit - I haven't looked at the python script/tests in question yet.

But perhaps one example to discuss would be useful?

& yeah, for sufficiently narrow/single-transform tests I don't think it'd be that bad to test for the exact result...

@SLTozer
Contributor Author

SLTozer commented Jun 6, 2024

Thank you @arsenm for the thorough reviews - I'll fix up the remaining comments (regarding AsmPrinter) early tomorrow.

But perhaps one example to discuss would be useful?

The script only applies to one test right now, though it's run in 4 different ways; the script is specialized for the test file, but it could be parameterized. The purpose of this test is essentially checking that FAKE_USE does what it says on the tin, that a variable that should otherwise by all rights be killed is available for its full scope.

The difficult part is that the test still isn't narrow; the test runs llc from IR to object with optimizations enabled and checks at the end that the variable under test is still available for its full scope. We could try making the checks exact, but it'd mean 4 different checks (or more if we increase coverage further) that could each change if any pass changes its behaviour. The actual check is simple I think - we read the output of dwarfdump to confirm that the parameter has a continuous location range that extends from [prologue_end, epilogue_begin). Ideally we'd have a proper way of doing this, but I think it'd be messy and brittle to substitute exact checks.

I think I'd be cautious on principle about adding special scripts to run a single test, but I don't see any easy alternatives without major drawbacks, and I think this is worth testing for; ultimately if the verdict comes down against the python script, my backup proposal is to make it a Dexter test (which will need to be moved to the subsequent patch which adds front-end support).

@SLTozer
Contributor Author

SLTozer commented Jun 7, 2024

It ended up taking a little longer than I expected to make the AsmPrinter changes; I couldn't see a sensible place to put the "remove loads feeding into fake use" behaviour, so I made it into a pass. The main goal here was to have this take place very late, so that no subsequent pass takes advantage of the removed load/fake_use to shorten the stack lifetime of a variable, but before LiveDebugValues, so that LDV has an accurate understanding of which machine locations house the required values.

The pass itself runs only if the function has optdebug enabled, since those should be the only functions with fake uses (tests have been updated to reflect this); it's not an error if we don't run this pass for a function that has fake use instructions though. I'm happy to move it to a separate patch if it makes it easier for reviewers - even outside of test files, this patch is already very large.
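The transformation the new pass performs can be illustrated with a schematic MIR fragment. This is hand-written for illustration; the register, stack slot, and exact instruction forms are invented, and the precise set of instructions deleted is a detail of the pass:

```
# Before the pass: a reload from a spill slot that exists only to feed a
# FAKE_USE, i.e. the loaded value has no effect on the program.
$eax = MOV32rm %stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %stack.0)
FAKE_USE killed $eax

# After the pass: the dead reload is removed. The value is still available
# in spill slot %stack.0, and because the pass runs before LiveDebugValues,
# LDV can still describe the variable's location on the stack.
```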


SmallDenseMap<Register, SmallVector<MachineInstr *>> RegFakeUses;
LivePhysRegs.init(*TRI);
dbgs() << "Running the pass!\n";
Contributor

Stray debug printing

Contributor Author

I've pushed up the removal for those already, but this was some very quick reviewing!

skipFunction(MF.getFunction()))
return false;

dbgs() << "We could run the pass!\n";
Contributor

Stray debug printing

Comment on lines 124 to 126
dbgs() << "Reg is live: ";
TRI->dumpReg(Reg);
dbgs() << "\n";
Contributor

More leftover debugging

INITIALIZE_PASS_END(RemoveLoadsIntoFakeUses, DEBUG_TYPE,
"Remove Loads Into Fake Uses", false, false)

bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
Contributor

I wonder if we can just do this in ExpandPostRAPseudos and save having yet another pass.

This would also bring the side effect where I think we should have ExpandPostRAPseudos run in reverse and lazily maintain LiveRegUnits

Contributor Author

@SLTozer SLTozer Jun 7, 2024

I'm fine to move it into another pass, and patch up that pass as I go through it - just a quick question though: Might this step too far outside the scope of what ExpandPostRAPseudos is expected to do? If so, would that justify changing the name slightly, or do we prefer to keep those fixed over time?

Also, a minor unrelated question: I noticed that passes may be added to TargetPassConfig::addMachinePasses or to CodeGenPassBuilder&lt;Derived&gt;::addMachinePasses, and most (but not all) passes are in both - is there an important distinction between them for determining where a new pass should go?

Edit: Although one concern about moving it to ExpandPostRAPseudos would be if there are passes between that and LiveDebugValues that might shorten the stack lifetime of a variable - I don't see any obvious candidates, but if you know any off the top of your head that would be a good argument for not moving this behaviour into there.

Contributor

I do believe we have an elide-reload-of-kill optimization, but a one-minute scan didn't find it; wherever that happens would be the best place.
TargetPassConfig::addMachinePasses is the old PM; CodeGenPassBuilder is the new PM, and is still incomplete.

Contributor Author

Might be MachineLateInstrsCleanup? I see that that relies on isKill() for liveness checks as well, but might otherwise make sense as a location. It's hard to tell whether there are any optimizations afterwards that might cause an issue; it'd probably be faster to implement and test it than scan through the source of all the applicable passes.

Contributor

It's not MachineLateInstrsCleanup (which is also running in the wrong direction). It's looking for killed register operands, not KILL instructions.

@OCHyams
Contributor

OCHyams commented Jun 7, 2024

Ah that's true, if it's going to be live at a single location for every instruction in the function then it'll emit with a fixed location. That shouldn't happen in this test since we have the big register kill operation, but no point in making assumptions when we can add coverage instead - will update.

Ah ok if it can't occur in the test then no need. We can extend it as and when necessary (that's better than having additional complexity that isn't used in the script).
