[CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. Since these operations are very persistent, i.e. optimization-resilient, in theory we could borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation, with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues:

1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they stay in the IR until the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation while blocking desired optimizations as little as possible. More specifically, the semantics associated with the new intrinsic require a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect, so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to each intrinsic call so that the calls are uniquely identified and not merged, in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @Foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

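The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. The intrinsic as declared in this change also takes a third `i32` operand (the one read by `PseudoProbeInst::getAttributes()`), which the example omits.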

```
define internal void @Foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```
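The numbering in the example (one probe per block, IDs counting up from 1, all sharing the function GUID) can be sketched outside of LLVM. The following is a toy model only: `ToyProbe`, `ToyBlock`, `ToyFunction`, `insertBlockProbes` and `makeInstrumentedFoo2` are hypothetical names, and the layout-order numbering is an assumption read off the example, not the instrumentation pass itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for IR constructs; these are NOT LLVM classes.
struct ToyProbe {
  uint64_t FuncGuid; // GUID of the function that owns the probe
  uint64_t Index;    // 1-based probe ID, unique within the function
};

struct ToyBlock {
  std::string Name;
  std::vector<ToyProbe> Probes; // at most one block probe in this sketch
};

struct ToyFunction {
  uint64_t Guid;
  std::vector<ToyBlock> Blocks;
};

// Give every block exactly one probe, numbering blocks in layout order
// starting from 1. Returns the number of probes inserted.
uint64_t insertBlockProbes(ToyFunction &F) {
  uint64_t NextIndex = 1;
  for (ToyBlock &BB : F.Blocks) {
    assert(BB.Probes.empty() && "block is already instrumented");
    BB.Probes.push_back(ToyProbe{F.Guid, NextIndex++});
  }
  return NextIndex - 1;
}

// Build the four-block shape of @Foo2 from the example and instrument it.
ToyFunction makeInstrumentedFoo2() {
  ToyFunction F;
  F.Guid = 837061429793323041ULL; // GUID value taken from the example IR
  for (const char *Name : {"bb0", "bb1", "bb2", "bb3"})
    F.Blocks.push_back(ToyBlock{Name, {}});
  insertBlockProbes(F);
  return F;
}
```

Calling `makeInstrumentedFoo2()` yields probes 1 through 4 on `bb0` through `bb3`, matching the IDs in the instrumented IR above. Because each (GUID, index) pair is distinct, two probes are never interchangeable, which is what keeps them from being merged.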

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
htyu committed Nov 20, 2020
1 parent da02327 commit f3c4456
Showing 21 changed files with 154 additions and 63 deletions.
1 change: 1 addition & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -526,6 +526,7 @@ class TargetTransformInfoImplBase {
     case Intrinsic::annotation:
     case Intrinsic::assume:
     case Intrinsic::sideeffect:
+    case Intrinsic::pseudoprobe:
     case Intrinsic::dbg_declare:
     case Intrinsic::dbg_value:
     case Intrinsic::dbg_label:
1 change: 1 addition & 0 deletions llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1436,6 +1436,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     case Intrinsic::lifetime_start:
     case Intrinsic::lifetime_end:
     case Intrinsic::sideeffect:
+    case Intrinsic::pseudoprobe:
       return 0;
     case Intrinsic::masked_store: {
       Type *Ty = Tys[0];
35 changes: 21 additions & 14 deletions llvm/include/llvm/IR/BasicBlock.h
@@ -165,19 +165,24 @@ class BasicBlock final : public Value, // Basic blocks are data objects also
   }
 
   /// Returns a pointer to the first instruction in this block that is not a
-  /// PHINode or a debug intrinsic.
-  const Instruction* getFirstNonPHIOrDbg() const;
-  Instruction* getFirstNonPHIOrDbg() {
+  /// PHINode or a debug intrinsic, or any pseudo operation if \c SkipPseudoOp
+  /// is true.
+  const Instruction *getFirstNonPHIOrDbg(bool SkipPseudoOp = false) const;
+  Instruction *getFirstNonPHIOrDbg(bool SkipPseudoOp = false) {
     return const_cast<Instruction *>(
-        static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbg());
+        static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbg(
+            SkipPseudoOp));
   }
 
   /// Returns a pointer to the first instruction in this block that is not a
-  /// PHINode, a debug intrinsic, or a lifetime intrinsic.
-  const Instruction* getFirstNonPHIOrDbgOrLifetime() const;
-  Instruction* getFirstNonPHIOrDbgOrLifetime() {
+  /// PHINode, a debug intrinsic, or a lifetime intrinsic, or any pseudo
+  /// operation if \c SkipPseudoOp is true.
+  const Instruction *
+  getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp = false) const;
+  Instruction *getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp = false) {
     return const_cast<Instruction *>(
-        static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbgOrLifetime());
+        static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbgOrLifetime(
+            SkipPseudoOp));
   }
 
   /// Returns an iterator to the first instruction in this block that is
@@ -191,16 +196,18 @@ class BasicBlock final : public Value, // Basic blocks are data objects also
   }
 
   /// Return a const iterator range over the instructions in the block, skipping
-  /// any debug instructions.
+  /// any debug instructions. Skip any pseudo operations as well if \c
+  /// SkipPseudoOp is true.
   iterator_range<filter_iterator<BasicBlock::const_iterator,
                                  std::function<bool(const Instruction &)>>>
-  instructionsWithoutDebug() const;
+  instructionsWithoutDebug(bool SkipPseudoOp = false) const;
 
   /// Return an iterator range over the instructions in the block, skipping any
-  /// debug instructions.
-  iterator_range<filter_iterator<BasicBlock::iterator,
-                                 std::function<bool(Instruction &)>>>
-  instructionsWithoutDebug();
+  /// debug instructions. Skip and any pseudo operations as well if \c
+  /// SkipPseudoOp is true.
+  iterator_range<
+      filter_iterator<BasicBlock::iterator, std::function<bool(Instruction &)>>>
+  instructionsWithoutDebug(bool SkipPseudoOp = false);
 
   /// Return the size of the basic block ignoring debug instructions
   filter_iterator<BasicBlock::const_iterator,
22 changes: 14 additions & 8 deletions llvm/include/llvm/IR/Instruction.h
@@ -651,19 +651,25 @@ class Instruction : public User,
   bool isLifetimeStartOrEnd() const;
 
   /// Return a pointer to the next non-debug instruction in the same basic
-  /// block as 'this', or nullptr if no such instruction exists.
-  const Instruction *getNextNonDebugInstruction() const;
-  Instruction *getNextNonDebugInstruction() {
+  /// block as 'this', or nullptr if no such instruction exists. Skip any pseudo
+  /// operations if \c SkipPseudoOp is true.
+  const Instruction *
+  getNextNonDebugInstruction(bool SkipPseudoOp = false) const;
+  Instruction *getNextNonDebugInstruction(bool SkipPseudoOp = false) {
     return const_cast<Instruction *>(
-        static_cast<const Instruction *>(this)->getNextNonDebugInstruction());
+        static_cast<const Instruction *>(this)->getNextNonDebugInstruction(
+            SkipPseudoOp));
   }
 
   /// Return a pointer to the previous non-debug instruction in the same basic
-  /// block as 'this', or nullptr if no such instruction exists.
-  const Instruction *getPrevNonDebugInstruction() const;
-  Instruction *getPrevNonDebugInstruction() {
+  /// block as 'this', or nullptr if no such instruction exists. Skip any pseudo
+  /// operations if \c SkipPseudoOp is true.
+  const Instruction *
+  getPrevNonDebugInstruction(bool SkipPseudoOp = false) const;
+  Instruction *getPrevNonDebugInstruction(bool SkipPseudoOp = false) {
     return const_cast<Instruction *>(
-        static_cast<const Instruction *>(this)->getPrevNonDebugInstruction());
+        static_cast<const Instruction *>(this)->getPrevNonDebugInstruction(
+            SkipPseudoOp));
   }
 
   /// Create a copy of 'this' instruction that is identical in all ways except
22 changes: 22 additions & 0 deletions llvm/include/llvm/IR/IntrinsicInst.h
@@ -967,6 +967,28 @@ class InstrProfValueProfileInst : public IntrinsicInst {
   }
 };
 
+class PseudoProbeInst : public IntrinsicInst {
+public:
+  static bool classof(const IntrinsicInst *I) {
+    return I->getIntrinsicID() == Intrinsic::pseudoprobe;
+  }
+
+  static bool classof(const Value *V) {
+    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
+  }
+
+  ConstantInt *getFuncGuid() const {
+    return cast<ConstantInt>(const_cast<Value *>(getArgOperand(0)));
+  }
+
+  ConstantInt *getAttributes() const {
+    return cast<ConstantInt>(const_cast<Value *>(getArgOperand(2)));
+  }
+
+  ConstantInt *getIndex() const {
+    return cast<ConstantInt>(const_cast<Value *>(getArgOperand(1)));
+  }
+};
 } // end namespace llvm
 
 #endif // LLVM_IR_INTRINSICINST_H
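Note that the accessors in `PseudoProbeInst` are declared in a different order than the operands they read: the GUID is operand 0, the index is operand 1, and the attributes are operand 2. A minimal standalone sketch of that layout follows. `ToyPseudoProbeCall` is a hypothetical class that stores plain integers instead of `ConstantInt` operands; it is not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a pseudo-probe call with three constant operands, in the
// order (i64, i64, i32) used by the intrinsic declaration. Not an LLVM class.
class ToyPseudoProbeCall {
  std::array<uint64_t, 3> Operands;

public:
  ToyPseudoProbeCall(uint64_t FuncGuid, uint64_t Index, uint32_t Attributes)
      : Operands{{FuncGuid, Index, Attributes}} {}

  // Operand 0: GUID of the function that owns the probe.
  uint64_t getFuncGuid() const { return Operands[0]; }

  // Operand 1: index (ID) of the probe within that function.
  uint64_t getIndex() const { return Operands[1]; }

  // Operand 2: attribute bits; stored widened, narrowed back on read.
  uint32_t getAttributes() const {
    return static_cast<uint32_t>(Operands[2]);
  }
};
```

For instance, `ToyPseudoProbeCall(837061429793323041ULL, 2, 0)` models the probe in `bb1` of the example; the attribute value `0` is a placeholder chosen for illustration, since the commit message does not show a third argument.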
7 changes: 7 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -1277,6 +1277,13 @@ def int_donothing : DefaultAttrsIntrinsic<[], [], [IntrNoMem, IntrWillReturn]>;
 // which specify that infinite loops must be preserved.
 def int_sideeffect : DefaultAttrsIntrinsic<[], [], [IntrInaccessibleMemOnly, IntrWillReturn]>;
 
+// The pseudoprobe intrinsic works as a place holder to the block it probes.
+// Like the sideeffect intrinsic defined above, this intrinsic is treated by the
+// optimizer as having opaque side effects so that it won't be get rid of or moved
+// out of the block it probes.
+def int_pseudoprobe : Intrinsic<[], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty],
+                                [IntrInaccessibleMemOnly, IntrWillReturn]>;
+
 // Intrinsics to support half precision floating point format
 let IntrProperties = [IntrNoMem, IntrWillReturn] in {
 def int_convert_to_fp16 : DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_anyfloat_ty]>;
1 change: 1 addition & 0 deletions llvm/lib/Analysis/AliasSetTracker.cpp
@@ -439,6 +439,7 @@ void AliasSetTracker::addUnknown(Instruction *Inst) {
       // FIXME: Add lifetime/invariant intrinsics (See: PR30807).
     case Intrinsic::assume:
     case Intrinsic::sideeffect:
+    case Intrinsic::pseudoprobe:
       return;
     }
   }
4 changes: 4 additions & 0 deletions llvm/lib/Analysis/InlineCost.cpp
@@ -1911,6 +1911,10 @@ CallAnalyzer::analyzeBlock(BasicBlock *BB,
     if (isa<DbgInfoIntrinsic>(I))
       continue;
 
+    // Skip pseudo-probes.
+    if (isa<PseudoProbeInst>(I))
+      continue;
+
     // Skip ephemeral values.
     if (EphValues.count(&*I))
       continue;
1 change: 1 addition & 0 deletions llvm/lib/Analysis/ValueTracking.cpp
@@ -527,6 +527,7 @@ bool llvm::isAssumeLikeIntrinsic(const Instruction *I) {
       // FIXME: This list is repeated from NoTTI::getIntrinsicCost.
       case Intrinsic::assume:
       case Intrinsic::sideeffect:
+      case Intrinsic::pseudoprobe:
       case Intrinsic::dbg_declare:
       case Intrinsic::dbg_value:
       case Intrinsic::dbg_label:
2 changes: 1 addition & 1 deletion llvm/lib/Analysis/VectorUtils.cpp
@@ -125,7 +125,7 @@ Intrinsic::ID llvm::getVectorIntrinsicIDForCall(const CallInst *CI,
 
   if (isTriviallyVectorizable(ID) || ID == Intrinsic::lifetime_start ||
       ID == Intrinsic::lifetime_end || ID == Intrinsic::assume ||
-      ID == Intrinsic::sideeffect)
+      ID == Intrinsic::sideeffect || ID == Intrinsic::pseudoprobe)
     return ID;
   return Intrinsic::not_intrinsic;
 }
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/Analysis.cpp
@@ -537,6 +537,9 @@ bool llvm::isInTailCallPosition(const CallBase &Call, const TargetMachine &TM) {
       // Debug info intrinsics do not get in the way of tail call optimization.
       if (isa<DbgInfoIntrinsic>(BBI))
         continue;
+      // Pseudo probe intrinsics do not block tail call optimization either.
+      if (isa<PseudoProbeInst>(BBI))
+        continue;
       // A lifetime end or assume intrinsic should not stop tail call
       // optimization.
       if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(BBI))
25 changes: 9 additions & 16 deletions llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -2241,13 +2241,12 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB, bool &ModifiedDT
     // Skip over debug and the bitcast.
     do {
       ++BI;
-    } while (isa<DbgInfoIntrinsic>(BI) || &*BI == BCI || &*BI == EVI);
+    } while (isa<DbgInfoIntrinsic>(BI) || &*BI == BCI || &*BI == EVI ||
+             isa<PseudoProbeInst>(BI));
     if (&*BI != RetI)
       return false;
   } else {
-    BasicBlock::iterator BI = BB->begin();
-    while (isa<DbgInfoIntrinsic>(BI)) ++BI;
-    if (&*BI != RetI)
+    if (BB->getFirstNonPHIOrDbg(true) != RetI)
       return false;
   }
 
@@ -2272,18 +2271,12 @@ bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB, bool &ModifiedDT
     for (pred_iterator PI = pred_begin(BB), PE = pred_end(BB); PI != PE; ++PI) {
       if (!VisitedBBs.insert(*PI).second)
         continue;
-
-      BasicBlock::InstListType &InstList = (*PI)->getInstList();
-      BasicBlock::InstListType::reverse_iterator RI = InstList.rbegin();
-      BasicBlock::InstListType::reverse_iterator RE = InstList.rend();
-      do { ++RI; } while (RI != RE && isa<DbgInfoIntrinsic>(&*RI));
-      if (RI == RE)
-        continue;
-
-      CallInst *CI = dyn_cast<CallInst>(&*RI);
-      if (CI && CI->use_empty() && TLI->mayBeEmittedAsTailCall(CI) &&
-          attributesPermitTailCall(F, CI, RetI, *TLI))
-        TailCallBBs.push_back(*PI);
+      if (Instruction *I = (*PI)->rbegin()->getPrevNonDebugInstruction(true)) {
+        CallInst *CI = dyn_cast<CallInst>(I);
+        if (CI && CI->use_empty() && TLI->mayBeEmittedAsTailCall(CI) &&
+            attributesPermitTailCall(F, CI, RetI, *TLI))
+          TailCallBBs.push_back(*PI);
+      }
     }
   }
 
38 changes: 25 additions & 13 deletions llvm/lib/IR/BasicBlock.cpp
@@ -97,18 +97,20 @@ void BasicBlock::setParent(Function *parent) {
 
 iterator_range<filter_iterator<BasicBlock::const_iterator,
                                std::function<bool(const Instruction &)>>>
-BasicBlock::instructionsWithoutDebug() const {
-  std::function<bool(const Instruction &)> Fn = [](const Instruction &I) {
-    return !isa<DbgInfoIntrinsic>(I);
+BasicBlock::instructionsWithoutDebug(bool SkipPseudoOp) const {
+  std::function<bool(const Instruction &)> Fn = [=](const Instruction &I) {
+    return !isa<DbgInfoIntrinsic>(I) &&
+           !(SkipPseudoOp && isa<PseudoProbeInst>(I));
   };
   return make_filter_range(*this, Fn);
 }
 
-iterator_range<filter_iterator<BasicBlock::iterator,
-                               std::function<bool(Instruction &)>>>
-BasicBlock::instructionsWithoutDebug() {
-  std::function<bool(Instruction &)> Fn = [](Instruction &I) {
-    return !isa<DbgInfoIntrinsic>(I);
+iterator_range<
+    filter_iterator<BasicBlock::iterator, std::function<bool(Instruction &)>>>
+BasicBlock::instructionsWithoutDebug(bool SkipPseudoOp) {
+  std::function<bool(Instruction &)> Fn = [=](Instruction &I) {
+    return !isa<DbgInfoIntrinsic>(I) &&
+           !(SkipPseudoOp && isa<PseudoProbeInst>(I));
   };
   return make_filter_range(*this, Fn);
 }
@@ -218,21 +220,31 @@ const Instruction* BasicBlock::getFirstNonPHI() const {
   return nullptr;
 }
 
-const Instruction* BasicBlock::getFirstNonPHIOrDbg() const {
-  for (const Instruction &I : *this)
-    if (!isa<PHINode>(I) && !isa<DbgInfoIntrinsic>(I))
-      return &I;
+const Instruction *BasicBlock::getFirstNonPHIOrDbg(bool SkipPseudoOp) const {
+  for (const Instruction &I : *this) {
+    if (isa<PHINode>(I) || isa<DbgInfoIntrinsic>(I))
+      continue;
+
+    if (SkipPseudoOp && isa<PseudoProbeInst>(I))
+      continue;
+
+    return &I;
+  }
   return nullptr;
 }
 
-const Instruction* BasicBlock::getFirstNonPHIOrDbgOrLifetime() const {
+const Instruction *
+BasicBlock::getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp) const {
   for (const Instruction &I : *this) {
     if (isa<PHINode>(I) || isa<DbgInfoIntrinsic>(I))
       continue;
 
     if (I.isLifetimeStartOrEnd())
       continue;
 
+    if (SkipPseudoOp && isa<PseudoProbeInst>(I))
+      continue;
+
     return &I;
   }
   return nullptr;
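The effect of the new `SkipPseudoOp` parameter on `getFirstNonPHIOrDbg` can be illustrated with a small standalone model. This is a sketch under stated assumptions, not LLVM code: a block is reduced to a vector of instruction kinds, and `ToyKind`, `firstNonPhiOrDbg` and `isReturnOnlyBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Instruction kinds relevant to the sketch; not an LLVM enum.
enum class ToyKind { Phi, Debug, PseudoProbe, Call, Ret };

// Index of the first instruction that is neither a PHI nor a debug
// intrinsic, additionally skipping pseudo probes when SkipPseudoOp is true.
// Returns -1 if every instruction is skipped.
int firstNonPhiOrDbg(const std::vector<ToyKind> &Block,
                     bool SkipPseudoOp = false) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I] == ToyKind::Phi || Block[I] == ToyKind::Debug)
      continue;
    if (SkipPseudoOp && Block[I] == ToyKind::PseudoProbe)
      continue;
    return static_cast<int>(I);
  }
  return -1;
}

// True if the block holds nothing but a return once PHIs, debug intrinsics
// and pseudo probes are ignored. This mirrors the shape of the check that
// callers such as dupRetToEnableTailCallOpts perform with SkipPseudoOp=true.
bool isReturnOnlyBlock(const std::vector<ToyKind> &Block) {
  int I = firstNonPhiOrDbg(Block, /*SkipPseudoOp=*/true);
  return I >= 0 && Block[static_cast<std::size_t>(I)] == ToyKind::Ret;
}
```

With the default `SkipPseudoOp = false`, a probe placed in front of a `ret` is itself reported as the first "real" instruction, so an instrumented return block would no longer be recognized as return-only. Keeping the default at `false` leaves existing callers unchanged; only callers that opt in see through probes.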
10 changes: 6 additions & 4 deletions llvm/lib/IR/Instruction.cpp
@@ -641,16 +641,18 @@ bool Instruction::isLifetimeStartOrEnd() const {
   return ID == Intrinsic::lifetime_start || ID == Intrinsic::lifetime_end;
 }
 
-const Instruction *Instruction::getNextNonDebugInstruction() const {
+const Instruction *
+Instruction::getNextNonDebugInstruction(bool SkipPseudoOp) const {
   for (const Instruction *I = getNextNode(); I; I = I->getNextNode())
-    if (!isa<DbgInfoIntrinsic>(I))
+    if (!isa<DbgInfoIntrinsic>(I) && !(SkipPseudoOp && isa<PseudoProbeInst>(I)))
       return I;
   return nullptr;
 }
 
-const Instruction *Instruction::getPrevNonDebugInstruction() const {
+const Instruction *
+Instruction::getPrevNonDebugInstruction(bool SkipPseudoOp) const {
   for (const Instruction *I = getPrevNode(); I; I = I->getPrevNode())
-    if (!isa<DbgInfoIntrinsic>(I))
+    if (!isa<DbgInfoIntrinsic>(I) && !(SkipPseudoOp && isa<PseudoProbeInst>(I)))
       return I;
   return nullptr;
 }
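The backward walk in `getPrevNonDebugInstruction(true)` is what lets CodeGenPrepare find a call that sits just before a block's terminator even when a probe or debug intrinsic lies in between. A minimal sketch of that pattern, using a vector of kinds in place of LLVM's instruction list; `ToyOp`, `prevNonDebug` and `endsWithCallBeforeTerminator` are hypothetical names, and the real pass applies further checks (use count, target support, attributes) that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Instruction kinds relevant to the sketch; not an LLVM enum.
enum class ToyOp { Debug, PseudoProbe, Call, Other, Branch };

// Walk backwards from the instruction at position Pos and return the index
// of the nearest earlier instruction that is not a debug intrinsic (and not
// a pseudo probe when SkipPseudoOp is true). Returns -1 if there is none.
int prevNonDebug(const std::vector<ToyOp> &Block, std::size_t Pos,
                 bool SkipPseudoOp = false) {
  for (std::size_t I = Pos; I-- > 0;) {
    if (Block[I] == ToyOp::Debug)
      continue;
    if (SkipPseudoOp && Block[I] == ToyOp::PseudoProbe)
      continue;
    return static_cast<int>(I);
  }
  return -1;
}

// True if the last meaningful instruction before the terminator is a call,
// looking through debug intrinsics and pseudo probes.
bool endsWithCallBeforeTerminator(const std::vector<ToyOp> &Block) {
  if (Block.empty())
    return false;
  int I = prevNonDebug(Block, Block.size() - 1, /*SkipPseudoOp=*/true);
  return I >= 0 && Block[static_cast<std::size_t>(I)] == ToyOp::Call;
}
```

For a predecessor block of the form `other, call, probe, debug, branch`, the walk without `SkipPseudoOp` stops at the probe, while the walk with it reaches the call, so the block remains a tail-call candidate after instrumentation.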
4 changes: 4 additions & 0 deletions llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -543,6 +543,10 @@ static unsigned getJumpThreadDuplicationCost(BasicBlock *BB,
     // Debugger intrinsics don't incur code size.
     if (isa<DbgInfoIntrinsic>(I)) continue;
 
+    // Pseudo-probes don't incur code size.
+    if (isa<PseudoProbeInst>(I))
+      continue;
+
     // If this is a pointer->pointer bitcast, it is free.
     if (isa<BitCastInst>(I) && I->getType()->isPointerTy())
       continue;
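Cost models such as the one in JumpThreading (and the inline cost analysis earlier in this change) treat probes as free, so instrumenting a block does not change whether it is considered cheap enough to duplicate. A toy version of such a size estimate follows; `ToyInst` and `toyDuplicationCost` are hypothetical, and the flat cost of 1 per remaining instruction is an assumption made for illustration, not JumpThreading's actual weighting.

```cpp
#include <cassert>
#include <vector>

// Instruction kinds relevant to the sketch; not an LLVM enum.
enum class ToyInst { Debug, PseudoProbe, PointerBitCast, Add, Call };

// Estimate the code-size cost of duplicating a block. Debug intrinsics,
// pseudo probes and pointer-to-pointer bitcasts emit no machine code and
// are therefore free; everything else costs 1 in this simplified model.
unsigned toyDuplicationCost(const std::vector<ToyInst> &Block) {
  unsigned Cost = 0;
  for (ToyInst I : Block) {
    if (I == ToyInst::Debug || I == ToyInst::PseudoProbe ||
        I == ToyInst::PointerBitCast)
      continue;
    ++Cost;
  }
  return Cost;
}
```

In this model a block `{probe, add, add}` has the same cost as `{add, add}`, which is the property the added `isa<PseudoProbeInst>` check preserves.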
8 changes: 6 additions & 2 deletions llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
@@ -240,7 +240,11 @@ static bool markTails(Function &F, bool &AllCallsAreTailCalls,
         Escaped = ESCAPED;
 
       CallInst *CI = dyn_cast<CallInst>(&I);
-      if (!CI || CI->isTailCall() || isa<DbgInfoIntrinsic>(&I))
+      // A PseudoProbeInst has the IntrInaccessibleMemOnly tag hence it is
+      // considered accessing memory and will be marked as a tail call if we
+      // don't bail out here.
+      if (!CI || CI->isTailCall() || isa<DbgInfoIntrinsic>(&I) ||
+          isa<PseudoProbeInst>(&I))
         continue;
 
       bool IsNoTail = CI->isNoTailCall() || CI->hasOperandBundles();
@@ -752,7 +756,7 @@ bool TailRecursionEliminator::processBlock(
       return false;
 
     BasicBlock *Succ = BI->getSuccessor(0);
-    ReturnInst *Ret = dyn_cast<ReturnInst>(Succ->getFirstNonPHIOrDbg());
+    ReturnInst *Ret = dyn_cast<ReturnInst>(Succ->getFirstNonPHIOrDbg(true));
 
     if (!Ret)
       return false;
4 changes: 4 additions & 0 deletions llvm/lib/Transforms/Utils/Evaluator.cpp
@@ -551,6 +551,10 @@ bool Evaluator::EvaluateBlock(BasicBlock::iterator CurInst,
           LLVM_DEBUG(dbgs() << "Skipping sideeffect intrinsic.\n");
           ++CurInst;
           continue;
+        } else if (II->getIntrinsicID() == Intrinsic::pseudoprobe) {
+          LLVM_DEBUG(dbgs() << "Skipping pseudoprobe intrinsic.\n");
+          ++CurInst;
+          continue;
         }
 
         LLVM_DEBUG(dbgs() << "Unknown intrinsic. Can not evaluate.\n");
