[CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO, a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks, which take the form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction itself, so a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata would have to make its way through the optimization pipeline down to object emission, which requires additional effort to maintain the metadata in various places. Given that `!dbg` metadata is first-class metadata with all the essential support already in place, leveraging it as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata, which is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field, which mainly serves AutoFDO, can be reused for pseudo probes. DWARF discriminators distinguish instructions that share an identical source location, and with pseudo probes such support is not required. In this change we use the discriminator field to encode the ID and type of a callsite probe; the encoded value is unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field goes with the inlined instructions. The `!dbg` metadata of an inlined instruction is in the form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack refer to the nested inlined callsites, whose discriminator fields (each of which represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.
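As an illustration of how a stack of callsite discriminators could yield an inline context, here is a minimal standalone sketch. `Loc`, `InlinedAt`, and `inlineContext` are hypothetical stand-ins for the `!dbg` scope stack and are not LLVM APIs; the bit layout is taken from `PseudoProbe.h` in this change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one level of the !dbg scope stack.
struct Loc {
  uint32_t Discriminator; // encoded probe for this location
  const Loc *InlinedAt;   // callsite this code was inlined at, or nullptr
};

// Same layout as PseudoProbe.h: [2:0] = 0x7, [18:3] = id, [28:26] = type.
inline uint32_t packProbeData(uint32_t Index, uint32_t Type) {
  return (Index << 3) | (Type << 26) | 0x7;
}

inline uint32_t extractProbeIndex(uint32_t Value) {
  return (Value >> 3) & 0xFFFF;
}

// Collect callsite probe ids for every level below the top of the stack,
// ordered from the top-level inliner's callsite to the innermost callsite.
inline std::vector<uint32_t> inlineContext(const Loc &Top) {
  std::vector<uint32_t> Context;
  for (const Loc *L = Top.InlinedAt; L != nullptr; L = L->InlinedAt)
    Context.push_back(extractProbeIndex(L->Discriminator));
  std::reverse(Context.begin(), Context.end());
  return Context;
}
```

For a call with probe id 3 that was inlined through callsite 9, which in turn was inlined through callsite 5, this sketch would report the context as 5 followed by 9.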

To avoid collisions with baseline AutoFDO in the various places that handle DWARF discriminators, where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell a pseudo probe discriminator apart from a regular discriminator. For a regular discriminator, if the lowest 3 bits are all set (0x7), the discriminator is effectively empty, so the upper 29 bits can be reserved for pseudo probe use.
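The check can be illustrated with a small standalone sketch that mirrors the helpers added in this change. `ProbeType`, `packProbe`, `probeIndex`, and `probeType` are local illustrative names, not the LLVM declarations; the bit positions follow the layout documented in `PseudoProbe.h`.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the probe kinds introduced by this change.
enum class ProbeType : uint32_t { Block = 0, IndirectCall = 1, DirectCall = 2 };

// Pack a probe id and type: [2:0] = 0x7, [18:3] = id, [28:26] = type.
inline uint32_t packProbe(uint32_t Index, ProbeType Type) {
  assert(Index <= 0xFFFF && "probe id must fit in 16 bits");
  return (Index << 3) | (static_cast<uint32_t>(Type) << 26) | 0x7;
}

// A pseudo probe discriminator has the low 3 bits all set and at least one
// non-zero bit among the upper 29 bits.
inline bool isPseudoProbeDiscriminator(uint32_t D) {
  return (D & 0x7) == 0x7 && (D & 0xFFFFFFF8u) != 0;
}

inline uint32_t probeIndex(uint32_t D) { return (D >> 3) & 0xFFFF; }

inline ProbeType probeType(uint32_t D) {
  return static_cast<ProbeType>((D >> 26) & 0x7);
}
```

An empty regular discriminator (0x7 alone) is not classified as a probe here, which is why the probe fields must be non-zero as a whole. The diff defines `PseudoProbeReservedId::Invalid = 0`, so a valid probe id is presumably never zero.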

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.
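The unpacking step can be pictured with a simplified, LLVM-free model. `Instr`, `ProbeRecord`, and `collectCallsiteProbes` are hypothetical names used only for illustration; the real pass operates on `MachineInstr` and emits `PSEUDO_PROBE` instructions, and it also records a function GUID that this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine instruction with a debug location.
struct Instr {
  bool IsCall;
  uint32_t Discriminator;
};

// Fields that would be unpacked from a pseudo probe discriminator.
struct ProbeRecord {
  uint32_t Index;
  uint32_t Type;
  uint32_t Attributes;
};

inline bool isPseudoProbeDiscriminator(uint32_t D) {
  return (D & 0x7) == 0x7 && (D & 0xFFFFFFF8u) != 0;
}

// Scan a block and unpack one record per call that carries a probe.
inline std::vector<ProbeRecord>
collectCallsiteProbes(const std::vector<Instr> &Block) {
  std::vector<ProbeRecord> Records;
  for (const Instr &I : Block) {
    if (!I.IsCall || !isPseudoProbeDiscriminator(I.Discriminator))
      continue;
    Records.push_back({(I.Discriminator >> 3) & 0xFFFF,   // [18:3]  probe id
                       (I.Discriminator >> 26) & 0x7,     // [28:26] probe type
                       (I.Discriminator >> 29) & 0x7});   // [31:29] attributes
  }
  return Records;
}
```

Non-call instructions and calls with a regular discriminator are skipped, so only callsites that were tagged earlier in the pipeline produce a record.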

Note that with this work the `-debug-info-for-profiling` switch no longer works together with `-pseudo-probe-for-profiling`. The two cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
htyu committed Dec 2, 2020
1 parent dad5d95 commit 24d4291
Showing 17 changed files with 292 additions and 19 deletions.
1 change: 1 addition & 0 deletions clang/lib/CodeGen/BackendUtil.cpp
@@ -555,6 +555,7 @@ static bool initTargetOptions(DiagnosticsEngine &Diags,
Options.ForceDwarfFrameSection = CodeGenOpts.ForceDwarfFrameSection;
Options.EmitCallSiteInfo = CodeGenOpts.EmitCallSiteInfo;
Options.EnableAIXExtendedAltivecABI = CodeGenOpts.EnableAIXExtendedAltivecABI;
Options.PseudoProbeForProfiling = CodeGenOpts.PseudoProbeForProfiling;
Options.ValueTrackingVariableLocations =
CodeGenOpts.ValueTrackingVariableLocations;
Options.XRayOmitFunctionIndex = CodeGenOpts.XRayOmitFunctionIndex;
2 changes: 2 additions & 0 deletions llvm/include/llvm/CodeGen/CommandFlags.h
@@ -127,6 +127,8 @@ bool getEnableMachineFunctionSplitter();

bool getEnableDebugEntryValues();

bool getPseudoProbeForProfiling();

bool getValueTrackingVariableLocations();

bool getForceDwarfFrameSection();
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/Passes.h
@@ -475,6 +475,9 @@ namespace llvm {
/// Create Hardware Loop pass. \see HardwareLoops.cpp
FunctionPass *createHardwareLoopsPass();

/// This pass inserts pseudo probe annotation for callsite profiling.
FunctionPass *createPseudoProbeInserter();

/// Create IR Type Promotion pass. \see TypePromotion.cpp
FunctionPass *createTypePromotionPass();

12 changes: 12 additions & 0 deletions llvm/include/llvm/IR/DebugInfoMetadata.h
@@ -1698,6 +1698,18 @@ class DILocation : public MDNode {

inline unsigned getDiscriminator() const;

// For the regular discriminator, it stands for all empty components if all
// the lowest 3 bits are set (0x7) and all higher 29 bits are unused (zero by
// default). Here we fully leverage the higher 29 bits for pseudo probe use.
// This is the format:
// [2:0] - 0x7
// [31:3] - pseudo probe fields guaranteed to be non-zero as a whole
// So if the lower 3 bits are all set and the higher bits have at least one
// non-zero bit, the value is guaranteed to be a pseudo probe discriminator.
inline static bool isPseudoProbeDiscriminator(unsigned Discriminator) {
return ((Discriminator & 0x7) == 0x7) && (Discriminator & 0xFFFFFFF8);
}

/// Returns a new DILocation with updated \p Discriminator.
inline const DILocation *cloneWithDiscriminator(unsigned Discriminator) const;

52 changes: 52 additions & 0 deletions llvm/include/llvm/IR/PseudoProbe.h
@@ -0,0 +1,52 @@
//===- PseudoProbe.h - Pseudo Probe IR Helpers ------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Pseudo probe IR intrinsic and dwarf discriminator manipulation routines.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_IR_PSEUDOPROBE_H
#define LLVM_IR_PSEUDOPROBE_H

#include <cassert>
#include <cstdint>

namespace llvm {

enum class PseudoProbeType { Block = 0, IndirectCall, DirectCall };

struct PseudoProbeDwarfDiscriminator {
// The following APIs encode/decode per-probe information to/from a
// 32-bit integer which is organized as:
// [2:0] - 0x7, this is reserved for regular discriminator,
// see DWARF discriminator encoding rule
// [18:3] - probe id
// [25:19] - reserved
// [28:26] - probe type, see PseudoProbeType
// [31:29] - reserved for probe attributes
static uint32_t packProbeData(uint32_t Index, uint32_t Type) {
assert(Index <= 0xFFFF && "Probe index too big to encode, exceeding 2^16");
assert(Type <= 0x7 && "Probe type too big to encode, exceeding 7");
return (Index << 3) | (Type << 26) | 0x7;
}

static uint32_t extractProbeIndex(uint32_t Value) {
return (Value >> 3) & 0xFFFF;
}

static uint32_t extractProbeType(uint32_t Value) {
return (Value >> 26) & 0x7;
}

static uint32_t extractProbeAttributes(uint32_t Value) {
return (Value >> 29) & 0x7;
}
};
} // end namespace llvm

#endif // LLVM_IR_PSEUDOPROBE_H
1 change: 1 addition & 0 deletions llvm/include/llvm/InitializePasses.h
@@ -361,6 +361,7 @@ void initializeProfileSummaryInfoWrapperPassPass(PassRegistry&);
void initializePromoteLegacyPassPass(PassRegistry&);
void initializePruneEHPass(PassRegistry&);
void initializeRABasicPass(PassRegistry&);
void initializePseudoProbeInserterPass(PassRegistry &);
void initializeRAGreedyPass(PassRegistry&);
void initializeReachingDefAnalysisPass(PassRegistry&);
void initializeReassociateLegacyPassPass(PassRegistry&);
8 changes: 8 additions & 0 deletions llvm/include/llvm/Passes/PassBuilder.h
@@ -63,6 +63,14 @@ struct PGOOptions {
// PseudoProbeForProfiling needs to be true.
assert(this->Action != NoAction || this->CSAction != NoCSAction ||
this->DebugInfoForProfiling || this->PseudoProbeForProfiling);

// Pseudo probe emission does not work with -fdebug-info-for-profiling since
// they both use the discriminator field of debug lines but for different
// purposes.
if (this->DebugInfoForProfiling && this->PseudoProbeForProfiling) {
report_fatal_error(
"Pseudo probes cannot be used with -debug-info-for-profiling", false);
}
}
std::string ProfileFile;
std::string CSProfileGenFile;
7 changes: 5 additions & 2 deletions llvm/include/llvm/Target/TargetOptions.h
@@ -138,8 +138,8 @@ namespace llvm {
EnableMachineFunctionSplitter(false), SupportsDefaultOutlining(false),
EmitAddrsig(false), EmitCallSiteInfo(false),
SupportsDebugEntryValues(false), EnableDebugEntryValues(false),
-ValueTrackingVariableLocations(false), ForceDwarfFrameSection(false),
-XRayOmitFunctionIndex(false),
+PseudoProbeForProfiling(false), ValueTrackingVariableLocations(false),
+ForceDwarfFrameSection(false), XRayOmitFunctionIndex(false),
FPDenormalMode(DenormalMode::IEEE, DenormalMode::IEEE) {}

/// DisableFramePointerElim - This returns true if frame pointer elimination
@@ -309,6 +309,9 @@ namespace llvm {
/// production.
bool ShouldEmitDebugEntryValues() const;

/// Emit pseudo probes into the binary for sample profiling
unsigned PseudoProbeForProfiling : 1;

// When set to true, use experimental new debug variable location tracking,
// which seeks to follow the values of variables rather than their location,
// post isel.
8 changes: 7 additions & 1 deletion llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h
@@ -17,6 +17,7 @@

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IR/PseudoProbe.h"
#include "llvm/Target/TargetMachine.h"
#include <unordered_map>

@@ -25,10 +26,10 @@ namespace llvm {
class Module;

using BlockIdMap = std::unordered_map<BasicBlock *, uint32_t>;
using InstructionIdMap = std::unordered_map<Instruction *, uint32_t>;

enum class PseudoProbeReservedId { Invalid = 0, Last = Invalid };

-enum class PseudoProbeType { Block = 0 };

/// Sample profile pseudo prober.
///
@@ -42,13 +43,18 @@ class SampleProfileProber {
private:
Function *getFunction() const { return F; }
uint32_t getBlockId(const BasicBlock *BB) const;
uint32_t getCallsiteId(const Instruction *Call) const;
void computeProbeIdForBlocks();
void computeProbeIdForCallsites();

Function *F;

/// Map basic blocks to their pseudo probe ids.
BlockIdMap BlockProbeIds;

/// Map indirect calls to their pseudo probe ids.
InstructionIdMap CallProbeIds;

/// The ID of the last probe. Can be used to number a new probe.
uint32_t LastProbeId;
};
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/CMakeLists.txt
@@ -122,6 +122,7 @@ add_llvm_component_library(LLVMCodeGen
PreISelIntrinsicLowering.cpp
ProcessImplicitDefs.cpp
PrologEpilogInserter.cpp
PseudoProbeInserter.cpp
PseudoSourceValue.cpp
RDFGraph.cpp
RDFLiveness.cpp
7 changes: 7 additions & 0 deletions llvm/lib/CodeGen/CommandFlags.cpp
@@ -91,6 +91,7 @@ CGOPT(bool, EnableAddrsig)
CGOPT(bool, EmitCallSiteInfo)
CGOPT(bool, EnableMachineFunctionSplitter)
CGOPT(bool, EnableDebugEntryValues)
CGOPT(bool, PseudoProbeForProfiling)
CGOPT(bool, ValueTrackingVariableLocations)
CGOPT(bool, ForceDwarfFrameSection)
CGOPT(bool, XRayOmitFunctionIndex)
@@ -434,6 +435,11 @@ codegen::RegisterCodeGenFlags::RegisterCodeGenFlags() {
cl::init(false));
CGBINDOPT(EnableDebugEntryValues);

static cl::opt<bool> PseudoProbeForProfiling(
"pseudo-probe-for-profiling", cl::desc("Emit pseudo probes for AutoFDO"),
cl::init(false));
CGBINDOPT(PseudoProbeForProfiling);

static cl::opt<bool> ValueTrackingVariableLocations(
"experimental-debug-variable-locations",
cl::desc("Use experimental new value-tracking variable locations"),
@@ -548,6 +554,7 @@ codegen::InitTargetOptionsFromCodeGenFlags(const Triple &TheTriple) {
Options.EmitAddrsig = getEnableAddrsig();
Options.EmitCallSiteInfo = getEmitCallSiteInfo();
Options.EnableDebugEntryValues = getEnableDebugEntryValues();
Options.PseudoProbeForProfiling = getPseudoProbeForProfiling();
Options.ValueTrackingVariableLocations = getValueTrackingVariableLocations();
Options.ForceDwarfFrameSection = getForceDwarfFrameSection();
Options.XRayOmitFunctionIndex = getXRayOmitFunctionIndex();
95 changes: 95 additions & 0 deletions llvm/lib/CodeGen/PseudoProbeInserter.cpp
@@ -0,0 +1,95 @@
//===- PseudoProbeInserter.cpp - Insert annotation for callsite profiling -===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements PseudoProbeInserter pass, which inserts pseudo probe
// annotations for call instructions with a pseudo-probe-specific dwarf
// discriminator. Such a discriminator indicates that the call instruction comes
// with a pseudo probe, and the discriminator value holds information to
// identify the corresponding counter.
//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/PseudoProbe.h"
#include "llvm/InitializePasses.h"
#include "llvm/Target/TargetMachine.h"
#include <unordered_map>

#define DEBUG_TYPE "pseudo-probe-inserter"

using namespace llvm;

namespace {
class PseudoProbeInserter : public MachineFunctionPass {
public:
static char ID;

PseudoProbeInserter() : MachineFunctionPass(ID) {
initializePseudoProbeInserterPass(*PassRegistry::getPassRegistry());
}

StringRef getPassName() const override { return "Pseudo Probe Inserter"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesAll();
MachineFunctionPass::getAnalysisUsage(AU);
}

bool runOnMachineFunction(MachineFunction &MF) override {
const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
bool Changed = false;
for (MachineBasicBlock &MBB : MF) {
for (MachineInstr &MI : MBB) {
if (MI.isCall()) {
if (DILocation *DL = MI.getDebugLoc()) {
auto Value = DL->getDiscriminator();
if (DILocation::isPseudoProbeDiscriminator(Value)) {
BuildMI(MBB, MI, DL, TII->get(TargetOpcode::PSEUDO_PROBE))
.addImm(getFuncGUID(MF.getFunction().getParent(), DL))
.addImm(
PseudoProbeDwarfDiscriminator::extractProbeIndex(Value))
.addImm(
PseudoProbeDwarfDiscriminator::extractProbeType(Value))
.addImm(PseudoProbeDwarfDiscriminator::extractProbeAttributes(
Value));
Changed = true;
}
}
}
}
}

return Changed;
}

private:
uint64_t getFuncGUID(Module *M, DILocation *DL) {
auto *SP = DL->getScope()->getSubprogram();
auto Name = SP->getLinkageName();
if (Name.empty())
Name = SP->getName();
return Function::getGUID(Name);
}
};
} // namespace

char PseudoProbeInserter::ID = 0;
INITIALIZE_PASS_BEGIN(PseudoProbeInserter, DEBUG_TYPE,
"Insert pseudo probe annotations for value profiling",
false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(PseudoProbeInserter, DEBUG_TYPE,
"Insert pseudo probe annotations for value profiling",
false, false)

FunctionPass *llvm::createPseudoProbeInserter() {
return new PseudoProbeInserter();
}
3 changes: 2 additions & 1 deletion llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
@@ -26,6 +26,7 @@
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/PseudoProbe.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
@@ -1133,7 +1134,7 @@ EmitSpecialNode(SDNode *Node, bool IsClone, bool IsCloned,
BuildMI(*MBB, InsertPos, Node->getDebugLoc(), TII->get(TarOp))
.addImm(Guid)
.addImm(Index)
-.addImm(0) // 0 for block probes
+.addImm((uint8_t)PseudoProbeType::Block)
.addImm(Attr);
break;
}
4 changes: 4 additions & 0 deletions llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -1040,6 +1040,10 @@ void TargetPassConfig::addMachinePasses() {
// Add passes that directly emit MI after all other MI passes.
addPreEmitPass2();

// Insert pseudo probe annotation for callsite profiling
if (TM->Options.PseudoProbeForProfiling)
addPass(createPseudoProbeInserter());

AddingMachinePasses = false;
}

1 change: 1 addition & 0 deletions llvm/lib/Target/X86/X86TargetMachine.cpp
@@ -83,6 +83,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeX86Target() {
initializeX86LoadValueInjectionRetHardeningPassPass(PR);
initializeX86OptimizeLEAPassPass(PR);
initializeX86PartialReductionPass(PR);
initializePseudoProbeInserterPass(PR);
}

static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {