[PGO] Context sensitive PGO (part 1)
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.

This patch proposes a post-inline PGO instrumentation/use pass, which we call
Context Sensitive PGO (CSPGO). Users who want the best possible performance
can perform a second round of PGO instrumentation/use on top of the regular
PGO, which gives them two sets of profile counts. The first-pass profile is
mainly for inlining, indirect-call promotion, and CGSCC simplification pass
optimizations. The second-pass profile is for post-inline optimizations and
code-gen optimizations.

A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate source.c -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata source.c -o use

This change touches many components in the compiler. The reviewed patch
(D54175) will be committed in phases.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 354930
xur-llvm committed Feb 26, 2019
1 parent fa49c3a commit 35d2d51
Showing 14 changed files with 309 additions and 75 deletions.
1 change: 1 addition & 0 deletions llvm/include/llvm/InitializePasses.h
Expand Up @@ -299,6 +299,7 @@ void initializePEIPass(PassRegistry&);
void initializePGOIndirectCallPromotionLegacyPassPass(PassRegistry&);
void initializePGOInstrumentationGenLegacyPassPass(PassRegistry&);
void initializePGOInstrumentationUseLegacyPassPass(PassRegistry&);
void initializePGOInstrumentationGenCreateVarLegacyPassPass(PassRegistry&);
void initializePGOMemOPSizeOptLegacyPassPass(PassRegistry&);
void initializePHIEliminationPass(PassRegistry&);
void initializePartialInlinerLegacyPassPass(PassRegistry&);
Expand Down
6 changes: 6 additions & 0 deletions llvm/include/llvm/LTO/Config.h
Expand Up @@ -55,6 +55,9 @@ struct Config {
/// Disable entirely the optimizer, including importing for ThinLTO
bool CodeGenOnly = false;

/// Run PGO context sensitive IR instrumentation.
bool RunCSIRInstr = false;

/// If this field is set, the set of passes run in the middle-end optimizer
/// will be the one specified by the string. Only works with the new pass
/// manager as the old one doesn't have this ability.
Expand All @@ -73,6 +76,9 @@ struct Config {
/// with this triple.
std::string DefaultTriple;

/// Context Sensitive PGO profile path.
std::string CSIRProfile;

/// Sample PGO profile path.
std::string SampleProfile;

Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/LinkAllPasses.h
Expand Up @@ -102,6 +102,7 @@ namespace {
(void) llvm::createGCOVProfilerPass();
(void) llvm::createPGOInstrumentationGenLegacyPass();
(void) llvm::createPGOInstrumentationUseLegacyPass();
(void) llvm::createPGOInstrumentationGenCreateVarLegacyPass();
(void) llvm::createPGOIndirectCallPromotionLegacyPass();
(void) llvm::createPGOMemOPSizeOptLegacyPass();
(void) llvm::createInstrProfilingLegacyPass();
Expand Down
16 changes: 16 additions & 0 deletions llvm/include/llvm/ProfileData/InstrProf.h
Expand Up @@ -767,10 +767,20 @@ struct NamedInstrProfRecord : InstrProfRecord {
StringRef Name;
uint64_t Hash;

// We reserve this bit as the flag for context sensitive profile record.
static const int CS_FLAG_IN_FUNC_HASH = 60;

NamedInstrProfRecord() = default;
NamedInstrProfRecord(StringRef Name, uint64_t Hash,
std::vector<uint64_t> Counts)
: InstrProfRecord(std::move(Counts)), Name(Name), Hash(Hash) {}

static bool hasCSFlagInHash(uint64_t FuncHash) {
return ((FuncHash >> CS_FLAG_IN_FUNC_HASH) & 1);
}
static void setCSFlagInHash(uint64_t &FuncHash) {
FuncHash |= ((uint64_t)1 << CS_FLAG_IN_FUNC_HASH);
}
};

uint32_t InstrProfRecord::getNumValueKinds() const {
Expand Down Expand Up @@ -1004,6 +1014,8 @@ namespace RawInstrProf {
// from control data struct is changed from raw pointer to Name's MD5 value.
// Version 4: ValueDataBegin and ValueDataSizes fields are removed from the
// raw header.
// Version 5: Bit 60 of FuncHash is reserved for the flag for the context
// sensitive records.
const uint64_t Version = INSTR_PROF_RAW_VERSION;

template <class IntPtrT> inline uint64_t getMagic();
Expand Down Expand Up @@ -1040,6 +1052,10 @@ struct Header {
void getMemOPSizeRangeFromOption(StringRef Str, int64_t &RangeStart,
int64_t &RangeLast);

// Create a COMDAT variable INSTR_PROF_RAW_VERSION_VAR to make the runtime
// aware this is an ir_level profile so it can set the version flag.
void createIRLevelProfileFlagVar(Module &M, bool IsCS);

// Create the variable for the profile file name.
void createProfileFileNameVar(Module &M, StringRef InstrProfileOutput);

Expand Down
2 changes: 2 additions & 0 deletions llvm/include/llvm/ProfileData/InstrProfData.inc
Expand Up @@ -635,10 +635,12 @@ serializeValueProfDataFrom(ValueProfRecordClosure *Closure,
* version for other variants of profile. We set the lowest bit of the upper 8
* bits (i.e. bit 56) to 1 to indicate if this is an IR-level instrumentaiton
* generated profile, and 0 if this is a Clang FE generated profile.
* 1 in bit 57 indicates there are context-sensitive records in the profile.
*/
#define VARIANT_MASKS_ALL 0xff00000000000000ULL
#define GET_VERSION(V) ((V) & ~VARIANT_MASKS_ALL)
#define VARIANT_MASK_IR_PROF (0x1ULL << 56)
#define VARIANT_MASK_CSIR_PROF (0x1ULL << 57)
#define INSTR_PROF_RAW_VERSION_VAR __llvm_profile_raw_version
#define INSTR_PROF_PROFILE_RUNTIME_VAR __llvm_profile_runtime

Expand Down
18 changes: 13 additions & 5 deletions llvm/include/llvm/Transforms/Instrumentation.h
Expand Up @@ -87,10 +87,14 @@ struct GCOVOptions {
ModulePass *createGCOVProfilerPass(const GCOVOptions &Options =
GCOVOptions::getDefault());

// PGO Instrumention
ModulePass *createPGOInstrumentationGenLegacyPass();
// PGO Instrumentation. Parameter IsCS indicates if this is the context sensitive
// instrumentation.
ModulePass *createPGOInstrumentationGenLegacyPass(bool IsCS = false);
ModulePass *
createPGOInstrumentationUseLegacyPass(StringRef Filename = StringRef(""));
createPGOInstrumentationUseLegacyPass(StringRef Filename = StringRef(""),
bool IsCS = false);
ModulePass *createPGOInstrumentationGenCreateVarLegacyPass(
StringRef CSInstrName = StringRef(""));
ModulePass *createPGOIndirectCallPromotionLegacyPass(bool InLTO = false,
bool SamplePGO = false);
FunctionPass *createPGOMemOPSizeOptLegacyPass();
Expand Down Expand Up @@ -132,15 +136,19 @@ struct InstrProfOptions {
// Use atomic profile counter increments.
bool Atomic = false;

// Use BFI to guide register promotion
bool UseBFIInPromotion = false;

// Name of the profile file to use as output
std::string InstrProfileOutput;

InstrProfOptions() = default;
};

/// Insert frontend instrumentation based profiling.
/// Insert frontend instrumentation based profiling. Parameter IsCS indicates if
// this is the context sensitive instrumentation.
ModulePass *createInstrProfilingLegacyPass(
const InstrProfOptions &Options = InstrProfOptions());
const InstrProfOptions &Options = InstrProfOptions(), bool IsCS = false);

FunctionPass *createHWAddressSanitizerPass(bool CompileKernel = false,
bool Recover = false);
Expand Down
Expand Up @@ -35,7 +35,8 @@ using LoadStorePair = std::pair<Instruction *, Instruction *>;
class InstrProfiling : public PassInfoMixin<InstrProfiling> {
public:
InstrProfiling() = default;
InstrProfiling(const InstrProfOptions &Options) : Options(Options) {}
InstrProfiling(const InstrProfOptions &Options, bool IsCS)
: Options(Options), IsCS(IsCS) {}

PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
bool run(Module &M, const TargetLibraryInfo &TLI);
Expand All @@ -60,6 +61,9 @@ class InstrProfiling : public PassInfoMixin<InstrProfiling> {
GlobalVariable *NamesVar;
size_t NamesSize;

// Is this lowering for the context-sensitive instrumentation.
bool IsCS;

// vector of counter load/store pairs to be register promoted.
std::vector<LoadStorePair> PromotionCandidates;

Expand Down
Expand Up @@ -17,6 +17,7 @@

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/PassManager.h"
#include "llvm/ProfileData/InstrProf.h"
#include <cstdint>
#include <string>

Expand All @@ -26,23 +27,51 @@ class Function;
class Instruction;
class Module;

/// The instrumentation (profile-instr-gen) pass for IR based PGO.
// We use this pass to create COMDAT profile variables for context
// sensitive PGO (CSPGO). The reason to have a pass for this is CSPGO
// can be run after LTO/ThinLTO linking. Lld linker needs to see
// all the COMDAT variables before linking. So we have this pass
// always run before linking for CSPGO.
class PGOInstrumentationGenCreateVar
: public PassInfoMixin<PGOInstrumentationGenCreateVar> {
public:
PGOInstrumentationGenCreateVar(std::string CSInstrName = "")
: CSInstrName(CSInstrName) {}
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
createProfileFileNameVar(M, CSInstrName);
createIRLevelProfileFlagVar(M, /* IsCS */ true);
return PreservedAnalyses::all();
}

private:
std::string CSInstrName;
};

/// The instrumentation (profile-instr-gen) pass for IR based PGO.
class PGOInstrumentationGen : public PassInfoMixin<PGOInstrumentationGen> {
public:
PGOInstrumentationGen(bool IsCS = false) : IsCS(IsCS) {}
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);

private:
// If this is a context sensitive instrumentation.
bool IsCS;
};

/// The profile annotation (profile-instr-use) pass for IR based PGO.
class PGOInstrumentationUse : public PassInfoMixin<PGOInstrumentationUse> {
public:
PGOInstrumentationUse(std::string Filename = "",
std::string RemappingFilename = "");
std::string RemappingFilename = "", bool IsCS = false);

PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);

private:
std::string ProfileFileName;
std::string ProfileRemappingFileName;
// If this is a context sensitive instrumentation.
bool IsCS;
};

/// The indirect function call promotion pass.
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/Passes/PassBuilder.cpp
Expand Up @@ -569,7 +569,8 @@ void PassBuilder::addPGOInstrPasses(ModulePassManager &MPM, bool DebugLogging,
if (!ProfileGenFile.empty())
Options.InstrProfileOutput = ProfileGenFile;
Options.DoCounterPromotion = true;
MPM.addPass(InstrProfiling(Options));
Options.UseBFIInPromotion = false;
MPM.addPass(InstrProfiling(Options, false));
}

if (!ProfileUseFile.empty())
Expand Down
19 changes: 19 additions & 0 deletions llvm/lib/ProfileData/InstrProf.cpp
Expand Up @@ -1011,6 +1011,25 @@ void getMemOPSizeRangeFromOption(StringRef MemOPSizeRange, int64_t &RangeStart,
assert(RangeLast >= RangeStart);
}

// Create a COMDAT variable INSTR_PROF_RAW_VERSION_VAR to make the runtime
// aware this is an ir_level profile so it can set the version flag.
void createIRLevelProfileFlagVar(Module &M, bool IsCS) {
const StringRef VarName(INSTR_PROF_QUOTE(INSTR_PROF_RAW_VERSION_VAR));
Type *IntTy64 = Type::getInt64Ty(M.getContext());
uint64_t ProfileVersion = (INSTR_PROF_RAW_VERSION | VARIANT_MASK_IR_PROF);
if (IsCS)
ProfileVersion |= VARIANT_MASK_CSIR_PROF;
auto IRLevelVersionVariable = new GlobalVariable(
M, IntTy64, true, GlobalValue::WeakAnyLinkage,
Constant::getIntegerValue(IntTy64, APInt(64, ProfileVersion)), VarName);
IRLevelVersionVariable->setVisibility(GlobalValue::DefaultVisibility);
Triple TT(M.getTargetTriple());
if (TT.supportsCOMDAT()) {
IRLevelVersionVariable->setLinkage(GlobalValue::ExternalLinkage);
IRLevelVersionVariable->setComdat(M.getOrInsertComdat(VarName));
}
}

// Create the variable for the profile file name.
void createProfileFileNameVar(Module &M, StringRef InstrProfileOutput) {
if (InstrProfileOutput.empty())
Expand Down
53 changes: 43 additions & 10 deletions llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
Expand Up @@ -18,6 +18,8 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Triple.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Attributes.h"
Expand Down Expand Up @@ -147,8 +149,8 @@ class InstrProfilingLegacyPass : public ModulePass {
static char ID;

InstrProfilingLegacyPass() : ModulePass(ID) {}
InstrProfilingLegacyPass(const InstrProfOptions &Options)
: ModulePass(ID), InstrProf(Options) {}
InstrProfilingLegacyPass(const InstrProfOptions &Options, bool IsCS)
: ModulePass(ID), InstrProf(Options, IsCS) {}

StringRef getPassName() const override {
return "Frontend instrumentation-based coverage lowering";
Expand Down Expand Up @@ -232,9 +234,9 @@ class PGOCounterPromoter {
public:
PGOCounterPromoter(
DenseMap<Loop *, SmallVector<LoadStorePair, 8>> &LoopToCands,
Loop &CurLoop, LoopInfo &LI)
Loop &CurLoop, LoopInfo &LI, BlockFrequencyInfo *BFI)
: LoopToCandidates(LoopToCands), ExitBlocks(), InsertPts(), L(CurLoop),
LI(LI) {
LI(LI), BFI(BFI) {

SmallVector<BasicBlock *, 8> LoopExitBlocks;
SmallPtrSet<BasicBlock *, 8> BlockSet;
Expand Down Expand Up @@ -263,6 +265,20 @@ class PGOCounterPromoter {
SSAUpdater SSA(&NewPHIs);
Value *InitVal = ConstantInt::get(Cand.first->getType(), 0);

// If BFI is set, we will use it to guide the promotions.
if (BFI) {
auto *BB = Cand.first->getParent();
auto InstrCount = BFI->getBlockProfileCount(BB);
if (!InstrCount)
continue;
auto PreheaderCount = BFI->getBlockProfileCount(L.getLoopPreheader());
// If the average loop trip count is not greater than 1.5, we skip
// promotion.
if (PreheaderCount &&
(PreheaderCount.getValue() * 3) >= (InstrCount.getValue() * 2))
continue;
}

PGOCounterPromoterHelper Promoter(Cand.first, Cand.second, SSA, InitVal,
L.getLoopPreheader(), ExitBlocks,
InsertPts, LoopToCandidates, LI);
Expand Down Expand Up @@ -312,6 +328,11 @@ class PGOCounterPromoter {

SmallVector<BasicBlock *, 8> ExitingBlocks;
LP->getExitingBlocks(ExitingBlocks);

// If BFI is set, we do more aggressive promotions based on BFI.
if (BFI)
return (unsigned)-1;

// Not considierered speculative.
if (ExitingBlocks.size() == 1)
return MaxNumOfPromotionsPerLoop;
Expand Down Expand Up @@ -343,6 +364,7 @@ class PGOCounterPromoter {
SmallVector<Instruction *, 8> InsertPts;
Loop &L;
LoopInfo &LI;
BlockFrequencyInfo *BFI;
};

} // end anonymous namespace
Expand All @@ -365,8 +387,9 @@ INITIALIZE_PASS_END(
"Frontend instrumentation-based coverage lowering.", false, false)

ModulePass *
llvm::createInstrProfilingLegacyPass(const InstrProfOptions &Options) {
return new InstrProfilingLegacyPass(Options);
llvm::createInstrProfilingLegacyPass(const InstrProfOptions &Options,
bool IsCS) {
return new InstrProfilingLegacyPass(Options, IsCS);
}

static InstrProfIncrementInst *castToIncrementInst(Instruction *Instr) {
Expand Down Expand Up @@ -415,6 +438,13 @@ void InstrProfiling::promoteCounterLoadStores(Function *F) {
LoopInfo LI(DT);
DenseMap<Loop *, SmallVector<LoadStorePair, 8>> LoopPromotionCandidates;

std::unique_ptr<BlockFrequencyInfo> BFI;
if (Options.UseBFIInPromotion) {
std::unique_ptr<BranchProbabilityInfo> BPI;
BPI.reset(new BranchProbabilityInfo(*F, LI, TLI));
BFI.reset(new BlockFrequencyInfo(*F, *BPI, LI));
}

for (const auto &LoadStore : PromotionCandidates) {
auto *CounterLoad = LoadStore.first;
auto *CounterStore = LoadStore.second;
Expand All @@ -430,7 +460,7 @@ void InstrProfiling::promoteCounterLoadStores(Function *F) {
// Do a post-order traversal of the loops so that counter updates can be
// iteratively hoisted outside the loop nest.
for (auto *Loop : llvm::reverse(Loops)) {
PGOCounterPromoter Promoter(LoopPromotionCandidates, *Loop, LI);
PGOCounterPromoter Promoter(LoopPromotionCandidates, *Loop, LI, BFI.get());
Promoter.run(&TotalCountersPromoted);
}
}
Expand Down Expand Up @@ -681,7 +711,6 @@ static bool needsRuntimeRegistrationOfSectionRange(const Triple &TT) {
// Don't do this for Darwin. compiler-rt uses linker magic.
if (TT.isOSDarwin())
return false;

// Use linker script magic to get data/cnts/name start/end.
if (TT.isOSLinux() || TT.isOSFreeBSD() || TT.isOSNetBSD() ||
TT.isOSFuchsia() || TT.isPS4CPU() || TT.isOSWindows())
Expand Down Expand Up @@ -985,8 +1014,12 @@ void InstrProfiling::emitUses() {
}

void InstrProfiling::emitInitialization() {
// Create variable for profile name.
createProfileFileNameVar(*M, Options.InstrProfileOutput);
// Create ProfileFileName variable. Don't do this for the
// context-sensitive instrumentation lowering: This lowering is after
// LTO/ThinLTO linking. Pass PGOInstrumentationGenCreateVar should
// have already created the variable before LTO/ThinLTO linking.
if (!IsCS)
createProfileFileNameVar(*M, Options.InstrProfileOutput);
Function *RegisterF = M->getFunction(getInstrProfRegFuncsName());
if (!RegisterF)
return;
Expand Down
