[VPlan] Implement initial vector code generation support for simple outer loops.

Summary:
[VPlan] Implement vector code generation support for simple outer loops.

Context: Patch Series #1 for outer loop vectorization support in LV using VPlan (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
                                                          
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:

  - force vector code generation using explicit vectorize_width

  - add conservative early returns in cost model and other places for VPlanNativePath

  - add code for setting up outer loop inductions 

  - add support for widening non-induction PHIs that can result from inner loops and uniform conditional branches

  - add support for generating uniform inner branches

We plan to add a handful of executable C outer loop tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.
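
For illustration, a loop nest of the kind this patch targets might look like the sketch below. The function name fillOuter and the array shapes are hypothetical and not taken from the patch. The explicit vectorize_width on the outer loop corresponds to the first bullet above; with Clang, the VPlan-native path would additionally have to be enabled, presumably via -mllvm -enable-vplan-native-path. Other compilers skip the guarded pragma and run the loop nest as ordinary scalar code.

```cpp
#include <array>

constexpr int N = 8;
using Matrix = std::array<std::array<int, N>, N>;

// Hypothetical kernel (an assumption for illustration, not part of the patch).
// The pragma sits on the OUTER loop and forces a vectorization width of 4.
void fillOuter(Matrix &arr, std::array<int, N> &arr2, int n) {
#if defined(__clang__)
#pragma clang loop vectorize(enable) vectorize_width(4)
#endif
  for (int i1 = 0; i1 < N; ++i1) {
    // The outer induction variable i1 is a plain integer induction.
    arr2[i1] = i1;
    // The inner trip count does not depend on i1, so the inner loop
    // branch takes the same direction for every outer iteration.
    for (int i2 = 0; i2 < N; ++i2)
      arr[i2][i1] = i1 + n;
  }
}
```

The inner loop bound is the same for every outer iteration, which is the uniform control flow the bullets above refer to.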

Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal

Reviewed By: fhahn, hsaito

Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D50820

llvm-svn: 342197
hidekisaito committed Sep 14, 2018
1 parent ce9e296 commit ea7f303
Showing 9 changed files with 486 additions and 15 deletions.
@@ -332,6 +332,11 @@ class LoopVectorizationLegality {
/// If false, good old LV code.
bool canVectorizeLoopNestCFG(Loop *Lp, bool UseVPlanNativePath);

/// Set up outer loop inductions by checking Phis in outer loop header for
/// supported inductions (int inductions). Return false if any of these Phis
/// is not a supported induction or if we fail to find an induction.
bool setupOuterLoopInductions();

/// Return true if the pre-header, exiting and latch blocks of \p Lp
/// (non-recursive) are considered legal for vectorization.
/// Temporarily taking UseVPlanNativePath parameter. If true, take
38 changes: 38 additions & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -516,6 +516,18 @@ bool LoopVectorizationLegality::canVectorizeOuterLoop() {
return false;
}

// Check whether we are able to set up outer loop induction.
if (!setupOuterLoopInductions()) {
LLVM_DEBUG(
dbgs() << "LV: Not vectorizing: Unsupported outer loop Phi(s).\n");
ORE->emit(createMissedAnalysis("UnsupportedPhi")
<< "Unsupported outer loop Phi(s)");
if (DoExtraAnalysis)
Result = false;
else
return false;
}

return Result;
}

@@ -571,6 +583,32 @@ void LoopVectorizationLegality::addInductionPhi(
LLVM_DEBUG(dbgs() << "LV: Found an induction variable.\n");
}

bool LoopVectorizationLegality::setupOuterLoopInductions() {
BasicBlock *Header = TheLoop->getHeader();

// Returns true if a given Phi is a supported induction.
auto isSupportedPhi = [&](PHINode &Phi) -> bool {
InductionDescriptor ID;
if (InductionDescriptor::isInductionPHI(&Phi, TheLoop, PSE, ID) &&
ID.getKind() == InductionDescriptor::IK_IntInduction) {
addInductionPhi(&Phi, ID, AllowedExit);
return true;
} else {
// Bail out for any Phi in the outer loop header that is not a supported
// induction.
LLVM_DEBUG(
dbgs()
<< "LV: Found unsupported PHI for outer loop vectorization.\n");
return false;
}
};

if (llvm::all_of(Header->phis(), isSupportedPhi))
return true;
else
return false;
}

bool LoopVectorizationLegality::canVectorizeInstrs() {
BasicBlock *Header = TheLoop->getHeader();

159 changes: 147 additions & 12 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -58,6 +58,7 @@
#include "LoopVectorizationPlanner.h"
#include "VPRecipeBuilder.h"
#include "VPlanHCFGBuilder.h"
#include "VPlanHCFGTransforms.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
@@ -234,7 +235,7 @@ static cl::opt<unsigned> MaxNestedScalarReductionIC(
cl::desc("The maximum interleave count to use when interleaving a scalar "
"reduction in a nested loop."));

-static cl::opt<bool> EnableVPlanNativePath(
+cl::opt<bool> EnableVPlanNativePath(
"enable-vplan-native-path", cl::init(false), cl::Hidden,
cl::desc("Enable VPlan-native vectorization path with "
"support for outer loop vectorization."));
@@ -419,6 +420,9 @@ class InnerLoopVectorizer {
/// the instruction.
void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr);

/// Fix the non-induction PHIs in the OrigPHIsToFix vector.
void fixNonInductionPHIs(void);

protected:
friend class LoopVectorizationPlanner;

@@ -686,6 +690,10 @@ class InnerLoopVectorizer {
// Holds the end values for each induction variable. We save the end values
// so we can later fix-up the external users of the induction variables.
DenseMap<PHINode *, Value *> IVEndValues;

// Vector of original scalar PHIs whose corresponding widened PHIs need to be
// fixed up at the end of vector code generation.
SmallVector<PHINode *, 8> OrigPHIsToFix;
};

class InnerLoopUnroller : public InnerLoopVectorizer {
@@ -888,6 +896,12 @@ class LoopVectorizationCostModel {
/// vectorization factor \p VF.
bool isProfitableToScalarize(Instruction *I, unsigned VF) const {
assert(VF > 1 && "Profitable to scalarize relevant only for VF > 1.");

// Cost model is not run in the VPlan-native path - return conservative
// result until this changes.
if (EnableVPlanNativePath)
return false;

auto Scalars = InstsToScalarize.find(VF);
assert(Scalars != InstsToScalarize.end() &&
"VF not yet analyzed for scalarization profitability");
@@ -898,6 +912,12 @@ class LoopVectorizationCostModel {
bool isUniformAfterVectorization(Instruction *I, unsigned VF) const {
if (VF == 1)
return true;

// Cost model is not run in the VPlan-native path - return conservative
// result until this changes.
if (EnableVPlanNativePath)
return false;

auto UniformsPerVF = Uniforms.find(VF);
assert(UniformsPerVF != Uniforms.end() &&
"VF not yet analyzed for uniformity");
@@ -908,6 +928,12 @@ class LoopVectorizationCostModel {
bool isScalarAfterVectorization(Instruction *I, unsigned VF) const {
if (VF == 1)
return true;

// Cost model is not run in the VPlan-native path - return conservative
// result until this changes.
if (EnableVPlanNativePath)
return false;

auto ScalarsPerVF = Scalars.find(VF);
assert(ScalarsPerVF != Scalars.end() &&
"Scalar values are not calculated for VF");
@@ -962,6 +988,12 @@ class LoopVectorizationCostModel {
/// through the cost modeling.
InstWidening getWideningDecision(Instruction *I, unsigned VF) {
assert(VF >= 2 && "Expected VF >=2");

// Cost model is not run in the VPlan-native path - return conservative
// result until this changes.
if (EnableVPlanNativePath)
return CM_GatherScatter;

std::pair<Instruction *, unsigned> InstOnVF = std::make_pair(I, VF);
auto Itr = WideningDecisions.find(InstOnVF);
if (Itr == WideningDecisions.end())
@@ -1397,8 +1429,16 @@ struct LoopVectorize : public FunctionPass {
AU.addRequired<LoopAccessLegacyAnalysis>();
AU.addRequired<DemandedBitsWrapperPass>();
AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
-AU.addPreserved<LoopInfoWrapperPass>();
-AU.addPreserved<DominatorTreeWrapperPass>();
+
+// We currently do not preserve loopinfo/dominator analyses with outer loop
+// vectorization. Until this is addressed, mark these analyses as preserved
+// only for non-VPlan-native path.
+// TODO: Preserve Loop and Dominator analyses for VPlan-native path.
+if (!EnableVPlanNativePath) {
+AU.addPreserved<LoopInfoWrapperPass>();
+AU.addPreserved<DominatorTreeWrapperPass>();
+}
+
AU.addPreserved<BasicAAWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();
}
@@ -1749,8 +1789,9 @@ Value *InnerLoopVectorizer::getOrCreateVectorValue(Value *V, unsigned Part) {
assert(!V->getType()->isVectorTy() && "Can't widen a vector");
assert(!V->getType()->isVoidTy() && "Type does not produce a value");

-// If we have a stride that is replaced by one, do it here.
-if (Legal->hasStride(V))
+// If we have a stride that is replaced by one, do it here. Defer this for
+// the VPlan-native path until we start running Legal checks in that path.
+if (!EnableVPlanNativePath && Legal->hasStride(V))
V = ConstantInt::get(V->getType(), 1);

// If we have a vector mapped to this value, return it.
@@ -2416,6 +2457,10 @@ void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
}

void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass) {
// VPlan-native path does not do any analysis for runtime checks currently.
if (EnableVPlanNativePath)
return;

BasicBlock *BB = L->getLoopPreheader();

// Generate the code that checks in runtime if arrays overlap. We put the
@@ -3060,6 +3105,13 @@ void InnerLoopVectorizer::fixVectorizedLoop() {
if (VF > 1)
truncateToMinimalBitwidths();

// Fix widened non-induction PHIs by setting up the PHI operands.
if (OrigPHIsToFix.size()) {
assert(EnableVPlanNativePath &&
"Unexpected non-induction PHIs for fixup in non VPlan-native path");
fixNonInductionPHIs();
}

// At this point every instruction in the original loop is widened to a
// vector form. Now we need to fix the recurrences in the loop. These PHI
// nodes are currently empty because we did not want to introduce cycles.
@@ -3532,12 +3584,62 @@ void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {
} while (Changed);
}

void InnerLoopVectorizer::fixNonInductionPHIs() {
for (PHINode *OrigPhi : OrigPHIsToFix) {
PHINode *NewPhi =
cast<PHINode>(VectorLoopValueMap.getVectorValue(OrigPhi, 0));
unsigned NumIncomingValues = OrigPhi->getNumIncomingValues();

SmallVector<BasicBlock *, 2> ScalarBBPredecessors(
predecessors(OrigPhi->getParent()));
SmallVector<BasicBlock *, 2> VectorBBPredecessors(
predecessors(NewPhi->getParent()));
assert(ScalarBBPredecessors.size() == VectorBBPredecessors.size() &&
"Scalar and Vector BB should have the same number of predecessors");

// The insertion point in Builder may be invalidated by the time we get
// here. Force the Builder insertion point to something valid so that we do
// not run into issues during insertion point restore in
// getOrCreateVectorValue calls below.
Builder.SetInsertPoint(NewPhi);

// The predecessor order is preserved and we can rely on mapping between
// scalar and vector block predecessors.
for (unsigned i = 0; i < NumIncomingValues; ++i) {
BasicBlock *NewPredBB = VectorBBPredecessors[i];

// When looking up the new scalar/vector values to fix up, use incoming
// values from original phi.
Value *ScIncV =
OrigPhi->getIncomingValueForBlock(ScalarBBPredecessors[i]);

// Scalar incoming value may need a broadcast
Value *NewIncV = getOrCreateVectorValue(ScIncV, 0);
NewPhi->addIncoming(NewIncV, NewPredBB);
}
}
}
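
The deferred fix-up performed by fixNonInductionPHIs can be modelled outside LLVM as a two-stage process. The sketch below is a toy model only: ToyBlock, ToyPhi and ToyWidener are hypothetical stand-ins rather than LLVM classes, and strings stand in for IR values. Stage 1 registers a widened phi that has no operands yet; stage 2 fills in the operands by pairing the i-th scalar predecessor with the i-th vector predecessor, relying on the same predecessor-order assumption as the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-ins for IR objects; none of these are LLVM types.
struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Preds; // predecessor order is significant
};

struct ToyPhi {
  ToyBlock *Parent;
  // (incoming value name, incoming block) pairs.
  std::vector<std::pair<std::string, ToyBlock *>> Incoming;
};

struct ToyWidener {
  std::unordered_map<const ToyPhi *, ToyPhi *> WidenedFor;
  std::vector<const ToyPhi *> PhisToFix;

  // Stage 1: remember the operand-less widened phi and queue the original.
  void widen(const ToyPhi &Orig, ToyPhi &New) {
    WidenedFor[&Orig] = &New;
    PhisToFix.push_back(&Orig);
  }

  // Stage 2: add one incoming value per predecessor, matching the i-th
  // scalar predecessor with the i-th vector predecessor.
  void fixAll() {
    for (const ToyPhi *Orig : PhisToFix) {
      ToyPhi *New = WidenedFor.at(Orig);
      const std::vector<ToyBlock *> &ScalarPreds = Orig->Parent->Preds;
      const std::vector<ToyBlock *> &VectorPreds = New->Parent->Preds;
      assert(ScalarPreds.size() == VectorPreds.size() &&
             "scalar and vector blocks need the same number of predecessors");
      for (std::size_t I = 0; I < ScalarPreds.size(); ++I) {
        for (const auto &In : Orig->Incoming) {
          if (In.second == ScalarPreds[I]) {
            // "vec." marks the widened counterpart of the scalar value.
            New->Incoming.emplace_back("vec." + In.first, VectorPreds[I]);
            break;
          }
        }
      }
    }
    PhisToFix.clear();
  }
};
```

Deferring the operands avoids looking up incoming values that have not been widened yet at the point where the phi itself is created.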

void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
unsigned VF) {
PHINode *P = cast<PHINode>(PN);
if (EnableVPlanNativePath) {
// Currently we enter here in the VPlan-native path for non-induction
// PHIs where all control flow is uniform. We simply widen these PHIs.
// Create a vector phi with no operands - the vector phi operands will be
// set at the end of vector code generation.
Type *VecTy =
(VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);
Value *VecPhi = Builder.CreatePHI(VecTy, PN->getNumOperands(), "vec.phi");
VectorLoopValueMap.setVectorValue(P, 0, VecPhi);
OrigPHIsToFix.push_back(P);

return;
}

assert(PN->getParent() == OrigLoop->getHeader() &&
"Non-header phis should have been handled elsewhere");

-PHINode *P = cast<PHINode>(PN);
// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #1: We create a new vector PHI node with no incoming edges. We'll use
@@ -3893,6 +3995,10 @@ void InnerLoopVectorizer::updateAnalysis() {
// Forget the original basic block.
PSE.getSE()->forgetLoop(OrigLoop);

// DT is not kept up-to-date for outer loop vectorization
if (EnableVPlanNativePath)
return;

// Update the dominator tree information.
assert(DT->properlyDominates(LoopBypassBlocks.front(), LoopExitBlock) &&
"Entry does not dominate exit.");
@@ -6527,6 +6633,13 @@ LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();

SmallPtrSet<Instruction *, 1> DeadInstructions;
VPlanHCFGTransforms::VPInstructionsToVPRecipes(
Plan, Legal->getInductionVars(), DeadInstructions);

for (unsigned VF = Range.Start; VF < Range.End; VF *= 2)
Plan->addVF(VF);

return Plan;
}

@@ -6728,11 +6841,26 @@ static bool processLoopInVPlanNativePath(
Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();

// Plan how to best vectorize, return the best VF and its cost.
-LVP.planInVPlanNativePath(OptForSize, UserVF);
+VectorizationFactor VF = LVP.planInVPlanNativePath(OptForSize, UserVF);

-// Returning false. We are currently not generating vector code in the VPlan
-// native path.
-return false;
+// If we are stress testing VPlan builds, do not attempt to generate vector
+// code.
+if (VPlanBuildStressTest)
+return false;
+
+LVP.setBestPlan(VF.Width, 1);
+
+InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, UserVF, 1, LVL,
+&CM);
+LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
+<< L->getHeader()->getParent()->getName() << "\"\n");
+LVP.executePlan(LB, DT);
+
+// Mark the loop as already vectorized to avoid vectorizing again.
+Hints.setAlreadyVectorized();
+
+LLVM_DEBUG(verifyFunction(*L->getHeader()->getParent()));
+return true;
}

bool LoopVectorizePass::processLoop(Loop *L) {
@@ -7123,8 +7251,15 @@ PreservedAnalyses LoopVectorizePass::run(Function &F,
if (!Changed)
return PreservedAnalyses::all();
PreservedAnalyses PA;
-PA.preserve<LoopAnalysis>();
-PA.preserve<DominatorTreeAnalysis>();
+
+// We currently do not preserve loopinfo/dominator analyses with outer loop
+// vectorization. Until this is addressed, mark these analyses as preserved
+// only for non-VPlan-native path.
+// TODO: Preserve Loop and Dominator analyses for VPlan-native path.
+if (!EnableVPlanNativePath) {
+PA.preserve<LoopAnalysis>();
+PA.preserve<DominatorTreeAnalysis>();
+}
PA.preserve<BasicAA>();
PA.preserve<GlobalsAA>();
return PA;