[LV] Vectorize cases with larger number of RT checks, execute only if profitable.

This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.

The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.

To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.
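
As a rough illustration of the inequality above, the break-even trip count can be sketched as follows. This is a simplified model, not the code in this patch: the function name `minProfitableTripCount` and its parameters are hypothetical, costs are abstract cost-model units, and the sketch ignores the scalar epilogue and any additional bounds or rounding the vectorizer may apply.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API). Assumed inputs:
//   ScalarIterCost: cost of one iteration of the scalar loop.
//   VectorIterCost: cost of one vector iteration, covering VF scalar iterations.
//   RTCheckCost:    one-time cost of executing the runtime checks.
//   VF:             vectorization factor (scalar iterations per vector iteration).
// Returns the smallest trip count N such that
//   RTCheckCost + VectorIterCost * N / VF < ScalarIterCost * N,
// or std::nullopt if the vector loop never beats the scalar loop.
std::optional<uint64_t> minProfitableTripCount(uint64_t ScalarIterCost,
                                               uint64_t VectorIterCost,
                                               uint64_t RTCheckCost,
                                               uint64_t VF) {
  if (VF == 0)
    return std::nullopt;
  // Cost of the VF scalar iterations that one vector iteration replaces.
  uint64_t ScalarPerVF = ScalarIterCost * VF;
  // If a vector iteration is not cheaper, no trip count amortizes the checks.
  if (ScalarPerVF <= VectorIterCost)
    return std::nullopt;
  uint64_t SavingPerVF = ScalarPerVF - VectorIterCost;
  // Multiplying the inequality by VF gives N * SavingPerVF > RTCheckCost * VF,
  // so the smallest integer N is floor(RTCheckCost * VF / SavingPerVF) + 1.
  return RTCheckCost * VF / SavingPerVF + 1;
}
```

For example, with `ScalarIterCost = 4`, `VectorIterCost = 8`, `RTCheckCost = 40` and `VF = 4`, the sketch returns 21: at a trip count of 20 both sides of the inequality equal 80, and at 21 the vector path costs 82 against 84 for the scalar loop.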

Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.

The performance impact on standard test-suites like SPEC2006/SPEC2017/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.

On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.

```
CFP2006/447.dealII/447.dealII    -1.9%
CINT2017rate/525.x264_r/525.x264_r    -2.2%
ASC_Sequoia/IRSmk/IRSmk       -9.2%
Rodinia/srad/srad     -36.1%
```

`size` regressions on AArch64 with -O3 are

```
MultiSource/Applications/hbd/hbd                 90256.00   106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000     240676.00   257268.00  6.9%
MultiSourc...enchmarks/mafft/pairlocalalign     472603.00   489131.00  3.5%
External/S...2017rate/525.x264_r/525.x264_r     613831.00   630343.00  2.7%
External/S...NT2006/464.h264ref/464.h264ref     818920.00   835448.00  2.0%
External/S...te/538.imagick_r/538.imagick_r    1994730.00  2027754.00  1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4    1236471.00  1253015.00  1.3%
MultiSource/Applications/oggenc/oggenc         2108147.00  2124675.00  0.8%
External/S.../CFP2006/447.dealII/447.dealII    4742999.00  4759559.00  0.3%
External/S...rate/510.parest_r/510.parest_r   14206377.00 14239433.00  0.2%
```

Reviewed By: lebedev.ri, ebrevnov, dmgreen

Differential Revision: https://reviews.llvm.org/D109368
fhahn committed Jul 4, 2022
1 parent aa78c52 commit 644a965
Showing 14 changed files with 276 additions and 101 deletions.
@@ -219,16 +219,9 @@ class LoopVectorizationRequirements {
       ExactFPMathInst = I;
   }

-  void addRuntimePointerChecks(unsigned Num) { NumRuntimePointerChecks = Num; }
-
   Instruction *getExactFPInst() { return ExactFPMathInst; }

-  unsigned getNumRuntimePointerChecks() const {
-    return NumRuntimePointerChecks;
-  }
-
 private:
-  unsigned NumRuntimePointerChecks = 0;
   Instruction *ExactFPMathInst = nullptr;
 };

@@ -993,7 +993,6 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
     }
   }

-  Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());
   PSE.addPredicate(LAI->getPSE().getPredicate());
   return true;
 }
10 changes: 5 additions & 5 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -33,7 +33,6 @@ class LoopInfo;
 class LoopVectorizationLegality;
 class LoopVectorizationCostModel;
 class PredicatedScalarEvolution;
-class LoopVectorizationRequirements;
 class LoopVectorizeHints;
 class OptimizationRemarkEmitter;
 class TargetTransformInfo;
@@ -191,6 +190,10 @@ struct VectorizationFactor {
   /// Cost of the scalar loop.
   InstructionCost ScalarCost;

+  /// The minimum trip count required to make vectorization profitable, e.g. due
+  /// to runtime checks.
+  ElementCount MinProfitableTripCount;
+
   VectorizationFactor(ElementCount Width, InstructionCost Cost,
                       InstructionCost ScalarCost)
       : Width(Width), Cost(Cost), ScalarCost(ScalarCost) {}
@@ -268,8 +271,6 @@ class LoopVectorizationPlanner {

   const LoopVectorizeHints &Hints;

-  LoopVectorizationRequirements &Requirements;
-
   OptimizationRemarkEmitter *ORE;

   SmallVector<VPlanPtr, 4> VPlans;
@@ -285,10 +286,9 @@ class LoopVectorizationPlanner {
                            InterleavedAccessInfo &IAI,
                            PredicatedScalarEvolution &PSE,
                            const LoopVectorizeHints &Hints,
-                           LoopVectorizationRequirements &Requirements,
                            OptimizationRemarkEmitter *ORE)
       : OrigLoop(L), LI(LI), TLI(TLI), TTI(TTI), Legal(Legal), CM(CM), IAI(IAI),
-        PSE(PSE), Hints(Hints), Requirements(Requirements), ORE(ORE) {}
+        PSE(PSE), Hints(Hints), ORE(ORE) {}

   /// Plan how to best vectorize, return the best VF and its cost, or None if
   /// vectorization and interleaving should be avoided up front.
