Skip to content

Commit

Permalink
[AggressiveInstCombine] convert rotate with guard branch into funnel …
Browse files Browse the repository at this point in the history
…shift (PR34924)

Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. 
In the worst case (a target that doesn't have rotate instructions), we will expand this into a 
branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very 
likely to be a perf improvement over the original code.

The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924

Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg, 
CGP - because it doesn't fit cleanly anywhere AFAIK.

The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to 
have the IR converted before we reach things like vectorization because the reduced code can make a 
loop much simpler to transform.

Technically, this could be included in instcombine, but it's a large pattern match that includes 
control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). 
Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. 
This only runs at -O3, but that seems like a reasonable limitation given that source code has many 
options to avoid this pattern (including the recently added clang intrinsics for rotates).

I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, 
instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug
with the new pass manager that's independent of this change (but it will be masked if we canonicalize
harder to funnel shift intrinsics in instcombine).

Differential Revision: https://reviews.llvm.org/D55604

llvm-svn: 349396
  • Loading branch information
rotateright committed Dec 17, 2018
1 parent a9bcf5b commit 200885e
Show file tree
Hide file tree
Showing 3 changed files with 381 additions and 47 deletions.
Expand Up @@ -59,6 +59,99 @@ class AggressiveInstCombinerLegacyPass : public FunctionPass {
};
} // namespace

/// Match a pattern for a bitwise rotate operation that partially guards
/// against undefined behavior by branching around the rotation when the shift
/// amount is 0.
static bool foldGuardedRotateToFunnelShift(Instruction &I) {
if (I.getOpcode() != Instruction::PHI || I.getNumOperands() != 2)
return false;

// As with the one-use checks below, this is not strictly necessary, but we
// are being cautious to avoid potential perf regressions on targets that
// do not actually have a rotate instruction (where the funnel shift would be
// expanded back into math/shift/logic ops).
if (!isPowerOf2_32(I.getType()->getScalarSizeInBits()))
return false;

// Match V to funnel shift left/right and capture the source operand and
// shift amount in X and Y.
auto matchRotate = [](Value *V, Value *&X, Value *&Y) {
Value *L0, *L1, *R0, *R1;
unsigned Width = V->getType()->getScalarSizeInBits();
auto Sub = m_Sub(m_SpecificInt(Width), m_Value(R1));

// rotate_left(X, Y) == (X << Y) | (X >> (Width - Y))
auto RotL = m_OneUse(m_c_Or(m_Shl(m_Value(L0), m_Value(L1)),
m_LShr(m_Value(R0), Sub)));
if (RotL.match(V) && L0 == R0 && L1 == R1) {
X = L0;
Y = L1;
return Intrinsic::fshl;
}

// rotate_right(X, Y) == (X >> Y) | (X << (Width - Y))
auto RotR = m_OneUse(m_c_Or(m_LShr(m_Value(L0), m_Value(L1)),
m_Shl(m_Value(R0), Sub)));
if (RotR.match(V) && L0 == R0 && L1 == R1) {
X = L0;
Y = L1;
return Intrinsic::fshr;
}

return Intrinsic::not_intrinsic;
};

// One phi operand must be a rotate operation, and the other phi operand must
// be the source value of that rotate operation:
// phi [ rotate(RotSrc, RotAmt), RotBB ], [ RotSrc, GuardBB ]
PHINode &Phi = cast<PHINode>(I);
Value *P0 = Phi.getOperand(0), *P1 = Phi.getOperand(1);
Value *RotSrc, *RotAmt;
Intrinsic::ID IID = matchRotate(P0, RotSrc, RotAmt);
if (IID == Intrinsic::not_intrinsic || RotSrc != P1) {
IID = matchRotate(P1, RotSrc, RotAmt);
if (IID == Intrinsic::not_intrinsic || RotSrc != P0)
return false;
assert((IID == Intrinsic::fshl || IID == Intrinsic::fshr) &&
"Pattern must match funnel shift left or right");
}

// The incoming block with our source operand must be the "guard" block.
// That must contain a cmp+branch to avoid the rotate when the shift amount
// is equal to 0. The other incoming block is the block with the rotate.
BasicBlock *GuardBB = Phi.getIncomingBlock(RotSrc == P1);
BasicBlock *RotBB = Phi.getIncomingBlock(RotSrc != P1);
Instruction *TermI = GuardBB->getTerminator();
BasicBlock *TrueBB, *FalseBB;
ICmpInst::Predicate Pred;
if (!match(TermI, m_Br(m_ICmp(Pred, m_Specific(RotAmt), m_ZeroInt()),
TrueBB, FalseBB)))
return false;

BasicBlock *PhiBB = Phi.getParent();
if (Pred != CmpInst::ICMP_EQ || TrueBB != PhiBB || FalseBB != RotBB)
return false;

// We matched a variation of this IR pattern:
// GuardBB:
// %cmp = icmp eq i32 %RotAmt, 0
// br i1 %cmp, label %PhiBB, label %RotBB
// RotBB:
// %sub = sub i32 32, %RotAmt
// %shr = lshr i32 %X, %sub
// %shl = shl i32 %X, %RotAmt
// %rot = or i32 %shr, %shl
// br label %PhiBB
// PhiBB:
// %cond = phi i32 [ %rot, %RotBB ], [ %X, %GuardBB ]
// -->
// llvm.fshl.i32(i32 %X, i32 %RotAmt)
IRBuilder<> Builder(PhiBB, PhiBB->getFirstInsertionPt());
Function *F = Intrinsic::getDeclaration(Phi.getModule(), IID, Phi.getType());
Phi.replaceAllUsesWith(Builder.CreateCall(F, {RotSrc, RotSrc, RotAmt}));
return true;
}

/// This is used by foldAnyOrAllBitsSet() to capture a source value (Root) and
/// the bit indexes (Mask) needed by a masked compare. If we're matching a chain
/// of 'and' ops, then we also need to capture the fact that we saw an
Expand Down Expand Up @@ -174,8 +267,10 @@ static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {
// Also, we want to avoid matching partial patterns.
// TODO: It would be more efficient if we removed dead instructions
// iteratively in this loop rather than waiting until the end.
for (Instruction &I : make_range(BB.rbegin(), BB.rend()))
for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
MadeChange |= foldAnyOrAllBitsSet(I);
MadeChange |= foldGuardedRotateToFunnelShift(I);
}
}

// We're done with transforms, so remove dead instructions.
Expand Down

0 comments on commit 200885e

Please sign in to comment.