Skip to content

Commit

Permalink
[X86] Fold (icmp ult (add x,-C),2) -> `(or (icmp eq X,C), (icmp eq …
Browse files Browse the repository at this point in the history
…X,C+1))` for Vectors

This is undoing a middle-end transform which does the opposite. Since
X86 doesn't have unsigned vector comparison instructions pre-AVX512,
the simplified form gets worse codegen.

Fixes #66479

Proofs: https://alive2.llvm.org/ce/z/UCz3wt

Closes #84104
Closes #66479
  • Loading branch information
goldsteinn committed Mar 7, 2024
1 parent 3e73a08 commit 9f96db8
Show file tree
Hide file tree
Showing 2 changed files with 214 additions and 223 deletions.
63 changes: 63 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -53441,6 +53441,69 @@ static SDValue combineSetCC(SDNode *N, SelectionDAG &DAG,
truncateAVX512SetCCNoBWI(VT, OpVT, LHS, RHS, CC, DL, DAG, Subtarget))
return R;

// In the middle end transforms:
// `(or (icmp eq X, C), (icmp eq X, C+1))`
// -> `(icmp ult (add x, -C), 2)`
// Likewise inverted cases with `ugt`.
//
// Since x86, pre avx512, doesn't have unsigned vector compares, this results
// in worse codegen. So, undo the middle-end transform and go back to `(or
// (icmp eq), (icmp eq))` form.
// Also skip AVX1 with ymm vectors, as the umin approach combines better than
// the xmm approach.
//
// NB: We don't handle the similiar simplication of `(and (icmp ne), (icmp
// ne))` as it doesn't end up instruction positive.
// TODO: We might want to do this for avx512 as well if we `sext` the result.
if (VT.isVector() && OpVT.isVector() && OpVT.isInteger() &&
ISD::isUnsignedIntSetCC(CC) && LHS.getOpcode() == ISD::ADD &&
!Subtarget.hasAVX512() &&
(OpVT.getSizeInBits() <= 128 || !Subtarget.hasAVX() ||
Subtarget.hasAVX2()) &&
LHS.hasOneUse()) {

APInt CmpC;
SDValue AddC = LHS.getOperand(1);
if (ISD::isConstantSplatVector(RHS.getNode(), CmpC) &&
DAG.isConstantIntBuildVectorOrConstantInt(AddC)) {
// See which form we have depending on the constant/condition.
SDValue C0 = SDValue();
SDValue C1 = SDValue();

// If we had `(add x, -1)` and can lower with `umin`, don't transform as
// we will end up generating an additional constant. Keeping in the
// current form has a slight latency cost, but it probably worth saving a
// constant.
if (ISD::isConstantSplatVectorAllOnes(AddC.getNode()) &&
DAG.getTargetLoweringInfo().isOperationLegal(ISD::UMIN, OpVT)) {
// Pass
}
// Normal Cases
else if ((CC == ISD::SETULT && CmpC == 2) ||
(CC == ISD::SETULE && CmpC == 1)) {
// These will constant fold.
C0 = DAG.getNegative(AddC, DL, OpVT);
C1 = DAG.getNode(ISD::SUB, DL, OpVT, C0,
DAG.getAllOnesConstant(DL, OpVT));
}
// Inverted Cases
else if ((CC == ISD::SETUGT && (-CmpC) == 3) ||
(CC == ISD::SETUGE && (-CmpC) == 2)) {
// These will constant fold.
C0 = DAG.getNOT(DL, AddC, OpVT);
C1 = DAG.getNode(ISD::ADD, DL, OpVT, C0,
DAG.getAllOnesConstant(DL, OpVT));
}
if (C0 && C1) {
SDValue NewLHS =
DAG.getSetCC(DL, VT, LHS.getOperand(0), C0, ISD::SETEQ);
SDValue NewRHS =
DAG.getSetCC(DL, VT, LHS.getOperand(0), C1, ISD::SETEQ);
return DAG.getNode(ISD::OR, DL, VT, NewLHS, NewRHS);
}
}
}

// For an SSE1-only target, lower a comparison of v4f32 to X86ISD::CMPP early
// to avoid scalarization via legalization because v4i32 is not a legal type.
if (Subtarget.hasSSE1() && !Subtarget.hasSSE2() && VT == MVT::v4i32 &&
Expand Down

0 comments on commit 9f96db8

Please sign in to comment.