Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26585,6 +26585,26 @@ performSignExtendInRegCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
return DAG.getNode(SOpc, DL, N->getValueType(0), Ext);
}

// Sign extend of CSET -> CSETM.
if (Opc == AArch64ISD::CSEL &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth having a hasOneUse() check for the CSEL? Unlikely perhaps but somebody might want the zero and sign extended results at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the CSEL has more uses, I think this shouldn't make things worse: it shouldn't increase the instruction count since we avoid the sign extension, and should help reduce the latency of the sign-extended result since we compute it from the CSEL operands, right? Are you concerned about some other effect that emitting both CSET and CSETM might have? (I'm happy to add the hasOneUse() check, I'm just confirming I'm not missing anything. :) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not backed by any science I just wasn't sure if having two instructions requiring the same CC flags might hinder scheduling more? It seems an unlikely scenario so I'm happy to defer the decision until we have a real world use case to analyse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was that I believe we may already generate multiple instructions requiring the same CC flags elsewhere, for example when selecting CSINC from (add x, CSEL (...)) patterns, and it wasn't obvious to me how doing this in a combine would hypothetically be worse.

If you're happy with the current approach I'll go ahead and merge it on the basis that we can tighten the conditions if this turns out to be an issue. :)

cast<VTSDNode>(N->getOperand(1))->getVT() == MVT::i1) {
EVT VT = N->getValueType(0);
SDValue TVal = Src.getOperand(0);
SDValue FVal = Src.getOperand(1);

// SIGN_EXTEND_INREG (CSEL 0, 1, cc, NZCV), i1 --> CSEL 0, -1, cc, NZCV
if (isNullConstant(TVal) && isOneConstant(FVal))
return DAG.getNode(AArch64ISD::CSEL, DL, VT, TVal,
DAG.getAllOnesConstant(DL, VT), Src.getOperand(2),
Src.getOperand(3));

// SIGN_EXTEND_INREG (CSEL 1, 0, cc, NZCV), i1 --> CSEL -1, 0, cc, NZCV
if (isOneConstant(TVal) && isNullConstant(FVal))
return DAG.getNode(AArch64ISD::CSEL, DL, VT,
DAG.getAllOnesConstant(DL, VT), FVal,
Src.getOperand(2), Src.getOperand(3));
}

if (DCI.isBeforeLegalizeOps())
return SDValue();

Expand Down
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/AArch64/arm64-ccmp.ll
Original file line number Diff line number Diff line change
Expand Up @@ -632,11 +632,11 @@ define i64 @select_noccmp2(i64 %v1, i64 %v2, i64 %v3, i64 %r) {
; CHECK-SD-NEXT: cmp x0, #0
; CHECK-SD-NEXT: ccmp x0, #13, #0, pl
; CHECK-SD-NEXT: cset w8, gt
; CHECK-SD-NEXT: csetm w9, gt
; CHECK-SD-NEXT: cmp w8, #0
; CHECK-SD-NEXT: csel x0, xzr, x3, ne
; CHECK-SD-NEXT: sbfx w8, w8, #0, #1
; CHECK-SD-NEXT: adrp x9, _g@PAGE
; CHECK-SD-NEXT: str w8, [x9, _g@PAGEOFF]
; CHECK-SD-NEXT: adrp x8, _g@PAGE
; CHECK-SD-NEXT: str w9, [x8, _g@PAGEOFF]
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: select_noccmp2:
Expand Down
4 changes: 1 addition & 3 deletions llvm/test/CodeGen/AArch64/extract-vector-cmp.ll
Original file line number Diff line number Diff line change
Expand Up @@ -142,14 +142,12 @@ for.cond.cleanup:
}


; TODO: Combine the sbfx(cset) into a csetm
define i32 @issue_121372(<4 x i32> %v) {
; CHECK-LABEL: issue_121372:
; CHECK: // %bb.0:
; CHECK-NEXT: fmov w8, s0
; CHECK-NEXT: cmp w8, #0
; CHECK-NEXT: cset w8, eq
; CHECK-NEXT: sbfx w8, w8, #0, #1
; CHECK-NEXT: csetm w8, eq
; CHECK-NEXT: cmp w8, #1
; CHECK-NEXT: csetm w0, lt
; CHECK-NEXT: ret
Expand Down
Loading