Skip to content

Commit

Permalink
[DAG] fold FP binops with undef operands to NaN
Browse files Browse the repository at this point in the history
This is the FP sibling of D43141 with the corresponding IR change in rL327212.

We can't propagate undef here because if a variable operand is a NaN, these 
binops must propagate NaN. Neither global nor node-level fast-math makes a 
difference. If we have 'nnan', I think later folds can turn the NaN into undef.

The tests in X86/fp-undef.ll are meant to be the definitive verification for 
these folds - everything reduces identically now.

The other test changes are collateral damage. They may need to be altered to
preserve their intent.

Differential Revision: https://reviews.llvm.org/D47026

llvm-svn: 332920
  • Loading branch information
rotateright committed May 21, 2018
1 parent 5cb2e30 commit 17a870f
Show file tree
Hide file tree
Showing 8 changed files with 299 additions and 512 deletions.
23 changes: 12 additions & 11 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4767,6 +4767,18 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
}
}

// Any FP binop with an undef operand is folded to NaN. This matches the
// behavior of the IR optimizer.
switch (Opcode) {
case ISD::FADD:
case ISD::FSUB:
case ISD::FMUL:
case ISD::FDIV:
case ISD::FREM:
if (N1.isUndef() || N2.isUndef())
return getConstantFP(APFloat::getNaN(EVTToAPFloatSemantics(VT)), DL, VT);
}

// Canonicalize an UNDEF to the RHS, even over a constant.
if (N1.isUndef()) {
if (TLI->isCommutativeBinOp(Opcode)) {
Expand All @@ -4776,9 +4788,6 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
case ISD::FP_ROUND_INREG:
case ISD::SIGN_EXTEND_INREG:
case ISD::SUB:
case ISD::FSUB:
case ISD::FDIV:
case ISD::FREM:
return getUNDEF(VT); // fold op(undef, arg2) -> undef
case ISD::UDIV:
case ISD::SDIV:
Expand Down Expand Up @@ -4813,14 +4822,6 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
case ISD::SRL:
case ISD::SHL:
return getUNDEF(VT); // fold op(arg1, undef) -> undef
case ISD::FADD:
case ISD::FSUB:
case ISD::FMUL:
case ISD::FDIV:
case ISD::FREM:
if (getTarget().Options.UnsafeFPMath)
return N2;
break;
case ISD::MUL:
case ISD::AND:
return getConstant(0, DL, VT); // fold op(arg1, undef) -> 0
Expand Down
3 changes: 1 addition & 2 deletions llvm/test/CodeGen/AArch64/fcvt_combine.ll
Original file line number Diff line number Diff line change
Expand Up @@ -100,9 +100,8 @@ define <2 x i32> @test9(<2 x float> %f) {
ret <2 x i32> %vcvt.i
}

; Don't combine all undefs.
; Combine all undefs.
; CHECK-LABEL: test10
; CHECK: fmul.2s v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; CHECK: fcvtzu.2s v{{[0-9]+}}, v{{[0-9]+}}
; CHECK: ret
define <2 x i32> @test10(<2 x float> %f) {
Expand Down
7 changes: 5 additions & 2 deletions llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
Original file line number Diff line number Diff line change
Expand Up @@ -145,10 +145,13 @@ define <2 x half> @v_mad_mix_v2f32_clamp_postcvt(<2 x half> %src0, <2 x half> %s
; FIXME: Should be packed into 2 registers per argument?
; GCN-LABEL: {{^}}v_mad_mix_v3f32_clamp_postcvt:
; GCN: s_waitcnt
; GFX9-NEXT: v_mad_mixlo_f16 v2, v2, v5, v8 op_sel_hi:[1,1,1] clamp
; GFX9-NEXT: v_mad_mixhi_f16 v2, v0, v0, v0 clamp
; GFX9-NEXT: v_mad_mixlo_f16 v2, v2, v5, v8 op_sel_hi:[1,1,1]
; GFX9-NEXT: v_mad_mixlo_f16 v0, v0, v3, v6 op_sel_hi:[1,1,1] clamp
; GFX9-NEXT: s_movk_i32 s6, 0x7e00
; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX9-NEXT: v_lshl_or_b32 v2, s6, 16, v2
; GFX9-NEXT: v_mad_mixhi_f16 v0, v1, v4, v7 op_sel_hi:[1,1,1] clamp
; GFX9-NEXT: v_pk_max_f16 v2, v2, v2 clamp
; GFX9-NEXT: v_lshrrev_b32_e32 v1, 16, v0
; GFX9-NEXT: s_setpc_b64
define <3 x half> @v_mad_mix_v3f32_clamp_postcvt(<3 x half> %src0, <3 x half> %src1, <3 x half> %src2) #0 {
Expand Down
9 changes: 0 additions & 9 deletions llvm/test/CodeGen/NVPTX/implicit-def.ll

This file was deleted.

Loading

0 comments on commit 17a870f

Please sign in to comment.