IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
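
For reference, the behavior the new intrinsics are documented to have (x * 2^n, NaN and infinity passed through, signed zero on underflow, signed infinity on overflow) can be checked against the C library's `ldexp`. The following is a minimal sketch, not code from this commit: `scale_by_pow2` and `check_ldexp_semantics` are hypothetical names, and the overflow and underflow results assume the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: reference semantics of ldexp, i.e. x * 2^n,
// delegated to the C library rather than to any LLVM lowering.
inline double scale_by_pow2(double x, int n) { return std::ldexp(x, n); }

// Hypothetical check of the special cases listed in the LangRef text
// added by this commit (default rounding mode assumed).
inline void check_ldexp_semantics() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();

  // Ordinary scaling by a power of two is exact.
  assert(scale_by_pow2(3.0, 4) == 48.0);
  assert(scale_by_pow2(1.0, -1) == 0.5);

  // NaN and infinite inputs are returned unchanged.
  assert(std::isnan(scale_by_pow2(nan, 10)));
  assert(scale_by_pow2(-inf, -5) == -inf);

  // Overflow produces an infinity with the sign of the input.
  assert(scale_by_pow2(1.0, 5000) == inf);

  // Underflow produces a zero with the sign of the input.
  const double tiny = scale_by_pow2(-1.0, -5000);
  assert(tiny == 0.0 && std::signbit(tiny));
}
```

Calling `check_ldexp_semantics()` from a `main` function exercises all of the cases. A target-independent inline expansion has to reproduce the same results without calling into libm.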
arsenm committed Jun 6, 2023
1 parent 5d361ad commit eece6ba
Showing 71 changed files with 3,780 additions and 422 deletions.
9 changes: 7 additions & 2 deletions clang/lib/CodeGen/CGBuiltin.cpp
@@ -17146,8 +17146,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
return emitUnaryBuiltin(*this, E, Intrinsic::amdgcn_log_clamp);
case AMDGPU::BI__builtin_amdgcn_ldexp:
case AMDGPU::BI__builtin_amdgcn_ldexpf:
-case AMDGPU::BI__builtin_amdgcn_ldexph:
-return emitFPIntBuiltin(*this, E, Intrinsic::amdgcn_ldexp);
+case AMDGPU::BI__builtin_amdgcn_ldexph: {
+llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
+llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
+llvm::Function *F =
+CGM.getIntrinsic(Intrinsic::ldexp, {Src0->getType(), Src1->getType()});
+return Builder.CreateCall(F, {Src0, Src1});
+}
case AMDGPU::BI__builtin_amdgcn_frexp_mant:
case AMDGPU::BI__builtin_amdgcn_frexp_mantf:
case AMDGPU::BI__builtin_amdgcn_frexp_manth:
2 changes: 1 addition & 1 deletion clang/test/CodeGenOpenCL/builtins-amdgcn-vi.cl
@@ -52,7 +52,7 @@ void test_cos_f16(global half* out, half a)
}

// CHECK-LABEL: @test_ldexp_f16
-// CHECK: call half @llvm.amdgcn.ldexp.f16
+// CHECK: call half @llvm.ldexp.f16.i32
void test_ldexp_f16(global half* out, half a, int b)
{
*out = __builtin_amdgcn_ldexph(a, b);
4 changes: 2 additions & 2 deletions clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -180,14 +180,14 @@ void test_log_clamp_f32(global float* out, float a)
}

// CHECK-LABEL: @test_ldexp_f32
-// CHECK: call float @llvm.amdgcn.ldexp.f32
+// CHECK: call float @llvm.ldexp.f32.i32
void test_ldexp_f32(global float* out, float a, int b)
{
*out = __builtin_amdgcn_ldexpf(a, b);
}

// CHECK-LABEL: @test_ldexp_f64
-// CHECK: call double @llvm.amdgcn.ldexp.f64
+// CHECK: call double @llvm.ldexp.f64.i32
void test_ldexp_f64(global double* out, double a, int b)
{
*out = __builtin_amdgcn_ldexp(a, b);
82 changes: 82 additions & 0 deletions llvm/docs/LangRef.rst
@@ -14713,6 +14713,47 @@ trapping or setting ``errno``.
When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.

'``llvm.ldexp.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

This is an overloaded intrinsic. You can use ``llvm.ldexp`` on any
floating point or vector of floating point type. Not all targets support
all types however.

::

declare float @llvm.ldexp.f32.i32(float %Val, i32 %Exp)
declare double @llvm.ldexp.f64.i32(double %Val, i32 %Exp)
declare x86_fp80 @llvm.ldexp.f80.i32(x86_fp80 %Val, i32 %Exp)
declare fp128 @llvm.ldexp.f128.i32(fp128 %Val, i32 %Exp)
declare ppc_fp128 @llvm.ldexp.ppcf128.i32(ppc_fp128 %Val, i32 %Exp)
declare <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> %Val, <2 x i32> %Exp)

Overview:
"""""""""

The '``llvm.ldexp.*``' intrinsics perform the ldexp function.

Arguments:
""""""""""

The first argument and the return value are :ref:`floating-point
<t_floating>` or :ref:`vector <t_vector>` of floating-point values of
the same type. The second argument is an integer with the same number
of elements.

Semantics:
""""""""""

This function multiplies the first argument by 2 raised to the second
argument's power. If the first argument is NaN or infinite, the same
value is returned. If the result underflows a zero with the same sign
is returned. If the result overflows, the result is an infinity with
the same sign.

'``llvm.log.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -24306,6 +24347,47 @@ This function returns the first value raised to the second power with an
unspecified sequence of rounding operations.


'``llvm.experimental.constrained.ldexp``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

declare <type0>
@llvm.experimental.constrained.ldexp(<type0> <op1>, <type1> <op2>,
metadata <rounding mode>,
metadata <exception behavior>)

Overview:
"""""""""

The '``llvm.experimental.constrained.ldexp``' performs the ldexp function.


Arguments:
""""""""""

The first argument and the return value are :ref:`floating-point
<t_floating>` or :ref:`vector <t_vector>` of floating-point values of
the same type. The second argument is an integer with the same number
of elements.


The third and fourth arguments specify the rounding mode and exception
behavior as described above.

Semantics:
""""""""""

This function multiplies the first argument by 2 raised to the second
argument's power. If the first argument is NaN or infinite, the same
value is returned. If the result underflows a zero with the same sign
is returned. If the result overflows, the result is an infinity with
the same sign.


'``llvm.experimental.constrained.sin``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2 changes: 2 additions & 0 deletions llvm/docs/ReleaseNotes.rst
@@ -61,6 +61,8 @@ Changes to the LLVM IR
* The ``nofpclass`` attribute was introduced. This allows more
optimizations around special floating point value comparisons.

* Introduced new ``llvm.ldexp`` and ``llvm.experimental.constrained.ldexp`` intrinsics.

* The constant expression variants of the following instructions have been
removed:

1 change: 1 addition & 0 deletions llvm/include/llvm/Analysis/TargetLibraryInfo.h
@@ -378,6 +378,7 @@ class TargetLibraryInfo {
case LibFunc_trunc: case LibFunc_truncf: case LibFunc_truncl:
case LibFunc_log2: case LibFunc_log2f: case LibFunc_log2l:
case LibFunc_exp2: case LibFunc_exp2f: case LibFunc_exp2l:
case LibFunc_ldexp: case LibFunc_ldexpf: case LibFunc_ldexpl:
case LibFunc_memcpy: case LibFunc_memset: case LibFunc_memmove:
case LibFunc_memcmp: case LibFunc_bcmp: case LibFunc_strcmp:
case LibFunc_strcpy: case LibFunc_stpcpy: case LibFunc_strlen:
1 change: 1 addition & 0 deletions llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -357,6 +357,7 @@ class LegalizerHelper {
LegalizeResult narrowScalarCTLZ(MachineInstr &MI, unsigned TypeIdx, LLT Ty);
LegalizeResult narrowScalarCTTZ(MachineInstr &MI, unsigned TypeIdx, LLT Ty);
LegalizeResult narrowScalarCTPOP(MachineInstr &MI, unsigned TypeIdx, LLT Ty);
LegalizeResult narrowScalarFLDEXP(MachineInstr &MI, unsigned TypeIdx, LLT Ty);

/// Perform Bitcast legalize action on G_EXTRACT_VECTOR_ELT.
LegalizeResult bitcastExtractVectorElt(MachineInstr &MI, unsigned TypeIdx,
7 changes: 7 additions & 0 deletions llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
@@ -1825,6 +1825,13 @@ class MachineIRBuilder {
return buildInstr(TargetOpcode::G_FPOW, {Dst}, {Src0, Src1}, Flags);
}

/// Build and insert \p Dst = G_FLDEXP \p Src0, \p Src1
MachineInstrBuilder
buildFLdexp(const DstOp &Dst, const SrcOp &Src0, const SrcOp &Src1,
std::optional<unsigned> Flags = std::nullopt) {
return buildInstr(TargetOpcode::G_FLDEXP, {Dst}, {Src0, Src1}, Flags);
}

/// Build and insert \p Res = G_FCOPYSIGN \p Op0, \p Op1
MachineInstrBuilder buildFCopysign(const DstOp &Dst, const SrcOp &Src0,
const SrcOp &Src1) {
5 changes: 4 additions & 1 deletion llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -411,6 +411,7 @@ enum NodeType {
STRICT_FSQRT,
STRICT_FPOW,
STRICT_FPOWI,
STRICT_FLDEXP,
STRICT_FSIN,
STRICT_FCOS,
STRICT_FEXP,
@@ -926,8 +927,10 @@ enum NodeType {
FCBRT,
FSIN,
FCOS,
-FPOWI,
FPOW,
+FPOWI,
+/// FLDEXP - ldexp, inspired by libm (op0 * 2**op1).
+FLDEXP,
FLOG,
FLOG2,
FLOG10,
4 changes: 4 additions & 0 deletions llvm/include/llvm/CodeGen/RuntimeLibcalls.h
@@ -70,6 +70,10 @@ namespace RTLIB {
/// UNKNOWN_LIBCALL if there is none.
Libcall getPOWI(EVT RetVT);

/// getLDEXP - Return the LDEXP_* value for the given types, or
/// UNKNOWN_LIBCALL if there is none.
Libcall getLDEXP(EVT RetVT);

/// Return the SYNC_FETCH_AND_* value for the given opcode and type, or
/// UNKNOWN_LIBCALL if there is none.
Libcall getSYNC(unsigned Opc, MVT VT);
1 change: 1 addition & 0 deletions llvm/include/llvm/IR/ConstrainedOps.def
@@ -89,6 +89,7 @@ DAG_FUNCTION(minimum, 2, 0, experimental_constrained_minimum, FMINIMU
DAG_FUNCTION(nearbyint, 1, 1, experimental_constrained_nearbyint, FNEARBYINT)
DAG_FUNCTION(pow, 2, 1, experimental_constrained_pow, FPOW)
DAG_FUNCTION(powi, 2, 1, experimental_constrained_powi, FPOWI)
DAG_FUNCTION(ldexp, 2, 1, experimental_constrained_ldexp, FLDEXP)
DAG_FUNCTION(rint, 1, 1, experimental_constrained_rint, FRINT)
DAG_FUNCTION(round, 1, 0, experimental_constrained_round, FROUND)
DAG_FUNCTION(roundeven, 1, 0, experimental_constrained_roundeven, FROUNDEVEN)
9 changes: 9 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -1037,6 +1037,10 @@ let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
def int_llround : DefaultAttrsIntrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty]>;
def int_lrint : DefaultAttrsIntrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty]>;
def int_llrint : DefaultAttrsIntrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty]>;

// TODO: int operand should be constrained to same number of elements as the result.
def int_ldexp : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
llvm_anyint_ty]>;
}

def int_minnum : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
@@ -1168,6 +1172,11 @@ let IntrProperties = [IntrInaccessibleMemOnly, IntrWillReturn] in {
llvm_i32_ty,
llvm_metadata_ty,
llvm_metadata_ty ]>;
def int_experimental_constrained_ldexp : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,
llvm_anyint_ty,
llvm_metadata_ty,
llvm_metadata_ty ]>;
def int_experimental_constrained_sin : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,
llvm_metadata_ty,
5 changes: 5 additions & 0 deletions llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -279,6 +279,11 @@ HANDLE_LIBCALL(LLRINT_F64, "llrint")
HANDLE_LIBCALL(LLRINT_F80, "llrintl")
HANDLE_LIBCALL(LLRINT_F128, "llrintl")
HANDLE_LIBCALL(LLRINT_PPCF128, "llrintl")
HANDLE_LIBCALL(LDEXP_F32, "ldexpf")
HANDLE_LIBCALL(LDEXP_F64, "ldexp")
HANDLE_LIBCALL(LDEXP_F80, "ldexpl")
HANDLE_LIBCALL(LDEXP_F128, "ldexpl")
HANDLE_LIBCALL(LDEXP_PPCF128, "ldexpl")

// Floating point environment
HANDLE_LIBCALL(FEGETENV, "fegetenv")
4 changes: 4 additions & 0 deletions llvm/include/llvm/Support/TargetOpcodes.def
@@ -613,6 +613,9 @@ HANDLE_TARGET_OPCODE(G_FLOG2)
/// Floating point base-10 logarithm of a value.
HANDLE_TARGET_OPCODE(G_FLOG10)

/// Floating point x * 2^n
HANDLE_TARGET_OPCODE(G_FLDEXP)

/// Generic FP negation.
HANDLE_TARGET_OPCODE(G_FNEG)

@@ -762,6 +765,7 @@ HANDLE_TARGET_OPCODE(G_STRICT_FDIV)
HANDLE_TARGET_OPCODE(G_STRICT_FREM)
HANDLE_TARGET_OPCODE(G_STRICT_FMA)
HANDLE_TARGET_OPCODE(G_STRICT_FSQRT)
HANDLE_TARGET_OPCODE(G_STRICT_FLDEXP)

/// read_register intrinsic
HANDLE_TARGET_OPCODE(G_READ_REGISTER)
8 changes: 8 additions & 0 deletions llvm/include/llvm/Target/GenericOpcodes.td
@@ -923,6 +923,13 @@ def G_FLOG10 : GenericInstruction {
let hasSideEffects = false;
}

// Floating point x * 2^n
def G_FLDEXP : GenericInstruction {
let OutOperandList = (outs type0:$dst);
let InOperandList = (ins type0:$src0, type1:$src1);
let hasSideEffects = false;
}

// Floating point ceiling of a value.
def G_FCEIL : GenericInstruction {
let OutOperandList = (outs type0:$dst);
@@ -1384,6 +1391,7 @@ def G_STRICT_FDIV : ConstrainedInstruction<G_FDIV>;
def G_STRICT_FREM : ConstrainedInstruction<G_FREM>;
def G_STRICT_FMA : ConstrainedInstruction<G_FMA>;
def G_STRICT_FSQRT : ConstrainedInstruction<G_FSQRT>;
def G_STRICT_FLDEXP : ConstrainedInstruction<G_FLDEXP>;

//------------------------------------------------------------------------------
// Memory intrinsics
2 changes: 2 additions & 0 deletions llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td
@@ -103,6 +103,7 @@ def : GINodeEquiv<G_FREM, frem>;
def : GINodeEquiv<G_FPOW, fpow>;
def : GINodeEquiv<G_FEXP2, fexp2>;
def : GINodeEquiv<G_FLOG2, flog2>;
def : GINodeEquiv<G_FLDEXP, fldexp>;
def : GINodeEquiv<G_FCANONICALIZE, fcanonicalize>;
def : GINodeEquiv<G_IS_FPCLASS, is_fpclass>;
def : GINodeEquiv<G_INTRINSIC, intrinsic_wo_chain>;
@@ -158,6 +159,7 @@ def : GINodeEquiv<G_STRICT_FDIV, strict_fdiv>;
def : GINodeEquiv<G_STRICT_FREM, strict_frem>;
def : GINodeEquiv<G_STRICT_FMA, strict_fma>;
def : GINodeEquiv<G_STRICT_FSQRT, strict_fsqrt>;
def : GINodeEquiv<G_STRICT_FLDEXP, strict_fldexp>;

// Broadly speaking G_LOAD is equivalent to ISD::LOAD but there are some
// complications that tablegen must take care of. For example, Predicates such
9 changes: 9 additions & 0 deletions llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -173,6 +173,9 @@ def SDTFPToIntOp : SDTypeProfile<1, 1, [ // fp_to_[su]int
def SDTFPToIntSatOp : SDTypeProfile<1, 2, [ // fp_to_[su]int_sat
SDTCisInt<0>, SDTCisFP<1>, SDTCisSameNumEltsAs<0, 1>, SDTCisVT<2, OtherVT>
]>;
def SDTFPExpOp : SDTypeProfile<1, 2, [ // ldexp
SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisInt<2>
]>;
def SDTExtInreg : SDTypeProfile<1, 2, [ // sext_inreg
SDTCisSameAs<0, 1>, SDTCisInt<0>, SDTCisVT<2, OtherVT>,
SDTCisVTSmallerThanOp<2, 1>
@@ -499,6 +502,7 @@ def fcos : SDNode<"ISD::FCOS" , SDTFPUnaryOp>;
def fexp2 : SDNode<"ISD::FEXP2" , SDTFPUnaryOp>;
def fpow : SDNode<"ISD::FPOW" , SDTFPBinOp>;
def flog2 : SDNode<"ISD::FLOG2" , SDTFPUnaryOp>;
def fldexp : SDNode<"ISD::FLDEXP" , SDTFPExpOp>;
def frint : SDNode<"ISD::FRINT" , SDTFPUnaryOp>;
def ftrunc : SDNode<"ISD::FTRUNC" , SDTFPUnaryOp>;
def fceil : SDNode<"ISD::FCEIL" , SDTFPUnaryOp>;
@@ -549,6 +553,8 @@ def strict_fexp2 : SDNode<"ISD::STRICT_FEXP2",
SDTFPUnaryOp, [SDNPHasChain]>;
def strict_fpow : SDNode<"ISD::STRICT_FPOW",
SDTFPBinOp, [SDNPHasChain]>;
def strict_fldexp : SDNode<"ISD::STRICT_FLDEXP",
SDTFPExpOp, [SDNPHasChain]>;
def strict_flog2 : SDNode<"ISD::STRICT_FLOG2",
SDTFPUnaryOp, [SDNPHasChain]>;
def strict_frint : SDNode<"ISD::STRICT_FRINT",
@@ -1449,6 +1455,9 @@ def any_fexp2 : PatFrags<(ops node:$src),
def any_fpow : PatFrags<(ops node:$lhs, node:$rhs),
[(strict_fpow node:$lhs, node:$rhs),
(fpow node:$lhs, node:$rhs)]>;
def any_fldexp : PatFrags<(ops node:$lhs, node:$rhs),
[(strict_fldexp node:$lhs, node:$rhs),
(fldexp node:$lhs, node:$rhs)]>;
def any_flog2 : PatFrags<(ops node:$src),
[(strict_flog2 node:$src),
(flog2 node:$src)]>;
4 changes: 4 additions & 0 deletions llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
@@ -1759,6 +1759,8 @@ unsigned IRTranslator::getSimpleIntrinsicOpcode(Intrinsic::ID ID) {
return TargetOpcode::G_FLOG2;
case Intrinsic::log10:
return TargetOpcode::G_FLOG10;
case Intrinsic::ldexp:
return TargetOpcode::G_FLDEXP;
case Intrinsic::nearbyint:
return TargetOpcode::G_FNEARBYINT;
case Intrinsic::pow:
@@ -1851,6 +1853,8 @@ static unsigned getConstrainedOpcode(Intrinsic::ID ID) {
return TargetOpcode::G_STRICT_FMA;
case Intrinsic::experimental_constrained_sqrt:
return TargetOpcode::G_STRICT_FSQRT;
case Intrinsic::experimental_constrained_ldexp:
return TargetOpcode::G_STRICT_FLDEXP;
default:
return 0;
}
