-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[ISel] Introduce llvm.clmul intrinsic #168731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-llvm-ir Author: Ramkumar Ramachandra (artagnon) ChangesIn line with a std proposal to introduce the llvm.clmul[rh] family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsic works, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Patch is 3.14 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168731.diff 17 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 734778f73af5f..03eacbc3f0730 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -18291,8 +18291,6 @@ then the result is the size in bits of the type of ``src`` if
``is_zero_poison == 0`` and ``poison`` otherwise. For example,
``llvm.cttz(2) = 1``.
-.. _int_overflow:
-
.. _int_fshl:
'``llvm.fshl.*``' Intrinsic
@@ -18389,6 +18387,153 @@ Example:
%r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11) ; %r = i8: 225 (0b11100001)
%r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8) ; %r = i8: 255 (0b11111111)
+.. _int_clmul:
+
+'``llvm.clmul.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmul`` on any integer
+or vectors of integer elements.
+
+::
+
+ declare i16 @llvm.clmul.i16(i16 %a, i16 %b)
+ declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
+ declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
+ declare <4 x i32> @llvm.clmul.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview:
+"""""""""
+
+The '``llvm.clmul``' family of intrinsic functions performs carry-less
+multiplication, or XOR multiplication, on the two arguments, and returns
+the low-bits.
+
+Arguments:
+""""""""""
+
+The arguments may be any integer type or vector of integer type. Both arguments
+and result must have the same type.
+
+Semantics:
+""""""""""
+
+The '``llvm.clmul``' intrinsic computes carry-less multiply of its arguments,
+which is the result of applying the standard Eucledian multiplication algorithm,
+where all of the additions are replaced with XORs, and returns the low-bits.
+The vector variants operate lane-wise.
+
+Example:
+""""""""
+
+.. code-block:: llvm
+
+ %r = call i4 @llvm.clmul.i4(i4 1, i4 2) ; %r = 2
+ %r = call i4 @llvm.clmul.i4(i4 5, i4 6) ; %r = 14
+ %r = call i4 @llvm.clmul.i4(i4 -4, i4 2) ; %r = -8
+ %r = call i4 @llvm.clmul.i4(i4 -4, i4 -5) ; %r = 4
+
+'``llvm.clmulr.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmulr`` on any integer
+or vectors of integer elements.
+
+::
+
+ declare i16 @llvm.clmulr.i16(i16 %a, i16 %b)
+ declare i32 @llvm.clmulr.i32(i32 %a, i32 %b)
+ declare i64 @llvm.clmulr.i64(i64 %a, i64 %b)
+ declare <4 x i32> @llvm.clmulr.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview:
+"""""""""
+
+The '``llvm.clmulr``' family of intrinsic functions performs reversed
+carry-less multiplication on the two arguments.
+
+Arguments:
+""""""""""
+
+The arguments may be any integer type or vector of integer type. Both arguments
+and result must have the same type.
+
+Semantics:
+""""""""""
+
+The '``llvm.clmulr``' intrinsic computes reversed carry-less multiply of its
+arguments. The vector variants operate lane-wise.
+
+.. codeblock:: text
+
+ clmulr(%a, %b) = bitreverse(clmul(bitreverse(%a), bitreverse(%b)))
+
+Example:
+""""""""
+
+.. code-block:: llvm
+
+ %r = call i4 @llvm.clmulr.i4(i4 1, i4 2) ; %r = 0
+ %r = call i4 @llvm.clmulr.i4(i4 5, i4 6) ; %r = 3
+ %r = call i4 @llvm.clmulr.i4(i4 -4, i4 2) ; %r = 3
+ %r = call i4 @llvm.clmulr.i4(i4 -4, i4 -5) ; %r = -2
+
+'``llvm.clmulh.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmulh`` on any integer
+or vectors of integer elements.
+
+::
+
+ declare i16 @llvm.clmulh.i16(i16 %a, i16 %b)
+ declare i32 @llvm.clmulh.i32(i32 %a, i32 %b)
+ declare i64 @llvm.clmulh.i64(i64 %a, i64 %b)
+ declare <4 x i32> @llvm.clmulh.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview:
+"""""""""
+
+The '``llvm.clmulh``' family of intrinsic functions performs carry-less
+multiplication on the two arguments, and returns the high-bits.
+
+Arguments:
+""""""""""
+
+The arguments may be any integer type or vector of integer type. Both arguments
+and result must have the same type.
+
+Semantics:
+""""""""""
+
+The '``llvm.clmulh``' intrinsic computes carry-less multiply of its arguments,
+and returns the high-bits. The vector variants operate lane-wise.
+
+.. codeblock:: text
+
+ clmulh(%a, %b) = clmulr(%a, %b) >> 1
+
+Example:
+""""""""
+
+.. code-block:: llvm
+
+ %r = call i4 @llvm.clmulh.i4(i4 1, i4 2) ; %r = 0
+ %r = call i4 @llvm.clmulh.i4(i4 5, i4 6) ; %r = 1
+ %r = call i4 @llvm.clmulh.i4(i4 -4, i4 2) ; %r = 1
+ %r = call i4 @llvm.clmulh.i4(i4 -4, i4 -5) ; %r = 7
+
+.. _int_overflow:
+
Arithmetic with Overflow Intrinsics
-----------------------------------
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index cdaa916548c25..08d87f7e7b266 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -767,6 +767,11 @@ enum NodeType {
FSHL,
FSHR,
+ /// Carry-less multiplication operations.
+ CLMUL,
+ CLMULR,
+ CLMULH,
+
/// Byte Swap and Counting operators.
BSWAP,
CTTZ,
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 4c932c523e423..b38bd6729f34d 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5455,6 +5455,11 @@ class LLVM_ABI TargetLowering : public TargetLoweringBase {
/// \returns The expansion if successful, SDValue() otherwise
SDValue expandFunnelShift(SDNode *N, SelectionDAG &DAG) const;
+ /// Expand carryless multiply.
+ /// \param N Node to expand
+ /// \returns The expansion if successful, SDValue() otherwise
+ SDValue expandCLMUL(SDNode *N, SelectionDAG &DAG) const;
+
/// Expand rotations.
/// \param N Node to expand
/// \param AllowVectorOps expand vector rotate, this should only be performed
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index 8f3cc54747074..11fc73829b026 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1465,6 +1465,12 @@ let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrNoCreateUndefOrPoison] in
[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
def int_fshr : DefaultAttrsIntrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
+ def int_clmul : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>]>;
+ def int_clmulr : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>]>;
+ def int_clmulh : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>]>;
}
let IntrProperties = [IntrNoMem, IntrSpeculatable,
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index a9750a5ab03f9..c50afe3bfe4bb 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -441,6 +441,12 @@ def sra_parts : SDNode<"ISD::SRA_PARTS" , SDTIntShiftPairOp>;
def srl_parts : SDNode<"ISD::SRL_PARTS" , SDTIntShiftPairOp>;
def fshl : SDNode<"ISD::FSHL" , SDTIntShiftDOp>;
def fshr : SDNode<"ISD::FSHR" , SDTIntShiftDOp>;
+def clmul : SDNode<"ISD::CLMUL" , SDTIntBinOp,
+ [SDNPCommutative, SDNPAssociative]>;
+def clmulr : SDNode<"ISD::CLMULR" , SDTIntBinOp,
+ [SDNPCommutative, SDNPAssociative]>;
+def clmulh : SDNode<"ISD::CLMULH" , SDTIntBinOp,
+ [SDNPCommutative, SDNPAssociative]>;
def and : SDNode<"ISD::AND" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;
def or : SDNode<"ISD::OR" , SDTIntBinOp,
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 99d14a60c6ed1..4e9cbbb85c129 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -4095,6 +4095,12 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
if (SDValue Expanded = TLI.expandFunnelShift(Node, DAG))
Results.push_back(Expanded);
break;
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
+ if (SDValue Expanded = TLI.expandCLMUL(Node, DAG))
+ Results.push_back(Expanded);
+ break;
case ISD::ROTL:
case ISD::ROTR:
if (SDValue Expanded = TLI.expandROT(Node, true /*AllowVectorOps*/, DAG))
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 44e5a187c4281..3fb58ca52bf30 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -204,6 +204,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::ADD:
case ISD::SUB:
case ISD::MUL:
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
case ISD::VP_AND:
case ISD::VP_OR:
case ISD::VP_XOR:
@@ -3128,7 +3131,7 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::USHLSAT: ExpandIntRes_SHLSAT(N, Lo, Hi); break;
case ISD::AVGCEILS:
- case ISD::AVGCEILU:
+ case ISD::AVGCEILU:
case ISD::AVGFLOORS:
case ISD::AVGFLOORU: ExpandIntRes_AVG(N, Lo, Hi); break;
@@ -3162,6 +3165,12 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
ExpandIntRes_FunnelShift(N, Lo, Hi);
break;
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
+ ExpandIntRes_CLMUL(N, Lo, Hi);
+ break;
+
case ISD::VSCALE:
ExpandIntRes_VSCALE(N, Lo, Hi);
break;
@@ -5492,6 +5501,11 @@ void DAGTypeLegalizer::ExpandIntRes_FunnelShift(SDNode *N, SDValue &Lo,
Hi = DAG.getNode(Opc, DL, HalfVT, Select3, Select2, NewShAmt);
}
+void DAGTypeLegalizer::ExpandIntRes_CLMUL(SDNode *N, SDValue &Lo, SDValue &Hi) {
+ SDValue Res = TLI.expandCLMUL(N, DAG);
+ SplitInteger(Res, Lo, Hi);
+}
+
void DAGTypeLegalizer::ExpandIntRes_VSCALE(SDNode *N, SDValue &Lo,
SDValue &Hi) {
EVT VT = N->getValueType(0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index ede522eff6df3..5ebda11cc189e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -513,6 +513,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
void ExpandIntRes_Rotate (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_FunnelShift (SDNode *N, SDValue &Lo, SDValue &Hi);
+ void ExpandIntRes_CLMUL(SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_VSCALE (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_READ_REGISTER(SDNode *N, SDValue &Lo, SDValue &Hi);
@@ -912,6 +913,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue ScalarizeVecOp_UnaryOp(SDNode *N);
SDValue ScalarizeVecOp_UnaryOpWithExtraInput(SDNode *N);
SDValue ScalarizeVecOp_UnaryOp_StrictFP(SDNode *N);
+ SDValue ScalarizeVecOp_BinOp(SDNode *N);
SDValue ScalarizeVecOp_CONCAT_VECTORS(SDNode *N);
SDValue ScalarizeVecOp_INSERT_SUBVECTOR(SDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_EXTRACT_VECTOR_ELT(SDNode *N);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 71eeee78bd868..a7b408f88e1d5 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -173,6 +173,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
case ISD::SMAX:
case ISD::UMIN:
case ISD::UMAX:
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
case ISD::SADDSAT:
case ISD::UADDSAT:
@@ -799,6 +802,11 @@ bool DAGTypeLegalizer::ScalarizeVectorOperand(SDNode *N, unsigned OpNo) {
case ISD::LLRINT:
Res = ScalarizeVecOp_UnaryOp(N);
break;
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
+ Res = ScalarizeVecOp_BinOp(N);
+ break;
case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT:
Res = ScalarizeVecOp_UnaryOpWithExtraInput(N);
@@ -950,6 +958,20 @@ SDValue DAGTypeLegalizer::ScalarizeVecOp_UnaryOp_StrictFP(SDNode *N) {
return SDValue();
}
+/// If the input is a vector that needs to be scalarized, it must be <1 x ty>.
+/// Do the operation on the element instead.
+SDValue DAGTypeLegalizer::ScalarizeVecOp_BinOp(SDNode *N) {
+ assert(N->getValueType(0).getVectorNumElements() == 2 &&
+ "Unexpected vector type!");
+ SDValue L = GetScalarizedVector(N->getOperand(0));
+ SDValue R = GetScalarizedVector(N->getOperand(1));
+ SDValue Op = DAG.getNode(N->getOpcode(), SDLoc(N),
+ N->getValueType(0).getScalarType(), L, R);
+ // Revectorize the result so the types line up with what the uses of this
+ // expression expect.
+ return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Op);
+}
+
/// The vectors to concatenate have length one - use a BUILD_VECTOR instead.
SDValue DAGTypeLegalizer::ScalarizeVecOp_CONCAT_VECTORS(SDNode *N) {
SmallVector<SDValue, 8> Ops(N->getNumOperands());
@@ -1091,14 +1113,14 @@ SDValue DAGTypeLegalizer::ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo) {
return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Res);
}
-SDValue DAGTypeLegalizer::ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N,
+SDValue DAGTypeLegalizer::ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N,
unsigned OpNo) {
assert(OpNo == 1 && "Wrong operand for scalarization!");
SDValue Elt = GetScalarizedVector(N->getOperand(1));
- SDValue Res = DAG.getNode(ISD::STRICT_FP_ROUND, SDLoc(N),
- { N->getValueType(0).getVectorElementType(),
- MVT::Other },
- { N->getOperand(0), Elt, N->getOperand(2) });
+ SDValue Res =
+ DAG.getNode(ISD::STRICT_FP_ROUND, SDLoc(N),
+ {N->getValueType(0).getVectorElementType(), MVT::Other},
+ {N->getOperand(0), Elt, N->getOperand(2)});
// Legalize the chain result - switch anything that used the old chain to
// use the new one.
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
@@ -1372,6 +1394,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::ADD: case ISD::VP_ADD:
case ISD::SUB: case ISD::VP_SUB:
case ISD::MUL: case ISD::VP_MUL:
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
case ISD::MULHS:
case ISD::MULHU:
case ISD::ABDS:
@@ -3842,16 +3867,16 @@ SDValue DAGTypeLegalizer::SplitVecOp_UnaryOp(SDNode *N) {
InVT.getVectorElementCount());
if (N->isStrictFPOpcode()) {
- Lo = DAG.getNode(N->getOpcode(), dl, { OutVT, MVT::Other },
- { N->getOperand(0), Lo });
- Hi = DAG.getNode(N->getOpcode(), dl, { OutVT, MVT::Other },
- { N->getOperand(0), Hi });
+ Lo = DAG.getNode(N->getOpcode(), dl, {OutVT, MVT::Other},
+ {N->getOperand(0), Lo});
+ Hi = DAG.getNode(N->getOpcode(), dl, {OutVT, MVT::Other},
+ {N->getOperand(0), Hi});
// Build a factor node to remember that this operation is independent
// of the other one.
SDValue Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
Hi.getValue(1));
-
+
// Legalize the chain result - switch anything that used the old chain to
// use the new one.
ReplaceValueWith(SDValue(N, 1), Ch);
@@ -4638,13 +4663,13 @@ SDValue DAGTypeLegalizer::SplitVecOp_FP_ROUND(SDNode *N) {
InVT.getVectorElementCount());
if (N->isStrictFPOpcode()) {
- Lo = DAG.getNode(N->getOpcode(), DL, { OutVT, MVT::Other },
- { N->getOperand(0), Lo, N->getOperand(2) });
- Hi = DAG.getNode(N->getOpcode(), DL, { OutVT, MVT::Other },
- { N->getOperand(0), Hi, N->getOperand(2) });
+ Lo = DAG.getNode(N->getOpcode(), DL, {OutVT, MVT::Other},
+ {N->getOperand(0), Lo, N->getOperand(2)});
+ Hi = DAG.getNode(N->getOpcode(), DL, {OutVT, MVT::Other},
+ {N->getOperand(0), Hi, N->getOperand(2)});
// Legalize the chain result - switch anything that used the old chain to
// use the new one.
- SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other,
+ SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other,
Lo.getValue(1), Hi.getValue(1));
ReplaceValueWith(SDValue(N, 1), NewChain);
} else if (N->getOpcode() == ISD::VP_FP_ROUND) {
@@ -4924,6 +4949,9 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
case ISD::SHL: case ISD::VP_SHL:
case ISD::SRA: case ISD::VP_SRA:
case ISD::SRL: case ISD::VP_SRL:
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
case ISD::FMINNUM:
case ISD::FMINNUM_IEEE:
case ISD::VP_FMINNUM:
@@ -5517,7 +5545,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_StrictFP(SDNode *N) {
EOps.push_back(Op);
}
- EVT WidenVT[] = {WidenEltVT, MVT::Other};
+ EVT WidenVT[] = {WidenEltVT, MVT::Other};
SDValue Oper = DAG.getNode(Opcode, dl, WidenVT, EOps);
ConcatOps[ConcatEnd++] = Oper;
Chains.push_back(Oper.getValue(1));
@@ -7064,6 +7092,9 @@ bool DAGTypeLegalizer::WidenVectorOperand(SDNode *N, unsigned OpNo) {
case ISD::LLROUND:
case ISD::LRINT:
case ISD::LLRINT:
+ case ISD::CLMUL:
+ case ISD::CLMULR:
+ case ISD::CLMULH:
Res = WidenVecOp_UnrollVectorOp(N);
break;
case ISD::IS_FPCLASS: Res = WidenVecOp_IS_FPCLASS(N); break;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 8827bff111c22..d587b581e0e8a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7121,6 +7121,34 @@ SDValue SelectionDAG::FoldConstantArithmetic(unsigned Opcode, const SDLoc &DL,
}
}
+ // Handle clmul special cases.
+ if (Opcode == ISD::CLMUL || Opcode == ISD::CLMULR || Opcode == ISD::CLMULH) {
+ assert(VT.isInteger() && "This operator only applies to integer types!");
+ assert(Ops[0].getValueType() == VT && Ops[1].getValueType() == VT &&
+ "CLMUL types must match!");
+
+ auto *C1 = dyn_cast<ConstantSDNode>(Ops[0]);
+ auto *C2 = dyn_cast<ConstantSDNode>(Ops[1]);
+
+ if (C1 && C2) {
+ const APInt &V1 = C1->getAPIntValue();
+ const APInt &V2 = C2->getAPIntValue();
+ APInt Res;
+ switch (Opcode) {
+ case ISD::CLMUL:
+ Res = APIntOps::clmul(V1, V2);
+ break;
+ case ISD::CLMULR:
+ Res = APIntOps::clmulr(V1, V2);
+ break;
+ case ISD::CLMULH:
+ Res = APIntOps::clmulh(V1, V2);
+ break;
+ }
+ return getConstant(Res, DL, VT);
+ }
+ }
+
// Handle fma/fmad special cases.
if (Opcode == ISD::FMA || Opcode == ISD::FMAD || Opcode == ISD::FMULADD) {
assert(VT.isFloatingPoint() && "This operator only applies to FP types!");
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 4f13f3b128ea4..155f014acbbdc 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7283,6 +7283,26 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
}
return;
}
+ case Intrinsic::clmul:
+ case Intrinsic::clmulr:
+ case Intrinsic::clmulh: {
+ SDValue Op1 = getValue(I.getArgOperand(0));
+ SDValue Op2 = getValue(I.getArgOperand(1));
+ unsigned Opcode;
+ switch (Intrinsic) {
+ case Intrinsic::clmul:
+ Opcode = ISD::CLMUL;
+ break;
+ case Intrinsic::clmulr:
+ Opcode = ISD::CLMULR;
+ break;
+ case Intrinsic::clmulh:
+ Opcode = ISD::CLMULH;
+ break;
+ }
+ setValue(&I, DAG.getNode(Opcode, sdl, Op1.getValueType(), Op1, Op2));
+ return;
+ }
case Intrinsic::sadd_sat: {
SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2 = getValue(I.getArgOperand(1));
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDu...
[truncated]
|
🐧 Linux x64 Test Results
|
|
Should the intrinsics be first introduced into llvm.experimental namespace? |
artagnon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the intrinsics be first introduced into llvm.experimental namespace?
I thought we had a new policy to avoid introducing things into the experimental namespace? A quick search on Discourse?
There was a proposal to get rid of that namespace, but I'm not sure it reached a conclusion https://discourse.llvm.org/t/rfc-dont-use-llvm-experimental-intrinsics/85352 |
In line with a std proposal to introduce the llvm.clmul[rh] family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsic works, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
|
is it possible to teach llvm about the distributivity of clmul over xor? it might be a follow-up pr... |
In line with a std proposal to introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics.
Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added.
Ref: https://isocpp.org/files/papers/P3642R3.html