[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.vector.histogram.add(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * Histogram on the elements of 'ptrs'
  * Multiply the histogram results by 'inc_amount'
  * Add the result of the multiply to the values loaded by the gather
  * Scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
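
The five operations listed above can be sketched as a scalar reference model.
This is only an illustration of the described semantics, not code from the
patch: the function name histogram_add_model, the use of std::vector for the
lanes and the 32-bit bucket type are all assumptions, and the model further
assumes that masked-off lanes are excluded from the histogram count (the
LangRef text only promises that the mask applies to at least the gather and
the scatter).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the 'add' histogram; not part of LLVM.
// ptrs: one bucket pointer per lane, inc: scalar increment,
// mask: one flag per lane (false means the lane is inactive).
void histogram_add_model(const std::vector<std::uint32_t *> &ptrs,
                         std::uint32_t inc, const std::vector<bool> &mask) {
  assert(ptrs.size() == mask.size() && "one mask bit per lane");
  const std::size_t lanes = ptrs.size();

  // 1. Gather load the current bucket value for every active lane.
  std::vector<std::uint32_t> loaded(lanes, 0);
  for (std::size_t i = 0; i < lanes; ++i)
    if (mask[i])
      loaded[i] = *ptrs[i];

  // 2. Histogram: for each active lane, count the active lanes that
  //    address the same bucket (assumption: inactive lanes do not count).
  std::vector<std::uint32_t> counts(lanes, 0);
  for (std::size_t i = 0; i < lanes; ++i)
    if (mask[i])
      for (std::size_t j = 0; j < lanes; ++j)
        if (mask[j] && ptrs[j] == ptrs[i])
          ++counts[i];

  // 3-5. Multiply the counts by inc, add to the gathered values and
  //      scatter the sums back. Lanes sharing a bucket store the same value.
  for (std::size_t i = 0; i < lanes; ++i)
    if (mask[i])
      *ptrs[i] = loaded[i] + counts[i] * inc;
}
```

For example, with four lanes where lanes 0 and 1 point at the same bucket,
lane 2 points at another bucket, lane 3 is masked off and inc is 5, the shared
bucket grows by 10 and the other bucket by 5, matching what the scalar loop
buckets[indices[i]] += inc would produce for the three active lanes.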
huntergr-arm committed May 13, 2024
1 parent 7eeccc1 commit fbb37e9
Showing 17 changed files with 523 additions and 0 deletions.
54 changes: 54 additions & 0 deletions llvm/docs/LangRef.rst
@@ -19143,6 +19143,60 @@ will be on any later loop iteration.
This intrinsic will only return 0 if the input count is also 0. A non-zero input
count will produce a non-zero result.

'``llvm.experimental.vector.histogram.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These intrinsics are overloaded.

These intrinsics represent histogram-like operations; that is, updating values
in memory that may not be contiguous, and where multiple elements within a
single vector may be updating the same value in memory.

The update operation must be specified as part of the intrinsic name. For a
simple histogram like the following the ``add`` operation would be used.

.. code-block:: c

    void simple_histogram(int *restrict buckets, unsigned *indices, int N, int inc) {
      for (int i = 0; i < N; ++i)
        buckets[indices[i]] += inc;
    }

More update operation types may be added in the future.

::

      declare void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr> %ptrs, i32 %inc, <8 x i1> %mask)
      declare void @llvm.experimental.vector.histogram.add.nxv2p0.i64(<vscale x 2 x ptr> %ptrs, i64 %inc, <vscale x 2 x i1> %mask)

Arguments:
""""""""""

The first argument is a vector of pointers to the memory locations to be
updated. The second argument is a scalar used to update the value from
memory; it must match the type of value to be updated. The final argument
is a mask value to exclude locations from being modified.

Semantics:
""""""""""

The '``llvm.experimental.vector.histogram.*``' intrinsics are used to perform
updates on potentially overlapping values in memory. The intrinsics represent
the following sequence of operations:

1. Gather load from the ``ptrs`` operand, with element type matching that of
   the ``inc`` operand.
2. Update of the values loaded from memory. In the case of the ``add``
   update operation, this means:

   1. Perform a cross-vector histogram operation on the ``ptrs`` operand.
   2. Multiply the result by the ``inc`` operand.
   3. Add the result to the values loaded from memory.

3. Scatter the result of the update operation to the memory locations from
   the ``ptrs`` operand.

The ``mask`` operand will apply to at least the gather and scatter operations.

Matrix Intrinsics
-----------------

7 changes: 7 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -797,6 +797,9 @@ class TargetTransformInfo {
  /// Return true if the target supports strided load.
  bool isLegalStridedLoadStore(Type *DataType, Align Alignment) const;

  /// Return true if the target supports masked vector histograms.
  bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const;

  /// Return true if this is an alternating opcode pattern that can be lowered
  /// to a single instruction on the target. In X86 this is for the addsub
  /// instruction which corresponds to a Shuffle + Fadd + FSub pattern in IR.
@@ -1883,6 +1886,7 @@ class TargetTransformInfo::Concept {
  virtual bool isLegalMaskedCompressStore(Type *DataType, Align Alignment) = 0;
  virtual bool isLegalMaskedExpandLoad(Type *DataType, Align Alignment) = 0;
  virtual bool isLegalStridedLoadStore(Type *DataType, Align Alignment) = 0;
  virtual bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) = 0;
  virtual bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0,
                               unsigned Opcode1,
                               const SmallBitVector &OpcodeMask) const = 0;
@@ -2386,6 +2390,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
  bool isLegalStridedLoadStore(Type *DataType, Align Alignment) override {
    return Impl.isLegalStridedLoadStore(DataType, Alignment);
  }
  bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) override {
    return Impl.isLegalMaskedVectorHistogram(AddrType, DataType);
  }
  bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
                       const SmallBitVector &OpcodeMask) const override {
    return Impl.isLegalAltInstr(VecTy, Opcode0, Opcode1, OpcodeMask);
4 changes: 4 additions & 0 deletions llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -316,6 +316,10 @@ class TargetTransformInfoImplBase {
    return false;
  }

  bool isLegalMaskedVectorHistogram(Type *AddrType, Type *DataType) const {
    return false;
  }

  bool enableOrderedReductions() const { return false; }

  bool hasDivRemOp(Type *DataType, bool IsSigned) const { return false; }
5 changes: 5 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1402,6 +1402,11 @@ enum NodeType {
  // which is later translated to an implicit use in the MIR.
  CONVERGENCECTRL_GLUE,

  // Experimental vector histogram intrinsic
  // Operands: Input Chain, Inc, Mask, Base, Index, Scale, ID
  // Output: Output Chain
  EXPERIMENTAL_VECTOR_HISTOGRAM,

  /// BUILTIN_OP_END - This must be the last enum value in this list.
  /// The target-specific pre-isel opcode values start here.
  BUILTIN_OP_END
3 changes: 3 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1526,6 +1526,9 @@ class SelectionDAG {
                           ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
                           ISD::MemIndexType IndexType,
                           bool IsTruncating = false);
  SDValue getMaskedHistogram(SDVTList VTs, EVT MemVT, const SDLoc &dl,
                             ArrayRef<SDValue> Ops, MachineMemOperand *MMO,
                             ISD::MemIndexType IndexType);

  SDValue getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr, EVT MemVT,
                      MachineMemOperand *MMO);
33 changes: 33 additions & 0 deletions llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -542,6 +542,7 @@ BEGIN_TWO_BYTE_PACK()
friend class MaskedLoadStoreSDNode;
friend class MaskedGatherScatterSDNode;
friend class VPGatherScatterSDNode;
friend class MaskedHistogramSDNode;

uint16_t : NumMemSDNodeBits;

@@ -552,6 +553,7 @@ BEGIN_TWO_BYTE_PACK()
// MaskedLoadStoreBaseSDNode => enum ISD::MemIndexedMode
// VPGatherScatterSDNode => enum ISD::MemIndexType
// MaskedGatherScatterSDNode => enum ISD::MemIndexType
// MaskedHistogramSDNode => enum ISD::MemIndexType
uint16_t AddressingMode : 3;
};
enum { NumLSBaseSDNodeBits = NumMemSDNodeBits + 3 };
@@ -564,6 +566,7 @@ BEGIN_TWO_BYTE_PACK()
friend class MaskedLoadSDNode;
friend class MaskedGatherSDNode;
friend class VPGatherSDNode;
friend class MaskedHistogramSDNode;

uint16_t : NumLSBaseSDNodeBits;

@@ -1420,6 +1423,7 @@ class MemSDNode : public SDNode {
return getOperand(2);
case ISD::MGATHER:
case ISD::MSCATTER:
case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return getOperand(3);
default:
return getOperand(1);
@@ -1468,6 +1472,7 @@ class MemSDNode : public SDNode {
case ISD::EXPERIMENTAL_VP_STRIDED_STORE:
case ISD::GET_FPENV_MEM:
case ISD::SET_FPENV_MEM:
case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return true;
default:
return N->isMemIntrinsic() || N->isTargetMemoryOpcode();
@@ -2953,6 +2958,34 @@ class MaskedScatterSDNode : public MaskedGatherScatterSDNode {
}
};

class MaskedHistogramSDNode : public MemSDNode {
public:
  friend class SelectionDAG;

  MaskedHistogramSDNode(unsigned Order, const DebugLoc &DL, SDVTList VTs,
                        EVT MemVT, MachineMemOperand *MMO,
                        ISD::MemIndexType IndexType)
      : MemSDNode(ISD::EXPERIMENTAL_VECTOR_HISTOGRAM, Order, DL, VTs, MemVT,
                  MMO) {
    LSBaseSDNodeBits.AddressingMode = IndexType;
  }

  ISD::MemIndexType getIndexType() const {
    return static_cast<ISD::MemIndexType>(LSBaseSDNodeBits.AddressingMode);
  }

  const SDValue &getBasePtr() const { return getOperand(3); }
  const SDValue &getIndex() const { return getOperand(4); }
  const SDValue &getMask() const { return getOperand(2); }
  const SDValue &getScale() const { return getOperand(5); }
  const SDValue &getInc() const { return getOperand(1); }
  const SDValue &getIntID() const { return getOperand(6); }

  static bool classof(const SDNode *N) {
    return N->getOpcode() == ISD::EXPERIMENTAL_VECTOR_HISTOGRAM;
  }
};

class FPStateAccessSDNode : public MemSDNode {
public:
friend class SelectionDAG;
7 changes: 7 additions & 0 deletions llvm/include/llvm/IR/Intrinsics.td
@@ -1856,6 +1856,13 @@ def int_experimental_vp_strided_load : DefaultAttrsIntrinsic<[llvm_anyvector_ty
llvm_i32_ty],
[ NoCapture<ArgIndex<0>>, IntrNoSync, IntrReadMem, IntrWillReturn, IntrArgMemOnly ]>;

// Experimental histogram
def int_experimental_vector_histogram_add : DefaultAttrsIntrinsic<[],
[ llvm_anyvector_ty, // Vector of pointers
llvm_anyint_ty, // Increment
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>], // Mask
[ IntrArgMemOnly ]>;

// Operators
let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
// Integer arithmetic
5 changes: 5 additions & 0 deletions llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -513,6 +513,11 @@ bool TargetTransformInfo::isLegalStridedLoadStore(Type *DataType,
  return TTIImpl->isLegalStridedLoadStore(DataType, Alignment);
}

bool TargetTransformInfo::isLegalMaskedVectorHistogram(Type *AddrType,
                                                       Type *DataType) const {
  return TTIImpl->isLegalMaskedVectorHistogram(AddrType, DataType);
}

bool TargetTransformInfo::enableOrderedReductions() const {
  return TTIImpl->enableOrderedReductions();
}
38 changes: 38 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -9633,6 +9633,44 @@ SDValue SelectionDAG::getMaskedScatter(SDVTList VTs, EVT MemVT, const SDLoc &dl,
return V;
}

SDValue SelectionDAG::getMaskedHistogram(SDVTList VTs, EVT MemVT,
                                         const SDLoc &dl, ArrayRef<SDValue> Ops,
                                         MachineMemOperand *MMO,
                                         ISD::MemIndexType IndexType) {
  assert(Ops.size() == 7 && "Incompatible number of operands");

  FoldingSetNodeID ID;
  AddNodeIDNode(ID, ISD::EXPERIMENTAL_VECTOR_HISTOGRAM, VTs, Ops);
  ID.AddInteger(MemVT.getRawBits());
  ID.AddInteger(getSyntheticNodeSubclassData<MaskedHistogramSDNode>(
      dl.getIROrder(), VTs, MemVT, MMO, IndexType));
  ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
  ID.AddInteger(MMO->getFlags());
  void *IP = nullptr;
  if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
    cast<MaskedHistogramSDNode>(E)->refineAlignment(MMO);
    return SDValue(E, 0);
  }

  auto *N = newSDNode<MaskedHistogramSDNode>(dl.getIROrder(), dl.getDebugLoc(),
                                             VTs, MemVT, MMO, IndexType);
  createOperands(N, Ops);

  assert(N->getMask().getValueType().getVectorElementCount() ==
             N->getIndex().getValueType().getVectorElementCount() &&
         "Vector width mismatch between mask and data");
  assert(isa<ConstantSDNode>(N->getScale()) &&
         N->getScale()->getAsAPIntVal().isPowerOf2() &&
         "Scale should be a constant power of 2");
  assert(N->getInc().getValueType().isInteger() && "Non integer update value");

  CSEMap.InsertNode(N, IP);
  InsertNode(N);
  SDValue V(N, 0);
  NewSDValueDbgMsg(V, "Creating new node: ", this);
  return V;
}

SDValue SelectionDAG::getGetFPEnv(SDValue Chain, const SDLoc &dl, SDValue Ptr,
EVT MemVT, MachineMemOperand *MMO) {
assert(Chain.getValueType() == MVT::Other && "Invalid chain type");
63 changes: 63 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -6281,6 +6281,64 @@ void SelectionDAGBuilder::visitConvergenceControl(const CallInst &I,
}
}

void SelectionDAGBuilder::visitVectorHistogram(const CallInst &I,
                                               unsigned IntrinsicID) {
  // For now, we're only lowering an 'add' histogram.
  // We can add others later, e.g. saturating adds, min/max.
  assert(IntrinsicID == Intrinsic::experimental_vector_histogram_add &&
         "Tried to lower unsupported histogram type");
  SDLoc sdl = getCurSDLoc();
  Value *Ptr = I.getOperand(0);
  SDValue Inc = getValue(I.getOperand(1));
  SDValue Mask = getValue(I.getOperand(2));

  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
  DataLayout TargetDL = DAG.getDataLayout();
  EVT VT = Inc.getValueType();
  Align Alignment = DAG.getEVTAlign(VT);

  const MDNode *Ranges = getRangeMetadata(I);

  SDValue Root = DAG.getRoot();
  SDValue Base;
  SDValue Index;
  ISD::MemIndexType IndexType;
  SDValue Scale;
  bool UniformBase = getUniformBase(Ptr, Base, Index, IndexType, Scale, this,
                                    I.getParent(), VT.getScalarStoreSize());

  unsigned AS = Ptr->getType()->getScalarType()->getPointerAddressSpace();

  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
      MachinePointerInfo(AS),
      MachineMemOperand::MOLoad | MachineMemOperand::MOStore,
      MemoryLocation::UnknownSize, Alignment, I.getAAMetadata(), Ranges);

  if (!UniformBase) {
    Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
    Index = getValue(Ptr);
    IndexType = ISD::SIGNED_SCALED;
    Scale =
        DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
  }

  EVT IdxVT = Index.getValueType();
  EVT EltTy = IdxVT.getVectorElementType();
  if (TLI.shouldExtendGSIndex(IdxVT, EltTy)) {
    EVT NewIdxVT = IdxVT.changeVectorElementType(EltTy);
    Index = DAG.getNode(ISD::SIGN_EXTEND, sdl, NewIdxVT, Index);
  }

  SDValue ID = DAG.getTargetConstant(IntrinsicID, sdl, MVT::i32);

  SDValue Ops[] = {Root, Inc, Mask, Base, Index, Scale, ID};
  SDValue Histogram = DAG.getMaskedHistogram(DAG.getVTList(MVT::Other), VT, sdl,
                                             Ops, MMO, IndexType);

  setValue(&I, Histogram);
  DAG.setRoot(Histogram);
}

/// Lower the call to the specified intrinsic function.
void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
unsigned Intrinsic) {
@@ -7948,6 +8006,11 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
  case Intrinsic::experimental_convergence_entry:
  case Intrinsic::experimental_convergence_loop:
    visitConvergenceControl(I, Intrinsic);
    return;
  case Intrinsic::experimental_vector_histogram_add: {
    visitVectorHistogram(I, Intrinsic);
    return;
  }
  }
}

1 change: 1 addition & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -624,6 +624,7 @@ class SelectionDAGBuilder {
void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);
void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);
void visitConvergenceControl(const CallInst &I, unsigned Intrinsic);
void visitVectorHistogram(const CallInst &I, unsigned IntrinsicID);
void visitVPLoad(const VPIntrinsic &VPIntrin, EVT VT,
const SmallVectorImpl<SDValue> &OpValues);
void visitVPStore(const VPIntrinsic &VPIntrin,
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -529,6 +529,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::PATCHPOINT:
return "patchpoint";

case ISD::EXPERIMENTAL_VECTOR_HISTOGRAM:
return "histogram";

// Vector Predication
#define BEGIN_REGISTER_VP_SDNODE(SDID, LEGALARG, NAME, ...) \
case ISD::SDID: \