Skip to content

Commit

Permalink
[SLP] Vectorize jumbled memory loads.
Browse files Browse the repository at this point in the history
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Subscribers: mzolotukhin

Reviewed By: ayal

Differential Revision: https://reviews.llvm.org/D36130

Review comments updated accordingly

Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0

Added a TODO for sortLoadAccesses API

Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58

Modified the TODO for sortLoadAccesses API

Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565

Review comment update for using OpdNum to insert the mask in respective location

Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce

Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase

Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
  • Loading branch information
Mohammad Shahid authored and Mohammad Shahid committed Sep 20, 2017
1 parent 9aaaeb3 commit 2b281de
Show file tree
Hide file tree
Showing 7 changed files with 370 additions and 135 deletions.
15 changes: 15 additions & 0 deletions llvm/include/llvm/Analysis/LoopAccessAnalysis.h
Expand Up @@ -667,6 +667,21 @@ int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr, const Loop *Lp,
const ValueToValueMap &StridesMap = ValueToValueMap(),
bool Assume = false, bool ShouldCheckWrap = true);

/// \brief Attempt to sort the 'loads' in \p VL and return the sorted values in
/// \p Sorted.
///
/// Returns 'false' if sorting is not legal or feasible, otherwise returns
/// 'true'. If \p Mask is not null, it also returns the \p Mask which is the
/// shuffle mask for actual memory access order.
///
/// For example, for a given VL of memory accesses in program order, a[i+2],
/// a[i+0], a[i+1] and a[i+3], this function will sort the VL and save the
/// sorted value in 'Sorted' as a[i+0], a[i+1], a[i+2], a[i+3] and saves the
/// mask for actual memory accesses in program order in 'Mask' as <2,0,1,3>
bool sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
ScalarEvolution &SE, SmallVectorImpl<Value *> &Sorted,
SmallVectorImpl<unsigned> *Mask = nullptr);

/// \brief Returns true if the memory operations \p A and \p B are consecutive.
/// This is a simple API that does not depend on the analysis pass.
bool isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
Expand Down
71 changes: 71 additions & 0 deletions llvm/lib/Analysis/LoopAccessAnalysis.cpp
Expand Up @@ -1107,6 +1107,77 @@ static unsigned getAddressSpaceOperand(Value *I) {
return -1;
}

// TODO:This API can be improved by using the permutation of given width as the
// accesses are entered into the map.
bool llvm::sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
ScalarEvolution &SE,
SmallVectorImpl<Value *> &Sorted,
SmallVectorImpl<unsigned> *Mask) {
SmallVector<std::pair<int64_t, Value *>, 4> OffValPairs;
OffValPairs.reserve(VL.size());
Sorted.reserve(VL.size());

// Walk over the pointers, and map each of them to an offset relative to
// first pointer in the array.
Value *Ptr0 = getPointerOperand(VL[0]);
const SCEV *Scev0 = SE.getSCEV(Ptr0);
Value *Obj0 = GetUnderlyingObject(Ptr0, DL);
PointerType *PtrTy = dyn_cast<PointerType>(Ptr0->getType());
uint64_t Size = DL.getTypeAllocSize(PtrTy->getElementType());

for (auto *Val : VL) {
// The only kind of access we care about here is load.
if (!isa<LoadInst>(Val))
return false;

Value *Ptr = getPointerOperand(Val);
assert(Ptr && "Expected value to have a pointer operand.");
// If a pointer refers to a different underlying object, bail - the
// pointers are by definition incomparable.
Value *CurrObj = GetUnderlyingObject(Ptr, DL);
if (CurrObj != Obj0)
return false;

const SCEVConstant *Diff =
dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
// The pointers may not have a constant offset from each other, or SCEV
// may just not be smart enough to figure out they do. Regardless,
// there's nothing we can do.
if (!Diff || static_cast<unsigned>(Diff->getAPInt().abs().getSExtValue()) >
(VL.size() - 1) * Size)
return false;

OffValPairs.emplace_back(Diff->getAPInt().getSExtValue(), Val);
}
SmallVector<unsigned, 4> UseOrder(VL.size());
for (unsigned i = 0; i < VL.size(); i++) {
UseOrder[i] = i;
}

// Sort the memory accesses and keep the order of their uses in UseOrder.
std::sort(UseOrder.begin(), UseOrder.end(),
[&OffValPairs](unsigned Left, unsigned Right) {
return OffValPairs[Left].first < OffValPairs[Right].first;
});

for (unsigned i = 0; i < VL.size(); i++)
Sorted.emplace_back(OffValPairs[UseOrder[i]].second);

// Sort UseOrder to compute the Mask.
if (Mask) {
Mask->reserve(VL.size());
for (unsigned i = 0; i < VL.size(); i++)
Mask->emplace_back(i);
std::sort(Mask->begin(), Mask->end(),
[&UseOrder](unsigned Left, unsigned Right) {
return UseOrder[Left] < UseOrder[Right];
});
}

return true;
}


/// Returns true if the memory operations \p A and \p B are consecutive.
bool llvm::isConsecutiveAccess(Value *A, Value *B, const DataLayout &DL,
ScalarEvolution &SE, bool CheckType) {
Expand Down

0 comments on commit 2b281de

Please sign in to comment.