[x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branches and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
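
As a rough illustration of that trade, here is a minimal scalar sketch of a 16-byte equality check written both ways. The helper names (`load64`, `equal16_branchy`, `equal16_merged`) are hypothetical and not part of this patch, and the real expansion operates on IR rather than C++ source; on x86 with SSE2, 16 bytes would normally be a single vector compare, so the 8-byte loads are only there to keep the example portable.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: load 8 bytes with no alignment assumption.
static inline std::uint64_t load64(const unsigned char *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// One pair of loads per block: compare, branch, then compare again.
bool equal16_branchy(const void *a, const void *b) {
  const unsigned char *pa = static_cast<const unsigned char *>(a);
  const unsigned char *pb = static_cast<const unsigned char *>(b);
  if (load64(pa) != load64(pb))
    return false; // early exit needs a second basic block
  return load64(pa + 8) == load64(pb + 8);
}

// Two pairs of loads per block: xor each pair, or the differences together,
// and finish with a single compare against zero.
bool equal16_merged(const void *a, const void *b) {
  const unsigned char *pa = static_cast<const unsigned char *>(a);
  const unsigned char *pb = static_cast<const unsigned char *>(b);
  std::uint64_t d0 = load64(pa) ^ load64(pb);
  std::uint64_t d1 = load64(pa + 8) ^ load64(pb + 8);
  return (d0 | d1) == 0;
}
```

The merged form always performs all four loads, but it has no intermediate branch and fits in one block, which is the shape this change enables for equality-only memcmp calls.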

The 24-byte test shows an interesting construct: we load the trailing scalar 
elements into vector registers and generate the same pcmpeq+movmsk code that 
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
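
The 24-byte construct can be approximated with SSE2 intrinsics as below. This is a sketch only: `equal24` is a hypothetical name, the exact instruction selection is up to the backend, and the non-SSE2 fallback is just a placeholder so the example compiles elsewhere.

```cpp
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Equality-only comparison of 24 bytes: one full 16-byte vector plus an
// 8-byte tail that is loaded into the low half of a vector register.
bool equal24(const void *a, const void *b) {
#ifdef SKETCH_HAVE_SSE2
  const unsigned char *pa = static_cast<const unsigned char *>(a);
  const unsigned char *pb = static_cast<const unsigned char *>(b);
  // Bytes 0..15: unaligned full-vector loads.
  __m128i a0 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pa));
  __m128i b0 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pb));
  // Bytes 16..23: 8-byte loads; the upper 64 bits are zeroed, so those
  // lanes always compare equal.
  __m128i a1 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(pa + 16));
  __m128i b1 = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(pb + 16));
  // pcmpeqb on both pairs, and the results, then one movmsk and compare.
  __m128i eq = _mm_and_si128(_mm_cmpeq_epi8(a0, b0), _mm_cmpeq_epi8(a1, b1));
  return _mm_movemask_epi8(eq) == 0xFFFF;
#else
  // Placeholder for targets without SSE2; not what the patch generates.
  return std::memcmp(a, b, 24) == 0;
#endif
}
```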

Differential Revision: https://reviews.llvm.org/D41714

llvm-svn: 321934
rotateright committed Jan 6, 2018
1 parent b77bc6b commit 5a48aef
Showing 5 changed files with 511 additions and 589 deletions.
8 changes: 2 additions & 6 deletions llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -564,12 +564,8 @@ Value *MemCmpExpansion::getMemCmpOneBlock() {
 // This function expands the memcmp call into an inline expansion and returns
 // the memcmp result.
 Value *MemCmpExpansion::getMemCmpExpansion() {
-  // A memcmp with zero-comparison with only one block of load and compare does
-  // not need to set up any extra blocks. This case could be handled in the DAG,
-  // but since we have all of the machinery to flexibly expand any memcpy here,
-  // we choose to handle this case too to avoid fragmented lowering.
-  if ((!IsUsedForZeroCmp && NumLoadsPerBlockForZeroCmp != 1) ||
-      getNumBlocks() != 1) {
+  // Create the basic block framework for a multi-block expansion.
+  if (getNumBlocks() != 1) {
     BasicBlock *StartBlock = CI->getParent();
     EndBlock = StartBlock->splitBasicBlock(CI, "endblock");
     setupEndBlockPHINodes();
5 changes: 5 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.h
@@ -829,6 +829,11 @@ namespace llvm {
     /// Vector-sized comparisons are fast using PCMPEQ + PMOVMSK or PTEST.
     MVT hasFastEqualityCompare(unsigned NumBits) const override;

+    /// Allow multiple load pairs per block for smaller and faster code.
+    unsigned getMemcmpEqZeroLoadsPerBlock() const override {
+      return 2;
+    }
+
     /// Return the value type to use for ISD::SETCC.
     EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
                            EVT VT) const override;
