[X86][SSE] Support variable-index float/double vector insertion on SSE41+ targets (PR47924)

Extends D95779 to permit insertion into float/double vectors while avoiding a lot of aliased memory traffic.

The scalar value is already on the SIMD unit, so we only need to transfer and splat the index value, then perform the select.
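
A portable scalar model of the compare+select idea described above may help when reading the lowering. This is only a sketch, not LLVM code: the name `insertViaSelect` and the use of `std::array` are assumptions for illustration. Each lane compares the splatted index against its own lane number and selects either the splatted scalar or the original element.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of variable-index insertion via compare+select.
// It mirrors the vector sequence lane by lane; it is not the LLVM lowering itself.
template <typename T, std::size_t N>
std::array<T, N> insertViaSelect(const std::array<T, N> &vec, T scalar,
                                 std::uint32_t idx) {
  std::array<std::uint32_t, N> splatIdx;
  splatIdx.fill(idx);    // splat the index across all lanes
  std::array<T, N> splatVal;
  splatVal.fill(scalar); // splat the scalar across all lanes

  std::array<T, N> result{};
  for (std::size_t lane = 0; lane < N; ++lane) {
    // Compare against the constant vector <0, 1, 2, ...> of lane indices.
    const bool mask = (splatIdx[lane] == lane);
    // Select: take the new scalar where the mask is set, else keep the old lane.
    result[lane] = mask ? splatVal[lane] : vec[lane];
  }
  return result;
}
```

In this model an out-of-range index matches no lane and leaves the vector unchanged; no stack spill or reload is involved at any point.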

SSE4 codegen is a little bulky due to the tied register requirements of (non-VEX) BLENDPS/PD, but the extra moves are cheap, so this shouldn't be an actual problem.

Differential Revision: https://reviews.llvm.org/D95866
RKSimon committed Feb 3, 2021
1 parent 7a45f27 commit 32b7c2f
Showing 2 changed files with 239 additions and 128 deletions.
6 changes: 4 additions & 2 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -18824,8 +18824,10 @@ SDValue X86TargetLowering::LowerINSERT_VECTOR_ELT(SDValue Op,
   if (!N2C) {
     // Variable insertion indices, usually we're better off spilling to stack,
     // but AVX512 can use a variable compare+select by comparing against all
-    // possible vector indices.
-    if (!(Subtarget.hasBWI() || (Subtarget.hasAVX512() && EltSizeInBits >= 32)))
+    // possible vector indices, and FP insertion has less gpr->simd traffic.
+    if (!(Subtarget.hasBWI() ||
+          (Subtarget.hasAVX512() && EltSizeInBits >= 32) ||
+          (Subtarget.hasSSE41() && VT.isFloatingPoint())))
       return SDValue();

     MVT IdxSVT = MVT::getIntegerVT(EltSizeInBits);
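
The updated guard in the hunk above can be read as a standalone predicate. The sketch below restates it outside LLVM so the three accepted cases are easy to check by hand; `SubtargetModel` and `usesCompareSelect` are hypothetical names introduced here, and the real code queries `Subtarget` and `VT` directly.

```cpp
// Hypothetical stand-in for the subtarget feature queries used by the guard.
struct SubtargetModel {
  bool hasBWI;
  bool hasAVX512;
  bool hasSSE41;
};

// Returns true when the variable-index insert takes the compare+select path,
// i.e. when the guard in the diff does NOT bail out with an empty SDValue.
bool usesCompareSelect(const SubtargetModel &st, unsigned eltSizeInBits,
                       bool isFloatingPoint) {
  return st.hasBWI ||                             // AVX512BW: any element size
         (st.hasAVX512 && eltSizeInBits >= 32) || // AVX512: 32/64-bit elements
         (st.hasSSE41 && isFloatingPoint);        // new: SSE4.1+ with FP vectors
}
```

Under this reading, an SSE4.1-only target takes the new path for float and double vectors, while integer vectors on the same target still fall through to the existing handling.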