3 changes: 2 additions & 1 deletion llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -3900,7 +3900,8 @@ bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
   unsigned ElementSize = VT->getElementType()->getPrimitiveSizeInBits();
   unsigned MaxVectorSize =
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
-  unsigned MaxElementsInVector = MaxVectorSize / ElementSize;
+  unsigned MaxElementsInVector =
+      std::max<unsigned>(1, MaxVectorSize / ElementSize);
   // When there are multiple shufflevector operations on the same input,
   // especially when the vector length is larger than the register size,
   // identical shuffle patterns may occur across different groups of elements.

Member

Can we early exit if MaxElementsInVector <= 1? The trivial case isn't profitable.

Collaborator

+1, an early-out seems the better approach.

Collaborator

I'm sorry, but I'd completely forgotten about this patch. I've raised #157430 with an early-out to handle a separate reported issue with the same cause.
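A minimal standalone sketch of the two options discussed in the review thread, assuming for illustration a target that reports a 32-bit fixed-width vector register while the element type is i64, so the unguarded `MaxVectorSize / ElementSize` truncates to 0. It does not use LLVM APIs, and both function names are hypothetical: `clampedMaxElements` mirrors the `std::max<unsigned>(1, ...)` clamp in this patch, while `maxElementsOrBail` models the suggested early-out by returning `std::nullopt` whenever a register cannot hold more than one element.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Mirrors the patch: the per-register element count is clamped so it can
// never be 0, even when one element is wider than the whole register.
unsigned clampedMaxElements(unsigned MaxVectorSizeInBits,
                            unsigned ElementSizeInBits) {
  assert(ElementSizeInBits != 0 && "element size must be non-zero");
  return std::max<unsigned>(1, MaxVectorSizeInBits / ElementSizeInBits);
}

// Models the early-out suggested in review (hypothetical helper): if a
// register holds at most one element, report "bail out" via std::nullopt
// instead of continuing with a degenerate group size.
std::optional<unsigned> maxElementsOrBail(unsigned MaxVectorSizeInBits,
                                          unsigned ElementSizeInBits) {
  assert(ElementSizeInBits != 0 && "element size must be non-zero");
  unsigned MaxElements = MaxVectorSizeInBits / ElementSizeInBits;
  if (MaxElements <= 1)
    return std::nullopt;
  return MaxElements;
}
```

With a 128-bit register and i64 elements both helpers yield 2; they only differ in the degenerate case, where the clamp keeps the fold running with one-element groups and the early-out skips it.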
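The context comment at the end of the `VectorCombine.cpp` hunk describes identical shuffle patterns recurring across register-sized groups of elements. The following is a simplified, standalone model of that grouping, not the code in `foldSelectShuffle`: it assumes each group holds `MaxElementsInVector` consecutive mask lanes, and `countUniqueGroups` is a hypothetical name. In this model the group size is a divisor, which is why a value of 0 from the unguarded `MaxVectorSize / ElementSize` cannot be allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: split a shuffle mask into consecutive groups of
// GroupSize lanes (one group per register) and count distinct patterns.
std::size_t countUniqueGroups(const std::vector<int> &Mask,
                              std::size_t GroupSize) {
  // A zero group size would make Mask.size() / GroupSize a division by zero.
  assert(GroupSize != 0 && "group size must be at least one lane");
  std::set<std::vector<int>> UniquePatterns;
  // Trailing lanes that do not fill a whole group are ignored in this sketch.
  const std::size_t NumGroups = Mask.size() / GroupSize;
  for (std::size_t G = 0; G < NumGroups; ++G) {
    const auto First =
        Mask.begin() + static_cast<std::ptrdiff_t>(G * GroupSize);
    const auto Last = First + static_cast<std::ptrdiff_t>(GroupSize);
    // Identical groups collapse into one set entry.
    UniquePatterns.insert(std::vector<int>(First, Last));
  }
  return UniquePatterns.size();
}
```

For the mask `<0, 1, 8, 9, 4, 5, 6, 7>` used in the test, groups of 2 lanes give 4 distinct patterns and groups of 4 give 2, while a mask such as `<0, 1, 0, 1, 0, 1, 0, 1>` collapses to a single pattern with groups of 2.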
21 changes: 21 additions & 0 deletions llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll
@@ -0,0 +1,21 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
Member

Suggested change
-; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
+; RUN: opt -passes=vector-combine -mtriple=nvptx-- -S < %s | FileCheck %s


define ptx_kernel void @shuffle_ptx_i64() {
Member

Please move this test into the NVPTX subdir. I cannot reproduce the issue with other targets (e.g., -mtriple=x86_64).

llvm/test/Transforms/VectorCombine
  NVPTX/
    fold-select-shuffle.ll
    lit.local.cfg

Content of NVPTX/lit.local.cfg:

if not "NVPTX" in config.root.targets:
    config.unsupported = True

Collaborator

For info: I posted an x86 reproducer in #157401 (comment)

; CHECK-LABEL: define ptx_kernel void @shuffle_ptx_i64() {
; CHECK-NEXT: [[_LR_PH:.*:]]
; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[TMP2:%.*]] = or <8 x i64> [[TMP0]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = shl <8 x i64> [[TMP0]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i64> [[TMP2]], <8 x i64> [[TMP3]], <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret void
;
.lr.ph:
%0 = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
%1 = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
%2 = or <8 x i64> %0, %1
%3 = shl <8 x i64> %0, %1
%4 = shufflevector <8 x i64> %2, <8 x i64> %3, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret void
}