[x86] improve cost model for oversized shuffles #55170

@rotateright

I was trying some examples with https://reviews.llvm.org/D123494 and noticed that AArch64 seems smarter about decomposing shuffle costs via mask:

define void @cross_talk(<8 x i32> %a, <8 x i32> %b) {
  %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8>
  ret void
}

If we don't care about element order, that can be turned into a much simpler shuffle (especially for a 128-bit vector target):

define void @identity_and_splat(<8 x i32> %a, <8 x i32> %b) {
  %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8, i32 8>
  ret void
}

That transform happens with AArch64, but not with x86, because the x86 cost model reports the same cost for both shuffles:

% opt -mtriple=x86_64 -passes="print<cost-model>" -disable-output shufcost.ll 
Printing analysis 'Cost Model Analysis' for function 'cross_talk':
Cost Model: Found an estimated cost of 12 for instruction:   %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8>

Printing analysis 'Cost Model Analysis' for function 'identity_and_splat':
Cost Model: Found an estimated cost of 12 for instruction:   %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8, i32 8>
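
To make the difference between the two masks concrete, here is a minimal Python sketch that splits each 8-element mask into 4-element chunks (one per 128-bit register, assuming i32 elements) and labels each chunk. The helper names `split_mask` and `classify_chunk` and the labels are hypothetical and only for illustration; this is not how the LLVM cost model is implemented, and it ignores undef/poison mask elements.

```python
def split_mask(mask, lanes_per_reg):
    """Split a shuffle mask into chunks of one legal register each."""
    return [mask[i:i + lanes_per_reg] for i in range(0, len(mask), lanes_per_reg)]


def classify_chunk(chunk, lanes_per_reg):
    """Label one chunk of a shuffle mask (hypothetical categories)."""
    # Every mask index selects a lane in the concatenation of both sources;
    # index // lanes_per_reg identifies which legal-width register it reads.
    regs = {idx // lanes_per_reg for idx in chunk}
    if len(set(chunk)) == 1:
        return "splat"
    first = chunk[0]
    if first % lanes_per_reg == 0 and chunk == list(range(first, first + lanes_per_reg)):
        return "identity"
    if len(regs) == 1:
        return "single-register permute"
    return "multi-register shuffle"


def classify_mask(mask, lanes_per_reg=4):
    """Classify every legal-width chunk of the mask."""
    return [classify_chunk(c, lanes_per_reg) for c in split_mask(mask, lanes_per_reg)]


# Masks copied from the two functions in this issue.
cross_talk = [8, 0, 1, 2, 3, 8, 8, 8]
identity_and_splat = [0, 1, 2, 3, 8, 8, 8, 8]

print(classify_mask(cross_talk))
# ['multi-register shuffle', 'multi-register shuffle']
print(classify_mask(identity_and_splat))
# ['identity', 'splat']
```

Under these assumptions, each half of `cross_talk` reads from two different source registers, while `identity_and_splat` decomposes into one free copy and one splat. A per-chunk cost would therefore be expected to rank the second mask as cheaper instead of assigning both the same value.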
