I was trying some examples with https://reviews.llvm.org/D123494 and noticed that AArch64 seems smarter about decomposing shuffle costs via mask:
define void @cross_talk(<8 x i32> %a, <8 x i32> %b) {
%s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8>
ret void
}
If we don't care about element order, that can be turned into a much simpler shuffle (especially for a target with 128-bit vectors):
define void @identity_and_splat(<8 x i32> %a, <8 x i32> %b) {
%s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8, i32 8>
ret void
}
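To illustrate why the reordered mask is cheaper on a 128-bit target, here is a rough Python sketch that splits each 8-element mask into 4-element (128-bit for `i32`) chunks and labels every chunk. The function name `classify_submasks` and the labels are hypothetical and made up for this illustration; this is not how the LLVM cost model is implemented, just a way to see the structure of the two masks.

```python
def classify_submasks(mask, lane_elts=4):
    """Split a shuffle mask into lane-sized chunks and label each chunk.

    Hypothetical helper for illustration only; not LLVM's cost logic.
    """
    labels = []
    for start in range(0, len(mask), lane_elts):
        chunk = mask[start:start + lane_elts]
        if len(set(chunk)) == 1:
            # Every element selects the same source index.
            labels.append("splat")
        elif (chunk[0] % lane_elts == 0
              and all(idx == chunk[0] + i for i, idx in enumerate(chunk))):
            # Lane-aligned consecutive run: a plain copy of one source register.
            labels.append("identity")
        else:
            labels.append("other")
    return labels


# Masks taken from the two functions above.
cross_talk = [8, 0, 1, 2, 3, 8, 8, 8]
identity_and_splat = [0, 1, 2, 3, 8, 8, 8, 8]

print(classify_submasks(cross_talk))
print(classify_submasks(identity_and_splat))
```

Under these assumptions, both halves of the `cross_talk` mask mix elements from `%a` and `%b`, while the `identity_and_splat` mask splits cleanly into one register copy and one splat. Both masks select the same multiset of elements, which is why the reordering is legal when element order does not matter.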
That transform happens with AArch64, but not with x86, because the x86 cost model reports the same cost for both shuffles:
% opt -mtriple=x86_64 -passes="print<cost-model>" -disable-output shufcost.ll
Printing analysis 'Cost Model Analysis' for function 'cross_talk':
Cost Model: Found an estimated cost of 12 for instruction: %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8>
Printing analysis 'Cost Model Analysis' for function 'identity_and_splat':
Cost Model: Found an estimated cost of 12 for instruction: %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8, i32 8>