shufflevector %x, <every odd byte> should optimize to lshr %x, @splat(8)+trunc #111611

@Validark

Zig version (Godbolt link):

const std = @import("std");

export fn foo(v: @Vector(32, u16)) @Vector(32, u8) {
    return std.simd.deinterlace(2, @as(@Vector(64, u8), @bitCast(v)))[1];
}

export fn bar(v: @Vector(32, u16)) @Vector(32, u8) {
    return std.simd.deinterlace(2, @as(@Vector(64, u8), @bitCast(v >> @splat(8))))[0];
}

LLVM version (optimized):

define dso_local <32 x i8> @foo(<32 x i16> %0) local_unnamed_addr {
Entry:
  %1 = bitcast <32 x i16> %0 to <64 x i8>
  %2 = shufflevector <64 x i8> %1, <64 x i8> poison, <32 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 33, i32 35, i32 37, i32 39, i32 41, i32 43, i32 45, i32 47, i32 49, i32 51, i32 53, i32 55, i32 57, i32 59, i32 61, i32 63>
  ret <32 x i8> %2
}

define dso_local <32 x i8> @bar(<32 x i16> %0) local_unnamed_addr {
Entry:
  %1 = lshr <32 x i16> %0, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
  %2 = trunc nuw <32 x i16> %1 to <32 x i8>
  ret <32 x i8> %2
}

Assembly difference (Znver4):

.LCPI0_0:
        .byte   1
        .byte   3
        .byte   5
        .byte   7
        .byte   9
        .byte   11
        .byte   13
        .byte   15
        .byte   17
        .byte   19
        .byte   21
        .byte   23
        .byte   25
        .byte   27
        .byte   29
        .byte   31
        .byte   33
        .byte   35
        .byte   37
        .byte   39
        .byte   41
        .byte   43
        .byte   45
        .byte   47
        .byte   49
        .byte   51
        .byte   53
        .byte   55
        .byte   57
        .byte   59
        .byte   61
        .byte   63
foo:
        vmovdqa ymm1, ymmword ptr [rip + .LCPI0_0]
        vpermb  zmm0, zmm1, zmm0
        ret

bar:
        vpsrlw  zmm0, zmm0, 8
        vpmovwb ymm0, zmm0
        ret
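
Why the two forms should be interchangeable: on a little-endian target, bitcasting <32 x i16> to <64 x i8> puts the low byte of lane i at index 2i and the high byte at index 2i+1, so selecting every odd byte is the same as shifting each lane right by 8 and truncating. Below is a minimal sketch of a Zig test that checks this on a handful of inputs. The names `oddBytes` and `shiftTrunc` are hypothetical stand-ins for `foo` and `bar` (same bodies, minus `export`), the input values are arbitrary, and the test assumes a little-endian target such as x86-64.

```zig
const std = @import("std");

// Same body as `foo` above: take every odd byte of the bitcast vector.
fn oddBytes(v: @Vector(32, u16)) @Vector(32, u8) {
    return std.simd.deinterlace(2, @as(@Vector(64, u8), @bitCast(v)))[1];
}

// Same body as `bar` above: shift each lane right by 8, then keep the even bytes.
fn shiftTrunc(v: @Vector(32, u16)) @Vector(32, u8) {
    return std.simd.deinterlace(2, @as(@Vector(64, u8), @bitCast(v >> @splat(8))))[0];
}

// Assumption: little-endian target. On a big-endian target the odd bytes
// would be the low bytes of each lane and this test would not hold.
test "odd bytes equal the high byte of each u16 lane" {
    var lanes: [32]u16 = undefined;
    for (&lanes, 0..) |*lane, i| {
        // Arbitrary values whose high and low bytes differ.
        lane.* = @as(u16, @intCast(i)) *% 2057 +% 0x1234;
    }
    const v: @Vector(32, u16) = lanes;

    const a: [32]u8 = oddBytes(v);
    const b: [32]u8 = shiftTrunc(v);

    // Both formulations produce the same bytes...
    try std.testing.expectEqualSlices(u8, &a, &b);

    // ...and each byte is the high half of the corresponding lane.
    for (lanes, a) |lane, byte| {
        try std.testing.expectEqual(@as(u8, @truncate(lane >> 8)), byte);
    }
}
```

Saved to a file, this can be run with `zig test`. It only samples a few inputs, so it illustrates the equivalence rather than proving it.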
