[AArch64] Use STR instruction when storing lane/byte 0 of vector #54737

@ilinpv

typedef char v8qi __attribute__ ((vector_size (8)));
typedef char v16qi __attribute__ ((vector_size (16)));

void store_lane_v8qi (v8qi x, char *y) { y[0] = x[8 - 1 - 0]; y[3] = x[0]; }
void store_lane_v16qi (v16qi x, char *y) { y[0] = x[16 - 1 - 0]; y[3] = x[0]; }

Currently, at -O2, LLVM for AArch64 generates:

store_lane_v8qi:                        // @store_lane_v8qi
        add     x8, x0, #3
        st1     { v0.b }[7], [x0]
        st1     { v0.b }[0], [x8]
        ret
store_lane_v16qi:                       // @store_lane_v16qi
        add     x8, x0, #3
        st1     { v0.b }[15], [x0]
        st1     { v0.b }[0], [x8]
        ret

When storing lane 0, we can use the STR instruction, which has more flexible addressing modes than ST1 (for example, base register plus immediate offset), so the separate ADD is not needed. LLVM already does this for wider element types, but not for vectors of bytes. GCC already produces the better code:

store_lane_v8qi:
        st1     {v0.b}[7], [x0]
        str     b0, [x0, 3]
        ret
store_lane_v16qi:
        st1     {v0.b}[15], [x0]
        str     b0, [x0, 3]
        ret

https://godbolt.org/z/oz3qxfeTP
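
For comparison with the wider element types mentioned above, here is a minimal sketch of the same test using 16-bit elements. The names v4hi, v8hi, store_lane_v4hi and store_lane_v8hi are made up for this illustration and are not part of the original report. It assumes the GCC/Clang vector_size extension; compile with -O2 for AArch64 and inspect how the lane 0 store is emitted.

```c
/* Hypothetical 16-bit analogues of the byte-vector test case above. */
typedef short v4hi __attribute__ ((vector_size (8)));
typedef short v8hi __attribute__ ((vector_size (16)));

/* Store the last lane to y[0] and lane 0 to y[3], as in the byte version. */
void store_lane_v4hi (v4hi x, short *y) { y[0] = x[4 - 1 - 0]; y[3] = x[0]; }
void store_lane_v8hi (v8hi x, short *y) { y[0] = x[8 - 1 - 0]; y[3] = x[0]; }
```

The lane 0 store here targets a nonzero offset from y, which is the case where an offset addressing mode avoids computing the address separately.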
