[X86] 8-bit vector multiplication should use shift and add method for more constants #164200

@WalterKruger

Description

Vector multiplication by most 8-bit constants is currently implemented by widening the elements to 16 bits:

multiplyBy10_clang:
        movdqa  xmm1, xmm0
        punpckhbw       xmm1, xmm1
        movdqa  xmm2, xmmword ptr [rip + .LCPI0_0]
        pmullw  xmm1, xmm2
        movdqa  xmm3, xmmword ptr [rip + .LCPI0_1]
        pand    xmm1, xmm3
        punpcklbw       xmm0, xmm0
        pmullw  xmm0, xmm2
        pand    xmm0, xmm3
        packuswb        xmm0, xmm1
        ret
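To make the widening sequence easier to follow, here is a scalar C model of what happens to a single byte. It is only a sketch: the contents of `.LCPI0_0` and `.LCPI0_1` are not shown above, so the multiplier (10 in every 16-bit word) and the mask (`0x00FF` in every word) are assumptions, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Scalar model of one byte going through the widening sequence.
   Assumed constants: .LCPI0_0 = 10 per word, .LCPI0_1 = 0x00FF per word. */
static uint8_t mul10_widened(uint8_t b) {
    /* punpcklbw/punpckhbw xmm, xmm: the byte is interleaved with itself,
       so the 16-bit word holds b in both halves. */
    uint16_t w = (uint16_t)(b | (b << 8));
    /* pmullw: keep the low 16 bits of the product. */
    uint16_t prod = (uint16_t)(w * 10u);
    /* pand: keep only the low byte, which equals (b * 10) mod 256. */
    uint16_t masked = (uint16_t)(prod & 0x00FFu);
    /* packuswb: unsigned saturation, which never triggers after the mask. */
    return (uint8_t)(masked > 255u ? 255u : masked);
}

/* Returns 1 if the model agrees with plain wrapping multiplication
   for every possible byte value, 0 otherwise. */
static int widened_matches_reference(void) {
    for (unsigned b = 0; b < 256; ++b) {
        if (mul10_widened((uint8_t)b) != (uint8_t)(b * 10u))
            return 0;
    }
    return 1;
}
```

The duplicated high byte does not affect the result because only the low byte of each product survives the mask.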

However, a short sequence of shifts and adds is often more efficient, in terms of both code size and dependency-chain length. For example, x * 10 = (x << 3) + (x << 1):

multiplyBy10_shiftAndAdd:
        movdqa  xmm1, xmm0
        paddb   xmm0, xmm0
        psllw   xmm1, 3
        pand    xmm1, xmmword ptr [rip + .LCPI0_0]
        paddb   xmm0, xmm1
        ret
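Because SSE2 has no 8-bit shift, the sequence shifts 16-bit words with `psllw` and then masks off the bits that crossed from the low byte into the high byte. The scalar C sketch below models one 16-bit lane holding two bytes. The mask value (`0xF8` in every byte) is an assumption about `.LCPI0_0`, which is not shown above, and all function names are illustrative.

```c
#include <stdint.h>

/* Pack two bytes into one 16-bit lane (low byte first). */
static uint16_t pack2(uint8_t lo, uint8_t hi) {
    return (uint16_t)(lo | (hi << 8));
}

/* psllw xmm1, 3 followed by pand with an assumed 0xF8-per-byte mask:
   the word shift leaks the top 3 bits of the low byte into the high byte,
   and the mask clears them, giving a per-byte x << 3. */
static uint16_t shl3_bytes(uint16_t lane) {
    uint16_t shifted = (uint16_t)(lane << 3);
    return (uint16_t)(shifted & 0xF8F8u);
}

/* paddb: independent wrapping add of each byte. */
static uint16_t add_bytes(uint16_t a, uint16_t b) {
    uint8_t lo = (uint8_t)((a & 0xFFu) + (b & 0xFFu));
    uint8_t hi = (uint8_t)((a >> 8) + (b >> 8));
    return pack2(lo, hi);
}

/* x * 10 = (x << 1) + (x << 3), applied to both bytes of the lane. */
static uint16_t mul10_lane(uint16_t lane) {
    uint16_t x2 = add_bytes(lane, lane);  /* paddb xmm0, xmm0 */
    uint16_t x8 = shl3_bytes(lane);       /* psllw + pand     */
    return add_bytes(x2, x8);             /* paddb xmm0, xmm1 */
}

/* Returns 1 if every pair of bytes matches wrapping multiplication by 10. */
static int mul10_matches_reference(void) {
    for (unsigned lo = 0; lo < 256; ++lo) {
        for (unsigned hi = 0; hi < 256; ++hi) {
            uint16_t got = mul10_lane(pack2((uint8_t)lo, (uint8_t)hi));
            uint16_t want = pack2((uint8_t)(lo * 10u), (uint8_t)(hi * 10u));
            if (got != want)
                return 0;
        }
    }
    return 1;
}
```

The x << 1 term needs no mask because it is computed as a byte-wise add (`paddb`) rather than a word shift.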

This method is already implemented, but only for constants that are almost powers of two. Notably, GCC always uses this method (although its sequences are often suboptimal).

https://godbolt.org/z/naKxr6z6a
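As a rough illustration of how the idea extends to other constants, the C sketch below multiplies a byte by an arbitrary 8-bit constant by adding one shifted copy of the input per set bit of the constant. This is a plain binary decomposition, not the sequence any compiler actually emits: it ignores subtraction-based forms and any cost model, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Multiply x by c (mod 256) using one shift-and-add term per set bit of c. */
static uint8_t mul_const_shift_add(uint8_t x, uint8_t c) {
    uint8_t acc = 0;
    for (unsigned bit = 0; bit < 8; ++bit) {
        if (c & (1u << bit))
            acc = (uint8_t)(acc + (uint8_t)(x << bit));
    }
    return acc;
}

/* Number of shifted terms this naive decomposition needs for constant c. */
static unsigned shift_add_terms(uint8_t c) {
    unsigned n = 0;
    for (unsigned bit = 0; bit < 8; ++bit)
        n += (c >> bit) & 1u;
    return n;
}

/* Returns 1 if the decomposition matches wrapping multiplication
   for every input byte and every constant. */
static int shift_add_matches_reference(void) {
    for (unsigned c = 0; c < 256; ++c) {
        for (unsigned x = 0; x < 256; ++x) {
            if (mul_const_shift_add((uint8_t)x, (uint8_t)c) != (uint8_t)(x * c))
                return 0;
        }
    }
    return 1;
}
```

For 10 (binary 1010) this gives the two terms used above. Constants with many set bits need more terms, so a real lowering would have to weigh the sequence length against the widening approach.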
