Open
Labels
backend:X86, good first issue, missed-optimization
Description
Vector multiplication of 8-bit elements by most constants is currently implemented by widening the elements to 16 bits:
multiplyBy10_clang:
movdqa xmm1, xmm0
punpckhbw xmm1, xmm1
movdqa xmm2, xmmword ptr [rip + .LCPI0_0]
pmullw xmm1, xmm2
movdqa xmm3, xmmword ptr [rip + .LCPI0_1]
pand xmm1, xmm3
punpcklbw xmm0, xmm0
pmullw xmm0, xmm2
pand xmm0, xmm3
packuswb xmm0, xmm1
ret
However, a short sequence of shifts and adds is often more efficient, in terms of both code size and dependency chain length. For example, x * 10 = (x << 3) + (x << 1):
multiplyBy10_shiftAndAdd:
movdqa xmm1, xmm0
paddb xmm0, xmm0
psllw xmm1, 3
pand xmm1, xmmword ptr [rip + .LCPI0_0]
paddb xmm0, xmm1
ret
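
The pand in the sequence above is needed because SSE2 has no per-byte shift: psllw shifts 16-bit lanes, so the top bits of each low byte spill into its neighboring high byte. Below is a minimal, portable C sketch of one 16-bit lane holding two packed bytes. It assumes the .LCPI0_0 constant is 0xF8 in every byte (the listing does not show its value), and the function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane (two packed bytes) of the
 * shift-and-add sequence. Assumption: the pand mask is 0xF8 per byte. */
uint16_t mul10_lane(uint16_t lane)
{
    uint8_t lo = (uint8_t)(lane & 0xFF);
    uint8_t hi = (uint8_t)(lane >> 8);

    /* paddb xmm0, xmm0: per-byte x + x, i.e. x << 1 with no cross-byte carry. */
    uint8_t lo2 = (uint8_t)(lo + lo);
    uint8_t hi2 = (uint8_t)(hi + hi);

    /* psllw xmm1, 3: a 16-bit shift, so lo's top 3 bits leak into hi. */
    uint16_t sh = (uint16_t)(lane << 3);

    /* pand: clear the leaked bits in the low 3 bits of every byte. */
    sh &= 0xF8F8;

    /* paddb xmm0, xmm1: per-byte (x << 1) + (x << 3), wrapping mod 256. */
    uint8_t rlo = (uint8_t)(lo2 + (sh & 0xFF));
    uint8_t rhi = (uint8_t)(hi2 + (sh >> 8));
    return (uint16_t)(rlo | (rhi << 8));
}

/* Exhaustively compares the model against x * 10 (mod 256) for every
 * pair of bytes and returns the number of lanes checked. */
unsigned mul10_check_all(void)
{
    unsigned checked = 0;
    for (unsigned lane = 0; lane <= 0xFFFF; ++lane) {
        uint8_t want_lo = (uint8_t)((lane & 0xFF) * 10u);
        uint8_t want_hi = (uint8_t)((lane >> 8) * 10u);
        uint16_t want = (uint16_t)(want_lo | (want_hi << 8));
        assert(mul10_lane((uint16_t)lane) == want);
        ++checked;
    }
    return checked;
}
```

Since every byte pair is checked, the model confirms that the masked 16-bit shift behaves like an independent 8-bit shift in each byte, which is what makes the five-instruction sequence a valid replacement for the widening multiply.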
This method is already implemented, but only for constants that are almost powers of two. Notably, GCC always uses this method (although its sequences are often suboptimal).