Open
Description
Bugzilla Link | 11023 |
Version | trunk |
OS | Linux |
Reporter | LLVM Bugzilla Contributor |
CC | @bcardosolopes |
Extended Description
I'm compiling LLVM IR code like this on x86-64:
define linkonce ccc <16 x float> @vector_add_float(<16 x float> %a.78, <16 x float> %a.79) align 8
{
entry:
%result.80 = fadd <16 x float> %a.78, %a.79
ret <16 x float> %result.80
}
This works really well when the vector length (16 in the above) is an integer multiple of the SSE vector register width (4 floats), resulting in the following assembly code:
vector_add_float: # @vector_add_float
.Leh_func_begin0:
# BB#0: # %entry
addps %xmm4, %xmm0
addps %xmm5, %xmm1
addps %xmm6, %xmm2
addps %xmm7, %xmm3
ret
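The four addps instructions follow from simple arithmetic: 16 floats divided by 4 lanes per XMM register gives exactly 4 registers per operand, with nothing left over (judging from the listing, %a.78 arrives in xmm0-xmm3 and %a.79 in xmm4-xmm7). A minimal sketch of that calculation in C, assuming 128-bit SSE registers; the names SSE_FLOAT_LANES, full_registers and leftover_lanes are made up for illustration and are not part of LLVM:

```c
/* Assumption: 128-bit SSE registers, i.e. 4 single-precision lanes each. */
enum { SSE_FLOAT_LANES = 4 };

/* Number of completely filled XMM registers for an n-element float vector. */
unsigned full_registers(unsigned n) {
    return n / SSE_FLOAT_LANES;
}

/* Elements left over after the full registers; 0 means a clean split. */
unsigned leftover_lanes(unsigned n) {
    return n % SSE_FLOAT_LANES;
}
```

For a length of 16 this gives 4 full registers and 0 leftover lanes. For a length of 18 it gives 4 full registers and 2 leftover lanes, and those extra lanes are what the code generator has to handle separately.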
However, when the vector length is increased to, say, 18, the generated code is rather poor and could easily be improved by hand.