Description
Bugzilla Link | 2108 |
Resolution | FIXED |
Resolved on | Apr 10, 2008 00:38 |
Version | unspecified |
OS | Linux |
CC | @asl |
Extended Description
Testcase:
#include <emmintrin.h>
__m128i doload64(unsigned long long x) { return _mm_loadl_epi64((__m128i const *)&x); }
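For reference, here is a minimal harness that checks what the testcase is expected to return: the 64-bit argument in the low lane and zero in the high lane. It assumes an x86 target with SSE2 (any x86-64 compiler, or -msse2 on 32-bit x86). The helper expect_lanes is hypothetical and exists only for this sketch.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadl_epi64, _mm_storeu_si128 */
#include <stdint.h>

/* The testcase, with the pointer cast that _mm_loadl_epi64 expects. */
__m128i doload64(unsigned long long x) {
    return _mm_loadl_epi64((__m128i const *)&x);
}

/* Hypothetical test helper: spill v to memory and check both 64-bit lanes.
   On little-endian x86, lane 0 (the low quadword) lands in out[0]. */
static void expect_lanes(__m128i v, uint64_t lo, uint64_t hi) {
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    assert(out[0] == lo);
    assert(out[1] == hi);
}
```

Calling expect_lanes(doload64(x), x, 0) should hold for any x, because _mm_loadl_epi64 loads the low quadword and zeroes the upper one.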
Generated LLVM IR:
define <2 x i64> @doload64(i64 %x) nounwind {
entry:
%tmp717 = bitcast i64 %x to double
%tmp8 = insertelement <2 x double> undef, double %tmp717, i32 0
%tmp9 = insertelement <2 x double> %tmp8, double 0.000000e+00, i32 1
%tmp11 = bitcast <2 x double> %tmp9 to <2 x i64>
ret <2 x i64> %tmp11
}
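The IR can be read back as C roughly as follows. This is a hand-written transliteration for illustration, not what the front end produces. The names doload64_as_ir and expect_lanes are made up, and an x86-64 target with SSE2 is assumed. It shows that the detour through <2 x double> leaves the same bits as the intrinsic, because +0.0 is the all-zero bit pattern.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128d, __m128i, _mm_set_sd, _mm_castpd_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of @doload64's IR, one step per instruction. */
__m128i doload64_as_ir(unsigned long long x) {
    double d;
    memcpy(&d, &x, sizeof d);   /* %tmp717 = bitcast i64 %x to double        */
    __m128d v = _mm_set_sd(d);  /* %tmp8, %tmp9: lane 0 = d, lane 1 = 0.0    */
    return _mm_castpd_si128(v); /* %tmp11 = bitcast <2 x double> to <2 x i64> */
}

/* Hypothetical test helper: store v and check both 64-bit lanes. */
static void expect_lanes(__m128i v, uint64_t lo, uint64_t hi) {
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    assert(out[0] == lo);
    assert(out[1] == hi);
}
```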
On x86, this compiles to the following:
doload64:
subl $12, %esp
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
movsd (%esp), %xmm0
addl $12, %esp
ret
That is 6 instructions more than necessary. Instead of first copying the argument into a new stack slot, the code generator should load directly from the argument's original stack slot, i.e. movsd 4(%esp), %xmm0 followed by ret.