Description
| Bugzilla Link | 2647 |
| Resolution | FIXED |
| Resolved on | Oct 31, 2008 09:44 |
| Version | unspecified |
| OS | Windows NT |
| Reporter | LLVM Bugzilla Contributor |
| CC | @sunfishcode,@nlewycky |
Extended Description
The following LLVM IR compiles to suboptimal code when targeting x86 CPUs with SSE4.1 support, but compiles to optimal code for older CPUs:
external global float, align 16 ; <float*>:0 [#uses=2]
define internal void @""() {
load float* @0, align 16 ; <float>:1 [#uses=1]
insertelement <4 x float> undef, float %1, i32 0 ; <<4 x float>>:2 [#uses=1]
call <4 x float> @llvm.x86.sse.rsqrt.ss( <4 x float> %2 ) ; <<4 x float>>:3 [#uses=1]
extractelement <4 x float> %3, i32 0 ; <float>:4 [#uses=1]
store float %4, float* @0, align 16
ret void
}
declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone
Here's the result on a Penryn CPU:
push ebp
mov ebp,esp
and esp,0FFFFFFF0h
rsqrtss xmm0,dword ptr ds:[1762ED0h]
extractps eax, xmm0
movd xmm0,eax
movss dword ptr ds:[1762ED0h],xmm0
mov esp,ebp
pop ebp
ret
And this is the lovable code I get on Conroe:
rsqrtss xmm0,dword ptr ds:[1762ED0h]
movss dword ptr ds:[1762ED0h],xmm0
ret
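Ignoring the stack setup for now, it looks like extractps is selected too eagerly for an extractelement v4f32, 0. Element 0 already sits in the low lane of xmm0, so the round trip through eax (extractps followed by movd) is redundant and the movss store alone is enough, as the Conroe output shows.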
P.S.: To quickly test with and without SSE4 support, just force X86SSELevel to the desired value in X86Subtarget::AutoDetectSubtargetFeatures().
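For readers who prefer source code over IR, here is a rough C sketch of the same pattern: load a scalar, run RSQRTSS on the low lane, read lane 0 back, and store it. It is an illustration only, not the reporter's code. The names g_value, rsqrt_scalar and update_global are hypothetical, _mm_set_ss zeroes the upper lanes where the IR uses undef, and the IR a given front end emits for these intrinsics may not match the IR above. The non-SSE branch is a plain fallback so the sketch also builds on other targets.

```c
#include <math.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical stand-in for the unnamed global @0 in the report. */
static float g_value = 4.0f;

/* Approximate 1/sqrt(x) using only the low lane of an XMM register. */
static float rsqrt_scalar(float x)
{
#if HAVE_SSE
    __m128 v = _mm_set_ss(x);   /* x in lane 0; upper lanes zeroed (the IR uses undef) */
    v = _mm_rsqrt_ss(v);        /* RSQRTSS: approximate result in lane 0 */
    return _mm_cvtss_f32(v);    /* read lane 0, like extractelement ..., i32 0 */
#else
    return 1.0f / sqrtf(x);     /* exact fallback for targets without SSE */
#endif
}

/* Mirrors the load / rsqrt / store sequence of the function in the report. */
static void update_global(void)
{
    g_value = rsqrt_scalar(g_value);
}
```

Since the result is already in the low lane after RSQRTSS, the ideal code for update_global is the two-instruction sequence shown for Conroe. Whether a particular compiler version emits anything extra when SSE4.1 is enabled has to be checked in its assembly output.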