
extractps selected too eagerly #3019

@llvmbot

Description

Bugzilla Link: 2647
Resolution: FIXED
Resolved on: Oct 31, 2008 09:44
Version: unspecified
OS: Windows NT
Reporter: LLVM Bugzilla Contributor
CC: @sunfishcode, @nlewycky

Extended Description

The following LLVM IR compiles to suboptimal code on x86 CPUs with SSE4.1 support (where EXTRACTPS is available), but compiles to tight code on older CPUs:

external global float, align 16 ; <float*>:0 [#uses=2]

define internal void @""() {
  load float* @0, align 16 ; <float>:1 [#uses=1]
  insertelement <4 x float> undef, float %1, i32 0 ; <<4 x float>>:2 [#uses=1]
  call <4 x float> @llvm.x86.sse.rsqrt.ss( <4 x float> %2 ) ; <<4 x float>>:3 [#uses=1]
  extractelement <4 x float> %3, i32 0 ; <float>:4 [#uses=1]
  store float %4, float* @0, align 16
  ret void
}

declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone

Here's the result on a Penryn CPU:

push ebp
mov ebp,esp
and esp,0FFFFFFF0h
rsqrtss xmm0,dword ptr ds:[1762ED0h]
extractps eax,xmm0,0
movd xmm0,eax
movss dword ptr ds:[1762ED0h],xmm0
mov esp,ebp
pop ebp
ret

And this is the lovable code I get on Conroe:

rsqrtss xmm0,dword ptr ds:[1762ED0h]
movss dword ptr ds:[1762ED0h],xmm0
ret

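Ignoring the stack setup for now, it looks like extractps is selected too eagerly for an extractelement v4f32, 0. Element 0 of a v4f32 already sits in the low lane of the XMM register, so no instruction is needed to extract it; instead, extractps moves it to a general-purpose register and movd moves it straight back.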
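For comparison, roughly the same computation can be written in C with SSE intrinsics. This is a minimal sketch, assuming an x86 target and a compiler that ships `<xmmintrin.h>`; the names `g_value` and `rsqrt_in_place` are hypothetical stand-ins for the unnamed global `@0` and the function `@""` in the IR. One difference: `_mm_set_ss` zeroes the upper three lanes, whereas the IR inserts into `undef`. The final `_mm_cvtss_f32` plays the role of `extractelement <4 x float> %3, i32 0`. Because lane 0 already holds the scalar, one would expect it to need no extra instruction, as in the Conroe output.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

/* Hypothetical stand-in for the external global @0 in the IR. */
float g_value = 4.0f;

/* Mirrors the IR: load a float into lane 0, apply rsqrtss,
   then read lane 0 back out and store it to the same global. */
void rsqrt_in_place(void) {
    assert(g_value > 0.0f);          /* reciprocal square root of a positive input */
    __m128 v = _mm_set_ss(g_value);  /* like insertelement ..., i32 0 (upper lanes zeroed, not undef) */
    v = _mm_rsqrt_ss(v);             /* like @llvm.x86.sse.rsqrt.ss */
    g_value = _mm_cvtss_f32(v);      /* like extractelement ..., i32 0 */
}
```

Keep in mind that rsqrtss is an approximation (Intel documents a relative error of at most 1.5 × 2^-12), so the stored value is close to, but not exactly, 1/sqrt(x).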

P.S.: To quickly test with and without SSE4 support, just force X86SSELevel to the desired value in X86Subtarget::AutoDetectSubtargetFeatures().
