Description
| Bugzilla Link | 9070 |
| Resolution | INVALID |
| Resolved on | Sep 21, 2011 15:23 |
| Version | trunk |
| OS | Windows XP |
| Attachments | Reproducer test |
| CC | @bcardosolopes |
Extended Description
Running llc on the following test (also attached) generates incorrect code:
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i686-pc-win32"
define void @test1(<8 x i32>* %source, <2 x i32>* %dest) nounwind {
  %a149 = getelementptr inbounds <8 x i32>* %source
  %a150 = load <8 x i32>* %a149, align 32
  %a151 = shufflevector <8 x i32> %a150, <8 x i32> undef, <2 x i32> <i32 0, i32 5>
  %a152 = shufflevector <2 x i32> %a151, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
  %a153 = getelementptr inbounds <2 x i32>* %dest
  store <2 x i32> %a152, <2 x i32>* %a153, align 8
  ret void
}
The test reads an <8 x i32> source vector from memory and writes a <2 x i32> dest vector to memory.
The two shuffles do:
temp.0 = source.0
temp.1 = source.5
dest.0 = temp.1
dest.1 = temp.0
This is equivalent to:
dest.0 = source.5
dest.1 = source.0
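The composition of the two shuffles can be checked with a small model. The following Python sketch is only an illustration: the `shufflevector` helper is a hypothetical stand-in for the IR instruction (a mask index selects from the first operand concatenated with the second), and `undef` lanes are modeled as `None`.

```python
def shufflevector(v1, v2, mask):
    """Hypothetical model of LLVM's shufflevector.

    Each mask index selects a lane from v1 concatenated with v2.
    An undef second operand is passed as None and modeled as None lanes.
    """
    n = len(v1)
    combined = list(v1) + (list(v2) if v2 is not None else [None] * n)
    return [combined[i] for i in mask]


# Symbolic lanes of the <8 x i32> source vector.
source = [f"source.{i}" for i in range(8)]

temp = shufflevector(source, None, [0, 5])  # %a151
dest = shufflevector(temp, None, [1, 0])    # %a152

print(dest)  # ['source.5', 'source.0']
```

Under this model the stored vector is `[source.5, source.0]`, which matches the equivalence stated above.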
Output:
llc < test-repro.ll
        .def     _test1;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  _test1
        .align  16, 0x90
_test1:                                 # @test1
# BB#0:
        movl    4(%esp), %eax
        movaps  16(%eax), %xmm0
        movlps  (%eax), %xmm0
        pshufd  $1, %xmm0, %xmm0        # xmm0 = xmm0[1,0,0,0]
        movl    8(%esp), %eax
        pextrd  $1, %xmm0, 4(%eax)
        movd    %xmm0, (%eax)
        ret
After the 'movaps':
XMM0 = [source.4 source.5 source.6 source.7]
After the 'movlps':
XMM0 = [source.0 source.1 source.6 source.7]
After the 'pshufd':
XMM0 = [source.1 source.0 source.0 source.0]
The 'pextrd' writes:
dest.1 = source.0
The 'movd' writes:
dest.0 = source.1 <== This is incorrect; dest.0 should be source.5 (see the explanation of the test above).
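The register trace above can be reproduced with a lane-level simulation. This Python sketch is an illustration under simplifying assumptions: registers are modeled as lists of four symbolic 32-bit lanes (lane 0 is the lowest), and the helper names `pshufd` and `run_emitted_sequence` are hypothetical, not part of any tool.

```python
def pshufd(x, imm):
    # Each 2-bit field of the immediate selects the source lane
    # for the corresponding result lane (lane 0 uses bits 1:0).
    return [x[(imm >> (2 * i)) & 3] for i in range(4)]


def run_emitted_sequence(source):
    xmm0 = list(source[4:8])   # movaps 16(%eax), %xmm0
    xmm0[0:2] = source[0:2]    # movlps (%eax), %xmm0 (replaces the low 64 bits)
    xmm0 = pshufd(xmm0, 1)     # pshufd $1, %xmm0, %xmm0
    dest = [None, None]
    dest[1] = xmm0[1]          # pextrd $1, %xmm0, 4(%eax)
    dest[0] = xmm0[0]          # movd %xmm0, (%eax)
    return dest


source = [f"source.{i}" for i in range(8)]
actual = run_emitted_sequence(source)
expected = [source[5], source[0]]  # what the IR asks for

print("actual:  ", actual)
print("expected:", expected)
```

In this model the emitted sequence stores `source.1` in `dest.0`, while the IR asks for `source.5`; `dest.1` agrees in both.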
Removing the following pattern from X86InstrSSE.td gives correct (but inefficient) code:
def : Pat<(X86Movss VR128:$src1,
                    (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;