Description
| Field | Value |
| --- | --- |
| Bugzilla Link | 33914 |
| Resolution | FIXED |
| Resolved on | Aug 02, 2017 05:23 |
| Version | trunk |
| OS | Windows NT |
| Blocks | #33196 |
| CC | @topperc,@zmodem,@RKSimon,@rnk,@rotateright |
Extended Description
We observed a 13% degradation in an internal benchmark after commit 308322.
The minimal reproducer:

```llvm
define i32 @foo(i8* %A, i8* %B) {
  %res = call i32 @memcmp(i8* %A, i8* %B, i64 24)
  ret i32 %res
}

declare i32 @memcmp(i8* nocapture, i8* nocapture, i64) local_unnamed_addr
```
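If you are starting from C, a function along these lines should produce essentially the same IR when compiled with clang for x86-64 with optimizations enabled. This is an assumption about the frontend output, not something stated in the report, and the exact IR varies by clang version.

```c
#include <string.h>

/* Assumed C source for the IR reproducer: a memcmp with a constant
   length of 24 bytes. After the commit, calls like this are expanded
   inline instead of being lowered to a libc call. */
int foo(const void *A, const void *B) {
    return memcmp(A, B, 24);
}
```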
Before the commit, the call to memcmp was lowered to a call to glibc's memcmp, which dispatched to __memcmp_sse4_1. The hot path in __memcmp_sse4_1 executed one 16-byte (XMM) load pair + ptest + jcc, followed by one 8-byte load pair + cmp + jcc:
```
...
       │ movdqu -0x18(%rdi),%xmm2
       │ movdqu -0x18(%rsi),%xmm1
 11.11 │ pxor   %xmm1,%xmm2
  7.41 │ ptest  %xmm2,%xmm0
 14.81 │ ↓ jae  15e8
 14.81 │ mov    -0x8(%rsi),%rcx
       │ mov    -0x8(%rdi),%rax
       │ cmp    %rax,%rcx
       │ ↓ jne  1603
       │ xor    %eax,%eax
 11.11 │ ← retq
...
```
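To make the glibc hot path concrete, here is a hypothetical portable C analogue for the 24-byte case. The 16-byte pxor + ptest equality check is modeled by XOR-ing two 8-byte halves and OR-ing the results, followed by a single 8-byte compare. The names `load64`, `order_bytes` and `cmp24_glibc_like` are illustrative. The byte-wise mismatch path is a simplification, because glibc's actual mismatch handling does not appear in the profile.

```c
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load (like a plain mov); memcpy avoids alignment UB. */
static uint64_t load64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Cold path: order by the first differing byte in an n-byte range. */
static int order_bytes(const unsigned char *a, const unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* Hypothetical analogue of the hot path: one 16-byte equality test
   (pxor + ptest in glibc), then one 8-byte equality test (cmp + jne). */
int cmp24_glibc_like(const void *pa, const void *pb) {
    const unsigned char *a = pa, *b = pb;
    /* Bytes 0..15: nonzero XOR means some byte in the block differs. */
    uint64_t diff16 = (load64(a) ^ load64(b)) | (load64(a + 8) ^ load64(b + 8));
    if (diff16 != 0)
        return order_bytes(a, b, 16);
    /* Bytes 16..23: one 8-byte load pair + compare. */
    if (load64(a + 16) != load64(b + 16))
        return order_bytes(a + 16, b + 16, 8);
    return 0;
}
```

On equal inputs this takes two branches instead of three, which is the property the report highlights.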
After the commit, the memcmp is expanded inline into three 8-byte load pairs, each followed by cmp + jcc:
```asm
...
# BB#0:                                 # %loadbb
    movbeq  (%rdi), %rcx
    movbeq  (%rsi), %rdx
    cmpq    %rdx, %rcx
    jne     .LBB0_1
# BB#2:                                 # %loadbb1
    movbeq  8(%rdi), %rcx
    movbeq  8(%rsi), %rdx
    cmpq    %rdx, %rcx
    jne     .LBB0_1
# BB#3:                                 # %loadbb2
    movbeq  16(%rdi), %rcx
    movbeq  16(%rsi), %rdx
    xorl    %eax, %eax
    cmpq    %rdx, %rcx
    jne     .LBB0_1
# BB#4:                                 # %endblock
    retq
.LBB0_1:                                # %res_block
    cmpq    %rdx, %rcx
    movl    $-1, %ecx
    movl    $1, %eax
    cmovbl  %ecx, %eax
    retq
...
```
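The expanded code can be read back as C. The following is a minimal sketch for a fixed 24-byte length. `movbeq` performs a byte-swapping (big-endian) load, so the first byte in memory ends up in the most significant position, and an unsigned 64-bit comparison then orders the buffers exactly as memcmp would. The helper names are hypothetical.

```c
#include <stdint.h>

/* Big-endian 8-byte load, the effect of movbeq: byte 0 becomes the most
   significant byte, so unsigned order matches memcmp's byte-wise order. */
static uint64_t load64_be(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical C rendering of the expansion above: three 8-byte load pairs
   (%loadbb, %loadbb1, %loadbb2), each followed by cmp + jne to %res_block. */
int cmp24_expanded(const void *pa, const void *pb) {
    const unsigned char *a = pa, *b = pb;
    for (int off = 0; off < 24; off += 8) {
        uint64_t x = load64_be(a + off);
        uint64_t y = load64_be(b + off);
        if (x != y)
            /* %res_block: cmpq + cmovb picks -1 when x < y (unsigned), else 1. */
            return x < y ? -1 : 1;
    }
    return 0; /* %endblock */
}
```

On equal inputs all three compare-and-branch blocks execute, compared with two in glibc's sequence.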
Options for fixing:
- Improve the inline expansion to generate a sequence similar to glibc's: one 16-byte load pair + ptest + jcc, then one 8-byte load pair + cmp + jcc.
- Call libc's memcmp instead of expanding it inline.
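The first option comes down to how many load pairs, and therefore cmp + jcc blocks, the expansion needs. The hypothetical helper `count_load_pairs` below counts them under the assumption that load widths are chosen greedily from the largest permitted width downward. It is an illustration of the arithmetic, not LLVM's implementation. Allowing a 16-byte load reduces the 24-byte case from three blocks to two, which matches glibc's sequence.

```c
#include <stddef.h>

/* Hypothetical helper: number of load pairs needed to compare `size` bytes
   when widths are picked greedily from `load_sizes`, listed in descending
   order and ending with 1 so every tail byte is covered. */
size_t count_load_pairs(size_t size, const size_t *load_sizes, size_t n_sizes) {
    size_t pairs = 0;
    for (size_t i = 0; i < n_sizes && size > 0; i++) {
        pairs += size / load_sizes[i]; /* as many loads of this width as fit */
        size %= load_sizes[i];         /* bytes left for narrower loads */
    }
    return pairs;
}
```

For example, with widths {8, 4, 2, 1} a 24-byte compare needs 3 pairs, while {16, 8, 4, 2, 1} needs 2.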
I would like to request that this commit be reverted until this issue is fixed. Thanks.