Description
Bugzilla Link | 50197 |
Version | trunk |
OS | All |
Attachments | Sample code |
CC | @nerh,@LebedevRI,@RKSimon,@rotateright |
Extended Description
See the attached code, compiled with -O2. The condition in the while () test is evaluated with an unnecessary 128-bit shift (shld plus shr across both halves),
square:                                 # @square
        mov     rax, rdi
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        add     rax, 1
        adc     rsi, 0
        mov     rcx, rsi
        shld    rcx, rax, 4
        mov     rdx, rsi
        shr     rdx, 60
        or      rdx, rcx
        jne     .LBB0_1
        ret
even though a single 64-bit shift suffices. However, changing || to | in the loop condition yields the more efficient code below.
square:                                 # @square
        mov     rax, rdi
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        add     rax, 1
        adc     rsi, 0
        mov     rcx, rax
        shr     rcx, 60
        or      rcx, rsi
        jne     .LBB0_1
        ret
Found with clang-10 on Ubuntu 20.04 LTS and reproduced with clang 10, 11, and trunk on godbolt. Note that gcc -O2 emits the more efficient code in both cases.
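The attached sample is not reproduced here, so the following is only a hypothetical sketch of why the two listings test the same thing. It assumes the loop condition amounts to "the 128-bit value shifted right by 60 is nonzero", which is what the shld/shr/or sequence appears to compute. The names `test_wide`, `test_narrow`, and `check_samples` are made up for illustration, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension, assumed available */

/* Wide form: shift the whole 128-bit value, then test for nonzero.
   This mirrors the shld + shr + or sequence in the first listing. */
static int test_wide(uint64_t lo, uint64_t hi) {
    u128 x = ((u128)hi << 64) | lo;
    return (x >> 60) != 0;
}

/* Narrow form: one 64-bit shift of the low half, OR-ed with the high half.
   This mirrors the shr + or sequence in the second listing. */
static int test_narrow(uint64_t lo, uint64_t hi) {
    return ((lo >> 60) | hi) != 0;
}

/* Hypothetical self-check: both forms agree on a handful of boundary values. */
static void check_samples(void) {
    const uint64_t samples[] = {
        0, 1, (1ULL << 60) - 1, 1ULL << 60, 1ULL << 63, UINT64_MAX
    };
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            assert(test_wide(samples[i], samples[j]) ==
                   test_narrow(samples[i], samples[j]));
        }
    }
}
```

Any set bit in the high half survives the shift by 60, so only the low half needs shifting. The nonzero test therefore reduces to `(lo >> 60) | hi`, which is the form the second listing uses.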