128 bit arithmetic --- inefficient logic test #49541

@sqrmax

Bugzilla Link: 50197
Version: trunk
OS: All
Attachments: Sample code
CC: @nerh, @LebedevRI, @RKSimon, @rotateright

Extended Description

See the attached code, compiled with -O2. The condition inside the while () test is evaluated with an unnecessary 128-bit shift (the shld/shr pair below),

square:                                 # @square
        mov     rax, rdi
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        add     rax, 1
        adc     rsi, 0
        mov     rcx, rsi
        shld    rcx, rax, 4
        mov     rdx, rsi
        shr     rdx, 60
        or      rdx, rcx
        jne     .LBB0_1
        ret

even though a 64-bit shift instruction suffices. However, changing || to | in the loop condition yields the more efficient code below.

square:                                 # @square
        mov     rax, rdi
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        add     rax, 1
        adc     rsi, 0
        mov     rcx, rax
        shr     rcx, 60
        or      rcx, rsi
        jne     .LBB0_1
        ret

Found with clang-10 on Ubuntu 20.04 LTS; verified for clang 10, 11, and trunk using godbolt. Note that gcc -O2 emits the more efficient code in both cases.
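
For reference, here is a minimal sketch of why the two listings decide the same thing. It is not the attached sample code. It only restates, in C, what each listing appears to compute: the first ORs together both halves of the 128-bit value shifted right by 60, while the second ORs the high half with the low half shifted right by 60. The names wide_test, narrow_test and count_mismatches are made up for illustration, and unsigned __int128 is a GCC/Clang extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension, not standard C */

/* Wide form: shift the full 128-bit value right by 60 and test for
   nonzero, which is what the shld/shr/or sequence computes. */
int wide_test(u128 x) {
    return (x >> 60) != 0;
}

/* Narrow form: a single 64-bit shift of the low half, OR-ed with the
   high half, which is what the shorter shr/or sequence computes. */
int narrow_test(u128 x) {
    uint64_t lo = (uint64_t)x;
    uint64_t hi = (uint64_t)(x >> 64);
    return ((lo >> 60) | hi) != 0;
}

/* Returns how many of a few boundary values make the two forms disagree. */
int count_mismatches(void) {
    const u128 samples[] = {
        0,
        1,
        ((u128)1 << 60) - 1, /* largest value with bits 60..127 clear */
        (u128)1 << 60,       /* smallest value with bit 60 set */
        ((u128)1 << 64) - 1, /* low half all ones, high half zero */
        (u128)1 << 64,       /* low half zero, high half one */
        (u128)1 << 124,
        ~(u128)0,
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (wide_test(samples[i]) != narrow_test(samples[i]))
            mismatches++;
    }
    return mismatches;
}
```

Both forms are nonzero exactly when any of bits 60 through 127 is set, so the 128-bit shift adds nothing the single 64-bit shift does not already provide. The sample list is only a spot check of boundary values, not a proof.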