[X64] Floating-point multiplication can get "optimized" into integer multiplication even though it's inefficient #162749

@zeux

Given code like this (extracted out of a larger example with similar flow):

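#include <emmintrin.h> // SSE2 intrinsics

__m128 square(__m128i data) {
    __m128i y = _mm_srai_epi32(data, 16);           // arithmetic shift: each lane now fits in 16 bits
    __m128i x = _mm_or_si128(y, _mm_set1_epi32(3)); // set the low two bits of each lane
    __m128 v = _mm_cvtepi32_ps(x);                  // int32 -> float
    return _mm_mul_ps(v, v);                        // floating-point square
}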

When targeting SSE2, I would expect a more or less straightforward one-to-one lowering into SSE2 instructions, modulo _mm_set1_epi32, which can be materialized in a couple of different ways. Indeed, GCC generates this:

        pcmpeqd xmm1, xmm1
        psrad   xmm0, 16
        psrld   xmm1, 30
        por     xmm0, xmm1
        cvtdq2ps        xmm0, xmm0
        mulps   xmm0, xmm0

and MSVC generates this, opting to load 3 from memory:

        movdqu  xmm0, XMMWORD PTR [rcx]
        psrad   xmm0, 16
        orps    xmm0, XMMWORD PTR __xmm@00000003000000030000000300000003
        cvtdq2ps xmm0, xmm0
        mulps   xmm0, xmm0

clang, however, generates this, which is basically never a good idea:

        psrld   xmm0, 16
        por     xmm0, xmmword ptr [rip + .LCPI0_0]
        movdqa  xmm1, xmm0
        pmulhw  xmm1, xmm0
        pshuflw xmm1, xmm1, 232
        pshufhw xmm1, xmm1, 232
        pshufd  xmm1, xmm1, 232
        pmullw  xmm0, xmm0
        pshuflw xmm0, xmm0, 232
        pshufhw xmm0, xmm0, 232
        pshufd  xmm0, xmm0, 232
        punpcklwd       xmm0, xmm1
        cvtdq2ps        xmm0, xmm0

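It looks like clang decides to multiply the integers instead of the floating-point values, since it knows the range of the integer is small enough: after the arithmetic shift by 16, each lane fits in 16 bits, so the 32-bit product cannot overflow. SSE2 has no packed 32-bit multiply (pmulld is SSE4.1), so the integer multiply expands into the pmulhw/pmullw/shuffle sequence above, which results in degraded performance compared to a single mulps.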
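To illustrate why the fold is legal in the first place, here is a minimal scalar sketch of one lane, assuming IEEE-754 single precision and the default round-to-nearest mode. The function names are hypothetical and are not taken from LLVM or from the code above. It only checks that both orders of operations produce the same value over the reachable input range. It says nothing about cost, which is the actual complaint here.

```c
#include <stdint.h>

/* Per-lane model of the source: convert to float, then multiply as float. */
static float square_via_float(int32_t x) {
    float v = (float)x;
    return v * v;
}

/* Per-lane model of the transformed code: multiply as integer, then convert.
   Safe only because |x| <= 32768, so x * x <= 2^30 fits in int32_t. */
static float square_via_int(int32_t x) {
    return (float)(x * x);
}

/* Walks every value a lane can hold after the shift by 16 and the OR with 3,
   and counts the inputs on which the two models disagree. */
static int count_mismatches(void) {
    int mismatches = 0;
    for (int32_t y = -32768; y <= 32767; ++y) {
        int32_t x = y | 3; /* same OR as _mm_set1_epi32(3) in the issue */
        if (square_via_float(x) != square_via_int(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Both paths round the exact integer product once to single precision, so the results agree. The transformation is therefore value-preserving, and the problem is purely that the integer form is far more expensive on a target without a packed 32-bit multiply.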

Godbolt link for convenience: https://gcc.godbolt.org/z/746nGe1x5

Labels: llvm:instcombine, miscompilation