Missing optimization in zip_float #114959

@junaire

Description

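Clang/LLVM lowers the broadcast + shuffle below into five instructions (two 128-bit loads, vunpcklpd, vunpckhpd and vinsertf128), while GCC emits the three-instruction sequence the intrinsics describe: two vbroadcastf128 loads and a single vshufpd.

Source: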

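#include <immintrin.h>

void zip_float(const double *src, double *dst) {
    __m256d s0 = _mm256_broadcast_pd((__m128d*)src);      // {src[0], src[1], src[0], src[1]}
    __m256d s1 = _mm256_broadcast_pd((__m128d*)src + 2);  // {src[4], src[5], src[4], src[5]}
    __m256d s = _mm256_shuffle_pd(s0, s1, 0xc);          // {src[0], src[4], src[1], src[5]}
    s = _mm256_mul_pd(s, s);
    _mm256_store_pd(dst, s);                              // dst must be 32-byte aligned
}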
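
To make the intended data movement explicit, here is a minimal scalar sketch of what zip_float computes, assuming the documented semantics of _mm256_broadcast_pd (copy one 128-bit pair into both lanes) and _mm256_shuffle_pd (per lane, one double from each source, chosen by one immediate bit each). The names shuffle_pd_model and zip_float_ref are hypothetical helpers for illustration only; they are not part of any compiler or intrinsics API.

```c
// Scalar model of _mm256_shuffle_pd(a, b, imm). Within each 128-bit lane,
// the even result element comes from a and the odd one from b; imm bit i
// selects the high (1) or low (0) double of that lane.
void shuffle_pd_model(const double a[4], const double b[4], int imm, double out[4])
{
    for (int lane = 0; lane < 2; lane++) {
        int base = 2 * lane;  // index of the first double in this lane
        out[base]     = a[base + ((imm >> base) & 1)];
        out[base + 1] = b[base + ((imm >> (base + 1)) & 1)];
    }
}

// Hypothetical scalar reference for zip_float: same result, no AVX needed.
void zip_float_ref(const double *src, double *dst)
{
    // _mm256_broadcast_pd duplicates one {lo, hi} pair into both lanes.
    double s0[4] = { src[0], src[1], src[0], src[1] };
    // (__m128d*)src + 2 advances by two 16-byte vectors, i.e. to src + 4.
    double s1[4] = { src[4], src[5], src[4], src[5] };
    double s[4];
    shuffle_pd_model(s0, s1, 0xc, s);  // yields {src[0], src[4], src[1], src[5]}
    for (int i = 0; i < 4; i++)
        dst[i] = s[i] * s[i];
}
```

With src = {1, 2, 3, 4, 5, 6, 7, 8}, dst becomes {1, 25, 4, 36}: the lower lane takes the low double of each broadcast, and the upper two bits of 0xc make the upper lane take the high ones, so the result interleaves src[0..1] with src[4..5].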

LLVM:

zip_float:
        vmovupd xmm0, xmmword ptr [rdi]
        vmovupd xmm1, xmmword ptr [rdi + 32]
        vunpcklpd       xmm2, xmm0, xmm1
        vunpckhpd       xmm0, xmm0, xmm1
        vinsertf128     ymm0, ymm2, xmm0, 1
        vmulpd  ymm0, ymm0, ymm0
        vmovapd ymmword ptr [rsi], ymm0
        vzeroupper
        ret

GCC:

zip_float:
        vbroadcastf128  ymm0, XMMWORD PTR [rdi]
        vbroadcastf128  ymm1, XMMWORD PTR [rdi+32]
        vshufpd ymm0, ymm0, ymm1, 12
        vmulpd  ymm0, ymm0, ymm0
        vmovapd YMMWORD PTR [rsi], ymm0
        vzeroupper
        ret
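
Both listings place the same four doubles in ymm0 before the multiply, which is why GCC's shorter form is a valid lowering of the LLVM output as well. The sketch below models each instruction of the two listings in scalar C and compares the results. The function names llvm_seq, gcc_seq and sequences_agree are hypothetical; the sketch only checks element placement and says nothing about latency or port usage.

```c
#include <string.h>

// LLVM listing: two 128-bit loads, unpack low/high, then insert the high
// half into the upper lane.
void llvm_seq(const double *src, double out[4])
{
    const double *x0 = src;           // vmovupd xmm0, [rdi]
    const double *x1 = src + 4;       // vmovupd xmm1, [rdi + 32]
    double lo[2] = { x0[0], x1[0] };  // vunpcklpd xmm2, xmm0, xmm1
    double hi[2] = { x0[1], x1[1] };  // vunpckhpd xmm0, xmm0, xmm1
    // vinsertf128 ymm0, ymm2, xmm0, 1: lower lane = xmm2, upper lane = xmm0
    out[0] = lo[0]; out[1] = lo[1];
    out[2] = hi[0]; out[3] = hi[1];
}

// GCC listing: two 128-bit broadcasts and one in-lane shuffle.
void gcc_seq(const double *src, double out[4])
{
    double y0[4] = { src[0], src[1], src[0], src[1] };  // vbroadcastf128 ymm0, [rdi]
    double y1[4] = { src[4], src[5], src[4], src[5] };  // vbroadcastf128 ymm1, [rdi+32]
    // vshufpd ymm0, ymm0, ymm1, 12: imm bits 0..3 are 0, 0, 1, 1
    out[0] = y0[0];  // bit 0 = 0: low double of lane 0 of ymm0
    out[1] = y1[0];  // bit 1 = 0: low double of lane 0 of ymm1
    out[2] = y0[3];  // bit 2 = 1: high double of lane 1 of ymm0
    out[3] = y1[3];  // bit 3 = 1: high double of lane 1 of ymm1
}

// Returns 1 if both modeled sequences produce bit-identical vectors.
int sequences_agree(const double *src)
{
    double a[4], b[4];
    llvm_seq(src, a);
    gcc_seq(src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```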

Godbolt: https://godbolt.org/z/ffz1YEhPE
Tweeted by FFmpeg: https://x.com/FFmpeg/status/1853326818008514900
