Instead of a loop-reduction in simd, Vector.Dot is used #44

gfoidl · 2018-02-09T08:43:02Z

Fixes #43

dasm for the reduction:

G_M20916_IG01:
       C5F877               vzeroupper

G_M20916_IG02:
       C4E27D190524000000   vbroadcastsd ymm0, ymmword ptr[reloc @RWD00]        ; ymm0 = | 1 | 1 | 1 | 1 |
       C4E17D104F08         vmovupd  ymm1, ymmword ptr[rdi+8]                   ; ymm1 = | a | b | c | d |
       C4E17D59C1           vmulpd   ymm0, ymm1                                 ; ymm0 = | a | b | c | d |
       C4E17D7CC0           vhaddpd  ymm0, ymm0                                 ; ymm0 = | a + b | a + b | c + d | c + d |
       C4E37D19C201         vextractf128 ymm2, ymm0, 1                          ; ymm2 = | c + d | c + d | ----- | ----- |
       C4E17958C2           vaddpd   xmm0, xmm2                                 ; xmm0 = | a + b + c + d | a + b + c + d |

G_M20916_IG03:
       C5F877               vzeroupper                                          ; clears bits 128 and up only, xmm0 is unchanged
       C3                   ret
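To make the lane layout of the sequence above easier to follow, here is a minimal scalar model in C++. It is only a sketch: `Vec4`, `hadd` and `dot_with_ones` are hypothetical names introduced for illustration, and the model assumes the documented `vhaddpd` behaviour for 256-bit operands, where pairs are summed within each 128-bit lane.

```cpp
#include <array>

// Models one 256-bit register holding four doubles, element 0 first.
using Vec4 = std::array<double, 4>;

// Models vhaddpd ymm, x, y: adjacent pairs are summed per 128-bit lane,
// interleaving the results from x and y.
Vec4 hadd(const Vec4& x, const Vec4& y)
{
    return Vec4{ x[0] + x[1], y[0] + y[1], x[2] + x[3], y[2] + y[3] };
}

// Models the emitted dot product with a vector of ones.
double dot_with_ones(const Vec4& v)
{
    const Vec4 ones = Vec4{ 1.0, 1.0, 1.0, 1.0 };            // vbroadcastsd
    const Vec4 prod = Vec4{ ones[0] * v[0], ones[1] * v[1],
                            ones[2] * v[2], ones[3] * v[3] }; // vmulpd
    const Vec4 h = hadd(prod, prod);                          // vhaddpd
    // vextractf128 takes the upper lane (h[2], h[3]); vaddpd adds it to the
    // lower lane (h[0], h[1]). Only element 0 is returned.
    return h[0] + h[2];
}
```

In this model `hadd` of `| 1 | 2 | 3 | 4 |` with itself gives `| 3 | 3 | 7 | 7 |`, so adding the upper lane to the lower lane leaves the full sum in both elements of the 128-bit result.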

gfoidl · 2018-02-09T17:36:07Z

It could be even better with just the horizontal add, but .NET doesn't support this (yet).

In C++ this could be written as

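#include <immintrin.h>

// Sums the four doubles at arr. arr must be 32-byte aligned; n is unused here.
double reduce_simd(double* arr, const int n)
{
    double sum = 0;

    __m256d* ptr  = reinterpret_cast<__m256d*>(arr);
    __m256d a {*ptr};                                // a     = | a0 | a1 | a2 | a3 | (aligned load)
    __m256d tmp   = _mm256_hadd_pd(a, a);            // tmp   = | a0 + a1 | a0 + a1 | a2 + a3 | a2 + a3 |
    __m128d hi128 = _mm256_extractf128_pd(tmp, 1);   // hi128 = | a2 + a3 | a2 + a3 |
    __m128d lo128 = _mm256_extractf128_pd(tmp, 0);   // lo128 = | a0 + a1 | a0 + a1 |
    __m128d s     = _mm_add_pd(lo128, hi128);        // s     = full sum in both elements

    sum = _mm_cvtsd_f64(s);                          // take the lowest element

    return sum;
}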

yielding

vmovapd         ymm0, YMMWORD PTR [rdi]
vhaddpd         ymm0, ymm0, ymm0
vextractf128    xmm1, ymm0, 0x1
vaddpd          xmm0, xmm1, xmm0
vzeroupper
ret

thus saving the broadcast of 1 and the multiplication.
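For context, here is a rough sketch of how this horizontal add could sit at the end of a full sum over `n` elements. It is not code from this PR: `sum_simd` is a hypothetical name, the loop shape is an assumption, and the AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx`); otherwise the plain scalar loop does all the work.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: sums n doubles, four at a time where AVX is available.
double sum_simd(const double* arr, std::size_t n)
{
    double sum = 0.0;
    std::size_t i = 0;

#if defined(__AVX__)
    // Lane-wise accumulation: acc holds four independent partial sums.
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(arr + i)); // unaligned load

    // Final reduction with just the horizontal add, no multiply by ones.
    __m256d tmp   = _mm256_hadd_pd(acc, acc);
    __m128d hi128 = _mm256_extractf128_pd(tmp, 1);
    __m128d lo128 = _mm256_castpd256_pd128(tmp);
    sum = _mm_cvtsd_f64(_mm_add_pd(lo128, hi128));
#endif

    // Scalar tail (or the whole array when AVX is not enabled).
    for (; i < n; ++i)
        sum += arr[i];

    return sum;
}
```

The lane-wise partial sums change the order of the floating-point additions, so for general inputs the result may differ slightly from a strictly sequential sum.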

Instead of a loop-reduction in simd, the dotproduct is used

b1dad87

gfoidl added the performance label Feb 9, 2018

gfoidl added this to the v1.1.0 milestone Feb 9, 2018

gfoidl self-assigned this Feb 9, 2018

gfoidl merged commit 0e607fb into master Feb 9, 2018

gfoidl deleted the dot-for-sum-reduction branch February 9, 2018 08:45

This was referenced Apr 17, 2018

Use Vector.Dot for simd-reduction (sum) #43

Closed

Better Simd reduction #49

Merged
