-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD: Add sum intrinsics for float/double. #17681
Conversation
@seiko2plus looks ok? |
|
||
// Horizontal add: Calculates the sum of all vector elements. | ||
NPY_FINLINE float npyv_sum_f32(float32x4_t a) | ||
{ | ||
float32x2_t r = vadd_f32(vget_high_f32(a), vget_low_f32(a)); | ||
return vget_lane_f32(vpadd_f32(r, r), 0); | ||
} | ||
#ifdef __aarch64__ | ||
NPY_FINLINE double npyv_sum_f64(float64x2_t a) | ||
{ | ||
return vget_lane_f64(vget_low_f64(a) + vget_high_f64(a), 0); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Horizontal add: Calculates the sum of all vector elements. | |
NPY_FINLINE float npyv_sum_f32(float32x4_t a) | |
{ | |
float32x2_t r = vadd_f32(vget_high_f32(a), vget_low_f32(a)); | |
return vget_lane_f32(vpadd_f32(r, r), 0); | |
} | |
#ifdef __aarch64__ | |
NPY_FINLINE double npyv_sum_f64(float64x2_t a) | |
{ | |
return vget_lane_f64(vget_low_f64(a) + vget_high_f64(a), 0); | |
} | |
#endif | |
// Horizontal add: Calculates the sum of all vector elements. | |
#if NPY_SIMD_F64 | |
#define npyv_sum_f32 vaddvq_f32 | |
#define npyv_sum_f64 vaddvq_f64 | |
#else | |
NPY_FINLINE float npyv_sum_f32(npyv_f32 a) | |
{ | |
float32x2_t r = vadd_f32(vget_high_f32(a), vget_low_f32(a)); | |
return vget_lane_f32(vpadd_f32(r, r), 0); | |
} | |
#endif |
EDIT: bring back vpadd_f32
as it was; it should perform better than extracting two scalars.
NPY_FINLINE float npyv_sum_f32(npyv_f32 a) | ||
{ | ||
return vec_extract(a, 0) + vec_extract(a, 1) + | ||
vec_extract(a, 2) + vec_extract(a, 3); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPY_FINLINE float npyv_sum_f32(npyv_f32 a) | |
{ | |
return vec_extract(a, 0) + vec_extract(a, 1) + | |
vec_extract(a, 2) + vec_extract(a, 3); | |
} | |
NPY_FINLINE float npyv_sum_f32(npyv_f32 a) | |
{ | |
npyv_f32 sum = vec_add(a, npyv_combineh_f32(a, a)); | |
return vec_extract(sum, 0) + vec_extract(sum, 1); | |
} |
EDIT: my bad, fixed swapping the highest half
@@ -117,3 +117,23 @@ | |||
} | |||
#endif // !NPY_HAVE_FMA3 | |||
#endif // _NPY_SIMD_AVX2_ARITHMETIC_H | |||
|
|||
// Horizontal add: Calculates the sum of all vector elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, move the intrinsics inside the header guard _NPY_SIMD_AVX2_ARITHMETIC_H
,
same thing for other SIMD extensions.
I prefer adding a testing unit for any new intrinsics to keep things under control. // try to follow the current definitions in the way of sorting the source
// this is how you should define the new Python methods
SIMD_IMPL_INTRIN_1(sum_f32, f32, vf32)
#if NPY_SIMD_F64
SIMD_IMPL_INTRIN_1(sum_f64, f64, vf64)
#endif
// and this is how we attach them
SIMD_INTRIN_DEF(sum_f32)
#if NPY_SIMD_F64
SIMD_INTRIN_DEF(sum_f64)
#endif Once you get done bringing the new methods to def test_reduce(self):
data = self._data()
vdata = self.load(data)
# reduce sum
data_sum = sum(data)
vsum = self.sum(vdata)
assert vsum == data_sum You can also have some fun and try to use them from Python: # 1- bring the baseline via dict `targets` or attribute `baseline`
from numpy.core._simd import targets, baseline
npyv = targets["baseline"] # or baseline
# you can also dump `targets` to get the supported SIMD extensions
# by default build option `--simd-test` contains the most common SIMD extensions
if not npyv.simd: # equivalent to C def `NPY_SIMD`
print((
"How that possible? changed the default build settings?\n"
"maybe you running it under armhf, then get targets['NEON']"
))
return
a = npyv.load_f32(range(npyv.nlanes_f32))
print("sum of f32", npyv.sum_f32(a))
if npyv.simd_f64: # equivalent to C def `NPY_SIMD_F64`
b = npyv.load_f64(range(npyv.nlanes_f64))
print("sum of f64", npyv.sum_f64(b)) EDIT: improve the examples |
@seiko2plus Thanks for your detailed recommendations, The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thank you.
|
||
data_sum = sum(data) | ||
vsum = self.sum(vdata) | ||
assert vsum == data_sum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
Thanks @Qiyu8 |
The original PR is too large to review, so I will split it into several small PRs. Here are the sum intrinsics that were about to be used in
einsum
; the intrinsics have been fully discussed and tested.