ENH: Add ability to runtime select ufunc loops, add AVX2 integer loops #7980

juliantaylor · 2016-08-27T09:31:17Z

Added the ability in the umath generator to select loops at runtime depending on CPU capabilities. It only covers the basic loops, but just because that's all I currently needed.

As an example I added specializations for the auto-vectorized integer loops (GCC only).
The results are not very impressive on my laptop's i5 Haswell, but you do get 10-20% better performance for arrays that fit in the CPU cache.
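To illustrate the kind of specialization described above, here is a minimal sketch, assuming GCC (or Clang) on x86. The same loop body is compiled twice: once for the baseline instruction set and once under `__attribute__((target("avx2")))`, so the auto-vectorizer may use 256-bit integer instructions in the second copy. The names `add_body`, `add_i32_baseline`, `add_i32_avx2`, `add_i32` and the `SKETCH_*` macros are hypothetical and are not taken from the PR; NumPy's real inner loops also use a different signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: target attributes and __builtin_cpu_supports are only
 * used with GNU-compatible compilers on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAVE_AVX2_TARGET 1
#endif

#if defined(__GNUC__)
#define SKETCH_INLINE static inline __attribute__((always_inline))
#else
#define SKETCH_INLINE static inline
#endif

/* Shared loop body. It is force-inlined so that each wrapper below gets
 * its own copy, compiled for that wrapper's instruction set. */
SKETCH_INLINE void
add_body(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Baseline copy: safe to run on any CPU the binary targets. */
static void
add_i32_baseline(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    add_body(a, b, out, n);
}

#ifdef SKETCH_HAVE_AVX2_TARGET
/* AVX2 copy: must only be called after a runtime CPU check. */
static void __attribute__((target("avx2")))
add_i32_avx2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    add_body(a, b, out, n);
}
#endif

/* Returns 1 if the AVX2 copy ran, 0 if the baseline copy ran. */
static int
add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
#ifdef SKETCH_HAVE_AVX2_TARGET
    if (__builtin_cpu_supports("avx2")) {
        add_i32_avx2(a, b, out, n);
        return 1;
    }
#endif
    add_i32_baseline(a, b, out, n);
    return 0;
}
```

Whether the compiler actually vectorizes either copy depends on the optimization level and compiler version, so any speedup from a sketch like this would need to be measured.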

Possible extensions to this might be for the vector math loops (sin, cos, exp, log, pow); SSE4 and AVX2 variants of these are available in glibc.

NPY_CPU_SUPPORTS_AVX2 checks at runtime if AVX2 is supported

Selected at runtime depending on CPU features.
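The runtime selection could be sketched roughly as follows: a table of loop pointers starts out with the generic implementation, and an init step overwrites the entry when the CPU reports AVX2 support. This is only an illustration under assumptions. `SKETCH_CPU_SUPPORTS_AVX2`, `SKETCH_TARGET_AVX2`, `multiply_loops` and `init_loops` are hypothetical names standing in for whatever the generator emits, and the loop signature is simplified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a macro like NPY_CPU_SUPPORTS_AVX2: it
 * expands to a runtime check where the compiler provides one, else 0. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_CPU_SUPPORTS_AVX2 __builtin_cpu_supports("avx2")
#define SKETCH_TARGET_AVX2 __attribute__((target("avx2")))
#else
#define SKETCH_CPU_SUPPORTS_AVX2 0
#define SKETCH_TARGET_AVX2
#endif

typedef void (*binary_i32_loop)(const int32_t *, const int32_t *,
                                int32_t *, size_t);

/* Generic loop, compiled for the baseline instruction set. */
static void
mul_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/* Same loop, compiled with AVX2 enabled where the attribute exists. */
SKETCH_TARGET_AVX2 static void
mul_i32_avx2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/* Loop table; every slot defaults to the generic implementation. */
static binary_i32_loop multiply_loops[] = { mul_i32 };

/* Run once at startup: swap in the AVX2 loop only if the CPU has it. */
static void
init_loops(void)
{
    if (SKETCH_CPU_SUPPORTS_AVX2) {
        multiply_loops[0] = mul_i32_avx2;
    }
}
```

Because the AVX2 function is only ever reached through the patched table entry, a binary built this way should still run on CPUs without AVX2; they simply keep the generic loop.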

juliantaylor · 2016-08-27T09:34:48Z

It might be a good idea to split loops.c.src into separate files; it is getting quite large.

E.g. one each for int, float and complex. The different types don't really share anything in the file besides some macros that can go into the header.

aeberspaecher · 2016-09-08T15:02:20Z

I'd love to see faster vector math loops. Is there any way I could help? I'm not familiar with the code base, but at least I could help with testing (no AVX2 though).

juliantaylor · 2016-09-19T18:08:42Z

Any thoughts on this?
It's probably not the best way to implement the ufunc generation, but it's simple. As it's internal, we can always change it later.

juliantaylor · 2016-09-24T12:09:21Z

I have added AVX macros; they are not used yet but could be in the future.

I'll put it in tomorrow, so last chance for comments.

rgommers

Changes LGTM. Adding these optimizations one by one is fine I think, we don't have the capacity to do a whole set at once.

One thing I looked at just now is how OPTIONAL_INTRINSICS gets triggered. There aren't enough comments/docs for it to be really clear, but it looks to me like for SSE/SSE2/SSE3 we use env vars when building Windows binaries (see _bdist_wininst in pavement.py), while on other platforms and for other builds everything else just gets turned on if it's detected on the build machine. Not sure I got that right though, otherwise we should have run into some issues already with manylinux and conda linux builds.

juliantaylor · 2016-09-25T11:02:17Z

Linux always builds generic binaries. This PR doesn't change that, as the unsupported instructions will never be run if the CPU does not support them.

Windows has a special case: it builds 3 variants (nosse, sse2 and sse3) and chooses at install time which one to use.

juliantaylor added 3 commits August 27, 2016 10:36

MAINT: add avx __builtin_cpu_supports and target attribute checks

10723a9

NPY_CPU_SUPPORTS_AVX2 checks at runtime if AVX2 is supported

MAINT: add support for runtime selected ufunc SIMD loops

0a2276a

ENH: add some AVX2 optimized integer ufunc loops

37740eb

Selected at runtime depending on CPU features.

charris changed the title ~~add ability to runtime select ufunc loops, add AVX2 integer loops~~ ENH: Add ability to runtime select ufunc loops, add AVX2 integer loops Aug 28, 2016

charris added 01 - Enhancement component: numpy._core labels Aug 28, 2016

juliantaylor added 2 commits September 24, 2016 13:15

MAINT: add runtime check for AVX macros

a860256

DOC: add release note entry for AVX2 integer loops

ae32e78

juliantaylor force-pushed the avx-runtime branch from 17dfcd1 to ae32e78 Compare September 24, 2016 11:23

rgommers approved these changes Sep 24, 2016

juliantaylor merged commit 0887da9 into numpy:master Sep 25, 2016

juliantaylor deleted the avx-runtime branch September 25, 2016 15:16

This was referenced Sep 25, 2016

BUG: shift operator cycles, fixes #2449 #7473

Closed

ENH: add contract: optimizing numpy's einsum expression #5488

Merged

rgommers mentioned this pull request Oct 10, 2016

master broken on OS X with Clang #8130

Closed

rgommers mentioned this pull request Mar 28, 2019

ENH: Use AVX for float32 implementation of np.exp & np.log #13134

Merged
