BUG: Fix AVX2 intrinsic npyv_store2_till_s64 on MSVC > 19.29
  This is a workaround for a compiler bug; the change does not
  affect performance.
seiko2plus authored and charris committed Jun 15, 2023
1 parent a70fc65 commit bda2ab1
Showing 1 changed file with 19 additions and 0 deletions.
numpy/core/src/common/simd/avx2/memory.h
@@ -372,10 +372,29 @@ NPY_FINLINE void npyv_store2_till_s32(npy_int32 *ptr, npy_uintp nlane, npyv_s32
 NPY_FINLINE void npyv_store2_till_s64(npy_int64 *ptr, npy_uintp nlane, npyv_s64 a)
 {
     assert(nlane > 0);
+#ifdef _MSC_VER
+    /*
+     * Although this version is compatible with all other compilers,
+     * there is no performance benefit in retaining the other branch.
+     * However, that branch serves as evidence of a newly emerging bug
+     * in MSVC that first appeared in v19.30.
+     * For some reason, the MSVC optimizer drops the lower store (128-bit mov)
+     * and replaces it with a full mov through a ymmword pointer.
+     *
+     * For more details, please refer to the discussion on https://github.com/numpy/numpy/issues/23896.
+     */
+    if (nlane > 1) {
+        npyv_store_s64(ptr, a);
+    }
+    else {
+        npyv_storel_s64(ptr, a);
+    }
+#else
     npyv_storel_s64(ptr, a);
     if (nlane > 1) {
         npyv_storeh_s64(ptr + 2, a);
     }
+#endif
 }
 /*********************************
  * Non-contiguous partial store

0 comments on commit bda2ab1

Please sign in to comment.