BUG, SIMD: Fix exp FP stack overflow when AVX512_SKX is enabled #20405

Merged: 1 commit, Nov 19, 2021


seiko2plus
Member

Don't rely on the compiler to cast implicitly between mask and integer registers.
With gcc 7 and the flags -march>=nocona -O3, the implicit cast can cause an FP stack overflow,
which may in turn put NaN into certain HW/FP calculations.
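
To illustrate the idea, here is a minimal sketch of testing an AVX512 comparison mask through an explicit mask-to-integer move rather than writing `if (mask)`. The helper name `any_above` and the compile-time guard are hypothetical and are not taken from the NumPy sources; `_cvtmask8_u32` is used here as one possible explicit conversion, and the actual patch may use a different helper. A scalar fallback keeps the sketch buildable without AVX512 support.

```c
#include <stddef.h>

#if defined(__AVX512F__) && defined(__AVX512DQ__)
#include <immintrin.h>
#define SKETCH_HAVE_AVX512 1
#else
#define SKETCH_HAVE_AVX512 0
#endif

/* Hypothetical helper: returns 1 if any of the 8 doubles in x is > limit. */
int any_above(const double *x, double limit)
{
#if SKETCH_HAVE_AVX512
    __m512d v = _mm512_loadu_pd(x);
    __mmask8 m = _mm512_cmp_pd_mask(v, _mm512_set1_pd(limit), _CMP_GT_OQ);
    /* Explicit move from the mask register to an integer register,
       instead of letting the compiler pick a conversion for `if (m)`. */
    return _cvtmask8_u32(m) != 0;
#else
    /* Scalar fallback with the same result (NaN lanes compare false). */
    int any = 0;
    for (size_t i = 0; i < 8; ++i) {
        any |= (x[i] > limit);
    }
    return any;
#endif
}
```

The integer result can then drive ordinary control flow, for example deciding whether to raise an overflow status flag after the vector loop.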

For more details, please check the comments in #20356

closes #20356

@seiko2plus added the labels "09 - Backport-Candidate" and "component: SIMD" on Nov 18, 2021
@seiko2plus changed the title from "BUG, SIMD: Fix exp/log FP stack overflow when AVX512_SKX is enabled" to "BUG, SIMD: Fix exp FP stack overflow when AVX512_SKX is enabled" on Nov 19, 2021
@mattip
Member

mattip commented Nov 19, 2021

I went through the use of overflow_mask, underflow_mask, invalid_mask, divide_by_zero_mask and it looks like you caught all the conversions to int. Cool.

@mattip mattip merged commit 0bb936c into numpy:main Nov 19, 2021
@mattip
Member

mattip commented Nov 19, 2021

Thanks @seiko2plus

@seiko2plus
Member Author

> I went through the use of overflow_mask, underflow_mask, invalid_mask, divide_by_zero_mask and it looks like you caught all the conversions to int. Cool.

Oops, I didn't look deep enough; there are other areas:

glibc_mask = m1 & m2;
if (glibc_mask != 0xFF) {

/* call glibc's log func when x around 1.0f */
if (glibc_mask != 0) {

But my local tests passed; it seems the aggressive optimization didn't affect these checks because they are inside the loop.
We need to get rid of every kind of raw SIMD here; I will make this file my highest priority.
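
As a rough sketch of how checks like `glibc_mask != 0xFF` and `glibc_mask != 0` could avoid an implicit mask-to-int conversion, the masks can be moved to integers first and combined there. Everything below is an assumption for illustration: the function name `near_one_lanes`, the `lo`/`hi` bounds, and the enum are hypothetical, and the conditions that produce `m1` and `m2` in the real file are not shown in this thread.

```c
#if defined(__AVX512F__) && defined(__AVX512DQ__)
#include <immintrin.h>
#define SKETCH_HAVE_AVX512 1
#else
#define SKETCH_HAVE_AVX512 0
#endif

enum lane_state { LANES_NONE = 0, LANES_SOME = 1, LANES_ALL = 2 };

/* Hypothetical helper: reports how many of 8 lanes satisfy lo <= x <= hi. */
enum lane_state near_one_lanes(const double *x, double lo, double hi)
{
    unsigned bits = 0;
#if SKETCH_HAVE_AVX512
    __m512d v = _mm512_loadu_pd(x);
    __mmask8 m1 = _mm512_cmp_pd_mask(v, _mm512_set1_pd(lo), _CMP_GE_OQ);
    __mmask8 m2 = _mm512_cmp_pd_mask(v, _mm512_set1_pd(hi), _CMP_LE_OQ);
    /* Convert each mask explicitly, then combine as plain integers. */
    bits = _cvtmask8_u32(m1) & _cvtmask8_u32(m2);
#else
    /* Scalar fallback: build the same 8-bit lane mask by hand. */
    for (unsigned i = 0; i < 8; ++i) {
        if (x[i] >= lo && x[i] <= hi) {
            bits |= 1u << i;
        }
    }
#endif
    if (bits == 0xFFu) {
        return LANES_ALL;   /* corresponds to the mask being 0xFF */
    }
    return bits != 0u ? LANES_SOME : LANES_NONE;
}
```

With the lane bits held in an `unsigned`, both comparisons are ordinary integer comparisons and no longer depend on how the compiler lowers a `__mmask8` used in a scalar context.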

Labels: 00 - Bug, component: SIMD
Issue linked to this pull request:

BUG: Calling np.exp(0) causes many lapack calls to return NaNs on CPUs with AVX512 and MKL