intel_spr_sde_test fails now and then. #23545

Closed
charris opened this issue Apr 6, 2023 · 11 comments
Labels
component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@charris
Member

charris commented Apr 6, 2023

The basic cause seems to be a change in the FPU mode during the test:

    @pytest.fixture(scope="function", autouse=True)
    def check_fpu_mode(request):
        """
        Check FPU precision mode was not changed during the test.
        """
        old_mode = get_fpu_mode()
        yield
        new_mode = get_fpu_mode()
    
        if old_mode != new_mode:
>           raise AssertionError("FPU precision mode changed from {0:#x} to {1:#x}"
                                 " during the test".format(old_mode, new_mode))
E           AssertionError: FPU precision mode changed from 0x37f to 0x77f during the test

new_mode   = 1919
old_mode   = 895

There are also some wrong results, but I suspect they are related to the FPU mode change.
See https://github.com/numpy/numpy/actions/runs/4631776714/jobs/8195205969?pr=23542 for more details.

@r-devulap Thoughts?

@r-devulap r-devulap self-assigned this Apr 6, 2023
@r-devulap
Member

Interesting, it doesn't always fail. Not sure why this happens but I am taking a look.

@charris charris changed the title from "sde_simd_avx512_test fails now and then." to "intel_spr_sde_test fails now and then." Apr 6, 2023
@charris
Member Author

charris commented Apr 6, 2023

I messed up the title here; the failing test is intel_spr_sde_test.

@r-devulap
Member

Disabling the CI job until I figure out what's happening: #23566

@r-devulap
Member

Closing. #24268 fixes it.

@ngoldbaum
Member

This is happening again, https://github.com/numpy/numpy/actions/runs/5716892291/job/15489817515?pr=24188

@ngoldbaum ngoldbaum reopened this Jul 31, 2023
@r-devulap
Member

> This is happening again, https://github.com/numpy/numpy/actions/runs/5716892291/job/15489817515?pr=24188

Taking a look.

@r-devulap
Member

Looks like it was only partially fixed :( I found more AVX-512 use cases where the bug still persists. Will disable the test, again! Minimal reproducer:

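import numpy as np
# get_fpu_mode is a private NumPy test helper that returns the current
# FPU control word in a platform-dependent format.
from numpy.core._multiarray_tests import get_fpu_mode

# The mode should be identical before and after each ufunc call.
print("fpu mode before exp2", hex(get_fpu_mode()))
np.exp2(1)
print("fpu mode after exp2", hex(get_fpu_mode()))
print("fpu mode before log2", hex(get_fpu_mode()))
np.log2(1)
print("fpu mode after log2", hex(get_fpu_mode()))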

Command to run in SDE: sde-external-9.24.0-2023-07-13-lin/sde64 -skx -- python3 temp.py

Output:

fpu mode before exp2 0x37f
fpu mode after exp2 0x77f
fpu mode before log2 0x77f
fpu mode after log2 0x37f
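
The reproducer above checks two calls by hand. A minimal sketch of the same before/after check, generalized to a list of calls, might look like the following. The helper `find_mode_leaks` and the `FakeFPU` class are hypothetical names introduced only for illustration; `FakeFPU` stands in for the real FPU state so the sketch runs without NumPy or SDE.

```python
def find_mode_leaks(calls, get_mode):
    """Return (name, before, after) for each call that leaves the mode changed."""
    leaks = []
    for name, func in calls:
        before = get_mode()
        func()
        after = get_mode()
        if before != after:
            leaks.append((name, before, after))
    return leaks


class FakeFPU:
    """Hypothetical stand-in for the FPU state (not part of NumPy)."""

    def __init__(self):
        self.mode = 0x37F

    def get_mode(self):
        return self.mode

    def toggle_rounding_bit(self):
        # Simulates a call that flips bit 10 and does not restore it.
        self.mode ^= 0x400


fpu = FakeFPU()
calls = [
    ("exp2", fpu.toggle_rounding_bit),  # leaks: 0x37f -> 0x77f
    ("noop", lambda: None),             # leaves the mode untouched
    ("log2", fpu.toggle_rounding_bit),  # leaks: 0x77f -> 0x37f
]
leaks = find_mode_leaks(calls, fpu.get_mode)
for name, before, after in leaks:
    print(f"{name}: {before:#x} -> {after:#x}")
```

Against a real build, one would presumably pass `get_fpu_mode` as `get_mode` and wrap each ufunc in a lambda, for example `("exp2", lambda: np.exp2(1))`.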

@seiko2plus
Member

seiko2plus commented Aug 27, 2023

@r-devulap, the change happens in the reserved bits of the x87 FPU control word, so the value returned by get_fpu_mode() needs to be masked to avoid testing these bits. I'm going to enable the SDE SPR tests again after patching the before/after get_fpu_mode() comparison on x86.

@seiko2plus
Member

seiko2plus commented Sep 3, 2023

My apologies; it wasn't the reserved bits, but rather the rounding control bits. Some AVX512 (BW/DW) instructions exhibited a bug: the FPU control word is apparently changed to emulate on-the-fly rounding (the IMM rounding bits) without the previous state being restored.
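
To see which field the two values reported in this thread differ in, the control word can be decoded by hand. This is a minimal sketch assuming `get_fpu_mode()` returns the raw x87 control word on this platform, with the field layout taken from the Intel SDM (bits 0-5 exception masks, bits 8-9 precision control, bits 10-11 rounding control). The function name `decode_x87_control_word` is made up for illustration.

```python
# Meaning of the 2-bit precision control field (bits 8-9).
PRECISION = {
    0b00: "single (24-bit)",
    0b01: "reserved",
    0b10: "double (53-bit)",
    0b11: "extended (64-bit)",
}

# Meaning of the 2-bit rounding control field (bits 10-11).
ROUNDING = {
    0b00: "nearest-even",
    0b01: "down (toward -inf)",
    0b10: "up (toward +inf)",
    0b11: "toward zero",
}


def decode_x87_control_word(cw):
    """Split an x87 control word into its documented fields."""
    return {
        "exception_masks": cw & 0x3F,
        "precision": PRECISION[(cw >> 8) & 0b11],
        "rounding": ROUNDING[(cw >> 10) & 0b11],
    }


before = decode_x87_control_word(0x37F)
after = decode_x87_control_word(0x77F)

print(hex(0x37F ^ 0x77F))  # 0x400, i.e. only bit 10 differs
print(before["rounding"], "->", after["rounding"])
```

Under these assumptions, the only differing bit (0x400) falls in the rounding control field, and the exception masks and precision control are unchanged.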

@melissawm melissawm added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Sep 4, 2023
@r-devulap
Member

I completely missed SDE version 9.27, which has been available for a couple of months now. Updated CI to use this version and run tests on SPR. See c4e790b from #25376.

@r-devulap
Member

SDE tests are enabled on TGL and SPR platforms. I can finally close this :)
