Bad interaction (hang) involving TBB and subprocess fork and maybe MKL OMP #9501
Labels: bug - incorrect behavior, discussion, no action required
This documents an issue discovered during 0.59.1. By writing it in the issue tracker, hopefully it will be searchable in case similar problems occur in the future for us or others. The issue described here is already fixed by b39bdf2. (Self reminder: cherry-pick the fix to main and link this issue.)
The `test_issue9490_non_det_ssa_problem` test case can hang, very specifically on Python 3.9, NumPy 1.25 and Linux x86-64, at this line: `numba/numba/tests/parfor_iss9490_usecase.py`, line 71 in 147c641.
GDB reveals output:
Important observations here:
- The process is waiting in `__pthread_clockjoin_ex` as part of a `prepare_fork` call for the TBB backend. TBB is known to have problems during fork. This explains the hang.
- But it is surprising that `numba.testing.assert_allclose` would cause a fork.

Using PDB to trace the hanging line, it was discovered that NumPy 1.25+ introduced these lines, which call `subprocess` and thus fork:
https://github.com/numpy/numpy/blob/maintenance/1.25.x/numpy/testing/_private/utils.py#L1239-L1253
Together with the lazy import by `numba.testing.assert_allclose`, it explains the use of fork.

As to why the test is not failing with NumPy 1.26:
`subprocess`, which may have averted bugs with fork; e.g. "potential undefined behavior with subprocess using vfork() on Linux?" (python/cpython#91401).

The fix (see `numba/numba/tests/parfor_iss9490_usecase.py`, line 71 in 147c641) addresses the `subprocess` fork bug. I didn't do further analysis since the bug is fixed.
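
The lazy-import mechanism described above can be illustrated with a minimal, stdlib-only sketch. It does not use Numba, NumPy or TBB, and it is not the code from b39bdf2. The module name `fake_testing_utils` and the helpers `lazy_assert_close`, `count_spawns` and `demo` are hypothetical names made up for illustration. The sketch only shows that when a module spawns a subprocess at import time, a lazy import moves that spawn to the first call of the helper, which in a real program may be after a threading backend has already started its worker threads.

```python
import importlib
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a module that runs a subprocess at import time.
FAKE_MODULE_SOURCE = textwrap.dedent(
    """
    import subprocess
    import sys

    # Module-level code: executed once, on first import.
    subprocess.run([sys.executable, "-c", "pass"], capture_output=True)

    def assert_close(a, b, tol=1e-9):
        assert abs(a - b) <= tol
    """
)


def lazy_assert_close(a, b):
    # The import is deferred to call time, like a lazily importing test helper.
    import fake_testing_utils  # hypothetical module name

    fake_testing_utils.assert_close(a, b)


def count_spawns(func, *args):
    """Call func(*args) and return how many times subprocess.Popen was invoked."""
    real_popen = subprocess.Popen
    calls = []

    def spy(*popen_args, **popen_kwargs):
        calls.append(popen_args)
        return real_popen(*popen_args, **popen_kwargs)

    subprocess.Popen = spy
    try:
        func(*args)
    finally:
        subprocess.Popen = real_popen
    return len(calls)


def demo():
    """Return the number of spawns seen on the first and on the second call."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_testing_utils.py").write_text(FAKE_MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            first = count_spawns(lazy_assert_close, 1.0, 1.0)
            second = count_spawns(lazy_assert_close, 1.0, 1.0)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_testing_utils", None)
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this sketch the first call triggers one spawn and the second call triggers none, because the module is cached in `sys.modules` after the first import. A spy like `count_spawns` may also be handy for checking whether a given call path reaches `subprocess` at all.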