ENH: skip NPY_ALLOW_C_API for UFUNC_ERR_IGNORE #9985

ziyan · 2017-11-08T01:50:26Z

GIL unnecessary when numpy floating point error handling is set to ignore.

Fixes an issue where numpy might deadlock when computing a**2 where a is tiny, e.g. -2.3693744349064819e-197:

#0  0x00007f6ff8c5b536 in do_futex_wait.constprop () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f6ff8c5b5e4 in __new_sem_wait_slow.constprop.0 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f6ff20b6768 in PyThread_acquire_lock (lock=0x55e085e26020, waitflag=<optimized out>) at ../Python/thread_pthread.h:324
#3  0x00007f6ff2028556 in PyEval_RestoreThread (tstate=0x7f6fe11adf40) at ../Python/ceval.c:359
#4  0x00007f6ff20e0b96 in PyGILState_Ensure () at ../Python/pystate.c:611
#5  0x00007f6fcb880dd6 in _error_handler (method=method@entry=0, errobj=errobj@entry=('double_scalars', None), errtype=errtype@entry=0x7f6fcb8d28c1 "underflow", retstatus=retstatus@entry=4, first=first@entry=0x7f6fef64c7b0) at numpy/core/src/umath/ufunc_object.c:119
#6  0x00007f6fcb8872ff in PyUFunc_handlefperr (errmask=521, errobj=('double_scalars', None), retstatus=retstatus@entry=4, first=first@entry=0x7f6fef64c7b0) at numpy/core/src/umath/ufunc_object.c:209
#7  0x00007f6fcb894f08 in double_power (a=<optimized out>, b=2, __NPY_UNUSED_TAGGEDc=<optimized out>) at numpy/core/src/umath/scalarmath.c.src:1168
#8  0x00007f6ff2083a17 in ternary_op.isra.5 (v=<optimized out>, w=<optimized out>, z=None, op_slot=48) at ../Objects/abstract.c:1065
#9  0x00007f6ff2029e0a in PyEval_EvalFrameEx

P.S. the stack is coming from numpy 1.11, but same problem seems to exist on master.
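To make the trace easier to read, here is a small pure-Python sketch (no numpy needed) of two things it implies: why `a**2` underflows for this value, and what `errmask=521` and `retstatus=4` mean. The constant values below are assumptions, written from memory of numpy's `ufuncobject.h`, and the helper names `decode_errmask` and `raised_flags` are made up for illustration.

```python
# The value quoted in the report; its exact square is roughly 5.6e-394.
a = -2.3693744349064819e-197
smallest_subnormal = 5e-324  # smallest positive IEEE 754 double

# The exact square lies far below the smallest subnormal, so it rounds to 0.0.
print(a * a)
print(abs(a) < 1e-162 and a * a < smallest_subnormal)

# Assumed to mirror numpy's UFUNC_FPE_*, UFUNC_SHIFT_* and UFUNC_ERR_* constants.
UFUNC_FPE = {"divide": 1, "over": 2, "under": 4, "invalid": 8}
UFUNC_SHIFT = {"divide": 0, "over": 3, "under": 6, "invalid": 9}
UFUNC_ERR = ["ignore", "warn", "raise", "call", "print", "log"]


def decode_errmask(errmask):
    """Return the handling mode for each floating point error category."""
    return {
        name: UFUNC_ERR[(errmask >> shift) & 0x7]
        for name, shift in UFUNC_SHIFT.items()
    }


def raised_flags(retstatus):
    """Return the error categories whose status bit is set."""
    return [name for name, bit in UFUNC_FPE.items() if retstatus & bit]


modes = decode_errmask(521)  # errmask from frame #6
flags = raised_flags(4)      # retstatus from frames #5 and #6
print(modes)
print(flags)
```

Under these assumptions the only raised flag is underflow, and its mode is "ignore", which matches `method=0` in frame #5: the handler had nothing to report but still went through `PyGILState_Ensure`.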


njsmith · 2017-11-08T01:52:18Z

Is there a bug open for the deadlock? Can you give a reproducer?


ziyan · 2017-11-08T02:08:00Z

There is no bug open, because I have not found a way to reproduce the deadlock with a simple script. I will get back to you on that.

ziyan · 2017-11-08T03:15:00Z

I added a simple reproduction in this repository:
https://github.com/ziyan/numpy-wsgi-deadlock

You will need Docker to reproduce it. It is just an apache2 WSGI setup running a simple Python script, hello.wsgi.

eric-wieser · 2017-11-08T04:33:31Z

Diff looks fine, but if there's a deadlock here, can't we still hit it in the other modes anyway? This doesn't seem to solve the underlying problem to me.

ziyan · 2017-11-08T04:48:36Z

You are right, it does not. I think the deadlock is an instance of gh-8559. The conclusion there is that WSGI under its default settings is not supported by numpy because it uses multiple sub-interpreters. I tried WSGIApplicationGroup %{GLOBAL} and the problem seems to go away.
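As a rough illustration of why the patch narrows the problem without removing it, the control flow can be modelled like this. This is a hypothetical pure-Python model, not numpy's C code: `handle_fp_error` and its callback arguments are invented names, and the mode codes are assumed to match numpy's `UFUNC_ERR_*` constants.

```python
# Handling-mode codes, assumed to mirror numpy's UFUNC_ERR_* constants.
UFUNC_ERR_IGNORE, UFUNC_ERR_WARN, UFUNC_ERR_RAISE = 0, 1, 2


def handle_fp_error(method, acquire_gil, report):
    """Hypothetical model of the patched handler; not numpy's actual code.

    Returns True if the GIL was requested, False otherwise.
    """
    if method == UFUNC_ERR_IGNORE:
        # Nothing to report, so no Python C API call and no GIL is needed.
        return False
    acquire_gil()  # stands in for NPY_ALLOW_C_API / PyGILState_Ensure
    report(method)
    return True


gil_requests = []
for method in (UFUNC_ERR_IGNORE, UFUNC_ERR_WARN, UFUNC_ERR_RAISE):
    handle_fp_error(method, lambda: gil_requests.append(method), lambda m: None)

print(gil_requests)  # [1, 2]
```

In this model only the "ignore" mode avoids the GIL request; "warn" and "raise" still reach it, so they could presumably still hit the same deadlock in a sub-interpreter setup.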

eric-wieser · 2017-11-08T04:51:14Z

Sounds pretty convincing to me. The patch looks uncontroversial, so I'll put it in. Thanks!

eric-wieser · 2017-11-08T04:53:10Z

Yeah, this sure looks like gh-5856 to me

ziyan · 2017-11-08T04:53:36Z

Thank you for the quick response!

twmr · 2017-12-01T22:57:37Z

Does this PR fix the deadlock in the following simple program?

//
// gcc pyinterptest.c -I/usr/include/python2.7 -Wall -lpython2.7 -o pyinterptest && ./pyinterptest
//
#include <Python.h>


void run_numpy_code(void){
    PyRun_SimpleString("import numpy as np");
    PyRun_SimpleString("print(3)");
    PyRun_SimpleString("print(np.array([1.23434e-312])**2)");
    /* PyRun_SimpleString("print(np.array([1.23434e-312])**1.)"); */
}


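void foo1(void) {
    // Creates a second thread state but keeps running on the main one.
    PyThreadState* mainThread = PyEval_SaveThread(); // release the GIL
    PyThreadState_New(mainThread->interp);
    PyEval_RestoreThread(mainThread); // reacquire the GIL as the main thread state

    run_numpy_code(); // does not hang
}

void foo2(void) {
    // Runs the numpy code while the new thread state is the current one.
    PyThreadState* mainThread = PyEval_SaveThread(); // release the GIL
    PyThreadState* t1 = PyThreadState_New(mainThread->interp);
    PyEval_RestoreThread(t1); // reacquire the GIL as the new thread state

    run_numpy_code(); // hangs
}

void foo3(void) {
    // Switches to the new thread state and then back to the main one.
    PyThreadState* mainThread = PyEval_SaveThread(); // release the GIL
    PyThreadState* t1  = PyThreadState_New(mainThread->interp);
    PyEval_RestoreThread(t1); // reacquire the GIL as the new thread state
    PyEval_SaveThread(); // release the GIL again
    PyEval_RestoreThread(mainThread); // reacquire the GIL as the main thread state

    run_numpy_code(); // does not hang
}

int main(int argc, char *argv[]) {
    Py_Initialize();
    PyEval_InitThreads(); // initialize and acquire the GIL

    foo2(); // swap in foo1() or foo3() to see the non-hanging variants
    Py_Finalize();
    return 0;
}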

twmr · 2017-12-02T07:55:54Z

I've just compiled numpy@master locally and can verify that the above program does not hang anymore. 👍

Do we want to add a stripped down version of the above program to the numpy unit tests? @eric-wieser

ENH skip NPY_ALLOW_C_API for UFUNC_ERR_IGNORE

bbf8011

GIL unnecessary when numpy floating point error handling is set to ignore.

eric-wieser merged commit 3d0c041 into numpy:master Nov 8, 2017

eric-wieser added the 01 - Enhancement, component: numpy._core, and Embedded (Issues regarding embedded python interpreters) labels Nov 8, 2017
