Seemingly random segfault on macOS if function is in larger library #5890

Closed
1 task done
max3-2 opened this issue Jun 19, 2020 · 58 comments
Labels
bug, ParallelAccelerator, threading

Comments

@max3-2

max3-2 commented Jun 19, 2020

Reporting a bug

Hi,

First of all, sorry for this small report with few examples, but at this point the problem seems so untraceable to me that I'm hoping for any input to help track it down. Maybe it's even a severely stupid mistake of my own that I just can't find. I was getting segfaults from a Numba function and traced them down to the state I will outline here, but at this point I can't find anything more.

I have a function which operates on arrays. I have simplified it so far that I know it no longer makes much sense, but here it goes.

import numpy as np
import numba

@numba.jit("f8[:,:](f8[:,:,:],f8[:,:],f8,f8,f8[:])", nopython=True, parallel=True,
           nogil=True)
def evalManyIndividualQ3D(individuals, X, p1, p2, p3):
    # Heavily reduced version: X, p1, p2, p3, individual and P are intentionally unused.
    fitnesses = np.zeros((individuals.shape[0], 4))
    nF = individuals.shape[0]
    for i in numba.prange(nF):
        individual = individuals[i]
        P = np.random.random((3,4))
        fitnesses[i] = np.random.random((4,))
    return fitnesses

For debugging, I'm using synthetic input data:

n = 15
m = 3
inputInd = np.random.random((500, n, m))
inputArray = np.random.random((n, m))
p1 = 25e-3
p2 = 55.
p3 = np.array([320., 240.])

ret = evalManyIndividualQ3D(inputInd, inputArray, p1, p2, p3)

Running this as a small script works, and so does running it from an interactive session. However, I have a large library of Numba functions that includes the one above, copied and pasted with the same syntax. If I append the same call with the same synthetic input data to the end of the library, so that the full library (including the above function) is compiled and then only this function is called as above

ret = evalManyIndividualQ3D(inputInd, inputArray, p1, p2, p3)

I'm getting a segfault. No traceback, nothing at all. In the terminal it's just zsh: segmentation fault, and Jupyter hangs completely.

This happens on macOS 10.15. One difference worth mentioning is that during compilation of the library, I'm getting some warnings:

NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float64, 2d, A), array(float64, 2d, A))
  warnings.warn(NumbaPerformanceWarning(msg))

Those products are not in the function or connected to the function that crashes!

At this point, I'm happy for any kind of input, since I can't find a reason.

@max3-2
Author

max3-2 commented Jun 19, 2020

I went back through some Numba versions. At 0.47 and below I'm getting an error due to a dict, so I can't go back further.

At 0.48 it's working.
At 0.49 it's crashing.
At 0.50 it's crashing.

There is a class definition in the library (jitclass). In both 0.49 and 0.50, not moving it to experimental raises a warning but does not influence the behavior.
I went on and commented out the jitclass altogether, removing all connected occurrences. This does not change the behavior, so it seems that something went wrong from 0.49 onwards.

sklam added the bug and question labels Jun 19, 2020
@sklam
Member

sklam commented Jun 19, 2020

@max3-2 here are a few questions to help diagnose the problem:

  • Does the larger library use multiprocessing (ipc? fork? mpi?) or threading?
  • Does turning off the parallel=True make the bug go away?
  • Does removing the type-signature ("f8[:,:](f8[:,:,:],f8[:,:],f8,f8,f8[:])") make the bug go away?

@stuartarchibald
Contributor

Wonder if boundscheck=True in the decorator might help?

@max3-2
Author

max3-2 commented Jun 21, 2020

> Wonder if boundscheck=True in the decorator might help?

Adding this to the function in question or globally doesn't change the issue.

> @max3-2 here are a few questions to help diagnose the problem:
>
>   • Does the larger library use multiprocessing (ipc? fork? mpi?) or threading?
>   • Does turning off the parallel=True make the bug go away?
>   • Does removing the type-signature ("f8[:,:](f8[:,:,:],f8[:,:],f8,f8,f8[:])") make the bug go away?

1) I'm just using the parallel setting on a local multicore machine. I don't know which backend that uses... MPI?
2) It does. Execution speed also does not seem to change without parallel (i.e. it's not slower). I am keeping numba.prange, though, and not replacing it with range(). Compilation seems to be much faster in comparison to parallel=True, where it seems to "hang" before the actual computation starts.
IMPORTANT: If I go back to my original library, the program crashes at the first instance of a function with parallel=True. If I set all of them to False, it works!
3) With parallel=True, changing the signature does not affect the issue.

Maybe this is connected to having multiple functions with parallel=True in one library being compiled at the same time, even if they are not used?

@esc
Member

esc commented Jun 22, 2020

This seems like a tricky issue and appears to be hard to debug. Here are some suggestions:

  1. You could use git bisect to locate the commit that seems to have introduced this regression, since you know that 0.48 is working and you have a way to reproduce the crash with certainty.

  2. Going further, if you do have a segfault, you will most likely have a core-dump too. You could use lldb to inspect the core-dump to see if that has any pointers as to what might be going wrong.
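
Before reaching for lldb, Python's standard-library faulthandler module can at least report which Python frame was active when the fatal signal arrived. The following is a minimal sketch, assuming a POSIX system; the child program is a stand-in that sends SIGSEGV to itself and is not the Numba function from this issue.

```python
import subprocess
import sys

# Stand-in for the crashing program (an assumption for illustration):
# it enables faulthandler and then sends SIGSEGV to itself.
CHILD_SOURCE = """
import faulthandler, os, signal
faulthandler.enable()  # dump the Python traceback on fatal signals
os.kill(os.getpid(), signal.SIGSEGV)
"""


def run_and_capture(source):
    """Run `source` in a fresh interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_and_capture(CHILD_SOURCE)
    # On POSIX, a negative return code -N means "killed by signal N".
    print("return code:", code)
    print(err)
```

The same handler can be enabled without editing a script by running it as python -X faulthandler script.py. Note that faulthandler only shows Python-level frames, so a crash inside JIT-compiled code would still need lldb or gdb for the native part of the stack.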

@max3-2
Author

max3-2 commented Jun 22, 2020

> This seems like a tricky issue and appears to be hard to debug. Here are some suggestions:
>
>   1. You could use git bisect to locate the commit that seems to have introduced this regression, since you know that 0.48 is working and you have a way to reproduce the crash with certainty.
>   2. Going further, if you do have a segfault, you will most likely have a core-dump too. You could use lldb to inspect the core-dump to see if that has any pointers as to what might be going wrong.

OK, let me correct this: bisect somehow starts to throw a different error, see below, which I think makes it lose track of the original one. In the end it points to b528205463b6c532c361980f5cb3cd4a5d402a4d, but I don't know how meaningful this is. The new error it starts to produce on compilation is

numba.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Type of #4 arg mismatch: i1 != i32

File "Python/its-image-analyzer/itsimageanalyzer/quickComputation/calcDLTQLib.py", line 36:
def arglexsortQ(arr):
    <source elided>
        sorter = np.argsort(arr[:, nc-1-i], kind='mergesort')
        arr = arr[sorter, :].copy()
        ^
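
One way to keep a bisect from being derailed by an unrelated failure like this is to let git bisect run skip such commits. git bisect run treats exit code 0 as good, 125 as "cannot be tested, skip", and other codes from 1 to 127 as bad. The sketch below maps a reproducer's outcome onto those codes, assuming a POSIX system where death by signal shows up as a negative return code; the script name passed on the command line is hypothetical, and rebuilding Numba's extension modules at each step is left out.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(returncode):
    """Map one run of the reproducer to a `git bisect run` exit code."""
    if returncode == 0:
        return GOOD
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N on POSIX,
        # e.g. -11 for SIGSEGV: this is the crash being hunted.
        return BAD
    # Any other failure (for example an uncaught compile-time exception)
    # is not the bug under investigation, so ask git to skip the commit.
    return SKIP


def main(script):
    proc = subprocess.run([sys.executable, script])
    return classify(proc.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Saved under a hypothetical name such as bisect_check.py, it would be invoked as git bisect run python bisect_check.py reproducer.py after marking the known good and bad revisions.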


@esc
Member

esc commented Jun 22, 2020

@max3-2 thanks for reporting back on this. I am not sure what to make of the error. Perhaps you can find a way to build some kind of a reproducer that is self-contained and that you can share? It would help reproducing and debugging this.

@max3-2
Author

max3-2 commented Jun 23, 2020

I think this is possible. I'll try to get to it and report back... it may take a day or two...

@esc
Member

esc commented Jun 23, 2020

@max3-2 no problem, take your time!

@max3-2
Author

max3-2 commented Jun 25, 2020

> @max3-2 no problem, take your time!

This would be a fairly simple version which reproducibly, on my machine, runs on 0.48 and crashes from 0.49 onwards.
Running bisect somehow loses track and produces the same error as above. I still don't know if this is connected, since I can't make much of the traceback.

https://gist.github.com/max3-2/ce88444a3cfd5d0a5f5525c66afba056

@max3-2
Author

max3-2 commented Jun 25, 2020

And on a different note: I had a few Windows users test it, and all three were able to reproduce it on Win10 with Python 3.7.

@stuartarchibald
Contributor

MWR:

import numpy as np
import numba

numba.config.NUMBA_NUM_THREADS = 3

@numba.njit(parallel=True)
def foo(A, B, C, D):
    # B is not referenced
    # C is not referenced
    # D is not referenced
    E = np.zeros((A.shape[0], 4))
    for i in numba.prange(A.shape[0]):
        E[i] = np.random.random((4,))
    return

if __name__ == '__main__':
    m = 2
    n = 15
    A = np.zeros((100, m, n))
    B = np.zeros(1)
    C = 1.
    D = np.zeros(1)
    foo(A, B, C, D)

CC @DrTodd13: it seems that setting numba.config.NUMBA_NUM_THREADS to something that isn't the core count is the cause. I wonder if there's some core-count-based value still being captured somewhere? It's also strange that an illegal decref seems to be causing it; I wonder if this is due to the variables that are passed in but not used.

gdb bt:

#0  0x00007fe8e76be065 in NRT_decref ()
#1  0x00007fe8d3fb15b2 in cpython::__main__::foo$243(Array<double, 3, C, mutable, aligned>, Array<double, 1, C, mutable, aligned>, double, Array<double, 1, C, mutable, aligned>) ()

stuartarchibald added the ParallelAccelerator label and removed the question label Jun 25, 2020
@max3-2
Author

max3-2 commented Jun 26, 2020

Thanks for further reducing it. At least in my "final" code, no variables are unused, this was due to quickly reducing it to the core problem.

@JSKenyon

This has just reared its head in one of my projects @stuartarchibald. Reducing your MWE even further, the following example fails on numba>=0.49:

import numpy as np
import numba

numba.config.NUMBA_NUM_THREADS = 3


@numba.njit(parallel=True)
def f(x):

    for i in numba.prange(x.size):
        x[i] = x[i]
    return


if __name__ == '__main__':
    f(np.ones(100))

It doesn't seem to have anything to do with unused variables. Could this be a side effect of the thread masking introduced in 0.49? The second point in the docs here might be a hint.

@JSKenyon

I can confirm that the following doesn't crash @stuartarchibald, @max3-2, @esc:

import numpy as np
import numba


@numba.njit(parallel=True)
def f(x):

    numba.set_num_threads(3)

    for i in numba.prange(x.size):
        x[i] = x[i]
    return


if __name__ == '__main__':
    f(np.ones(100))
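
The crashing and non-crashing variants differ only in how the thread count is requested: by assigning to numba.config.NUMBA_NUM_THREADS after import, or by calling numba.set_num_threads. A third route that Numba documents is the NUMBA_NUM_THREADS environment variable. The launcher below is a sketch for comparing thread counts without touching numba.config at all; it sets the variable for a child interpreter only. The reproducer path given on the command line is hypothetical, and whether this route avoids the crash has not been checked in this thread.

```python
import os
import subprocess
import sys


def run_with_thread_count(interpreter_args, num_threads):
    """Run a child Python with NUMBA_NUM_THREADS preset; return its exit code.

    `interpreter_args` is the argument list handed to the interpreter,
    e.g. ["reproducer.py"] or ["-c", "<source>"].
    """
    env = dict(os.environ)
    env["NUMBA_NUM_THREADS"] = str(num_threads)
    proc = subprocess.run([sys.executable] + list(interpreter_args), env=env)
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    for n in (1, 2, 3):
        # A negative value means the child was killed by a signal (POSIX).
        print(n, "->", run_with_thread_count(sys.argv[1:], n))
```

Running each setting in its own process also means a segfault in one configuration cannot take down the comparison as a whole.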

@JSKenyon

Pinging @asmeurer in case he has some ideas.

@stuartarchibald
Contributor

@JSKenyon thanks for looking at this more.

RE: #5890 (comment) can't seem to reproduce locally on the released 0.50.0, valgrind also seems to be ok with it.

RE:

> Could this be a side effect of the thread masking introduced in 0.49

possibly, but it'd be good to rule out other things too.

@JSKenyon

Interesting that it doesn't reproduce the issue for you. I am in a clean virtualenv running Python 3.6.9 and numba 0.50.1.

@stuartarchibald
Contributor

> Interesting that it doesn't reproduce the issue for you. I am in a clean virtualenv running Python 3.6.9 and numba 0.50.1.

I've tried numerous conda envs and also a RHEL Python 3.6.8 virtualenv with:

$ pip freeze 
llvmlite==0.33.0
numba==0.50.1
numpy==1.19.0
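
When comparing environments like this, it can help to confirm from inside the interpreter which copies of the packages are actually imported, since pip freeze and the running process can disagree if several installations are on the path. A small sketch using only the standard library; the default package names are simply the three listed above.

```python
import importlib
import sys


def report_versions(names=("numba", "llvmlite", "numpy")):
    """Return {name: (version, file) or None} for the running interpreter."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None  # not importable from this interpreter
            continue
        found[name] = (
            getattr(module, "__version__", "unknown"),
            getattr(module, "__file__", "unknown"),
        )
    return found


if __name__ == "__main__":
    print("python", sys.version.split()[0], "at", sys.executable)
    for name, info in report_versions().items():
        print(name, info)
```

The file path in each entry shows whether a package was picked up from the virtualenv or from somewhere else.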

If you run the program under catchsegv, where is the segfault coming from?

@JSKenyon

This is the only output I get from catchsegv:

Register dump:

 RAX: 0000000000000000   RBX: 0000000000000018   RCX: 00007f82ab354320
 RDX: 0000000001dbac00   RSI: 0000000000000001   RDI: 0000000000000000
 RBP: 000000000000002f   R8 : 0000000000000000   R9 : 0000000000000000
 R10: 0000000001db3010   R11: 0000000000000000   R12: 000000000000001f
 R13: 0000000000000020   R14: 0000000000000027   R15: 0000000000000028
 RSP: 00007ffe4150fa80

 RIP: 0000000000000030   EFLAGS: 00010206

 CS: 0033   FS: 0000   GS: 0000

 Trap: 0000000e   Error: 00000014   OldMask: 00000000   CR2: 00000030

 FPUCW: 0000037f   FPUSW: 00000420   TAG: 00007f82
 RIP: 8590e8f8   RDP: 00000000

 ST(0) ffff 8000000000000000   ST(1) 0000 0000000000000000
 ST(2) 0000 0000000000000000   ST(3) ffff f800000000000000
 ST(4) ffff 81ceb32c4b43fcf5   ST(5) ffff 81ceb32c4b43fcf5
 ST(6) ffff ffffffffe219652c   ST(7) b000 b000000000000000
 mxcsr: 1fa2
 XMM0:  000000000000000000000000ba92bb80 XMM1:  000000000000000000000000ba92bb80
 XMM2:  000000000000000000000000ba92bb80 XMM3:  000000000000000000000000ba92bb80
 XMM4:  000000000000000000000000ba92bb80 XMM5:  000000000000000000000000ba92bb80
 XMM6:  000000000000000000000000ba92bb80 XMM7:  000000000000000000000000ba92bb80
 XMM8:  000000000000000000000000ba92bb80 XMM9:  000000000000000000000000ba92bb80
 XMM10: 000000000000000000000000ba92bb80 XMM11: 000000000000000000000000ba92bb80
 XMM12: 000000000000000000000000ba92bb80 XMM13: 000000000000000000000000ba92bb80
 XMM14: 000000000000000000000000ba92bb80 XMM15: 000000000000000000000000ba92bb80

The backtrace is empty.
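
Even without a backtrace, the register dump carries some information. Trap 0xe is the x86 page-fault vector, CR2 holds the faulting address, and the Error field is the page-fault error code. The sketch below decodes that code using the standard x86 bit layout (bit 0 present, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch); the helper name is made up for illustration.

```python
# (mask, meaning if the bit is set, meaning if it is clear)
PAGE_FAULT_BITS = (
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
)


def decode_page_fault(error_code):
    """Return a list of human-readable flags for an x86 page-fault code."""
    flags = []
    for mask, if_set, if_clear in PAGE_FAULT_BITS:
        label = if_set if error_code & mask else if_clear
        if label is not None:
            flags.append(label)
    return flags


if __name__ == "__main__":
    print(decode_page_fault(0x14))
    # → ['page not present', 'not a write', 'user mode', 'instruction fetch']
```

Read together with RIP and CR2 both being 0x30, this suggests the process tried to execute code at address 0x30, which would be consistent with a jump or call through an invalid pointer and would explain the empty backtrace. It does not say which pointer was bad.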

@JSKenyon

I am running Pop!_OS 18.04 LTS, although a colleague of mine is reporting the segfaults on an Ubuntu 18.04 system with Python 3.6.9. In both cases we are using numba installed via pip, not conda.

@sjperkins
Contributor

I can also reproduce this on

  • Ubuntu 18.04
  • Python 3.6.9
  • numba 0.50.1

@sjperkins
Contributor

And

  • Ubuntu 18.04
  • Python 3.8.3
  • numba 0.50.1

@sjperkins
Contributor

I wonder if this is a problem with the wheels? In my python 3.8 virtual environment, the following command

$ python -m numba -s

produces the following output:

`python -m numba -s` output
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2020-07-14 12:58:55.359701
UTC start time                                : 2020-07-14 10:58:55.359707
Running time (s)                              : 1.896834

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake
CPU Count                                     : 8
Number of accessible CPUs                     : 8
List of accessible CPUs cores                 : 0-7
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 bmi bmi2
                                                clflushopt cmov cx16 cx8 f16c fma
                                                fsgsbase fxsr invpcid lzcnt mmx
                                                movbe mpx pclmul popcnt prfchw
                                                rdrnd rdseed rtm sahf sgx sse sse2
                                                sse3 sse4.1 sse4.2 ssse3 xsave
                                                xsavec xsaveopt xsaves

Memory Total (MB)                             : 15840
Memory Available (MB)                         : 8962

__OS Information__
Platform Name                                 : Linux-5.3.0-62-generic-x86_64-with-glibc2.27
Platform Release                              : 5.3.0-62-generic
OS Name                                       : Linux
OS Version                                    : #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020
OS Specific Version                           : ?
Libc Version                                  : glibc 2.27

__Python Information__
Python Compiler                               : GCC 7.5.0
Python Implementation                         : CPython
Python Version                                : 3.8.3
Python Locale                                 : en_GB.UTF-8

__LLVM Information__
LLVM Version                                  : 9.0.1

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Detect Output:
None
CUDA Librairies Test Output:
None

__ROC information__
ROC Available                                 : False
ROC Toolchains                                : None
HSA Agents Count                              : 0
HSA Agents:
None
HSA Discrete GPUs Count                       : 0
HSA Discrete GPUs                             : None

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : True
+-->TBB imported successfully.
OpenMP Threading Layer Available              : False
+--> Disabled due to Unknown import problem.
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : 3.18.9
Conda Env                                     : 4.7.12
Conda Platform                                : linux-64
Conda Python Version                          : 3.7.3.final.0
Conda Root Writable                           : True

__Installed Packages__
_anaconda_depends         2019.03                  py37_0  
_ipyw_jlab_nb_ext_conf    0.1.0                    py37_0  
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12                   py37_0  
anaconda                  custom                   py37_1  
anaconda-client           1.7.2                    py37_0  
anaconda-navigator        1.9.7                    py37_0  
anaconda-project          0.8.3                      py_0  
asn1crypto                0.24.0                   py37_0  
astroid                   2.3.1                    py37_0  
astropy                   3.2.1            py37h7b6447c_0  
atomicwrites              1.3.0                    py37_1  
attrs                     19.1.0                   py37_1  
babel                     2.7.0                      py_0  
backcall                  0.1.0                    py37_0  
backports                 1.0                        py_2  
backports.functools_lru_cache 1.5                        py_2  
backports.os              0.1.1                    py37_0  
backports.shutil_get_terminal_size 1.0.0                    py37_2  
backports.tempfile        1.0                        py_1  
backports.weakref         1.0.post1                  py_1  
beautifulsoup4            4.8.0                    py37_0  
bitarray                  1.0.1            py37h7b6447c_0  
bkcharts                  0.2                      py37_0  
blas                      1.0                         mkl  
bleach                    3.1.0                    py37_0  
blosc                     1.16.3               hd408876_0  
bokeh                     1.3.4                    py37_0  
boto                      2.49.0                   py37_0  
bottleneck                1.2.1            py37h035aef0_1  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2019.8.28                     0  
cairo                     1.14.12              h8948797_3  
certifi                   2019.9.11                py37_0  
cffi                      1.12.3           py37h2e261b9_0  
chardet                   3.0.4                 py37_1003  
click                     7.0                      py37_0  
cloudpickle               1.2.2                      py_0  
clyent                    1.2.2                    py37_1  
colorama                  0.4.1                    py37_0  
conda                     4.7.12                   py37_0  
conda-build               3.18.9                   py37_3  
conda-env                 2.6.0                         1  
conda-package-handling    1.6.0            py37h7b6447c_0  
conda-verify              3.4.2                      py_1  
contextlib2               0.6.0                      py_0  
cryptography              2.7              py37h1ba5d50_0  
curl                      7.65.3               hbc83047_0  
cycler                    0.10.0                   py37_0  
cython                    0.29.13          py37he6710b0_0  
cytoolz                   0.10.0           py37h7b6447c_0  
dask                      2.5.0                      py_0  
dask-core                 2.5.0                      py_0  
dbus                      1.13.6               h746ee38_0  
decorator                 4.4.0                    py37_1  
defusedxml                0.6.0                      py_0  
distributed               2.5.1                      py_0  
docutils                  0.15.2                   py37_0  
entrypoints               0.3                      py37_0  
et_xmlfile                1.0.1                    py37_0  
expat                     2.2.6                he6710b0_0  
fastcache                 1.1.0            py37h7b6447c_0  
filelock                  3.0.12                     py_0  
flask                     1.1.1                      py_0  
fontconfig                2.13.0               h9420a91_0  
freetype                  2.9.1                h8a8886c_1  
fribidi                   1.0.5                h7b6447c_0  
fsspec                    0.5.1                      py_0  
future                    0.17.1                   py37_0  
get_terminal_size         1.0.0                haa9412d_0  
gevent                    1.4.0            py37h7b6447c_0  
glib                      2.56.2               hd408876_0  
glob2                     0.7                        py_0  
gmp                       6.1.2                h6c8ec71_1  
gmpy2                     2.0.8            py37h10f8cd9_2  
graphite2                 1.3.13               h23475e2_0  
greenlet                  0.4.15           py37h7b6447c_0  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb453b48_1  
h5py                      2.9.0            py37h7918eee_0  
harfbuzz                  1.8.8                hffaf4a1_0  
hdf5                      1.10.4               hb1b8bf9_0  
heapdict                  1.0.1                      py_0  
html5lib                  1.0.1                    py37_0  
icu                       58.2                 h9c2bf20_1  
idna                      2.8                      py37_0  
imageio                   2.5.0                    py37_0  
imagesize                 1.1.0                    py37_0  
importlib_metadata        0.23                     py37_0  
intel-openmp              2019.4                      243  
ipykernel                 5.1.2            py37h39e3cac_0  
ipython                   7.8.0            py37h39e3cac_0  
ipython_genutils          0.2.0                    py37_0  
ipywidgets                7.5.1                      py_0  
isort                     4.3.21                   py37_0  
itsdangerous              1.1.0                    py37_0  
jbig                      2.1                  hdba287a_0  
jdcal                     1.4.1                      py_0  
jedi                      0.15.1                   py37_0  
jeepney                   0.4.1                      py_0  
jinja2                    2.10.1                   py37_0  
joblib                    0.13.2                   py37_0  
jpeg                      9b                   h024ee3a_2  
json5                     0.8.5                      py_0  
jsonschema                3.0.2                    py37_0  
jupyter                   1.0.0                    py37_7  
jupyter_client            5.3.3                    py37_1  
jupyter_console           6.0.0                    py37_0  
jupyter_core              4.5.0                      py_0  
jupyterlab                1.1.4              pyhf63ae98_0  
jupyterlab_server         1.0.6                      py_0  
keyring                   18.0.0                   py37_0  
kiwisolver                1.1.0            py37he6710b0_0  
krb5                      1.16.1               h173b8e3_7  
lazy-object-proxy         1.4.2            py37h7b6447c_0  
libarchive                3.3.3                h5d8350f_5  
libcurl                   7.65.3               h20c2e04_0  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
liblief                   0.9.0                h7725739_2  
libpng                    1.6.37               hbc83047_0  
libsodium                 1.0.16               h1bed415_0  
libssh2                   1.8.2                h1ba5d50_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.0.10               h2733197_2  
libtool                   2.4.6                h7b6447c_5  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.9                hea5a465_1  
libxslt                   1.1.33               h7d1a2b0_0  
llvmlite                  0.29.0           py37hd408876_0  
locket                    0.2.0                    py37_1  
lxml                      4.4.1            py37hefd8a0e_0  
lz4-c                     1.8.1.2              h14c3975_0  
lzo                       2.10                 h49e0be7_2  
markupsafe                1.1.1            py37h7b6447c_0  
matplotlib                3.1.1            py37h5429711_0  
mccabe                    0.6.1                    py37_1  
mistune                   0.8.4            py37h7b6447c_0  
mkl                       2019.4                      243  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.0.14           py37ha843d7b_0  
mkl_random                1.1.0            py37hd6b4f25_0  
mock                      3.0.5                    py37_0  
more-itertools            7.2.0                    py37_0  
mpc                       1.1.0                h10f8cd9_1  
mpfr                      4.0.1                hdf1c602_3  
mpmath                    1.1.0                    py37_0  
msgpack-python            0.6.1            py37hfd86e86_1  
multipledispatch          0.6.0                    py37_0  
navigator-updater         0.2.1                    py37_0  
nbconvert                 5.6.0                    py37_1  
nbformat                  4.4.0                    py37_0  
ncurses                   6.1                  he6710b0_1  
networkx                  2.3                        py_0  
nltk                      3.4.5                    py37_0  
nose                      1.3.7                    py37_2  
notebook                  6.0.1                    py37_0  
numba                     0.44.1           py37h962f231_0  
numexpr                   2.7.0            py37h9e4a6bb_0  
numpy                     1.17.2           py37haad9e8e_0  
numpy-base                1.17.2           py37hde5b4d6_0  
numpydoc                  0.9.1                      py_0  
olefile                   0.46                     py37_0  
openpyxl                  3.0.0                      py_0  
openssl                   1.1.1d               h7b6447c_2  
packaging                 19.2                       py_0  
pandas                    0.25.1           py37he6710b0_0  
pandoc                    2.2.3.2                       0  
pandocfilters             1.4.2                    py37_1  
pango                     1.42.4               h049681c_0  
parso                     0.5.1                      py_0  
partd                     1.0.0                      py_0  
patchelf                  0.9                  he6710b0_3  
path.py                   12.0.1                     py_0  
pathlib2                  2.3.5                    py37_0  
patsy                     0.5.1                    py37_0  
pcre                      8.43                 he6710b0_0  
pep8                      1.7.1                    py37_0  
pexpect                   4.7.0                    py37_0  
pickleshare               0.7.5                    py37_0  
pillow                    6.1.0            py37h34e0f95_0  
pip                       19.2.3                   py37_0  
pixman                    0.38.0               h7b6447c_0  
pkginfo                   1.5.0.1                  py37_0  
pluggy                    0.13.0                   py37_0  
ply                       3.11                     py37_0  
prometheus_client         0.7.1                      py_0  
prompt_toolkit            2.0.9                    py37_0  
psutil                    5.6.3            py37h7b6447c_0  
ptyprocess                0.6.0                    py37_0  
py                        1.8.0                    py37_0  
py-lief                   0.9.0            py37h7725739_2  
pycodestyle               2.5.0                    py37_0  
pycosat                   0.6.3            py37h14c3975_0  
pycparser                 2.19                     py37_0  
pycrypto                  2.6.1            py37h14c3975_9  
pycurl                    7.43.0.3         py37h1ba5d50_0  
pyflakes                  2.1.1                    py37_0  
pygments                  2.4.2                      py_0  
pylint                    2.4.2                    py37_0  
pyodbc                    4.0.27           py37he6710b0_0  
pyopenssl                 19.0.0                   py37_0  
pyparsing                 2.4.2                      py_0  
pyqt                      5.9.2            py37h05f1152_2  
pyrsistent                0.15.4           py37h7b6447c_0  
pysocks                   1.7.1                    py37_0  
pytables                  3.5.2            py37h71ec239_1  
pytest                    5.2.0                    py37_0  
pytest-arraydiff          0.3              py37h39e3cac_0  
pytest-astropy            0.5.0                    py37_0  
pytest-doctestplus        0.4.0                      py_0  
pytest-openfiles          0.4.0                      py_0  
pytest-remotedata         0.3.2                    py37_0  
python                    3.7.3                h0371630_0  
python-dateutil           2.8.0                    py37_0  
python-libarchive-c       2.8                     py37_13  
pytz                      2019.2                     py_0  
pywavelets                1.0.3            py37hdd07704_1  
pyyaml                    5.1.2            py37h7b6447c_0  
pyzmq                     18.1.0           py37he6710b0_0  
qt                        5.9.7                h5867ecd_1  
qtawesome                 0.6.0                      py_0  
qtconsole                 4.5.5                      py_0  
qtpy                      1.9.0                      py_0  
readline                  7.0                  h7b6447c_5  
requests                  2.22.0                   py37_0  
ripgrep                   0.10.0               hc07d326_0  
rope                      0.14.0                     py_0  
ruamel_yaml               0.15.46          py37h14c3975_0  
scikit-image              0.15.0           py37he6710b0_0  
scikit-learn              0.21.3           py37hd81dba3_0  
scipy                     1.3.1            py37h7c811a0_0  
seaborn                   0.9.0                    py37_0  
secretstorage             3.1.1                    py37_0  
send2trash                1.5.0                    py37_0  
setuptools                41.2.0                   py37_0  
simplegeneric             0.8.1                    py37_2  
singledispatch            3.4.0.3                  py37_0  
sip                       4.19.8           py37hf484d3e_0  
six                       1.12.0                   py37_0  
snappy                    1.1.7                hbae5bb6_3  
snowballstemmer           1.9.1                      py_0  
sortedcollections         1.1.2                    py37_0  
sortedcontainers          2.1.0                    py37_0  
soupsieve                 1.9.3                    py37_0  
sphinx                    2.2.0                      py_0  
sphinxcontrib             1.0                      py37_1  
sphinxcontrib-applehelp   1.0.1                      py_0  
sphinxcontrib-devhelp     1.0.1                      py_0  
sphinxcontrib-htmlhelp    1.0.2                      py_0  
sphinxcontrib-jsmath      1.0.1                      py_0  
sphinxcontrib-qthelp      1.0.2                      py_0  
sphinxcontrib-serializinghtml 1.1.3                      py_0  
sphinxcontrib-websupport  1.1.2                      py_0  
spyder                    3.3.6                    py37_0  
spyder-kernels            0.5.2                    py37_0  
sqlalchemy                1.3.8            py37h7b6447c_0  
sqlite                    3.29.0               h7b6447c_0  
statsmodels               0.10.1           py37hdd07704_0  
sympy                     1.4                      py37_0  
tbb                       2019.4               hfd86e86_0  
tblib                     1.4.0                      py_0  
terminado                 0.8.2                    py37_0  
testpath                  0.4.2                    py37_0  
tk                        8.6.8                hbc83047_0  
toolz                     0.10.0                     py_0  
tornado                   6.0.3            py37h7b6447c_0  
tqdm                      4.36.1                     py_0  
traitlets                 4.3.2                    py37_0  
unicodecsv                0.14.1                   py37_0  
unixodbc                  2.3.7                h14c3975_0  
urllib3                   1.24.2                   py37_0  
wcwidth                   0.1.7                    py37_0  
webencodings              0.5.1                    py37_1  
werkzeug                  0.16.0                     py_0  
wheel                     0.33.6                   py37_0  
widgetsnbextension        3.5.1                    py37_0  
wrapt                     1.11.2           py37h7b6447c_0  
wurlitzer                 1.0.3                    py37_0  
xlrd                      1.2.0                    py37_0  
xlsxwriter                1.2.1                      py_0  
xlwt                      1.3.0                    py37_0  
xz                        5.2.4                h14c3975_4  
yaml                      0.1.7                had09818_2  
zeromq                    4.3.1                he6710b0_3  
zict                      1.0.0                      py_0  
zipp                      0.6.0                      py_0  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

No errors reported.


__Warning log__
Warning (cuda): CUDA device intialisation problem. Message:Error at driver init: 
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (roc): Error initialising ROC: No ROC toolchains found.
Warning (roc): No HSA Agents found, encountered exception when searching: Error at driver init: 
NUMBA_HSA_DRIVER /opt/rocm/lib/libhsa-runtime64.so is not a valid file path.  Note it must be a filepath of the .so/.dll/.dylib or the driver:
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================


The following line looks incorrect:

numba                     0.44.1           py37h962f231_0

then again the following seems to do the right thing in the venv

$ python
Python 3.8.3 (default, May 14 2020, 20:11:43) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numba
>>> numba.__version__
'0.50.1'
>>> 

@sjperkins
Contributor

Ah wait, I think numba -s is picking up my anaconda install, which I don't use at all!

@JSKenyon

I am likely revealing my ignorance here, but why are there so many threads when I have (in this case) set numba.config.NUMBA_NUM_THREADS = 2?

@asmeurer
Contributor

That variable just mirrors the NUMBA_NUM_THREADS environment variable. Changing it in Python does nothing because by then the threads have already been launched. If you want to change the number in Python after numba has been imported, you have to use set_num_threads(). See https://numba.pydata.org/numba-doc/dev/user/threading-layer.html#setting-the-number-of-threads.

@stuartarchibald
Contributor

hmm, so workqueue is broken the same way

@stuartarchibald
Contributor

@JSKenyon as you've got the Anaconda distro somewhere, any chance you could try out a conda package of 0.50.1 with same python as the virtual env one please? Wondering if it's across all builds or specific to wheels.

@JSKenyon

JSKenyon commented Jul 14, 2020 via email

@JSKenyon

I can reproduce the bug using conda 4.8.3, Python 3.6.9 and Numba 0.50.1. It is really weird to me, as the root cause seems to be invoking numba.config.NUMBA_NUM_THREADS = 2, where 2 < total number of physical + virtual cores. I know @asmeurer points out that set_num_threads() is the correct (but far newer) way of doing this, but previously this invocation wouldn't cause random segfaults, even if it wasn't doing what I anticipated.

@stuartarchibald
Contributor

Thanks for checking. Yes, this is strange, I'm somewhat convinced there's something else going on, it's just finding out what! Just to double check, the "bug" is the code from here: #5890 (comment) ? How many cores does your machine have?

@stuartarchibald
Contributor

Also, any chance you can activate the conda env and do conda list --export please and paste the output? I'm still trying to work out how to reproduce this problem. Also, this might be better moved to gitter.im to debug in real time? What do you think? Thanks.

@JSKenyon

Thanks for checking. Yes, this is strange, I'm somewhat convinced there's something else going on, it's just finding out what! Just to double check, the "bug" is the code from here: #5890 (comment) ? How many cores does your machine have?

Yes, that is the reproducer. I have an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz which has 6 physical cores, 2 threads per core.

@JSKenyon

Also, any chance you can activate the conda env and do conda list --export please and paste the output? I'm still trying to work out how to reproduce this problem. Also, this might be better moved to gitter.im to debug in real time? What do you think? Thanks.

I think I have messaged you directly on gitter - I haven't used it all that much. Just pasting the conda list --export results here for posterity.

conda list --export

_libgcc_mutex=0.1=main
blas=1.0=mkl
ca-certificates=2020.6.24=0
certifi=2020.6.20=py36_0
intel-openmp=2020.1=217
libedit=3.1.20191231=h14c3975_1
libffi=3.2.1=hd88cf55_4
libgcc-ng=9.1.0=hdf63c60_0
libgfortran-ng=7.3.0=hdf63c60_0
libllvm9=9.0.1=h4a3c616_1
libstdcxx-ng=9.1.0=hdf63c60_0
llvmlite=0.33.0=py36hc6ec683_1
mkl=2020.1=217
mkl-service=2.3.0=py36he904b0f_0
mkl_fft=1.1.0=py36h23d657b_0
mkl_random=1.1.1=py36h0573a6f_0
ncurses=6.2=he6710b0_1
numba=0.50.1=py36h0573a6f_1
numpy=1.18.5=py36ha1c710e_0
numpy-base=1.18.5=py36hde5b4d6_0
openssl=1.1.1g=h7b6447c_0
pip=20.1.1=py36_1
python=3.6.9=h265db76_0
readline=7.0=h7b6447c_5
setuptools=47.3.1=py36_0
six=1.15.0=py_0
sqlite=3.32.3=h62c20be_0
tbb=2020.0=hfd86e86_0
tk=8.6.10=hbc83047_0
wheel=0.34.2=py36_0
xz=5.2.5=h7b6447c_0
zlib=1.2.11=h7b6447c_3

@stuartarchibald
Contributor

Thanks for that and for making contact on gitter.im, have found a way to reproduce and I've an idea what's causing this now.

@stuartarchibald stuartarchibald added the threading Issue involving the threading layers label Jul 15, 2020
@asmeurer
Contributor

To be clear, numba.config.NUMBA_NUM_THREADS = 2 does nothing if numba has already launched threads (this was the case even before set_num_threads was added). Maybe see if numba.np.ufunc.parallel._is_initialized is True or not before setting it?

@stuartarchibald
Contributor

I think the problem here is as follows... using this code as an example:

import numpy as np
import numba
from numba import njit


@numba.njit(parallel=True, debug=True)
def f(x):
    x[:] = 1
    return


if __name__ == '__main__':
    numba.config.NUMBA_NUM_THREADS = 2
    f(np.ones(100))

    from numba import threading_layer
    print(threading_layer())

When this script is run the following sequence occurs:

  1. Near the top, import numba, via its __init__,

    numba/numba/__init__.py

    Lines 38 to 39 in b4badb5

    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
                                get_num_threads, set_num_threads)

    imports vectorize which goes via numba.np.ufunc.__init__
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize

    which has the side effect of this import too:
    from numba.np.ufunc.parallel import (threading_layer, get_num_threads,
                                         set_num_threads, _get_thread_id)
  2. As a result of 1. and numba.np.ufunc.parallel being imported as part of numba.__init__ this module global is evaluated:
    NUM_THREADS = get_thread_count()
    the result being that NUM_THREADS is e.g. 4 for a 4 core machine.
  3. Python continues with the script and runs if __name__ == "__main__", this making a call first to set numba.config.NUMBA_NUM_THREADS=2 and then to call the @numba.njit(parallel=True, debug=True) decorated function f.
  4. As part of the compilation of f, _launch_threads is called to start the actual thread pool, this is done here
    _launch_threads()
    and is set with the value NUM_THREADS from above here:
    launch_threads(NUM_THREADS)
    as a result there's a thread pool of size e.g. 4. Then _load_num_threads_funcs() is called here:
    _load_num_threads_funcs(lib) # load late

    which then calls the backend specific _set_num_threads function such that the main thread has NUM_THREADS as the number of threads in the pool in its TLS slot, here:
    _set_num_threads(NUM_THREADS)

    and here (for OpenMP):
    static void
    set_num_threads(int count)
    {
        _TLS_num_threads = count;
    }
  5. Further on in the compilation of f, the parfors lowering queries the python function numba.np.ufunc.parallel.get_thread_count from here
    sched_size = get_thread_count() * num_dim * 2
    and this function in turn looks like:
    def get_thread_count():
        """
        Gets the available thread count.
        """
        t = config.NUMBA_NUM_THREADS
        if t < 1:
            raise ValueError("Number of threads specified must be > 0.")
        return t
    which results in the sched_size ending up based on the value 2 as it's read from the numba.config variable. However, later, when the memory allocated to the sched_size size is used at run time in a call to do_scheduling:
    builder.call(
        do_scheduling, [
            context.get_constant(
                types.uintp, num_dim), dim_starts, dim_stops, num_threads,
            sched, context.get_constant(
                types.intp, debug_flag)])
    the number of threads used also comes from a call made at runtime from here:
    num_threads = builder.call(get_num_threads, [])

    the value of which at runtime is e.g. 4 as it's from the TLS slot in the threading backend, e.g. for OpenMP:
    static int
    get_num_threads(void)
    {
        if (_TLS_num_threads == 0)
        {
            // This is a thread that did not call launch_threads() but is still a
            // "main" thread, probably from e.g. threading.Thread() use, it still
            // has a TLS slot which is 0 from the lack of launch_threads() call
            _TLS_num_threads = _INIT_NUM_THREADS;
        }
        return _TLS_num_threads;
    }
  6. The result of all this is that a schedule based on size 2 is baked in at compile time while a thread count of e.g. 4 is present at runtime. This results in invalid memory access which, assuming I've got this right, is probably the cause of the somewhat hard to trace segfault.
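
A toy model of the mismatch described in the steps above, in plain Python rather than Numba. Only the sizing formula is taken from the thread; the helper names and the slot layout inside fill_schedule are assumptions made for illustration, not the real do_scheduling. Python raises IndexError where the compiled code would write past the end of the buffer.

```python
def schedule_slots(thread_count, num_dim):
    # Mirrors `sched_size = get_thread_count() * num_dim * 2` from step 5.
    return thread_count * num_dim * 2


def fill_schedule(sched, runtime_threads, num_dim):
    # Hypothetical stand-in for do_scheduling: every runtime thread writes
    # a start and a stop value per dimension into the shared buffer.
    for t in range(runtime_threads):
        for k in range(num_dim * 2):
            sched[t * num_dim * 2 + k] = t


compile_time_threads = 2  # read from numba.config after it was overwritten
runtime_threads = 4       # value held in the threading backend's TLS slot
num_dim = 1

sched = [0] * schedule_slots(compile_time_threads, num_dim)
needed = schedule_slots(runtime_threads, num_dim)

try:
    fill_schedule(sched, runtime_threads, num_dim)
    overflowed = False
except IndexError:
    # The compiled code has no bounds check, so this would be a stray write.
    overflowed = True

print(len(sched), needed, overflowed)  # prints: 4 8 True
```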

@asmeurer
Contributor

So you should make it so that certain config options like NUMBA_NUM_THREADS are read-only. In my branch that I started at https://github.com/asmeurer/numba/tree/config-cleanup, I already made this happen for reload, but didn't consider someone writing to it directly. I guess it should use a @property that disables the setter (_readenv could return a descriptor object).

Feel free to continue my work from that branch. I probably won't have time to work on it further myself. I also indicated some other TODOs that I saw in the commit message asmeurer@df0cb4d.

@asmeurer
Contributor

asmeurer commented Jul 21, 2020

Alternatively, make it read-only only after threads are launched, and allow writes before that, but make sure they properly update the variable in the threading backend. That's a lot more work, but it would allow this use case. In my branch I allow it to be reset before threads are launched, but apparently even that is broken.

Although I'm not sure I see the advantage of this extra work, since it definitely isn't going to work after threads are launched, which tends to happen pretty early. So there's little difference from just doing

import os
os.environ['NUMBA_NUM_THREADS'] = '2'  # os.environ values must be strings
import numba

And anyway, is there a good reason to not just use set_num_threads() if you are setting the number of threads lower than the default?

@asmeurer
Contributor

So there's little difference from just doing

I guess a difference is that

numba.config.NUMBA_NUM_THREADS = 2

can potentially give you an error message if it isn't actually going to work. Setting the environment variable will just silently do nothing if numba is already imported. More correct code would be

import os
import sys

if 'numba' not in sys.modules:
    # os.environ values must be strings
    os.environ['NUMBA_NUM_THREADS'] = '2'
else:
    raise RuntimeError("numba is already imported")

And anyway setting environment variables should really only be done out of process, not from within Python (but I've seen people do it nonetheless).
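
A minimal sketch of the out-of-process approach, using only the standard library. The helper name run_with_thread_count is hypothetical, and the child here just echoes the variable back; in practice the child would be the script that imports numba.

```python
import os
import subprocess
import sys


def run_with_thread_count(code, thread_count):
    """Run `code` in a fresh interpreter with NUMBA_NUM_THREADS preset."""
    # Copy the parent's environment and override one variable; the value
    # must be a string.
    env = {**os.environ, "NUMBA_NUM_THREADS": str(thread_count)}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child sees the variable before it imports anything.
child_code = "import os; print(os.environ['NUMBA_NUM_THREADS'])"
seen = run_with_thread_count(child_code, 2)
print(seen)  # prints: 2
```

Because the variable is in place before the child interpreter starts, there is no window in which numba could already have read a different value.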

@JSKenyon

@asmeurer I think that at the time I wrote the offending code, I was unsure of another way to set the number of threads at runtime without forcing the user to set the environment variable themselves. Perhaps this has never behaved quite as I expected. I will move to using set_num_threads() from now on.

Thanks for tracking this down @stuartarchibald!

bmerry added a commit to ska-sa/katsdpcal that referenced this issue Mar 12, 2021
The previous way is a foot-gun, as detailed
[here](numba/numba#5890 (comment)).
@joseph-long

Enough of the keywords in this issue line up with things going on in my debugging that I thought I'd chime in (and watch for further updates). I have been using @njit(parallel=True, cache=True) on a function and the test case intermittently fails by hanging the pytest process so I can't interrupt it and have to kill it. Triggering recompilation of the njited function returns it to working.

I've been unable to track down the root cause but @stuartarchibald's analysis above was comprehensive (seriously impressive!) and it seems likely the "wrong" value is getting baked in somewhere in my case. As a workaround I'm not using cache=True on those functions and I'm no longer calling numba.set_num_threads at all.

Is #6025 the best hope for resolving this on macOS?

@jeffspence

@joseph-long -- thanks for this, I was running into segfaults with numba, and had no idea why. But based on this, I realized it might be having both cache=True and parallel=True in my @njit's (compiling and caching on a head node on a server (presumably with fewer cores) and then running on compute nodes caused the segfault). Switching off caching fixed it for me. All of this was on CentOS.

@stuartarchibald
Contributor

This was fixed in #7625, which will be in the Numba 0.56 release (should be out this week). Demonstrative unit test:

@skip_parfors_unsupported
class TestParforsCacheChangingThreads(DispatcherCacheUsecasesTest):
    # NOTE: This test is checking issue #7518, that thread counts are not
    # baked into cached objects.
    here = os.path.dirname(__file__)
    usecases_file = os.path.join(here, "parfors_cache_usecases.py")
    modname = "parfors_caching_test_fodder"

    def run_in_separate_process(self, thread_count):
        # Cached functions can be run from a distinct process.
        code = """if 1:
            import sys
            sys.path.insert(0, %(tempdir)r)
            mod = __import__(%(modname)r)
            mod.self_run()
            """ % dict(tempdir=self.tempdir, modname=self.modname)
        new_env = {**os.environ, "NUMBA_NUM_THREADS" : str(thread_count)}
        popen = subprocess.Popen([sys.executable, "-c", code],
                                 stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                 env=new_env)
        out, err = popen.communicate()
        if popen.returncode != 0:
            raise AssertionError(f"process failed with code {popen.returncode}:"
                                 f"stderr follows\n{err.decode()}\n")

    def test_caching(self):
        self.check_pycache(0)
        self.run_in_separate_process(1)
        self.check_pycache(3 * 2) # ran 3 functions, 2 entries each
        self.run_in_separate_process(2)
        self.check_pycache(3 * 2) # ran 3 functions, 2 entries each

Am pleased to be able to close this, thanks to everyone who helped with debugging!
