Seemingly random segfault on macOS if function is in larger library #5890
Comments
I went back some versions of Numba. At 0.47 and below I'm getting an error due to a dict, so I can't go back further. At 0.48 it's working. There is a class definition in the library (…)
@max3-2 here are a few questions to help diagnose the problem:
Wonder if …
Adding this to the function in question or globally doesn't change the issue.
1) Just using the parallel setting on a local multicore machine. I don't know the backend… MPI? Maybe this is connected to having multiple functions with `parallel=True` in one library getting compiled at the same time, even if they are not used?
This seems like a tricky issue and appears to be hard to debug. Here are some suggestions:
OK, let me correct this: bisect somehow starts to throw a different error (see below), which makes it lose the original one, I think. Finally, it points to …
@max3-2 thanks for reporting back on this. I am not sure what to make of the error. Perhaps you can find a way to build a self-contained reproducer that you can share? It would help with reproducing and debugging this.
I think this is possible. I'll try to get to it and report back… may take a day or two.
@max3-2 no problem, take your time!
This would be a fairly simple version which, on my machine, reproducibly runs on 0.48 and crashes from 0.49 onward: https://gist.github.com/max3-2/ce88444a3cfd5d0a5f5525c66afba056
And on a different note: I had a few Windows users test it; all three were able to reproduce on Win10 with py3.7.
MWR:

```python
import numpy as np
import numba

numba.config.NUMBA_NUM_THREADS = 3

@numba.njit(parallel=True)
def foo(A, B, C, D):
    # B, C and D are not referenced
    E = np.zeros((A.shape[0], 4))
    for i in numba.prange(A.shape[0]):
        E[i] = np.random.random((4,))
    return

if __name__ == '__main__':
    m = 2
    n = 15
    A = np.zeros((100, m, n))
    B = np.zeros(1)
    C = 1.
    D = np.zeros(1)
    foo(A, B, C, D)
```

CC @DrTodd13. Seems like setting … gdb bt:
Thanks for further reducing it. At least in my "final" code no variables are unused; that was just a side effect of quickly reducing it to the core problem.
This has just reared its head in one of my projects @stuartarchibald. Reducing your MWE even further, the following example fails:

```python
import numpy as np
import numba

numba.config.NUMBA_NUM_THREADS = 3

@numba.njit(parallel=True)
def f(x):
    for i in numba.prange(x.size):
        x[i] = x[i]
    return

if __name__ == '__main__':
    f(np.ones(100))
```

It doesn't seem to have anything to do with unused variables. Could this be a side effect of the thread masking introduced in 0.49? The second point in the docs here might be a hint.
I can confirm that the following doesn't crash @stuartarchibald, @max3-2, @esc:

```python
import numpy as np
import numba

@numba.njit(parallel=True)
def f(x):
    numba.set_num_threads(3)
    for i in numba.prange(x.size):
        x[i] = x[i]
    return

if __name__ == '__main__':
    f(np.ones(100))
```
Pinging @asmeurer in case he has some ideas.
@JSKenyon thanks for looking at this more. RE: #5890 (comment), I can't seem to reproduce locally on the released 0.50.0. RE: the thread-masking question: possibly, but it'd be good to rule out other things too.
Interesting that it doesn't reproduce the issue for you. I am in a clean virtualenv running Python 3.6.9 and numba 0.50.1.
I've tried numerous conda envs, and then also a RHEL Python 3.6.8 virtual env, with:

If you …
This is the only output I get from …

The backtrace is empty.
I am running Pop!_OS 18.04 LTS, although a colleague of mine is reporting the segfaults on an Ubuntu 18.04 system with Python 3.6.9. In both cases we are using numba via pip, not conda.
I can also reproduce this on …

And …
I wonder if this is a problem with the wheels? In my Python 3.8 virtual environment, the command `python -m numba -s` produces the following output:

The following line looks incorrect:
Then again, the following seems to do the right thing in the venv:

```
$ python
Python 3.8.3 (default, May 14 2020, 20:11:43)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numba
>>> numba.__version__
'0.50.1'
>>>
```
Ah wait, I think …
I am likely revealing my ignorance here, but why are there so many threads when I have (in this case) set …?
That variable just mirrors the …
Hmm, so workqueue is broken the same way.
@JSKenyon as you've got the Anaconda distro somewhere, any chance you could try out a conda package of 0.50.1 with the same Python as the virtual env one, please? Wondering if it's across all builds or specific to wheels.
Will give it a go first thing in the morning!
I can reproduce the bug using conda 4.8.3, Python 3.6.9 and Numba 0.50.1. It is really weird to me, as the root cause seems to be invoking …
Thanks for checking. Yes, this is strange; I'm somewhat convinced there's something else going on, it's just a matter of finding out what! Just to double check, the "bug" is the code from here: #5890 (comment)? How many cores does your machine have?
Also, any chance you can activate the conda env and do …?
Yes, that is the reproducer. I have an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, which has 6 physical cores with 2 threads per core.
I think I have messaged you directly on gitter - I haven't used it all that much. Just pasting the `conda list --export` output:

```
_libgcc_mutex=0.1=main
```
Thanks for that, and for making contact on gitter.
To be clear, …
I think the problem here is as follows... using this code as an example:

```python
import numpy as np
import numba

@numba.njit(parallel=True, debug=True)
def f(x):
    x[:] = 1
    return

if __name__ == '__main__':
    numba.config.NUMBA_NUM_THREADS = 2
    f(np.ones(100))
    from numba import threading_layer
    print(threading_layer())
```

When this script is run, the following sequence occurs: …
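The gist of the problem is that the threading backend captures the thread count once, when its threads are launched, so a later assignment to the mirroring config attribute never reaches it. A minimal pure-Python sketch of that failure mode (the `Config` and `Backend` classes are hypothetical stand-ins, not Numba's actual internals):

```python
# Pure-Python sketch of the failure mode: a config value snapshotted at
# thread-launch time ignores later mutation of the config attribute.

class Config:
    # In Numba this value is read from the NUMBA_NUM_THREADS environment
    # variable at import time; here it is simply a constant.
    NUMBA_NUM_THREADS = 4

class Backend:
    def __init__(self, config):
        # The threading backend snapshots the value when its thread pool
        # is launched.
        self.num_threads = config.NUMBA_NUM_THREADS

config = Config()
backend = Backend(config)      # "threads launched" here: snapshot is 4

config.NUMBA_NUM_THREADS = 2   # too late: the snapshot is already taken
print(backend.num_threads)     # still 4
```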
So you should make it so that certain config options like … are read-only.

Feel free to continue my work from that branch. I probably won't have time to work on it further myself. I also indicated some other TODOs that I saw in the commit message asmeurer@df0cb4d.
Alternately, make it read-only only after threads are launched, and allow setting it before, but make sure it properly updates the variable in the threading backend. That's a lot more work, but it would allow this use-case. In my branch I allow it to be reset before threads are launched, but apparently even that is broken. Although I'm not sure I see the advantage of the extra work, since it definitely isn't going to work after threads are launched, which tends to happen pretty early. So there's little difference from just doing:

```python
os.environ['NUMBA_NUM_THREADS'] = '2'
import numba
```

And anyway, is there a good reason to not just use …?
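A side note on the environment-variable route (a general property of Python's `os.environ`, not anything Numba-specific): the assigned value must be a string, and the assignment has to happen before the first `import numba` for it to have any effect:

```python
import os

# os.environ only accepts str values; assigning an int raises TypeError.
try:
    os.environ['NUMBA_NUM_THREADS'] = 2  # wrong type, on purpose
except TypeError:
    print('int rejected')

os.environ['NUMBA_NUM_THREADS'] = '2'  # correct: a string, set pre-import
print(os.environ['NUMBA_NUM_THREADS'])
```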
I guess a difference is that `numba.config.NUMBA_NUM_THREADS = 2` can potentially give you an error message if it isn't actually going to work. Setting the environment variable will just silently do nothing if numba is already imported. More correct code would be:

```python
if 'numba' not in sys.modules:
    os.environ['NUMBA_NUM_THREADS'] = '2'
else:
    raise RuntimeError("numba is already imported")
```

And anyway, setting environment variables should really only be done out of process, not from within Python (but I've seen people do it nonetheless).
@asmeurer I think that at the time I wrote the offending code, I was unsure of another way to set the number of threads at runtime without forcing the user to set the environment variable themselves. Perhaps this has never behaved quite as I expected. I will move to using …

Thanks for tracking this down @stuartarchibald!
The previous way is a foot-gun, as detailed [here](numba/numba#5890 (comment)).
Enough of the keywords in this issue line up with things going on in my debugging that I thought I'd chime in (and watch for further updates). I have been using …

I've been unable to track down the root cause, but @stuartarchibald's analysis above was comprehensive (seriously impressive!) and it seems likely the "wrong" value is getting baked in somewhere in my case. As a workaround I'm not using …

Is #6025 the best hope for resolving this on macOS?
@joseph-long -- thanks for this. I was running into segfaults with numba and had no idea why, but based on this I realized it might be having both …
This was fixed in #7625, which will be in the Numba 0.56 release (should be out this week). Demonstrative unit test: `numba/numba/tests/test_parfors_caching.py`, lines 50 to 83 at c58634a.
I'm pleased to be able to close this; thanks to everyone who helped with debugging!
Reporting a bug

- I have read the change log (https://github.com/numba/numba/blob/master/CHANGE_LOG).
- I have included a minimal bug report (for how to write one, see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).
Hi,
First of all, sorry for the small report and few examples, but at this point this seems so untraceable to me that I'm hoping for any input to track down the issue. Maybe it's even a severely stupid mistake of my own that I just can't find. I was getting segfaults from a Numba function and traced it down to the state I will outline here, but at this point I can't find anything anymore.
I have a function which operates on arrays. I have simplified it very far, so I know it's not making much sense, but here it goes.

For debugging, I'm using synthetic input data.
Running this as a small script works. Running it from an interactive session works. However, I have a large library of Numba functions in which the one above is included, just somewhere in there, same syntax, copy & paste. If I then add the execution with the same synthetic input data after the library (compiling the full library, including the above function), only calling the function as above, I'm getting a segfault. No traceback, nothing. In the terminal it's `zsh: segmentation fault`; Jupyter just hangs completely. This happens on macOS 10.15.

A difference to mention would be that during compilation of the library I'm getting some warnings. Those products are not in the function, or connected to the function, that crashes!

At this point, I'm happy for any type of input, since I can't find a reason.