BUG: stack-overflow in OpenBLAS64 on macOS with embedding and importing numpy with pthreads #21799
Comments
It looks like this bug is still present. |
@Czaki what version of NumPy did you try? Could you try again with one of the nightly releases from https://anaconda.org/scientific-python-nightly-wheels/numpy ? |
Thanks for looking into this. Let me know if I can help. |
@mattip I have tested it with 1.24.3, 1.24.4, 1.25.0, and 1.25.1, and it crashes on every one. It does work with numpy 1.25.1 from conda (numpy 1.25.1 py311hb8f3215_0). I'm able to reproduce this bug using my public project, so I could write reproduction instructions. I will also be happy to test any fix that could lead to a stable install from PyPI. |
The nightly is built with a newer OpenBLAS. It will be part of the next release. Maybe the conda-forge build also uses a newer OpenBLAS (or Apple's Accelerate backend); I am not sure. |
When can we expect the next release? |
Does that need a backport? |
@mattip Is there a chance of a bugfix release? Using the nightly no longer works, as it fails with:
File "/Users/grzegorzbokota/.pyenv/versions/partseg-3.11/lib/python3.11/site-packages/scipy/linalg/_decomp.py", line 22, in <module>
from numpy import (array, isfinite, inexact, nonzero, iscomplexobj, cast,
ImportError: cannot import name 'cast' from 'numpy' (/Users/grzegorzbokota/.pyenv/versions/partseg-3.11/lib/python3.11/site-packages/numpy/__init__.py) |
That was commit 729d1f6, and |
I have installed 1.25.2 and still got:
So the problem still exists. When installing from source: |
@martin-frbg, thoughts? @Czaki you are loading NumPy in a worker thread and using address sanitizer? |
Any chance to see what the stack size is here, and perhaps to increase it ( |
@mattip I load numpy in the main thread, then use a thread to perform some calculations. It crashes when calculating a matrix inversion. I do not know whether I am using the address sanitizer. How can I check? @martin-frbg it reports: I do not know how long this problem has existed. I just got my first Apple ARM computer. My code works without problems on x86 (Linux, Windows, macOS) and on x86 through Rosetta. My application is here: https://github.com/4DNucleome/PartSeg. It crashes on a simple start |
Are we sure that this is the actual problem this ticket was originally opened for, and not something else completely unrelated to stack size? |
@martin-frbg You are right. I checked, and I hit this error when running the code on a Python built in debug mode. My current error is different and does not fit this issue. I will open a separate one. |
Describe the issue:
I am embedding CPython and executing within threads. We are getting a "bus error" when importing numpy within a pthread function, but importing numpy within the "main thread" works. I've observed this on both Intel and M1 macOS machines. I'm using numpy 1.22.4 installed with pip.
I compiled CPython 3.9.13 with the address sanitizer and found a stack overflow in dgetrf_parallel. Unfortunately, I was not able to compile numpy myself with OpenBLAS to get the complete backtrace. Compiling numpy without OpenBLAS runs without any errors. I have not compiled with the address sanitizer on Linux to see if I get the same error there, but I can if that would help.
I did look at the OpenBLAS source and there is this comment in lapack/getrf/getrf_parallel.c:
MAX_CPU_NUMBER is determined by the Makefiles based on the computer where OpenBLAS is compiled, and GETRF_MEM_ALLOC_THRESHOLD is defined as 80 on macOS in common.h. Maybe the solution here is to compile OpenBLAS so it always uses heap allocation in this routine?
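To make the stack-versus-heap question concrete, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that this thread does not confirm: that the per-call job table in this routine holds roughly MAX_CPU_NUMBER × MAX_CPU_NUMBER × (cache line size × divide rate) 8-byte integers, that the cache line size and divide rate are 8 and 2, and that a secondary thread on macOS typically gets a 512 KiB stack by default. The function names are hypothetical and exist only for this illustration.

```python
# Rough estimate of the stack footprint of a per-call job table.
# ASSUMPTIONS (not confirmed in this thread): the table holds
# max_cpu * max_cpu * cache_line_size * divide_rate 8-byte integers,
# and cache_line_size=8, divide_rate=2 are illustrative values.

# Typical default stack size of a non-main thread on macOS (assumed here).
SECONDARY_THREAD_STACK = 512 * 1024


def job_table_bytes(max_cpu, cache_line_size=8, divide_rate=2, word_bytes=8):
    """Bytes needed for the assumed max_cpu x max_cpu working table."""
    return max_cpu * max_cpu * cache_line_size * divide_rate * word_bytes


def fits_on_stack(max_cpu, stack_bytes=SECONDARY_THREAD_STACK):
    """True if the assumed table alone is smaller than the given stack."""
    return job_table_bytes(max_cpu) < stack_bytes


if __name__ == "__main__":
    for n in (8, 32, 64, 80):
        print(n, job_table_bytes(n), fits_on_stack(n))
```

Under these assumptions, a build whose MAX_CPU_NUMBER sits just below a threshold of 80 would still place a table larger than a 512 KiB stack on the stack, which would be consistent with an overflow that appears only outside the main thread. Treat this as a plausibility check, not a diagnosis.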
Right now I am resorting to importing numpy in the main thread to avoid this issue. I've attached C code which reproduces the issue, along with the address sanitizer backtrace. Any advice on how to fix this would be greatly appreciated. Please let me know if there is any further information I can provide as well.
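Besides importing in the main thread, another workaround that might be worth trying is giving the worker thread a larger stack. The sketch below shows the idea for Python-level threads using the standard library's `threading.stack_size`; the helper name `run_with_stack` and the 8 MiB default are assumptions made for this illustration, and it has not been tested against this bug. For a pthread created from C, as in the embedding case here, the analogous call would be `pthread_attr_setstacksize`.

```python
import threading


def run_with_stack(func, stack_bytes=8 * 1024 * 1024):
    """Run func() in a new thread created with a stack of stack_bytes.

    Hypothetical helper for illustration; returns func's result or
    re-raises whatever func raised.
    """
    result = {}

    def target():
        try:
            result["value"] = func()
        except BaseException as exc:  # hand the error back to the caller
            result["error"] = exc

    # stack_size() affects threads created afterwards and returns the old value.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    worker.join()

    if "error" in result:
        raise result["error"]
    return result["value"]
```

A caller would pass the function that does the heavy work, for example one that imports numpy and calls `numpy.linalg.inv`, so that the import and the LAPACK call both run on the enlarged stack.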
Reproduce the code example:
Error message:
NumPy/Python version information:
1.22.4 3.9.13 (main, Jun 14 2022, 22:13:49)