Memory Allocation Fault in LU factorization #7131
Assuming the data is real valued, could you please pass the same data directly to |
Yes:

```python
import numpy as np
from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64, order="F")
lu_, piv_, info_ = linalg.lapack.dgetrf(obj)
```

The last 3 calls before the fault in the stack traces in the new crash report are identical to the ones in the previous crash report. |
In case it helps: in the scikit-learn issue, the problem was not happening within a conda environment using MKL, so it may be an OpenBLAS problem on OSX. |
Yeah, this is an OSX-specific problem that only happens when using OpenBLAS. I have to say I am not sure how to further debug the problem, and I am hoping someone here will have some suggestions. |
I am sure this has everything to do with Apple's Accelerate, but I am unable to go any further. The following C++ code behaves identically to the Python snippets above:

```cpp
#include <math.h>
// include the clapack headers
#include <Accelerate/Accelerate.h>

int main (int argc, char const *argv[])
{
    int num_rows=46336, num_cols=110;
    // create data array for our matrix: a vector of num_rows*num_cols elements
    // to access A(r,c) use A[r+c*num_rows], in other words column-major ordering: {r1c1,r2c1,r3c1...}
    double *A = new double[num_rows*num_cols];
    for(int row=0; row<num_rows; row++) {
        for(int col=0; col<num_cols; col++) {
            A[row+col*num_rows] = 1.0;
        }
    }

    __CLPK_integer m, n, lda, info;
    m=lda=num_rows;
    n=num_cols;
    // the pivot array needs min(m, n) entries
    __CLPK_integer * ipiv = new __CLPK_integer[(num_rows>num_cols?num_cols:num_rows)];
    dgetrf_(&m, &n, A, &lda, ipiv, &info);
    return 0;
}
```

This should be compiled with `g++ file.cpp -o file -framework accelerate`. |
Then it's likely an integer overflow inside Accelerate. Only Apple can fix bugs in Accelerate, so it should be reported to them, and this issue can be closed. |
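For what it's worth, the failure thresholds reported in this thread sit right around sqrt(2**31) ≈ 46341, which is at least consistent with a signed 32-bit intermediate overflowing somewhere inside Accelerate. A back-of-the-envelope illustration of that kind of overflow (illustration only, not a confirmed diagnosis of where Accelerate fails):

```python
import numpy as np

# A product that fits comfortably in 64 bits but wraps negative in a
# signed 32-bit intermediate -- the kind of bug hypothesized above.
m = np.int32(46341)  # smallest m with m*m > 2**31 - 1
with np.errstate(over="ignore"):
    print(m * m)                  # wraps to -2147479015
print(np.int64(m) * np.int64(m))  # 2147488281, the intended value
```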
Closing (looks like an Accelerate bug). The discussion can be continued, but I guess the simplest fix is to not use Accelerate. |
I was able to reproduce the original scikit-learn issue using OpenBLAS within an Anaconda environment. Was I somehow still using accelerate? |
Sorry, I understood your comment above to mean that it did not occur in a non-Accelerate environment. Nevertheless, you should try a minimal program similar to the one above. If it fails, then it's an OpenBLAS/LAPACK issue. |
I assumed that the scipy wheels available on PyPI were built against OpenBLAS, as is the case on Linux. I was wrong: on OSX the wheels are built against Accelerate, at least if I believe the build information output. I could run the Python snippet with OpenBLAS using conda-forge without error. All in all it seems like an Accelerate-specific problem. |
Yes, the OSX wheels are built with Accelerate. What version of OSX are you running? |
Thanks for confirming this, which I was not aware of. @ivannz has opened a bug with Apple and will let us know when he gets an answer. For completeness: @ivannz's version is different though, so it looks like it is not restricted to only one OSX version:
|
Same output on
and
and
but no such error on:
@ivannz - would you mind maintaining a copy of your report and any replies from Apple over at http://www.openradar.me ? |
Certainly! Here is the link to Open Radar rdar://30868533. |
Is it possible to build the macOS wheels without the Accelerate framework? I mean, are there any alternative libraries pre-installed such as OpenBLAS? Or are we really stuck with Apple's ancient and buggy numerical libraries (with no chance for a quick fix)? |
I'm hopeful that once the discussion in #6051 is finalized, we will be back on track with the freedom of choosing a BLAS or LAPACK flavor. But it seems that we are stuck with Accelerate until v1.0. |
Has anyone tested if the issue still exists in the latest macOS High Sierra beta? I'm hoping that Apple has noticed the Accelerate bug and fixed it silently... |
Seems this was closed incorrectly, so reopening. The two examples don't crash for me on OS X 10.12 with scipy master built against Accelerate. Not sure if that's a change in scipy or in Accelerate. |
I just built numpy and scipy from their master branches, and the code given in the description above crashes on my late 2013 Macbook running OS X 10.9.5, with Python 3.5.2:
If I use numpy from Anaconda (numpy 1.12.1), I don't get the crash. Update: Now that I've done more than just skim the full thread, I guess this comment is old news. |
@rgommers the C++ example as well as all Python examples (using scipy master) still crash on macOS 10.12.6 (latest version, released on July 19). If anyone has the preview of macOS 10.13 (High Sierra), it would be great to hear if the problem has been fixed by Apple. In any case, I suggest adding a test to scipy against this issue - WDYT? |
@cbrnr Could you also please add the result of |
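For reference, one standard way to report which BLAS/LAPACK numpy and scipy were built against (assuming a reasonably recent numpy/scipy) is:

```python
import numpy as np
import scipy

np.__config__.show()   # prints the BLAS/LAPACK numpy was built against
scipy.show_config()    # same information for scipy
```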
It looks to me like the diagnosis of the issue was correct, and it was reproduced with a standalone C program --- there clearly is a bug in Accelerate, and the only way Scipy can fix it is by not using Accelerate. |
@pv yes - unless Apple fixes the bug, which might have already happened in 10.13 Preview... |
@sturlamolden plain … Anyway, this discussion is digressing too far from the main issue here. But it's clear that there are some major issues popping up here and there. I'm not sure this is helping with the decision on how SciPy should proceed with official macOS builds. |
@sturlamolden @cbrnr I agree (but nb. gfortran from homebrew works fine on my machine... perhaps this points to something brittle in your setup?) |
I guess Apple hasn't fixed the issue. On the latest Sierra I get the segfault for these new constants (I just used binary search from 46335 upward to find the new failure threshold):

```cpp
int num_rows=46344, num_cols=110;
```

Changing 46343 to 46344 or any higher value also causes a segfault. Changing 110 to a higher value, while keeping … I replicated this behaviour both with the standalone C code and the Python example at the top of the thread.

PS: There has been no activity around my bug report to Apple so far. |
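A minimal sketch of how such a crash threshold can be bisected automatically, assuming the failure is monotone in the matrix size; the bounds and helper below are illustrative, not the exact script used here. Each candidate size runs in a child process so a segfault cannot kill the search loop itself:

```python
import subprocess
import sys

# Template for the child process: the dgetrf repro from this thread,
# parameterized on the number of rows.
SNIPPET = """\
import numpy as np
from scipy import linalg
obj = np.ones(({rows}, 110), dtype=np.float64, order="F")
linalg.lapack.dgetrf(obj)
"""

def crashes(rows):
    # A segfault surfaces as a nonzero (negative) returncode.
    proc = subprocess.run(
        [sys.executable, "-c", SNIPPET.format(rows=rows)],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode != 0

lo, hi = 46335, 2 * 46335  # lo is known good; hi is assumed to crash
while hi - lo > 1:
    mid = (lo + hi) // 2
    if crashes(mid):
        hi = mid
    else:
        lo = mid
print("first failing num_rows:", hi)
```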
Closing, there's nothing we can do aside from dropping Accelerate. |
I just upgraded to macOS High Sierra (10.13) and the bug seems to have been fixed by Apple. All examples above (both Python and C++) work! 🎉 |
Today I got an email from the developers, asking to test if the bug persists in High Sierra 10.13 GM. Unfortunately, I won't be able to test this issue in the near future myself.

@cbrnr Could I ask you, please, if it is not too inconvenient, to try increasing the constants in

```cpp
int num_rows=46344, num_cols=110;
```

For example, could you try … |
Downloading High Sierra now, still half an hour to go. :) |
Indeed, the bug seems to be fixed in High Sierra 😃 |
@ivannz @sturlamolden it works up to about |
Hm, I can confirm that raising |
And with MKL it actually works with these large matrices... So 👎 for this "fix"... |
@cbrnr how do you use MKL on macOS Sierra 10.12.6? I tried to install it with … Any ideas? |
I use it with Anaconda, but I'd be highly interested if it is possible to install it so that I can use MKL with my Homebrew Python. |
@ivannz and @matthew-brett the sample C code provided with this issue shows no problem on the Apple M1. The Apple Accelerate framework is able to leverage the dedicated AMX hardware on the M1, which can provide superior performance for some functions relative to OpenBLAS. Will SciPy and numpy ever reconsider using Accelerate? I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track. However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon. |
I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in? |
If Apple is interested and addresses the issues outlined at https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate, then of course we can reconsider. We support MKL as well, which is not open source, because Intel maintains it well and when there's an issue we have actual humans to talk to.
Accelerate being stuck on a too-low version of LAPACK (3.2.1 last time I checked) is also a real blocker. Our minimum is 3.4.1 at the moment I believe, and we may gradually increase that minimum. Requiring a lowest version that dates from 2012 isn't too much to ask.
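For reference, the LAPACK version a given scipy build is actually linked against can be queried through LAPACK's ILAVER routine, which scipy wraps (assuming the wrapper is exposed in the installed version):

```python
from scipy.linalg import lapack

# ilaver wraps LAPACK's ILAVER and returns (major, minor, patch) of the
# LAPACK library scipy is linked against, e.g. (3, 2, 1) for old Accelerate.
major, minor, patch = lapack.ilaver()
print("LAPACK {}.{}.{}".format(major, minor, patch))
```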
@lawrence-danna-apple given that you worked on M1 support for Python packaging, maybe you have an opinion on this topic or know who to ping? |
The main issues are:
The LAPACK version in Accelerate is not really an issue, as we can build LAPACK from Fortran sources on top of its BLAS (like we do for OpenBLAS). Accelerate also uses the f2c ABI, which is annoying, but not a complete blocker. |
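To illustrate the f2c ABI annoyance: under the f2c calling convention, Fortran functions declared REAL actually return a C double, so calling them with the natural float prototype reads a wrong value. A minimal ctypes sketch of the mismatch, assuming the standard Accelerate framework path and that the Fortran-style symbol `sdot_` is exported (both are assumptions here, not verified against every macOS version):

```python
import ctypes
import numpy as np

# Load Accelerate directly; path is the standard framework location.
acc = ctypes.CDLL(
    "/System/Library/Frameworks/Accelerate.framework/Accelerate")

n = ctypes.c_int(3)
one = ctypes.c_int(1)
x = np.ones(3, dtype=np.float32)
xp = x.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

# Natural prototype: a REAL function should return a C float.
# Under the f2c ABI this reads the wrong half of the return register.
acc.sdot_.restype = ctypes.c_float
print(acc.sdot_(ctypes.byref(n), xp, ctypes.byref(one), xp, ctypes.byref(one)))

# f2c convention: the REAL function actually returns a C double.
acc.sdot_.restype = ctypes.c_double
print(acc.sdot_(ctypes.byref(n), xp, ctypes.byref(one), xp, ctypes.byref(one)))  # 3.0
```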
While I wholeheartedly agree that M1 users should be able to join the number-crunching community if Accelerate is overhauled, I am not sure whether Apple would bother with it, since the framework doesn't just contain BLAS/LAPACK but all kinds of extras that would need redeveloping. And since even the LAPACK devs have joined the fast GitHub track and started a much quicker release cycle, turtle-speed, cold-war-style responses won't cut it for us to sustain a maintenance effort. So it has many meta issues as well as technical ones. |
Given the divergence described with Accelerate, a simpler option might be for the Apple Developer Ecosystem Engineering team to provide patches to optimize OpenBLAS. This would parallel their support of open source projects like Embree and Clang/LLVM. Support could be added piecemeal and tested against the existing functional-but-unoptimized code. I do think direct help from Apple would be valuable, though some have begun to reverse engineer the AMX instructions. |
I honestly do not think Apple could care less about Accelerate or OpenBLAS. They want everyone to use Metal 2 for any serious number crunching. They have already deprecated OpenCL. Personally I am waiting for them to do away with Accelerate and GCD as well. Either we jump on the Metal bandwagon or we forget about Mac as a platform for serious number crunching and 3D graphics. Those are the options we are given, and honestly it is an easy choice: just ignore Apple. I thought I would stick to Mac because that is what I like to use, but now I am considering running Linux on my next desktop computer. It seems that is the way scientific computing is heading. |
The thing with Apple is, once we have fully embraced Metal, they will come up with something new and deprecate it. Then we are back to being pissed off. They like to piss off developers, and unlike the guys in Redmond they are quite good at it. At some point we just have to tell them they will be irrelevant if they continue to try to make themselves irrelevant. |
Well we can't ignore it, even if we're annoyed with Accelerate's (lack of) maintenance historically. We have enough users (and devs, like me) who use macOS that we want to provide both from-source build options and wheels with decent performance. Metal isn't really useful until there's a BLAS/LAPACK library for it. There's also Intel MKL on macOS - it's not open source so not good for wheels, but it's fine to develop with. We'll take what we can get. |
@sturlamolden as I note in my review, Metal has several limitations for matrix mathematics: it is limited to single precision, limited to a maximum dimension of 65535, and requires dimensions divisible by 8 for decent performance. Apple AMX removes these limitations and outperforms Metal's MPSMatrixMultiplication. You can demonstrate this for yourself by compiling the projects here. If Apple really felt Metal was the preferred solution, why would they waste the die space to implement the matrix co-processor? Apple has invested the time to develop AMX and uses it in their own libraries vecLib, vImage, libBLAS, libBNNS, vDSP and LAPACK rather than Metal. While I like Metal as a compute language for leveraging the GPU, matrix mathematics appears to clearly benefit from dedicated hardware. |
@sturlamolden given how much faster the M1 performs, it would be a shame to ignore Apple. A lot of people seem to be going back to Macs just because of this, when they previously would have rather bought a Dell/... running Linux or even Windows (WSL2). @rgommers Intel MKL won't work on the M1, which is clearly the future for Macs. |
I don't have any strong opinion about Apple, but I am leaning towards forcing Apple's hand instead of us (OSS in general) running around just to support their stuff. Unfortunately that's what you get if you want to pay for the M1. Performance means nothing if nothing supports it, so it is a bit on the consumer to weigh the pros and cons. I would love to see @neurolabusc's vision materialize, but I have epsilon hopes for it. |
By the way, should we move to a brand-new issue instead of hiding inside an old linalg issue? On one hand I like that it is hidden here, since things get salty very quickly when a brand is discussed, but it might also need more eyes on it. |
The only reason I re-ignited this here is that the specific fault raised in this issue seems to be resolved with the M1 (or perhaps the latest versions of Accelerate). Naively, I thought this was one of the few blockers to using the Accelerate framework. There are clearly a lot more issues that I was not aware of, and this is well outside my expertise. I do appreciate the insightful and rapid feedback from the SciPy developers; it suggests a vibrant community. My sense is that this question dies here unless someone closely involved with either SciPy or Apple takes the lead. If anyone fits that description, I encourage them to start a new issue. |
Glad to hear that. We try to create a good community and be as responsive as we can :)
Let me just link to numpy/numpy#18143, which is the immediate bigger issue here. NumPy needs to provide universal2 wheels (and …
This issue was encountered here in scikit-learn. In short, when trying to perform linear dimensionality reduction for data analysis purposes, the script fails abruptly with a memory allocation error.
After some debugging, I managed to pin down the possible source of the problem to the LU factorization on line 258 in utils/extmath.py. The fault occurs inside the decomposition in `linalg.lu` (specifically in line 185 of scipy/linalg/decomp_lu.py): the execution flow passes through the f2py wrapper of dlu_c and fails somewhere inside `dgetrf` (line 27).

Below is a simple reproducing snippet (on OS X 10.10.5), although LU-decomposing a matrix such as this makes little sense:
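```python
import numpy as np
from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64, order="F")
lu_, piv_, info_ = linalg.lapack.dgetrf(obj)
```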
In Python 3.6.0 (and in 2.7.13) it fails with a memory allocation error. The value 18446744065123717120 is FFFF FFFE 003E 9000 in hex, which looks like a sign extension of an unsigned int.

I used the following setup: Python 3.6.0 (default, Dec 24 2016, 08:02:28) (in a virtualenv), default numpy ('1.12.0') from pip. I tested both the latest build from git and the pip version ('0.18.1') of scipy.
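For reference, the hex conversion is easy to verify directly:

```python
>>> hex(18446744065123717120)
'0xfffffffe003e9000'
```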
System information:
No manual linkage to specific linear algebra libraries was done, since according to the reference numpy/scipy automatically link with Apple's Accelerate/vecLib linear algebra libraries.
Judging from the crash report from python 2.7.13 (crash_report.txt), something odd is going on inside Apple's Accelerate.
cc: @lesteve