
Memory Allocation Fault in LU factorization #7131

Closed
ivannz opened this issue Mar 6, 2017 · 110 comments
Labels
defect (a clear bug or issue that prevents SciPy from being installed or used as expected), scipy.linalg

Comments

@ivannz

ivannz commented Mar 6, 2017

This issue was encountered here in scikit-learn. In short, when trying to perform linear dimensionality reduction for data analysis, the script fails abruptly with a memory allocation error.

After some debugging, I managed to pin down the likely source of the problem to the LU factorization on line 258 of utils/extmath.py. The fault occurs inside the decomposition in linalg.lu (specifically at line 185 of scipy/linalg/decomp_lu.py): the execution flow passes through the f2py wrapper of dlu_c and fails somewhere inside dgetrf (line 27).

Below is a simple reproducing snippet (on OS X 10.10.5), although LU-decomposing a matrix such as this makes little sense:

import numpy as np

from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64)
Q, _ = linalg.lu(obj, permute_l=True)

In Python 3.6.0 (and in 2.7.13) it fails with:

python(723,0x7fff77404300) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

The value 18446744065123717120 is 0xFFFFFFFE003E9000 in hex, which looks like the sign extension of a negative 32-bit quantity reinterpreted as an unsigned 64-bit size.
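The sign-extension reading can be checked directly in Python; the failed request is a 64-bit size whose high bits are almost all ones, i.e. a negative number reinterpreted as an unsigned size (a sketch of the arithmetic only; the exact overflowing expression inside Accelerate is unknown):

```python
requested = 18446744065123717120  # size passed to mach_vm_map

# The hex form shows the high 32 bits are (almost) all ones.
assert hex(requested) == '0xfffffffe003e9000'

# Reinterpreted as a signed 64-bit integer it is negative, consistent
# with an integer overflow being sign-extended and then handed to
# malloc as an unsigned size.
signed = requested - 2**64
print(signed)            # -8585834496, about -8 GiB

# The matrix itself is tiny by comparison:
print(46336 * 110 * 8)   # 40775680 bytes, about 39 MiB
```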

I used the following setup: Python 3.6.0 (default, Dec 24 2016, 08:02:28) in a virtualenv, with NumPy 1.12.0 from pip. I tested both the latest SciPy build from git and the pip version (0.18.1).
System information:

System Version: OS X 10.10.5 (14F2109)
Kernel Version: Darwin 14.5.0

No manual linkage to specific linear algebra libraries was done, since according to the documentation NumPy/SciPy automatically link against Apple's Accelerate/vecLib linear algebra libraries.

Judging from the crash report from Python 2.7.13 (crash_report.txt), something odd is going on inside Apple's Accelerate.

Python 2.7.13 (default, Dec 18 2016, 07:03:34) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin

cc: @lesteve

@ilayn
Member

ilayn commented Mar 6, 2017

Assuming the data is real valued, could you please pass the same data directly to linalg.lapack.dgetrf?

@ivannz
Author

ivannz commented Mar 6, 2017

Yes, np.ones(...) creates a matrix of real 1.0 values (double). This example fails with exactly the same fault:

import numpy as np

from scipy import linalg

# changing 46336 to 46335 avoids the fault
obj = np.ones((46336, 110), dtype=np.float64, order="F")
lu_, piv_, info_ = linalg.lapack.dgetrf(obj)

Changing np.float64 to np.float32 does not solve the issue.

The last 3 calls before the fault in the stack traces in the new crash report are identical to the ones in the previous crash report.

@lesteve
Contributor

lesteve commented Mar 6, 2017

In case it helps, from the scikit-learn issue the problem was not happening within a conda environment using MKL, so it may be an OpenBLAS problem on OSX.

@ilayn
Member

ilayn commented Mar 6, 2017

Indeed, on my machine I have no issues.


@lesteve
Contributor

lesteve commented Mar 6, 2017

Yeah, this is an OSX-specific problem that only happens when using OpenBLAS. I have to say I am not sure how to debug the problem further, and I am hoping someone here will have some suggestions.

@ivannz
Author

ivannz commented Mar 6, 2017

I am sure this has everything to do with Apple's Accelerate, but I am unable to go any further. The following C++ code behaves identically to the Python snippets above.

#include <Accelerate/Accelerate.h>  // CLAPACK headers

int main(int argc, char const *argv[])
{
    int num_rows = 46336, num_cols = 110;

    // Data array for the matrix: num_rows*num_cols elements stored in
    // column-major order, so A(r,c) is A[r + c*num_rows].
    double *A = new double[num_rows * num_cols];
    for (int col = 0; col < num_cols; col++) {
        for (int row = 0; row < num_rows; row++) {
            A[row + col * num_rows] = 1.0;
        }
    }

    __CLPK_integer m, n, lda, info;
    m = lda = num_rows;
    n = num_cols;

    __CLPK_integer *ipiv = new __CLPK_integer[num_rows > num_cols ? num_cols : num_rows];

    dgetrf_(&m, &n, A, &lda, ipiv, &info);

    delete[] ipiv;
    delete[] A;
    return 0;
}

This should be compiled with g++ file.cpp -o file -framework accelerate on a recent Mac. Changing 46336 to 46335 again removes the fault.
dgetrf on Netlib LAPACK

@pv
Member

pv commented Mar 6, 2017 via email

@pv
Member

pv commented Mar 6, 2017

Closing (looks like an Accelerate bug). The discussion can be continued, but I guess the simplest fix is to not use Accelerate.

@lesteve
Contributor

lesteve commented Mar 6, 2017

I guess the simplest fix is to not use accelerate.

I was able to reproduce the original scikit-learn issue using OpenBLAS within an Anaconda environment. Was I somehow still using accelerate?

@pv
Member

pv commented Mar 6, 2017

Sorry, I understood your comment above as saying that the problem did not occur in a non-Accelerate environment.
You can see where the problem is by running it under a debugger and checking whether the backtrace points to OpenBLAS or Accelerate.

Nevertheless, you should try a similar minimal program as above. If it fails, then it's an OpenBLAS/LAPACK issue.

@lesteve
Contributor

lesteve commented Mar 6, 2017

I assumed that the scipy wheels available on PyPI were built against OpenBLAS as is the case on Linux. I was wrong, on OSX the wheels are built against Accelerate, at least if I believe the output of otool -L on scipy/linalg/_fblas*.so. Pinging @matthew-brett just to confirm this.

I could run the Python snippet with OpenBLAS using conda-forge without error.

All in all it seems like an Accelerate-specific problem.

@matthew-brett
Contributor

Yes, the OSX wheels are built with Accelerate. What version of OSX are you running?

@lesteve
Contributor

lesteve commented Mar 6, 2017

Yes, the OSX wheels are built with Accelerate

Thanks for confirming this, which I was not aware of.

@ivannz has opened a bug with Apple and will let us know when he gets an answer.

For completeness:
The version of OSX I am using: OSX 10.12.1, Darwin 16.1.0.

@ivannz's version is different though, so it looks like it is not restricted to only one OSX version:

System Version: OS X 10.10.5 (14F2109)
Kernel Version: Darwin 14.5.0

@matthew-brett
Contributor

Same output on

System Version: OS X 10.9.5 (13F1911)
Kernel Version: Darwin 13.4.0

and

      System Version: OS X 10.11.5 (15F34)
      Kernel Version: Darwin 15.5.0

and

      System Version: OS X 10.11.6 (15G1217)
      Kernel Version: Darwin 15.6.0

but no such error on:

      System Version: Mac OS X 10.6.8 (10K549)
      Kernel Version: Darwin 10.8.0

@ivannz - would you mind maintaining a copy of your report and any replies from Apple over at http://www.openradar.me ?

@ivannz
Author

ivannz commented Mar 7, 2017

Certainly! Here is the link to Open Radar rdar://30868533.

@cbrnr
Contributor

cbrnr commented May 24, 2017

Is it possible to build the macOS wheels without the Accelerate framework? I mean, are there any alternative libraries pre-installed such as OpenBLAS? Or are we really stuck with Apple's ancient and buggy numerical libraries (with no chance for a quick fix)?

@ilayn
Member

ilayn commented May 24, 2017

I'm hopeful that once the discussion in #6051 is finalized, we will be back on track with the freedom of choosing a BLAS or LAPACK flavor. But it seems that we are stuck with Accelerate until v1.0.

@cbrnr
Contributor

cbrnr commented Jul 20, 2017

Has anyone tested if the issue still exists in the latest macOS High Sierra beta? I'm hoping that Apple has noticed the Accelerate bug and fixed it silently...

@rgommers rgommers reopened this Jul 21, 2017
@rgommers
Member

Seems this was closed incorrectly, so reopening.

The two examples don't crash for me on OS X 10.12 with scipy master built against Accelerate. Not sure if that's a change in scipy or in Accelerate.

@rgommers rgommers added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg labels Jul 21, 2017
@WarrenWeckesser
Member

WarrenWeckesser commented Jul 21, 2017

I just built numpy and scipy from their master branches, and the code given in the description above crashes on my late 2013 Macbook running OS X 10.9.5, with Python 3.5.2:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.14.0.dev0+9afb77a'

In [3]: import scipy

In [4]: scipy.__version__
Out[4]: '1.0.0.dev0+85287f9'

In [5]: from scipy import linalg

In [6]: obj = np.ones((46336, 110), dtype=np.float)

In [7]: Q, _ = linalg.lu(obj, permute_l=True)
python(15120,0x7fff74a9d310) malloc: *** mach_vm_map(size=18446744065123717120) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

If I use numpy from Anaconda (numpy 1.12.1), I don't get the crash.


Update: Now that I've done more than just skim the full thread, I guess this comment is old news.

@cbrnr
Contributor

cbrnr commented Jul 21, 2017

@rgommers the C++ example as well as all Python examples (using scipy master) still crash on macOS 10.12.6 (latest version released on July 19). If anyone has the preview of macOS 10.13 (High Sierra), it would be great to hear if the problem has been fixed by Apple.

In any case, I suggest adding a regression test for this issue to SciPy - WDYT?

@ilayn
Member

ilayn commented Jul 21, 2017

@cbrnr Could you also please add the result of scipy.linalg.lapack.ilaver()?
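For anyone else reporting back, the LAPACK version SciPy was built against can be queried like this (a sketch; `ilaver` returns a (major, minor, patch) triple, and the helper name `lapack_version` is mine):

```python
def lapack_version():
    """Return the (major, minor, patch) LAPACK version SciPy links
    against, or None if SciPy is not installed."""
    try:
        from scipy.linalg import lapack
    except ImportError:
        return None
    return tuple(lapack.ilaver())

# e.g. prints (3, 2, 1) for Accelerate's old bundled LAPACK
print(lapack_version())
```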

@pv
Member

pv commented Jul 21, 2017 via email

@cbrnr
Contributor

cbrnr commented Jul 21, 2017

@pv yes - unless Apple fixes the bug, which might have already happened in 10.13 Preview...

@cbrnr
Contributor

cbrnr commented Jul 25, 2017

@sturlamolden plain -m? This flag does not exist on Linux either.

Anyway, this discussion is digressing too far from the main issue. But it's clear that there are some major issues popping up here and there. I'm not sure this helps with the decision on how SciPy should proceed with official macOS builds.

@defjaf

defjaf commented Jul 25, 2017

@sturlamolden @cbrnr I agree (but nb. gfortran from homebrew works fine on my machine... perhaps this points to something brittle in your setup?)

@ivannz
Author

ivannz commented Aug 30, 2017

I guess Apple hasn't fixed the issue. On the latest Sierra

      System Version: macOS 10.12.6 (16G29)
      Kernel Version: Darwin 16.7.0

I get the segfault for these new constants (I used binary search upward from 46335 to find the new failure threshold):

    int num_rows=46344, num_cols=110;

Changing 46343 to 46344 or any higher value causes a segfault, while increasing 110 to a higher value with num_rows = 46343 does not.

I replicated this behaviour both with the standalone C code and the python example at the top of the thread.

PS: There has been no activity around my bug report to Apple so far.

@pv
Member

pv commented Aug 30, 2017

Closing, there's nothing we can do aside from dropping Accelerate.

@pv pv closed this as completed Aug 30, 2017
@cbrnr
Contributor

cbrnr commented Sep 26, 2017

I just upgraded to macOS High Sierra (10.13) and the bug seems to have been fixed by Apple. All examples above (both Python and C++) work! 🎉

@ivannz
Author

ivannz commented Sep 26, 2017

Today I got an email from the developers, asking to test if the bug persists in High Sierra 10.13 GM.

This is a follow-up regarding Bug ID# 30868533.

Please verify this issue with the macOS 10.13 GM and update your bug report at https://bugreport.apple.com/ with your results.

macOS High Sierra 10.13 GM (17A365)
https://developer.apple.com/download/
Posted Date: Sep 25, 2017

If the issue persists, please attach a new sysdiagnose captured in the latest build and attach it to the bug report.

macOS sysdiagnose Instructions:
https://developer.apple.com/services-account/download?path=/OS_X/OS_X_Logs/sysdiagnose_Logging_Instructions.pdf

Unfortunately, I won't be able to test this issue in the near future myself.

@cbrnr Could I ask you, please, if it is not too inconvenient, to try increasing
the num_rows variable here

int num_rows=46344, num_cols=110;

For example, could you try 60000 or higher and if it ever fails use, perhaps,
binary search to pinpoint the threshold of failure? Again, I apologise for not
being able to do this test myself.
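The threshold hunt described above can be automated; here is a sketch of the bisection, where `crashes(num_rows)` is a hypothetical probe that would run the C program (or the Python snippet) in a subprocess and report whether it faulted:

```python
def find_threshold(lo, hi, crashes):
    """Smallest num_rows in (lo, hi] for which crashes(num_rows) is True.

    Assumes crashes(lo) is False, crashes(hi) is True, and that the
    predicate is monotone (everything above the threshold crashes).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid
        else:
            lo = mid
    return hi

# With a simulated predicate reproducing the reported threshold:
print(find_threshold(46335, 60000, lambda n: n >= 46344))  # 46344
```

In practice the probe would be something like `subprocess.run(["./file"]).returncode != 0` against the compiled C example.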

@sturlamolden
Contributor

Downloading High Sierra now, still half an hour to go. :)

@pv pv added the no action label Sep 26, 2017
@sturlamolden
Contributor

Indeed, the bug seems to be fixed in High Sierra 😃

@cbrnr
Contributor

cbrnr commented Sep 27, 2017

@ivannz @sturlamolden it works up to about num_rows=5925000, but a little higher and it crashes due to a malloc error. I think this is expected on my 16GB machine, right?

@sturlamolden
Contributor

Hm, I can confirm that raising num_rows triggers the error again. At this point the data array is a little less than 5 GB.
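For what it's worth, the sizes reported here are consistent: the data array alone at num_rows = 5,925,000 is about 4.9 GiB, so exhausting memory on a 16 GB machine once pivot storage and other allocations are added is plausible (quick arithmetic, not a diagnosis):

```python
num_rows, num_cols = 5_925_000, 110
array_bytes = num_rows * num_cols * 8  # float64 elements
print(array_bytes)             # 5214000000 bytes
print(array_bytes / 2**30)     # roughly 4.86 GiB
```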

@cbrnr
Contributor

cbrnr commented Sep 27, 2017

And with MKL it actually works with these large matrices... So 👎 for this "fix"...

@emilioramirez

emilioramirez commented Nov 30, 2017

@cbrnr how do you use MKL on macOS Sierra 10.12.6? I tried to install it with pip install numpy-mkl, but pip returned an error: Could not find a version that satisfies the requirement numpy-mkl. It's weird, because numpy-mkl is listed when I search for it with pip 😖

Any ideas?

@cbrnr
Contributor

cbrnr commented Nov 30, 2017

I use it with Anaconda, but I'd be highly interested if it is possible to install it so that I can use MKL with my Homebrew Python.

@neurolabusc

@ivannz and @matthew-brett the sample C code provided with this issue shows no problem on the Apple M1. The Apple Accelerate framework is able to leverage the dedicated AMX hardware on the M1, which can provide superior performance for some functions relative to OpenBLAS.

Will SciPy and numpy ever reconsider using Accelerate? I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track. However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon.

@cbrnr
Contributor

cbrnr commented Jan 10, 2021

I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in?

@rgommers
Member

Will SciPy and numpy ever reconsider using Accelerate?

If Apple is interested and addresses the issues outlined at https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate, then of course we can reconsider. We support MKL as well which is not open source, because Intel maintains it well and when there's an issue we have actual humans to talk to.

I think the strongest argument on the SciPy page is the opaque support Apple has for their tools, making discovery and resolution of bugs hard to track.

Accelerate being stuck on a too-low version of LAPACK (3.2.1 last time I checked) is also a real blocker. Our minimum is 3.4.1 at the moment, I believe, and we may gradually raise it. Asking for a minimum version that dates from 2012 isn't too much.

However, it would be great if the Apple Developer Ecosystem Engineering team could be engaged with core tools like SciPy and numpy in the same way they have helped port Intel's Embree to Apple Silicon.

I'd be 💯 for this! Can we tag someone at Apple who might be able to chime in?

@lawrence-danna-apple given that you worked on M1 support for Python packaging, maybe you have an opinion on this topic or know who to ping?

@sturlamolden
Contributor

sturlamolden commented Jan 10, 2021

The main issues are:

  • Accelerate is not fork-safe, due to the GCD (libdispatch) thread pool.
  • Accelerate does not support 64-bit integers in BLAS.

The LAPACK version in Accelerate is not really an issue, as we can build LAPACK from Fortran sources on top of BLAS (like we do for OpenBLAS). Accelerate also uses f2c ABI, which is annoying, but not a complete blocker.

@ilayn
Member

ilayn commented Jan 10, 2021

While I wholeheartedly agree that M1 users should be able to join the number-crunching community if Accelerate is overhauled, I am not sure Apple would bother with it: the framework doesn't contain just BLAS/LAPACK but all kinds of extras that would need redeveloping.

Even the LAPACK devs have joined the fast GitHub track and started a much quicker release cycle; turtle-speed releases and Cold-War-style responses won't cut it for us to sustain a maintenance effort. So it has many meta issues as well as technical ones.

@neurolabusc

Given the divergence described with Accelerate, a simpler option might be for the Apple Developer Ecosystem Engineering team to provide patches to optimize OpenBLAS. This would parallel their support of open source projects like Embree and Clang/LLVM. Support could be added piecemeal and tested against the existing functional but not optimized code. I do think direct help from Apple would help, though some have begun to reverse engineer the AMX instructions.

@sturlamolden
Contributor

sturlamolden commented Jan 12, 2021

I honestly do not think Apple could care less about Accelerate or OpenBLAS. They want everyone to use Metal 2 for any serious number crunching. They have already deprecated OpenCL. Personally, I am waiting for them to do away with Accelerate and GCD as well. Either we jump on the Metal bandwagon or we forget about the Mac as a platform for serious number crunching and 3D graphics. Those are the options we are given, and honestly it is an easy choice: just ignore Apple. I thought I would stick to the Mac because that is what I like to use, but now I am considering running Linux on my next desktop computer. It seems that is the way scientific computing is heading.

@sturlamolden
Contributor

The thing with Apple is, once we have fully embraced Metal, they will come up with something new and deprecate it. Then we are back to being pissed off. They like to piss off developers, and unlike the guys in Redmond they are quite good at it. At some point we just have to tell them they will be irrelevant if they continue to try to make themselves irrelevant.

@rgommers
Member

Well we can't ignore it, even if we're annoyed with Accelerate's (lack of) maintenance historically. We have enough users (and devs, like me) who use macOS that we want to provide both from-source build options and wheels with decent performance. Metal isn't really useful until there's a BLAS/LAPACK library for it.

There's also Intel MKL on macOS - it's not open source so not good for wheels, but it's fine to develop with. We'll take what we can get.

@neurolabusc

neurolabusc commented Jan 12, 2021

@sturlamolden as I note in my review, Metal has several limitations for matrix mathematics: it is limited to single precision, limited to a maximum dimension of 65535, and requires dimensions divisible by 8 for decent performance. Apple AMX removes these limitations and outperforms Metal's MPSMatrixMultiplication. You can demonstrate this for yourself by compiling the projects here. If Apple really felt Metal was the preferred solution, why would they waste the die space to implement the matrix co-processor?

Apple has invested the time to develop AMX and uses it in their own libraries (vecLib, vImage, libBLAS, libBNNS, vDSP and LAPACK) rather than Metal. While I like Metal as a compute language for leveraging the GPU, matrix mathematics appears to clearly benefit from dedicated hardware.

@cbrnr
Contributor

cbrnr commented Jan 12, 2021

@sturlamolden given how much faster M1 performs it would be a shame to ignore Apple. A lot of people seem to be going back to Macs just because of this when they previously would have rather bought Dell/... running Linux or even Windows (WSL2). @rgommers Intel MKL won't work on M1 which is clearly the future for Macs.

@ilayn
Member

ilayn commented Jan 12, 2021

I don't have any strong opinion about Apple, but I am leaning towards forcing Apple's hand instead of us (OSS in general) running around just to support their stuff. Unfortunately, that's what you get if you want to pay for the M1. Performance means nothing if nothing supports it, so it is a bit on the consumer to weigh the pros and cons.

I would love to see @neurolabusc's vision materialize, but I have epsilon hopes for it.

@ilayn
Member

ilayn commented Jan 12, 2021

By the way, should we move to a brand-new issue instead of hiding inside an old linalg one? On one hand I like that it is hidden here, since things get salty very quickly when a brand is discussed, but it might also need more eyes on it.

@neurolabusc

The only reason I re-ignited this here is that the specific problem raised by this issue seems to be resolved on the M1 (or perhaps by the latest versions of Accelerate). Naively, I thought this was one of the few blockers to using the Accelerate framework; there are clearly a lot more issues that I was not aware of. This is well outside my expertise. I do appreciate the insightful and rapid feedback from SciPy developers. This suggests a vibrant community.

My sense is that this question dies here unless someone closely involved with either SciPy or Apple takes the lead. If anyone fits that criteria, I encourage them to start a new issue.

@rgommers
Member

I do appreciate the insightful and rapid feedback from SciPy developers. This suggests a vibrant community.

Glad to hear that! We try to create a good community and be as responsive as we can :)

My sense is that this question dies here unless someone closely involved with either SciPy or Apple takes the lead. If anyone fits that criteria, I encourage them to start a new issue.

Let me just link to numpy/numpy#18143, which is the immediate bigger issue here. NumPy needs to provide universal2 wheels (and macos-arm64 thin wheels), which includes figuring out what BLAS/LAPACK to use for the native M1 build. Once that's done, following suit for SciPy should be not too difficult.
