
Segmentation fault in large scale lbfgs optimization #5102

Open
borislavsm opened this issue Jul 31, 2015 · 6 comments

Comments

@borislavsm

Hello,

I get a segmentation fault when I run a large-scale L-BFGS-B optimisation in scipy. An optimisation with around 100000 parameters works fine, but with a significantly larger number of parameters (100000000) I get a segfault.

The gdb trace is:

```
#0  0x00007ffff6db8466 in __memset_sse2 () from /lib64/libc.so.6
#1  0x00007fffe7c2ca29 in cauchy_ () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
#2  0x00007fffe7c313f9 in mainlb_ () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
#3  0x00007fffe7c3435e in setulb_ () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
#4  0x00007fffe7c1dd00 in f2py_rout__lbfgsb_setulb () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
```

OS - CentOS
Python 2.7.5
numpy 1.9.2
scipy 0.12.1

Code that showcases the problem:

```python
import numpy as np
import scipy
import scipy.optimize

N = 100000000

def df(thetav):
    # Returns (function value, gradient); both are random here,
    # just to exercise the optimizer at this problem size.
    df = np.random.random(N)
    score = np.random.random()
    print('score : %f ' % score)
    return (score, df)

a = np.random.random(N)
(a, _, _) = scipy.optimize.fmin_l_bfgs_b(df, a)
```
@argriffing
Contributor

Some of the Fortran parts of scipy use the default Fortran integer type, which is 32 bits, causing integer overflow bugs at large problem sizes. This problem extends throughout scipy; see for example #5064.

@borislavsm
Author

Thanks for the reply!

If the problem is integer overflow, what is the maximum size of the derivative vector that can be used?

Note that the optimizer usually performs several steps before it crashes.

@argriffing
Contributor

Here's the Fortran code: https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb/lbfgsb.f.

If the problem is integer overflow, what is the maximum size of the derivative vector that can be used?

The expression 2*m*n + 11*m*m + 5*n + 8*m appears in the comments at the top of that Fortran code, where n is 'the dimension of the problem' and m is 'the maximum number of variable metric corrections used to define the limited memory matrix'. I would guess that if this value starts to overflow 32 bits then bad things could start to happen, but I'm not 100% sure that this is the source of the problem you are seeing.
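For what it's worth, plugging in the n = 100000000 from the report and m = 10 (which I believe is fmin_l_bfgs_b's default number of corrections, though that's an assumption), the workspace expression already exceeds the 32-bit signed integer maximum:

```python
# Workspace size from the comments in lbfgsb.f.
n = 100000000  # problem dimension from the report
m = 10         # fmin_l_bfgs_b's default number of corrections (assumed)

wa_size = 2*m*n + 11*m*m + 5*n + 8*m
print(wa_size)              # 2500001180
print(2**31 - 1)            # 2147483647, the largest 32-bit signed integer
print(wa_size > 2**31 - 1)  # True: 32-bit INTEGER index arithmetic wraps
```

So an array sized by this expression cannot even be indexed with a 32-bit Fortran INTEGER at this problem size.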

In my opinion, the best way to fix this would be to find a compatibly licensed, updated version of this code elsewhere and re-introduce it into scipy from that source, if one exists. The lbfgsb.f source is 2011-era Fortran code which has probably found its way into other projects, some of which may already have patched it to handle larger problems. But it looks like the most recent official version is still 3.0: http://users.iems.northwestern.edu/~nocedal/lbfgsb.html.

@argriffing
Contributor

@borislavsm You could try running https://github.com/stephenbeckr/L-BFGS-B-C. A header file has a note about integer sizes https://github.com/stephenbeckr/L-BFGS-B-C/blob/master/src/lbfgsb.h#L10.

@larsmans
Contributor

larsmans commented Aug 2, 2015

There's also liblbfgs, which is a much cleaner C translation; I have an (incomplete/unmaintained) Python wrapper for that code. Actually, it offers plain L-BFGS only, without bound constraints, so I guess it's not a proper replacement.

@person142
Member

I know I'm a bit late to the party, but here goes. I downloaded the original L-BFGS-B code:

http://users.iems.northwestern.edu/~nocedal/lbfgsb.html

It has a test routine, driver1.f, which minimizes the generalized Rosenbrock function. The default test size is n = 25. I took @borislavsm's n = 100000000 and ran make; of course the code didn't compile (it complained about overflow). But after adding the flags -fdefault-integer-8 (to make the default integer size 64 bits) and -mcmodel=large (an x86-specific flag that allows accessing memory beyond 2 GB), the code compiled just fine, and when I ran the driver1.f test case with n = 100000000 it converged.
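For reference, the build described above amounts to something like the following sketch (file names follow the L-BFGS-B 3.0 distribution as I recall them, and the flags are gfortran-specific):

```shell
# In the unpacked L-BFGS-B 3.0 source directory.
#   -fdefault-integer-8  promotes the default INTEGER to 64 bits
#   -mcmodel=large       permits static data/addressing beyond 2 GB (x86-64)
gfortran -O2 -fdefault-integer-8 -mcmodel=large \
    driver1.f lbfgsb.f linpack.f blas.f timer.f -o driver1
./driver1
```

Note that -fdefault-integer-8 has to be applied to every Fortran file in the build; mixing 32-bit and 64-bit integer objects would corrupt the call interfaces.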

This is rather tentative, and I need to test it more, but perhaps the issue can be solved simply by changing how the code is compiled?

Sorry if I'm missing something and am talking nonsense.
