
Segmentation fault in large scale lbfgs optimization #5102

Open
borislavsm opened this issue Jul 31, 2015 · 6 comments

Comments

@borislavsm

Hello,

I get a segmentation fault when I run a large-scale L-BFGS-B optimisation in scipy. An optimisation with around 100000 parameters works fine, but with a significantly larger number of parameters (100000000) I get a segfault.

The gdb trace is:

```
#0  0x00007ffff6db8466 in __memset_sse2 () from /lib64/libc.so.6
#1  0x00007fffe7c2ca29 in cauchy_ () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
#2  0x00007fffe7c313f9 in mainlb_ () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
#3  0x00007fffe7c3435e in setulb_ () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
#4  0x00007fffe7c1dd00 in f2py_rout__lbfgsb_setulb () from /usr/lib64/python2.7/site-packages/scipy/optimize/_lbfgsb.so
```

OS - CentOS
Python 2.7.5
numpy 1.9.2
scipy 0.12.1

Code that showcases the problem:

```python
import numpy as np
import scipy
import scipy.optimize

N = 100000000

def df(thetav):
    # Returns (function value, gradient); both are random here,
    # just to exercise the optimizer at this problem size.
    df = np.random.random(N)
    score = np.random.random()
    print('score : %f ' % score)
    return (score, df)

a = np.random.random(N)
(a, _, _) = scipy.optimize.fmin_l_bfgs_b(df, a)
```
@argriffing
Contributor

Some of the Fortran parts of scipy use the default Fortran integer type, which is 32 bits, causing integer overflow bugs at large problem sizes. This problem extends throughout scipy; see for example #5064.

@borislavsm
Author

Thanks for the reply!

If the problem is integer overflow, what is the maximum size of the derivative vector that can be used?

Note that the optimizer usually performs several steps before it crashes.

@argriffing
Contributor

Here's the Fortran code: https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb/lbfgsb.f.

If the problem is integer overflow, what is the maximum size of the derivative vector that can be used?

The expression 2*m*n + 11*m*m + 5*n + 8*m appears in the comments at the top of that Fortran code, where n is 'the dimension of the problem' and m is 'the maximum number of variable metric corrections used to define the limited memory matrix'. I would guess that if this value starts to overflow 32 bits then bad things could start to happen, but I'm not 100% sure that this is the source of the problem you are seeing.
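For what it's worth, plugging in the n = 100000000 from the report and m = 10 (which I believe is fmin_l_bfgs_b's default number of corrections, though that's an assumption), the workspace expression already exceeds the 32-bit signed integer maximum:

```python
# Workspace size from the comments in lbfgsb.f.
n = 100000000  # problem dimension from the report
m = 10         # fmin_l_bfgs_b's default number of corrections (assumed)

wa_size = 2*m*n + 11*m*m + 5*n + 8*m
print(wa_size)              # 2500001180
print(2**31 - 1)            # 2147483647, the largest 32-bit signed integer
print(wa_size > 2**31 - 1)  # True: 32-bit INTEGER index arithmetic wraps
```

So an array sized by this expression cannot even be indexed with a 32-bit Fortran INTEGER at this problem size.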

In my opinion, the best way to fix this would be to find a compatibly licensed, updated version of this code elsewhere and re-introduce it into scipy from that source, if one exists. The lbfgsb.f source is 2011-era Fortran code which has probably found its way into other projects, some of which may already have patched it to handle larger problems. But it looks like the most recent official version is still 3.0: http://users.iems.northwestern.edu/~nocedal/lbfgsb.html.

@argriffing
Contributor

@borislavsm You could try running https://github.com/stephenbeckr/L-BFGS-B-C. A header file has a note about integer sizes https://github.com/stephenbeckr/L-BFGS-B-C/blob/master/src/lbfgsb.h#L10.

@larsmans
Contributor

larsmans commented Aug 2, 2015

There's also liblbfgs, which is a much cleaner C translation; I have an (incomplete/unmaintained) Python wrapper for that code. Actually, it offers plain L-BFGS only, without bound constraints, so I guess it's not a proper replacement.

@person142
Member

I know I'm a bit late to the party, but here goes. I downloaded the original L-BFGS-B code:

http://users.iems.northwestern.edu/~nocedal/lbfgsb.html

It has a test routine, driver1.f, which minimizes the generalized Rosenbrock function. The default test size is n = 25. I took @borislavsm's n = 100000000 and ran make; of course the code didn't compile (it complained about overflow). But after adding the flags -fdefault-integer-8 (to make the default integer size 64 bits) and -mcmodel=large (an x86-specific flag that allows accessing memory beyond 2 GB), the code compiled just fine, and when I ran the driver1.f test case with n = 100000000 it converged.
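For reference, the build described above amounts to something like the following sketch (file names follow the L-BFGS-B 3.0 distribution as I recall them, and the flags are gfortran-specific):

```shell
# In the unpacked L-BFGS-B 3.0 source directory.
#   -fdefault-integer-8  promotes the default INTEGER to 64 bits
#   -mcmodel=large       permits static data/addressing beyond 2 GB (x86-64)
gfortran -O2 -fdefault-integer-8 -mcmodel=large \
    driver1.f lbfgsb.f linpack.f blas.f timer.f -o driver1
./driver1
```

Note that -fdefault-integer-8 has to be applied to every Fortran file in the build; mixing 32-bit and 64-bit integer objects would corrupt the call interfaces.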

This is rather tentative, and I need to test it more, but perhaps the issue can be solved simply by changing how the code is compiled?

Sorry if I'm missing something and am talking nonsense.
