Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

ZeroDivisionError: float division by zero in scipy/sparse/linalg/isolve/lsqr.py #2986

Closed
ogrisel opened this Issue Mar 21, 2014 · 17 comments

Comments

Projects
None yet
5 participants
Owner

ogrisel commented Mar 21, 2014

In recent scipy (I think after 0.13.3) built against the reference lapack implementation (not the Atlas variant) the following test fails:

======================================================================
ERROR: Test that linear regression also works with sparse data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ogrisel/venvs/py27/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/ogrisel/code/scikit-learn/sklearn/linear_model/tests/test_base.py", line 77, in test_linear_regression_sparse
    ols.fit(X, y.ravel())
  File "/Users/ogrisel/code/scikit-learn/sklearn/linear_model/base.py", line 367, in fit
    out = lsqr(X, y)
  File "/Users/ogrisel/code/scipy-openblas/scipy/sparse/linalg/isolve/lsqr.py", line 436, in lsqr
    test3 = 1 / acond
ZeroDivisionError: float division by zero

This is a linear regression (ordinary least squares) on sparse data using the scipy sparse solver. The exact code of the test is:

def test_linear_regression_sparse(random_state=0):
    "Test that linear regression also works with sparse data"
    random_state = check_random_state(random_state)
    n = 100
    X = sparse.eye(n, n)
    beta = random_state.rand(n)
    y = X * beta[:, np.newaxis]

    ols = LinearRegression()
    ols.fit(X, y.ravel())
    assert_array_almost_equal(beta, ols.coef_ + ols.intercept_)
    assert_array_almost_equal(ols.residues_, 0)

The tests pass if X is converted to an array and the matrix multiplication replaced by np.dot. As the identity matrix is well conditioned and the beta are non-zero:

>>> np.any(np.random.RandomState(0).rand(100) == 0)
False

the OLS solver should be able to recover the exact betas. Before trying to transform it as a scipy-only code snippet I would like to have the confirmation that this is indeed a scipy bug and not a bug in our test.

@ogrisel ogrisel added the Bug label Mar 21, 2014

Owner

ogrisel commented Mar 21, 2014

The same error occurs in scipy 0.12.1 so this is not a recent regression. Maybe this is linked to the lapack implementation.

Owner

GaelVaroquaux commented Mar 21, 2014

The same error occurs in scipy 0.12.1 so this is not a recent
regression. Maybe this is linked to the lapack implementation.

Probably

Owner

ogrisel commented Mar 25, 2014

Actually I cannot reproduce that under a Ubuntu 13.10 with the provided liblapack package but only when I link numpy against the reference liblapack included in OpenBLAS. This might be a bug in OpenBLAS, more investigation is needed, probably as a minimal C program.

@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014

For whatever it's worth, I just ran into the OP's 'float division by zero' error, after installing scikit-learn, and running the tests as recommended at http://scikit-learn.org/stable/install.html. I'm using 64-bit Windows 7, 32-bit python 2.7.8, scipy-0.14.0, numpy-1.9.0, and scikit-learn version 0.15.2.

Owner

ogrisel commented Sep 22, 2014

How did you install numpy and scipy? from the .exe installers on sourceforge?

Yup, exactly.

On Mon, Sep 22, 2014 at 8:13 AM, Olivier Grisel notifications@github.com
wrote:

How did you install numpy and scipy? from the .exe installers on
sourceforge?


Reply to this email directly or view it on GitHub
#2986 (comment)
.

Owner

ogrisel commented Sep 22, 2014

Weird, I thought the scipy installer on sourceforge was shipping with the ATLAS version of Lapack.

Owner

cournape commented Nov 7, 2014

FWIW, I also see this error with numpy 1.9.1, scipy 0.14.0 on top of MKL 10.3 (on Centos 5.9 64 bits).

Owner

amueller commented Nov 7, 2014

It seems unlikely to be a bug in all BLAS packages. @ogrisel you could reproduce the error on one setup, right?

Owner

amueller commented Nov 7, 2014

Can anyone who sees the error run

from scipy import sparse
from scipy.sparse.linalg import lsqr
import numpy as np

n = 100
X = sparse.eye(n, n)
beta = np.random.rand(n)
y = X * beta[:, np.newaxis]
lsqr(X, y)

Maybe in a loop or with varying random states?

Owner

amueller commented Nov 7, 2014

Ok I can actually reproduce the crash. Seems to be an instability in the scipy lsqr solver.

Owner

ogrisel commented Nov 12, 2014

I reproduce it as well with @amueller's snippet when I fix the rng seed to 3:

>>> %paste
from scipy import sparse
from scipy.sparse.linalg import lsqr
import numpy as np

np.random.seed(3)
n = 100
X = sparse.eye(n, n)
beta = np.random.rand(n)
y = X * beta[:, np.newaxis]
lsqr(X, y, show=True)

## -- End pasted text --

LSQR            Least-squares solution of  Ax = b
The matrix A has      100 rows  and      100 cols
damp = 0.00000000000000e+00   calc_var =        0
atol = 1.00e-08                 conlim = 1.00e+08
btol = 1.00e-08               iter_lim =      200

   Itn      x[0]       r1norm     r2norm   Compatible    LS      Norm A   Cond A
     0  0.00000e+00   5.448e+00  5.448e+00    1.0e+00  1.8e-01
Traceback (most recent call last):
  File "<ipython-input-22-21835754728d>", line 10, in <module>
    lsqr(X, y, show=True)
  File "/volatile/ogrisel/envs/py27/local/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/lsqr.py", line 436, in lsqr
    test3 = 1 / acond
ZeroDivisionError: float division by zero
Owner

ogrisel commented Nov 12, 2014

My scipy is 0.14.0 and is linked against ATLAS's version of BLAS & lapack.

Owner

ogrisel commented Nov 12, 2014

I have updated my numpy version to the latest version (1.9.1) and now the random seed 0 triggers the problem but not random seed 3...

>>> %paste
from scipy import sparse, __version__
from scipy.sparse.linalg import lsqr
import numpy as np

print("scipy version: %r" % __version__)

for i in range(100):
    print("random seed %d:" % i)
    np.random.seed(i)
    n = 100
    X = sparse.eye(n, n)
    beta = np.random.rand(n)
    y = X * beta[:, np.newaxis]
    lsqr(X, y, show=True)

## -- End pasted text --
scipy version: '0.14.0'
random seed 0:

LSQR            Least-squares solution of  Ax = b
The matrix A has      100 rows  and      100 cols
damp = 0.00000000000000e+00   calc_var =        0
atol = 1.00e-08                 conlim = 1.00e+08
btol = 1.00e-08               iter_lim =      200

   Itn      x[0]       r1norm     r2norm   Compatible    LS      Norm A   Cond A
     0  0.00000e+00   5.538e+00  5.538e+00    1.0e+00  1.8e-01
Traceback (most recent call last):
  File "<ipython-input-7-d6322cd1c85e>", line 14, in <module>
    lsqr(X, y, show=True)
  File "/volatile/ogrisel/envs/py27/local/lib/python2.7/site-packages/scipy/sparse/linalg/isolve/lsqr.py", line 436, in lsqr
    test3 = 1 / acond
ZeroDivisionError: float division by zero
Owner

amueller commented Nov 12, 2014

I think we need to talk to scipy people. @ogrisel do you want to do that or should I?

Owner

ogrisel commented Nov 13, 2014

I am on it.

Owner

ogrisel commented Nov 14, 2014

This has been fixed in scipy master (part of the future 0.15.0 release). I will do a backport for sklearn.

@ogrisel ogrisel closed this in 54f483a Nov 15, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment