Heisen-bug with omp_cv #3190

arjoly · 2014-05-23T13:05:49Z

Got an intermittent (heisenbug) Travis failure while working on #3173.
The entire travis log is at https://travis-ci.org/scikit-learn/scikit-learn/jobs/25868444

======================================================================
ERROR: sklearn.linear_model.tests.test_omp.test_omp_cv
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/tests/test_omp.py", line 195, in test_omp_cv
    ompcv.fit(X, y_)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 867, in fit
    for train, test in cv)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/externals/joblib/parallel.py", line 644, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/externals/joblib/parallel.py", line 391, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/externals/joblib/parallel.py", line 129, in __init__
    self.results = func(*args, **kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 764, in _omp_path_residues
    return_path=True)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 371, in orthogonal_mp
    copy_X=copy_X, return_path=return_path)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 109, in _cholesky_omp
    **solve_triangular_args)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 115, in solve_triangular
    a1, b1 = map(asarray_chkfinite,(a,b))
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 595, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

The test doesn't always seem to be stable.

ogrisel · 2014-05-23T13:14:29Z

@vene do you think you can have a look at it? I have also observed this several times in the past.

vene · 2014-05-27T08:16:44Z

I can't reproduce the failure when running the test in isolation in a loop (but I guess that's why it's a heisenbug). However, it consistently raises a warning about linear dependence in the dictionary. I'll try to make the warning go away :)
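
One way to locate where such a warning comes from is to promote warnings to exceptions while the test runs, so that the traceback points at the line that emits it. The sketch below uses only the standard library; `noisy_step` and `run_with_warnings_as_errors` are hypothetical names introduced for illustration and are not scikit-learn code.

```python
import warnings


def noisy_step(value):
    """Hypothetical stand-in for code that warns on degenerate input."""
    if value == 0:
        warnings.warn("linear dependence detected", RuntimeWarning)
    return value


def run_with_warnings_as_errors(func, *args):
    """Call func(*args) with every warning turned into an exception.

    Returns (result, None) on success, or (None, warning) if a warning
    was raised, so the caller can inspect it or re-raise it.
    """
    with warnings.catch_warnings():
        # The "error" action makes warnings.warn raise instead of printing.
        warnings.simplefilter("error")
        try:
            return func(*args), None
        except Warning as exc:
            return None, exc


if __name__ == "__main__":
    print(run_with_warnings_as_errors(noisy_step, 1))
    print(run_with_warnings_as_errors(noisy_step, 0))
```

Running the interpreter with `-W error` has a similar effect for a whole process.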

MechCoder · 2014-05-29T13:00:15Z

Is this a duplicate of #3139?

ogrisel · 2014-06-18T14:39:13Z

It seems to happen more often with the second build of the 3 travis builds, namely:

DISTRIB="conda" PYTHON_VERSION="2.6" INSTALL_MKL="false" NUMPY_VERSION="1.6.2"

Although I am not 100% sure.

kastnerkyle · 2014-06-24T06:34:08Z

I ran the test 28 million times (yes, that many) on 4 cores of my machine last night, using this script:

from sklearn.linear_model.tests.test_omp import test_omp_cv

itr = 0
for i in iter(int, 1):
    if itr % 1000 == 0:
        print(itr)
    test_omp_cv()
    itr += 1

Python details:
Python 3.4.1
scikit-learn master
Numpy 1.8.1
Scipy 0.14.0
ATLAS 3.8.4

No hint of an error... I am going to try running the full test suite repeatedly to see if that is somehow different. Does anyone know exactly where (relative to sklearn directory) and how Travis starts the tests?
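
The loop above only stops when an unhandled exception ends the process. A minimal variant, sketched here with the standard library only, also records which iteration failed and keeps the traceback text. `stress` and `make_flaky` are hypothetical helper names introduced for illustration, and `make_flaky` merely simulates an intermittent failure.

```python
import random
import traceback


def stress(test_func, max_runs, report_every=1000):
    """Call test_func repeatedly and stop at the first exception.

    Returns (runs_completed, formatted_traceback_or_None).
    """
    for run in range(max_runs):
        if run % report_every == 0:
            print(run)
        try:
            test_func()
        except Exception:
            # Keep the traceback text so the failure can be logged.
            return run, traceback.format_exc()
    return max_runs, None


def make_flaky(seed, failure_rate):
    """Hypothetical flaky test that fails randomly at the given rate."""
    rng = random.Random(seed)

    def flaky():
        if rng.random() < failure_rate:
            raise ValueError("array must not contain infs or NaNs")

    return flaky


if __name__ == "__main__":
    runs, tb = stress(make_flaky(seed=0, failure_rate=0.001), 100000)
    print("completed runs before stopping:", runs)
    if tb is not None:
        print(tb)
```

In practice `test_func` would be `test_omp_cv`, imported as in the script above.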

ogrisel · 2014-06-24T07:12:49Z

Does anyone know exactly where (relative to sklearn directory) and how Travis starts the tests?

We just discussed that IRL, but for the record it's all configured in:

arjoly · 2014-06-24T07:28:59Z

Could it come from a bug with the blas?

kastnerkyle · 2014-06-26T06:57:12Z

So I have tried the following things:

Running test_omp_cv in isolation ~28 million times (~7 million x 4 cores)
Running the entire test suite 890 times looking for a crash

This is with:
Python 3.4.1
sklearn master
numpy 1.8.1
scipy 0.14.0
atlas (none? Thought I was using it but apparently not)

Single test script

from sklearn.linear_model.tests.test_omp import test_omp_cv

itr = 0
for i in iter(int, 1):
    if itr % 1000 == 0:
        print(itr)
    test_omp_cv()
    itr += 1

The script I have been using for the full test suite, run from the root of a cloned sklearn (i.e. ~/src/scikit-learn) :

#!/bin/bash

DISTRIB="conda"
PYTHON_VERSION="2.6"
NUMPY_VERSION="1.6.2"
SCIPY_VERSION="0.11.0"
INSTALL_MKL="false"

# Configure the conda environment and put it in the path using the
# provided versions
conda remove -n testenv --all --yes
conda create -n testenv --yes python=$PYTHON_VERSION pip nose \
    numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION
conda install -n testenv --yes -f numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION

# Look up the environment path only after testenv has been created,
# otherwise the lookup returns nothing on a first run
PYTHON_ENV_PATH=$(conda info -e | grep testenv | tr -s " " | cut -d " " -f 2)
PYTHONPATH=$PYTHON_ENV_PATH/lib/python$PYTHON_VERSION/site-packages

if [[ "$INSTALL_MKL" == "true" ]]; then
    # Make sure that MKL is used
    conda install -n testenv --yes mkl
else
    # Make sure that MKL is not used
    conda remove -n testenv --yes --features mkl || echo "MKL not installed"
fi

for i in `seq 1 10000`; do
    echo "Running test suite, iteration $i"
    echo $i > runcount.log
    $PYTHON_ENV_PATH/bin/python -u --version
    $PYTHON_ENV_PATH/bin/python -u -c "import numpy; print('numpy %s' % numpy.__version__)"
    $PYTHON_ENV_PATH/bin/python -u -c "import scipy; print('scipy %s' % scipy.__version__)"
    $PYTHON_ENV_PATH/bin/python -u setup.py clean
    # Test exit code to catch CTRL-C
    test $? -gt 128 && break
    $PYTHON_ENV_PATH/bin/python -u setup.py build_ext --inplace
    test $? -gt 128 && break
    $PYTHON_ENV_PATH/bin/python -u setup.py install
    test $? -gt 128 && break
    $PYTHON_ENV_PATH/bin/nosetests --pdb-failures -s -v sklearn
    test $? -gt 128 && break
done

I am trying the same with older packages, but this bug is very hard to find, at least with my box/current settings.
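
The bash loop above only breaks when an exit status exceeds 128, so it relies on `--pdb-failures` to pause at a failing test. For unattended runs, one possible alternative is to stop at the first non-zero exit status and keep that run's output. The following is a sketch using only the standard library; `run_until_failure` is a hypothetical helper name, and the command shown in the usage part is a placeholder rather than the real test invocation.

```python
import subprocess
import sys


def run_until_failure(command, max_runs):
    """Run command up to max_runs times; stop at the first non-zero exit.

    Returns (iteration, returncode, output) for the failing run,
    or None if every run exited with status 0.
    """
    for iteration in range(1, max_runs + 1):
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            universal_newlines=True,   # decode output as text
        )
        if proc.returncode != 0:
            return iteration, proc.returncode, proc.stdout
    return None


if __name__ == "__main__":
    # Placeholder command; replace it with the actual test runner call.
    command = [sys.executable, "-c", "print('ok')"]
    failure = run_until_failure(command, max_runs=5)
    if failure is None:
        print("no failure observed")
    else:
        iteration, code, output = failure
        print("run %d exited with status %d" % (iteration, code))
        print(output)
```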

GaelVaroquaux · 2014-07-15T08:19:19Z

This has been fixed in #3353

arjoly added the Bug label May 23, 2014

larsmans mentioned this issue Jun 16, 2014

Test failure in OMPCV #3139

Closed

larsmans added the Build / CI label Jun 16, 2014

ogrisel mentioned this issue Jun 30, 2014

[MRG] Generic multi layer perceptron #3204

Closed

kastnerkyle mentioned this issue Jul 9, 2014

[MRG+1] Skip test for OrthogonalMatchingPursuitCV if running on Travis #3353

Merged

GaelVaroquaux closed this as completed Jul 15, 2014

This was referenced Mar 12, 2015

test_non_meta_estimators 'OrthogonalMatchingPursuitCV' ValueError: array must not contain infs or NaNs #4387

Closed

[MRG+1] fix ompcv on old scipy versions #4402

Merged
