TestProbitCG: 8 failing tests (Python 3.4 / Ubuntu 12.04) #1690

Closed
andrewclegg opened this Issue May 24, 2014 · 16 comments

@andrewclegg
Contributor

I am attempting to install statsmodels master on Python 3.4 for the Snake Charmer project. Several tests are failing or throwing errors.

Python 3.4
Ubuntu 12.04 64 bit (virtual)
NumPy 1.8.1
SciPy 0.13.1

statsmodels revision 3adaa1a
Snake Charmer revision snake-charmer-devs/snake-charmer@72734de

To reproduce: install Snake Charmer, spin up a VM as directed in the README, log in using vagrant ssh, test as normal.
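
To run just the failing class rather than the whole suite, something like this works from inside the VM (a sketch only; the module path matches the tracebacks below, and invoking nose programmatically is just one of several options):

```python
# Run only the TestProbitCG class with nose. The module:class path comes from
# the tracebacks below; the programmatic nose.run() invocation is illustrative.
import nose

nose.run(argv=[
    "nosetests",
    "statsmodels.discrete.tests.test_discrete:TestProbitCG",
    "-v",
])
```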

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_bse
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 85, in test_bse
    assert_almost_equal(self.res1.bse, self.res2.bse, DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 50.0%)
 x: array([ 0.69414806,  0.08390496,  0.59514105,  2.54327138])
 y: array([ 0.69388252,  0.08389026,  0.59503792,  2.5424725 ])

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_conf_int
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 56, in test_conf_int
    assert_allclose(self.res1.conf_int(), self.res2.conf_int, rtol=8e-5)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 1183, in assert_allclose
    verbose=verbose, header=header)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=8e-05, atol=0

(mismatch 100.0%)
 x: array([[  0.26725425,   2.98826463],
       [ -0.1127225 ,   0.21617888],
       [  0.25981741,   2.59272747],
       [-12.44303176,  -2.47359115]])
 y: array([[  0.2658255,   2.985795 ],
       [ -0.1126929,   0.2161508],
       [  0.2600795,   2.592585 ],
       [-12.43547  ,  -2.469166 ]])

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_params
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 53, in test_params
    assert_almost_equal(self.res1.params, self.res2.params, DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 50.0%)
 x: array([ 1.62775944,  0.05172819,  1.42627244, -7.45831145])
 y: array([ 1.62581025,  0.05172895,  1.42633237, -7.45232042])

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_predict
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 99, in test_predict
    self.res2.phat, DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 43.75%)
 x: array([ 0.01813411,  0.05303997,  0.19003049,  0.01855665,  0.55528215,
        0.02720612,  0.018475  ,  0.04453257,  0.1087887 ,  0.66371484,
        0.01606698,  0.19368361,  0.32366954,  0.1952776 ,  0.35666492,...
 y: array([ 0.0181707,  0.0530805,  0.1899263,  0.0185707,  0.5545748,
        0.0272331,  0.0185033,  0.0445714,  0.1088081,  0.6631207,
        0.0161024,  0.1935566,  0.3233282,  0.1951826,  0.3563406,...

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_predict_xb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 104, in test_predict_xb
    self.res2.yhat, DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 87.5%)
 x: array([-2.09390756, -1.61606651, -0.87778395, -2.08451562,  0.13901828,
       -1.92354024, -2.08631619, -1.70034989, -1.23299563,  0.42262303,
       -2.14274035, -0.86440177, -0.45746191, -0.85861095, -0.36738772,...
 y: array([-2.093086  , -1.61569178, -0.87816805, -2.08420706,  0.13722852,
       -1.92311108, -2.08569193, -1.69993722, -1.23289168,  0.42099541,
       -2.14186025, -0.86486465, -0.45841211, -0.85895526, -0.36825761,...

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_resid_dev
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 124, in test_resid_dev
    DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 68.75%)
 x: array([-0.19131412, -0.3301466 , -0.64924367, -0.19355092,  1.08469251,
       -0.23487468, -0.19312064, -0.301843  , -0.47994526,  0.90543103,
       -0.17998583, -0.6561693 , -0.88439072,  1.80739213, -0.93924391,...
 y: array([-0.191509 , -0.3302762, -0.6490455, -0.1936247,  1.085867 ,
       -0.2349926, -0.1932698, -0.3019776, -0.4799906,  0.9064196,
       -0.1801855, -0.6559291, -0.8838201,  1.807661 , -0.9387071,...

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_resid_generalized
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 128, in test_resid_generalized
    self.res2.resid_generalized, DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 56.25%)
 x: array([-0.04537189, -0.11414615, -0.33506451, -0.04629088,  0.71154071,
       -0.06448265, -0.04611355, -0.09837399, -0.20931732,  0.54972289,
       -0.0408271 , -0.34052911, -0.53126158,  1.41310335, -0.57964708,...
 y: array([-0.045452, -0.11422 , -0.334908, -0.046321,  0.712624, -0.064538,
       -0.046175, -0.098447, -0.209349,  0.550593, -0.040906, -0.340339,
       -0.530763,  1.413373, -0.57917 , -0.053593, -0.100556, -0.071855,...

======================================================================
FAIL: statsmodels.discrete.tests.test_discrete.TestProbitCG.test_zstat
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 59, in test_zstat
    assert_almost_equal(self.res1.tvalues, self.res2.z, DECIMAL_4)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 454, in assert_almost_equal
    return assert_array_almost_equal(actual, desired, decimal, err_msg)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 4 decimals

(mismatch 75.0%)
 x: array([ 2.34497443,  0.61650933,  2.39652841, -2.93256611])
 y: array([ 2.34306269,  0.61662638,  2.39704449, -2.93113118])

----------------------------------------------------------------------
@andrewclegg andrewclegg referenced this issue in snake-charmer-devs/snake-charmer May 24, 2014
Closed

One problem in statsmodels #14

@josef-pkt
Member

Thanks for reporting.
All of the failures are in TestProbitCG; see #1648. There are platform-specific convergence problems with fmin_cg.

It's not a "real" problem: this is just the example where we test all of the connected scipy optimizers, and we never use fmin_cg as the default or in examples.

We need to adjust the example or find a case that converges more reliably.
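
For context, here is roughly what that looks like from the user side. This is a sketch only; the Spector dataset is an assumption for illustration, not a quote of the actual test fixture:

```python
# Hedged sketch: fit the same Probit model with the default Newton optimizer
# and with scipy's fmin_cg via method="cg". The dataset choice (Spector) is
# an assumption made for illustration.
import numpy as np
import statsmodels.api as sm

data = sm.datasets.spector.load()
exog = sm.add_constant(data.exog, prepend=False)

res_newton = sm.Probit(data.endog, exog).fit(disp=0)           # default Newton
res_cg = sm.Probit(data.endog, exog).fit(method="cg", disp=0)  # fmin_cg

# On most platforms the two solutions agree to roughly 4 decimals; the
# failures above are cases where fmin_cg stops short of that.
print(np.max(np.abs(res_newton.params - res_cg.params)))
```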

@andrewclegg
Contributor

Thanks for clarifying -- not sure why my search for related issues didn't find #1648.

If the actual results aren't crucial, could I put in a request to, e.g., relax the tolerance bounds on these tests until they pass, just to reduce noise for downstream projects like mine? Happy to make a patch, but I'd like to check that you'd be happy with that approach before I take the time to work out how to do it :-)

@josef-pkt
Member

@andrewclegg can you try maxiter=1000, to see whether the optimization just didn't finish converging in your case?
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/discrete/tests/test_discrete.py#L366
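
Something along these lines (a sketch reusing the illustrative Spector setup from the earlier comment, not the literal test code):

```python
# Hedged sketch of the suggested experiment: allow many more iterations and
# inspect the optimizer's return values to see whether it actually converged.
import statsmodels.api as sm

data = sm.datasets.spector.load()
exog = sm.add_constant(data.exog, prepend=False)

res_cg = sm.Probit(data.endog, exog).fit(method="cg", maxiter=1000, disp=0)
print(res_cg.mle_retvals)  # convergence flag, function/gradient call counts
```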

I recently switched conf_int to a relative tolerance, but in your case everything fails, so the optimization isn't converging well enough.

It's a nasty test case, and small differences in the LAPACK libraries (or something else?) affect the convergence path or cause the optimization to stop before it is close enough for the tests.

I haven't changed much in follow-up to #1648 because it doesn't fail on TravisCI, on the nightly testing on Debian and Ubuntu, or in my testing on 32-bit Windows.

If this test failure persists, I will look into changing the example or adding better starting values.
Lowering the test precision is not easy because it is shared by all optimizers, and it might not be very useful because the tests could become too weak.

@andrewclegg
Contributor

maxiter=1000 didn't help, I'm afraid -- same 8 failures.

@josef-pkt
Member

Not good.
Which LAPACK library are you using? (I'm not sure it's relevant.)

This is turning into one of those scipy.optimize issues that only show up on specific configurations and machines, and I don't really know what to do because I never manage to replicate them in my setup.
(There are two other unsolved mysteries on some Debian machines in other parts of the code.)

A few versions ago we had these tests disabled on Windows, because fmin_cg didn't converge there.

I'll see if I can find anything suspicious in the test case; otherwise you could just add a skip for the entire test class.
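
For example, a minimal sketch of such a skip (illustrative only, not a patch to the actual test file):

```python
# Hedged sketch of a local workaround: raising SkipTest in the class-level
# setup should make nose report the whole class as skipped rather than failed.
# Standalone illustration; the real TestProbitCG derives from the existing
# check classes in test_discrete.py.
from nose import SkipTest

class TestProbitCG(object):
    @classmethod
    def setupClass(cls):
        raise SkipTest("fmin_cg convergence is platform-dependent; see gh-1690")

    def test_params(self):
        pass  # never reached; the class setup skips first
```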

@josef-pkt
Member

@andrewclegg as extra information to help understand this: does your scipy test suite run without errors or failures?

@andrewclegg
Contributor

liblapack3gf 3.3.1-1

I have one SciPy test failing:

scipy/scipy#3551

If you have time, you should be able to replicate it easily via Snake Charmer, since it provides a VM with all the package versions fixed. Unless the behaviour of things like LAPACK on a VM depends on the hardware of the host, which would surprise (and worry) me...

Installation instructions here:

https://github.com/andrewclegg/snake-charmer

@josef-pkt
Member

It takes too much time for me to set up a virtual machine again.

I'm preparing a PR that rescales the ProbitCG test case; the rescaling should eventually be integrated as an optimization option along the lines of #1131.
It should be available later today: it's working, but I still need to clean up and write unit tests for the transformation class. I hope it will completely get rid of the ProbitCG problems (and convergence is much faster).
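
The rough idea, as an illustrative sketch rather than the actual PR code (same assumed Spector setup as above):

```python
# Hedged sketch of the rescaling/reparameterization idea: fit on column-scaled
# exog so fmin_cg works on a better-conditioned problem, then map the
# estimates back to the original scale.
import statsmodels.api as sm

data = sm.datasets.spector.load()
exog = sm.add_constant(data.exog, prepend=False)

scale = exog.std(axis=0)
scale[scale == 0] = 1.0  # leave the constant column unscaled

res_scaled = sm.Probit(data.endog, exog / scale).fit(method="cg", disp=0)
params_original = res_scaled.params / scale  # undo the column scaling
print(params_original)
```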

@andrewclegg
Contributor

Great, thanks -- I'll give it a go when it's ready.

BTW the point of Snake Charmer is to take the pain out of setting up a VM. You just need to install VirtualBox (about 3 clicks), install Vagrant (about 3 clicks), then type one command at the command line :-)

But hopefully we won't need it in this case.

@josef-pkt josef-pkt added a commit to josef-pkt/statsmodels that referenced this issue May 27, 2014
@josef-pkt josef-pkt TST: fix TestProbitCG by reparameterization, closes #1690 bd1abf4
@josef-pkt
Member

@andrewclegg I opened PR #1699, which should hopefully fix this.
I will merge it when TravisCI is green.
You can try it out now, or wait for the merge and pull master.

@josef-pkt josef-pkt closed this in bd1abf4 May 27, 2014
@josef-pkt
Member

Closed from the commit; please reopen if the issue isn't fixed.

@josef-pkt
Member

The test fails on Debian: it needs fcalls >= 73, and I only allowed 70 in the assertion.

(It's still a mystery why the optimizer does so much work when we are already close to or at the optimum.)

http://nipy.bic.berkeley.edu/builders/statsmodels-py2.7-wheezy-sparc/builds/49/steps/shell_7/logs/stdio

======================================================================
ERROR: test suite for <class 'statsmodels.discrete.tests.test_discrete.TestProbitCG'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/nose/suite.py", line 208, in run
    self.setUp()
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/nose/suite.py", line 291, in setUp
    self.setupContext(ancestor)
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/nose/suite.py", line 314, in setupContext
    try_run(context, names)
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/statsmodels-0.6.0-py2.7-linux-sparc64.egg/statsmodels/discrete/tests/test_discrete.py", line 381, in setupClass
    assert_array_less(cls.res1.mle_retvals['fcalls'], 70)
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 880, in assert_array_less
    header='Arrays are not less-ordered')
  File "/home/buildslave/nd-bb-slave-sparc-wheezy/statsmodels-py2_7-wheezy-sparc/build/venv/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not less-ordered

(mismatch 100.0%)
 x: array(73)
 y: array(70)
@josef-pkt josef-pkt reopened this May 31, 2014
@josef-pkt
Member

pythonxy and nightly Ubuntu are still both green.

@andrewclegg
Contributor

With that revision I get the same failure, but with different values reported:

ERROR: test suite for <class 'statsmodels.discrete.tests.test_discrete.TestProbitCG'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/suite.py", line 209, in run
    self.setUp()
  File "/usr/local/lib/python3.4/dist-packages/nose/suite.py", line 292, in setUp
    self.setupContext(ancestor)
  File "/usr/local/lib/python3.4/dist-packages/nose/suite.py", line 315, in setupContext
    try_run(context, names)
  File "/usr/local/lib/python3.4/dist-packages/nose/util.py", line 470, in try_run
    return func()
  File "/usr/local/lib/python3.4/dist-packages/statsmodels/discrete/tests/test_discrete.py", line 381, in setupClass
    assert_array_less(cls.res1.mle_retvals['fcalls'], 10)
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 880, in assert_array_less
    header='Arrays are not less-ordered')
  File "/usr/local/lib/python3.4/dist-packages/numpy/testing/utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not less-ordered

(mismatch 100.0%)
 x: array(37)
 y: array(10)
@josef-pkt
Member

That's fine; that was just a check of how many function calls it needs, similar to the Debian failure above.
It still converged in your case, since the parameter tests didn't fail.

I will change this to 100 fcalls to have a bit of buffer across machines:
assert_array_less(cls.res1.mle_retvals['fcalls'], 100)

@josef-pkt
Member

thanks for reporting back

@josef-pkt josef-pkt closed this in 74b4dc9 Jun 18, 2014
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@josef-pkt josef-pkt TST: fix TestProbitCG by reparameterization, closes #1690 27c67bf
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@josef-pkt josef-pkt TST: TestProbitCG increase bound for fcalls closes #1690 914424d