BUG/TST: Appveyor failing on statespace test_simulate #3386

ChadFulton · 2017-01-22T03:09:56Z

It's not clear what's happening, except tests are failing (i.e. exiting without finishing). Error message:

statsmodels.tsa.statespace.tests.test_simulate.test_structural ...
Command exited with code -1073741819

First guess would be some kind of segfault in simulation, so in the simulation smoother.


josef-pkt · 2017-01-22T04:49:38Z

In one of Kerby's PRs it fails on python 2.7 but in a different simulation test
statsmodels.tsa.statespace.tests.test_simulate.test_arma_lfilter ... Command exited with code -1073741819

bashtage · 2017-01-22T22:09:46Z

This is probably an issue with differences in the size of long between Unix and Windows. To avoid these I always use things like int64_t/np.int64 rather than long or long long.
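A minimal sketch of the size difference, using only the standard-library ctypes module rather than the statsmodels Cython code. On 64-bit Windows a C long is 4 bytes, while on 64-bit Linux and macOS it is 8 bytes; the fixed-width types have the same size everywhere. The helper name c_integer_sizes is made up for this illustration.

```python
import ctypes

def c_integer_sizes():
    """Return the size in bytes of several C integer types on this platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "int64_t": ctypes.sizeof(ctypes.c_int64),
    }

sizes = c_integer_sizes()
print(sizes)
```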

bashtage · 2017-02-02T12:34:40Z

This seems to happen a lot. I saw one locally and loaded the debugger after the crash. Since it wasn't a debug build, there wasn't a lot of helpful information. However, I did see that the crash happened in mkl_avx2.dll, which is where BLAS calls are executed. I have a suspicion that this is an obscure data alignment issue with SIMD. In many cases data is aligned on the right-sized boundary (e.g. 32 bytes). Occasionally it is not, and so the error occurs.
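A rough sketch of how one could check whether a buffer starts on a given boundary. It uses a standard-library array as a stand-in; for a NumPy array `a` the address would come from `a.ctypes.data`. Whether MKL actually requires such alignment is only a suspicion here, and the helper name is_aligned is hypothetical.

```python
from array import array

def is_aligned(address, boundary):
    """Return True if a memory address is a multiple of `boundary` bytes."""
    return address % boundary == 0

# Stand-in buffer of doubles; for a NumPy array `a`, the address would be
# `a.ctypes.data` instead of `buffer_info()[0]`.
buf = array("d", [0.0] * 64)
address, _length = buf.buffer_info()
for boundary in (16, 32, 64):
    print(boundary, is_aligned(address, boundary))
```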

bashtage · 2017-02-02T12:35:33Z

Another possibility is that one of the inputs to the BLAS function calls does not have the correct type on Windows.

bashtage · 2017-02-02T12:45:08Z

The integer types you pass to BLAS functions might need to be np.npy_intp which would result in 32 bit integers on 32 bit platforms and 64 bit integers on 64 bit platforms.

josef-pkt · 2017-02-02T14:03:04Z

I never managed to run into problems with the simulate tests. I tried for a while yesterday with Windows 8.1, 64 bit python 3.4 with statsmodels compiled with MingW/gcc, winpython.

bashtage · 2017-02-02T14:33:01Z

It is very hard to trigger. I have intentionally tried many times and cannot get a failure locally aside from the one I caught.

bashtage · 2017-02-03T15:38:37Z

This is really strange. I have done some more runs where I printed a lot to see where the failure was -- it seems that it occurs between tests. I'm not quite sure what this implies. Maybe it is crashing when garbage collecting the various Cython extension classes.

bashtage · 2017-02-03T15:53:20Z

The exception code in hex is C0000005, which means access violation: something is trying to read/write data that isn't owned by it. It could also be freeing something that has already been freed.
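For reference, the conversion from the signed exit code that Appveyor reports (-1073741819) to the hex value can be sketched as below. The function name is made up for this illustration.

```python
def exit_code_to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF

status = exit_code_to_ntstatus(-1073741819)
print(hex(status))  # 0xc0000005
```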

josef-pkt · 2017-02-03T17:38:31Z

There is one inplace modification of eps while it is attached to a model
The last two cases in test_structural don't have a copy before modifying it
desired = eps.copy()

It might also be worth trying to copy eps in every call to the simulate method.
(I don't know much about the internals and Cython, but an eps.copy() would prevent them all from using the same array memory.)
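The aliasing concern can be illustrated with a small sketch. It uses the standard-library array and memoryview as stand-ins for NumPy arrays, and does not reproduce the statsmodels internals: `attached` plays the role of an array held by a model, and `desired` the role of eps.copy().

```python
from array import array

eps = array("d", [0.1, 0.2, 0.3])

attached = memoryview(eps)   # shares memory with eps (stand-in for an array held by a model)
desired = array("d", eps)    # independent copy, analogous to eps.copy()

eps[0] = 99.0                # in-place modification

print(attached[0])  # 99.0: the attached view sees the change
print(desired[0])   # 0.1: the copy does not
```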

bashtage · 2017-02-03T19:56:06Z

The difficulty is that it fails right after each of these tests. The actual simulation code is all python so very doubtful that this is causing a problem. I now suspect there is some issue with mkl. Trying a new mkl on appveyor to see if it works.

ChadFulton · 2017-02-09T04:39:08Z

I'm having trouble replicating this. I have a conda environment with the same packages as Appveyor on my Windows 10 machine, and can't reproduce the problem.

@bashtage was the local failure you observed a fluke? or did you get it from repeated test runs? And was it a segfault?

bashtage · 2017-02-09T10:32:42Z

It seems to be hard to trigger. I have seen it locally twice. Both were accidental when doing entire runs of the test suite. My experiences:

It won't trigger by just running test_simulate. I ran all tests in this module 10K times and didn't see a single failure.
It requires, at a minimum, running all of nosetests statsmodels\tsa.
It occurs between tests. I put a lot of print statements into the runs, basically outputting an increasing count after each statement and a 'TEST_STARTED' and 'TEST_ENDED' at the first and last. The crash always happened after 'TEST_ENDED' but before the next 'TEST_STARTED'. Since these were the first and last lines, it is something in the garbage collection.
When it occurred locally, I was able to open Visual Studio's debugger. I didn't have a debug build, but I could see it was a stop 0xC0000005 in mkl_avx2.dll, part of a BLAS library. I have a suspicion that there is a very subtle alignment issue in how NumPy arrays are boundary aligned (I think they don't use any alignment, but mkl_avx2 might be expecting 16 or 32 byte alignment). This isn't technically a NumPy issue, since they don't vendor a BLAS.
I tried setting up an Appveyor run that uses pip NumPy 1.12, which comes with ATLAS, to see if it occurs, but I didn't succeed since SciPy isn't pip-installable and SciPy's BLAS interface is very important here.
I did set up some runs using an older version of MKL, but this also crashed on Appveyor.

Going to be hard to debug.
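A minimal sketch of the kind of boundary instrumentation described above, with one change: it forces a garbage collection before the end marker, so that a crash while freeing objects created by a test would show up inside the markers instead of between tests. The wrapper name and the example test are hypothetical, and this is not the instrumentation that was actually used.

```python
import functools
import gc

def mark_boundaries(test_func, log=print):
    """Wrap a test so that garbage collection runs before the end marker."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        log("TEST_STARTED " + test_func.__name__)
        try:
            return test_func(*args, **kwargs)
        finally:
            # Collect now, so objects left in reference cycles by this test
            # are freed before TEST_ENDED is logged, not between tests.
            gc.collect()
            log("TEST_ENDED " + test_func.__name__)
    return wrapper

def test_example():  # hypothetical test body
    data = [0.0] * 10
    assert len(data) == 10

events = []
mark_boundaries(test_example, log=events.append)()
print(events)
```

The standard-library faulthandler module (faulthandler.enable()) can additionally dump the Python traceback when the interpreter crashes, which may help locate the last Python frame that was running.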

ChadFulton · 2017-02-09T13:26:29Z

Thanks for those notes! I'll keep trying.

bashtage · 2017-07-08T20:17:28Z

@jbrockmendel @josef-pkt @ChadFulton This appears to have been fixed after anaconda updated MKL to 2017.0.3. I think it can be closed.

josef-pkt · 2017-07-20T14:13:33Z

It still fails every once in a while
for example my PR
mkl: 2017.0.3-0
https://ci.appveyor.com/project/josef-pkt/statsmodels/build/1.0.2020/job/1x31rc0k5e3wgmc0

bashtage · 2017-09-26T23:11:09Z

Does it still fail?

josef-pkt · 2017-09-26T23:18:19Z

Yes, but not very often anymore. I had a few cases in the last few days, less than one quarter of test runs as a rough guess.

bashtage · 2017-10-21T16:57:51Z

Has the new MKL (2018) fixed this? I haven’t seen one in a long time.

josef-pkt · 2017-10-21T17:06:43Z

The last one, ignoring Chad's statespace PR, was 6 days ago, AFAICS
https://ci.appveyor.com/project/josef-pkt/statsmodels/build/1.0.2403/job/prru22dxk854jflu
mkl: 2018.0.0-h36b65af_4

josef-pkt · 2017-10-25T16:11:47Z

As an update, another hanging test run:
https://ci.appveyor.com/project/josef-pkt/statsmodels/build/1.0.2491/job/dodtbv25v4n4j24b
It was quiet for some time, but that doesn't last.

bashtage · 2017-10-27T16:36:58Z

I just caught one locally. I didn't have debugging enabled, but it seems to occur in mkl_avx2.dll, which I don't have symbols for, so a debugger wouldn't help.

ChadFulton · 2018-04-28T18:19:36Z

@bashtage if you have a moment, would you mind telling me a bit about your setup where you locally caught the error with the debugger enabled and you saw that something was trying to access self.nobs? e.g. version of Windows, Python, NumPy / SciPy, BLAS / LAPACK...

(I assume no one has seen the segfault anywhere but Windows?)

josef-pkt · 2018-04-28T18:26:43Z

@ChadFulton I can't reproduce the simulate segfault at the moment.
Neither pytest ...\test_simulate.py nor pytest ...\statespace\tests segfaulted now.
I'm trying to run the entire test suite in different ways, but that takes time.

josef-pkt · 2018-04-28T18:50:44Z

not reproducible with pytest path to statsmodel either
running in the interpreter

import statsmodels.tsa.statespace as ss
ss.test()

no failures, errors or segfault

but if I do additionally

import statsmodels.api as sm
ss.test()

then I get one failure


================================== FAILURES ===================================
_______________________ TestCompanionMatrix.test_cases ________________________
..\try_py34\lib\site-packages\statsmodels\tsa\statespace\tests\test_tools.py:39: in test_cases
    assert_equal(tools.companion_matrix(polynomial), result)
..\try_py34\lib\site-packages\statsmodels\tsa\statespace\tools.py:320: in companion_matrix
    elif polynomial[0] == 1:
E   ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
======= 1 failed, 1600 passed, 30 skipped, 29 warnings in 96.57 seconds =======

I will add results here when this is finished (in a restarted python in console)

>>> import statsmodels.api as sm
>>> sm.test()

results: no statespace problem or failure

cvxopt failure as before

================================== FAILURES ===================================
________________________________ test_testers _________________________________
..\try_py34\lib\site-packages\statsmodels\stats\tests\test_knockoff.py:81: in test_testers
    RegressionFDR(y, x, tv, design_method=method)
..\try_py34\lib\site-packages\statsmodels\stats\_knockoff.py:83: in __init__
    exog1, exog2, _ = _design_knockoff_sdp(exog)
..\try_py34\lib\site-packages\statsmodels\stats\_knockoff.py:161: in _design_knockoff_sdp
    sol = solvers.sdp(c, G0, h0, [G1], [h1])
c:\users\josef\downloads\winpython-64bit-3.4.4.5qt5\python-3.4.4.amd64\lib\site-packages\cvxopt\coneprog.py:4129: in sdp
    = ds)
c:\users\josef\downloads\winpython-64bit-3.4.4.5qt5\python-3.4.4.amd64\lib\site-packages\cvxopt\coneprog.py:1396: in conelp
    misc.update_scaling(W, lmbda, ds, dz)
c:\users\josef\downloads\winpython-64bit-3.4.4.5qt5\python-3.4.4.amd64\lib\site-packages\cvxopt\misc.py:614: in update_scaling
    offsetU = ind2, offsetVt = ind2)
E   ArithmeticError: 49

cvxopt-1.1.7.dist-info, I don't see a version number in the imported cvxopt.
We might not have any CI testing for cvxopt

josef-pkt · 2018-04-28T21:25:45Z

same problem on python 3.6

after pip install of the 4.6 wheel
segfault on first run

>python
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import statsmodels.regression as smr
>>> smr.test()
Running pytest C:\WinPython64-3.6.5.0Qt5\python-3.6.5.amd64\lib\site-packages\statsmodels\regression --tb=short --disable-pytest-warnings
============================= test session starts =============================
platform win32 -- Python 3.6.5, pytest-3.5.0, py-1.5.3, pluggy-0.6.0

-> segfault in test_simulate.py

Running the statespace tests again finishes successfully without failures or errors.
When I run the same sm.test() again in a new interpreter session but the same command window, the test run finishes without problems, apart from one failure with pandas.

no cvxopt failure with cvxopt-1.1.9.dist-info

================================== FAILURES ===================================
_____________________________ test_getframe_smoke _____________________________
C:\WinPython64-3.6.5.0Qt5\python-3.6.5.amd64\lib\site-packages\statsmodels\multivariate\tests\test_factor.py:227: in test_getframe_smoke
    assert_(isinstance(ldf, pd.formats.style.Styler))
E   AttributeError: module 'pandas' has no attribute 'formats'
 1 failed, 6291 passed, 63 skipped, 2 xfailed, 100 warnings in 692.96 seconds =
>>> import pandas
>>> pandas.__version__
'0.22.0'

>>> pandas.formats
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pandas' has no attribute 'formats'
>>> import pandas.io.formats
>>> pandas.io.formats.style.Styler
<class 'pandas.io.formats.style.Styler'>

looks like a different import path in 0.22.0 :(

josef-pkt · 2018-04-28T22:27:14Z

I guess the pandas formatting test is not run on Travis and Appveyor

    # The Styler option require jinja2, skip if not available
    try:
        from jinja2 import Template
    except ImportError:
        return

I'm using pandas '0.19.1' which has pd.formats but no pd.io.formats
needs another compat PR, if the code fails and not just the unit test
But given that the assert raises an error, the call to the formatting method was without exception
ldf = res.get_loadings_frame(style='display')
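One way such a compatibility shim could look is sketched below: try the import paths in order and use the first one that exists. The helper import_first_available is hypothetical and is not the code from the eventual compat PR; the two pandas paths are the ones reported above for 0.22.0 and 0.19.1.

```python
import importlib

def import_first_available(module_names):
    """Import and return the first module in `module_names` that exists."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (module_names,))

# Hypothetical usage for the Styler check discussed above:
# style = import_first_available(["pandas.io.formats.style", "pandas.formats.style"])
# assert isinstance(ldf, style.Styler)

json_module = import_first_available(["no_such_module_xyz", "json"])
print(json_module.__name__)  # json
```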

jbrockmendel · 2018-04-28T23:07:28Z

This may be tangential, but have you tried moving test_simulate higher in the file? Elsewhere I've noticed tests that can be broken by re-ordering. If e.g. moving it causes the segfault to occur in a different test, that would be useful information.

bashtage · 2018-04-29T08:46:26Z

I could not deterministically trigger the exception. I triggered it by running the full test suite in a loop until it was hit. I instrumented the Cython code with lots of notifications and was never able to find an error. Considering this never happens on Linux or OSX, I suspect that it is either a bug in Visual Studio or MKL. The exception was always raised in an MKL DLL, which also suggests this. Ideally one would use 32 byte aligned memory, but this isn't easy with NumPy arrays, which only align on 16 bytes IIRC.
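Running a suite in a loop until it fails can be sketched as follows. The function name and the commented pytest invocation are assumptions for illustration, not the exact commands that were used.

```python
import subprocess
import sys

def run_until_failure(command, max_runs):
    """Run `command` repeatedly; return (run_number, returncode) of the first failure, or None."""
    for run_number in range(1, max_runs + 1):
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return run_number, completed.returncode
    return None

# Hypothetical invocation for a test suite:
# run_until_failure([sys.executable, "-m", "pytest", "statsmodels/tsa"], max_runs=100)

# Self-contained demonstration with a child process that exits with code 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_until_failure(failing, max_runs=5))  # (1, 3)
```

Each run starts in a fresh interpreter process, so a crash in one run cannot take down the loop itself.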

bashtage · 2018-04-29T08:47:27Z

FWIW my setup was current Cython on Python 3.6 as of October 2017, so probably NumPy 1.13.

josef-pkt · 2018-04-29T11:55:13Z

I think it's better to disable the simulate tests on Windows for the release, so users don't get a segfault as a first impression of the release when running the test suite.
And I will fix the pandas format compatibility issue.
I will leave the test failure on cvxopt. (We might have to look at it for the final release if Debian also has problems, which it had in the past with most likely buggy cvxopt.)

Then this should be fine for the release rc.
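A platform-conditional skip could look roughly like this. The sketch uses the standard-library unittest and a placeholder test class, which are assumptions and not the change that was eventually made; with pytest the analogous marker is pytest.mark.skipif.

```python
import sys
import unittest

ON_WINDOWS = sys.platform.startswith("win")

class TestSimulateSketch(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(ON_WINDOWS, "intermittent access violation on Windows")
    def test_placeholder(self):
        self.assertEqual(1 + 1, 2)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSimulateSketch)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped))  # 1 on Windows, 0 elsewhere
```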

ChadFulton · 2018-05-01T23:15:43Z

Good news, maybe. I did find an illegal memory access issue in the simulation smoother that could cause segmentation faults in test_simulate.py, and the fix is in #4580.

I don't know if this fixes the ongoing problem or not since we haven't been able to reliably replicate the Appveyor segfault, but it seems promising.

ChadFulton · 2018-05-15T13:56:58Z

Closing as it appears that #4580 fixed the problem.

ChadFulton added comp-tsa type-bug labels Jan 22, 2017

ChadFulton added this to the 0.9 milestone Feb 2, 2017

ChadFulton mentioned this issue Apr 16, 2017

Unreachable code in statespace? #3602

Closed

josef-pkt mentioned this issue Jun 2, 2017

Bisect Test case that is breaking in appveyor #3728

Closed

thequackdaddy mentioned this issue Jun 27, 2017

BLD: Make appveyor wheels as artifacts #3791

Closed

ChadFulton mentioned this issue Jul 25, 2017

BUG: State space: memory error if k_posdef > k_states #3834

Merged

josef-pkt removed this from the 0.9 milestone Apr 18, 2018

josef-pkt added this to the 0.10 milestone Apr 18, 2018

josef-pkt mentioned this issue Apr 29, 2018

TST pandas compat, skip test_simulate on WIN #4565

Merged

ChadFulton mentioned this issue May 1, 2018

BUG: State space: Eliminate sometimes invalid copy operation in simulation. #4580

Merged

ChadFulton closed this as completed May 15, 2018
