Segfault when running statsmodels.tsa.stattools.adfuller #4703
Comments
50,000 nobs might be too large for the default lag search. How long does adfuller run before segfaulting, i.e. very short or several seconds? inf and nan can cause errors in the LAPACK library, but if you don't have any in your data, then something is going wrong during the lag search, I guess.
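If the default lag search is the issue, one thing to try is capping `maxlag` explicitly. A minimal sketch with synthetic data standing in for the real series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Synthetic stand-in for the real 50,000-point series
x = np.random.randn(50000)

# Cap the lag search instead of letting autolag scan its full default range
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(x, maxlag=20, autolag="AIC")
print(stat, pvalue, usedlag)
```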
It segfaults almost instantly. I tried running two versions of the script to test this: the first version does involve creating a connection to HDFS and using it to read the data; the second does not.
Can you access information about the data just before it goes into adfuller, e.g. using pdb or printing some information?
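For instance, a small helper like this (hypothetical, not from the thread) could be dropped in right before the call to show whether anything odd is going into adfuller:

```python
import numpy as np

def describe(data):
    """Print basic facts about the input right before the adfuller call."""
    arr = np.asarray(data, dtype=float)
    print(type(data), arr.dtype, arr.shape)
    print("all finite:", bool(np.isfinite(arr).all()))
    print("min/max:", arr.min(), arr.max())

describe([0.5, 1.2, -3.0, 2.7])  # stands in for the real series
```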
It's definitely not the data that's the problem -- I checked for NaNs, infinities, etc. All the values themselves are fine, even when I just load them in and inspect them directly. Update: I've narrowed the issue down to the creation of the HDFS connection. Without the line that initializes the connection, the segfault doesn't happen.
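A minimal sketch of that narrowing-down, assuming the connection is created with pyarrow's `hdfs.connect` (the exact code is not preserved in the thread, and the host and port below are placeholders):

```python
import numpy as np
import pyarrow as pa
from statsmodels.tsa.stattools import adfuller

x = np.random.randn(50000)  # values are fine; stands in for the real series

# With this line present, the adfuller call below segfaults; with it
# commented out, everything runs.  "namenode"/8020 are placeholders.
fs = pa.hdfs.connect("namenode", 8020)

print(adfuller(x)[:2])
```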
Are the data coming from that HDFS? Another possible check is to make a copy of the data with numpy before calling adfuller.
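One way to make such a copy; the specific call here is an assumption, not the exact snippet from the comment:

```python
import numpy as np

data = [1.0, 2.5, 3.7]  # stands in for the series as loaded
# Fresh, contiguous float64 copy, detached from whatever buffer the
# HDFS/pyarrow machinery handed back
x = np.array(data, dtype=np.float64, copy=True)
```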
When I ran the debugging tests, the data was coming straight from a local file. The debugging script just loads the values, creates the HDFS connection, and calls adfuller. When I leave the connection line out, the script runs fine. I do get a warning when I initiate the HDFS connection. Another update: the adfuller call still causes the segfault. Does statsmodels interact with Java at all behind the scenes? Maybe that has something to do with it?
adfuller is a pure numpy-based in-memory computation; the large parts of the actual computation are in the Fortran linalg libraries, which might be OpenBLAS on your system. For example, you could try to replace the adfuller call with a direct call into the same linalg routines. I don't know anything about Hadoop, but it doesn't look like it uses a standard CPython backend.
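A sketch of what such a substitute might look like, assuming the goal is just to exercise the same BLAS/LAPACK routines at a similar problem size (the exact replacement suggested above is not preserved, so this is only a guess at the spirit):

```python
import numpy as np

rng = np.random.RandomState(0)
y = rng.randn(50000)
X = np.column_stack([np.ones(50000), rng.randn(50000, 20)])

# An OLS solve of roughly the size adfuller's lag regressions would run;
# if this also segfaults, the problem is in the BLAS/LAPACK layer itself
beta, resid, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(beta[:3])
```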
So it looks like it's OpenBLAS. I'm running all of this inside an Ubuntu 16.04 Docker image (inside a Kubeflow/JupyterHub pod), for context, so some configs might be different. I'm not too familiar with the inner workings of numpy myself. I guess my next step is to see if I can find a way to get a full backtrace to figure out where exactly things are going wrong.
I was going to ask, based on the gdb output, whether you were using Jython or something. IIRC Hadoop uses Java.
I've never gotten that to give me anything useful personally, but supposedly it can be done.
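Whatever tool the comment was referring to, one standard-library option for getting a Python-level traceback out of a hard crash is faulthandler; whether that is what was meant here is an assumption. A minimal sketch:

```python
import faulthandler

# Dump a Python-level traceback to stderr if the process receives SIGSEGV;
# equivalently, run the script with PYTHONFAULTHANDLER=1 set in the environment
faulthandler.enable()

# ... the adfuller call would go after this point ...
```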
What happens when you run adfuller on freshly generated random data of the same length? And then what do you get with your real data copied into a plain numpy array? If this works, then the problem is somewhere else. It is possible that another extension module (pyarrow) is clobbering some memory, which leads to the segfault.
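A sketch of those two checks; the exact snippets are not preserved above, so the details here are assumptions:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.RandomState(0)

# Check 1: adfuller on freshly generated data of the same length
print(adfuller(rng.standard_normal(50000))[:2])

# Check 2: the real series copied into a plain float64 numpy array
# (the random data here is only a placeholder for it)
real = rng.standard_normal(50000)
print(adfuller(np.array(real, dtype=float, copy=True))[:2])
```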
@bashtage both those scenarios worked just fine. It must be something related to the pyarrow/HDFS side, then.
@ChadFulton @josef-pkt closeable.
Thanks @bashtage
Original issue
Hi, I'm trying to run the adfuller stationarity test on a 50,000-length sequence, but keep getting a segmentation fault on the adfuller() call. I've tried passing the input in as a pandas Series and as a regular Python list. There are no NaN/infinite values in the sequence. Besides the input sequence, the other arguments were left at their defaults. The error message is just Segmentation fault (core dumped). When I run it through GDB, I get the same crash. I know for a fact that the segfault occurs during the call to adfuller(). Any thoughts on what could be going wrong? I'm running this in a Conda Python 3.6 environment on Ubuntu 16.04.
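For reference, the call described above amounts to something like this (a reconstruction with synthetic data, not the reporter's exact script):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

series = pd.Series(np.random.randn(50000))  # stands in for the real data

# All other arguments left at their defaults, as described above
result = adfuller(series)
print(result)
```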