TST: test_arima_small_data_bug on current master #1046

Closed
jreback opened this Issue Aug 14, 2013 · 71 comments

@jreback
Contributor
jreback commented Aug 14, 2013

On current master, pandas 0.12.0
python 2.7.3, linux 32-bit, numpy 1.7.1

The endog is squeezed to a 0-dim array, np.array(5.000), e.g. np.array([5.0]).squeeze(), so this probably needs a check.
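
For reference, a minimal illustration of that squeeze behaviour (separate from the actual failure in the traceback below):

import numpy as np

endog = np.array([5.0]).squeeze()   # squeezing a length-1 array gives a 0-dim array
print(endog.shape, endog.ndim)      # () 0

try:
    len(endog)                      # len() of a 0-dim array raises -- the TypeError in the traceback below
except TypeError as e:
    print(e)                        # "len() of unsized object"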

======================================================================
ERROR: statsmodels.tsa.tests.test_arima.test_arima_small_data_bug
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/vagrant/statsmodels/statsmodels/tsa/tests/test_arima.py", line 1810, in test_arima_small_data_bug
    assert_raises(ValueError, mod.fit)
  File "/usr/local/lib/python2.7/dist-packages/numpy-1.7.1-py2.7-linux-i686.egg/numpy/testing/utils.py", line 1019, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 471, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/vagrant/statsmodels/statsmodels/tsa/arima_model.py", line 828, in fit
    start_params = self._fit_start_params((k_ar,k_ma,k), method)
  File "/home/vagrant/statsmodels/statsmodels/tsa/arima_model.py", line 453, in _fit_start_params
    start_params = self._fit_start_params_hr(order)
  File "/home/vagrant/statsmodels/statsmodels/tsa/arima_model.py", line 421, in _fit_start_params_hr
    coefs = GLS(endog[max(p_tmp+q,p):], X).fit().params
  File "/home/vagrant/statsmodels/statsmodels/regression/linear_model.py", line 260, in __init__
    cholsigmainv=cholsigmainv)
  File "/home/vagrant/statsmodels/statsmodels/regression/linear_model.py", line 79, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "/home/vagrant/statsmodels/statsmodels/base/model.py", line 136, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "/home/vagrant/statsmodels/statsmodels/base/model.py", line 52, in __init__
    self.data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "/home/vagrant/statsmodels/statsmodels/base/data.py", line 397, in handle_data
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst, **kwargs)
  File "/home/vagrant/statsmodels/statsmodels/base/data.py", line 78, in __init__
    self._check_integrity()
  File "/home/vagrant/statsmodels/statsmodels/base/data.py", line 246, in _check_integrity
    if len(self.exog) != len(self.endog):
TypeError: len() of unsized object

----------------------------------------------------------------------
@jreback
Contributor
jreback commented Aug 14, 2013

Looks like it was introduced after 5fbd5fe.

@jseabold
Member

Hmm, this shouldn't be the case. I saw this while fixing the bug this test was written for, and set it to raise a ValueError instead of ever trying to do an estimation with just 1 observation. Will look into it.

@jseabold
Member

Can't replicate on statsmodels master at 455510c with pandas 0.12.0-174-g9fc8636 or pandas master at 146ee99.

That commit and those after don't touch any of this code.

@jreback
Contributor
jreback commented Aug 15, 2013

I think I had another version installed (one that didn't have the recent commits), so false alarm.

@jreback jreback closed this Aug 15, 2013
@yarikoptic
Contributor

I am running into this issue with 0.5.0 while rebuilding it for a fresh Ubuntu saucy 32-bit (the 64-bit build passes tests fine). The full log is here: http://www.onerussian.com/tmp/statsmodels_0.5.0-1~nd13.04+1+nd13.10+1_i386.build

@jseabold jseabold reopened this Oct 24, 2013
@jseabold
Member

Hmm, this looks like it's limited to pandas 0.12.0 on 32-bit. I'll need to replicate it to be able to look into it, which I can't do at the moment. Patches welcome.

@yarikoptic
Contributor

Why can't you replicate? ;) Do you have any Debian or Ubuntu box? I will give you a 1-3 liner to get you going.

@jseabold
Member

Yes, I have an ubuntu box. If your solution includes chroot, please don't make me.

@yarikoptic
Contributor

LOL ;) Why not? schroot makes chrooting even easier than all the virtualenv magic.

@jseabold
Member

My last foray into getting a 32-bit environment via chroot did not go well...

I'll see if I can just build a 32-bit Python and then install the NumPy stack. I should be up and running in a few hours...

@jseabold
Member

Hmm, it looks like chroot is easier than robustly cross-compiling. sigh Ok, any hints?

@yarikoptic
Contributor
  • debootstrap the environment, instructing it right away to include pandas (which doesn't come from main -- so add the universe and multiverse components):
sudo apt-get install debootstrap schroot
CHROOT=/tmp/saucy-i386; mkdir -p $CHROOT; sudo debootstrap --arch=i386 --include=python-pandas,sudo,wget --components=main,universe,multiverse saucy $CHROOT http://debian.lcs.mit.edu/ubuntu
  • create schroot configuration for it
echo -e "[saucy-i386]\ndescription=Ubuntu saucy  i386 architecture\ndirectory=$CHROOT\ntype=directory\nusers=$USER\nroot-groups=root\n" | sudo bash -c 'cat - > /etc/schroot/chroot.d/saucy-i386'
  • enter the chroot
schroot -c saucy-i386
  • enable the NeuroDebian repository for saucy, enable the 'source' portion of the archive, and install the statsmodels build-depends:
wget -O- http://neuro.debian.net/lists/saucy.us-nh.full | sudo tee /etc/apt/sources.list.d/neurodebian.sources.list
sudo apt-key adv --recv-keys --keyserver pgp.mit.edu 2649A5A9
sudo sed -ie 's,#deb-src ,deb-src ,g' /etc/apt/sources.list.d/neurodebian.sources.list
sudo apt-get update
sudo apt-get build-dep python-statsmodels

And now you should be all set! Make sure that you clean your working tree before building extensions -- the current logic skips rebuilding existing .so's if they are up to date but built for another architecture ;-)

@jseabold
Member

Any idea how much space it takes? I'm running low and might need to do some partition adjusting.

@jseabold
Member

Also, thanks ;)

@yarikoptic
Contributor

;-) I am short on space too, and since it is an SSD I try to do such things in /tmp, which is in RAM... I will let you know when I complete this exercise.

@yarikoptic
Contributor

While installing it might take up to 900M, but then do sudo apt-get clean, which brings it down to ~600M. Meanwhile I will go and replicate it locally since I have this chroot now ;)

@yarikoptic
Contributor

yeap -- replicated perfectly

novo(saucy-i386):~/deb/gits/pkg-exppsy/statsmodels
*$> nosetests -s -v statsmodels/tsa/tests/test_arima.py:test_arima_small_data_bug
statsmodels.tsa.tests.test_arima.test_arima_small_data_bug ... ERROR

======================================================================
ERROR: statsmodels.tsa.tests.test_arima.test_arima_small_data_bug
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/tsa/tests/test_arima.py", line 1810, in test_arima_small_data_bug
    assert_raises(ValueError, mod.fit)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1019, in assert_raises
    return nose.tools.assert_raises(*args,**kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 475, in assertRaises
    callableObj(*args, **kwargs)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/tsa/arima_model.py", line 828, in fit
    start_params = self._fit_start_params((k_ar,k_ma,k), method)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/tsa/arima_model.py", line 453, in _fit_start_params
    start_params = self._fit_start_params_hr(order)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/tsa/arima_model.py", line 421, in _fit_start_params_hr
    coefs = GLS(endog[max(p_tmp+q,p):], X).fit().params
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/regression/linear_model.py", line 260, in __init__
    cholsigmainv=cholsigmainv)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/regression/linear_model.py", line 79, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/base/model.py", line 136, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/base/model.py", line 52, in __init__
    self.data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/base/data.py", line 397, in handle_data
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst, **kwargs)
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/base/data.py", line 78, in __init__
    self._check_integrity()
  File "/home/yoh/deb/gits/pkg-exppsy/statsmodels/statsmodels/base/data.py", line 246, in _check_integrity
    if len(self.exog) != len(self.endog):
TypeError: len() of unsized object

----------------------------------------------------------------------
Ran 1 test in 0.019s

FAILED (errors=1)
@jseabold
Member

This is still magic to me. Why didn't I get scipy in the chroot environment?

@jseabold
Member

Or things like vim, etc. I saw them being "retrieved".

@jseabold
Member

Yeah, I'm still pretty unclear on what all needs to be done to be able to do this. It looks like I still need to install scipy, git, vim, distribute, patsy, and cython. But I'm getting an error with Cython. It's trying to use my locally pip-installed Cython version even though I did a sudo apt-get install cython in the chroot?

running build_ext
failed to import Cython: /home/skipper/.local/lib/python2.7/site-packages/Cython/Compiler/Scanning.so: wrong ELF class: ELFCLASS64
error: Cython does not appear to be installed
@yarikoptic
Contributor

You should have gotten scipy since it is listed in the statsmodels build-depends, so just make sure you ran

sudo apt-get update
sudo apt-get build-dep python-statsmodels

inside that chroot

@yarikoptic
Contributor

Re cython -- same story. I bet you missed a build-dep step somehow ;)

@jseabold
Member
|42 $ sudo apt-get update                      
Ign http://debian.lcs.mit.edu saucy InRelease
Get:1 http://neuro.debian.net data InRelease [13.3 kB]
Hit http://debian.lcs.mit.edu saucy Release.gpg           
Hit http://debian.lcs.mit.edu saucy Release               
Ign http://neuro.debian.net data InRelease           
Get:2 http://neuro.debian.net saucy InRelease [12.3 kB]
Hit http://debian.lcs.mit.edu saucy/main i386 Packages     
Ign http://neuro.debian.net saucy InRelease                
Hit http://debian.lcs.mit.edu saucy/universe i386 Packages
Ign http://neuro.debian.net data/main i386 Packages/DiffIndex
Hit http://debian.lcs.mit.edu saucy/multiverse i386 Packages
Ign http://neuro.debian.net data/contrib i386 Packages/DiffIndex
Hit http://debian.lcs.mit.edu saucy/main Translation-en
Ign http://neuro.debian.net data/non-free i386 Packages/DiffIndex
Hit http://debian.lcs.mit.edu saucy/multiverse Translation-en
Hit http://debian.lcs.mit.edu saucy/universe Translation-en
Ign http://neuro.debian.net saucy/main i386 Packages/DiffIndex
Ign http://neuro.debian.net saucy/contrib i386 Packages/DiffIndex
Ign http://neuro.debian.net saucy/non-free i386 Packages/DiffIndex
Hit http://neuro.debian.net data/main i386 Packages
Hit http://neuro.debian.net data/contrib i386 Packages
Hit http://neuro.debian.net data/non-free i386 Packages
Hit http://neuro.debian.net saucy/main i386 Packages
Hit http://neuro.debian.net saucy/contrib i386 Packages
Hit http://neuro.debian.net saucy/non-free i386 Packages
Ign http://neuro.debian.net data/contrib Translation-en
Ign http://neuro.debian.net data/main Translation-en
Ign http://neuro.debian.net data/non-free Translation-en
Ign http://neuro.debian.net saucy/contrib Translation-en
Ign http://neuro.debian.net saucy/main Translation-en
Ign http://neuro.debian.net saucy/non-free Translation-en
Fetched 25.6 kB in 3s (7117 B/s)
Reading package lists... Done
W: GPG error: http://neuro.debian.net data InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A5D32F012649A5A9
W: GPG error: http://neuro.debian.net saucy InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A5D32F012649A5A9
[~] 
|43 $ sudo apt-get build-dep python-statsmodels
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: You must put some 'source' URIs in your sources.list
@jseabold
Member

Oh, nevermind. I see the problem.

@yarikoptic
Contributor

Here we go -- probably a cut-and-paste issue prevented getting the key and running the sed command from my instructions (which I was modifying as I was working through it myself). Look in that post please.

@yarikoptic
Contributor

BTW -- just a hint -- since /home is bind-mounted, you can still just use your regular development environment and simply use the chroot'ed shell to run the actual build and nosetests, thus avoiding the need to clone inside the chroot and/or install git/vim/... there as well.

@jseabold
Member

The problem was just that the apt-key etc. commands didn't get run when I copy-pasted them. It ran the first command that needed permissions but not the others. Re-running them one at a time doesn't seem to cause any problems.

I still have the Cython problem though. It's trying to use the one that I pip-installed.

Python 2.7.5+ (default, Sep 19 2013, 13:49:51) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cython
>>> print cython.__file__
/home/skipper/.local/lib/python2.7/site-packages/cython.pyc
>>> import sys
>>> sys.maxsize > 2 ** 32
False
@jseabold
Member

Except I need to rebuild the extensions for 32-bit, so I'm stuck.

@yarikoptic
Contributor

You could even simply start the chrooted call straight from your development environment; just precede the command with 'schroot -c saucy-i386'...

cython must be there if you ran build-dep and it returned without complaints

@jseabold
Member

It's there.

|60 $ sudo apt-get install cython; python setup.py build_ext --inplace
Reading package lists... Done
Building dependency tree       
Reading state information... Done
cython is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
running build_ext
failed to import Cython: /home/skipper/.local/lib/python2.7/site-packages/Cython/Compiler/Scanning.so: wrong ELF class: ELFCLASS64
error: Cython does not appear to be installed
@jseabold
Member

It's trying to read the 64-bit shared library file.

@yarikoptic
Contributor

Also it might be worth making sure that while in the chroot you are not using any virtualenv or .local... here you go -- why should it use your .local?

@jseabold
Member

I have no idea.

@yarikoptic
Contributor

Just mv .local aside for a moment so it doesn't interfere -- apparently ~/.local is always on sys.path there, and since python2 doesn't discriminate architectures in .so filenames, you are stuck. Thus mv ~/.local{,.aside} and then proceed.

@jseabold
Member

Works. Thanks. Replicated. I'll see what this is all about now.

@jseabold
Member

Phew. Ok, it's an underflow problem in the likelihood calculation of AR.fit. Hmm. Nothing to do with pandas.

It's actually kind of a weird perfect prediction issue in AR. You can replicate on 32-bit like so.

Any takers? The odds of this happening have to be astronomically low? Raise an error and tell the user their data sucks?

from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.ar_model import AR
from statsmodels.regression.linear_model import OLS
import numpy as np


vals = [96.2, 98.3, 99.1, 95.5, 94.0, 87.1, 87.9, 86.7402777504474]
ols_params = OLS(vals, np.ones_like(vals)).fit().params
y = vals - ols_params
maxlag = int(round(12*(len(vals)/100.)**(1/4.)))
trend = 'nc'
method = 'cmle'
ic = 'bic'

lag = 5
y_tmp = y[maxlag-lag:]
fit = AR(y_tmp).fit(maxlag=lag, method=method, trend=trend)
# this is nan
print fit.llf
# this is -inf (the most negative bic...)
print fit.bic

# here's why

self = fit.model
nobs = self.nobs
nobs
Y = self.Y
X = self.X
params = fit.params
from scipy.stats import t, norm, ss as sumofsq
ssr = sumofsq(Y.squeeze()-np.dot(X,fit.params))
sigma2 = ssr/nobs
-nobs/2 * (np.log(2*np.pi) + np.log(sigma2)) - ssr/(2*sigma2)

# oops ssr is zero (on 64-bit it's 1e-29)
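# (when ssr is exactly 0, sigma2 == 0, so np.log(sigma2) is -inf and ssr/(2*sigma2) is 0/0,
#  which is why llf comes out nan above)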
@jseabold
Member

Yes, actually, we probably should be raising. This is an over-determined system for the OLS. We probably shouldn't even be trying to estimate this. Wasn't there an issue somewhere for this?

@josef-pkt
Member

Not clear to me:
why should 64 versus 32-bit matter? Aren't we calculating with float64 in both cases?
(I ran into problems recently when I didn't realize I had gotten float32 data.)

I don't have the failure on Windows with 32-bit python but numpy 1.6.1 (and pandas 0.12.0)

@jseabold
Member

See the snippet above. Pandas doesn't matter. What does the snippet give for you? It's probably related to the underlying BLAS, maybe the numpy version, though I doubt it.

@josef-pkt
Member

We just got #1146.
We need to set a meaningful upper bound on maxlag. But I don't see why we get a perfect fit with maxlag=6 in this case.
The same problem shows up in some tsa hypothesis tests that don't work if the sample size is too small.
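
For reference, a rough sketch (a hypothetical helper, not statsmodels code) of how the default maxlag rule from the snippet above behaves for short series:

def default_maxlag(nobs):
    # same rule as in the repro snippet above: 12 * (nobs/100) ** (1/4), rounded
    return int(round(12 * (nobs / 100.) ** (1 / 4.)))

for nobs in (8, 10, 20, 50):
    print(nobs, default_maxlag(nobs))
# 8 -> 6, 10 -> 7, 20 -> 8, 50 -> 10: for very short series (nobs = 8 or 10 here)
# the default maxlag leaves fewer usable rows than lag coefficients in the
# conditional-MLE regression, hence the need for a cap relative to nobs.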

@josef-pkt
Member

that's what I get, no problem:

>>> np.__version__
'1.6.1'
>>> 
64.3367748261
-65.0952103508
>>> ssr
1.3410635388757201e-29

Can you check whether some of your data arrays are float32?

@josef-pkt
Member

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
on a 64-bit computer

@jseabold
Member

They're not float32.

[131]: X.dot(np.linalg.pinv(X).dot(Y)) == Y
[131]: 
array([[ True],
       [ True]], dtype=bool)
@jseabold
Member

Is that numpy with MKL?

@josef-pkt
Member

No, it's the plain old official binaries for numpy and scipy, built with MinGW and ATLAS.

It doesn't look like a 32-bit issue here (after running your script); I don't get nans or infs:

>>> fit = AR(y_tmp).fit(maxlag=lag, method=method, trend=trend)
>>> fit.llf
64.336774826088913
>>> fit.bic
-65.095210350818419
>>> fit = AR(np.asarray(y_tmp, np.float32)).fit(maxlag=lag, method=method, trend=trend)
>>> fit.llf
63.951112345276925
>>> fit.bic
-64.709547870006432
@jseabold
Member

Ok, well let's call it a platform-specific 32-bit issue then until someone replicates it on 64-bit. It's either a 32-bit issue or an issue with the underlying lapack svd. I suspect the former unless all of these systems are using the same un-optimized system lapack.

@josef-pkt
Member

Do you know where the nan and inf are created?
If it were the SVD, then I think you should see a "SVD not converged" exception. (At least I saw it very often recently with almost singular arrays).

Something must have changed recently, because this test didn't fail with the 0.5.0 release on Ubuntu 32-bit pythonxy testing. (I'm pretty sure about this because I checked that everything runs clean.)

I would prefer to know what is messing this up, instead of ignoring it by adjusting the test case or working around it.

@jseabold
Member

The nans are created because ssr is 0.0.

@josef-pkt
Member

In pythonxy https://code.launchpad.net/~pythonxy/+recipe/statsmodels-daily-current
it shows up with saucy and trusty but not with raring (32-bit builds). (I have no idea about the Ubuntu versions.)

@jseabold
Member

It's either a numpy or lapack version issue then I guess. I'm not too terribly interested in digging down into this / spending more time.

@josef-pkt
Member

Can you look at the other two pandas tslib errors on Ubuntu pythonxy?
Are they temporary (pandas daily) or serious?

@josef-pkt
Member

It's either a numpy or lapack version issue then I guess. I'm not too terribly interested in digging down into this / spending more time.

Ok, I'm not using Ubuntu or Debian, and whatever changed doesn't affect Windows (so far).

@jseabold
Member

I can reproduce the pandas errors on master.

@josef-pkt
Member

I just started to understand X.dot(np.linalg.pinv(X).dot(Y)) == Y

It's underdetermined (not overdetermined). We have exact perfect prediction on the failing machines, and almost (1e-15) perfect prediction on the machines where the test doesn't fail.
No reason to worry about the linalg then.

It's mainly an issue to avoid underdetermined or exactly determined systems (or handle them correctly if we accept that they show up).

just restrict maxlag and refuse to calculate if we don't have enough observations?
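
A minimal sketch of that diagnosis, using a made-up 2x5 system (fewer equations than unknowns, standing in for the failing lag regression; this is not the actual statsmodels code path):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(2, 5)                  # 2 equations, 5 unknowns: underdetermined
Y = rng.randn(2)

coefs = np.linalg.pinv(X).dot(Y)     # minimum-norm least-squares solution
ssr = np.sum((Y - X.dot(coefs)) ** 2)
print(ssr)                           # exactly 0.0 or ~1e-30, depending on the BLAS/LAPACK

# When ssr comes out exactly zero, the conditional log-likelihood degenerates:
# log(sigma2) is -inf and ssr/(2*sigma2) is 0/0.
nobs = len(Y)
sigma2 = ssr / nobs
llf = -nobs / 2. * (np.log(2 * np.pi) + np.log(sigma2)) - ssr / (2 * sigma2)
print(llf)                           # nan on the machines where ssr is exactly zero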

@jseabold
Member

Right. Underdetermined. Think-o. This is what I was saying, and yes, I think we should just refuse to do these in the OLS code unless there's an actual use case (and I recall discussing this before). Regardless, this code should never get to this point, so that's a separate issue.

@jseabold
Member

Closed in #1149.

@jseabold jseabold closed this Oct 25, 2013
@yarikoptic
Contributor

Cool! Is a new release coming, or should I cherry-pick this one for 0.5.0 to get it built/tested on 32-bit?


@jseabold
Member

I'm going to cut 0.5.1 soon (next few days).

@josef-pkt
Member

Next week.
We still need a round of version and platform compatibility testing, although with Ubuntu, Debian and TravisCI testing already, I expect at most minor problems.

@yarikoptic
Contributor

And do not forget to look at
http://nipy.bic.berkeley.edu/waterfall?category=statsmodels
which seems to be in the red ATM :-/
It seems to me that some of the statsmodels.stats.tests.test_diagnostic.TestDiagnosticGPandas tests have precision issues.


@josef-pkt
Member

@yarikoptic I don't know what you did to your Debians to make our tests fail. :)
I saw it recently #1113

@yarikoptic
Contributor

Heh heh @josef-pkt -- it is all your fault that now I have "wasted" time on

  • a cruel shell script
    https://github.com/yarikoptic/bblogs2git
    to import all the buildbot logs into Git, so we could easily inspect what has changed between builds without having to enjoy buildbot's web UI
  • running it on nibotmi logs for statsmodels and pandas to see how useful it would be
    https://github.com/yarikoptic/nibotmi-logs
    so you could now do something like gitk statsmodels-py2_x-sid-sparc to quickly see how the build/environment differed when it failed, this time in yarikoptic/nibotmi-logs@535024e, with all the changes in versions nicely shown thanks to that print_versions ;)

@matthew-brett: how do you like my script? ;) What about polishing it a bit more and placing it on a cron job to add logs and push to GitHub under nipy/nibotmi-logs? ;-)

BTW git repack is crucial -- .git went from 128M to 8M

@josef-pkt
Member

enjoying your Friday afternoons? :)

I thought it's too difficult to read but I guess it's just updating Atlas
yarikoptic/nibotmi-logs@535024e#diff-ab7e66f8f4eaa929f8018916e8d8239eL1468

Looks useful for more serious changes; in this case we just lower the decimal requirement in the test.

BTW git repack is crucial -- .git went from 128M to 8M

what does that mean? repack on github?

@josef-pkt
Member

I thought it's too difficult to read but I guess it's just updating Atlas

More likely updating scipy; there is no linalg involved in the lilliefors normality tests. Or it could be anything else that causes small floating-point changes.

useful overview page:
https://github.com/yarikoptic/nibotmi-logs/commits/master/statsmodels-py2_x-sid-sparc

Looks very useful, Thanks @yarikoptic

@matthew-brett
Contributor

Yo yoh - sorry - for some reason the first @matthew-brett ping didn't reach me.

Won't this repo get pretty big if we import all the builds?

@yarikoptic
Contributor

@matthew-brett -- only 'experience' will show -- I have started that script straight on the nibotme box, and will report back what the end size will be.

@yarikoptic
Contributor

@matthew-brett upon aggressive repacking (git repack -a -d -f --window=100) .git is 50M with 12339 commits, starting from [PASSED] 0 nibabel-py2_6 from 2011-08-19 17:52:38.000000000 -0700 ... so it seems it should be fine for quite a few years, and when it reaches some ceiling we could start thinking about an alternative approach (e.g. per project). Or do you think it should be done right away?

@matthew-brett
Contributor

Do you know how the local packing corresponds to the github packing? I guess if it's packed locally it gets sent and received as packfiles? So - the same? If so - yes - that should be fine.

I should say that I offered matplotlib space on the buildbots, and we should probably put numpy on there. Well - we can see later, as you say.

@yarikoptic
Contributor

On Sat, 26 Oct 2013, Matthew Brett wrote:

> Do you know how the local packing corresponds to the github packing?

nope -- I do not know but would assume that they should correspond

> I guess if it's packed locally it gets sent and received as packfiles? So - the same? If so - yes - that should be fine.

I would assume the same too

> I should say that I offered matplotlib space on the buildbots, and we should probably put numpy on there. Well - we can see later, as you say.

Cool -- so I will look into deploying this beast. Meanwhile I have initiated https://github.com/nipy/nibotme-logs ... thought to push right away but upon rerun spotted a possible problem with sorting of the logs -- will check later, fix, report and only then push

Cheers!


@jseabold jseabold added a commit that referenced this issue Nov 23, 2013
@jseabold jseabold Backport PR #1149: BUG: Fix small data issues for ARIMA.
This closes #1146 and #1046.

1. We now check to make sure that we have at least one degree of freedom to estimate the problem. If so, then we try the estimation.
   1. Most / all of these estimations will return garbage. We have an extra check that we can estimate stationary initial params. Usually we can't in these cases, so the usual error will be raised here asking to set start_params. This should be enough of a warning to the user that this is "odd." If, by some small chance, the estimation goes through for a model with 5 observations and 1 degree of freedom, it's then on the user to determine that the results are no good.
2. We now avoid maxlag >= nobs in the call to AR, which avoids the problem of #1046 that also presented itself as part of #1146.
1eba381
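
A rough sketch (hypothetical names, not the actual code from #1149) of the two kinds of checks the commit message describes: a degrees-of-freedom guard before attempting estimation, and keeping maxlag strictly below nobs:

def check_estimable(nobs, k_params):
    # hypothetical guard: refuse to fit when no degrees of freedom are left
    if nobs - k_params < 1:
        raise ValueError("Insufficient degrees of freedom to estimate the model: "
                         "nobs=%d, k_params=%d" % (nobs, k_params))

def bounded_maxlag(nobs):
    # hypothetical cap on the default maxlag rule so that maxlag < nobs
    default = int(round(12 * (nobs / 100.) ** (1 / 4.)))
    return min(default, nobs - 1)

print(bounded_maxlag(5))   # 4, instead of the unbounded 6 >= nobs

# Example: an ARMA(2, 2) with a constant on 5 observations has 5 parameters,
# hence 0 degrees of freedom, so the guard raises before any start-params
# regression (like the failing GLS call in the tracebacks above) is attempted.
try:
    check_estimable(nobs=5, k_params=5)
except ValueError as e:
    print(e)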