
ENH: add lbfgs for fitting #1147

Merged: 18 commits from argriffing:add-lbfgs-fit into statsmodels:master, Oct 28, 2013

Conversation

@argriffing commented Oct 24, 2013

I tried to add lbfgs as a fitting method, in anticipation of eventually using the L-BFGS-B simultaneous f(x) and f'(x) evaluation interface, but the changes in this commit do not pass the unit tests.

bounds = [(None, None)] * len(start_params)
retvals = optimize.fmin_l_bfgs_b(f, start_params, fprime=score, args=fargs,
                                 bounds=bounds, m=m, factr=factr, pgtol=pgtol,
                                 epsilon=epsilon, maxfun=maxfun, maxiter=maxiter,
                                 disp=disp, callback=callback)

@jseabold (Member) commented Oct 25, 2013

What's our minimum scipy requirement these days? 0.9.0 is almost 3 years old.

At the very least, I'd wrap the call in a try/except and fall back to the 0.9.0 behavior without maxiter and callback, perhaps raising a warning if they're not None.

@josef-pkt (Member) commented Oct 25, 2013

I was thinking about increasing the scipy version requirement > 0.9, but I'd rather stick, for the main statsmodels functionality, with a scipy version that's available by default on Ubuntu and TravisCI.
(python 2.6 is 5 years old)

I switched away from using scipy 0.9.0 as my default development version, but haven't looked yet at what we would like to use from newer scipy besides optimization. There is also rank-revealing QR in linalg (scipy >= 0.10, IIRC).

BTW: wheels are coming for Linux scipy/scipy#3020

@argriffing (Author) commented Oct 25, 2013

I plan to remove these two args for compatibility, but I almost want to suggest that if people want to use an old version of scipy then they can use an old version of statsmodels too! According to http://packages.ubuntu.com/search?keywords=python-scipy the current Ubuntu ('saucy') uses scipy version 0.12.0.

@jseabold (Member) commented Oct 25, 2013

I say keep them; have the code look something like this:

try:
    fmin_l_bfgs_b(..., maxiter=maxiter, callback=callback)
except TypeError:
    if maxiter is not None or callback is not None:
        from warnings import warn
        warn("fmin_l_bfgs_b does not support maxiter or callback arguments. "
             "Update your scipy, otherwise they have no effect.", UserWarning)
    fmin_l_bfgs_b(...)
@josef-pkt (Member) commented Oct 25, 2013

I also think Skipper's proposal is the best way.

In general: statsmodels is much easier to build than scipy. I don't think 3 years is old enough to just drop support without strong reasons.

MAINT: improve compatibility with old scipy versions, but some complex numbers are sneaking in and causing tests to fail

@argriffing (Author) commented Oct 25, 2013

I just commented them out and added a note, for less clutter. This change also reduced the number of locally failing ARIMA tests from 5 or 6 down to only 1. The failing test seems to be caused by a complex number sneaking in, and some passing tests also complain about complex numbers.

@@ -846,6 +846,8 @@ def fit(self, order=None, start_params=None, trend='c', method = "css-mle",
         if transparams: # transform initial parameters to ensure invertibility
             start_params = self._invtransparams(start_params)

+        # NOTE: after having added 'lbfgs' to the list of fitting methods,
+        # the solver-is-None branch should no longer be necessary
         if solver is None: # use default limited memory bfgs
             bounds = [(None,)*2]*(k_ar+k_ma+k)
             pgtol = kwargs.get('pgtol', 1e-8)

@jseabold (Member) commented Oct 25, 2013

I would set these defaults here, and not in the fmin_bfgs wrapper function. The defaults with the optimizer are probably good enough, I just found these to work in most cases for ARIMA.

@argriffing (Author) commented Oct 25, 2013

This is a dumb python question, but how would I let fmin_l_bfgs_b use its internal defaults while also allowing these args to be optionally specified by the caller through kwargs?

@jseabold (Member) commented Oct 25, 2013

Use if solver == 'lbfgs' in the above, and then just remove the get-default stuff that is in _fit_mle_lbfgs.

@jseabold (Member) commented Oct 25, 2013

You're essentially doing this now. I just want it so that the defaults for ARIMA are different from the default defaults. Right now it's written so that the defaults are always what they are in ARIMA now. Does that make sense?

@argriffing (Author) commented Oct 25, 2013

I did something that might address this. I am a little bit confused about the function call chains that modify and pass through the **kwargs. I guess this is common to any code that works with options which can be overridden at multiple levels.
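
To illustrate the pattern under discussion — a minimal, hypothetical sketch, not the actual statsmodels code: the wrapper forwards only the options the caller actually set, so fmin_l_bfgs_b's own defaults apply otherwise, while a model like ARIMA injects its preferred defaults before calling down. Names like _fit_mle_lbfgs_sketch and the toy objective are illustrative.

import numpy as np
from scipy import optimize

def _fit_mle_lbfgs_sketch(f, start_params, **kwargs):
    # forward only the options the caller actually set; anything absent
    # falls back to fmin_l_bfgs_b's own internal defaults
    extra = {}
    for name in ('m', 'pgtol', 'factr', 'maxfun'):
        if name in kwargs:
            extra[name] = kwargs[name]
    return optimize.fmin_l_bfgs_b(f, start_params, approx_grad=True, **extra)

def f(params):                         # toy objective standing in for -loglike
    return np.sum((params - 1.0) ** 2)

# ARIMA-level call site: inject ARIMA-preferred defaults unless the user
# already supplied them, then call down
user_kwargs = {'maxfun': 500}          # options the user actually passed
user_kwargs.setdefault('pgtol', 1e-8)
user_kwargs.setdefault('factr', 100.0)
xopt, fval, info = _fit_mle_lbfgs_sketch(f, np.zeros(3), **user_kwargs)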

alexbrc added some commits Oct 25, 2013

MAINT: for model fitting, let lbfgs use its own fprime approximation instead of passing a separate function that uses finite differences

@argriffing (Author) commented Oct 25, 2013

The tests are failing because somewhere a variance is negative, and this gives imaginary numbers when you try to take its square root to compute the standard deviation. I know nothing about econometrics, so I'm having a hard time tracking it down.

@jseabold (Member) commented Oct 25, 2013

It's unclear to me why anything should be different given where the lbfgs code is called from. Has anything else changed?

@argriffing (Author) commented Oct 25, 2013

If mlefit = super(ARMA, self).fit(...) does something more magical than calling the new lbfgs mle function, then this could cause a difference in the behavior.

@jseabold (Member) commented Oct 25, 2013

Yeah, let me have a look at this.

@jseabold (Member) commented Oct 25, 2013

I found the problem. We have this code in LikelihoodModel.fit after the fitting is done.

Hinv = np.linalg.inv(self.hessian(xopt))

Since the Hessian of the model is done with the complex-step derivative, it passes complex params to the loglikelihood of the KalmanFilter. This method sets sigma2. That's where the complex sigma2 comes from. Path of least resistance is just to add a keyword to loglike in kalmanfilter.py on whether or not to set sigma2. By default it should be true, but when either score or hessian calls this likelihood, then it should be false. Make sense?
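
For context, a minimal sketch of the complex-step idea — illustrative only, not statsmodels' actual approx_fprime_cs/approx_hess_cs from statsmodels.tools.numdiff: the objective is evaluated at complex points, so any state the objective stores as a side effect (sigma2 here) silently becomes complex.

import numpy as np

def complex_step_grad(f, x, h=1e-20):
    # the imaginary part of f(x + i*h*e_j) / h approximates df/dx_j
    # with no subtractive cancellation
    grad = np.empty(len(x))
    for j in range(len(x)):
        xc = x.astype(complex)
        xc[j] += 1j * h
        grad[j] = f(xc).imag / h   # note: f receives *complex* params here
    return grad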

@josef-pkt (Member) commented Oct 25, 2013

You could just replace sigma2 by sigma2.real at the end of fit. Or is this too late?

@jseabold (Member) commented Oct 25, 2013

I dunno. Makes me uncomfortable. I'm already uncomfortable that this only showed up in 1 failing bug fix test.

@jseabold (Member) commented Oct 25, 2013

Also, make sure that you change solver=None to solver='lbfgs' in the other fit method as well.

@argriffing (Author) commented Oct 25, 2013

> Also, make sure that you change solver=None to solver='lbfgs' in the other fit method as well.

@jseabold do you mean the ARIMA model (in addition to the ARMA model for which the solver has already been changed)? [EDIT: assuming you meant this one, I've made this change in https://github.com/argriffing/statsmodels/commit/5fe272685cba0d5634683054b1ecffac561f9c92]

@argriffing (Author) commented Oct 25, 2013

> Path of least resistance is just to add a keyword to loglike in kalmanfilter.py on whether or not to set sigma2. By default it should be true, but when either score or hessian calls this likelihood, then it should be false. Make sense?

Could this go into a subsequent PR after this present one is merged? I made these changes.

alexbrc added some commits Oct 26, 2013

MAINT: add keyword to the kalman filter log likelihood calculation function, reducing the amount of side effects of the function when False is passed

@argriffing (Author) commented Oct 26, 2013

@jseabold I implemented your suggestions and now the tests pass.

@coveralls commented Oct 26, 2013

Coverage remained the same when pulling 3c763ae on argriffing:add-lbfgs-fit into 1bc6576 on statsmodels:master.

-        return approx_fprime_cs(params, loglike)
+        def pure_loglike(params):
+            return self.loglike(params, set_sigma2=False)
+        return approx_fprime_cs(params, pure_loglike)

@jseabold (Member) commented Oct 26, 2013

approx_fprime_cs takes args and kwargs arguments. You could just do:

return approx_fprime_cs(params, loglike, args=(False,))

-        return approx_hess_cs(params, loglike)
+        def pure_loglike(params):
+            return self.loglike(params, set_sigma2=False)
+        return approx_hess_cs(params, pure_loglike)

@jseabold (Member) commented Oct 26, 2013

Same comment.

@jseabold (Member) commented Oct 26, 2013

Overall this looks good. If you add the try/except block into the fit_mle_lbfgs, it should be about ready to merge.

@argriffing (Author) commented Oct 27, 2013

I removed the warning. Personally I would rather explicitly check scipy versions (using LooseVersion or something) because that's what we are really doing, and catching the type error as a scipy version proxy could mask type errors inside the fmin call.
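
A sketch of that explicit check, as it might look at the wrapper's call site (hypothetical surrounding names; 0.12.0 is assumed to be the version that added maxiter and callback to fmin_l_bfgs_b and should be verified against scipy's changelog):

from distutils.version import LooseVersion
import scipy
from scipy import optimize

# pass maxiter/callback only when the installed scipy supports them
if LooseVersion(scipy.__version__) >= LooseVersion('0.12.0'):
    retvals = optimize.fmin_l_bfgs_b(f, start_params, fprime=score,
                                     maxiter=maxiter, callback=callback)
else:
    retvals = optimize.fmin_l_bfgs_b(f, start_params, fprime=score)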

@jseabold (Member) commented Oct 27, 2013

I'm fine with that. I worried a bit about just catching TypeErrors too.

@josef-pkt (Member) commented Oct 27, 2013

Checking the scipy version is also fine.

BTW: running the test suite on current master I also get a runtime warning related to L-BFGS-B that might be resolved by this PR as a side effect?
C:\Programs\Python27\lib\site-packages\scipy\optimize\_minimize.py:297: RuntimeWarning: Method L-BFGS-B does not use Hessian information (hess)

@coveralls commented Oct 27, 2013

Coverage remained the same when pulling 7697b6d on argriffing:add-lbfgs-fit into 1bc6576 on statsmodels:master.

@jseabold (Member) commented Oct 27, 2013

It's fixed in #1121.

@argriffing (Author) commented Oct 27, 2013

I've added the explicit scipy version check. As far as I know, the only remaining weirdness about this PR is the complex sigma2 which I guess @jseabold is investigating.

@jseabold (Member) commented Oct 27, 2013

Ah, right. I saw this earlier but forgot to comment. The loglike_css method also needs the set_sigma2 keyword.

@coveralls commented Oct 27, 2013

Coverage remained the same when pulling 713acdf on argriffing:add-lbfgs-fit into 1bc6576 on statsmodels:master.

@jseabold (Member) commented Oct 27, 2013

Let me check a bit more first.

@jseabold (Member) commented Oct 27, 2013

Yeah, this should do it:

diff --git a/statsmodels/tsa/arima_model.py b/statsmodels/tsa/arima_model.py
index f365295..fed0540 100644
--- a/statsmodels/tsa/arima_model.py
+++ b/statsmodels/tsa/arima_model.py
@@ -670,7 +670,7 @@ class ARMA(tsbase.TimeSeriesModel):
         if method in ['mle', 'css-mle']:
             return self.loglike_kalman(params, set_sigma2)
         elif method == 'css':
-            return self.loglike_css(params)
+            return self.loglike_css(params, set_sigma2)
         else:
             raise ValueError("Method %s not understood" % method)

@@ -680,7 +680,7 @@ class ARMA(tsbase.TimeSeriesModel):
         """
         return KalmanFilter.loglike(params, self, set_sigma2)

-    def loglike_css(self, params):
+    def loglike_css(self, params, set_sigma2=True):
         """
         Conditional Sum of Squares likelihood function.
         """
@@ -705,7 +705,8 @@ class ARMA(tsbase.TimeSeriesModel):

         ssr = np.dot(errors,errors)
         sigma2 = ssr/nobs
-        self.sigma2 = sigma2
+        if set_sigma2:
+            self.sigma2 = sigma2
         llf = -nobs/2.*(log(2*pi) + log(sigma2)) - ssr/(2*sigma2)
         return llf

@argriffing (Author) commented Oct 27, 2013

> Yeah, this should do it

done

@jseabold (Member) commented Oct 27, 2013

Great. Thanks for seeing this through. I think it's a nice addition; it's been long on my TODO list.

@coveralls commented Oct 27, 2013

Coverage remained the same when pulling c8dc2a2 on argriffing:add-lbfgs-fit into 1bc6576 on statsmodels:master.

@argriffing (Author) commented Oct 28, 2013

So this can be merged then? After it's merged, I'd like to add simultaneous loglike and score. Although my eventual target is MNLogit with 'big data', MNLogit is not so great for testing this feature because lbfgs is not the best way for fitting models/data that are small enough to be good test cases. Could you guys suggest a better model/data for testing this feature (loglike+score)? @jseabold had mentioned that ARIMA has very similar loglike and score calculations. Do you have links/references to the formulas?
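
For reference, the simultaneous-evaluation interface being targeted — a minimal sketch with a toy objective: when fprime is omitted and approx_grad is left False, fmin_l_bfgs_b expects func to return (value, gradient), so shared work is computed once per iteration.

import numpy as np
from scipy import optimize

def loglike_and_score(params):
    # hypothetical joint evaluation: value and gradient share the same
    # expensive intermediate computation
    resid = params - np.arange(3.0)   # toy stand-in for the shared work
    llf = -0.5 * np.dot(resid, resid)
    score = -resid
    return -llf, -score               # negated: L-BFGS-B minimizes

xopt, fval, info = optimize.fmin_l_bfgs_b(loglike_and_score, np.zeros(3))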

@jseabold (Member) commented Oct 28, 2013

This is the best reference I used:

http://www.amazon.com/Series-Analysis-Methods-Statistical-Science/dp/019964117X

I don't have it at the moment, but there should be a section on computing the derivatives while you compute the likelihood for the Kalman Filter ARMA model. If not, I'd try googling.

@jseabold (Member) commented Oct 28, 2013

Ok by me to merge.

@josef-pkt (Member) commented Oct 28, 2013

> Could you guys suggest a better model/data for testing this feature (loglike+score)?

For setting up the pattern/support, a simple model like Logit might be easiest. However, AR(I)MA is most likely the one where it would currently have the most performance impact.

There are some differences:
In Logit and other discrete models we have explicit gradient/score calculations, so we only need to calculate them jointly and connect those to the optimizer (see the sketch below).
In AR(I)MA the score uses numerical differentiation. For numerical derivatives, we would have to add the additional returns back into the numdiff functions to calculate results jointly (we dropped them at some point).
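
To make the Logit case concrete — a hedged sketch with synthetic data, not statsmodels' actual implementation: the fitted probabilities are the shared work between the log-likelihood and the analytic score, so computing both in one pass avoids a second evaluation.

import numpy as np
from scipy import optimize

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                     # synthetic design matrix
y = (rng.rand(100) < 0.5).astype(float)   # synthetic 0/1 outcomes

def logit_loglike_and_score(params):
    # fitted probabilities are shared between value and gradient
    p = 1.0 / (1.0 + np.exp(-np.dot(X, params)))
    llf = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    score = np.dot(X.T, y - p)
    return -llf, -score                   # negated for minimization

params_hat, fval, info = optimize.fmin_l_bfgs_b(
    logit_loglike_and_score, np.zeros(3))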

@jseabold (Member) commented Oct 28, 2013

Re: ARIMA. You would drop the numerical calculations completely. You can compute the score in the same Kalman Filter loop as the likelihood by adding a few lines, so the exact derivative and log-likelihood could be returned with only one loop. I don't recall the details but it should be in the Durbin and Koopman book.

@argriffing (Author) commented Oct 28, 2013

> If not, I'd try googling.

I asked instead of googling, because I assume that there are dozens or hundreds of variants or alternative parameterizations of the model, and I wanted to be sure to get the one that matches the log likelihood that is implemented in statsmodels. I just hiked to the library and checked out the Durbin and Koopman book, so maybe it will work!

> Ok by me to merge.

I have no power here, but maybe @josef-pkt would be willing to click the button...

@jseabold (Member) commented Oct 28, 2013

Yeah, let me know. It should be pretty straightforward even to just do the math, but that will help you figure out what's what. Let me know if you don't have any luck and I'll see if I can find some other references, but I don't have much time to spend on this right now. It might be easier to do the Logit first since the analytic derivatives are already done. The MNLogit derivatives are a little hairy. Took me a while to get all the axes right.

josef-pkt added a commit that referenced this pull request Oct 28, 2013

Merge pull request #1147 from argriffing/add-lbfgs-fit
ENH: add lbfgs for fitting

@josef-pkt merged commit 915abdc into statsmodels:master on Oct 28, 2013. 1 check passed: the Travis CI build passed.

@josef-pkt (Member) commented Oct 28, 2013

merged

Thanks @argriffing

@argriffing (Author) commented Oct 29, 2013

> Yeah, let me know. It should be pretty straightforward even to just do the math

Sorry, I got lost within all of the kalman and AR* code, so I tested with MNLogit instead.

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
