ENH: add lbfgs for fitting #1147

Merged: 18 commits, Oct 28, 2013 (the diff below shows changes from the first 9 commits)

Commits
75ee789
ENH: add lbfgs for fitting, but testing shows arima errors
alexbrc Oct 24, 2013
73991e5
MAINT: improve compatibility with old scipy versions, but some comple…
alexbrc Oct 25, 2013
1b1b709
MAINT: more flexibility in setting parameters for fitting
alexbrc Oct 25, 2013
cb6bbb4
MAINT: for model fitting, let lbfgs use its own fprime approximation …
alexbrc Oct 25, 2013
5fe2726
MAINT: change another fitting default solver from None to 'lbfgs'
alexbrc Oct 25, 2013
4610c28
MAINT: add keyword to the kalman filter log likelihood calculation fu…
alexbrc Oct 26, 2013
3c763ae
MAINT: avoid side effects when computing log likelihood for the estim…
alexbrc Oct 26, 2013
f3e27c8
MAINT: use a mysterious args=(False,) argument instead of using neste…
alexbrc Oct 26, 2013
9ed378d
MAINT: add conditional code to support varying versions of scipy fmin…
alexbrc Oct 26, 2013
3b2c104
MAINT: default maxiter 35 -> 50
alexbrc Oct 27, 2013
2c6ac4c
MAINT: prune dead conditional path
alexbrc Oct 27, 2013
79fa713
MAINT: track changes to maxiter among layers of code
alexbrc Oct 27, 2013
6f47919
MAINT: add untested bounds for lbfgsb fitting, and change names from …
alexbrc Oct 27, 2013
b9908f1
MAINT: change the name back
alexbrc Oct 27, 2013
060248b
MAINT: finish changing name back
alexbrc Oct 27, 2013
7697b6d
MAINT: remove a warning
alexbrc Oct 27, 2013
713acdf
MAINT: explicitly check scipy version instead of checking for TypeError
alexbrc Oct 27, 2013
c8dc2a2
MAINT: avoid storing complex sigma2 during finite-differences score a…
alexbrc Oct 27, 2013
117 changes: 100 additions & 17 deletions statsmodels/base/model.py
@@ -186,33 +186,39 @@ def fit(self, start_params=None, method='newton', maxiter=100,
start_params : array-like, optional
Initial guess of the solution for the loglikelihood maximization.
The default is an array of zeros.
method : str {'newton','nm','bfgs','powell','cg','ncg','basinhopping'}
Method can be 'newton' for Newton-Raphson, 'nm' for Nelder-Mead,
'bfgs' for Broyden-Fletcher-Goldfarb-Shanno, 'powell' for modified
Powell's method, 'cg' for conjugate gradient, 'ncg' for Newton-
conjugate gradient or 'basinhopping' for global basin-hopping
solver, if available. `method` determines which solver from
scipy.optimize is used. The explicit arguments in `fit` are passed
to the solver, with the exception of the basin-hopping solver. Each
method : str, optional
The `method` determines which solver from `scipy.optimize`
is used, and it can be chosen from among the following strings:

- 'newton' for Newton-Raphson, 'nm' for Nelder-Mead
- 'bfgs' for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
- 'lbfgs' for limited-memory BFGS
- 'powell' for modified Powell's method
- 'cg' for conjugate gradient
- 'ncg' for Newton-conjugate gradient
- 'basinhopping' for global basin-hopping solver

The explicit arguments in `fit` are passed to the solver,
with the exception of the basin-hopping solver. Each
solver has several optional arguments that are not the same across
solvers. See the notes section below (or scipy.optimize) for the
available arguments and for the list of explicit arguments that the
basin-hopping solver supports..
maxiter : int
basin-hopping solver supports.
maxiter : int, optional
The maximum number of iterations to perform.
full_output : bool
full_output : bool, optional
Set to True to have all available output in the Results object's
mle_retvals attribute. The output is dependent on the solver.
See LikelihoodModelResults notes section for more information.
disp : bool
disp : bool, optional
Set to True to print convergence messages.
fargs : tuple
fargs : tuple, optional
Extra arguments passed to the likelihood function, i.e.,
loglike(x,*args)
callback : callable callback(xk)
callback : callable callback(xk), optional
Called after each iteration, as callback(xk), where xk is the
current parameter vector.
retall : bool
retall : bool, optional
Set to True to return list of solutions at each iteration.
Available in Results object's mle_retvals attribute.

@@ -242,6 +248,18 @@ def fit(self, start_params=None, method='newton', maxiter=100,
epsilon
If fprime is approximated, use this value for the step
size. Only relevant if LikelihoodModel.score is None.
'lbfgs'
m : int
This many terms are used for the Hessian approximation.
factr : float
A stop condition that is a variant of relative error.
pgtol : float
A stop condition that uses the projected gradient.
epsilon
If fprime is approximated, use this value for the step
size. Only relevant if LikelihoodModel.score is None.
maxfun : int
Maximum number of function evaluations to make.
'cg'
gtol : float
Stop when norm of gradient is less than gtol.
@@ -303,7 +321,7 @@ def fit(self, start_params=None, method='newton', maxiter=100,
cov_params_func = kwargs.setdefault('cov_params_func', None)

Hinv = None # JP error if full_output=0, Hinv not defined
methods = ['newton', 'nm', 'bfgs', 'powell', 'cg', 'ncg',
methods = ['newton', 'nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg',
'basinhopping']
methods += extra_fit_funcs.keys()
if start_params is None:
@@ -337,6 +355,7 @@ def fit(self, start_params=None, method='newton', maxiter=100,
'newton': _fit_mle_newton,
'nm': _fit_mle_nm, # Nelder-Mead
'bfgs': _fit_mle_bfgs,
'lbfgs': _fit_mle_lbfgs,
'cg': _fit_mle_cg,
'ncg': _fit_mle_ncg,
'powell': _fit_mle_powell,
@@ -471,6 +490,54 @@ def _fit_mle_bfgs(f, score, start_params, fargs, kwargs, disp=True,
return xopt, retvals


def _fit_mle_lbfgs(f, score, start_params, fargs, kwargs, disp=True,
maxiter=100, callback=None, retall=False,
full_output=True, hess=None):

# Pass the following keyword argument names through to fmin_l_bfgs_b
# if they are present in kwargs, otherwise use the fmin_l_bfgs_b
# default values.
names = ('m', 'pgtol', 'factr', 'maxfun', 'approx_grad')
extra_kwargs = dict((x, kwargs[x]) for x in names if x in kwargs)

if extra_kwargs.get('approx_grad', False):
score = None

epsilon = kwargs.setdefault('epsilon', 1e-8)
bounds = [(None, None)] * len(start_params)
Member: Given that we now include a constrained solver, we should let users take advantage of it, e.g. check if 'bounds' not in kwargs.

Author: l_bfgs_b includes another feature which I would like to take advantage of -- the combined evaluation of a function and its derivative -- but I am putting this off until a later PR. Maybe that later PR could also include optional bounds? Also, maybe the solver name should be 'lbfgsb' instead of 'lbfgs' if bounds are allowed :)

Member: That's a different issue, for computational efficiency. I don't see a need to postpone the check here that the user hasn't already defined bounds in kwargs.

Member: I would prefer to also leave the bounds to a later PR. It's a different feature than the sparse Hessian we get here. I plan to handle this separately and more generally in #1121. The current constrained solver is not general yet either.
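For reference, the combined evaluation the author mentions is scipy's documented behavior: when fprime is None and approx_grad is False, fmin_l_bfgs_b expects the objective itself to return the pair (f, grad), so one pass computes both. A minimal sketch (the toy objective is an assumption):

import numpy as np
from scipy import optimize

def f_and_grad(x):
    # a single pass returns both the value and the gradient
    return np.sum(x ** 2), 2.0 * x

xopt, fopt, d = optimize.fmin_l_bfgs_b(f_and_grad, np.ones(3))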

try:
retvals = optimize.fmin_l_bfgs_b(f, start_params,
fprime=score, args=fargs,
maxiter=maxiter, callback=callback,
bounds=bounds, epsilon=epsilon, disp=disp, **extra_kwargs)
except TypeError:
if maxiter is not None or callback is not None:
Member: maxiter will always be not None, because it is set to maxiter=100 in the function signature.

Author: Right, this is what I was referring to in #1147 (comment).

Member: There's nothing to change though, right? Users should be warned that it has no effect when using old scipy.

Member: But it always raises the warning with old scipy if the user uses the defaults. I would set maxiter=None in the signature, copy it (maxiter_ = maxiter), and then assign the default (if maxiter is None: maxiter = 100). More code, but then the warning fires only if the user sets maxiter.

Member: Yes. The warning is then "your version of scipy is so old you're missing features that we take for granted in the defaults. You need to do work on your code to set maxiter=None, if you don't want to be reminded."

from warnings import warn
warn("fmin_l_bfgs_b does not support maxiter or callback arguments. "
"Update your scipy, otherwise they have no effect",
UserWarning)
retvals = optimize.fmin_l_bfgs_b(f, start_params,
fprime=score, args=fargs,
bounds=bounds, epsilon=epsilon, disp=disp, **extra_kwargs)
if full_output:
xopt, fopt, d = retvals
# The warnflag is
# 0 if converged
# 1 if too many function evaluations or too many iterations
# 2 if stopped for another reason, given in d['task']
warnflag = d['warnflag']
converged = (warnflag == 0)
gopt = d['grad']
fcalls = d['funcalls']
retvals = {'fopt': fopt, 'gopt': gopt,
'fcalls':fcalls, 'warnflag': warnflag,
'converged': converged}
else:
xopt = None

return xopt, retvals
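For orientation, a minimal sketch of the call this wrapper makes on a recent scipy (one whose fmin_l_bfgs_b accepts maxiter and callback; the toy objective is an assumption):

import numpy as np
from scipy import optimize

def negloglike(params):
    # toy stand-in for the negative loglikelihood
    return np.sum((params - 3.0) ** 2)

xopt, fopt, d = optimize.fmin_l_bfgs_b(
    negloglike, np.zeros(2),
    approx_grad=True,            # no analytic score; difference internally
    bounds=[(None, None)] * 2,   # unbounded, i.e. plain L-BFGS
    m=12, pgtol=1e-8, factr=1e2, maxiter=50)
# d['warnflag'] == 0 means converged; d['grad'] and d['funcalls'] are
# what the wrapper above repackages into the retvals dict.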


def _fit_mle_nm(f, score, start_params, fargs, kwargs, disp=True,
maxiter=100, callback=None, retall=False,
full_output=True, hess=None):
@@ -923,7 +990,7 @@ class LikelihoodModelResults(Results):
--------
The covariance of params is given by scale times normalized_cov_params.

Return values by solver if full_ouput is True during fit:
Return values by solver if full_output is True during fit:

'newton'
fopt : float
@@ -974,6 +1041,22 @@ class LikelihoodModelResults(Results):
True: converged. False: did not converge.
allvecs : list
Results at each iteration.
'lbfgs'
fopt : float
Value of the (negative) loglikelihood at its minimum.
gopt : ndarray
Value of the gradient at the minimum, which should be near 0.
fcalls : int
Number of calls to loglike.
warnflag : int
Warning flag:

- 0 if converged
- 1 if too many function evaluations or too many iterations
- 2 if stopped for another reason

converged : bool
True: converged. False: did not converge.
'powell'
fopt : float
Value of the (negative) loglikelihood at its minimum.
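A hedged usage sketch of the new solver (the model construction here is an assumption; any LikelihoodModel subclass applies):

import statsmodels.api as sm
# y, X assumed to be a binary response and a design matrix
model = sm.Logit(y, X)
res = model.fit(method='lbfgs', m=12, pgtol=1e-8, factr=1e2,
                maxiter=50, full_output=True, disp=False)
print(res.mle_retvals['converged'], res.mle_retvals['warnflag'])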
32 changes: 16 additions & 16 deletions statsmodels/tsa/arima_model.py
@@ -485,11 +485,7 @@ def score(self, params):
-----
This is a numerical approximation.
"""
loglike = self.loglike
#if self.transparams:
# params = self._invtransparams(params)
#return approx_fprime(params, loglike, epsilon=1e-5)
return approx_fprime_cs(params, loglike)
return approx_fprime_cs(params, self.loglike, args=(False,))

def hessian(self, params):
"""
@@ -499,10 +495,7 @@ def hessian(self, params):
-----
This is a numerical approximation.
"""
loglike = self.loglike
#if self.transparams:
# params = self._invtransparams(params)
return approx_hess_cs(params, loglike)
return approx_hess_cs(params, self.loglike, args=(False,))

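Both methods now thread args=(False,) through to loglike, i.e. set_sigma2=False during differentiation. The flag matters because approx_fprime_cs and approx_hess_cs use complex-step differentiation, which evaluates the loglikelihood at complex parameter values; any attribute assigned during such an evaluation (here sigma2) would silently become complex. A self-contained sketch of the scheme, on an assumed toy function:

import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~= Im(f(x + i*h)) / h; there is no subtractive cancellation,
    # so h can be tiny and the result is accurate to machine precision
    return f(x + 1j * h).imag / h

print(complex_step_derivative(np.exp, 1.0))  # ~2.718281828459045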
def _transparams(self, params):
"""
@@ -665,7 +658,7 @@ def predict(self, start=None, end=None, exog=None, dynamic=False):
return predictedvalues
predict.__doc__ = _arma_predict

def loglike(self, params):
def loglike(self, params, set_sigma2=True):
"""
Compute the log-likelihood for ARMA(p,q) model

@@ -675,17 +668,17 @@
"""
method = self.method
if method in ['mle', 'css-mle']:
return self.loglike_kalman(params)
return self.loglike_kalman(params, set_sigma2)
elif method == 'css':
return self.loglike_css(params)
else:
raise ValueError("Method %s not understood" % method)

def loglike_kalman(self, params):
def loglike_kalman(self, params, set_sigma2=True):
"""
Compute exact loglikelihood for ARMA(p,q) model using the Kalman Filter.
"""
return KalmanFilter.loglike(params, self)
return KalmanFilter.loglike(params, self, set_sigma2)

def loglike_css(self, params):
"""
@@ -717,7 +710,7 @@ def loglike_css(self, params):
return llf

def fit(self, order=None, start_params=None, trend='c', method = "css-mle",
transparams=True, solver=None, maxiter=35, full_output=1,
transparams=True, solver='lbfgs', maxiter=35, full_output=1,
disp=5, callback=None, **kwargs):
"""
Fits ARMA(p,q) model using exact maximum likelihood via Kalman filter.
@@ -741,7 +734,7 @@ def fit(self, order=None, start_params=None, trend='c', method = "css-mle",
`start_params` as starting parameters. See above for more
information.
trend : str {'c','nc'}
Whehter to include a constant or not. 'c' includes constant,
Whether to include a constant or not. 'c' includes constant,
'nc' no constant.
solver : str or None, optional
Solver to be used. The default is 'lbfgs' (limited memory
@@ -846,6 +839,8 @@ def fit(self, order=None, start_params=None, trend='c', method = "css-mle",
if transparams: # transform initial parameters to ensure invertibility
start_params = self._invtransparams(start_params)

# NOTE: after having added 'lbfgs' to the list of fitting methods,
# the solver-is-None branch should no longer be necessary
if solver is None: # use default limited memory bfgs
Member: solver is None will also not be true anymore, since the default is now solver='lbfgs'.

Member: OK, that's fine, just redundant for now -- except: why is bounds set here but not below if solver='bfgs'?

Author: Yes, I was keeping this section only for reference, and I'll delete it before merging.

Member: bounds is set in the lbfgs fit function now for all models.

Author: It's here:
https://github.com/argriffing/statsmodels/blob/9ed378d6836e347c89b2f2c2487f445b09db29ee/statsmodels/base/model.py#L507
The bounds are always set to Nones because we only want L-BFGS, dropping the -B (boundedness) from L-BFGS-B.

bounds = [(None,)*2]*(k_ar+k_ma+k)
pgtol = kwargs.get('pgtol', 1e-8)
Member: I would set these defaults here, and not in the fmin_bfgs wrapper function. The defaults in the optimizer are probably good enough; I just found these to work in most cases for ARIMA.

Author: This is a dumb python question, but how would I let fmin_l_bfgs_b use its internal defaults while also allowing these args to be optionally specified by the caller through kwargs?

Member: Check if solver == 'lbfgs' in the code above, and then just remove the setdefault stuff that is in _fit_mle_lbfgs.

Member: You're essentially doing this now. I just want the defaults for ARIMA to be different from the default defaults. Right now it's written so that the defaults are always what they are in ARIMA now. Does that make sense?

Author: I did something that might address this. I am a little bit confused about the function-call chains that modify and pass through the **kwargs. I guess this is common to any code that works with options which can be overridden at multiple levels.
@@ -858,6 +853,11 @@ def fit(self, order=None, start_params=None, trend='c', method = "css-mle",
params = mlefit[0]

else: # call the solver from LikelihoodModel
if solver == 'lbfgs':
kwargs.setdefault('pgtol', 1e-8)
kwargs.setdefault('factr', 1e2)
kwargs.setdefault('m', 12)
kwargs.setdefault('approx_grad', True)
mlefit = super(ARMA, self).fit(start_params, method=solver,
maxiter=maxiter, full_output=full_output, disp=disp,
callback = callback, **kwargs)
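The resulting user-facing behavior, as a hedged sketch (the data, the order, and the exact constructor signature are assumptions): the ARIMA-tuned defaults apply unless the caller supplies values of their own.

import numpy as np
from statsmodels.tsa.arima_model import ARMA

y = np.random.randn(200).cumsum()  # toy series
res = ARMA(y, order=(1, 1)).fit(disp=0)   # solver='lbfgs' with pgtol=1e-8, m=12, ...
res2 = ARMA(y, order=(1, 1)).fit(disp=0, pgtol=1e-7, factr=10.0)  # caller's values win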
@@ -946,7 +946,7 @@ def _get_predict_end(self, end, dynamic=False):
return end - self.k_diff, out_of_sample

def fit(self, start_params=None, trend='c', method = "css-mle",
transparams=True, solver=None, maxiter=35, full_output=1,
transparams=True, solver='lbfgs', maxiter=35, full_output=1,
disp=5, callback=None, **kwargs):
"""
Fits ARIMA(p,d,q) model by exact maximum likelihood via Kalman filter.
10 changes: 8 additions & 2 deletions statsmodels/tsa/kalmanf/kalmanfilter.py
@@ -611,7 +611,7 @@ def _init_kalman_state(cls, params, arma_model):
newparams, Z_mat, m, R_mat, T_mat, paramsdtype)

@classmethod
def loglike(cls, params, arma_model):
def loglike(cls, params, arma_model, set_sigma2=True):
"""
The loglikelihood for an ARMA model using the Kalman Filter recursions.

@@ -623,6 +623,10 @@ def loglike(cls, params, arma_model):
coefficients, then the `q` MA coefficients.
arma_model : `statsmodels.tsa.arima.ARMA` instance
A reference to the ARMA model instance.
set_sigma2 : bool, optional
True if arma_model.sigma2 should be set.
Note that sigma2 will be computed in any case,
but it will be discarded if set_sigma2 is False.

Notes
-----
@@ -647,7 +651,9 @@
else:
raise TypeError("This dtype %s is not supported. "
"Please file a bug report." % paramsdtype)
arma_model.sigma2 = sigma2
if set_sigma2:
arma_model.sigma2 = sigma2

return loglike.item() # return a scalar not a 0d array
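Downstream, this flag is what lets the complex-step score in arima_model.py stay side-effect free: evaluated at complex params, the loglikelihood is complex, but sigma2 on the model instance is left untouched. A sketch (params, e_k, and arma_mod are assumed names for illustration):

h = 1e-20
# e_k: unit vector along parameter k; arma_mod: an ARMA model instance
ll = KalmanFilter.loglike(params + 1j * h * e_k, arma_mod, False)
score_k = ll.imag / h  # derivative along e_k; arma_mod.sigma2 unchanged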

