First pass. I tried to write it so that I only had to change the current code in one place. I'd also like to pull out the specialized code for regularization into a general _fit_constrained function. Is regularized likelihood of general interest outside of the discrete models? I decided that a mix-in was going to be too heavy but, for consideration, I started with a (private)

```python
def fit(self, ...):
def gradient(self, ...):
def hessian(self, ...):
```
I also answered this question for myself: why are we doing this? It looks like we're just doing what the new scipy.optimize.minimize does. 1) I don't think we require a new enough scipy. 2) We curry results in and out a bit to conform to what we need. 3) I'd like to also provide for estimation with general constraints via l-bfgs-b, slsqp, and, if installed, cvxopt/openopt.
See discussion in #697.
Coverage remained the same when pulling b44edd3 on jseabold:optimizer-class into 3b7082c on statsmodels:master.
Coverage remained the same when pulling 549a671 on jseabold:optimizer-class into 3b7082c on statsmodels:master.
I was tempted to try adding L-BFGS-B optimizer into the code, but then I saw this PR... I'm not interested in the -B part (box constraints) of L-BFGS-B, but rather the L- part (limited-memory Hessian approximation).
@argriffing I use L-BFGS to good effect in the ARIMA. Are you saying to add it to the suite of regular solvers as part of this PR? Then we'll just create bounds of `[(None, None)] * k_exog` under the hood. That's no problem. I can do that and then refactor ARIMA to use the normal machinery instead of being special-cased. Good idea.
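For reference, a small sketch of that trick with scipy's `fmin_l_bfgs_b`: passing all-`None` bounds recovers an unconstrained limited-memory BFGS. The quadratic objective below is just an illustrative stand-in for a negative log-likelihood, not statsmodels code:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def neg_loglike(params):
    # hypothetical objective: a simple quadratic stands in for -loglike
    return np.sum((params - 3.0) ** 2)

def neg_score(params):
    # gradient of the objective above
    return 2.0 * (params - 3.0)

k_exog = 4
# unbounded "box" constraints recover plain limited-memory BFGS
bounds = [(None, None)] * k_exog

xopt, fopt, info = fmin_l_bfgs_b(neg_loglike, np.zeros(k_exog),
                                 fprime=neg_score, bounds=bounds)
# each parameter converges to the minimizer at 3.0
```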
@jseabold Yes exactly. In my code I will use it for MNLogit when the number of parameters (roughly #features * #categories) is so big that the explicit construction and manipulation of the Hessian becomes unwieldy.
I rebased this and made an Optimizer class. It's unfinished for the regularized stuff, but I didn't have to touch much code, and everything still works. I'm thinking about merging it as a WIP. I think I can take advantage of the Optimizer class in the exponential smoothing code in #1489. I've refactored it to be a model-fit-results pattern. They're not quite LikelihoodModel-like enough to inherit from likelihood model, but they could take advantage of the optimizers.
Where are the open issues / code for dealing with extra_params in fit (e.g., GEE, GMM, etc.)? I might be able to refactor a bit to cleanly handle that in this code, but I'd like to see what I'm dealing with first and what the potential issues might be. For instance, the regularized fit method deals with the extra alpha param, but it's not generalized yet.
before looking at this PR again:
I don't know if the optimizers need to handle extra_params separately, since they are usually concatenated for optimization. The Mixed PR, for example, estimates the covariance parameters in the same optimization as the mean parameters. GEE, like GLM and RLM, uses its own optimization.
For GMM, general M-estimators, and partially for Mixed, the main benefit comes from specifying objective functions and derivatives different from loglike and score.
My version of M-estimators for non-linear functions just used loglike and score, even though those names are wrong.
Yeah, that sounds right.
This PR is just a refactor. I didn't change any code, except that fit now calls _fit, which comes from Optimizer, from which LikelihoodModel now inherits. It doesn't require any changes to any other code or to how code is written in the future. It just makes this code a bit more modular, and _fit means we no longer rely on the likelihood being the objective function.
REF: First pass at pulling optimization out of LikelihoodModel.
Squashed and rebased. Would like to go ahead and merge this.
I don't see any problems with merging this.
The optimizer is not really needed as a mixin.
Instead we could just delegate to Optimizer()._fit()
The Optimizer mixin class doesn't directly access any attributes of the Model instance, does it?
AFAICS, this PR just reshuffles the code without any real changes (except for the additional _fit method).
It's a good refactoring, and will make it easier to do more refactoring.
Yeah, it doesn't change anything. I had it as a function before squashing it into a mix-in. I can't recall off-hand why I went with the class. I'll have a look.
Ah, I think the reason to set it up as a mix-in is that I thought it would grow some methods for setting constraints. E.g., before calling fit, you do `mod.set_constraint('param1', 0)` or something along these lines. I didn't think through the API a lot. Maybe exploit patsy here. That kind of violates my "I don't like procedural code" stance, but no one ever accused me of being a pillar of virtue.
I don't mind the class; I think it can become useful, and if not, then we still don't lose anything.
The question is mixin versus delegation.
For now, in LikelihoodModel.fit, I would just do:

```python
optimizer = Optimizer()
fit = optimizer._fit(......)
```
This would leave LikelihoodModel completely unchanged for now, and we can switch to a mix-in if we find a use for it later.
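A minimal runnable sketch of that delegation pattern. The names `Optimizer` and `_fit` come from the discussion; the class bodies are illustrative stand-ins, not the actual statsmodels code:

```python
# "Delegate, don't inherit": LikelihoodModel.fit builds an Optimizer
# and hands off the work, instead of mixing Optimizer into its MRO.

class Optimizer(object):
    def _fit(self, objective, gradient, start_params, method="bfgs"):
        # the real code would dispatch to the scipy solvers here;
        # this stub just records what it was asked to do
        return {"method": method, "start_params": list(start_params)}

class LikelihoodModel(object):
    def fit(self, start_params, method="bfgs"):
        optimizer = Optimizer()  # delegation instead of a mix-in
        return optimizer._fit(self.loglike, self.score,
                              start_params, method=method)

    def loglike(self, params):
        raise NotImplementedError

    def score(self, params):
        raise NotImplementedError

res = LikelihoodModel().fit(start_params=[0.0, 0.0])
```

One upside of delegation is that the Optimizer never touches attributes of the model instance, which keeps the coupling explicit.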
REF: Delegate, don't inherit
Looks good to merge when TravisCI reports back.