WIP: Refactor optimization code. #1121

Merged

jseabold merged 2 commits into statsmodels:master from jseabold:optimizer-class on Apr 4, 2014

4 participants

@jseabold
Member

First pass. I tried to write it so that I only had to change the current code in one place. I'd also like to pull out the specialized code for regularization into a general _fit_constrained function. Is regularized likelihood of general interest outside of the discrete models? I decided that a mix-in was going to be too heavy, but, for consideration, I started with a (private)

class Optimizer:
    def fit(self, ...):
        ...

    def gradient(self, ...):
        ...

    def hessian(self, ...):
        ...

I also answered this question for myself. Why are we doing this? It looks like we're just doing what the new scipy.optimize.minimize does. 1) I don't think we require a new enough scipy. 2) We curry results in and out a bit to conform to what we need. 3) I'd like to also provide for estimation with general constraints via l-bfgs-b, slsqp, and, if installed, cvxopt/openopt.
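
As a concrete illustration of 3), here is a minimal sketch at the scipy level only (not code from this PR) of estimation with a general equality constraint via slsqp; the objective is a stand-in for a model's negative loglikelihood:

import numpy as np
from scipy.optimize import fmin_slsqp

def neg_loglike(params):
    # stand-in objective; a real model would plug in -self.loglike(params)
    return np.sum((params - np.array([1.0, 2.0])) ** 2)

# illustrative constraint: the two parameters must sum to 2
eqcons = [lambda p: p.sum() - 2.0]
params_opt = fmin_slsqp(neg_loglike, x0=np.zeros(2), eqcons=eqcons)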

See discussion in #697.

@coveralls

Coverage remained the same when pulling b44edd3 on jseabold:optimizer-class into 3b7082c on statsmodels:master.

@coveralls

Coverage remained the same when pulling 549a671 on jseabold:optimizer-class into 3b7082c on statsmodels:master.

@argriffing

I was tempted to try adding the L-BFGS-B optimizer to the code, but then I saw this PR... I'm not interested in the -B part of L-BFGS-B (box constraints), but rather the L- part (limited-memory Hessian approximation).

@jseabold
Member

@argriffing I use L-BFGS to good effect in ARIMA. Are you saying to add it to the suite of regular solvers as part of this PR? Then we'll just create bounds of [(None, None)] * k_exog under the hood. That's no problem. I can do that and then refactor ARIMA to use the normal machinery instead of being special-cased. Good idea.
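
For reference, here is a minimal sketch (not the refactored code itself) of driving L-BFGS-B as an unconstrained solver by passing open bounds, as described above:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def neg_loglike(params):
    # stand-in objective in place of a model's -loglike
    return np.sum((params - 1.0) ** 2)

k_exog = 3
bounds = [(None, None)] * k_exog  # no box constraints, so effectively plain L-BFGS
xopt, fopt, info = fmin_l_bfgs_b(neg_loglike, np.zeros(k_exog),
                                 approx_grad=True, bounds=bounds)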

@argriffing

@jseabold Yes exactly. In my code I will use it for MNLogit when the number of parameters (roughly #features * #categories) is so big that the explicit construction and manipulation of the Hessian becomes unwieldy.
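
A hypothetical usage sketch of what that could look like once l-bfgs is exposed through the regular fit machinery (the method name and the toy data are assumptions, not part of this PR):

import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
X = sm.add_constant(rng.randn(500, 4))
y = rng.randint(0, 3, size=500)  # 3 outcome categories

# a limited-memory solver avoids building the full Hessian explicitly
res = sm.MNLogit(y, X).fit(method='lbfgs', maxiter=200)
print(res.params.shape)  # (#features, #categories - 1)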

@josef-pkt added the PR label on Feb 19, 2014
@jseabold
Member

I rebased this and made an Optimizer class. It's unfinished for the regularized stuff, but I didn't have to touch much code, and everything still works. I'm thinking about merging it as a WIP. I think I can take advantage of the Optimizer class in the exponential smoothing code in #1489. I've refactored that to a model-fit-results pattern. Those models aren't quite LikelihoodModel-like enough to inherit from LikelihoodModel, but they could take advantage of the optimizers.

Where are the open issues / code for dealing with extra_params in fit (e.g., GEE, GMM, etc.)? I might be able to refactor a bit to cleanly handle that in this code, but I'd like to see what I'm dealing with first and what the potential issues might be. For instance, the regularized fit method deals with the extra alpha param, but it's not generalized yet.

@josef-pkt
Member

before looking at this PR again:

I don't know if the optimizers need to handle extra_params separately, since they are usually concatenated for optimization. The Mixed PR, for example, estimates the covariance parameters in the same optimization as the mean parameters. GEE, like GLM and RLM, uses its own optimization.

For GMM, general M-estimators, and partially for Mixed, the main benefit comes from specifying objective functions and derivatives other than loglike and score.

My version of M-estimators for non-linear functions just used loglike and score, even though those names are wrong.

@jseabold
Member

Yeah, that sounds right.

This PR is just a refactor. I didn't change any code except that fit now calls _fit, which comes from Optimizer, which LikelihoodModel now inherits from. It doesn't require any changes to any other code or to how code is written in the future. It just makes this code a bit more modular, and _fit means we no longer rely on the likelihood being the objective function.
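
Roughly, the shape is the following (a sketch with illustrative names and signatures, not the PR's exact code):

class Optimizer(object):
    def _fit(self, objective, gradient, start_params, method='newton', **kwargs):
        # dispatch to the requested solver; the objective does not have to
        # be a loglikelihood
        ...

class LikelihoodModel(Optimizer):
    def fit(self, start_params=None, method='newton', **kwargs):
        f = lambda params: -self.loglike(params)
        score = lambda params: -self.score(params)
        return self._fit(f, score, start_params, method=method, **kwargs)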

@jseabold
Member
jseabold commented Apr 3, 2014

Squashed and rebased. Would like to go ahead and merge this.

@josef-pkt
Member

I don't see any problems with merging this.

The optimizer is not really needed as a mixin.
Instead we could just delegate to Optimizer()._fit()
The Optimizer mixin class doesn't directly access any attributes of the Model instance, does it?

AFAICS, this PR just reshuffles the code without any real changes (except for the additional _fit method).
It's a good refactoring, and will make it easier to do more refactoring.

@jseabold
Member
jseabold commented Apr 3, 2014

Yeah, it doesn't change anything. I had it as a function before squashing to a mix-in. I can't recall off-hand why I went with the class. I'll have a look.

@jseabold
Member
jseabold commented Apr 3, 2014

Ah, I think the reason I made it a mix-in is that I thought it would grow some methods for setting constraints. E.g., before calling fit, you'd do mod.set_constraint('param1', 0) or something along those lines. I didn't think the API through much. Maybe exploit patsy here. That kind of violates my "I don't like procedural code" stance, but no one ever accused me of being a pillar of virtue.

@josef-pkt
Member

I don't mind the class. I think it can become useful, and if not, we still don't lose anything.

The question is mixin versus delegation.

For now I would just do this in LikelihoodModel.fit:

optimizer = Optimizer()
fit = optimizer._fit(...)

This would leave LikelihoodModel completely unchanged for now, and we can switch to a mixin if we find a use for it later.
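
For comparison with the mixin version, a slightly fuller sketch of that delegation, reusing the illustrative names from above (again, not the PR's exact code):

class LikelihoodModel(object):  # Optimizer is no longer in the MRO
    def fit(self, start_params=None, method='newton', **kwargs):
        f = lambda params: -self.loglike(params)
        optimizer = Optimizer()
        return optimizer._fit(f, start_params, method=method, **kwargs)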

@josef-pkt
Member

looks good to merge when TravisCI reports back

@jseabold merged commit 24f862c into statsmodels:master on Apr 4, 2014

1 check was pending

continuous-integration/travis-ci The Travis CI build is in progress
@jseabold deleted the jseabold:optimizer-class branch on Apr 4, 2014