
WIP: Refactor optimization code. #1121

Merged
merged 2 commits into from Apr 4, 2014

Conversation

4 participants
@jseabold
Member

commented Oct 16, 2013

First pass. I tried to write it so that I only had to change the current code in one place. I'd also like to pull out the specialized code for regularization into a general _fit_constrained function. Is regularized likelihood of general interest outside of the discrete models? I decided that a mix-in was going to be too heavy, but, for consideration, I started with a (private)

class Optimizer:
    def fit(self, *args, **kwargs):
        ...

    def gradient(self, *args, **kwargs):
        ...

    def hessian(self, *args, **kwargs):
        ...

I also answered this question for myself: why are we doing this? It looks like we're just doing what the new scipy.optimize.minimize does, but 1) I don't think we can require a new enough scipy, 2) we curry results in and out a bit to conform to what we need, and 3) I'd also like to provide for estimation with general constraints via l-bfgs-b, slsqp, and, if installed, cvxopt/openopt.
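To make point 3 concrete, here is a minimal sketch (not part of this PR) of how the older, function-based scipy interfaces handle bounds and general constraints; the toy objective and constraint are made up for illustration.

import numpy as np
from scipy.optimize import fmin_l_bfgs_b, fmin_slsqp

# Toy objective standing in for a negative log-likelihood.
def objective(params):
    return np.sum((params - np.array([1.0, 2.0])) ** 2)

# Box constraints via l-bfgs-b; [(None, None)] * k leaves a parameter unbounded.
xopt, fopt, info = fmin_l_bfgs_b(objective, np.zeros(2), approx_grad=True,
                                 bounds=[(0.0, None), (None, None)])

# A general equality constraint (parameters sum to one) via slsqp.
xopt_eq = fmin_slsqp(objective, np.zeros(2),
                     f_eqcons=lambda p: np.array([p.sum() - 1.0]), disp=0)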

See discussion in #697.

@coveralls

commented Oct 17, 2013

Coverage remained the same when pulling b44edd3 on jseabold:optimizer-class into 3b7082c on statsmodels:master.

@coveralls

commented Oct 22, 2013

Coverage remained the same when pulling 549a671 on jseabold:optimizer-class into 3b7082c on statsmodels:master.

@argriffing


commented Oct 22, 2013

I was tempted to try adding an L-BFGS-B optimizer to the code, but then I saw this PR... I'm not interested in the -B part of L-BFGS-B (box constraints), but rather the L- part (the limited-memory Hessian approximation).

@jseabold

Member Author

commented Oct 23, 2013

@argriffing I use L-BFGS to good effect in the ARIMA. Are you saying to add it to the suite of regular solvers as part of this PR? Then we'll just create the bounds of [(None,None)]*k_exog under the hood. That's no problem. I can do that and then refactor ARIMA to use the normal machinery instead of being special cased. Good idea.
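For concreteness, a minimal sketch of what "under the hood" could look like; _fit_lbfgs and its signature are illustrative assumptions, not the PR's actual code.

from scipy.optimize import fmin_l_bfgs_b

def _fit_lbfgs(neg_loglike, score, start_params, bounds=None):
    # Hypothetical helper: if no bounds are given, treat every parameter
    # as unbounded so l-bfgs-b behaves like an ordinary solver.
    if bounds is None:
        bounds = [(None, None)] * len(start_params)
    xopt, fopt, info = fmin_l_bfgs_b(neg_loglike, start_params,
                                     fprime=score, bounds=bounds)
    return xopt, fopt, info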

@argriffing


commented Oct 23, 2013

@jseabold Yes exactly. In my code I will use it for MNLogit when the number of parameters (roughly #features * #categories) is so big that the explicit construction and manipulation of the Hessian becomes unwieldy.
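To put rough, hypothetical numbers on that: with 1,000 features and 20 categories there are about 20,000 parameters, so a dense Hessian would have roughly 4 × 10^8 entries, on the order of 3 GB in double precision.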

@josef-pkt josef-pkt added the PR label Feb 19, 2014

@jseabold

Member Author

commented Mar 23, 2014

I rebased this and made an Optimizer class. It's unfinished for the regularized stuff, but I didn't have to touch much code, and everything still works. I'm thinking about merging it as a WIP. I think I can take advantage of the Optimizer class in the exponential smoothing code in #1489. I've refactored that to follow a model-fit-results pattern. Those models aren't quite LikelihoodModel-like enough to inherit from LikelihoodModel, but they could take advantage of the optimizers.

Where are the open issues / code for dealing with extra_params in fit (e.g., GEE, GMM, etc.)? I might be able to refactor a bit to cleanly handle that in this code, but I'd like to see what I'm dealing with first and what the potential issues might be. For instance, the regularized fit method deals with the extra alpha param, but it's not generalized yet.

@josef-pkt

Member

commented Mar 23, 2014

Before looking at this PR again:

I don't know whether the optimizers need to handle extra_params separately, since they are usually concatenated for optimization. The Mixed PR, for example, estimates the covariance parameters in the same optimization as the mean parameters. GEE, like GLM and RLM, uses its own optimization.

For GMM, general M-estimators, and partially for Mixed, the main benefit comes from specifying objective functions and derivatives other than loglike and score.

My version of M-estimators for non-linear functions just used loglike and score, even though those names are wrong.

@jseabold

Member Author

commented Mar 23, 2014

Yeah, that sounds right.

This PR is just a refactor. I didn't change any code except that fit now calls _fit, which comes from Optimizer, which LikelihoodModel now inherits from. It doesn't require any changes to any other code or to how code is written in the future. It just makes this code a bit more modular, and _fit means we no longer rely on the likelihood being the objective function.
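Roughly the shape of the change as described (heavily simplified; the signatures below are assumptions for illustration, not the actual statsmodels code):

class Optimizer(object):
    # Knows nothing about likelihoods; it only minimizes an objective
    # given optional derivatives and a solver name.
    def _fit(self, objective, gradient, start_params, method='newton', **kwargs):
        ...

class LikelihoodModel(Optimizer):
    def fit(self, start_params=None, method='newton', **kwargs):
        # fit stays the public entry point and simply delegates the
        # numerical work; the objective passed in happens to be the
        # negative log-likelihood here, but it no longer has to be.
        objective = lambda params: -self.loglike(params)
        gradient = lambda params: -self.score(params)
        return self._fit(objective, gradient, start_params,
                         method=method, **kwargs)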

@jseabold

Member Author

commented Apr 3, 2014

Squashed and rebased. Would like to go ahead and merge this.

@josef-pkt

Member

commented Apr 3, 2014

I don't see any problems with merging this.

The optimizer is not really needed as a mixin; instead we could just delegate to Optimizer()._fit(). The Optimizer mixin class doesn't directly access any attributes of the Model instance, does it?

AFAICS, this PR just reshuffles the code without any real changes (except for the additional _fit method). It's a good refactoring, and it will make it easier to do more refactoring.

@jseabold

Member Author

commented Apr 3, 2014

Yeah, it doesn't change anything. I had it as a function before squashing it into a mix-in. I can't recall off-hand why I went with the class; I'll have a look.

@jseabold

Member Author

commented Apr 3, 2014

Ah, I think the reason I set it up as a mix-in is that I thought it would grow some methods for setting constraints. E.g., before calling fit, you'd do mod.set_constraint('param1', 0) or something along these lines (see the sketch below). I didn't think the API through much; maybe exploit patsy here. That kind of violates my "I don't like procedural code" stance, but no one ever accused me of being a pillar of virtue.
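A purely hypothetical sketch of the kind of API being floated here; set_constraint is not implemented anywhere, and the names and behavior are invented for illustration.

class Optimizer(object):
    def set_constraint(self, name, value):
        # Hypothetical: remember a fixed-value constraint so that fit can
        # later translate it into bounds (l-bfgs-b) or an equality
        # constraint (slsqp / cvxopt).
        if not hasattr(self, '_constraints'):
            self._constraints = {}
        self._constraints[name] = value

# Hypothetical usage before fitting:
# mod.set_constraint('param1', 0)
# res = mod.fit(method='slsqp')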

@josef-pkt

Member

commented Apr 3, 2014

I don't mind the class. I think it can become useful, and if not, then we still don't lose anything.

The question is mixin versus delegation.

For now, in LikelihoodModel.fit, I would just do

optimizer = Optimizer()
fit = optimizer._fit(...)

This would leave LikelihoodModel completely unchanged for now, and we can switch to a mixin if we find a use for it later.

@josef-pkt

Member

commented Apr 3, 2014

Looks good to merge when Travis CI reports back.

jseabold added a commit that referenced this pull request Apr 4, 2014

Merge pull request #1121 from jseabold/optimizer-class
WIP: Refactor optimization code.

@jseabold jseabold merged commit 24f862c into statsmodels:master Apr 4, 2014

1 check was pending

continuous-integration/travis-ci The Travis CI build is in progress

@jseabold jseabold deleted the jseabold:optimizer-class branch Apr 4, 2014

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
