
add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

Merged
merged 5 commits into statsmodels:master from argriffing:mnlogit-loglike-score on Nov 2, 2013
@argriffing commented Oct 29, 2013

No description provided.

@argriffing (Author) commented Oct 29, 2013

related to issue #1141

@@ -1588,6 +1588,21 @@ def score(self, params):
#NOTE: might need to switch terms if params is reshaped
return np.dot(firstterm.T, self.exog).flatten()

def loglike_and_score(self, params):

@josef-pkt (Member) commented Oct 29, 2013 (inline)

Update: maybe not such a good idea, because your _fit_mle_lbfgs covers several cases of loglike and score.

Since this is mainly for optimization and doesn't affect other statistical results, I would define it in terms of the negative log-likelihood; nloglike_and_score avoids the negation.

In statsmodels.base.model.GenericLikelihoodModel and subclasses, I used nloglike.
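A minimal sketch of that suggestion (toy functions, not the actual MNLogit implementation): define the optimizer-facing objective as the negative log-likelihood so the sign flip happens exactly once, at the boundary to the minimizer.

```python
import numpy as np

# Hypothetical stand-ins for the real statsmodels methods: a toy concave
# log-likelihood -0.5*||p||^2 whose gradient (score) is -p, computed together.
def loglike_and_score(params):
    ll = -0.5 * np.dot(params, params)
    score = -params
    return ll, score

# Negate both outputs once, so a quantity named "loglike" never secretly
# carries the negative log-likelihood elsewhere in the code.
def nloglike_and_score(params):
    ll, score = loglike_and_score(params)
    return -ll, -score
```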

@argriffing (Author) commented Oct 29, 2013 (inline)

Is GenericLikelihoodModel even used? I've run across swaths of statsmodels code that are either commented out or have comments like 'dead code below here' related to the generic likelihood model.

@josef-pkt (Member) commented Oct 29, 2013 (inline)

GenericLikelihoodModel is the rapid prototyping and development version. Most likelihood models grow up and then integrate into the LikelihoodModel hierarchy and don't subclass GenericLikelihoodModel.
GenericLikelihoodModel has more preconfigured or more general defaults than LikelihoodModel.

@josef-pkt (Member) commented Oct 29, 2013

I think you could try to special-case it at line L378: if loglike_and_score is available, use it automatically.
I haven't looked at how this would look after Skipper's refactoring/outsourcing.

Looks good so far.

@argriffing (Author) commented Oct 29, 2013

I think, you could try to special case it in line L378,
if loglike_and_score are available, to make it used automatically.

I'm not sure exactly what you mean here, but maybe you are talking about statsmodels/base/model.py? If so, I agree that this could make sense. There are so many statsmodels layers that I'm not sure which changes in which places would be most appropriate.

Regarding the comments about loglike, score, nloglike_and_score, and negations: the statsmodels situation is confusing to me because functions and variables with identical names but opposite signs are used interchangeably throughout the current statsmodels code. I would prefer that loglike never mean the negative log likelihood, and that score never mean the negative of the log-likelihood gradient, but on the other hand it is probably not a good use of my time to go through and change the variable names everywhere this occurs in statsmodels.

@josef-pkt (Member) commented Oct 29, 2013

functions and variables with identical names but with opposite signs are used interchangeably throughout the current statsmodels code

Can you point them out or open an issue for them?
We shouldn't have different definitions with the same name, so it should be cleaned up, but that's not part of this PR.

@argriffing (Author) commented Oct 29, 2013

Can you just point them out or open an issue for them.

Added as issue #1162.

@argriffing (Author) commented Oct 29, 2013

Is the Travis failure a known issue? ("SVD did not converge" in pinv in the Kalman filter log likelihood.) [EDIT: the Travis failure is probably sporadic and is related to these, not to this PR in particular]

@argriffing argriffing closed this Oct 29, 2013

@argriffing argriffing reopened this Oct 29, 2013

@argriffing (Author) commented Oct 29, 2013

I've tried closing and reopening to toggle Travis for the SVD convergence problem. On my system, nosetests statsmodels passes all tests locally (except for test_pca.py), with no complaint about SVD convergence. I'm pretty sure that my SVD uses OpenBLAS LAPACK. [EDIT: the Travis failure is https://github.com/numpy/numpy/issues/1588 ?]

@argriffing (Author) commented Nov 1, 2013

@josef-pkt Is there anything preventing merging of this PR? I think that the SVD non-convergence is some numpy bug, and I don't understand your comment about line 378. If you would like me to address that comment before merging, I might need more detail. Is it referring to this line https://github.com/argriffing/statsmodels/blob/907122b797b4bf19ad3cf5e25686bc40c4464c51/statsmodels/base/model.py#L378?

@josef-pkt (Member) commented Nov 1, 2013

@argriffing I didn't know this was ready for merge. I can look into it in a few hours.
I need to look at the SVD convergence failure. We didn't have it a while ago and I don't know what triggered it.

In the unit test, you pass loglike_and_score as a method argument. I thought this should be picked up automatically if the model class defines the method. I didn't work my way fully through the different options for specifying scores when I looked at this PR before.

@argriffing (Author) commented Nov 1, 2013

@josef-pkt Thanks! The interface design could become clearer if we add interfaces to more solvers that use loglike+score but not hessian (e.g. tnc), and when more models provide the simultaneous calculations (maybe ARIMA).
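For reference, scipy's fmin_tnc already accepts such a bundled callable: when fprime is omitted and approx_grad is False, the objective function must return (value, gradient) in one call. A sketch with a toy objective (hypothetical, not model code from this PR):

```python
import numpy as np
from scipy.optimize import fmin_tnc

# Toy negative log-likelihood and its gradient, computed together in one pass.
def nloglike_and_nscore(x):
    f = float(np.sum((x - 1.0) ** 2))
    g = 2.0 * (x - 1.0)
    return f, g

# With fprime=None and approx_grad=False (the defaults), fmin_tnc treats the
# callable as returning (f, g); messages=0 silences the solver output.
xopt, nfeval, rc = fmin_tnc(nloglike_and_nscore, np.zeros(3), messages=0)
```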

@josef-pkt (Member) commented Nov 1, 2013

The latest build on master was without failure, so the SVD failure is most likely in this PR:
https://travis-ci.org/statsmodels/statsmodels/builds

A common problem I have with SVD failures is a nan or inf that sneaked into my numbers.

    extra_kwargs = dict((x, kwargs[x]) for x in names if x in kwargs)

    if extra_kwargs.get('approx_grad', False):
        score = None

@josef-pkt (Member) commented Nov 1, 2013 (inline)

Here we remove the score if approx_grad is set.

    if epsilon and not approx_grad:
        raise ValueError('a finite-differences epsilon was provided '
                         'even though we are not using approx_grad')
    if approx_grad and (score or loglike_and_score):

@josef-pkt (Member) commented Nov 1, 2013 (inline)

Now we raise, which should actually happen in ARIMA, which has a score (because of LikelihoodModel.fit) together with approx_grad=True.

AFAICS, though this is not very clear yet.

@argriffing (Author) commented Nov 2, 2013

In my opinion the root of the approx_grad confusion is that the code has multiple points at which the gradient may be approximated. For example the ARIMA code currently uses a numdiff finite differences function to approximate the gradient (although @jseabold has suggested that this could be replaced by an analytically computed gradient). On the other hand, the scipy fmin_l_bfgs_b function will internally approximate the gradient if no gradient is provided and if approx_grad is True.

So there are multiple ways to specify the gradient:

1)  let lbfgs use its internal gradient approximation
2a) use numdiff and pass lbfgs the resulting score function
2b) use numdiff and pass lbfgs a bundled loglike+numdiffscore function
3a) pass lbfgs a more clever (e.g. analytic, autodiff) score function
3b) pass lbfgs a bundled loglike+cleverscore function

Of these options, (3) is best, followed by (1), and (2) is worst. We are currently using (2) for ARIMA, but I hope that fixing this does not fall in the scope of this PR.
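The bundled options above map directly onto scipy's fmin_l_bfgs_b interface: when fprime is not given and approx_grad is False, the func argument must return (value, gradient) in one call. A minimal sketch with a toy quadratic objective (not MNLogit):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Toy bundled objective: value and gradient computed in one pass, which is
# the point of loglike_and_score (shared work is done once per iteration).
def nloglike_and_nscore(x):
    resid = x - 3.0
    return float(np.dot(resid, resid)), 2.0 * resid

# No fprime and no approx_grad, so fmin_l_bfgs_b unpacks (f, g) from func.
xopt, fopt, info = fmin_l_bfgs_b(nloglike_and_nscore, x0=np.zeros(2))
```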

@josef-pkt (Member) commented Nov 2, 2013

(We had a power outage all last evening and part of the night.)

I understand the principle, but I'm still not sure whether ARMA is or is not using the score method, and whether that default was changed by this PR, leading to the SVD failure.

Changing/improving the score calculation for ARIMA is of course not required by this pull request, but I don't see how to check which score is actually used. Do we store the optimization options for this?

@argriffing (Author) commented Nov 2, 2013

and whether that default has been changed by this PR, leading to the SVD failure.

Yeah I think you're right. I'll change it back.

@coveralls commented Nov 2, 2013

Coverage remained the same when pulling 3b14099 on argriffing:mnlogit-loglike-score into 4f675c7 on statsmodels:master.

@josef-pkt (Member) commented Nov 2, 2013

It made Travis CI green.

What I find strange is that it looks like the ARIMA estimation doesn't use "our" score, but the built-in one of l_bfgs_b. Do I read this correctly?

The SVD failure with the different estimation settings might need investigation, but that's not part of this PR.

I can merge this (after another brief look) if you want to use it for your low-memory estimation, and we can open a new issue for integrating this with automatic loglike_and_score usage.

@argriffing (Author) commented Nov 2, 2013

What I find strange is that it looks like the ARIMA estimation doesn't use "our" score, but the built-in one of l_bfgs_b.
Do I read this correctly?

Yes, I believe so. In the old statsmodels code (and in its currently restored version), ARIMA uses the internal gradient approximation of fmin_l_bfgs_b even though it also defines its own score function which calls numdiff. So that is a little odd. But maybe ARIMA's numdiff score function is used somewhere else, instead of in the MLE search?

open a new issue for integrating this with automatic loglike_and_score usage.

You are suggesting that the statsmodels machinery automatically detect a loglike_and_score member function on a Model subclass and automatically use it, when appropriate, for parameter estimation (fitting), similarly to how the loglike, score, and hessian member functions are currently treated?
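One way that detection could look (a sketch; the helper and model names here are hypothetical, not the statsmodels API): prefer the bundled method when the model provides one, and fall back to separate loglike/score calls, which cost an extra pass over the data.

```python
# Hypothetical fitting helper: build the minimizer-facing objective from
# whatever the model exposes.
def make_objective(model):
    if hasattr(model, 'loglike_and_score'):
        # Bundled path: one shared computation per iteration.
        def nloglike_and_nscore(params):
            ll, score = model.loglike_and_score(params)
            return -ll, -score
    else:
        # Fallback path: two separate evaluations.
        def nloglike_and_nscore(params):
            return -model.loglike(params), -model.score(params)
    return nloglike_and_nscore

# Toy model providing the bundled method (scalar parameter for brevity).
class ToyModel:
    def loglike_and_score(self, p):
        return -p * p, -2.0 * p

func = make_objective(ToyModel())
```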

@josef-pkt (Member) commented Nov 2, 2013

Yes, that was my first thought when I looked through this PR: automatically using loglike_and_score if available, instead of loglike and score separately.

Now, I think we had better wait for Skipper's outsourcing of the optimizer code. To make the optimizer code more generally available, we need to be able to specify the objective function and the score even if they are not loglike and score.
Our preferred optimizer in discrete is Newton, and there it would be better to have loglike_and_score_and_hessian.
Furthermore, scipy.optimize has two newer, more robust Newton optimizers that would be very useful for models like discrete, where in many or most cases we have analytical first and second derivatives.

So, if you can do your low memory work with the changes in this PR, then I would postpone this integration until Skipper can also have a look at this, and can check how this works with his branch.

@argriffing (Author) commented Nov 2, 2013

@josef-pkt That sounds like a good plan. After this PR is merged I'll work on low memory modifications (or low memory subclasses, if a time/memory tradeoff is required) for MNLogit.

By the way, I saw your mailing list question

Does scipy have another optimizer besides fmin (Nelder-Mead) that is
robust to near-singular, high condition number Hessian?

fmin_bfgs goes into neverland, values become huge until I get some
nans in my calculations.

What would be nice is an optimizer that uses derivatives, but
regularizes, forces Hessian or equivalent to be positive definite.

and I'd suggest giving lbfgs a try, in unconstrained mode. I've often had it succeed when bfgs fails.
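"Unconstrained mode" here just means calling fmin_l_bfgs_b with bounds left at its default of None; a sketch on a toy ill-conditioned quadratic (not the actual problem from the mailing-list thread):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Badly scaled quadratic with condition number 1e4: the kind of objective
# where plain fmin_bfgs can wander off while l_bfgs_b stays stable.
scales = np.array([1.0, 1e4])

def f_and_g(x):
    return float(np.sum(scales * x ** 2)), 2.0 * scales * x

# bounds defaults to None, so the search is fully unconstrained.
xopt, fopt, info = fmin_l_bfgs_b(f_and_g, x0=np.array([1.0, 1.0]))
```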

josef-pkt added a commit that referenced this pull request Nov 2, 2013

Merge pull request #1161 from argriffing/mnlogit-loglike-score
add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit

@josef-pkt josef-pkt merged commit 831e2ad into statsmodels:master Nov 2, 2013

1 check passed: The Travis CI build passed.
@josef-pkt (Member) commented Nov 2, 2013

Good, merged.

Unit tests pass, and we will have another look later; see #1141.

Thank you.

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1161 from argriffing/mnlogit-loglike-score
add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit