add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

Merged

josef-pkt merged 5 commits into statsmodels:master from argriffing:mnlogit-loglike-score on Nov 2, 2013


4 participants

@argriffing

No description provided.

@argriffing

Related to issue #1141.

@josef-pkt josef-pkt commented on the diff Oct 29, 2013
statsmodels/discrete/discrete_model.py
@@ -1588,6 +1588,21 @@ def score(self, params):
#NOTE: might need to switch terms if params is reshaped
return np.dot(firstterm.T, self.exog).flatten()
+ def loglike_and_score(self, params):
@josef-pkt
josef-pkt Oct 29, 2013 Member

Update: maybe this is not such a good idea, because your _fit_mle_lbfgs covers several cases of loglike and score.

Since this is mainly for optimization and doesn't affect other statistical results, I would define it in terms of the negative log likelihood: nloglike_and_score avoids the negation.

In statsmodels.base.model.GenericLikelihoodModel and its subclasses, I used nloglike.
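
A minimal sketch of that naming suggestion (illustrative only, not existing statsmodels code):

```python
def nloglike_and_score(self, params):
    # Negate once here, so the minimizer never has to flip signs itself;
    # this mirrors how nloglike wraps loglike in GenericLikelihoodModel.
    llf, score = self.loglike_and_score(params)
    return -llf, -score
```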

@argriffing
argriffing Oct 29, 2013

Is GenericLikelihoodModel even used? I've run across swaths of statsmodels code that are either commented out or carry comments like 'dead code below here' related to the generic likelihood model.

@josef-pkt
josef-pkt Oct 29, 2013 Member

GenericLikelihoodModel is the rapid-prototyping and development version. Most likelihood models grow up, are then integrated into the LikelihoodModel hierarchy, and don't subclass GenericLikelihoodModel.
GenericLikelihoodModel has more preconfigured or more general defaults than LikelihoodModel.
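
For context, a minimal sketch of the rapid-prototyping pattern (the Poisson model here is an illustrative assumption, not code from statsmodels):

```python
import numpy as np
from statsmodels.base.model import GenericLikelihoodModel

class MyPoisson(GenericLikelihoodModel):
    """Prototype model: only the log likelihood is supplied; the generic
    machinery falls back to numerical derivatives for score and hessian."""
    def loglike(self, params):
        # Poisson log likelihood, up to an additive constant in the data
        mu = np.exp(np.dot(self.exog, params))
        return np.sum(self.endog * np.log(mu) - mu)

# usage sketch: res = MyPoisson(endog, exog).fit(start_params=np.zeros(k))
```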

@josef-pkt
Member

I think you could try to special-case it at line L378: if loglike_and_score is available, use it automatically.
I haven't looked at how this would look after Skipper's refactoring/outsourcing.

Looks good so far.

@argriffing

I think you could try to special-case it at line L378: if loglike_and_score is available, use it automatically.

I'm not sure exactly what you mean here, but maybe you are talking about statsmodels/base/model.py? If so, I agree that this could make sense. There are so many statsmodels layers that I'm not sure which changes in which places would be most appropriate.

Regarding the comments about loglike, score, nloglike_and_score, and negations: the statsmodels situation is confusing to me because functions and variables with identical names but opposite signs are used interchangeably throughout the current statsmodels code. I would prefer that loglike never mean the negative log likelihood, and that score never mean the negative of the log likelihood gradient, but on the other hand it is probably not a good use of my time to go through and change the variable names everywhere this occurs in statsmodels.

@josef-pkt
Member

functions and variables with identical names but with opposite signs are used interchangeably throughout the current statsmodels code

Can you point them out or open an issue for them?
We shouldn't have different definitions with the same name, so it should be cleaned up, but that's not part of this PR.

@argriffing

Can you point them out or open an issue for them?

Added as issue #1162.

@argriffing

Is the Travis failure a known issue? (SVD did not converge in pinv in the Kalman filter log likelihood.) [EDIT: the Travis failure is probably sporadic and is related to these, not to this PR in particular.]

@argriffing argriffing closed this Oct 29, 2013
@argriffing argriffing reopened this Oct 29, 2013
@argriffing

I've tried closing and reopening to toggle Travis for the SVD convergence problem. On my system, nosetests statsmodels passes all tests locally (except for that test_pca.py), with no complaints about SVD convergence. I'm pretty sure that my SVD uses OpenBLAS LAPACK. [EDIT: is the Travis failure https://github.com/numpy/numpy/issues/1588 ?]

@argriffing

@josef-pkt Is there anything preventing this PR from being merged? I think the SVD non-convergence is some numpy bug, and I don't understand your comment about line 378. If you would like me to address that comment before merging, I might need more detail. Is it referring to this line: https://github.com/argriffing/statsmodels/blob/907122b797b4bf19ad3cf5e25686bc40c4464c51/statsmodels/base/model.py#L378?

@josef-pkt
Member

@argriffing I didn't know this was ready for merge. I can look into it in a few hours.
I need to look at the SVD convergence failure. We didn't have it a while ago, and I don't know what triggered it.

In the unit test, you pass loglike_and_score as a method argument. I thought this should be picked up automatically if the model class specifies this method. I hadn't fully worked my way through the different options for specifying scores when I looked at this PR before.

@argriffing

@josef-pkt Thanks! The interface design could become clearer once we add interfaces to more solvers that use loglike+score but not the hessian (e.g. tnc), and once more models provide the simultaneous calculation (maybe ARIMA).
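
For example, a bundled function plugs straight into fmin_tnc as well, since it also accepts an objective that returns (value, gradient) when no separate fprime is given. A toy sketch, not statsmodels code:

```python
import numpy as np
from scipy.optimize import fmin_tnc

def neg_loglike_and_score(params):
    # toy stand-in for a model's bundled -loglike and -score
    resid = params - 1.0
    return np.sum(resid ** 2), 2.0 * resid

# with fprime=None and approx_grad=False, func must return (f, g)
xopt, nfeval, rc = fmin_tnc(neg_loglike_and_score, np.zeros(3))
```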

@josef-pkt
Member

The latest build on master was without failure, so the SVD failure is most likely in this PR:
https://travis-ci.org/statsmodels/statsmodels/builds

A common problem I have with SVD failures is a NaN or inf that has sneaked into my numbers.

@josef-pkt josef-pkt commented on an outdated diff Nov 1, 2013
statsmodels/base/model.py
extra_kwargs = dict((x, kwargs[x]) for x in names if x in kwargs)
- if extra_kwargs.get('approx_grad', False):
- score = None
@josef-pkt
josef-pkt Nov 1, 2013 Member

Here we removed the score if approx_grad was set.

@josef-pkt josef-pkt commented on an outdated diff Nov 1, 2013
statsmodels/base/model.py
+ approx_grad = extra_kwargs.get('approx_grad', False)
+ loglike_and_score = kwargs.get('loglike_and_score', None)
+ epsilon = kwargs.get('epsilon', None)
+
+ # Choose among three options for dealing with the gradient (the gradient
+ # of a log likelihood function with respect to its parameters
+ # is more specifically called the score in statistics terminology).
+ # The first option is to use the finite-differences
+ # approximation that is built into the fmin_l_bfgs_b optimizer.
+ # The second option is to use the provided score function.
+ # The third option is to use the score component of a provided
+ # function that simultaneously evaluates the log likelihood and score.
+ if epsilon and not approx_grad:
+ raise ValueError('a finite-differences epsilon was provided '
+ 'even though we are not using approx_grad')
+ if approx_grad and (score or loglike_and_score):
@josef-pkt
josef-pkt Nov 1, 2013 Member

Now we raise, which should actually happen in ARIMA, which has a score because of LikelihoodModel.fit and approx_grad=True.

AFAICS, though this is not very clear yet.

@argriffing

In my opinion, the root of the approx_grad confusion is that the code has multiple points at which the gradient may be approximated. For example, the ARIMA code currently uses a numdiff finite-differences function to approximate the gradient (although @jseabold has suggested that this could be replaced by an analytically computed gradient). On the other hand, the scipy fmin_l_bfgs_b function will internally approximate the gradient if no gradient is provided and approx_grad is True.

So there are multiple ways to specify the gradient:

  1. Let lbfgs use its internal gradient approximation.
  2a. Use numdiff and pass lbfgs the resulting score function.
  2b. Use numdiff and pass lbfgs a bundled loglike + numdiff-score function.
  3a. Pass lbfgs a more clever (e.g. analytic, autodiff) score function.
  3b. Pass lbfgs a bundled loglike + clever-score function.

Of these options, (3) is best, followed by (1), and (2) is worst. We are currently using (2) for ARIMA, but I hope that fixing this does not fall within the scope of this PR.
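
A minimal sketch of how these options map onto the fmin_l_bfgs_b call (toy objective and illustrative names, not statsmodels code):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def nloglike(params):
    # toy quadratic standing in for a negative log likelihood
    return np.sum((params - 3.0) ** 2)

def nscore(params):
    # analytic gradient of the toy objective (option 3a)
    return 2.0 * (params - 3.0)

def nloglike_and_nscore(params):
    # bundled evaluation that can share intermediate work (options 2b/3b)
    resid = params - 3.0
    return np.sum(resid ** 2), 2.0 * resid

x0 = np.zeros(2)

# option 1: lbfgs approximates the gradient internally
x1, f1, d1 = fmin_l_bfgs_b(nloglike, x0, approx_grad=True, epsilon=1e-8)

# options 2a/3a: pass a separate score function as fprime
x2, f2, d2 = fmin_l_bfgs_b(nloglike, x0, fprime=nscore)

# options 2b/3b: when fprime is None and approx_grad is False,
# func must return both the objective value and the gradient
x3, f3, d3 = fmin_l_bfgs_b(nloglike_and_nscore, x0)
```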

@josef-pkt
Member

(We had a power outage all last evening and part of the night).

I understand the principle, but I'm still not sure whether ARMA is or is not using the score method, and whether that default has been changed by this PR, leading to the SVD failure.

Changing/Improving the score calculation for ARIMA is of course not required by this pull request.
But I don't see how to check which score is actually used. Do we store the optimization options for this?

@argriffing

and whether that default has been changed by this PR, leading to the SVD failure.

Yeah I think you're right. I'll change it back.

@coveralls


Coverage remained the same when pulling 3b14099 on argriffing:mnlogit-loglike-score into 4f675c7 on statsmodels:master.

@josef-pkt
Member

It made Travis CI green.

What I find strange is that it looks like the ARIMA estimation doesn't use "our" score but the built-in one of l_bfgs_b. Do I read this correctly?

The SVD failure with the different estimation settings might need investigation, but that's not part of this PR.

I can merge this (after another brief look) if you want to use it for your low-memory estimation, and we can open a new issue for integrating this with automatic loglike_and_score usage.

@argriffing

What I find strange is that it looks like the ARIMA estimation doesn't use "our" score but the built-in one of l_bfgs_b. Do I read this correctly?

Yes, I believe so. In the old statsmodels code (and in its currently restored version), ARIMA uses the internal gradient approximation of fmin_l_bfgs_b even though it also defines its own score function that calls numdiff. So that is a little odd. But maybe ARIMA's numdiff score function is used somewhere else, instead of in the MLE search?

open a new issue for integrating this with automatic loglike_and_score usage.

You are suggesting that the statsmodels machinery automatically detect a loglike_and_score member function on a Model subclass and automatically use it, when appropriate, for parameter estimation (fitting), similarly to how the loglike, score, and hessian member functions are currently treated?

@josef-pkt
Member

Yes, that was my first thought when I looked through this PR: automatically use loglike_and_score if available, instead of loglike and score separately.
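
A hypothetical sketch of that dispatch (the helper name and placement are assumptions, not current statsmodels code):

```python
def _choose_objective(model):
    # Prefer the bundled evaluation when the model provides one; the
    # optimizers minimize, so both pieces are negated.
    if hasattr(model, 'loglike_and_score'):
        def func(params):
            llf, score = model.loglike_and_score(params)
            return -llf, -score  # fmin_l_bfgs_b accepts (f, g) from func
        return func, None
    # otherwise fall back to the separate loglike and score methods
    return (lambda params: -model.loglike(params),
            lambda params: -model.score(params))
```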

Now I think we had better wait for Skipper's outsourcing of the optimizer code. To make the optimizer code more generally available, we need to be able to specify the objective function and the score even when they are not loglike and score.
Our preferred optimizer in discrete is Newton, and there it would be better to have loglike_and_score_and_hessian.
Furthermore, scipy.optimize has two new, more robust Newton optimizers that would be very useful for models like discrete, where in many or most cases we have analytical first and second derivatives.

So, if you can do your low-memory work with the changes in this PR, then I would postpone this integration until Skipper can also have a look at it and check how it works with his branch.

@argriffing

@josef-pkt That sounds like a good plan. After this PR is merged, I'll work on low-memory modifications (or low-memory subclasses, if a time/memory tradeoff is required) for MNLogit.

By the way, I saw your mailing list question

Does scipy have another optimizer besides fmin (Nelder-Mead) that is
robust to near-singular, high condition number Hessian?

fmin_bfgs goes into neverland, values become huge until I get some
nans in my calculations.

What would be nice is an optimizer that uses derivatives, but
regularizes, forces Hessian or equivalent to be positive definite.

and I'd suggest giving lbfgs a try, in unconstrained mode. I've often had it succeed where bfgs fails.

@josef-pkt josef-pkt merged commit 831e2ad into statsmodels:master Nov 2, 2013

1 check passed (default): The Travis CI build passed.
@josef-pkt
Member

Good, merged.

Unit tests pass, and we will have another look later; see #1141.

Thank you!
