ENH: Warn on non-convergence. #1647

Merged
Merged 16 commits into statsmodels:master from jseabold:convergence-warning on Sep 26, 2014


3 participants

@jseabold
Member

I'm wary of adding more noise/warnings, but this seems like an important one and has led to many bug reports/questions.

@josef-pkt
Member

This won't work unconditionally when we have internal fits that are not the final ones for the user, and we don't want to warn users if the convergence failure is just an intermediate result that isn't really relevant for the final outcome.

The easiest fix might be to add a warn=True keyword so that we can turn the warning off.
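
A minimal sketch of the suggestion (the helper name `maybe_warn`, the `warn` switch, and the `'converged'` flag in the return values are assumptions for illustration, not the final diff):

```python
import warnings

def maybe_warn(retvals, warn=True):
    """Warn if the optimizer's return values indicate non-convergence.

    `retvals` is assumed to be an mle_retvals-style dict with a
    'converged' flag; `warn` is the proposed switch to silence it.
    """
    if warn and not retvals.get('converged', True):
        warnings.warn("Maximum likelihood optimization failed to converge. "
                      "Check mle_retvals.", UserWarning)
```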

@jseabold
Member

What models do we have internal fits for, and why might this not be indicative of a failure?

This is why I stuck it in LikelihoodModel and not in the Optimizer class. Intermediate fits should ideally use the Optimizer class.

@coveralls

Coverage Status

Coverage remained the same when pulling 2bf5ac7 on jseabold:convergence-warning into b52bc09 on statsmodels:master.

@josef-pkt
Member

I'm still trying to fix NegativeBinomial.fit_regularized on TravisCI; it subclasses LikelihoodModel.
I'm investigating different versions of Poisson.fit or fit_regularized to get good enough starting values, which might also fail.

I'd like to start using this as a general pattern, similar to what Kerby does in Mixed (where he currently doesn't go through the inherited optimizers).

It won't be a failure for the end user if we need several tries to find good start_params.

fit_regularized needs the same warning.

@jseabold
Member

Is it anything that actually needs to subclass LikelihoodModel, though? I can add a warn=True flag, but the Optimizer refactor might/should make it unnecessary.

@jseabold
Member

fit_regularized will likely have to wait for this. It's still not general. In principle we could have fit_regularized for any LikelihoodModel, right?* When I looked into adding this to Optimizer, there were design decisions that were too specific to discrete to generalize easily.

*It's unclear to me whether there is something like the LARS algorithm for these non-linear models yet that would be better than the general constrained optimizers. I haven't really looked.

@josef-pkt
Member

I'm talking about the discrete models, which are the main subclasses of LikelihoodModel.

In several cases we are not only switching optimizers, we are also switching the model to get the start_params: NegativeBinomial.fit uses Poisson to get start_params. This won't be solved by having stacked optimizers in the Optimization class.
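
A rough sketch of that pattern: an auxiliary Poisson fit supplies the starting values, and its convergence warning is deliberately silenced because it is only an intermediate step. The helper name is hypothetical and the real NegativeBinomial.fit differs; `warn_convergence` is the keyword under discussion in this PR.

```python
import numpy as np
from statsmodels.discrete.discrete_model import Poisson

def get_start_params(endog, exog):
    """Intermediate Poisson fit used only to seed NegativeBinomial-style
    start_params.  Its convergence failure is not a failure for the end
    user, so the warning is turned off for this internal step."""
    aux = Poisson(endog, exog).fit(disp=0, warn_convergence=False)
    # Append a starting value for the extra dispersion parameter.
    return np.append(aux.params, 0.1)
```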

@jseabold
Member

Added a keyword.

@josef-pkt
Member

"In principle we could have fit_regularized for any LikelihoodModel, right?"

Yes, but outside of discrete we currently only have tsa models that are MLE with non-linear optimization, plus GenericLikelihoodModel.
GenericLikelihoodModel would be a good test case for generalizing fit_regularized.
I have never looked into L1 regularized/penalized tsa.

Kerby is looking into a shooting method for regularized fit.
My impression is that the interior point algorithm of the current fit_regularized should work well and be fastest for medium-scale problems (where we don't need sparse params because of the size of the arrays).
The problem is that we only have slsqp (and optionally cvxopt) and no alternative if slsqp fails.

@josef-pkt
Member

The test suite now raises one warning for this.

@josef-pkt josef-pkt commented on an outdated diff Apr 29, 2014
statsmodels/base/model.py
@@ -205,7 +205,7 @@ def hessian(self, params):
def fit(self, start_params=None, method='newton', maxiter=100,
full_output=True, disp=True, fargs=(), callback=None,
- retall=False, **kwargs):
+ retall=False, check_convergence=True, **kwargs):
@josef-pkt
josef-pkt Apr 29, 2014 Member

I think warn_convergence would be more explicit

@jseabold
Member

It only warns on the first one. I need to change my warning settings.

@jseabold
Member

On Tue, Apr 29, 2014 at 11:00 AM, Josef Perktold notifications@github.com wrote:

> In principle we could have fit_regularized for any LikelihoodModel, right?
>
> Yes, but outside of discrete we currently only have tsa models that are MLE with non-linear optimization, plus GenericLikelihoodModel. GenericLikelihoodModel would be a good test case for generalizing fit_regularized. I have never looked into L1 regularized/penalized tsa.

It's conditional MLE (OLS), but I have recently been looking at TS-LARS.

> Kerby is looking into a shooting method for regularized fit. My impression is that the interior point algorithm of the current fit_regularized should work well and be fastest for medium-scale problems (where we don't need sparse params because of the size of the arrays). The problem is that we only have slsqp (and optionally cvxopt) and no alternative if slsqp fails.

Yeah, I don't know the options well here, if any, outside of LARS for linear models. Elements of Statistical Learning mentions predictor-corrector methods and coordinate descent for logistic regression.

@jseabold
Member

I suppose we should also add the warn_convergence keyword to subclasses, or is it for internal use only?

@josef-pkt
Member

Ah yes, we need a way to transmit it.
One possibility would be to pass it through the **kwargs:

kwargs.pop('warn_convergence', False)
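
A sketch of that transmission path (the `_optimize` helper and the exact signature are illustrative assumptions): the caller forwards the flag inside **kwargs, and the base fit pops it before the optimizer ever sees it.

```python
import warnings

# Base-class fit (sketch): pop the flag before dispatching to the optimizer
# so the optimizer never receives an unexpected keyword argument.
def fit(self, start_params=None, method='newton', maxiter=100, **kwargs):
    warn_convergence = kwargs.pop('warn_convergence', True)
    params, retvals = self._optimize(start_params, method, maxiter, **kwargs)
    if warn_convergence and isinstance(retvals, dict) \
            and not retvals.get('converged', True):
        warnings.warn("Optimization did not converge.", UserWarning)
    return params, retvals

# An internal helper fit can then silence the warning:
#     res = model.fit(start_params=sp, warn_convergence=False)
```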

@coveralls

Coverage Status

Coverage remained the same when pulling bda8c61 on jseabold:convergence-warning into b52bc09 on statsmodels:master.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Apr 29, 2014
statsmodels/base/model.py
@@ -414,6 +421,12 @@ def fit(self, start_params=None, method='newton', maxiter=100,
#TODO: hardcode scale?
if isinstance(retvals, dict):
mlefit.mle_retvals = retvals
+ if kwargs.pop('warn_convergence',
@josef-pkt
josef-pkt Apr 29, 2014 Member

I think you have to pop before calling the optimizer, so that the optimizer doesn't get the extra kwarg.

I guess we handle it only once, either here, or possibly move it to the optimizer later.

@jseabold
jseabold Apr 29, 2014 Member

Right. It doesn't seem to matter (I used get before). I think we use get in the optimization functions rather than passing in all the **kwargs, but I'll change it.
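
For reference, the difference being discussed, in a tiny standalone snippet:

```python
kwargs = {'warn_convergence': False, 'gtol': 1e-8}

flag = kwargs.get('warn_convergence', True)  # kwargs still holds the flag,
                                             # so the optimizer would receive it
flag = kwargs.pop('warn_convergence', True)  # flag is removed; only 'gtol'
                                             # is forwarded to the optimizer
```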

@josef-pkt josef-pkt commented on the diff Apr 29, 2014
statsmodels/discrete/tests/test_discrete.py
@@ -613,7 +614,7 @@ def setupClass(cls):
# Actually drop the last column and do an unregularized fit
exog_no_PSI = data.exog[:, :cls.m]
cls.res_unreg = MNLogit(data.endog, exog_no_PSI).fit(
- disp=0, tol=1e-15)
+ disp=0, tol=1e-15, method='bfgs', maxiter=1000)
@josef-pkt
josef-pkt Apr 29, 2014 Member

Were these test cases that set off the warning?

@jseabold
jseabold Apr 29, 2014 Member

Yeah. There are still more outside of discrete I think. I just haven't tracked them down yet.

@jseabold
jseabold Apr 29, 2014 Member

It didn't seem to matter much. Sometimes convergence failure isn't a big deal if it's just a few more iterations where the xopt doesn't change much.

@josef-pkt
josef-pkt Apr 29, 2014 Member

But you switched this one and the next from the default newton to bfgs. Is that better?
An example for #1649?

@jseabold
jseabold Apr 29, 2014 Member

It seemed faster.

@jseabold
jseabold Apr 29, 2014 Member

And to require fewer iterations.

@jseabold
jseabold Apr 29, 2014 Member

Likely just faster, though, since I think newton has the fastest (theoretical) convergence. Maybe it's time for some Cython Hessians in the ugly MNLogit case...

@josef-pkt
josef-pkt Apr 29, 2014 Member

It's just that these are the unit tests, and we want to test the default optimizer too, not just the fastest one.
So better to keep some newton runs in there.

@jseabold
jseabold Apr 29, 2014 Member

These aren't unit tests for the optimizer. We already have those. I didn't change the optimizer for the regularized model, which is what this is testing.

@josef-pkt
josef-pkt Apr 29, 2014 Member

No problem, I didn't look at the context.

@josef-pkt
Member

Another use case for turning off the warning is when we explicitly set a low maxiter just to get a few iterations.
This isn't used yet in subclasses of LikelihoodModel, but IIRC we changed something similar in RLM.

@jseabold
Member

We could always just suppress the warning, but sure.

@josef-pkt josef-pkt added this to the 0.6 milestone Jul 15, 2014
@jseabold
Member

Rebased.

@josef-pkt
Member

(I think my DNS server is not working; I still cannot open any webpages.)

I went over this earlier today.
Can we put the simplefilter activation into our warnings/exceptions module, so that the filter is set at import of the module and before the first usage?

I don't remember what the story was about circular imports.
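
A sketch of what such a central module could look like (the module path and class name are assumptions; the point is that importing it once sets the filter before the first fit, and also addresses the "only warns on the first one" issue above):

```python
# e.g. statsmodels/tools/sm_exceptions.py (hypothetical location)
import warnings

class ConvergenceWarning(UserWarning):
    """Nonlinear optimizer did not report convergence."""

# Show this warning every time it is triggered, not only the first time,
# independent of the user's default "once per location" filter.
warnings.simplefilter('always', ConvergenceWarning)
```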

@josef-pkt
Member

https://developers.google.com/speed/public-dns/

Maybe I can look at it later :)

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling c53154a on jseabold:convergence-warning into 2fcc5ae on statsmodels:master.

@jseabold
Member

I should probably just add warn_convergence=0 next to disp=0 in all the tests, but I'm not up for that. I don't see any other way to silence the warnings just for the tests.
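
One alternative to adding warn_convergence=0 everywhere would be to wrap the fits in the test setup with a warnings filter; a minimal sketch, assuming a ConvergenceWarning class like the one sketched earlier and reusing the names from the test diff above:

```python
import warnings
from statsmodels.discrete.discrete_model import MNLogit
# ConvergenceWarning as defined in the hypothetical warnings module above;
# data.endog and exog_no_PSI come from the test setup shown in the diff.

with warnings.catch_warnings():
    warnings.simplefilter('ignore', ConvergenceWarning)
    res_unreg = MNLogit(data.endog, exog_no_PSI).fit(
        disp=0, tol=1e-15, method='bfgs', maxiter=1000)
```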

@coveralls

Coverage Status

Changes unknown when pulling e0df366 on jseabold:convergence-warning into statsmodels:master.

@jseabold jseabold merged commit cfebe9a into statsmodels:master Sep 26, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
@jseabold jseabold deleted the jseabold:convergence-warning branch Sep 26, 2014