
ENH: Warn on non-convergence. #1647

Merged Sep 26, 2014 (16 commits)

Conversation (3 participants)
@jseabold (Member) commented Apr 29, 2014

I'm wary of adding more noise/warnings, but this seems like an important one and has led to many bug reports/questions.

@josef-pkt added PR labels Apr 29, 2014

@josef-pkt (Member) commented Apr 29, 2014

This won't work unconditionally: we have internal fits that are not the final one for the user, and we don't want to warn when a convergence failure is only an intermediate result with no real relevance to the final outcome.

The easiest fix might be to add a warn=True keyword so we can turn off the warning.
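A minimal sketch of the keyword idea, using the stdlib warnings module; ConvergenceWarning and fit here are illustrative stand-ins, not the actual statsmodels names:

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Illustrative stand-in; statsmodels defines its own warning class."""


def fit(converged, warn_convergence=True):
    # Warn on non-convergence only when the caller asked for it,
    # so internal/intermediate fits can opt out.
    if not converged and warn_convergence:
        warnings.warn("Maximum Likelihood optimization failed to converge.",
                      ConvergenceWarning)
    return converged
```

A caller doing an intermediate fit would pass warn_convergence=False and stay silent.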

@jseabold (Member, Author) commented Apr 29, 2014

Which models do we have internal fits for, and why wouldn't that indicate a failure?

This is why I put it in LikelihoodModel and not in the Optimizer class. Intermediate fits should ideally use the Optimizer class.

@coveralls commented Apr 29, 2014

Coverage Status

Coverage remained the same when pulling 2bf5ac7 on jseabold:convergence-warning into b52bc09 on statsmodels:master.

@josef-pkt (Member) commented Apr 29, 2014

I'm still trying to fix NegativeBinomial.fit_regularized on Travis CI. It subclasses LikelihoodModel.
I'm investigating different versions of Poisson.fit or fit_regularized to get good enough starting values, which also might fail.

I'd like to start using this as a general pattern, similar to what Kerby does in Mixed (where he currently doesn't go through the inherited optimizers).

It won't be a failure for the end user if we need several tries to find good start_params.

fit_regularized needs the same warning.

@jseabold (Member, Author) commented Apr 29, 2014

Is there anything that actually needs to subclass LikelihoodModel, though? I can add a warn=True flag, but the Optimizer refactor might (and should) make it unnecessary.

@jseabold (Member, Author) commented Apr 29, 2014

fit_regularized will likely have to wait for this; it's still not general. In principle we could have fit_regularized for any LikelihoodModel, right?* When I looked into adding this to Optimizer, there were design decisions too specific to discrete to generalize easily.

*It's unclear to me whether there is yet something like the Lars algorithm for these non-linear models that would be better than the general constrained optimizers. I haven't really looked.

@josef-pkt (Member) commented Apr 29, 2014

I'm talking about the discrete models, which are the main subclasses of LikelihoodModel.

In several cases we are not only switching optimizers, we are also switching the model to get the start_params: NegativeBinomial.fit uses Poisson to get start_params. This won't be solved by having stacked optimizers in the Optimizer class.
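The internal-fit pattern described here could look roughly like this sketch, where _intermediate_fit and fit_with_derived_start_params are illustrative stand-ins rather than statsmodels code:

```python
import warnings


def _intermediate_fit(warn_convergence=True):
    # Stand-in for e.g. a Poisson fit used only to produce start_params;
    # pretend it did not converge.
    if warn_convergence:
        warnings.warn("intermediate fit did not converge", UserWarning)
    return [0.0]  # placeholder start_params


def fit_with_derived_start_params():
    # The intermediate fit's convergence is irrelevant to the user,
    # so its warning is switched off; only the final fit would warn.
    start_params = _intermediate_fit(warn_convergence=False)
    # ... the final fit would run here using start_params ...
    return start_params
```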

@jseabold (Member, Author) commented Apr 29, 2014

Added a keyword.

@josef-pkt (Member) commented Apr 29, 2014

> In principle we could have fit_regularized for any LikelihoodModel, right?

Yes, but outside of discrete we currently only have the tsa models that are MLE with non-linear optimization, plus GenericLikelihoodModel.
GenericLikelihoodModel would be a good test case for generalizing fit_regularized.
I have never looked into L1 regularized/penalized tsa.

Kerby is looking into a shooting method for regularized fit.
My impression is that the interior point algorithm of the current fit_regularized should work well and be fastest for medium scale problems (where we don't need sparse params because of the size of the arrays).
The problem is that we only have slsqp (and optionally cvxopt) and no alternative if slsqp fails.

@josef-pkt (Member) commented Apr 29, 2014

The test suite now raises one warning for this.

@@ -205,7 +205,7 @@ def hessian(self, params):
     def fit(self, start_params=None, method='newton', maxiter=100,
             full_output=True, disp=True, fargs=(), callback=None,
-            retall=False, **kwargs):
+            retall=False, check_convergence=True, **kwargs):

@josef-pkt (Member) commented Apr 29, 2014

I think warn_convergence would be more explicit

@jseabold (Member, Author) commented Apr 29, 2014

It only warns on the first one. I need to change my warning settings.

@jseabold (Member, Author) commented Apr 29, 2014

On Tue, Apr 29, 2014 at 11:00 AM, Josef Perktold
<notifications@github.com> wrote:

> In principle we could have fit_regularized for any LikelihoodModel, right?
>
> Yes, but outside of discrete we currently only have tsa models that are
> MLE with non-linear optimization, plus GenericLikelihoodModel.
> GenericLikelihoodModel would be a good test case for generalizing
> fit_regularized.
> I have never looked into L1 regularized/penalized tsa.

It's conditional MLE (OLS), but I have recently been looking at TS-LARS.

> Kerby is looking into a shooting method for regularized fit.
> My impression is that the interior point algorithm of the current
> fit_regularized should work well and be fastest for medium scale problems
> (where we don't need sparse params because of the size of the arrays).
> Problem is that we only have slsqp (and optional cvxopt) and no
> alternative if slsqp fails.

Yeah, I don't know the options well, if any, outside of Lars for linear
models here. Elements of Statistical Learning mentions predictor-corrector
methods and coordinate descent for logistic regression.

@jseabold (Member, Author) commented Apr 29, 2014

I suppose we should also add the warn_convergence keyword to subclasses or is it for internal use only?

@josef-pkt (Member) commented Apr 29, 2014

Ah, yes, we need a way to transmit it.
One possibility would be to pass it through the **kwargs:

kwargs.pop('warn_convergence', False)

@coveralls commented Apr 29, 2014

Coverage Status

Coverage remained the same when pulling bda8c61 on jseabold:convergence-warning into b52bc09 on statsmodels:master.

@@ -414,6 +421,12 @@ def fit(self, start_params=None, method='newton', maxiter=100,
     #TODO: hardcode scale?
     if isinstance(retvals, dict):
         mlefit.mle_retvals = retvals
+        if kwargs.pop('warn_convergence',

@josef-pkt (Member) commented Apr 29, 2014

I think you have to pop before calling the optimizer, so that the optimizer doesn't get the extra kwarg.

I guess we handle it only once, either here, or possibly move it to the optimizer later.

@jseabold (Member, Author) commented Apr 29, 2014

Right. It doesn't seem to matter (I used get before). I think we use get in the optimization functions rather than passing in all the **kwargs, but I'll change it.
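A sketch of the pop-before-dispatch point from this review thread; _strict_optimizer and fit are illustrative stand-ins for the statsmodels internals:

```python
import warnings


def _strict_optimizer(**kwargs):
    # Stand-in optimizer that rejects unknown keywords, which is why the
    # flag must be removed from kwargs before dispatching.
    allowed = {"maxiter"}
    unknown = set(kwargs) - allowed
    if unknown:
        raise TypeError("unexpected keyword(s): %s" % sorted(unknown))
    return {"converged": False}


def fit(**kwargs):
    # Pop the flag *before* calling the optimizer, so the optimizer
    # never sees the extra kwarg.
    warn_convergence = kwargs.pop("warn_convergence", True)
    retvals = _strict_optimizer(**kwargs)
    if warn_convergence and not retvals["converged"]:
        warnings.warn("optimization failed to converge", UserWarning)
    return retvals
```

With get instead of pop, the stand-in optimizer above would raise a TypeError on the leftover keyword.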

@@ -613,7 +614,7 @@ def setupClass(cls):
     # Actually drop the last columnand do an unregularized fit
     exog_no_PSI = data.exog[:, :cls.m]
     cls.res_unreg = MNLogit(data.endog, exog_no_PSI).fit(
-        disp=0, tol=1e-15)
+        disp=0, tol=1e-15, method='bfgs', maxiter=1000)

@josef-pkt (Member) commented Apr 29, 2014

Were these the test cases that set off the warning?

@jseabold (Member, Author) commented Apr 29, 2014

Yeah. There are still more outside of discrete I think. I just haven't tracked them down yet.

@jseabold (Member, Author) commented Apr 29, 2014

It didn't seem to matter much. Sometimes convergence failure isn't a big deal if it's just a few more iterations where the xopt doesn't change much.

@josef-pkt (Member) commented Apr 29, 2014

But you switched this one and the next from the default newton to bfgs. Is that better? An example for #1649?

@jseabold (Member, Author) commented Apr 29, 2014

It seemed faster.

@jseabold (Member, Author) commented Apr 29, 2014

And to require fewer iterations.

@jseabold (Member, Author) commented Apr 29, 2014

Likely just faster though since I think newton has the fastest (theoretical) convergence. Maybe time for some Cython Hessians in the ugly MNLogit case...

@josef-pkt (Member) commented Apr 29, 2014

It's just that these are the unit tests, and we also want to test the default optimizer, not just the fastest one.
So better to keep some newtons in there.

@jseabold (Member, Author) commented Apr 29, 2014

These aren't unit tests for the optimizer. We already have those. I didn't change the optimizer for the regularized model, which is what this is testing.

@josef-pkt (Member) commented Apr 29, 2014

No problem, I didn't look at the context.

@josef-pkt (Member) commented Apr 29, 2014

Another use case for turning off the warning is when we explicitly set a low maxiter just to get a few iterations.
Not yet used in subclasses of LikelihoodModel, but IIRC we changed something similar in RLM.

@jseabold (Member, Author) commented Apr 29, 2014

We could always just suppress the warning, but sure.

@josef-pkt added this to the 0.6 milestone Jul 15, 2014

@jseabold force-pushed the jseabold:convergence-warning branch from ef61378 to c53154a Sep 23, 2014

@jseabold (Member, Author) commented Sep 23, 2014

Rebased.

@josef-pkt (Member) commented Sep 23, 2014

(I think my DNS server is not working; I still cannot open any webpages.)

I went over this earlier today.
Can we put the simplefilter activation into our warnings/exceptions module, so the filter is set at import of the module and before the first usage?

I don't remember what the story was about circular imports.
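A sketch of the simplefilter-at-import idea, assuming an illustrative project-level warnings module (ConvergenceWarning is a stand-in name):

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Illustrative stand-in for a project-level warning class."""


# Registering the filter at module import time, before any model code runs,
# overrides Python's default "warn only once per location" behaviour for
# this category, so every non-convergence is reported.
warnings.simplefilter("always", category=ConvergenceWarning)
```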

@josef-pkt (Member) commented Sep 23, 2014

https://developers.google.com/speed/public-dns/

Maybe I can look at it later :)

@coveralls commented Sep 23, 2014

Coverage Status

Coverage increased (+0.0%) when pulling c53154a on jseabold:convergence-warning into 2fcc5ae on statsmodels:master.

@jseabold force-pushed the jseabold:convergence-warning branch from c53154a to d814b13 Sep 24, 2014

@jseabold force-pushed the jseabold:convergence-warning branch from d814b13 to 0720242 Sep 24, 2014

@jseabold (Member, Author) commented Sep 26, 2014

I should probably just add warn_convergence=0 next to disp=0 in all the tests, but I'm not up for that. I don't see any other way to silence the warnings just for the tests.
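An alternative sketch for the tests: suppress just the one warning category with the stdlib warnings machinery instead of editing every call site. ConvergenceWarning and noisy_fit are illustrative stand-ins here:

```python
import warnings


class ConvergenceWarning(UserWarning):
    """Illustrative stand-in for the project's warning class."""


def noisy_fit():
    # Stand-in for a fit that fails to converge and warns about it.
    warnings.warn("failed to converge", ConvergenceWarning)
    return "result"


def quiet_fit():
    # In a test, suppress only this category, leaving other warnings alive.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        return noisy_fit()
```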

@jseabold force-pushed the jseabold:convergence-warning branch from 0720242 to d834864 Sep 26, 2014

@jseabold force-pushed the jseabold:convergence-warning branch from d834864 to e0df366 Sep 26, 2014

@coveralls commented Sep 26, 2014

Coverage Status

Changes unknown when pulling e0df366 on jseabold:convergence-warning into statsmodels:master.

jseabold added a commit that referenced this pull request Sep 26, 2014

Merge pull request #1647 from jseabold/convergence-warning
ENH: Warn on non-convergence.

@jseabold merged commit cfebe9a into statsmodels:master Sep 26, 2014

1 check passed

continuous-integration/travis-ci: The Travis CI build passed

@jseabold deleted the jseabold:convergence-warning branch Sep 26, 2014
