REF: refactor and speedup of mixed LME #1940
Conversation
An observation on the timing: the largest amount of time is spent in ...
As far as I can see, you don't use it at all, and it might be called automatically in the super fit (I haven't verified this). Can I rename ...
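To make the timing discussion concrete, here is a minimal profiling sketch (not from the PR); the simulated random-intercept dataset and all variable names below are illustrative assumptions.

```python
# Minimal profiling sketch (illustrative, not the PR's code): time a
# MixedLM fit on simulated random-intercept data and list the hot spots.
import cProfile
import pstats

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_groups, group_size = 100, 10
groups = np.repeat(np.arange(n_groups), group_size)
exog = sm.add_constant(rng.standard_normal(n_groups * group_size))
re_int = rng.standard_normal(n_groups)[groups]  # random intercepts
endog = exog @ np.array([1.0, 2.0]) + re_int + rng.standard_normal(n_groups * group_size)

model = sm.MixedLM(endog, exog, groups)
cProfile.run("model.fit()", "mixedlm.prof")
pstats.Stats("mixedlm.prof").sort_stats("cumulative").print_stats(10)
```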
About the hessian and the loglike_full versus loglike_sqrt parameterization: as far as I understand MixedLM, in the ... You can separate which parameterization to use from the one you use to estimate, as you are doing for ... But we need a keyword in ...
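For context, here is a sketch of the two parameterizations being contrasted (my own illustration; the helper names are hypothetical, not MixedLM's API): the "full" form carries the packed random-effects covariance, while the "sqrt" form carries the packed lower-triangular Cholesky factor, which is unconstrained.

```python
# Illustration of the full vs. sqrt parameterization of the random-effects
# covariance: cov_re = L @ L.T with L lower triangular. The helper names
# sqrt_to_full/full_to_sqrt are hypothetical, not part of MixedLM.
import numpy as np

def sqrt_to_full(params_sqrt, k_re):
    """Unpack the lower-triangular factor L and return cov_re = L @ L.T."""
    L = np.zeros((k_re, k_re))
    L[np.tril_indices(k_re)] = params_sqrt
    return L @ L.T

def full_to_sqrt(cov_re):
    """Pack the Cholesky factor of a positive definite cov_re."""
    L = np.linalg.cholesky(cov_re)
    return L[np.tril_indices(cov_re.shape[0])]

cov_re = np.array([[2.0, 0.5], [0.5, 1.0]])
assert np.allclose(sqrt_to_full(full_to_sqrt(cov_re), 2), cov_re)
```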
Correction: ...
The only other interesting change I made to MixedLM fit during my experimentation is to allow users to overwrite ... "newton" and some others require that the hessian is also defined for ...
Thanks for looking into this, I've been looking at it too. One thing I've turned up is that I'm pretty sure there's a bug in the ... Since we don't report standard errors for the variance parameters (similar to ...) ... If the Hessian is wrong, this would explain why the optimization methods ... I also have code to calculate the analytic hessian for the square root ...
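A generic way to cross-check an analytic Hessian against a numerical one, using statsmodels' finite-difference helpers (a sketch; `loglike` and `hessian` stand in for the model's methods and are not the PR's exact API):

```python
# Cross-check an analytic Hessian against finite differences (generic
# sketch; `loglike` and `hessian` are stand-ins for the model's methods).
import numpy as np
from statsmodels.tools.numdiff import approx_hess

def hessians_agree(loglike, hessian, params, rtol=1e-4, atol=1e-6):
    H_analytic = hessian(params)
    H_numeric = approx_hess(params, loglike)
    return np.allclose(H_analytic, H_numeric, rtol=rtol, atol=atol)
```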
I thought initially that the difference in the hessian in the last value is because of the sqrt transform. With use_sqrt=False, and changing ...
newton converges to almost the same values as bfgs except for the last params,
but it's quite a bit slower than ... One possibility that I used in some draft versions for nonlinear robust M-estimators is to use the analytical gradient and hessian for the mean part of the model, and numerical derivatives for the scale estimator. Follow-up question: ...
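Since the exact set of fit methods available depended on the statsmodels version at the time, here is a self-contained toy comparison of a quasi-Newton versus a Newton-type optimizer (scipy's Newton-CG standing in for "newton"), on a normal log-likelihood:

```python
# Toy bfgs-vs-newton comparison on a normal negative log-likelihood,
# parameterized as p = (mu, log_sigma) so the problem is unconstrained.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=500)

def nll(p):
    mu, s = p[0], np.exp(p[1])
    return 0.5 * np.sum(((x - mu) / s) ** 2) + x.size * np.log(s)

def grad(p):
    mu, s = p[0], np.exp(p[1])
    z = (x - mu) / s
    return np.array([-np.sum(z) / s, x.size - np.sum(z ** 2)])

res_bfgs = minimize(nll, np.zeros(2), jac=grad, method="BFGS")
res_newton = minimize(nll, np.zeros(2), jac=grad, method="Newton-CG")
print(res_bfgs.x, res_newton.x)  # both should recover (mu, log sigma)
```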
Correction to the last:
(res_m is with 'bfgs', res_m3 is with 'newton')
The score at the optimum with newton is smaller than with bfgs.
Another issue: the complex-step hessian is wrong, I guess.
So to me your hessian_full looks correct. But in the calculation of the loglike there is something that messes up complex values.
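For background on what can "mess up complex values": complex-step differentiation only works if the function is written with operations that are analytic for complex inputs. A minimal illustration (not MixedLM code) of how something as innocent as `abs()` silently breaks it:

```python
# Complex-step differentiation f'(x) ~ Im(f(x + ih)) / h requires f to be
# analytic in its argument; np.abs discards the imaginary part.
import numpy as np

def good(x):
    return x ** 2          # analytic: complex step is exact

def bad(x):
    return np.abs(x) ** 2  # abs() kills the imaginary perturbation

h = 1e-20
x0 = 3.0
print(np.imag(good(x0 + 1j * h)) / h)  # 6.0, correct derivative
print(np.imag(bad(x0 + 1j * h)) / h)   # 0.0, silently wrong
```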
Thanks, that's great. I will continue to try to get the analytic hessian ...
Based on master, without the changes in this PR, I tried to use ...
Based on comparing just a few test cases, we seem to be doing around 2 ... I would like to refactor the transformation code that converts gradients ... Also, there is a more or less complete rewrite needed to accommodate ...
Another possible optimization is to improve the ... I think it could start at least with OLS.
Another piece: the effect of start_params. I changed the source to allow ...
I just tried out starting with ...
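A sketch of the OLS-start-values idea discussed here (illustrative; it reuses `endog`/`exog`/`groups` from the profiling sketch above, and the exact packing of `start_params` for MixedLM is version dependent, so treat the variance-parameter padding as an assumption):

```python
# Seed the mixed-model optimizer with OLS fixed-effects estimates
# (illustrative sketch; the packing of start_params is version dependent).
import numpy as np
import statsmodels.api as sm

fe_start = sm.OLS(endog, exog).fit().params    # fixed-effects start values
start = np.concatenate([fe_start, [1.0]])      # + one random-intercept variance
res = sm.MixedLM(endog, exog, groups).fit(start_params=start)
```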
Several updates here (sorry, should have placed in different commits):
Remaining issues: the analytic and numeric Hessians do not agree well away from the MLE. This was always true, not a new issue. I can't find the issue now. Passes all tests locally, waiting on Travis. Based on very limited profiling, this is now around 7x slower than GEE-exchangeable, compared to 65x slower before, so around 10 times faster than before.
One quick question based on your summary: ... I had to revert the OLS start_params because the unit tests failed with it; I had opened #1947.
There was an odd thing happening in which the first steepest descent step ... I also dialed back the rtol parameter a bit in one place, line 212. I ... Other than this, all the same tests I had before run clean. I would still ...
Two questions: Would it be useful to keep steepest descent around for cases where the starting values are not good? I don't know when we might run into convergence problems with bfgs and EM. Is there a test case with an unbalanced panel? I only did a rough browse through this PR (and might not go over the details before merging). I guess the only thing I have additionally is to add ... This PR is only one commit behind master, so I can just hit the merge button if you think it's ready.
I'm adding the steepest ascent in now, then you can merge.
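A hedged sketch of the idea being added here: take a few steepest-ascent steps on the log-likelihood to improve poor starting values before handing off to a quasi-Newton method. `loglike` and `score` are stand-ins for the model's methods, not the PR's exact implementation.

```python
# Steepest-ascent warm start (sketch): climb the log-likelihood for a few
# steps with simple backtracking, then return the improved start values.
import numpy as np

def steepest_ascent(loglike, score, params, nstep=5, step0=1.0):
    for _ in range(nstep):
        direction = score(params)
        step = step0
        # halve the step until the log-likelihood actually improves
        while step > 1e-10 and loglike(params + step * direction) <= loglike(params):
            step /= 2.0
        params = params + step * direction
    return params
```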
@kshedden One more question about the different optimization algorithms: do you know if there is a recommendation with respect to which one-step update to use: one step of steepest descent, one step of bfgs, one step of newton, and so on? If we have ...
To clarify the question: do all the different optimizers do one full updating step with maxiter=1?
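One way to probe the question empirically (a sketch reusing the toy `nll`/`grad` from the optimizer comparison above): run each scipy optimizer for a single iteration and inspect where it lands and how many iterations it reports.

```python
# Probe what "one step" means per optimizer: run each with maxiter=1
# (reuses nll/grad from the toy example above).
import numpy as np
from scipy.optimize import minimize

for meth in ["BFGS", "CG", "Newton-CG"]:
    r = minimize(nll, np.zeros(2), jac=grad, method=meth,
                 options={"maxiter": 1})
    print(meth, r.x, "nit:", r.nit)
```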
I'm just guessing, but I would expect that any theory justifying a one-step ... I've often wondered whether the different optimizers always start by ...
I'm merging this as is. (I'm a little bit out of this topic again and not able to get back into it quickly.)
REF: refactor and speedup of mixed LME
I changed the description to better reflect what is in this PR.
This PR has turned into a refactoring to improve the structure of the code and to speed it up.
Comments below about the details, and a mailing-list thread triggered by this:
https://groups.google.com/d/msg/pystatsmodels/KXF3CxqYZcI/ZV5m_1ENHTkJ
The original description was:
Fix one docstring.
Fix a bug that was preventing the history from being returned.