Negative Binomial Rebased #795
Conversation
I think this is ready to merge. I'll leave it for another look though.
Giving the docs a once-over. Any idea why the equations under loglike would show up as right-justified? My LaTeX-fu is not strong, and I don't recall ever seeing this with sphinx before.
What are the many trailing backslashes for? In general, equations align on the equal sign in a list of equations (at least in Scientific Workplace).
BTW: when there is this much LaTeX, I sometimes use r''' to make the docstring a raw string and avoid doubling the backslashes.
Trailing backslashes indicate a newline. I can put them in an align environment and they should align; I just found it odd that they show up right-aligned. Never seen this, though I guess it might show up in other places I just don't know about. Yeah, I changed it to a raw string.
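For reference, a minimal sketch of what that docstring fix might look like; the equation shown is a placeholder for illustration, not the actual negative binomial log-likelihood:

```python
# Minimal sketch: a raw docstring (r""") so LaTeX backslashes need no
# doubling, plus an align environment so equations line up on "=" rather
# than rendering right-justified. The equation is a placeholder only.
def loglike(params):
    r"""Log-likelihood (placeholder equation, for illustration only).

    .. math::
       :nowrap:

       \begin{align}
       \ln L &= \sum_i \ln f(y_i \mid \mu_i) \\
             &= \sum_i \left[y_i \ln \mu_i - \mu_i - \ln(y_i!)\right]
       \end{align}
    """
```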
Ok. Now it should be good to go. Outstanding issue: the NB1 fit is slow. I think we should look into Cython for approx_hess, etc. Presumably these get called a lot, and we might get decent speed-ups in the optimizations that use them. Need to do some profiling first though. Merge?
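Before reaching for Cython, a quick profile can confirm whether helpers like approx_hess actually dominate the runtime. A hedged sketch, with `model.fit` standing in for the actual NB1 fit call:

```python
import cProfile
import io
import pstats

def profile_fit(model):
    # Profile a fit() call and report the ten most expensive functions
    # by cumulative time; `model` is any object with a fit() method.
    pr = cProfile.Profile()
    pr.enable()
    model.fit()
    pr.disable()
    out = io.StringIO()
    pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()
```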
hess_arr[i,j] = np.sum(-exog[:,i,None]*exog[:,j,None] *\
                       const_arr, axis=0)
hess_arr[i,j] = np.sum(-exog[:,i,None] * exog[:,j,None] *
                       const_arr, axis=0)
hess_arr[np.triu_indices(dim, k=1)] = hess_arr.T[np.triu_indices(dim,
Getting the triu_indices into a temp variable would make this nicer (and saves calculating them twice).
Done. I just rebased, so I'm not going to push until right before I merge if you're still looking things over.
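The suggested refactor might look like this; the array contents are arbitrary dummy data, only the indexing pattern comes from the diff:

```python
import numpy as np

# Mirror the lower triangle into the upper triangle, computing the
# indices once instead of twice. hess_arr here is dummy data standing
# in for the actual hessian array.
dim = 4
hess_arr = np.tril(np.arange(dim * dim, dtype=float).reshape(dim, dim))
tri_idx = np.triu_indices(dim, k=1)
hess_arr[tri_idx] = hess_arr.T[tri_idx]
```

Fancy indexing on the right-hand side makes a copy before assignment, so reading from the transpose view of the same array is safe here.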
The hessian looks painful (did you use sympy to help?). Why did you switch back to bfgs? Why does the transform not affect the hessian (except for the call to transform)? You can merge whenever, after Travis has reported in.
Yes, the Hessian was painful. I used sympy in the end for d2 lnL / (dalpha dalpha) for NB1. I kept screwing up somewhere and got tired of trying. I should've used it from the start, but luckily I'd already done the confusing indexing for the Hessian for NB2, so that wasn't too bad. Newton doesn't converge well in the NB1 case for alpha. I don't see why, but it's likely a problem with the step-size in our newton. The Hessian is calculated in Model.fit, so it gets lnalpha but returns the Hessian in terms of alpha, so it's fine. Actually, I can remove the double calculation for NB1. It's not needed anymore.
Put another way, the Hessian is now always the hessian of lnL wrt alpha, even if you give it lnalpha or alpha, provided that transparams correctly handles lnalpha. Actually, come to think of it, we need to set transparam in fit not
Ok, I guess I roughly understand: transparams controls where we evaluate the hessian, but the hessian is always wrt alpha. Nice work (and quite a lot of it).
Maybe back to this: Newton expects that the hessian is with respect to lnalpha. If we want to use the hessian during optimization, then we need it wrt lnalpha, not alpha. For cov_params we use the hessian wrt alpha, as you have it now (with transparams).
D'oh. Yeah, that makes complete sense. So...we could set _transparams=False for Newton; then we'll get alpha in terms of exp(alpha). This works fine and fast in the test case. Given that the sign is correct in the Hessian, my intuition says that it will always converge to the exp(alpha) with alpha > 1. It's kind of a hack, though. Thoughts?
Yeah, it seems to work okay. I think it's alright for now, and we can leave the default as bfgs. We'll need to support transforms like this in a more general way at some point, but I'd rather leave it for the optimize refactor I've been wanting to do anyway.
I think using the chain rule with alpha = exp(lnalpha), we should be able to get the transformation, i.e. the additional terms for the hessian wrt lnalpha. With some quick calculation, the cross derivative would just need to be multiplied by alpha, but the second derivative wrt lnalpha now includes an additional first-derivative term:

d2 f / (d lnalpha d lnalpha) = d2 f / (d alpha d alpha) * alpha**2 + d f / d alpha * alpha

where d2 f / (d alpha d alpha) is the current hessian[-1, -1]. Not sure, no intuition on the second-derivative wrt lnalpha adjustment.
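A quick numerical sanity check of that chain-rule identity, using an arbitrary smooth f as a stand-in for the log-likelihood:

```python
import numpy as np

# Verify d2f/dlnalpha2 == d2f/dalpha2 * alpha**2 + df/dalpha * alpha
# for f(a) = a**3 - 2a (an arbitrary stand-in, not the NB log-likelihood).
def f(a):
    return a**3 - 2.0 * a

lnalpha = 0.3
alpha = np.exp(lnalpha)

# analytic right-hand side: f''(a) = 6a, f'(a) = 3a**2 - 2
rhs = 6.0 * alpha * alpha**2 + (3.0 * alpha**2 - 2.0) * alpha

# central finite difference of g(l) = f(exp(l)) wrt lnalpha
h = 1e-5
g = lambda l: f(np.exp(l))
lhs = (g(lnalpha + h) - 2.0 * g(lnalpha) + g(lnalpha - h)) / h**2
```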
The hack seems to work okay. We can fix it for real later unless you want to submit a patch. I'm not going to go back through the math; I have to change gears for some work I have due next week.
It's clean/correct but fragile: if we have other optimizers that use the hessian, then they would also need to be included. What about the score/gradient? Does it calculate wrt lnalpha depending on transparams? (I haven't checked yet.)
Newton-conjugate gradient is the only other one that uses the Hessian and needs to be accounted for.
ncg is still pretty robust. It works both ways, but it's better if it's exempted like newton.
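A hedged sketch of the workaround discussed above, exempting the hessian-consuming optimizers; the attribute name `_transparams` follows the discussion, but the class and fit signature are simplified stand-ins, not the actual statsmodels API:

```python
# Skip the exp(lnalpha) transform for optimizers that consume the
# analytic hessian (newton, ncg), so the hessian matches the
# parameterization being stepped in. Simplified stand-in, not the
# real model class.
HESSIAN_OPTIMIZERS = ("newton", "ncg")

class NegativeBinomialSketch:
    def fit(self, method="bfgs"):
        self._transparams = method not in HESSIAN_OPTIMIZERS
        return self._transparams
```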
I'm going to go ahead and merge shortly. We can file a ticket for the score/hessians, but I'm not going to work on it now. Thinking about it some more, it does seem like the score and hessians should return wrt lnalpha, since this is what the optimizer needs to step in the direction of. It still seems to work decently for the gradient-based optimizers, which is what's confusing me. We might get a bit faster convergence and slightly higher-precision results if we fix it.
Hmm, actually, never mind about merging. This may be why we get such low precision on alpha in most of the tests, and it should probably be fixed.
Are you getting good precision with newton? Without the transform it should be correct and converge to a nice result (if or when it converges).
Now that I'm looking at it briefly again, I think it's the test results, not us. We agree with Stata to high precision, just not with the R results from the unpublished package. I'll update the parameters there and see. It's likely that R is using a hessian approximation or that the optimization needed to be tweaked.
Yeah, we agree with Stata up to 8 decimal places. So I'll merge and file a ticket to look at the score/hessian when I get back into this 3 years down the road...?
ENH: Add Negative Binomial Model.
Rebased. Fixed merge conflicts. Slight style refactor. Renamed ll -> loglike_method. Fixed alpha vs lnalpha standard error.