
Negative Binomial Rebased #795

Merged
jseabold merged 59 commits into statsmodels:master from vincent-nbin on Apr 29, 2013

Conversation

@jseabold (Member)

Rebased. Fixed merge conflicts. Slight style refactor. Renamed ll -> loglike_method. Fixed alpha vs lnalpha standard error.

@jseabold (Member Author)

I think this is ready to merge. I'll leave it for another look though.

@jseabold (Member Author)

Giving the docs a once-over. Any idea why the equations under loglike show up as right-justified? My LaTeX-fu is not strong, and I don't recall ever seeing this with Sphinx before.

@josef-pkt (Member)

What are the many trailing backslashes for?

In general, equations align on the equal sign in a list of equations (at least in Scientific WorkPlace).

@josef-pkt (Member)

BTW: if there is that much LaTeX, I have sometimes used r''' to make the docstring a raw string and avoid doubling the backslashes.

@jseabold (Member Author)

Trailing backslashes indicate a newline.

I can put them in an align environment and they should align; I just found it odd that they show up right-aligned. I've never seen this, though I guess it might show up in other places I just don't know about.

Yeah I changed it to a raw string.
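
For readers following along, a minimal sketch of the pattern being discussed: a raw-string docstring so the LaTeX backslashes don't need doubling, with `&` markers so Sphinx can align a multi-line equation on the equal signs. The method body and the `\ell_i` placeholder are illustrative, not the actual NegativeBinomial docstring, and how the alignment renders depends on the Sphinx math setup in use.

```python
def loglike(self, params):
    r"""
    Loglikelihood of the model.

    Notes
    -----
    .. math::

       \ln L &= \sum_{i=1}^{n} \ell_i(\beta, \alpha) \\
             &= \dots
    """
```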

@jseabold (Member Author)

Ok. Now should be good to go. Outstanding issues: NB1 fit is slow. I think we should look into Cython for approx_hess, etc. Presumably, these get called a lot and we might get decent speed-ups in optimizations that use them. Need to do some profiling first though.

Merge?
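
A rough profiling sketch along the lines mentioned above, to see how much time ends up in the numerical-derivative helpers during an NB1 fit. The data are synthetic and purely illustrative, and it assumes NegativeBinomial is exposed through statsmodels.api with the loglike_method and disp arguments used elsewhere in this PR.

```python
import cProfile

import numpy as np
import statsmodels.api as sm

# Synthetic count data, only to exercise the NB1 fit (not a benchmark).
np.random.seed(12345)
nobs = 5000
exog = sm.add_constant(np.random.uniform(0, 1, size=(nobs, 3)))
endog = np.random.poisson(np.exp(exog.sum(1) * 0.25))

mod = sm.NegativeBinomial(endog, exog, loglike_method='nb1')
cProfile.run('mod.fit(disp=0)', sort='cumulative')
```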

```diff
-            hess_arr[i,j] = np.sum(-exog[:,i,None]*exog[:,j,None] *\
-                            const_arr, axis=0)
+            hess_arr[i,j] = np.sum(-exog[:,i,None] * exog[:,j,None] *
+                                   const_arr, axis=0)
             hess_arr[np.triu_indices(dim, k=1)] = hess_arr.T[np.triu_indices(dim, k=1)]
```
@josef-pkt (Member)

Getting the triu_indices into a temp variable would make this nicer (and saves calculating them twice).
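
I.e., something like this sketch, reusing the names from the hunk above:

```python
triu_idx = np.triu_indices(dim, k=1)
hess_arr[triu_idx] = hess_arr.T[triu_idx]
```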

@jseabold (Member Author)

Done. I just rebased, so I'm not going to push until right before I merge if you're still looking things over.

@josef-pkt (Member)

The Hessian looks painful (did you use sympy to help?).

Why did you switch back to bfgs?

Why does the transform not affect the Hessian (except for the call to transform)?

You can merge whenever, after Travis has reported in.

@jseabold (Member Author)

Yes, the Hessian was painful. I used sympy in the end for d2 lnL/(dalpha dalpha) for NB1. I kept screwing up somewhere and got tired of trying. I should've used it from the start, but luckily I'd already done the confusing indexing for the Hessian for NB2, so that wasn't too bad.

Newton doesn't converge well in the NB1 case for alpha. I don't see why, but it's likely a problem with the step-size in our newton.

The Hessian is calculated in Model.fit, so it gets lnalpha but returns the Hessian in terms of alpha, so it's fine. Actually, I can remove the double calculation for nb1. It's not needed anymore.

@jseabold (Member Author)

Put another way, the Hessian is now always the Hessian wrt alpha, whether you hand it lnalpha or alpha, provided that transparams correctly handles lnalpha. Actually, come to think of it, we need to set _transparams in fit rather than in __init__, so that you can call fit on the same model twice.
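
A minimal sketch of what moving the flag into fit might look like, with loglike/score/hessian elided; CountModel as the base class and the exact fit signature are assumptions here, not the final implementation:

```python
from statsmodels.discrete.discrete_model import CountModel

class NegativeBinomial(CountModel):
    # ... loglike, score, hessian elided ...

    def fit(self, start_params=None, method='bfgs', **kwargs):
        # Decide per fit() call, not in __init__, so the same model
        # instance can be fit more than once (e.g. with another optimizer).
        if self.loglike_method.startswith('nb'):
            self._transparams = True   # optimize over lnalpha internally
        else:
            self._transparams = False
        return super(NegativeBinomial, self).fit(start_params=start_params,
                                                 method=method, **kwargs)
```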

@josef-pkt (Member)

OK, I guess I roughly understand: transparams determines where we evaluate the Hessian, but the Hessian is always wrt alpha.

nice work (and quite a lot of it)

@josef-pkt (Member)

Maybe back to this: Newton expects that the Hessian is with respect to lnalpha. If we want to use the Hessian during optimization, then we need it wrt lnalpha, not alpha.

For cov_params we use the Hessian wrt alpha, as you have it now (with transparams).

@jseabold (Member Author)

D'oh. Yeah that makes complete sense. So...we could set _transparams=False for Newton, then we'll get alpha in terms of exp(alpha). This works fine and fast in the test case. Given that the sign is correct in the Hessian my intuition says that it will always converge to the exp(alpha) with alpha > 1. It's kind of a hack, though. Thoughts?

@jseabold (Member Author)

Yeah, it seems to work okay. I think it's alright for now and we can leave the default to bfgs.

We'll need to support transforms like this in a more general way at some point, but I'd rather leave it for the optimize refactor I've been wanting to do anyway.

@josef-pkt (Member)

I think using the chain rule for alpha = exp(lnalpha), we should be able to get the transformation, i.e. the additional terms for the Hessian wrt lnalpha.

With some quick calculation, the cross derivative would just need to be multiplied by alpha, but the second derivative wrt lnalpha now includes an additional first-derivative term:

d2 f() / (d lnalpha d lnalpha) = d2 f() / (d alpha d alpha) * alpha**2 + d f() / d alpha * alpha

d2 f() / (d alpha d alpha) is the current hessian[-1, -1].

Not sure; I have no intuition on the second derivative wrt lnalpha adjustment.
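
Written out as code, the adjustment sketched above would look roughly like this; hess_alpha and score_alpha stand for whatever hessian() and score() currently return wrt (beta, alpha), with alpha in the last position, and the function name is made up for illustration:

```python
import numpy as np

def hessian_wrt_lnalpha(hess_alpha, score_alpha, lnalpha):
    """Chain-rule adjustment for the reparameterization alpha = exp(lnalpha)."""
    alpha = np.exp(lnalpha)
    hess = hess_alpha.copy()
    # cross derivatives d2f/(dbeta dlnalpha) pick up one factor of alpha
    hess[:-1, -1] *= alpha
    hess[-1, :-1] *= alpha
    # d2f/(dlnalpha dlnalpha) = d2f/(dalpha dalpha)*alpha**2 + df/dalpha*alpha
    hess[-1, -1] = hess_alpha[-1, -1] * alpha**2 + score_alpha[-1] * alpha
    return hess
```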

@jseabold (Member Author)

The hack seems to work okay. We can fix it for real later unless you want to submit a patch. I'm not going to go back through the math; I have to change gears for some work due next week.

@josef-pkt (Member)

`if self.loglike_method.startswith('nb') and method != 'newton':`

is clean/correct but fragile.

If we have other optimizers that use the Hessian, then they would also need to be included.

What about the score/gradient? Does it calculate wrt lnalpha depending on transparams? (I haven't checked yet.)

@jseabold (Member Author)

Newton-conjugate gradient is the only other one that uses the Hessian and needs to be accounted for.

@jseabold (Member Author)

ncg is still pretty robust. It works both ways, but it's better if exempted like newton.
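
A sketch of a less fragile variant of the check quoted above, keeping the Hessian-using optimizers in one place; the helper and constant names are made up purely for illustration:

```python
# optimizers that call the analytic hessian() and so need to step in alpha
HESSIAN_METHODS = ('newton', 'ncg')

def _use_lnalpha_transform(loglike_method, method):
    """True when fit() should optimize over lnalpha instead of alpha."""
    return loglike_method.startswith('nb') and method not in HESSIAN_METHODS
```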

@jseabold (Member Author)

I'm going to go ahead and merge shortly. We can file a ticket for the score/Hessian, but I'm not going to work on it now. Thinking about it some more, it does seem like the score and Hessian should be returned wrt lnalpha, since that is what the optimizer needs to step in the direction of. It still seems to work decently for the gradient-based optimizers, which is what's confusing me. We might get a bit faster convergence and slightly higher-precision results if we fix it.

@jseabold (Member Author)

Hmm, actually, never mind about merging. This may be why we get such low precision on alpha in most of the tests, and it should probably be fixed.

@josef-pkt (Member)

Are you getting good precision with newton? Without the transform it should be correct and converge to a nice result (if or when it converges).

@jseabold (Member Author)

Now that I'm looking at it briefly again, I think it's the test results, not us. We agree with Stata to high precision, just not with the R results from the unpublished package. I'll update the parameters there and see. It's likely that R is using a Hessian approximation, or that its optimization needed to be tweaked.

@jseabold (Member Author)

Yeah, we agree with Stata up to 8 decimal places. So I'll merge and file a ticket to look at the score/hessian when I get back into this 3 years down the road...?

jseabold added a commit that referenced this pull request on Apr 29, 2013: ENH: Add Negative Binomial Model.
@jseabold merged commit 6d95699 into statsmodels:master on Apr 29, 2013
@jseabold deleted the vincent-nbin branch on April 29, 2013 20:01
@josef-pkt added the PR label on Apr 14, 2014
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request on Sep 2, 2014