BUG/ENH: Probit for continuous data #7210
Comments
I'm raising this to a serious bug. We are advertising QMLE, but Probit might return wrong results without complaining, without any indication that anything is unusual, and without saying that it requires binary endog.
I think we can drop the "q" trick if that's the culprit. If we have 10 levels/choices in multinomial, then saving 9/10 makes a larger difference, also in terms of memory.
On the GLM side (missing unit test #4598), there are generic or parameterized tests. Searching for "probit" finds the parameterized tests for gradient optimization without and with weights.
Maybe we should keep the current q trick for Probit. Score and hessian don't have a log term, but they have division by prob and (1 - prob). An article that includes recommendations for numerical stability also recommends, for example, a tail approximation of the cdf and pdf terms.
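A rough sketch of why the q trick can help with the division by prob and (1 - prob), for binary endog only. This is not the statsmodels implementation; `norm_cdf`, `norm_pdf`, `score_factor_general` and `score_factor_q` are hypothetical names, and the normal cdf is built from `math.erfc` so the example needs only the standard library. The "general" form divides by `p * (1 - p)`, which rounds to zero once `p` rounds to 1, while the q form evaluates the cdf directly in the tail.

```python
import math

def norm_cdf(z):
    # Standard normal cdf via erfc, which stays accurate in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def score_factor_general(y, z):
    # Per-observation factor (y - p) * pdf / (p * (1 - p)), with p = cdf(z).
    p = norm_cdf(z)
    return (y - p) * norm_pdf(z) / (p * (1.0 - p))

def score_factor_q(y, z):
    # Same factor for binary y, written with q = 2*y - 1 (assumes y in {0, 1}).
    q = 2.0 * y - 1.0
    return q * norm_pdf(q * z) / norm_cdf(q * z)

# Moderate linear predictor: both forms agree for y = 0 and y = 1.
for y in (0.0, 1.0):
    print(y, score_factor_general(y, 0.7), score_factor_q(y, 0.7))

# Far tail with a badly mispredicted observation (y = 0, z = 10):
# 1 - cdf(10) rounds to exactly 0.0 in double precision.
try:
    score_factor_general(0.0, 10.0)
except ZeroDivisionError:
    print("general form: division by zero")
print("q form:", score_factor_q(0.0, 10.0))
```

The q form only covers the binary case, so this is an argument about numerical behavior, not about whether the trick is valid for continuous endog.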
Looks kind of bad here. Logit also uses the q trick in loglike. Stata 14 has
Note, for interval data: we have only QMLE and loglike,
After fixing precision problems in GLM probit link second derivative #1878, GLM now agrees with Stata fracreg for both Logit and Probit, including the standard errors bse (except for the correction factor in the default).
A picture for fun, using GLM for fractional regression. Cauchy seems to predict better based on the plot. Stata and SAS also don't have a cauchy link, AFAICS.
Removing prio high. Whether we extend Probit to continuous data QMLE is still an open question.
I suspect that Probit only works for binary {0, 1} endog.
If I add some noise to get a continuous endog, then Probit does not produce the same result as the GLM version, but it does for Logit.
I need to go through the math to check, but I think the `q = 2*self.endog - 1` trick in loglike, score and similar might not work for continuous endog. (I haven't looked at this in many years.)
My guess is that this makes it similar to MNLogit and OrderedModel, where we only compute the choice that has been selected in that observation.
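To make the suspicion concrete, here is a minimal sketch that compares the two ways of writing the probit log-likelihood at a fixed linear predictor. It is not taken from statsmodels; `norm_cdf`, `loglike_q` and `loglike_bernoulli` are hypothetical helpers, and the data values are made up for illustration. With binary endog the two forms coincide, with fractional endog they do not.

```python
import math

def norm_cdf(z):
    # Standard normal cdf from the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglike_q(y, xb):
    # q-trick form: sum of log cdf(q * xb) with q = 2*y - 1.
    return sum(math.log(norm_cdf((2.0 * yi - 1.0) * zi))
               for yi, zi in zip(y, xb))

def loglike_bernoulli(y, xb):
    # General form: sum of y*log(p) + (1 - y)*log(1 - p), p = cdf(xb).
    total = 0.0
    for yi, zi in zip(y, xb):
        p = norm_cdf(zi)
        total += yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return total

xb = [-1.2, 0.3, 0.8, 2.0]          # illustrative linear predictor values
y_binary = [0.0, 1.0, 1.0, 0.0]     # endog in {0, 1}
y_frac = [0.1, 0.6, 0.9, 0.5]       # continuous endog in (0, 1)

print(loglike_q(y_binary, xb), loglike_bernoulli(y_binary, xb))  # same value
print(loglike_q(y_frac, xb), loglike_bernoulli(y_frac, xb))      # different
```

The identity behind the trick is that q is +1 or -1 for binary endog, so cdf(q * xb) is either p or 1 - p. For fractional endog q lies strictly between -1 and 1 and that identity no longer holds.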
Or there are some other problems related to a noncanonical link for Binomial.
This web page has loglike and score with and without the q trick:
https://www.statlect.com/fundamentals-of-statistics/probit-model-maximum-likelihood