BUG/ENH: Probit for continuous data #7210

josef-pkt · 2020-12-16T19:27:21Z

I suspect that Probit only works for binary {0, 1} endog.

If I add some noise to get a continuous endog, then Probit does not produce the same result as the GLM version, but it does for Logit.

I need to go through the math to check, but I think the q = 2*self.endog - 1 trick in loglike, score and similar methods might not work for continuous endog. (I haven't looked at this in many years.)
My guess is that this makes it similar to MNLogit and OrderedModel, where we only compute the probability of the choice that was selected in that observation.
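To make the suspicion concrete, here is a minimal NumPy/SciPy sketch (not the statsmodels code; the function names and simulated data are hypothetical) that compares a q-trick loglike, sum log F(q*xb) with q = 2y - 1, against the full Bernoulli quasi-loglikelihood y log F(xb) + (1 - y) log(1 - F(xb)) for the probit CDF. The two agree for binary endog and disagree once noise makes endog continuous.

```python
import numpy as np
from scipy.stats import norm

def loglike_q_trick(params, endog, exog):
    # q = 2*y - 1 maps y=1 to +1 and y=0 to -1, then log F(q*xb) is summed
    q = 2 * endog - 1
    return np.sum(norm.logcdf(q * (exog @ params)))

def loglike_bernoulli(params, endog, exog):
    # full Bernoulli (quasi-)loglikelihood, defined for any y in [0, 1]
    xb = exog @ params
    return np.sum(endog * norm.logcdf(xb) + (1 - endog) * norm.logsf(xb))

rng = np.random.default_rng(0)
nobs = 200
exog = np.column_stack([np.ones(nobs), rng.normal(size=nobs)])
params = np.array([0.2, 0.8])
# binary endog drawn from the probit model
y_binary = (rng.uniform(size=nobs) < norm.cdf(exog @ params)).astype(float)
# continuous endog: binary outcome plus a bit of noise, clipped back into [0, 1]
y_cont = np.clip(y_binary + rng.normal(scale=0.1, size=nobs), 0, 1)

for name, y in [("binary", y_binary), ("continuous", y_cont)]:
    ll_q = loglike_q_trick(params, y, exog)
    ll_full = loglike_bernoulli(params, y, exog)
    print(name, ll_q, ll_full, np.isclose(ll_q, ll_full))
```

For binary y the q trick just selects log F(xb) or log F(-xb) = log(1 - F(xb)); for fractional y it evaluates log F at the shrunk argument q*xb, which is a different objective.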

Or there are some other problems related to a noncanonical link for Binomial.

This web page has loglike and score with and without the q trick
https://www.statlect.com/fundamentals-of-statistics/probit-model-maximum-likelihood


josef-pkt · 2020-12-16T19:38:22Z

I'm raising this to a serious bug.

We are advertising QMLE, but Probit might silently return wrong results, without complaining or giving any indication that anything is unusual or that it requires binary endog.

josef-pkt · 2020-12-16T19:45:46Z

I think we can drop the "q" trick if that's the culprit.
In the binary case, there is not much to gain computationally: if we compute prob(y=1 | x), then we also have 1 - prob.

If we have 10 levels/choices in multinomial, then saving 9/10 makes a larger difference, also in terms of memory.
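Dropping the q trick for Probit means writing loglike and score in full. A minimal sketch, assuming the Bernoulli quasi-loglikelihood for endog in [0, 1]; the function names are hypothetical. The per-observation score is pdf(xb) (y - F(xb)) / (F(xb) (1 - F(xb))) times x, and a central finite-difference check confirms it is the gradient of the loglike even for continuous endog.

```python
import numpy as np
from scipy.stats import norm

def probit_loglike(params, endog, exog):
    # y log F(xb) + (1 - y) log(1 - F(xb)), both from the same linear predictor
    xb = exog @ params
    return np.sum(endog * norm.logcdf(xb) + (1 - endog) * norm.logsf(xb))

def probit_score(params, endog, exog):
    # d loglike / d xb = pdf(xb) * (y - F(xb)) / (F(xb) * (1 - F(xb)))
    xb = exog @ params
    prob = norm.cdf(xb)
    weight = norm.pdf(xb) * (endog - prob) / (prob * (1 - prob))
    return exog.T @ weight

rng = np.random.default_rng(1)
nobs = 50
exog = np.column_stack([np.ones(nobs), rng.normal(size=nobs)])
endog = rng.uniform(size=nobs)  # continuous endog in (0, 1)
params = np.array([0.1, -0.5])

# central finite differences of the loglike, one parameter at a time
eps = 1e-6
numeric = np.array([
    (probit_loglike(params + eps * e, endog, exog)
     - probit_loglike(params - eps * e, endog, exog)) / (2 * eps)
    for e in np.eye(len(params))
])
print(numeric, probit_score(params, endog, exog))
```

Only one CDF evaluation per observation is needed; 1 - F(xb) comes for free, which is the computational point made in the comment.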

josef-pkt · 2020-12-16T20:04:10Z

Relevant commits:

#2044 changed from binary check to interval check, related to fractional Logit issue #2040
#1978 original PR that added binary check for endog

josef-pkt · 2020-12-16T20:18:33Z

On the GLM side, unit tests are missing, #4598.
Using file search, I don't find any unit tests in the genmod tests for Binomial with a link specified, i.e. for using a noncanonical, non-logit link. That's weird.

There are generic or parameterized tests that use `family=family(link=link())`, which don't show up when searching for `Binomial(` or `Binomial(link`.

Searching for "probit" finds the parameterized tests for gradient optimization, without and with weights, and this commented-out class:

# class TestGlmBernoulliProbit(CheckModelResultsMixin):
#    pass

josef-pkt · 2020-12-16T21:33:40Z

The q trick also assumes a symmetric distribution, 1 - F(xb) = F(-xb).
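A quick numerical check of that symmetry for a few inverse links; the cloglog CDF is written out by hand and the grid is an arbitrary choice. Probit, logit and cauchy satisfy 1 - F(z) = F(-z), while cloglog does not, so a q trick would be wrong for cloglog even with binary endog.

```python
import numpy as np
from scipy.stats import cauchy, logistic, norm

def cloglog_cdf(z):
    # inverse of the complementary log-log link: F(z) = 1 - exp(-exp(z))
    return 1 - np.exp(-np.exp(z))

z = np.linspace(-4, 4, 81)
cdfs = {"probit": norm.cdf, "logit": logistic.cdf, "cauchy": cauchy.cdf, "cloglog": cloglog_cdf}

# symmetric[name] is True if 1 - F(z) == F(-z) on the whole grid
symmetric = {name: bool(np.allclose(1 - cdf(z), cdf(-z))) for name, cdf in cdfs.items()}
print(symmetric)  # {'probit': True, 'logit': True, 'cauchy': True, 'cloglog': False}
```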

josef-pkt · 2020-12-16T21:58:52Z

Maybe we should keep the current q trick for Probit.
It looks like it should be numerically more stable and avoids getting close to 0 * log(0) at the unlikely choice, i.e. y log(prob) is only evaluated where y = 1, and we don't have 0 * log(prob) in loglike.

Score and hessian don't have a log term, but they have divisions by prob and (1 - prob).

An article with recommendations for numerical stability; e.g. it also recommends a tail approximation of the cdf and pdf terms:
Demidenko, Eugene. "Computational aspects of probit model." Mathematical Communications 6, no. 2 (2001): 233-247.
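The stability argument can be illustrated with a few extreme linear predictors (values chosen for illustration; this is not the statsmodels code). With a naive y log F + (1 - y) log(1 - F), the CDF rounds to exactly 0 or 1, so 0 * log(0) becomes nan. Working on the log scale with SciPy's `norm.logcdf` and `norm.logsf`, or with the q trick, keeps the value finite. The same idea applies to the pdf/cdf ratio that appears in the score.

```python
import numpy as np
from scipy.stats import norm

xb = np.array([-40.0, 0.0, 40.0])  # linear predictor, extreme in both tails
y = np.array([0.0, 1.0, 1.0])      # observed choices, matching the likely outcome

with np.errstate(divide="ignore", invalid="ignore"):
    prob = norm.cdf(xb)  # rounds to exactly 0.0 and 1.0 in the tails
    naive = y * np.log(prob) + (1 - y) * np.log(1 - prob)  # 0 * (-inf) -> nan
    naive_ratio = norm.pdf(xb) / prob  # pdf / cdf is 0 / 0 -> nan at xb = -40

# log scale: both terms stay finite even in the tails
stable = y * norm.logcdf(xb) + (1 - y) * norm.logsf(xb)
# q trick: only the log probability of the observed choice is evaluated
q = 2 * y - 1
q_trick = norm.logcdf(q * xb)
# pdf / cdf (inverse Mills ratio) computed via logs
stable_ratio = np.exp(norm.logpdf(xb) - norm.logcdf(xb))

print(naive)  # nan at both extremes
print(stable, q_trick)
print(stable_ratio)
```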

josef-pkt · 2020-12-17T00:15:12Z

Looks kind of bad here

Logit also uses the q trick in loglike.
Using method="newton" doesn't use loglike in the optimization; Logit and GLM-Logit agree in params and cov_params, but not in llf.
Using bfgs causes a convergence warning and slightly different numbers.

Stata 14 has fracreg.
It agrees with GLM and Logit in params.
bse using HC0 agrees with Stata after applying a correction factor:
bse = bse_stata * np.sqrt((nobs - 1) / nobs)
(I guess this is one of the disagreements between Stata and our GLM cov_type; correction factors are used in the unit tests. There should be an option for small sample corrections.)

llf (loglike) differs between Stata and GLM, and Logit is different again.

Note: for interval data we only have QMLE and its quasi-loglike, so llf cannot be used for inference anyway.
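A minimal sketch of the fractional Logit computations behind this comparison, written in plain NumPy rather than statsmodels: Newton iterations on the quasi-score X'(y - p), which like method="newton" never evaluate the loglike, the HC0 sandwich covariance, and the rescaling by the correction factor quoted above. The simulated data and function names are hypothetical, and the factor simply applies the relation reported in this comment.

```python
import numpy as np

def logistic_cdf(xb):
    return 1 / (1 + np.exp(-xb))

def fit_fractional_logit(endog, exog, maxiter=50, tol=1e-10):
    # Newton-Raphson on the Bernoulli quasi-score X'(y - p); loglike is never evaluated
    params = np.zeros(exog.shape[1])
    for _ in range(maxiter):
        prob = logistic_cdf(exog @ params)
        score = exog.T @ (endog - prob)
        hessian = -(exog * (prob * (1 - prob))[:, None]).T @ exog
        step = np.linalg.solve(hessian, score)
        params = params - step
        if np.max(np.abs(step)) < tol:
            break
    return params

def cov_hc0(params, endog, exog):
    # sandwich H^-1 (sum_i s_i s_i') H^-1 without any small-sample correction
    prob = logistic_cdf(exog @ params)
    score_obs = exog * (endog - prob)[:, None]
    hessian = -(exog * (prob * (1 - prob))[:, None]).T @ exog
    hess_inv = np.linalg.inv(hessian)
    return hess_inv @ (score_obs.T @ score_obs) @ hess_inv

rng = np.random.default_rng(12345)
nobs = 500
exog = np.column_stack([np.ones(nobs), rng.normal(size=nobs)])
# fractional endog: logistic mean plus noise, clipped to [0, 1]
mean = logistic_cdf(exog @ np.array([0.5, -1.0]))
endog = np.clip(mean + rng.normal(scale=0.1, size=nobs), 0, 1)

params = fit_fractional_logit(endog, exog)
bse_hc0 = np.sqrt(np.diag(cov_hc0(params, endog, exog)))
# invert bse = bse_stata * sqrt((nobs - 1) / nobs) to get Stata-style bse
bse_stata_style = bse_hc0 * np.sqrt(nobs / (nobs - 1))
print(params, bse_hc0, bse_stata_style)
```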

josef-pkt · 2020-12-17T18:13:39Z

After fixing precision problems in the GLM probit link second derivative (#1878), GLM now agrees with Stata fracreg for both Logit and Probit, also in the standard errors bse (except for the correction factor in the default).
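For reference, with z = Φ^{-1}(p) the probit link g(p) = Φ^{-1}(p) has g'(p) = 1/φ(z) and g''(p) = z/φ(z)^2. A minimal sketch that checks these closed forms against central finite differences at a few interior points; it is not the statsmodels implementation of the fix, and the evaluation points and step size are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def probit_link_deriv(p):
    # g'(p) = 1 / phi(z) with z = Phi^{-1}(p)
    return 1 / norm.pdf(norm.ppf(p))

def probit_link_deriv2(p):
    # g''(p) = z / phi(z)**2 with z = Phi^{-1}(p)
    z = norm.ppf(p)
    return z / norm.pdf(z) ** 2

p = np.array([0.05, 0.3, 0.5, 0.7, 0.95])
h = 1e-4
# central finite differences of the link g(p) = norm.ppf(p)
numeric_d1 = (norm.ppf(p + h) - norm.ppf(p - h)) / (2 * h)
numeric_d2 = (norm.ppf(p + h) - 2 * norm.ppf(p) + norm.ppf(p - h)) / h**2
print(np.column_stack([probit_link_deriv2(p), numeric_d2]))
```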

josef-pkt · 2020-12-17T20:57:19Z

A picture for fun:

using GLM for fractional regression
endog has a bit of noise added to make it interval data
orange is cauchy link
blue is probit link

Cauchy seems to predict better based on the plot,
but HC0 p-values for cauchy are very large, none significant. Maybe there is still a bug in my changes to CDFLinks.
The loglike with cauchy link is larger than with logit and probit.
No aic in summary, and I would like tic and imratio here (waiting for #7166).

Stata and SAS also don't have a cauchy link, AFAICS.

josef-pkt · 2021-01-21T04:33:20Z

Removing prio high:
the PR that adds an exception for continuous data in Probit has been merged, #7229.

Whether we extend Probit to continuous data QMLE is still an open question.
I'm now leaning toward not supporting it in discrete Probit, because after improving the probit link in #7226, GLM, which already allows for continuous data, should be a good alternative to discrete Probit.
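The difference between the interval check from #2044 (enough for fractional Logit) and a binary check that raises for continuous endog in Probit can be sketched as follows. The helper names are hypothetical and this is not the code merged in #7229.

```python
import numpy as np

def check_binary_endog(endog):
    # hypothetical helper: discrete Probit requires endog in {0, 1}
    endog = np.asarray(endog, dtype=float)
    if not np.all((endog == 0) | (endog == 1)):
        raise ValueError("endog must be binary (0 or 1) for Probit")
    return endog

def check_interval_endog(endog):
    # hypothetical helper: fractional Logit QMLE only needs endog in [0, 1]
    endog = np.asarray(endog, dtype=float)
    if np.any((endog < 0) | (endog > 1)):
        raise ValueError("endog must be in the interval [0, 1]")
    return endog

y_frac = [0.0, 0.25, 1.0]
check_interval_endog(y_frac)  # passes
try:
    check_binary_endog(y_frac)
except ValueError as err:
    print(err)  # endog must be binary (0 or 1) for Probit
```

As written, NaN passes the interval check, because every comparison with NaN is False; missing values would need separate handling.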

josef-pkt added the type-bug, type-enh, comp-discrete and type-bug-wrong (serious bugs that silently return incorrect numbers) labels Dec 16, 2020

josef-pkt mentioned this issue Dec 16, 2020

SUMM/TST missing unit tests for GLM Probit #7211

Open

josef-pkt added the prio-high label Dec 17, 2020

josef-pkt mentioned this issue Dec 22, 2020

BUG: raise if non-binary endog in Probit #7229

Merged

josef-pkt removed the prio-high label Jan 21, 2021

josef-pkt mentioned this issue Dec 8, 2021

Documentation and Implementation Differ for Deviance in Logistic Regression #7934

Open

josef-pkt mentioned this issue Aug 26, 2022

ENH: MNLogit for counts or frequencies, similar to Binomial #8380

Open
