How to compute pseudo R^2 for GLM estimators #5861
Comments
Do you mean R^2? I don't know what "psquared" would mean here. There is no R^2 outside of linear regression, but there are many "pseudo R^2" values that people commonly use to compare GLMs. Many of these can be easily computed from the log-likelihood function, which statsmodels provides as |
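As a rough illustration of the point about log-likelihoods: two widely used pseudo R^2 definitions, McFadden's and Cox-Snell's, need only the log-likelihood of the fitted model, the log-likelihood of a null (constant-only) model, and the number of observations. The function names and the input numbers below are made up for illustration and are not part of statsmodels.

```python
import math

def mcfadden_r2(llf, llnull):
    """McFadden's pseudo R^2: 1 - LL(model) / LL(null)."""
    return 1.0 - llf / llnull

def cox_snell_r2(llf, llnull, nobs):
    """Cox-Snell pseudo R^2: 1 - exp(2 * (LL(null) - LL(model)) / n)."""
    return 1.0 - math.exp(2.0 * (llnull - llf) / nobs)

# Illustrative values only (not taken from any real fit)
llf, llnull, nobs = -90.0, -100.0, 50
print(mcfadden_r2(llf, llnull))
print(cox_snell_r2(llf, llnull, nobs))
```

Both measures are 0 when the fitted model is no better than the null model and grow as the fitted log-likelihood improves on it.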
Thank you. By psquared, I mean pseudo R^2. I'll look at the given link. |
I thought we had it, but it's missing in GLM. DiscreteResults has McFadden's pseudo-rsquared (as an attribute and in the summary). It should be added to GLMResults also.
Other pseudo-rsquared versions could be made available as additional methods. |
Thank you. Where should I add this function? Here is the result of summary:
Results: Generalized linear model
=================================================================
Model:              GLM              AIC:            1001677.4666
Link Function:      log              BIC:            -15698.7316
Dependent Variable: mij              Log-Likelihood: -5.0080e+05
Date:               2019-06-10 23:15 LL-Null:        -5.5684e+05
No. Observations:   24360            Deviance:       2.2994e+05
Df Model:           40               Pearson chi2:   2.70e+05
Df Residuals:       24319            Scale:          1.0000
Method:             IRLS
-----------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975] |
Compare with a discrete model. AFAICS, it is before log-likelihood,
or better: put it in the right column before AIC, i.e. on top. (AIC and BIC would be better at the bottom, but we don't change this for now.) |
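@josef-pkt please, I don't understand. Where should I put the top_right dict? Thank you. Here is my code:
import pandas as pd
import matplotlib.pyplot as pl  # assumed: `pl` refers to matplotlib's pyplot
import statsmodels.api as sm

# Load the data and replace missing values with 0
df = pd.read_excel("../data/pseudo.xlsx")
df = df.fillna(value=0)
df.hist()
pl.show()
data = df
# Use every column from the fourth onward as regressors
train_cols = data.columns[3:]
#logit = sm.Logit(data["admit"], data[train_cols])
logit = sm.GLM(data['mij'], data[train_cols], family=sm.families.NegativeBinomial())
result = logit.fit()
print(result.summary2()) |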
About the definition with respect to the constant in the model: In the linear regression model, the rsquared takes into account whether or not a constant is included among the regressors. In discrete models and GLM, we don't make the results statistic depend on the presence of a constant. (*) which would mean in the Logit case that the predicted probability is 0.5. |
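To make the role of the constant concrete, here is a small sketch for the Logit case. It assumes the footnote refers to a null model with no constant at all, whose linear predictor is 0 and whose predicted probability is therefore 0.5 for every observation; a constant-only null model instead predicts the sample mean of the response. The helper names and the toy response vector are hypothetical.

```python
import math

def llnull_no_constant(y):
    """Bernoulli log-likelihood when every predicted probability is 0.5."""
    return len(y) * math.log(0.5)

def llnull_constant_only(y):
    """Bernoulli log-likelihood when the prediction is the sample mean.

    Assumes y contains both 0s and 1s, so the mean is strictly in (0, 1).
    """
    p = sum(y) / len(y)
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)

# Toy binary response, for illustration only
y = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(llnull_no_constant(y))    # null model without a constant
print(llnull_constant_only(y))  # null model with only a constant
```

Because the constant-only null model fits at least as well as the no-constant one, a pseudo R^2 measured against it is never larger than one measured against the no-constant null model.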
you are using does logit.summary2() show the pseudo-rsquared? |
I think summary2 adds pseudo-rsquared by default if the attribute is available in the results instance |
background: We have both |
Here is the result for
Generalized Linear Model Regression Results
=======================================================
Dep. Variable:  mij              No. Observations: 24360
Model:          GLM              Df Residuals:     24319
Model Family:   NegativeBinomial Df Model:         40
Link Function:  log              Scale:            1.0000
Method:         IRLS             Log-Likelihood:   -5.0080e+05
Date:           Tue, 11 Jun 2019 Deviance:         2.2994e+05
Time:           01:49:12         Pearson chi2:     2.70e+05
No. Iterations: 100              Covariance Type:  nonrobust
==========================================================
It's the same output. |
But using
Optimization terminated successfully.
Current function value: 0.573147
Iterations 6
Results: Logit
=======================================================
Model:              Logit            Pseudo R-squared: 0.083
Dependent Variable: admit            AIC:              470.5175
Date:               2019-06-10 22:08 BIC:              494.4663
No. Observations:   400              Log-Likelihood:   -229.26
Df Model:           5                LL-Null:          -249.99
Df Residuals:       394              LLR p-value:      7.5782e-08
Converged:          1.0000           Scale:            1.0000
No. Iterations:     6.0000 |
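The Pseudo R-squared of 0.083 in the Logit summary can be reproduced by hand from its Log-Likelihood and LL-Null entries, which suggests the same arithmetic could be applied to the GLM summary posted earlier in the thread. This sketch assumes the reported value is McFadden's measure, 1 - LL/LL-Null, as mentioned above for DiscreteResults. The numbers are copied from the two summaries and are rounded as printed there; the function name is only illustrative.

```python
def mcfadden_r2(llf, llnull):
    """McFadden's pseudo R^2: 1 - LL(model) / LL(null)."""
    return 1.0 - llf / llnull

# Logit summary: Log-Likelihood -229.26, LL-Null -249.99
logit_r2 = mcfadden_r2(-229.26, -249.99)

# GLM summary: Log-Likelihood -5.0080e+05, LL-Null -5.5684e+05
glm_r2 = mcfadden_r2(-5.0080e+05, -5.5684e+05)

print(round(logit_r2, 3))  # 0.083, matching the Logit summary
print(round(glm_r2, 3))    # 0.101
```

On a fitted statsmodels results object, the two inputs correspond to the `llf` and `llnull` attributes, so the same one-liner should work as `1 - result.llf / result.llnull`.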
AFAICS, this is still open, we don't have pseudo R-squared outside discrete models yet |
Pseudo R-squared is available for smf.logit but not for smf.glm. Are there plans to add this? The attributes are already there to calculate the measure. Thanks
|
We would like to work on this issue. |
…n the result.summary()
Hi, what about Adj. R^2 and AIC? |
Hi, How can I compute the psquared for GLM estimators? I don't find the result in the summary as opposed to Logit and other estimators.
Thank you for giving me what values to use in the result to manually compute psquared.
Thank you.