How to compute pseudo R^2 for GLM estimators #5861

Open

jpainam opened this issue Jun 10, 2019 · 16 comments

Comments

@jpainam

jpainam commented Jun 10, 2019

Hi, how can I compute the psquared for GLM estimators? I don't find it in the summary, unlike Logit and other estimators. Could you tell me which values from the result I can use to compute it manually?

Thank you.

@kshedden
Contributor

Do you mean R^2? I don't know what "psquared" would mean here. There is no R^2 outside of linear regression, but there are many "pseudo R^2" values that people commonly use to compare GLMs. Many of these can be computed easily from the log-likelihood function, which statsmodels provides as llf. There is a lot of discussion about this online; below is one good reference:

https://statisticalhorizons.com/r2logistic
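
For example, McFadden's pseudo R^2 needs only llf and llnull from a fitted results instance. A minimal sketch with simulated data (the data and model here are made up purely for illustration):

    import numpy as np
    import statsmodels.api as sm

    # Simulated Poisson data; any GLM family works the same way.
    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 2)))
    y = rng.poisson(np.exp(X @ np.array([0.5, 0.3, -0.2])))

    res = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    # McFadden's pseudo R^2: 1 - llf / llnull, where llnull is the
    # log-likelihood of the intercept-only model.
    print(1 - res.llf / res.llnull)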

@jpainam
Author

jpainam commented Jun 10, 2019

Thank you. By psquared, I meant pseudo R^2. I'll look at the given link.

@josef-pkt
Member

I thought we had it, but it's missing in GLM.

DiscreteResults has McFadden's pseudo-rsquared (as an attribute and in the summary). It should be added to GLMResults also:

    @cache_readonly
    def prsquared(self):
        return 1 - self.llf/self.llnull

Other pseudo-rsquared versions could be made available as additional methods.
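
For reference, two other common variants can be built from the same quantities plus nobs. A hedged sketch (these helpers are illustrative, not existing statsmodels API):

    import numpy as np

    def cox_snell(res):
        # Cox-Snell pseudo R^2: 1 - exp((2/n) * (llnull - llf))
        return 1 - np.exp(2.0 / res.nobs * (res.llnull - res.llf))

    def nagelkerke(res):
        # Nagelkerke (Cragg-Uhler): Cox-Snell rescaled by its maximum
        # attainable value, 1 - exp((2/n) * llnull).
        return cox_snell(res) / (1 - np.exp(2.0 / res.nobs * res.llnull))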

@jpainam
Author

jpainam commented Jun 10, 2019

Thank you. Where should I add this function? Here is the result of the summary:

   Results: Generalized linear model
=================================================================
Model:              GLM              AIC:            1001677.4666
Link Function:      log              BIC:            -15698.7316
Dependent Variable: mij              Log-Likelihood: -5.0080e+05
Date:               2019-06-10 23:15 LL-Null:        -5.5684e+05
No. Observations:   24360            Deviance:       2.2994e+05
Df Model:           40               Pearson chi2:   2.70e+05
Df Residuals:       24319            Scale:          1.0000
Method:             IRLS
-----------------------------------------------------------------
                 Coef.  Std.Err.     z     P>|z|   [0.025  0.975]

@josef-pkt
Member

Compare with a discrete model.

AFAICS it is placed before the log-likelihood:

        top_right = [('No. Observations:', None),
                     ('Df Residuals:', None),
                     ('Df Model:', None),
                     ('Pseudo R-squ.:', ["%#6.4g" % self.prsquared]),
                     ('Log-Likelihood:', None),
                     ('LL-Null:', ["%#8.5g" % self.llnull]),
                     ('LLR p-value:', ["%#6.4g" % self.llr_pvalue])
                     ]

or better: put it in the right column before AIC, i.e. on top

(AIC and BIC would be better at the bottom, but we don't change this for now.)

@jpainam
Author

jpainam commented Jun 10, 2019

@josef-pkt Please, I don't understand: where should I put the top_right dict? Thank you. Here is my code:

import pandas as pd
import matplotlib.pylab as pl
import statsmodels.api as sm

df = pd.read_excel("../data/pseudo.xlsx")
df = df.fillna(value=0)
df.hist()
pl.show()

data = df
train_cols = data.columns[3:]
#logit = sm.Logit(data["admit"], data[train_cols])
logit = sm.GLM(data['mij'], data[train_cols], family=sm.families.NegativeBinomial())
result = logit.fit()
print(result.summary2())

@josef-pkt
Member

About the definition with respect to a constant in the model:

In the linear regression model, rsquared takes into account whether a constant is included among the regressors or not.

In discrete models and GLM we don't make the results statistics depend on the presence of a constant. llnull is always the model with only a constant as the explanatory variable. The analogue to regression through zero (no constant) would be to assume that the linear prediction is zero (*), but we don't use this in GLM and discrete models (at least not yet).

(*) In the Logit case this would mean that the predicted probability is 0.5.
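
To make that convention concrete: llnull should equal the llf of an explicitly fitted intercept-only model. A sketch of the check, reusing y and res from the Poisson example above:

    import numpy as np
    import statsmodels.api as sm

    # llnull is the constant-only model, regardless of whether the
    # full model itself includes a constant.
    null_res = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Poisson()).fit()
    assert np.isclose(null_res.llf, res.llnull)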

@josef-pkt
Member

You are using summary2, not summary. I need to check; I don't know that code very well.

Does logit.summary2() show the pseudo-rsquared?

@josef-pkt
Member

I think summary2 adds the pseudo-rsquared by default if the attribute is available in the results instance. In iolib.summary2.summary_model a dict defines the optional attributes, AFAIR/AFAIU:

    info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared
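
If that's right, attaching the attribute to a fitted GLM results instance should be enough for summary2 to pick it up, since summary_model silently skips any attribute whose lookup raises. A hedged sketch that relies on non-public details (the _results attribute of the wrapper and the try/except in summary_model):

    # result is the wrapper returned by fit(); patch the underlying
    # results instance that summary2 actually reads from.
    result._results.prsquared = 1 - result.llf / result.llnull
    print(result.summary2())  # should now include a "Pseudo R-squared:" row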

@josef-pkt
Member

Background: we have both summary and summary2 because we couldn't agree on a design. summary is rigid with fine-tuned text formatting; summary2 is more flexible but has a different default formatting.

@jpainam
Author

jpainam commented Jun 10, 2019

Here is the result of result.summary():

                 Generalized Linear Model Regression Results
=======================================================
Dep. Variable:                    mij   No. Observations:                24360
Model:                            GLM   Df Residuals:                    24319
Model Family:        NegativeBinomial   Df Model:                           40
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -5.0080e+05
Date:                Tue, 11 Jun 2019   Deviance:                   2.2994e+05
Time:                        01:49:12   Pearson chi2:                 2.70e+05
No. Iterations:                   100   Covariance Type:             nonrobust
==========================================================

It's the same output.

@jpainam
Author

jpainam commented Jun 10, 2019

But using logit = sm.Logit(data["mij"], data[train_cols]) prints the pseudo R-squared:

Optimization terminated successfully.
         Current function value: 0.573147
         Iterations 6
                         Results: Logit
=======================================================

Model:              Logit            Pseudo R-squared: 0.083
Dependent Variable: admit            AIC:              470.5175
Date:               2019-06-10 22:08 BIC:              494.4663
No. Observations:   400              Log-Likelihood:   -229.26
Df Model:           5                LL-Null:          -249.99
Df Residuals:       394              LLR p-value:      7.5782e-08
Converged:          1.0000           Scale:            1.0000
No. Iterations:     6.0000

@jpainam jpainam changed the title How to compute psquared for GLM estimators How to compute pseudo R^2 for GLM estimators Jun 11, 2019
@jpainam jpainam closed this as completed Oct 14, 2019
@josef-pkt
Member

AFAICS this is still open; we don't have pseudo R-squared outside the discrete models yet.

@josef-pkt josef-pkt reopened this Oct 14, 2019
@vnijs

vnijs commented Jan 26, 2021

Pseudo R-squared is available for smf.logit but not for smf.glm. Are there plans to add this? The attributes needed to calculate the measure are already there. Thanks!

import statsmodels.formula.api as smf
from statsmodels.genmod.families import Binomial
from statsmodels.genmod.families.links import logit

logit_fit = smf.glm(
    formula="biden_wins ~ dem_lead_2016",
    family=Binomial(link=logit()),
    data=biden_county,
).fit()
logit_fit.summary()

# McFadden's pseudo R-squared, computed manually
(1 - logit_fit.llf / logit_fit.llnull)

@anuragwatane

We would like to work on this issue.

anuragwatane pushed a commit to anuragwatane/statsmodels that referenced this issue Mar 7, 2021
anuragwatane pushed a commit to anuragwatane/statsmodels that referenced this issue Mar 9, 2021
@nikkopante

nikkopante commented Apr 11, 2021

Hi, what about Adj. R^2 and AIC?
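
AIC is already shown in the GLM summary (and available as result.aic). An adjusted pseudo R^2 is not built in, but McFadden's adjusted version only needs the parameter count. A sketch, assuming the model includes a constant:

    # McFadden's adjusted pseudo R^2 penalizes the number of
    # estimated parameters K: 1 - (llf - K) / llnull
    k = result.df_model + 1  # regressors plus the constant
    print(1 - (result.llf - k) / result.llnull)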
