How to compute pseudo R^2 for GLM estimators #5861

Open

jpainam opened this issue Jun 10, 2019 · 16 comments

Comments

@jpainam

jpainam commented Jun 10, 2019

Hi, how can I compute the psquared for GLM estimators? I don't find it in the summary, unlike Logit and other estimators. Could you tell me which values from the result I can use to compute it manually?

Thank you.

@kshedden
Contributor

Do you mean R^2? I don't know what "psquared" would mean here. There is no R^2 outside of linear regression, but there are many "pseudo R^2" values that people commonly use to compare GLMs. Many of these can be computed easily from the log-likelihood function, which statsmodels provides as llf. There is a lot of discussion about this online; below is one good reference:

https://statisticalhorizons.com/r2logistic
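
For example, McFadden's pseudo R^2 needs only llf and llnull from a fitted results instance. A minimal sketch with simulated data (the data and model here are made up purely for illustration):

    import numpy as np
    import statsmodels.api as sm

    # Simulated Poisson data; any GLM family works the same way.
    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 2)))
    y = rng.poisson(np.exp(X @ np.array([0.5, 0.3, -0.2])))

    res = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    # McFadden's pseudo R^2: 1 - llf / llnull, where llnull is the
    # log-likelihood of the intercept-only model.
    print(1 - res.llf / res.llnull)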

@jpainam
Author

jpainam commented Jun 10, 2019

Thank you. By psquared, I meant pseudo R^2. I'll look at the given link.

@josef-pkt
Member

I thought we had it, but it's missing in GLM.

DiscreteResults has McFadden's pseudo-rsquared (as an attribute and in the summary). It should be added to GLMResults also:

    @cache_readonly
    def prsquared(self):
        return 1 - self.llf/self.llnull

Other pseudo-rsquared versions could be made available as additional methods.
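
For reference, two other common variants can be built from the same quantities plus nobs. A hedged sketch (these helpers are illustrative, not existing statsmodels API):

    import numpy as np

    def cox_snell(res):
        # Cox-Snell pseudo R^2: 1 - exp((2/n) * (llnull - llf))
        return 1 - np.exp(2.0 / res.nobs * (res.llnull - res.llf))

    def nagelkerke(res):
        # Nagelkerke (Cragg-Uhler): Cox-Snell rescaled by its maximum
        # attainable value, 1 - exp((2/n) * llnull).
        return cox_snell(res) / (1 - np.exp(2.0 / res.nobs * res.llnull))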

@jpainam
Author

jpainam commented Jun 10, 2019

Thank you. Where should I add this function? Here is the result of the summary:

   Results: Generalized linear model
=================================================================
Model:              GLM              AIC:            1001677.4666
Link Function:      log              BIC:            -15698.7316
Dependent Variable: mij              Log-Likelihood: -5.0080e+05
Date:               2019-06-10 23:15 LL-Null:        -5.5684e+05
No. Observations:   24360            Deviance:       2.2994e+05
Df Model:           40               Pearson chi2:   2.70e+05
Df Residuals:       24319            Scale:          1.0000
Method:             IRLS
-----------------------------------------------------------------
                 Coef.  Std.Err.     z     P>|z|   [0.025  0.975]

@josef-pkt
Member

Compare with a discrete model.

AFAICS it is placed before the log-likelihood:

        top_right = [('No. Observations:', None),
                     ('Df Residuals:', None),
                     ('Df Model:', None),
                     ('Pseudo R-squ.:', ["%#6.4g" % self.prsquared]),
                     ('Log-Likelihood:', None),
                     ('LL-Null:', ["%#8.5g" % self.llnull]),
                     ('LLR p-value:', ["%#6.4g" % self.llr_pvalue])
                     ]

or better: put it in the right column before AIC, i.e. on top

(AIC and BIC would be better at the bottom, but we don't change this for now.)

@jpainam
Author

jpainam commented Jun 10, 2019

@josef-pkt Please, I don't understand: where should I put the top_right dict? Thank you. Here is my code:

import pandas as pd
import matplotlib.pylab as pl
import statsmodels.api as sm

df = pd.read_excel("../data/pseudo.xlsx")
df = df.fillna(value=0)
df.hist()
pl.show()

data = df
train_cols = data.columns[3:]
#logit = sm.Logit(data["admit"], data[train_cols])
logit = sm.GLM(data['mij'], data[train_cols], family=sm.families.NegativeBinomial())
result = logit.fit()
print(result.summary2())

@josef-pkt
Member

About the definition with respect to a constant in the model:

In the linear regression model, rsquared takes into account whether a constant is included among the regressors or not.

In discrete models and GLM we don't make the results statistics depend on the presence of a constant. llnull is always the model with only a constant as the explanatory variable. The analogue to regression through zero (no constant) would be to assume that the linear prediction is zero (*), but we don't use this in GLM and discrete models (at least not yet).

(*) In the Logit case this would mean that the predicted probability is 0.5.
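
To make that convention concrete: llnull should equal the llf of an explicitly fitted intercept-only model. A sketch of the check, reusing y and res from the Poisson example above:

    import numpy as np
    import statsmodels.api as sm

    # llnull is the constant-only model, regardless of whether the
    # full model itself includes a constant.
    null_res = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Poisson()).fit()
    assert np.isclose(null_res.llf, res.llnull)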

@josef-pkt
Member

You are using summary2, not summary. I need to check; I don't know that code very well.

Does logit.summary2() show the pseudo-rsquared?

@josef-pkt
Member

I think summary2 adds the pseudo-rsquared by default if the attribute is available in the results instance. In iolib.summary2.summary_model a dict defines the optional attributes, AFAIR/AFAIU:

    info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared
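
If that's right, attaching the attribute to a fitted GLM results instance should be enough for summary2 to pick it up, since summary_model silently skips any attribute whose lookup raises. A hedged sketch that relies on non-public details (the _results attribute of the wrapper and the try/except in summary_model):

    # result is the wrapper returned by fit(); patch the underlying
    # results instance that summary2 actually reads from.
    result._results.prsquared = 1 - result.llf / result.llnull
    print(result.summary2())  # should now include a "Pseudo R-squared:" row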

@josef-pkt
Member

Background: we have both summary and summary2 because we couldn't agree on a design. summary is rigid with fine-tuned text formatting; summary2 is more flexible but has a different default formatting.

@jpainam
Author

jpainam commented Jun 10, 2019

Here is the result of result.summary():

                 Generalized Linear Model Regression Results
=======================================================
Dep. Variable:                    mij   No. Observations:                24360
Model:                            GLM   Df Residuals:                    24319
Model Family:        NegativeBinomial   Df Model:                           40
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -5.0080e+05
Date:                Tue, 11 Jun 2019   Deviance:                   2.2994e+05
Time:                        01:49:12   Pearson chi2:                 2.70e+05
No. Iterations:                   100   Covariance Type:             nonrobust
==========================================================

It's the same output.

@jpainam
Author

jpainam commented Jun 10, 2019

But using logit = sm.Logit(data["mij"], data[train_cols]) prints the pseudo R-squared:

Optimization terminated successfully.
         Current function value: 0.573147
         Iterations 6
                         Results: Logit
=======================================================

Model:              Logit            Pseudo R-squared: 0.083
Dependent Variable: admit            AIC:              470.5175
Date:               2019-06-10 22:08 BIC:              494.4663
No. Observations:   400              Log-Likelihood:   -229.26
Df Model:           5                LL-Null:          -249.99
Df Residuals:       394              LLR p-value:      7.5782e-08
Converged:          1.0000           Scale:            1.0000
No. Iterations:     6.0000

@jpainam jpainam changed the title How to compute psquared for GLM estimators How to compute pseudo R^2 for GLM estimators Jun 11, 2019
@jpainam jpainam closed this as completed Oct 14, 2019
@josef-pkt
Member

AFAICS this is still open; we don't have pseudo R-squared outside the discrete models yet.

@josef-pkt josef-pkt reopened this Oct 14, 2019
@vnijs

vnijs commented Jan 26, 2021

Pseudo R-squared is available for smf.logit but not for smf.glm. Are there plans to add this? The attributes needed to calculate the measure are already there. Thanks!

import statsmodels.formula.api as smf
from statsmodels.genmod.families import Binomial
from statsmodels.genmod.families.links import logit

logit_fit = smf.glm(
    formula="biden_wins ~ dem_lead_2016",
    family=Binomial(link=logit()),
    data=biden_county,
).fit()
logit_fit.summary()

# McFadden's pseudo R-squared, computed manually
(1 - logit_fit.llf / logit_fit.llnull)

@anuragwatane

We would like to work on this issue.

anuragwatane pushed a commit to anuragwatane/statsmodels that referenced this issue Mar 7, 2021
anuragwatane pushed a commit to anuragwatane/statsmodels that referenced this issue Mar 9, 2021
@nikkopante

nikkopante commented Apr 11, 2021

Hi, what about Adj. R^2 and AIC?
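
AIC is already shown in the GLM summary (and available as result.aic). An adjusted pseudo R^2 is not built in, but McFadden's adjusted version only needs the parameter count. A sketch, assuming the model includes a constant:

    # McFadden's adjusted pseudo R^2 penalizes the number of
    # estimated parameters K: 1 - (llf - K) / llnull
    k = result.df_model + 1  # regressors plus the constant
    print(1 - (result.llf - k) / result.llnull)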
