
ENH: MLEInfluence for two-part models, extra params, BetaModel #7912

Merged
merged 2 commits into statsmodels:main from influence_numdiff_beta on Dec 5, 2021

Conversation

@josef-pkt (Member) commented Nov 28, 2021

It works, but I have not looked at the result numbers yet; only a smoke test.

Uses numdiff for _deriv_score_obs_dendog, see #7891.

_deriv_score_obs_dendog takes the derivative w.r.t. endog.
This might be a problem in discrete models:
count models NBP, GPP should be ok;
Probit will likely be a problem, and we cannot use complex step derivatives there.
I'm not sure whether the definitions used in MLEInfluence apply to Probit.

Standardized score residuals use sf[0] / hf[0]. Does this work in general?
Do we need the same for the second score_factor?

Note: _deriv_mean_dparams needs to be w.r.t. the full params, even if the mean only depends on the mean params (for a shape consistent with the hessian).

R betareg has some of the influence outlier measures to test against.
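The numdiff approach for _deriv_score_obs_dendog can be sketched roughly like this (a hypothetical helper, not the PR's code; it assumes a model whose `score_obs(params)` returns an `(nobs, k_params)` array and a continuous endog, which is exactly why discrete-endog cases like Probit are a problem):

```python
import numpy as np

def deriv_score_obs_dendog_fd(model, params, eps=1e-6):
    """Forward-difference derivative of each observation's score
    contribution with respect to its own endog value.

    Hypothetical sketch; the PR uses statsmodels' numdiff helpers.
    Assumes model.score_obs(params) returns (nobs, k_params) and that
    endog is continuous (complex step is unsafe for some models).
    """
    endog0 = model.endog.copy()
    s0 = model.score_obs(params)
    deriv = np.empty_like(s0)
    for i in range(len(endog0)):
        model.endog[i] = endog0[i] + eps      # perturb one observation
        deriv[i] = (model.score_obs(params)[i] - s0[i]) / eps
        model.endog[i] = endog0[i]            # restore original value
    return deriv
```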

@josef-pkt

My hat_diag (leverage) values are the same as gleverage in R betareg.
However, hatvalues and cooks.distance in betareg are very different from what I have here.

Checking:
Patricia L. Espinheira, Silvia L. P. Ferrari, Francisco Cribari-Neto, "Influence diagnostics in beta regression"

In their appendix:
They derive df_beta and likelihood displacement (Cook's distance) using only the mean model, i.e. the submatrix of the hessian w.r.t. the mean parameters (Appendix A).
For generalized leverage in Appendix B, they use the full model, with the hessian combined for the mean parameters and the (constant) precision parameter.

betareg doesn't have df_betas and dffits as public methods.

I might have to split the params so we can have Cook's distance separately for mean and precision.

The rankings by betareg's and my Cook's distance look pretty different.
Some of my high Cook's distance observations might mainly affect the precision.

Aside:
The docstring of BetaModel doesn't have references.
I don't remember which article introduced exog_precision; the influence articles above are for a constant precision parameter.

@josef-pkt
Copy link
Member Author

Checking with an explicit LOO loop (copied from OLSInfluence and adjusted for the second exog).

d_params looks close, except I would have thought the diff should go the other way (opposite sign):

print(pd.DataFrame(loo["params"] - self.results.params).iloc[:4])
          0         1         2         3         4
0 -0.024227 -0.000235  0.010587 -0.121261  0.030607
1 -0.022435  0.000258  0.004342  0.006072 -0.005709
2  0.015340 -0.000324 -0.001168 -0.033104  0.007280
3  0.057402 -0.000816  0.002249  0.123817 -0.008962

print(pd.DataFrame(influ.d_params[:4]))
          0         1         2         3         4
0  0.024031  0.000241 -0.010945  0.144920 -0.036369
1  0.022966 -0.000271 -0.004306 -0.010235  0.007268
2 -0.015331  0.000321  0.001211  0.034539 -0.007405
3 -0.057996  0.000822 -0.002156 -0.110825  0.007823
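The check above was done with a loop of roughly this shape (a sketch, with a generic `model_class(endog, exog)` signature standing in; BetaModel additionally takes exog_precision, which the actual check also has to subset). The flipped sign in the printed difference would be consistent with d_params approximating params_full - params_loo:

```python
import numpy as np

def loo_params(model_class, endog, exog, **fit_kw):
    """Explicit leave-one-out refit loop (sketch, not the PR's code).

    Refits the model nobs times, each time with one observation
    dropped, and returns an (nobs, k_params) array of LOO estimates.
    """
    nobs = len(endog)
    mask = np.ones(nobs, dtype=bool)
    rows = []
    for i in range(nobs):
        mask[i] = False
        rows.append(model_class(endog[mask], exog[mask]).fit(**fit_kw).params)
        mask[i] = True
    return np.asarray(rows)
```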

@josef-pkt

A reference for influence in a model with variable dispersion:
Rocha, Andréa V., and Alexandre B. Simas. "Influence diagnostics in a general class of beta regression models." Test 20, no. 1 (2011): 95-119.

I guess I won't bother matching R betareg for now.

They ignore the effect of changing dispersion parameter estimates.
The generic formula in Rocha and Simas is what I'm using in MLEInfluence. It works generically, without needing to code the model-specific formulas.

I might just verify a few things with the LOO.

I haven't checked the fitted/resid related influence and outlier statistics yet, e.g. the (internally) studentized residual.

@josef-pkt

The score residuals as used in MLEInfluence.resid_studentized are still unclear;
some references and discussion are in #7897 (comment).

My definition is kind of ok and more generic, but it's not what the literature on beta regression is doing.
I'll leave this as is for now; I need to go back to infrastructure work.

The score residuals defined as sf / hf are reasonably close to the Pearson residuals in the BetaModel case for the income-foodexpenditure data with one slope variable for precision.

@josef-pkt

Infrastructure:
MLEInfluence requires resid_pearson in __init__ but then doesn't seem to use it.
Make it optional; e.g. NBP and similar models don't have resid_pearson yet.

@josef-pkt

The latest commit adds the same pattern as in BetaModel to NBP and GPP using numdiff.
_deriv_score_obs_dendog in NBP and GPP are identical (not DRY),
and the same as the one in BetaModel except for not having exog_precision.

I didn't look at the new result numbers yet.
GPP has a warning in my test run:

statsmodels/discrete/tests/test_predict.py::TestGeneralizedPoissonPredict::test_influence
  ...\statsmodels\statsmodels\stats\outliers_influence.py:475: RuntimeWarning: invalid value encountered in sqrt
    return sf / np.sqrt(hf) / np.sqrt(1 - self.hat_matrix_diag)

needs checking
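The invalid value under the square root comes from hf entries with the wrong sign. A small guard of this shape (hypothetical, not the PR's code) would make the failing observations explicit instead of silently producing NaNs:

```python
import warnings

import numpy as np

def safe_standardized_score(sf, hf, hat_diag):
    """Sketch of the standardization on outliers_influence.py line 475,
    guarding against observations where the (negative) hessian factor
    hf has the wrong sign, which makes sqrt(hf) produce NaN.
    """
    hf = np.asarray(hf, dtype=float)
    n_bad = int((hf <= 0).sum())
    if n_bad:
        warnings.warn(f"{n_bad} observations have non-positive hf; "
                      "their standardized score residual is NaN")
    # suppress the RuntimeWarning; bad entries come out as NaN
    with np.errstate(invalid="ignore"):
        return sf / np.sqrt(hf) / np.sqrt(1 - hat_diag)
```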

@josef-pkt commented Nov 29, 2021

I'm not sure if I should add the get_influence method as a public method and advertise it
(maybe a warning note in the docstring is enough).

Update, decision: I will add the results methods. Then I have a good place to provide model-specific explanations and references. This is mainly for the BetaModel case, where I can refer to Rocha/Simas to explain the differences to R betareg.
NBP and GPP will have just a generic warning and explanation until I look at the literature for those models.

Probit doesn't support MLEInfluence yet.
The old NegativeBinomial doesn't have score_factor and hessian_factor yet, but it's redundant with NBP anyway.

For later: we should add an explicit LOO loop for selected observations to the Influence classes.

@josef-pkt commented Nov 30, 2021

reading up again a bit for MLE

Fisher information matrix: the variance of the score equals the expected negative hessian,
i.e. the information matrix equality for a correctly specified MLE,
evaluated at the MLE:

Var(score) = E(-hessian)

so sf / hf standardizes the score by the hessian in a one parameter model (without exog)

in 2-parameter model
V([sf1 sf2]) = [[hf11, hf12], [hf12, hf22]]

so standardizing sf1 only needs hf11 (no matrix inverse)

However, according to this, I think we should standardize sf1 by sqrt(hf11).
???

The above needs to use the negative hessian, -hf.

update
I'm using sqrt(-hf) already!
sf / np.sqrt(hf) / np.sqrt(1 - self.hat_matrix_diag)

So this is the analogue to the constant-distribution-parameter case, except that we take derivatives w.r.t. the linear predictors.
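The information matrix equality behind this standardization can be checked numerically in the simplest one-parameter case. Here a Poisson with log link is used as a stand-in (a sanity check, not from the PR): with linear predictor eta, the score factor is y - mu and the negative hessian factor is mu, so Var(score factor) should match E(-hessian factor):

```python
import numpy as np

# Poisson loglike per observation: l = y*eta - exp(eta) - log(y!)
# d l / d eta = y - exp(eta) = y - mu     (score factor)
# d2 l / d eta2 = -exp(eta) = -mu         (hessian factor)
rng = np.random.default_rng(0)
mu = 3.5
y = rng.poisson(mu, size=200_000)
sf = y - mu        # score factor evaluated at the true parameter
neg_hf = mu        # negative hessian factor (nonrandom here)
print(sf.var(), neg_hf)  # both should be close to 3.5
```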

I found a related article that uses score_obs directly (including the exog part).
Score residuals there are defined as a quadratic form, similar to a score test for individual observations.
They mention using subsets or similar for individual parameters.
They also mention the analogy of the quadratic form to Cook's distance, which is a quadratic form as well.
See #7913 for using subsets or linear combinations/weights for Cook's distance.

Kapitula, Laura Ring, and Edward J. Bedrick. 2005. “Diagnostics for the Exponential Normal Growth Curve Model.” Statistics in Medicine 24 (1): 95–108. https://doi.org/10.1002/sim.1919.
page 100 section 3.4 on Residuals

Aside:
Cook's distance looks like the analogue of a Hausman test by observation, or is it a Wald test analogue?
The score_obs residual is the analogue of a score test by observation.
I guess deviance residuals or likelihood displacement are something like an LR test by observation.

@josef-pkt commented Nov 30, 2021

One question then is which "score residuals" I should include, and how:

  • score_factor residual for mean
  • score_factor residual for all components
  • score_obs quadratic form
  • resid_pearson (currently not used for studentized; only available for mean models or mean-het models. We could use the (scipy) distribution, if available, to compute mean and var for a non-mean parameterization.)

MLEInfluence.resid_studentized is a cached attribute, so we cannot add options to it.

The score_obs quadratic form should always be available, possibly using numdiff for score_obs.

One more thing, I just realized:

I'm using sf_i / hf_i, i.e. dividing by the hessian_factor of an individual observation.

Kapitula and Bedrick use the information matrix / hessian of the whole sample:
score_obs @ I^{-1} @ score_obs

where I is the negative hessian (second derivative) at the MLE.

Cook's distance uses cov_params, i.e. also the hessian at the MLE, though it might be a robust version and not the hessian.
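The sample-information version could be sketched like this (a hypothetical helper, not the PR's code; `score_obs` is the (nobs, k) array of per-observation scores and `hessian` the full-sample hessian of the loglike at the MLE):

```python
import numpy as np

def score_obs_quadratic(score_obs, hessian):
    """Per-observation quadratic form s_i' I^{-1} s_i, with I the
    negative hessian of the full sample at the MLE (Kapitula-Bedrick
    style score residual; sketch only)."""
    info_inv = np.linalg.inv(-np.asarray(hessian))
    # einsum computes s_i @ info_inv @ s_i for each row i
    return np.einsum("ij,jk,ik->i", score_obs, info_inv, score_obs)
```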

@josef-pkt

One remaining problem is that there is no citable reference for what I'm doing.
In special cases it will coincide with generalized residuals or a variation of those.

Another aside:
I include the derivative of the mean function (link.inverse) in the score_factor and hessian_factor.
This treats the model, by analogy, as a constant-parameter model but with a nonlinear transformation (link) for the parameter.

@josef-pkt

(back to this after a detour to downloading and skimming articles)

GPP hf[0] has around 5 to 6% of observations with the wrong sign (positive) in the docvis example.

The hessian agrees with the numdiff hessian.
I just checked: test_jac in test_discrete is not inherited by the GPP test classes,
so hessian_factor is not tested against hessian there.

However, results look good using the code of check_jac to compare:
the hessian based on hessian_factor agrees with the original direct computation of the hessian.

np.max(np.abs(h2 - h1))
2.9103830456733704e-11

The GPP test classes do not subclass any other class in tests.

Aside: the default in GPP is p=1; the internal parameterization attribute is 0.

I'm getting about the same fraction of hf with the wrong sign if p=2.

@josef-pkt commented Dec 4, 2021

I don't see any bug or problem in the GPP hessian_factor yet.
Maybe the GPP loglike is not globally concave?

res_gpp._dispersion_factor is around 5 (in the p=1 model), so I guess we are far away from underdispersion problems.

update
It looks like the loglike as a function of the linear predictor of the mean is not globally concave.
It looks nicely concave around the maximum of the loglike, but it can have an increasing but convex segment.

In contrast, the loglike does look globally concave as a function of the mean mu, based on the plot and the numdiff hessian at the same point.

In the docvis data, the second-to-last observation has a positive hessian_factor (the warning in the unit test):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.distributions.discrete import genpoisson_p
from statsmodels.tools.numdiff import approx_fprime_cs, approx_hess

# loglike of the observation y=14 as a function of the linear predictor of the mean
func = lambda lp_m: genpoisson_p._logpmf(14, np.exp(lp_m), 1.3471951385672025, p=1)
m = np.array([4.731128867167516])
lp_m = np.log(m)
lp_m, approx_fprime_cs(lp_m, func), approx_hess(lp_m, func)
# (array([1.55416384]), array([[1.5913773]]), array([[0.0685618]]))
# positive second derivative, i.e. locally convex at this observation

mus = np.linspace(1, 50, 101)
lp_ms = np.log(mus)
fy_lp = genpoisson_p._logpmf(14, np.exp(lp_ms), 1.3471951385672025, p=1)
plt.plot(lp_ms, fy_lp)

The observation is at lp_m = 1.55.

[attached image: plot of the loglike against the linear predictor of the mean]

@josef-pkt commented Dec 4, 2021

Also, we need a separate results class for NBP.
I'm currently not supporting the post-estimation parts for the older NegativeBinomial model.

I'm giving up for now on investigating the GPP hessian; that takes more work and should be a separate issue.
GP-2, i.e. GPP(p=2), has unit tests against the Stata user package gnpoisson by Hilbe https://ideas.repec.org/c/boc/bocode/s457328.html
VGAM has several parameterizations of GP, but I think we never compared against that.

I want to add predict(which="var") to GPP and NBP, then we can get resid_pearson

Asides:
NegativeBinomialResults has an lnalpha property, but it's independent of model.transparams. Is this correct across different optimizers?
NegativeBinomialResults overrides aic, bic, but GeneralizedPoissonResults does not.
Is the inherited DiscreteResults.aic correct? It doesn't add k_extra.
update: GP gets the correct aic, bic by inheriting from NBResults, i.e. GeneralizedPoissonResults(NegativeBinomialResults).

The ZI models inherit aic, bic from DiscreteResults, AFAICS. Are those correct?
aic is unit tested for ZINBP and ZIGPP, but at large tolerance; bic is skipped (pass).
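The k_extra question above is about the generic definitions, which a check could implement directly (a sketch with a hypothetical helper; the point is that the parameter count must include extra distribution parameters like alpha in NBP, not just the mean coefficients):

```python
import numpy as np

def aic_bic(llf, nobs, k_params):
    """Generic AIC/BIC definitions; k_params must count ALL estimated
    parameters, including extra ones such as alpha in NBP."""
    aic = -2 * llf + 2 * k_params
    bic = -2 * llf + np.log(nobs) * k_params
    return aic, bic
```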

@josef-pkt

Merging this as is; all green.
I'm postponing the decision of which resid to use and whether to choose different resids across models.

I might switch to using resid_pearson as the default for resid_studentized, for GPP or in general.
NBP and GPP don't have resid_pearson yet.

@josef-pkt josef-pkt merged commit 3828c34 into statsmodels:main Dec 5, 2021
@josef-pkt josef-pkt deleted the influence_numdiff_beta branch October 19, 2022 19:28