ENH: GLM, discrete with endogenous regressors, control functions #7689
Comments
AFAICS, without reading much: y1 ~ Poisson(f(x b + y2 a)), where f is the inverse link, exp. Second stage: use the joint moment conditions as a function of (b, a, b2). I guess we can get the asymptotic cov_params from GMM on the joint moment conditions (exactly identified?). Using generic two-stage:
It looks like my GMMPoisson has good test coverage against Stata. The Stata results were created using the gmm command, not ivpoisson.
We might want to add some helper functions to compute cov_params for exactly identified recursive (triangular) systems like 2-stage estimators. AFAIR, we don't have a computational shortcut for GMM when it is exactly identified. The disadvantage compared to using GMM directly is that a helper function does not inherit the robust cov_types (besides the generic sandwich, HC). Slides showing how to use GMM for standard errors for control function 2-stage.
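A minimal sketch of what such a helper could look like, assuming the user supplies the per-observation moment conditions as a function of the stacked parameter vector. The names `cov_exactly_identified` and `jac_mean_moments` are hypothetical and not part of statsmodels; the Jacobian is taken by central differences in plain NumPy. As a sanity check the helper is applied to the OLS moment conditions, where the sandwich should reproduce the HC0 covariance.

```python
import numpy as np


def jac_mean_moments(moments_obs, theta, eps=1e-6):
    """Central-difference Jacobian of the averaged moment conditions."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    jac = np.empty((k, k))  # exactly identified: as many moments as parameters
    for j in range(k):
        step = np.zeros(k)
        step[j] = eps * max(1.0, abs(theta[j]))
        diff = moments_obs(theta + step).mean(0) - moments_obs(theta - step).mean(0)
        jac[:, j] = diff / (2 * step[j])
    return jac


def cov_exactly_identified(moments_obs, theta):
    """Sandwich covariance G^{-1} S G^{-T} / nobs for an exactly identified system."""
    m = moments_obs(theta)                 # shape (nobs, k)
    nobs = m.shape[0]
    G = jac_mean_moments(moments_obs, theta)
    S = m.T @ m / nobs                     # outer product of moments (no clustering/HAC)
    G_inv = np.linalg.inv(G)
    return G_inv @ S @ G_inv.T / nobs


# sanity check on OLS with heteroscedastic errors
rng = np.random.default_rng(0)
nobs = 200
X = np.column_stack([np.ones(nobs), rng.normal(size=nobs)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=nobs) * (1 + X[:, 1] ** 2)
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]


def moments_ols(b):
    return X * (y - X @ b)[:, None]


cov_gmm = cov_exactly_identified(moments_ols, b_hat)

xtx_inv = np.linalg.inv(X.T @ X)
resid = y - X @ b_hat
cov_hc0 = xtx_inv @ (X.T * resid**2) @ X @ xtx_inv
```

For a 2-stage estimator, `moments_obs` would return the first-stage and second-stage score contributions side by side, evaluated at the stacked parameters. Because the system is triangular, the Jacobian is block lower triangular, which is where a computational shortcut could come from.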
It should be possible to base unit tests for the control function approach on this.
What do we use for The estimating linear predictor is (open question, I have not looked at this.)
Implementation detail for control functions, API design: which resid, and which functions to use as control functions? Maybe "control_kwds". The resid might depend on context, e.g. inverse Mills ratio for a probit 1st stage, or generalized residual, or resid_outcome, or resid_pearson. Everything gets more complicated if we have more than one endogenous variable with different 1st stage models, e.g. binary (Logit, Probit) or OLS/multivariateLS (as in IV2SLS), or multinomial, or ... Partitioned formulas in general: using a formula would be good, e.g. for interaction terms in control functions, "~ r + I(r**2) + r:x" (without main effect of x, I guess "nested" in patsy). Another option: like extended regression in Stata. update
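To make one of the residual choices concrete, here is a small sketch of the generalized residual for a probit first stage, using only the standard library. The function name `probit_generalized_residual` is hypothetical. For y = 1 the generalized residual is the inverse Mills ratio pdf/cdf, and for y = 0 it is -pdf/(1 - cdf), both evaluated at the first-stage linear predictor.

```python
from statistics import NormalDist

_norm = NormalDist()  # standard normal


def probit_generalized_residual(y, linpred):
    """Generalized residual of a probit model for a single observation.

    y is 0 or 1, linpred is the first-stage linear predictor.
    """
    pdf = _norm.pdf(linpred)
    cdf = _norm.cdf(linpred)
    if y == 1:
        return pdf / cdf              # inverse Mills ratio
    return -pdf / (1.0 - cdf)


# a few (y, linear predictor) pairs
cases = [(1, 0.0), (0, 0.0), (1, 1.5), (0, -1.5)]
resid = [probit_generalized_residual(y, lp) for y, lp in cases]
```

A "control_kwds"-style option would then mainly have to select which of these residual functions is appended to the outcome model.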
I'm trying out an example with a Poisson outcome model and a linear endogenous variable as regressor. However, it works, both with the "true" first stage error and with residuals from the linear projection on instruments. Poisson with cf estimates the slope parameters without bias (my nobs=5000 for the MC).
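A single simulated draw along these lines can be sketched as follows. This is not the code used for the MC above: the data generating process, the parameter values and the small Newton fitter `fit_poisson` are all assumptions chosen for illustration, and plain NumPy is used so the sketch does not depend on any particular statsmodels API.

```python
import numpy as np


def fit_poisson(y, X, maxiter=50, tol=1e-10):
    """Poisson regression with log link by plain Newton steps (no step control)."""
    params = np.zeros(X.shape[1])
    params[0] = np.log(y.mean())          # assumes the first column of X is a constant
    for _ in range(maxiter):
        mu = np.exp(X @ params)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        params = params + step
        if np.max(np.abs(step)) < tol:
            break
    return params


rng = np.random.default_rng(0)
nobs = 5000
a_true, rho = 0.5, 0.5                    # slope on y2, and strength of the endogeneity

z = rng.normal(size=nobs)                 # instrument
v = rng.normal(size=nobs)                 # first-stage error, also enters the outcome
y2 = z + v                                # linear endogenous regressor
y1 = rng.poisson(np.exp(a_true * y2 + rho * v))

const = np.ones(nobs)
Z = np.column_stack([const, z])

# first stage: linear projection of y2 on the instruments, keep the residual
gamma_hat = np.linalg.lstsq(Z, y2, rcond=None)[0]
v_hat = y2 - Z @ gamma_hat

# second stage: Poisson without control function, with residual, with true error
params_naive = fit_poisson(y1, np.column_stack([const, y2]))
params_cf = fit_poisson(y1, np.column_stack([const, y2, v_hat]))
params_oracle = fit_poisson(y1, np.column_stack([const, y2, v]))
```

In this design the Poisson fit that ignores the endogeneity converges to a slope of about `a_true + rho / 2`, while both control function versions should recover a slope close to `a_true`.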
Another implementation issue: which results class? Wald inference will be supported, but other postestimation such as score_test and influence (e.g. in Poisson) might not be correct. The predict and get_prediction methods in results should be generic, and we only need to change the model predict. If we return a plain LikelihoodModelResults, then we lose out on methods that could work, or need to add/copy them individually.
Missing first stage model MultivariateLS: for the 2-stage parameter estimates with linear 1st stage I can use Aside: I guess I don't need scale in the least squares score_obs and hessian if I use a standard sandwich, but I will need to add scale for nonrobust cov_params.
Success for residual inclusion against R OneSampleMR, in their docstring baby example.
Agreement looks very high; precision in optimization and numdiff is not necessarily this high.
Flexibility versus efficient or more precise computation of derivatives: for response residual inclusion we could get the analytical cross-derivative instead of using numdiff. This will be more difficult if we use other residuals (e.g. for a Probit first stage) or use functions (formula) of the residuals. We would have to switch methods for cases where an analytical derivative is not available.
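For the response residual case with a linear first stage and a Poisson second stage, the cross-derivative has a simple closed form. The sketch below assumes the residual is r = y2 - z g and the second-stage regressors are w = (x, y2, r). It compares the analytical derivative of the averaged second-stage score with respect to g against a central-difference approximation. The function names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(12345)
nobs = 500
z = np.column_stack([np.ones(nobs), rng.normal(size=nobs)])  # instruments incl. constant
x = np.ones((nobs, 1))                                       # exogenous part (constant only)
y2 = z @ np.array([0.2, 1.0]) + rng.normal(size=nobs)
y1 = rng.poisson(np.exp(0.1 + 0.3 * y2))


def score_obs_stage2(beta, gamma):
    """Per-observation Poisson score with the response residual as control function."""
    r = y2 - z @ gamma
    w = np.column_stack([x, y2, r])
    mu = np.exp(w @ beta)
    return w * (y1 - mu)[:, None]


def cross_deriv_analytic(beta, gamma):
    """d mean(score) / d gamma', shape (len(beta), len(gamma))."""
    r = y2 - z @ gamma
    w = np.column_stack([x, y2, r])
    mu = np.exp(w @ beta)
    c = beta[-1]                          # coefficient on the residual
    # part through the mean: d mu / d gamma = -c * mu * z
    d = (w * (c * mu)[:, None]).T @ z
    # part through the regressor itself: d r / d gamma = -z, only in the last row
    d[-1] -= (y1 - mu) @ z
    return d / nobs


def cross_deriv_numdiff(beta, gamma, eps=1e-6):
    d = np.empty((beta.size, gamma.size))
    for j in range(gamma.size):
        step = np.zeros(gamma.size)
        step[j] = eps
        diff = (score_obs_stage2(beta, gamma + step).mean(0)
                - score_obs_stage2(beta, gamma - step).mean(0))
        d[:, j] = diff / (2 * eps)
    return d


beta0 = np.array([0.1, 0.3, 0.2])
gamma0 = np.array([0.2, 1.0])
d_analytic = cross_deriv_analytic(beta0, gamma0)
d_numdiff = cross_deriv_numdiff(beta0, gamma0)
```

With other residuals or with formula terms in the residual, only the `d r / d gamma` part changes, so a fallback to numdiff could be limited to that piece.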
API: which With control functions we don't need to replace endogenous explanatory variables by their 1st stage estimate, we only need to append the control functions. We can allow for terms of the endogenous variables other than a simple main effect, e.g. polynomials and interactions. If explanvar includes endogenous explanatory variables, then we need to know which are the exogenous variables or require that those are included in the If explanvar includes functions of endogenous variables, then the control functions should include something similar. (Formula handling in the difficult cases will be a pain. We usually don't get enough information in statsmodels.)
(back to econometrics)
#1742 ivpoisson, GMMPoisson
control function approach with linear function for endogenous regressor doesn't look so scary.
https://stats.stackexchange.com/questions/281848/control-function-for-iv-poisson-regression
https://www.stata.com/features/overview/poisson-regression-with-endogenous-variables/
Cameron and Trivedi's count book, 2nd edition, p. 402ff, has a docvis example that compares Poisson-CF, NB2-CF, and the additive and multiplicative versions of GMMPoisson.
Main caveat: they recommend bootstrap standard errors, because it is a two-step estimation with a generated regressor in the second stage.
(maybe there is a Murphy/Topel or GMM asymptotic distribution for the two step procedure, see next comment)
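A pairs bootstrap for the two-step estimator can be sketched as below. Both stages are re-estimated on every resample, so the sampling variation of the generated regressor is included. The data generating process, the helper names `fit_poisson` and `cf_two_step`, and the number of replications are assumptions for illustration only and are not taken from the book.

```python
import numpy as np


def fit_poisson(y, X, maxiter=50, tol=1e-10):
    """Poisson regression with log link by plain Newton steps (no step control)."""
    params = np.zeros(X.shape[1])
    params[0] = np.log(y.mean())          # assumes the first column of X is a constant
    for _ in range(maxiter):
        mu = np.exp(X @ params)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        params = params + step
        if np.max(np.abs(step)) < tol:
            break
    return params


def cf_two_step(y1, y2, Z):
    """Linear first stage, then Poisson with the first-stage residual appended."""
    gamma_hat = np.linalg.lstsq(Z, y2, rcond=None)[0]
    v_hat = y2 - Z @ gamma_hat
    X = np.column_stack([np.ones(len(y1)), y2, v_hat])
    return fit_poisson(y1, X)


rng = np.random.default_rng(1)
nobs, n_boot = 1000, 200
z = rng.normal(size=nobs)
v = rng.normal(size=nobs)
y2 = z + v
y1 = rng.poisson(np.exp(0.5 * y2 + 0.5 * v))
Z = np.column_stack([np.ones(nobs), z])

params_hat = cf_two_step(y1, y2, Z)

# resample observations (rows) jointly and redo both stages each time
boot = np.empty((n_boot, params_hat.size))
for b in range(n_boot):
    idx = rng.integers(0, nobs, size=nobs)
    boot[b] = cf_two_step(y1[idx], y2[idx], Z[idx])

bse_boot = boot.std(axis=0, ddof=1)
```

An analytical alternative would have to account for the first-stage estimation error in the second-stage covariance, which is what a Murphy/Topel or stacked GMM covariance provides.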
Related: also fully parametric bivariate/multivariate models.
Cameron and Trivedi also include a case with an endogenous categorical regressor modelled through simulated MLE for a mixed multinomial.
IIRC, I saw somewhere some time ago the use of splines for the control functions, to have less restrictive parametric assumptions.
partially related
#7636 heckman procedure with endogenous binary treatment but linear model at final stage.