IV support #58

Closed
6 of 9 tasks
s3alfisc opened this issue Apr 27, 2023 · 5 comments

@s3alfisc
Member

s3alfisc commented Apr 27, 2023

Required Steps:

  • allow three-part formulas
  • implement IV estimation & inference. Start with just-identified models.
  • update summary() and tidy() methods. For tidy(), simply add index "stage1" and "stage2". Split summary() into two parts. Check out what fixest does.

Nice-to-haves (for now):

  • common IV diagnostics
  • AR confidence intervals

In Practice:

  • implement IV estimator (Z'X)^(-1) Z'Y.
  • always create X and Z. For OLS, set X = Z
  • pass everything through, done (at least for models with only one endog variable)
  • after this is implemented, allow for overidentified models. This requires implementing the 2SLS estimator and updating the inference procedures
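
The just-identified estimator from the list above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the pyfixest implementation: the function name `iv_fit` and the simulated data are assumptions made for the example. It also shows the "for OLS, set X = Z" idea, since passing X as its own instrument matrix reduces the formula to (X'X)^(-1) X'Y.

```python
import numpy as np


def iv_fit(Y, X, Z):
    """Just-identified IV estimator: beta = (Z'X)^(-1) Z'Y.

    Hypothetical helper for illustration; X and Z must have the
    same number of columns (one instrument per regressor).
    """
    Y, X, Z = (np.asarray(a, dtype=float) for a in (Y, X, Z))
    if X.shape[1] != Z.shape[1]:
        raise ValueError("just-identified IV needs as many instruments as regressors")
    # Solve (Z'X) beta = Z'Y rather than forming an explicit inverse.
    return np.linalg.solve(Z.T @ X, Z.T @ Y)


# Simulated data (assumed for the example): x is endogenous because it
# shares the unobserved term u with y; z is a valid instrument for x.
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])  # intercept + endogenous regressor
Z = np.column_stack([np.ones(n), z])  # intercept + instrument

beta_iv = iv_fit(y, X, Z)   # slope should be close to the true value 2
beta_ols = iv_fit(y, X, X)  # OLS as the special case Z = X (biased here)
```

Because OLS is just the case Z = X, a single code path can serve both estimators, which is what "pass everything through" relies on.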
@s3alfisc s3alfisc added this to the v2.0 milestone May 1, 2023
@s3alfisc
Member Author

s3alfisc commented May 6, 2023

Some design considerations for internal processing:

Assume the following model formula: fml = "Y + Y2 ~ X1 | csw0(X3, X4)".

Currently, information on model variables is encoded, for example, in self.var_dict as follows:

>>> self.var_dict
{'0': ['Y', 'Y2', 'X1'], 'X3': ['Y', 'Y2', 'X1'], 'X3+X4': ['Y', 'Y2', 'X1']}

and in self.fml_dict as

>>> self.fml_dict
{'0': ['Y~X1', 'Y2~X1'], 'X3': ['Y~X1', 'Y2~X1'], 'X3+X4': ['Y~X1', 'Y2~X1']}

With instruments, the formula might look as fml = "Y + Y2~ X1 | csw0(X3, X4) | X1 ~ Z1". It is likely preferable to use a "clearer" data structure to save the information above. E.g.

{'0': [{"depvar": "Y", "x_vars": ["X1"], "z_vars": ["Z1"]}, {"depvar": "Y2", "x_vars": ["X1"], "z_vars": ["Z1"]}], 'X3': ...}
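
As a rough sketch of how such a structure could be assembled, the snippet below builds the nested dictionary from already-parsed formula components. The helpers `csw0` and `build_model_specs` are hypothetical names used for illustration, not pyfixest functions, and the rule for `z_vars` (exogenous regressors plus instruments, or simply `x_vars` when there is no instrument) is an assumption based on the "for OLS, set X = Z" convention.

```python
def csw0(*fixef):
    """Cumulative stepwise fixed effects, starting with '0' (no fixed effects)."""
    return ["0"] + ["+".join(fixef[: i + 1]) for i in range(len(fixef))]


def build_model_specs(depvars, x_vars, fe_specs, endog=None, instruments=None):
    """Return {fixed-effect spec: [one dict per dependent variable]}.

    Hypothetical helper: z_vars replaces the endogenous variable in
    x_vars with the instruments; without instruments, z_vars == x_vars.
    """
    if endog is None:
        z_vars = list(x_vars)
    else:
        z_vars = [v for v in x_vars if v != endog] + list(instruments or [])
    return {
        fe: [
            # Copy the lists so each model spec can be modified independently.
            {"depvar": dep, "x_vars": list(x_vars), "z_vars": list(z_vars)}
            for dep in depvars
        ]
        for fe in fe_specs
    }


# Components of "Y + Y2 ~ X1 | csw0(X3, X4) | X1 ~ Z1", parsed by hand here.
specs = build_model_specs(
    depvars=["Y", "Y2"],
    x_vars=["X1"],
    fe_specs=csw0("X3", "X4"),
    endog="X1",
    instruments=["Z1"],
)
```

Each entry then carries everything needed to fit one model, so downstream code no longer has to re-parse formula strings.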

@s3alfisc
Member Author

Status:

  • In the dev branch, the IV estimator and its associated standard errors (iid, HC1-3, CRV1) are implemented.
  • some tests against fixest are implemented & pass
  • still to be done: error handling + tests for multiple estimations

Also, formula syntax currently deviates from fixest. For models without covariates, fixest syntax is

feols(Y ~ 1 | fe | endogvar ~ instrument)

while pyfixest is

Fixest(df).feols("Y ~ endogvar | fe | endogvar ~ instrument"). 

After tackling all points above, I will merge the PR and implement the 2SLS estimator in a separate PR.

@s3alfisc
Member Author

Basic IV support added with version 0.5.

@s3alfisc
Member Author

2SLS estimator now implemented in the dev branch.
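
For reference, a minimal NumPy sketch of what a 2SLS estimator computes in the overidentified case is given below. It is not the code in the dev branch: the function name `tsls_fit` and the simulated data are assumptions for illustration. With as many instruments as regressors, the result coincides with the just-identified formula (Z'X)^(-1) Z'Y.

```python
import numpy as np


def tsls_fit(Y, X, Z):
    """2SLS point estimates (hypothetical helper, illustration only).

    First stage:  X_hat = Z (Z'Z)^(-1) Z'X
    Second stage: beta  = (X_hat'X)^(-1) X_hat'Y
    """
    Y, X, Z = (np.asarray(a, dtype=float) for a in (Y, X, Z))
    if Z.shape[1] < X.shape[1]:
        raise ValueError("model is underidentified: too few instruments")
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)


# Simulated data (assumed): one endogenous regressor, two instruments.
rng = np.random.default_rng(1)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)  # unobserved term that makes x endogenous
x = 0.6 * z1 + 0.4 * z2 + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z_over = np.column_stack([np.ones(n), z1, z2])  # overidentified
Z_just = np.column_stack([np.ones(n), z1])      # just-identified

beta_over = tsls_fit(y, X, Z_over)
beta_just = tsls_fit(y, X, Z_just)
```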

@s3alfisc
Member Author

s3alfisc commented May 19, 2023

To do:

  • currently, there is a minor deviation to fixest syntax. For models without any covariates, fixest syntax is feols(depvar ~ 1 | fe | endogvar ~ instruments), while pyfixest requires feols(depvar ~ endogvar | fe | endogvar ~ instruments). Align pyfixest syntax with fixest.
  • multiple estimation syntax does not work in the first part of the three-part formula, i.e. depvar ~ csw(X1, X2) | fe | endogvar ~ instrument leads to a bug
  • add tests for IV with multiple estimation
  • add F-test (for first stage and in general)
  • some documentation improvements are definitely possible
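
One possible way to handle the first item is a small translation step that rewrites the fixest-style formula into the form pyfixest currently requires. The sketch below is only an illustration of that idea: `to_pyfixest_iv_formula` is a hypothetical helper, not part of pyfixest, and it covers only the no-covariate case described above (a right-hand side of `1`), leaving every other formula as it is.

```python
def to_pyfixest_iv_formula(fml):
    """Rewrite 'depvar ~ 1 | fe | endogvar ~ instruments' so that the
    endogenous variable appears in the first part of the formula.

    Hypothetical helper; formulas that are not three-part IV formulas
    are returned unchanged.
    """
    parts = [p.strip() for p in fml.split("|")]
    if len(parts) != 3:
        return fml  # not an IV formula with fixed effects; leave untouched
    main, fe, iv = parts
    depvar, rhs = (s.strip() for s in main.split("~", 1))
    endog = iv.split("~", 1)[0].strip()
    if rhs == "1":
        # fixest omits the endogenous variable from the first part.
        rhs = endog
    return f"{depvar} ~ {rhs} | {fe} | {iv}"


converted = to_pyfixest_iv_formula("Y ~ 1 | fe | endogvar ~ instrument")
```

Here `converted` is the string "Y ~ endogvar | fe | endogvar ~ instrument", and a formula already in that form passes through unchanged.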

@s3alfisc s3alfisc closed this as completed Aug 2, 2023