# Getting Started with PyFixest

First, we load the package and some example data:

In [68]:
%load_ext autoreload
%autoreload 2

from pyfixest.estimation import feols, fepois
from pyfixest.summarize import summary, etable
from pyfixest.visualize import coefplot, iplot
from pyfixest.utils import get_data


In [69]:
data = get_data()
data.head()

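|    |        Y |        Y2 |   X1 |        X2 |   f1 |   f2 |   f3 |   group_id |       Z1 |        Z2 |
|---:|---------:|----------:|-----:|----------:|-----:|-----:|-----:|-----------:|---------:|----------:|
|  0 |      NaN | -9.166216 |  2.0 |  0.457858 |  9.0 | 15.0 |  0.0 |        1.0 | 2.464146 |  1.583723 |
|  1 | 3.221964 | -2.835142 |  NaN | -4.998406 |  8.0 |  6.0 |  9.0 |       11.0 |      NaN | -2.749629 |
|  2 | 1.449755 | -3.721375 |  1.0 |  1.55848  |  NaN | 11.0 |  0.0 |       18.0 | 0.44956  |  0.91013  |
|  3 | 5.179868 | 14.696121 |  2.0 |  1.560402 | 15.0 |  1.0 |  4.0 |       15.0 | 0.823438 |  0.9149   |
|  4 | 1.193511 | -6.568647 |  2.0 | -3.472232 | 20.0 | 19.0 |  9.0 |        5.0 | 0.895978 | -3.056434 |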


## OLS Estimation

We can estimate a fixed effects regression via the `feols()` function. `feols()` takes three main arguments: a two-sided model formula, the data, and, optionally, the type of inference.

In [70]:
fit = feols(fml="Y~X1 | f1", data=data, vcov="HC1")
type(fit)

pyfixest.feols.Feols

The first part of the formula contains the dependent variable and "regular" covariates, while the second part contains fixed effects.
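
To make the role of the fixed effects part concrete, here is a minimal sketch of the within transformation for a single fixed effect: demeaning `Y` and `X1` within each level of `f1` and running OLS on the demeaned data gives the same slope as including one dummy per level. The helper names `demean_by_group` and `within_slope` and the toy data are illustrative assumptions, not part of PyFixest, and this is not necessarily how `feols()` absorbs fixed effects internally.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """Slope of y on x after absorbing one fixed effect via demeaning."""
    y_dm = demean_by_group(y, groups)
    x_dm = demean_by_group(x, groups)
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    sxx = sum(a * a for a in x_dm)
    return sxy / sxx


# Toy data: y = 2 * x + group-specific intercept (10 for "a", -5 for "b").
x = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
groups = ["a", "a", "a", "b", "b", "b"]
y = [12.0, 14.0, 16.0, -3.0, -1.0, 3.0]

slope = within_slope(y, x, groups)
print(round(slope, 6))
```

Because the group intercepts are removed by the demeaning step, the recovered slope reflects only the within-group relationship between `x` and `y`.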

`feols()` returns an instance of the `Feols` class, as the output of `type(fit)` above shows.

To inspect the results, we can use a summary function or method:

In [71]:
fit.summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.362 |        0.057 |     6.321 |      0.000 |   0.249 |    0.474 |
---
RMSE: 1.422  Adj. R2: 0.038  Adj. R2 Within: 0.038


Alternatively, the `summarize` module provides a `summary()` function, which can be applied to a single regression model object or to a list of regression model objects.

In [72]:
summary(fit)

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.362 |        0.057 |     6.321 |      0.000 |   0.249 |    0.474 |
---
RMSE: 1.422  Adj. R2: 0.038  Adj. R2 Within: 0.038


You can access individual elements of the summary via dedicated methods: `.tidy()` returns a "tidy" `pd.DataFrame`,
`.coef()` returns the estimated parameters, and `.se()` the estimated standard errors. Other methods include `.pvalue()`, `.confint()`,
and `.tstat()`.

In [73]:
fit.coef()

Coefficient
X1    0.361642
Name: Estimate, dtype: float64

In [74]:
fit.se()

Coefficient
X1    0.057212
Name: Std. Error, dtype: float64
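
The quantities reported by these methods are tied together by simple arithmetic. The sketch below recomputes the t statistic and an approximate 95% confidence interval from the coefficient and standard error printed above. It uses a normal critical value from the standard library as an assumption; the summary table is presumably based on a t distribution, so the interval bounds can differ slightly in the third decimal.

```python
from statistics import NormalDist

coef = 0.361642  # value printed by fit.coef() above
se = 0.057212    # value printed by fit.se() above

# The t statistic is the estimate divided by its standard error.
t_stat = coef / se

# Normal approximation to the 97.5% critical value (about 1.96).
z = NormalDist().inv_cdf(0.975)
ci_lower = coef - z * se
ci_upper = coef + z * se

# Two-sided p-value under the same normal approximation.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(round(t_stat, 3))
```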

## Standard Errors and Inference

Supported covariance types are "iid", "HC1" to "HC3", "CRV1", and "CRV3" (with up to two-way clustering). Inference can be adjusted "on-the-fly" via the
`.vcov()` method:

In [75]:
fit.vcov({"CRV1": "group_id + f1"}).summary()
fit.vcov({"CRV3": "group_id"}).summary()

###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.362 |        0.068 |     5.315 |      0.000 |   0.219 |    0.505 |
---
RMSE: 1.422  Adj. R2: 0.038  Adj. R2 Within: 0.038
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  CRV3
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.362 |        0.072 |     5.032 |      0.000 |   0.211 |    0.513 |
---
RMSE: 1.422  Adj. R2: 0.038  Adj. R2 Within: 0.038
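
As a rough illustration of how the covariance type changes the standard error while leaving the point estimate untouched, the sketch below computes "iid" and "HC1" standard errors for the slope of a simple regression with an intercept. The function `slope_standard_errors` and the toy data are hypothetical. The `n / (n - k)` factor is the textbook HC1 adjustment, and PyFixest's small-sample corrections may differ in detail. Clustered (CRV) variants are not covered here.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, iid s.e., HC1 s.e.) for a regression of y on x with intercept."""
    n = len(x)
    k = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dm = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dm)
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dm, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # iid: a single error variance shared by all observations.
    sigma2 = sum(e * e for e in resid) / (n - k)
    se_iid = math.sqrt(sigma2 / sxx)

    # HC1: sandwich estimator with observation-specific squared residuals,
    # scaled by n / (n - k).
    meat = sum((d * e) ** 2 for d, e in zip(x_dm, resid))
    se_hc1 = math.sqrt(n / (n - k) * meat / sxx**2)
    return slope, se_iid, se_hc1


slope, se_iid, se_hc1 = slope_standard_errors(
    [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 4.0]
)
print(round(slope, 3), round(se_iid, 3), round(se_hc1, 3))
```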


It is also possible to run a wild (cluster) bootstrap after estimation (via the [wildboottest module](https://github.com/s3alfisc/wildboottest)):

In [76]:
fit2 = feols(fml="Y~ X1", data=data, vcov={"CRV1": "group_id"})
fit2.wildboottest(param="X1", B=999)

param                            X1
t value                    5.452993
Pr(>|t|)                        0.0
bootstrap_type                   11
inference         CRV(['group_id'])
impose_null                    True
dtype: object

Note that the wild bootstrap currently does not support fixed effects in the regression model. Supporting fixed effects is work in progress.
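
For intuition, the following is a minimal pure-Python sketch of a wild cluster bootstrap with the null hypothesis imposed and Rademacher weights, for the slope of a simple regression. It is not the `wildboottest` implementation: the function names, the simulated data, and the omission of small-sample corrections are all assumptions made for illustration.

```python
import math
import random


def cluster_t_stat(x, y, clusters):
    """t statistic of the slope using a cluster-robust variance (no corrections)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dm = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dm)
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dm, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Sum the scores (demeaned regressor times residual) within each cluster.
    scores = {}
    for d, xi, yi, g in zip(x_dm, x, y, clusters):
        scores[g] = scores.get(g, 0.0) + d * (yi - intercept - slope * xi)
    variance = sum(s * s for s in scores.values()) / sxx**2
    return slope / math.sqrt(variance)


def wild_cluster_bootstrap_pvalue(x, y, clusters, n_boot=999, seed=0):
    """Bootstrap p-value for H0: slope = 0, with the null imposed."""
    rng = random.Random(seed)
    t_obs = cluster_t_stat(x, y, clusters)
    # Under the null, the restricted model contains only an intercept.
    y_bar = sum(y) / len(y)
    resid_null = [yi - y_bar for yi in y]
    labels = sorted(set(clusters))
    exceed = 0
    for _ in range(n_boot):
        # Draw one Rademacher weight (+1 or -1) per cluster.
        w = {g: rng.choice((-1.0, 1.0)) for g in labels}
        y_star = [y_bar + w[g] * u for u, g in zip(resid_null, clusters)]
        if abs(cluster_t_stat(x, y_star, clusters)) >= abs(t_obs):
            exceed += 1
    return exceed / n_boot


# Simulated data: 8 clusters with 5 observations each.
data_rng = random.Random(42)
clusters = [g for g in range(8) for _ in range(5)]
x = [data_rng.gauss(0.0, 1.0) for _ in clusters]
y = [1.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

p_value = wild_cluster_bootstrap_pvalue(x, y, clusters, n_boot=199)
print(p_value)
```

The key design choice is that all observations in a cluster share one weight, which preserves the within-cluster dependence of the residuals in every bootstrap sample.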

## IV Estimation 

It is also possible to estimate instrumental variable models with *one* endogenous variable and (potentially multiple) instruments:

In [77]:
iv_fit = feols(fml="Y2~ 1 | f1 + f2 | X1 ~ Z1 + Z2", data=data)
iv_fit.summary()

###

Estimation:  IV
Dep. var.: Y2, Fixed effects: f1+f2
Inference:  CRV1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.043 |        0.307 |     0.138 |      0.891 |  -0.586 |    0.671 |
---


If the model does not contain any fixed effects, just drop the second part of the formula above:

In [78]:
feols(fml="Y~ 1 | X1 ~ Z1 + Z2", data=data).summary()

###

Estimation:  IV
Dep. var.: Y
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.282 |        0.120 |    19.031 |      0.000 |   2.047 |    2.518 |
| X1            |      0.280 |        0.107 |     2.609 |      0.009 |   0.069 |    0.491 |
---


IV estimation with multiple endogenous variables, or in combination with multiple estimation syntax, is currently not supported. The syntax is "depvar ~ exog.vars | fixed effects | endog.vars ~ instruments".
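
To illustrate what the `endog.vars ~ instruments` part of the formula does, here is a minimal two-stage least squares sketch with one endogenous variable and a single instrument (the examples above use two). The helper names and the toy data are assumptions for illustration, and the sketch only produces point estimates: standard errors taken naively from the second stage would be incorrect.

```python
def ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def two_stage_least_squares(y, endog, instrument):
    """Point estimates (intercept, slope) from a manual two-stage procedure."""
    # Stage 1: project the endogenous variable on the instrument.
    a, b = ols(instrument, endog)
    endog_hat = [a + b * z for z in instrument]
    # Stage 2: regress the outcome on the first-stage fitted values.
    return ols(endog_hat, y)


# Toy data: u is an unobserved confounder, uncorrelated with z in this sample.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
u = [1.0, -1.0, -1.0, 1.0, 0.0, 0.0]
x1 = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + 3.0 * ui for xi, ui in zip(x1, u)]

_, ols_slope = ols(x1, y)
iv_intercept, iv_slope = two_stage_least_squares(y, x1, z)
print(round(ols_slope, 3), round(iv_slope, 3))
```

In this constructed example, plain OLS overstates the true slope of 2 because `x1` is correlated with the confounder, while the instrumented estimate recovers it.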

## Poisson Regression 

As of version `0.8.4` (not yet on PyPI), it is possible to estimate Poisson regressions via `fepois()`:

In [79]:
from pyfixest.utils import get_data

pois_data = get_data(model="Fepois")
pois_fit = fepois(fml="Y~X1 | f1+f2", data=pois_data, vcov={"CRV1": "group_id"})
pois_fit.summary()

###

Estimation:  Poisson
Dep. var.: Y, Fixed effects: f1+f2
Inference:  CRV1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |     -0.000 |        0.033 |    -0.010 |      0.992 |  -0.066 |    0.065 |
---
Deviance: 1068.044
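
For readers unfamiliar with Poisson regression, the sketch below fits the model log E[y | x] = b0 + b1 * x by Newton's method on a tiny data set. It omits fixed effects, and it is not necessarily the algorithm that `fepois()` uses. The function name `poisson_fit` and the data are illustrative assumptions.

```python
import math


def poisson_fit(x, y, n_iter=25):
    """Newton's method for a Poisson regression with intercept and one covariate."""
    # Start from the intercept-only solution.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (2 x 2), inverted by hand below.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# With a binary covariate, the fitted means equal the group means of y
# (2 for x = 0 and 6 for x = 1), so b1 is the log of their ratio.
x = [0.0, 0.0, 1.0, 1.0]
y = [1.0, 3.0, 4.0, 8.0]
b0, b1 = poisson_fit(x, y)
print(round(math.exp(b1), 6))
```

The coefficient is therefore interpreted on the log scale: `exp(b1)` is the multiplicative change in the expected count when `x` increases by one.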


## Multiple Estimation 

`PyFixest` supports a range of multiple estimation functionality: `sw`, `sw0`, `csw`, `csw0`, and multiple dependent variables. If multiple estimation syntax is used,
`feols()` and `fepois()` return an instance of a `FixestMulti` object, which essentially consists of a dictionary of `Feols` or `Fepois` instances.
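
The stepwise operators can be thought of as shorthand for a list of specifications. The helper below is a hypothetical sketch of that expansion, following the `fixest` conventions (`sw` adds one term at a time, `csw` adds terms cumulatively, and the `0` variants also include the specification without any of the terms). It is not PyFixest's actual formula parser.

```python
def expand_stepwise(kind, terms):
    """Expand a stepwise operator into the list of term strings it stands for."""
    if kind not in {"sw", "sw0", "csw", "csw0"}:
        raise ValueError(f"unknown stepwise operator: {kind}")
    if kind.startswith("csw"):
        # Cumulative stepwise: f1, then f1+f2, and so on.
        expanded = ["+".join(terms[: i + 1]) for i in range(len(terms))]
    else:
        # Plain stepwise: one term at a time.
        expanded = list(terms)
    if kind.endswith("0"):
        # The "0" variants start with the specification without any term.
        expanded = [""] + expanded
    return expanded


for fixef in expand_stepwise("csw0", ["f1", "f2"]):
    print(f"Y~X1|{fixef}" if fixef else "Y~X1")
```

Under these assumptions, `csw0(f1, f2)` corresponds to three models: one without fixed effects, one with `f1`, and one with `f1+f2`.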

In [80]:
multi_fit = feols(fml="Y~X1 | csw0(f1, f2)", data=data, vcov="HC1")
multi_fit

<pyfixest.FixestMulti.FixestMulti at 0x2650c75db10>

In [81]:
multi_fit.summary()

###

Estimation:  OLS
Dep. var.: Y
Inference:  HC1
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.231 |        0.091 |    24.542 |      0.000 |   2.052 |    2.409 |
| X1            |      0.333 |        0.070 |     4.745 |      0.000 |   0.195 |    0.471 |
---
RMSE: 1.765  Adj. R2: 0.021  Adj. R2 Within: 0.021
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  HC1
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.362 |        0.057 |     6.321 |      0.000 |   0.249 |    0.474 |
---
RMSE: 1.422  Adj. R2: 0.038  Adj. R2 Within: 0.038
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  HC1
Observations

Alternatively, you can look at the estimation results via the `etable()` method:

In [82]:
multi_fit.etable()

fml,Y~X1,Y~X1,Y~X1|f1,Y~X1|f1+f2
Coefficient,Intercept,X1,X1,X1
Estimate,2.231,0.333,0.362,0.351
Std. Error,0.091,0.07,0.057,0.049
t value,24.542,4.745,6.321,7.102
Pr(>|t|),0.0,0.0,0.0,0.0
2.5 %,2.052,0.195,0.249,0.254
97.5 %,2.409,0.471,0.474,0.448


If you are only interested in some parameters, e.g. "X1", you can use the following syntax:

In [83]:
multi_fit.etable().xs("X1", level=1, axis=1)

fml,Y~X1,Y~X1|f1,Y~X1|f1+f2
Estimate,0.333,0.362,0.351
Std. Error,0.07,0.057,0.049
t value,4.745,6.321,7.102
Pr(>|t|),0.0,0.0,0.0
2.5 %,0.195,0.249,0.254
97.5 %,0.471,0.474,0.448


You can access an individual model by its name, i.e. its formula, via the `all_fitted_models` attribute:

In [84]:
multi_fit.all_fitted_models["Y~X1"].tidy()

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |    2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|---------:|---------:|
| Intercept     |   2.230613 |     0.090888 | 24.542465 |        0.0 | 2.052259 | 2.408967 |
| X1            |   0.332869 |     0.070151 |  4.745052 |      2e-06 | 0.195209 | 0.470529 |


or equivalently via the `fetch_model` method:

In [85]:
multi_fit.fetch_model(0).tidy()

Model:  Y~X1


| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |    2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|---------:|---------:|
| Intercept     |   2.230613 |     0.090888 | 24.542465 |        0.0 | 2.052259 | 2.408967 |
| X1            |   0.332869 |     0.070151 |  4.745052 |      2e-06 | 0.195209 | 0.470529 |


Here, `0` simply fetches the first model stored in the `all_fitted_models` dictionary, `1` the second etc.

Objects of type `FixestMulti` come with a range of additional methods, such as `tidy()`, `coef()`, and `vcov()`, which
essentially loop over the equivalent methods of all fitted models. For example, `FixestMulti.vcov()` updates inference for all
models stored in the `FixestMulti` object.

In [86]:
multi_fit.vcov("iid").summary()

###

Estimation:  OLS
Dep. var.: Y
Inference:  iid
Observations:  998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| Intercept     |      2.231 |        0.088 |    25.261 |      0.000 |   2.057 |    2.404 |
| X1            |      0.333 |        0.069 |     4.807 |      0.000 |   0.197 |    0.469 |
---
RMSE: 1.765  Adj. R2: 0.021  Adj. R2 Within: 0.021
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1
Inference:  iid
Observations:  997

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5 % |   97.5 % |
|:--------------|-----------:|-------------:|----------:|-----------:|--------:|---------:|
| X1            |      0.362 |        0.057 |     6.369 |      0.000 |   0.250 |    0.473 |
---
RMSE: 1.422  Adj. R2: 0.038  Adj. R2 Within: 0.038
###

Estimation:  OLS
Dep. var.: Y, Fixed effects: f1+f2
Inference:  iid
Observations

If you have estimated multiple models without multiple estimation syntax and still want to compare them, you can use the `etable()` function: 

In [87]:
from pyfixest.summarize import etable

etable([fit, fit2])

| Coefficient   | est1             | est2             |
|:--------------|:-----------------|:-----------------|
| X1            | 0.362*** (0.072) | 0.333*** (0.061) |
| Intercept     |                  | 2.231*** (0.079) |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
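
Each cell in this table combines the estimate, significance stars, and the standard error in parentheses. The sketch below reproduces that layout using the significance levels from the table footer. The function names are hypothetical and not part of PyFixest.

```python
def significance_stars(p_value):
    """Map a p-value to stars using the thresholds 0.05, 0.01 and 0.001."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def format_cell(estimate, std_error, p_value):
    """Format one table cell as 'estimate<stars> (standard error)'."""
    return f"{estimate:.3f}{significance_stars(p_value)} ({std_error:.3f})"


cell = format_cell(0.362, 0.072, 0.0)
print(cell)
```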


## Visualization 

`PyFixest` provides two functions to visualize the results of a regression: `coefplot` and `iplot`.

In [88]:
from lets_plot import *

LetsPlot.setup_html()

multi_fit.coefplot().show()

## Difference-in-Differences / Event Study Designs

`PyFixest` supports event study designs via two-way fixed effects and Gardner's two-stage estimator.

In [89]:
import pandas as pd
import numpy as np
from pyfixest.experimental.did import did2s

file_path = "../pyfixest/experimental/data/df_het.csv"
df_het = pd.read_csv(file_path)
df_het.head()

Unnamed: 0,unit,state,group,unit_fe,g,year,year_fe,treat,rel_year,rel_year_binned,error,te,te_dynamic,dep_var
0,1,33,Group 2,7.043016,2010,1990,0.066159,False,-20.0,-6,-0.086466,0,0.0,7.022709
1,1,33,Group 2,7.043016,2010,1991,-0.03098,False,-19.0,-6,0.766593,0,0.0,7.778628
2,1,33,Group 2,7.043016,2010,1992,-0.119607,False,-18.0,-6,1.512968,0,0.0,8.436377
3,1,33,Group 2,7.043016,2010,1993,0.126321,False,-17.0,-6,0.02187,0,0.0,7.191207
4,1,33,Group 2,7.043016,2010,1994,-0.106921,False,-16.0,-6,-0.017603,0,0.0,6.918492
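
The columns `g`, `year`, `rel_year`, and `treat` are related in a simple way, which the sketch below spells out: `rel_year` is the calendar year minus the year of treatment onset `g`, and a unit counts as treated from `g` onwards. That never-treated units are coded as `g == 0` and receive an infinite `rel_year` is an assumption made here for illustration; check the data before relying on it.

```python
import math


def relative_year(year, g):
    """Years since treatment onset; never-treated units (assumed g == 0) get +inf."""
    return math.inf if g == 0 else float(year - g)


def is_treated(year, g):
    """True once a unit has reached its treatment onset year."""
    return g != 0 and year >= g


# First row of the table above: g = 2010, year = 1990.
first_rel_year = relative_year(1990, 2010)
first_treat = is_treated(1990, 2010)
print(first_rel_year, first_treat)
```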


In [90]:
fit_did2s = did2s(
    df_het,
    yname="dep_var",
    first_stage="~ 0 | state + year",
    second_stage="~i(rel_year)",
    treatment="treat",
    cluster="state",
    i_ref1=[-1.0, np.inf],
)

fit_twfe = feols(
    "dep_var ~ i(rel_year) | state + year",
    df_het,
    i_ref1=[-1.0, np.inf],
    vcov={"CRV1": "state"},
)

iplot(
    [fit_did2s, fit_twfe], coord_flip=False, figsize=(900, 400), title="TWFE vs DID2S"
)


The `event_study()` function provides a common API for several event study estimators.

In [91]:
from pyfixest.experimental.did import event_study
from pyfixest.summarize import etable

fit_twfe = event_study(
    data=df_het,
    yname="dep_var",
    idname="state",
    tname="year",
    gname="g",
    estimator="twfe",
)

fit_did2s = event_study(
    data=df_het,
    yname="dep_var",
    idname="state",
    tname="year",
    gname="g",
    estimator="did2s",
)

etable([fit_twfe, fit_did2s])

| Coefficient   | est1             | est2             |
|:--------------|:-----------------|:-----------------|
| ATT           | 2.135*** (0.044) | 2.152*** (0.048) |
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
