## Difference-in-Differences Estimation

`PyFixest` supports event study designs via the canonical two-way fixed effects design, Gardner's 2-stage estimator, and the local projections approach following [Dube et al. (2023)](https://www.nber.org/papers/w31184).

In [41]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
from pyfixest.estimation import feols
from pyfixest.did.estimation import did2s
from pyfixest.did.estimation import lpdid

url = "https://raw.githubusercontent.com/s3alfisc/pyfixest/master/pyfixest/did/data/df_het.csv"
df_het = pd.read_csv(url)
df_het.head()


|   | unit | state | group | unit_fe | g | year | year_fe | treat | rel_year | rel_year_binned | error | te | te_dynamic | dep_var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 33 | Group 2 | 7.043016 | 2010 | 1990 | 0.066159 | False | -20.0 | -6 | -0.086466 | 0 | 0.0 | 7.022709 |
| 1 | 1 | 33 | Group 2 | 7.043016 | 2010 | 1991 | -0.03098 | False | -19.0 | -6 | 0.766593 | 0 | 0.0 | 7.778628 |
| 2 | 1 | 33 | Group 2 | 7.043016 | 2010 | 1992 | -0.119607 | False | -18.0 | -6 | 1.512968 | 0 | 0.0 | 8.436377 |
| 3 | 1 | 33 | Group 2 | 7.043016 | 2010 | 1993 | 0.126321 | False | -17.0 | -6 | 0.02187 | 0 | 0.0 | 7.191207 |
| 4 | 1 | 33 | Group 2 | 7.043016 | 2010 | 1994 | -0.106921 | False | -16.0 | -6 | -0.017603 | 0 | 0.0 | 6.918492 |


### DiD Estimation via `feols()`, `did2s()` and `lpdid()`

We can estimate a simple two-way fixed effects DiD regression via `feols()`:

In [42]:
fit_twfe = feols(
    "dep_var ~ i(rel_year) | state + year",
    df_het,
    i_ref1=[-1.0, np.inf],
    vcov={"CRV1": "state"},
)
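To make the role of the reference levels concrete, here is a minimal sketch of the kind of indicator columns an event-study specification relies on: one dummy per relative period, with the reference levels (here `-1.0` and `np.inf`, mirroring `i_ref1` above) left out. The toy data, the column naming scheme, and the reading of `np.inf` as "never treated" are assumptions made for this illustration; this is not PyFixest's internal implementation.

```python
import numpy as np
import pandas as pd

# Toy relative-time column; np.inf is assumed to mark never-treated units.
toy = pd.DataFrame({"rel_year": [-2.0, -1.0, 0.0, 1.0, np.inf, np.inf]})

# Reference levels that are omitted from the regression.
ref_levels = [-1.0, np.inf]

# Keep every distinct relative period except the reference levels.
levels = sorted(toy["rel_year"].unique())
kept_levels = [float(lvl) for lvl in levels if lvl not in ref_levels]

# One 0/1 indicator column per kept relative period (hypothetical column names).
event_dummies = pd.DataFrame(
    {f"rel_year::{lvl}": (toy["rel_year"] == lvl).astype(float) for lvl in kept_levels}
)

print(event_dummies)
```

Rows belonging to a reference level are zero in every column, so each estimated coefficient is read relative to those omitted periods.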

To do the same via Gardner's 2-stage estimator, we employ the `did2s()` function:

In [44]:
from pyfixest.did.estimation import did2s

fit_did2s = did2s(
    df_het,
    yname="dep_var",
    first_stage="~ 0 | state + year",
    second_stage="~i(rel_year)",
    treatment="treat",
    cluster="state",
    i_ref1=[-1.0, np.inf],
)
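The two-stage idea can be sketched in plain NumPy. The first stage estimates unit and period fixed effects on untreated observations only; the second stage regresses the outcome, net of those fixed effects, on the treatment indicator. The simulated panel and all variable names below are assumptions for illustration, and the sketch only produces a point estimate: it does not adjust the standard errors for the first-stage estimation and is not PyFixest's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, treat_start, true_effect = 40, 10, 5, 2.0

# Balanced toy panel: the first half of the units is treated from `treat_start` on.
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
ever_treated = unit < n_units // 2
d = (ever_treated & (period >= treat_start)).astype(float)

# Outcome = unit effect + period effect + treatment effect + noise.
y = (
    rng.normal(size=n_units)[unit]
    + rng.normal(size=n_periods)[period]
    + true_effect * d
    + rng.normal(scale=0.1, size=unit.size)
)

# Unit and period dummies for all observations.
X = np.hstack([np.eye(n_units)[unit], np.eye(n_periods)[period]])

# Stage 1: fit the fixed effects using untreated observations only.
untreated = d == 0
beta_fe, *_ = np.linalg.lstsq(X[untreated], y[untreated], rcond=None)

# Remove the estimated fixed effects from every observation.
y_adj = y - X @ beta_fe

# Stage 2: regress the adjusted outcome on the treatment indicator (no intercept).
att_hat = (d @ y_adj) / (d @ d)
print(f"two-stage estimate: {att_hat:.3f}, true effect: {true_effect}")
```

Because the fixed effects are learned from untreated observations alone, already-treated units never serve as controls, which is the source of bias in TWFE under heterogeneous effects.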

Finally, we can estimate the ATT for each time period via local projections using the `lpdid()` function:

In [45]:
from pyfixest.did.estimation import lpdid

fit_lpdid = lpdid(
    data=df_het,
    yname="dep_var",
    gname="g",
    tname="year",
    idname="unit",
    vcov={"CRV1": "state"},
    pre_window=-20,
    post_window=20,
    att=False,
)
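As a rough sketch of the local projections idea for post-treatment horizons: for each horizon `h`, the long difference `y[t+h] - y[t-1]` is regressed on the change in treatment status plus period dummies, using only newly treated observations and "clean" controls that are still untreated at `t+h`. The simulated staggered panel, the helper name `lp_did`, and the column names are hypothetical; the sketch assumes absorbing treatment, covers `h >= 0` only, and returns point estimates without standard errors. It is not the code behind `lpdid()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_units, n_periods, true_effect = 60, 12, 2.0

# Hypothetical staggered adoption: treated from period 4, from period 7, or never.
first_treated = np.repeat([4.0, 7.0, np.inf], n_units // 3)
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
d = (period >= first_treated[unit]).astype(float)
y = (
    rng.normal(size=n_units)[unit]
    + rng.normal(size=n_periods)[period]
    + true_effect * d
    + rng.normal(scale=0.1, size=unit.size)
)
panel = pd.DataFrame({"unit": unit, "period": period, "d": d, "y": y})


def lp_did(panel: pd.DataFrame, horizon: int) -> float:
    """Point estimate of the effect `horizon` periods after treatment onset."""
    df = panel.sort_values(["unit", "period"]).copy()
    by_unit = df.groupby("unit")
    # Change in treatment status between t-1 and t.
    df["delta_d"] = df["d"] - by_unit["d"].shift(1)
    # Long difference of the outcome: y[t + horizon] - y[t - 1].
    df["long_diff"] = by_unit["y"].shift(-horizon) - by_unit["y"].shift(1)
    # Treatment status at t + horizon, used to pick clean controls.
    df["d_lead"] = by_unit["d"].shift(-horizon)
    # Keep newly treated observations and controls still untreated at t + horizon.
    keep = (df["delta_d"] == 1) | (df["d_lead"] == 0)
    df = df[keep].dropna(subset=["long_diff", "delta_d"])
    # OLS of the long difference on delta_d and a full set of period dummies.
    period_dummies = pd.get_dummies(df["period"], dtype=float).to_numpy()
    X = np.column_stack([df["delta_d"].to_numpy(), period_dummies])
    coef, *_ = np.linalg.lstsq(X, df["long_diff"].to_numpy(), rcond=None)
    return float(coef[0])


estimates = {h: lp_did(panel, h) for h in range(4)}
print(estimates)
```

Each horizon is a separate regression on its own restricted sample, which is what makes the approach a "local projection".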

Let's look at some results: 

In [46]:
figsize = [1200, 400]

In [47]:
fit_twfe.iplot(
    coord_flip=False,
    title="TWFE-Estimator",
    figsize=figsize,
    xintercept=18.5,
    yintercept=0,
).show()

In [48]:
fit_did2s.iplot(
    coord_flip=False,
    title="DID2s-Estimator",
    figsize=figsize,
    xintercept=18.5,
    yintercept=0,
).show()

In [49]:
fit_lpdid.iplot(
    coord_flip=False,
    title="Local-Projections-Estimator",
    figsize=figsize,
    yintercept=0,
    xintercept=18.5,
).show()

What if we are not interested in the ATT per treatment period, but in a pooled effect?

In [50]:
fit_twfe = feols(
    "dep_var ~ i(treat) | unit + year",
    df_het,
    vcov={"CRV1": "state"},
)

fit_did2s = did2s(
    df_het,
    yname="dep_var",
    first_stage="~ 0 | unit + year",
    second_stage="~i(treat)",
    treatment="treat",
    cluster="state",
)

fit_lpdid = lpdid(
    data=df_het,
    yname="dep_var",
    gname="g",
    tname="year",
    idname="unit",
    vcov={"CRV1": "state"},
    pre_window=-20,
    post_window=20,
    att=True,
)

In [51]:
fit_twfe.tidy()

| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| C(treat)[T.True] | 1.98254 | 0.019331 | 102.55618 | 0.0 | 1.943439 | 2.021642 |


In [52]:
fit_did2s.tidy()

| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| C(treat)[T.True] | 2.230482 | 0.024709 | 90.271437 | 0.0 | 2.182052 | 2.278911 |


In [53]:
fit_lpdid.tidy()

| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % | N |
|---|---|---|---|---|---|---|---|
| treat_diff | 2.506746 | 0.071357 | 35.129648 | 0.0 | 2.362413 | 2.65108 | 5716.0 |
