<a href="https://colab.research.google.com/github/ireneb612/impact_EUTurkey_deal/blob/main/twfe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Two-way fixed effects

In [1]:
#!pip install pandas==1.3.5



In [2]:
!pip install linearmodels
!pip install regtabletotext

Collecting linearmodels
  Downloading linearmodels-6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.9 kB)
Collecting mypy-extensions>=0.4 (from linearmodels)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)
Collecting pyhdfe>=0.1 (from linearmodels)
  Downloading pyhdfe-0.2.0-py3-none-any.whl.metadata (4.0 kB)
Collecting formulaic>=1.0.0 (from linearmodels)
  Downloading formulaic-1.1.1-py3-none-any.whl.metadata (6.9 kB)
Collecting setuptools-scm<9.0.0,>=8.0.0 (from setuptools-scm[toml]<9.0.0,>=8.0.0->linearmodels)
  Downloading setuptools_scm-8.1.0-py3-none-any.whl.metadata (6.6 kB)
Collecting interface-meta>=1.2.0 (from formulaic>=1.0.0->linearmodels)
  Downloading interface_meta-1.3.0-py3-none-any.whl.metadata (6.7 kB)
Downloading linearmodels-6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)

In [4]:
import pandas as pd
import numpy as np
import sqlite3
import datetime as dt
import itertools
import linearmodels as lm

from regtabletotext import prettify_result

In [5]:
df = pd.read_csv('/content/drive/MyDrive/migration/regression_FINAL_complete_DF.csv')
df['date'] = pd.to_datetime(df['date'])

In [54]:
model_ols = (lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + 1",
    data=df.set_index(["Route", "date"]),
  ).fit()
)
prettify_result(model_ols)

Panel OLS Model:
Total_Deaths ~ flow + ATT + 1

Covariance Type: Unadjusted

Coefficients:
           Estimate  Std. Error  t-Statistic  p-Value
Intercept    62.631      16.743        3.741    0.000
flow          0.001       0.000        1.464    0.146
ATT           1.481       0.218        6.778    0.000

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.279, Within R-squared: 0.072



The regression output shows a highly significant coefficient on ATT, while the coefficient on flow is not significant at conventional levels (p = 0.146). However, the simple model likely omits relevant variables, so the coefficients are most likely biased. The rather low R-squared (0.279) indicates that a lot of variation remains unexplained.

One way to tackle omitted variable bias is to remove as much unexplained variation as possible by including fixed effects, i.e., model parameters that are fixed for specific Routes (here, one intercept per Route).
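To make the idea concrete, here is a minimal sketch in plain Python of what route fixed effects do: subtracting each route's own mean (the within transformation) removes any route-specific intercept. The toy panel, the route names, and the helper functions are all hypothetical and are not taken from the migration data.

```python
# Toy panel (illustrative numbers only): y = alpha_route + 2.0 * x, no noise.
TRUE_BETA = 2.0
panel = {
    # route name: (route-specific intercept, x values)
    "route_a": (10.0, [1.0, 2.0, 3.0]),
    "route_b": (50.0, [4.0, 5.0, 6.0]),
}


def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """OLS slope of y on x in a regression that includes an intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


pooled_x, pooled_y, within_x, within_y = [], [], [], []
for alpha, xs in panel.values():
    ys = [alpha + TRUE_BETA * x for x in xs]
    pooled_x += xs
    pooled_y += ys
    # Within transformation: subtract each route's own mean.
    within_x += [x - mean(xs) for x in xs]
    within_y += [y - mean(ys) for y in ys]

pooled_beta = slope(pooled_x, pooled_y)  # ignores the route intercepts
within_beta = slope(within_x, within_y)  # route intercepts are demeaned away
print(pooled_beta, within_beta)
```

Because the route with the larger intercept also has the larger x values, the pooled slope is pushed well above the true value of 2, while the demeaned regression recovers it.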

In [55]:
model_ols = (lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + EntityEffects",
    data=df.set_index(["Route", "date"]),
  ).fit()
)
prettify_result(model_ols)

Panel OLS Model:
Total_Deaths ~ flow + ATT + EntityEffects

Covariance Type: Unadjusted

Coefficients:
      Estimate  Std. Error  t-Statistic  p-Value
flow     0.001       0.001        1.575    0.118
ATT      0.825       0.237        3.478    0.001

Included Fixed Effects:
        Total
Entity      4

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.420, Within R-squared: 0.123



The regression output shows that some of the unexplained variation sits at the route level: including route fixed effects raises the R-squared from 0.279 to 0.420, and the ATT coefficient falls from 1.481 to 0.825.

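By including time fixed effects (one per date, 37 in this panel), we can additionally take out the effect of unobservables that vary over time but are common to all routes. Together with the route effects, this gives the two-way fixed effects regression.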
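Written out, the specification estimated in the next cell can be sketched as follows. The symbols $\alpha_i$, $\gamma_t$, $\beta_1$, $\beta_2$ and $\varepsilon_{it}$ are notation introduced here for illustration; they do not appear in the original notebook.

$$
y_{it} = \beta_1\,\mathrm{flow}_{it} + \beta_2\,\mathrm{ATT}_{it} + \alpha_i + \gamma_t + \varepsilon_{it}
$$

Here $y_{it}$ is `Total_Deaths` for route $i$ on date $t$, $\alpha_i$ corresponds to `EntityEffects` (one intercept per Route), and $\gamma_t$ corresponds to `TimeEffects` (one intercept per date).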

In [56]:
model_ols = (lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + EntityEffects + TimeEffects",
    data=df.set_index(["Route", "date"]),
  ).fit()
)
prettify_result(model_ols)

Panel OLS Model:
Total_Deaths ~ flow + ATT + EntityEffects + TimeEffects

Covariance Type: Unadjusted

Coefficients:
      Estimate  Std. Error  t-Statistic  p-Value
flow     0.001       0.001        1.643    0.104
ATT      0.794       0.270        2.946    0.004

Included Fixed Effects:
        Total
Entity      4
Time       37

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.568, Within R-squared: 0.122



## Clustering Standard Errors

Apart from biased coefficient estimates, we usually have to deal with potentially complex dependencies among the residuals. Such dependencies invalidate the i.i.d. assumption of OLS and lead to biased standard errors. With biased OLS standard errors, we cannot reliably interpret the statistical significance of our estimated coefficients.
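The following plain-Python sketch illustrates why this matters. It compares the classical standard error with a one-way cluster-robust (sandwich) standard error for a single regressor through the origin. The data are made up so that residuals share a sign within each cluster, and the sketch omits the small-sample corrections that a library may apply, so it will not reproduce `linearmodels` output.

```python
import math

# Toy data (illustrative only): (cluster, x, y).
# Cluster "a" lies above the line y = x, cluster "b" below it.
rows = [
    ("a", 1.0, 2.0), ("a", 2.0, 3.0), ("a", 3.0, 4.0),
    ("b", 1.0, 0.0), ("b", 2.0, 1.0), ("b", 3.0, 2.0),
]

# OLS through the origin with one regressor: beta = sum(x*y) / sum(x^2).
sxx = sum(x * x for _, x, _ in rows)
beta = sum(x * y for _, x, y in rows) / sxx
resid = [(g, x, y - beta * x) for g, x, y in rows]

# Classical standard error: assumes i.i.d. errors.
n, k = len(rows), 1
sigma2 = sum(e * e for _, _, e in resid) / (n - k)
se_classical = math.sqrt(sigma2 / sxx)

# Cluster-robust standard error: sum the scores x*e within each cluster
# first, then square, so within-cluster correlation is not averaged away.
scores = {}
for g, x, e in resid:
    scores[g] = scores.get(g, 0.0) + x * e
meat = sum(s * s for s in scores.values())
se_clustered = math.sqrt(meat / sxx ** 2)

print(beta, se_classical, se_clustered)
```

In this toy example the clustered standard error is larger than the classical one, because errors that move together within a cluster carry less independent information than the i.i.d. formula assumes.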

In [57]:
# the code chunk below applies one-way clustering by Route

model_ols = lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + EntityEffects + TimeEffects",
    data=df.set_index(["Route", "date"]),
  ).fit(cov_type="clustered", cluster_entity=True, cluster_time=False)

prettify_result(model_ols)

Panel OLS Model:
Total_Deaths ~ flow + ATT + EntityEffects + TimeEffects

Covariance Type: Clustered

Coefficients:
      Estimate  Std. Error  t-Statistic  p-Value
flow     0.001       0.000        2.398    0.018
ATT      0.794       0.054       14.663    0.000

Included Fixed Effects:
        Total
Entity      4
Time       37

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.568, Within R-squared: 0.122



In [58]:
# two-way fixed effects with two-way clustering

# the code chunk below clusters standard errors by both Route and date

model_ols = lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + EntityEffects + TimeEffects",
    data=df.set_index(["Route", "date"]),
  ).fit(cov_type="clustered", cluster_entity=True, cluster_time=True)

prettify_result(model_ols)

Panel OLS Model:
Total_Deaths ~ flow + ATT + EntityEffects + TimeEffects

Covariance Type: Clustered

Coefficients:
      Estimate  Std. Error  t-Statistic  p-Value
flow     0.001       0.000        2.096    0.039
ATT      0.794       0.299        2.654    0.009

Included Fixed Effects:
        Total
Entity      4
Time       37

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.568, Within R-squared: 0.122
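
Setting both `cluster_entity=True` and `cluster_time=True` requests clustering in two dimensions. A common construction for a two-way clustered covariance, sketched here as background, combines three one-way estimators. Whether `linearmodels` uses exactly this form, and which small-sample corrections it applies, is an assumption to check against its documentation.

$$
\hat{V}_{\text{two-way}} = \hat{V}_{\text{Route}} + \hat{V}_{\text{date}} - \hat{V}_{\text{Route} \cap \text{date}}
$$

The last term clusters on the intersection of both dimensions and is subtracted so that it is not counted twice. In the outputs above, the standard error on ATT is 0.270 unadjusted, 0.054 when clustering by Route only, and 0.299 with two-way clustering, while the point estimate (0.794) is unchanged.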



# TWFE with conflicts

There are some simmgs coming from the Western AF. The models below add conflict fatalities (the `fatalities` column) as an additional regressor.

In [103]:
#conflicts

Unnamed: 0,date,Route,fatalities
0,2014-02-01,central,57.0
1,2014-02-01,western_m,0.0
2,2014-02-01,eastern,0.0
3,2014-03-01,central,80.0
4,2014-03-01,western_m,0.0
...,...,...,...
106,2017-01-01,western_m,0.0
107,2017-01-01,eastern,113.0
108,2017-02-01,central,137.0
109,2017-02-01,western_m,0.0


In [112]:
#df

Unnamed: 0,Route,date,flow,Total_Deaths,after,deaths_over_cross,ATT_df1,ATT,fatalities
0,central,2014-02-01,3335,9.0,0.0,0.002691,1.069824,-9.756466,57.0
1,central,2014-03-01,5550,1.0,0.0,0.000180,-4.842315,-11.337913,80.0
2,central,2014-04-01,15679,41.0,0.0,0.002608,-36.632737,-44.958592,23.0
3,central,2014-05-01,14597,299.0,0.0,0.020073,13.452974,25.596141,159.0
4,central,2014-06-01,22778,314.0,0.0,0.013598,12.412410,18.671266,84.0
...,...,...,...,...,...,...,...,...,...
139,eastern,2016-10-01,4195,2.0,1.0,0.000477,-58.673449,-61.306353,349.0
140,eastern,2016-11-01,2680,14.0,1.0,0.005197,-65.243473,-68.134970,197.0
141,eastern,2016-12-01,2131,5.0,1.0,0.002341,-65.572272,-68.490003,192.0
142,eastern,2017-01-01,1826,1.0,1.0,0.000547,-69.903317,-72.939835,113.0


In [113]:
# the code chunk below applies one-way clustering by Route

model_ols_ent_wa = lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + fatalities + EntityEffects + TimeEffects",
    data=df.set_index(["Route", "date"]),
  ).fit(cov_type="clustered", cluster_entity=True, cluster_time=False)

prettify_result(model_ols_ent_wa)

Panel OLS Model:
Total_Deaths ~ flow + ATT + fatalities + EntityEffects + TimeEffects

Covariance Type: Clustered

Coefficients:
            Estimate  Std. Error  t-Statistic  p-Value
flow           0.001       0.000        3.003    0.003
ATT            0.877       0.042       21.161    0.000
fatalities     0.315       0.099        3.196    0.002

Included Fixed Effects:
        Total
Entity      4
Time       37

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.589, Within R-squared: 0.161



In [114]:
# the code chunk below applies two-way clustering by Route and date

model_ols_ent_wa = lm.PanelOLS.from_formula(
    formula="Total_Deaths ~ flow + ATT + fatalities + EntityEffects + TimeEffects",
    data=df.set_index(["Route", "date"]),
  ).fit(cov_type="clustered", cluster_entity=True, cluster_time=True)

prettify_result(model_ols_ent_wa)

Panel OLS Model:
Total_Deaths ~ flow + ATT + fatalities + EntityEffects + TimeEffects

Covariance Type: Clustered

Coefficients:
            Estimate  Std. Error  t-Statistic  p-Value
flow           0.001       0.000        2.543    0.012
ATT            0.877       0.298        2.948    0.004
fatalities     0.315       0.137        2.295    0.024

Included Fixed Effects:
        Total
Entity      4
Time       37

Summary statistics:
- Number of observations: 144
- R-squared (incl. FE): 0.589, Within R-squared: 0.161

