<div >
<img src = "../../../banner.jpg" />
</div>

<a target="_blank" href="https://colab.research.google.com/github/ignaciomsarmiento/BDML_SS/blob/main/Lecture05/Notebook_SS05/Notebook_SS05_FWL.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<style type="text/css">

.responsive {
 width: 100%;
 height: 25%;
}

.list-group-item.active, .list-group-item.active:focus, .list-group-item.active:hover {
    z-index: 2;
    color: #fff;
    background-color: #1B175E;
    border-color: #337ab7;
}
h1, h2, h3, h4 {
    color: #000002;
    background-color: #1B175E;
    background-image:
      linear-gradient(to right,
       #fff, #ffff00
     );

}

h1, h2, h3, h4, p {
    color: #000002;
}

a {
    color: #1B175E;
}
</style>

# Introduction: The FWL Theorem



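The Frisch-Waugh-Lovell (FWL) Theorem shows how to decompose a regression of $y$ on a set of variables $X$ into two pieces: first partial the other regressors out of both $y$ and the variable of interest, then regress the resulting residuals on each other.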

To illustrate, suppose that we want to study the black-white wage gap: 

\begin{align}
 \ln(w) & = \beta_0 + \beta_1 Black + \theta X_{controls} + u
\end{align}

where $Black$ is a dummy variable equal to 1 if the person is African American and 0 otherwise, and $X_{controls}$ is a set of control variables.

FWL says that the long regression above gives the same estimate of $\beta_1$ as the following three-step procedure:

1) Regress $Black$ on $X_{controls}$ and take the residuals
2) Regress $\ln(w)$ on $X_{controls}$ and take the residuals
3) Regress the residuals from step 2 on the residuals from step 1
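
A brief sketch of why the three steps recover the same coefficient, in matrix notation. The symbols here are introduced only for illustration and are not part of the original text: $y$ is the vector of $\ln(w)$, $D$ is the $Black$ column, $X$ holds the controls (including the constant), $\hat{u}$ is the residual vector of the long regression, and $M_X$ is the matrix that produces residuals from a regression on $X$.

\begin{align}
 M_X & = I - X(X'X)^{-1}X' \\
 y & = D\hat{\beta}_1 + X\hat{\theta} + \hat{u} \\
 M_X y & = M_X D\hat{\beta}_1 + \hat{u} \\
 \hat{\beta}_1 & = (D'M_X D)^{-1} D'M_X y
\end{align}

The third line uses $M_X X = 0$ and $M_X \hat{u} = \hat{u}$, since $\hat{u}$ is orthogonal to $X$. The last line follows from premultiplying by $D'M_X$ and using $D'\hat{u} = 0$. Because $M_X$ is symmetric and idempotent, the last expression is exactly the OLS slope from regressing the residuals $M_X y$ (step 2) on the residuals $M_X D$ (step 1).
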

This is, in essence, the FWL Theorem. Let's take it to the NLSY data and show that it works.
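
Before turning to the real data, the claim can be checked on a small simulated sample. The sketch below uses only base R; the variable names (`x1`, `x2`, `d`, `y`) and the data-generating process are made up for illustration and have nothing to do with the NLSY.

```r
# Simulated data (hypothetical; not the NLSY sample)
set.seed(123)
n  <- 500
x1 <- rnorm(n)                        # a continuous control
x2 <- rnorm(n)                        # a second control
d  <- as.numeric(x1 + rnorm(n) > 0)   # dummy that is correlated with x1
y  <- 1 - 0.2 * d + 0.5 * x1 + 0.3 * x2 + rnorm(n)

# Long regression: y on the dummy and all controls
long_sim <- lm(y ~ d + x1 + x2)

# Three-step procedure
d_resid <- resid(lm(d ~ x1 + x2))     # step 1: dummy on controls
y_resid <- resid(lm(y ~ x1 + x2))     # step 2: outcome on controls
fwl_sim <- lm(y_resid ~ d_resid)      # step 3: residuals on residuals

# Compare the two estimates; they should agree up to floating-point error
b_long <- unname(coef(long_sim)["d"])
b_fwl  <- unname(coef(fwl_sim)["d_resid"])
all.equal(b_long, b_fwl)
```

Note that the controls include a constant in both steps 1 and 2, so both residual series have mean zero and the intercept in step 3 is numerically zero.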

Let's load the packages, the data set, and keep only a few variables:

In [1]:
#packages
require("pacman")
p_load("tidyverse","stargazer")

nlsy <- read_csv('https://raw.githubusercontent.com/ignaciomsarmiento/datasets/main/nlsy97.csv')

nlsy <- nlsy %>% drop_na(educ) # drop observations with missing education (NA)

#Select some predictors
nlsy<- nlsy  %>% select(lnw_2016, 
                        educ,
                        black,
                        hispanic,
                        other,
                        exp,
                        afqt,
                        mom_educ,
                        dad_educ)

## Long regression

In [2]:
long<-lm(lnw_2016~black+ hispanic+ other+ educ + exp + afqt + mom_educ + dad_educ,data=nlsy)

stargazer(long,type="text")

## Short Regression: The Three FWL Steps

1) Regress $Black$ on $X_{controls}$ and take the residuals

In [3]:
reg_step1<-lm(black~ hispanic+ other+ educ + exp + afqt + mom_educ + dad_educ,data=nlsy)

In [4]:
nlsy<-nlsy %>% mutate(black_resid=reg_step1$residuals) #Residuals of black~controls

2) Regress $\ln(w)$ on $X_{controls}$ and take the residuals


In [5]:
reg_step2<-lm(lnw_2016~ hispanic+ other+ educ + exp + afqt + mom_educ + dad_educ,data=nlsy)

In [6]:
nlsy<-nlsy %>% mutate(lnw_resid=reg_step2$residuals) #Residuals of lnw_2016~controls

3) Regress the residuals from step 2 on the residuals from step 1

In [7]:
reg_step3<-lm(lnw_resid~black_resid,data=nlsy)

In [8]:
stargazer(reg_step3,long,type="text")
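
The coefficients in the two columns coincide, but the reported standard errors may differ slightly. The residuals of the two regressions are identical, yet `lm` in step 3 only counts two estimated parameters, so it divides the residual sum of squares by a larger number of degrees of freedom. The sketch below illustrates this on simulated data (hypothetical variables, base R only), using a small sample so that the difference is visible.

```r
# Simulated data (hypothetical; not the NLSY sample)
set.seed(42)
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- as.numeric(x1 + rnorm(n) > 0)
y  <- 1 - 0.2 * d + 0.5 * x1 + 0.3 * x2 + rnorm(n)

long_sim <- lm(y ~ d + x1 + x2)
d_resid  <- resid(lm(d ~ x1 + x2))
y_resid  <- resid(lm(y ~ x1 + x2))
fwl_sim  <- lm(y_resid ~ d_resid)

# Standard errors as reported by each regression
se_long <- summary(long_sim)$coefficients["d", "Std. Error"]
se_fwl  <- summary(fwl_sim)$coefficients["d_resid", "Std. Error"]

# Rescale the step-3 standard error by the ratio of residual degrees of freedom
se_fwl_adj <- se_fwl * sqrt(df.residual(fwl_sim) / df.residual(long_sim))

c(long = se_long, fwl = se_fwl, fwl_adjusted = se_fwl_adj)
```

After the degrees-of-freedom correction the two standard errors agree. With a sample as large as the NLSY the correction factor is close to one.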

# A function that does these steps




In [9]:
black_wage_gap<-function(data,index){
    # Keep only the rows selected by index, so that residuals and data have the same length
    data_sub<-data[index,]

    reg_step1<-lm(black~ hispanic+ other+ educ + exp + afqt + mom_educ + dad_educ,data=data_sub)
    reg_step2<-lm(lnw_2016~ hispanic+ other+ educ + exp + afqt + mom_educ + dad_educ,data=data_sub)

    data_sub<-data_sub %>% mutate(black_resid=reg_step1$residuals) #Residuals of black~controls
    data_sub<-data_sub %>% mutate(lnw_resid=reg_step2$residuals) #Residuals of lnw_2016~controls
    reg_step3<-lm(lnw_resid~black_resid,data=data_sub)
    reg_step3$coefficients[2] #Coefficient on black_resid
}

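Test the function on the full sample. By the FWL Theorem, the result should match the coefficient on `black` in the long regression: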

In [10]:
black_wage_gap(nlsy,1:nrow(nlsy))
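
The `(data, index)` signature is the form that resampling routines usually expect from a statistic (for example, the `statistic` argument of `boot::boot`). As a minimal sketch of how such a function could be used, the code below runs a manual bootstrap in base R. Everything here is an assumption made for illustration: the data are simulated, `fwl_coef` is a hypothetical stand-in for `black_wage_gap` with generic variable names, and the number of replications `B` is arbitrary.

```r
# Simulated data (hypothetical; not the NLSY sample)
set.seed(2024)
n   <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$d <- as.numeric(sim$x1 + rnorm(n) > 0)
sim$y <- 1 - 0.2 * sim$d + 0.5 * sim$x1 + 0.3 * sim$x2 + rnorm(n)

# Hypothetical stand-in for black_wage_gap(), with the same (data, index) signature
fwl_coef <- function(data, index) {
  s <- data[index, ]                                # rows of this (re)sample
  s$d_resid <- resid(lm(d ~ x1 + x2, data = s))     # step 1
  s$y_resid <- resid(lm(y ~ x1 + x2, data = s))     # step 2
  unname(coef(lm(y_resid ~ d_resid, data = s))[2])  # step 3: slope on d_resid
}

# Full-sample estimate: index selects every row once
full_est <- fwl_coef(sim, 1:n)

# Manual bootstrap: draw row indices with replacement and re-estimate each time
B <- 200
boot_est <- replicate(B, fwl_coef(sim, sample(n, n, replace = TRUE)))

# Bootstrap standard error of the coefficient
sd(boot_est)
```

Subsetting the data inside the function, rather than passing `subset = index` to `lm` and attaching the residuals to the full data set, keeps the residual vectors and the data the same length for any choice of `index`.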