# A Model with Heterogeneous Effects
Up until this point, we've considered models with effects that don't vary across individuals, i.e., where the structural model is $Y = \gamma_0 + \gamma_1X_1 + \epsilon$. Now let's make a more realistic assumption: effects are themselves random and have their own distribution. Specifically, let $\gamma_1$ be a random variable instead of a parameter. One way to write this is to put an $i$ subscript on everything that varies across individuals:
$$
Y_i = \gamma_0 + \gamma_{1,i}X_{1,i} + \epsilon_i,
$$
where now we explicitly write the random variables with $i$ subscripts to emphasize that the slope coefficient $\gamma_{1,i}$ is now a random variable. Notice that we can rewrite this model as
$$
Y_i = \gamma_0 + (\gamma_{1,i} + E[\gamma_{1,i}] - E[\gamma_{1,i}])X_{1,i} + \epsilon_i 
$$
$$
= \gamma_0 + E[\gamma_{1,i}]X_{1,i} + (\gamma_{1,i} - E[\gamma_{1,i}])X_{1,i} + \epsilon_i.
$$
If we assume that $\gamma_{1,i}\perp X_{1,i}$ (gains are independent of choice of $X_{1,i}$) and $Cov[X_{1,i},\epsilon_i]=0$ (no selection bias), then the slope coefficient from a regression of $Y$ on $X_1$ will identify $E[\gamma_{1,i}]$. Why?
$$
\beta_1 = E[\gamma_{1,i}] + \frac{Cov[X_{1,i}, (\gamma_{1,i} - E[\gamma_{1,i}])X_{1,i} + \epsilon_i]}{Var[X_{1,i}]}
$$
where 
$$
Cov[X_{1,i}, (\gamma_{1,i} - E[\gamma_{1,i}])X_{1,i}] = E[X_{1,i}(\gamma_{1,i} - E[\gamma_{1,i}])X_{1,i}] - E[X_{1,i}]E[(\gamma_{1,i} - E[\gamma_{1,i}])X_{1,i}]
$$
$$
= E[X_{1,i}^2\gamma_{1,i}] - E[X_{1,i}^2]E[\gamma_{1,i}] - E[X_{1,i}]E[X_{1,i}\gamma_{1,i}] + E[X_{1,i}]^2E[\gamma_{1,i}]
$$
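Since $\gamma_{1,i}\perp X_{1,i}$, we have $E[X_{1,i}^2\gamma_{1,i}]=E[X_{1,i}^2]E[\gamma_{1,i}]$ and $E[X_{1,i}\gamma_{1,i}]=E[X_{1,i}]E[\gamma_{1,i}]$, so the four terms cancel in pairs and this covariance equals 0.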

What we have shown is that, under an assumption restricting the dependence between $\gamma_{1,i}$ and $X_{1,i}$ (independence implies $E[\gamma_{1,i}X_{1,i}]=E[\gamma_{1,i}]E[X_{1,i}]$ and $E[\gamma_{1,i}X_{1,i}^2]=E[\gamma_{1,i}]E[X_{1,i}^2]$), regression recovers the mean of the slope parameters of our underlying causal model.
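
The result can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: the distributions, the parameter values, and the helper `ols_slope` are all assumptions chosen for illustration. It draws $\gamma_{1,i}$ independently of $X_{1,i}$ and compares the regression slope with $E[\gamma_{1,i}]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative (assumed) data-generating process.
gamma0 = 1.0
x = rng.normal(loc=2.0, scale=1.0, size=n)
gamma1 = rng.normal(loc=1.5, scale=0.5, size=n)  # drawn independently of x
eps = rng.normal(size=n)                         # drawn independently of x
y = gamma0 + gamma1 * x + eps


def ols_slope(y, x):
    """Slope from a regression of y on x and a constant: Cov[x, y] / Var[x]."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


beta1 = ols_slope(y, x)
print(f"regression slope: {beta1:.3f}")
print(f"mean of gamma1:   {gamma1.mean():.3f}")
```

With a large sample the two printed numbers should be close, even though no individual has a slope exactly equal to the mean.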

This result is very important. Often we think that we are trying to identify a single underlying structural parameter. For instance, suppose we want to understand the effect of a new anti-cancer drug on tumor size. It may be that, due to differences in biological makeup, the drug helps some people, does nothing for others, and even hurts some.

But what happens when we don't assume anything about the dependence between $\gamma_{1,i}$ and $X_{1,i}$? We recover some sort of average effect, which disproportionately weights those individuals with high returns. One extreme and intuitive example is the case where $X_1$ is binary.

## Example: Binary Treatment Effect
Suppose $X_1$ is binary and maintain only the assumption that $Cov[X_1,\epsilon]=0$. Then the regression of $Y$ on $X_1$ has a slope coefficient
$$
\beta_1 = \frac{Cov[Y,X_1]}{Var[X_1]} = \frac{Cov[\gamma_1X_1,X_1]}{Var[X_1]},
$$
where since $X_1$ is binary this is equal to
$$
= E[\gamma_1X_1|X_1=1] - E[\gamma_1X_1|X_1=0] = E[\gamma_1|X_1=1].
$$
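One way to verify this equality is to write $p = P(X_1=1)$ (a symbol introduced here for convenience) and use the fact that $X_1^2 = X_1$ for a binary variable:
$$
Cov[\gamma_1X_1,X_1] = E[\gamma_1X_1^2] - E[\gamma_1X_1]E[X_1] = (1-p)E[\gamma_1X_1] = p(1-p)E[\gamma_1|X_1=1],
$$
$$
Var[X_1] = p(1-p) \quad\Rightarrow\quad \frac{Cov[\gamma_1X_1,X_1]}{Var[X_1]} = E[\gamma_1|X_1=1].
$$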
Thus, we estimate the average treatment effect for those selected into treatment. Is this effect policy relevant? It depends on our policy. Suppose our population of interest were residents of Illinois and we were interested in estimating the effect of a new drug for Alzheimer's. We have data on the drug from a test run in Florida. Suppose the populations of people that would choose to take the drug at current prices are identical in Illinois and Florida. Then this effect, estimated in Florida, would be relevant for the impact of the drug in Illinois.

However, suppose the drug was available nationally and we want to know the effect of a policy that lowers the price of the drug nationally. Then the population of people who take the drug after the price is lowered will, on average, have a lower willingness-to-pay, which may mean that their $\gamma_{1,i}$ are, on average, lower. In that case $\beta_1$ would overestimate the positive effect of the drug at the new, lower price.
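
The binary case can also be illustrated by simulation. This is a sketch under assumed distributions and an assumed take-up rule (people with larger gains are more likely to take the treatment, while $\epsilon$ is drawn independently of $X_1$); none of these choices come from the text above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Assumed heterogeneous gains and a take-up rule that depends on them.
gamma1 = rng.normal(loc=1.0, scale=1.0, size=n)
x = (gamma1 + rng.normal(size=n) > 1.0).astype(float)  # binary treatment
eps = rng.normal(size=n)                               # independent of x
y = 2.0 + gamma1 * x + eps

# Regression slope of y on x, computed as Cov[x, y] / Var[x].
c = np.cov(x, y)
beta1 = c[0, 1] / c[0, 0]

att = gamma1[x == 1].mean()  # average effect among the treated
ate = gamma1.mean()          # average effect in the whole population

print(f"regression slope:         {beta1:.3f}")
print(f"mean gain among treated:  {att:.3f}")
print(f"mean gain in population:  {ate:.3f}")
```

In this design the slope tracks the mean gain among the treated rather than the population mean, which is the sense in which the estimate is specific to those who select into treatment.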

## Two Sources of Bias
We have already covered the types of omitted variable bias where $Cov[X_{1,i},\epsilon_i]\neq 0$. This is often referred to as "selection bias." Another contributor to omitted variable bias is called "sorting on gains." Sorting on gains occurs when $\gamma_{1,i}$ and $X_{1,i}$ are correlated.

In the context of the returns to schooling, selection bias would refer to the idea that individuals with higher ability might find schooling less costly (in terms of effort) and therefore would obtain more years of schooling. But since ability also positively affects earnings, we cannot differentiate the effect of higher schooling levels from higher ability.

Even if we suppose that this mechanism of selection is not present in the data, because of heterogeneous returns we have another source of omitted variable bias: sorting on gains. Now suppose that individuals with greater ability can get more out of schooling (e.g., you need to have a basic competency in math in order to succeed in the college-level business courses that land you the high-paying jobs). Then high-ability people will get more schooling and they will subsequently earn more. Thus, $\beta_1$ will disproportionately reflect the returns of the high-return (high-ability) individuals.
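
A small simulation can separate sorting on gains from selection bias. The sketch below is illustrative only: the variable names (`ability`, `schooling`, `log_wage`) and every number are assumptions. Ability raises both the return to schooling and the amount of schooling chosen, but $\epsilon$ is drawn independently of schooling, so there is no selection bias in the sense of $Cov[X_{1,i},\epsilon_i]\neq 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

ability = rng.normal(size=n)

# Sorting on gains (assumed design): higher ability means a higher return
# to schooling AND more years of schooling.
gamma1 = 0.08 + 0.02 * ability
schooling = 12.0 + 2.0 * ability + rng.normal(size=n)

# No selection bias: eps does not depend on ability or schooling.
eps = rng.normal(scale=0.3, size=n)
log_wage = 1.0 + gamma1 * schooling + eps

c = np.cov(schooling, log_wage)
beta1 = c[0, 1] / c[0, 0]
mean_return = gamma1.mean()

print(f"regression slope: {beta1:.3f}")
print(f"mean return:      {mean_return:.3f}")
print(f"Cov[X, eps]:      {np.cov(schooling, eps)[0, 1]:.4f}")
```

Here the regression slope exceeds the mean return even though the sample covariance between schooling and $\epsilon$ is close to zero, so the gap comes entirely from the dependence between $\gamma_{1,i}$ and $X_{1,i}$.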

# Using Observed Variables to Describe Heterogeneity
You've already had the chance to look at heterogeneous effects, but the model probably didn't look nearly as complicated as the one above. Recall the problem set on the heterogeneous returns to schooling. We estimated a regression of $Y$ on schooling $X_1$ and an indicator for being black $Z_1$ (don't be confused by the notation; $Z_1$ is not supposed to denote an instrument in this example), and interacted this variable with $X_1$. If we simply write $\gamma_{1,i} = \alpha_0 + \alpha_1Z_{1,i} + u_i$, then our model above becomes
$$
Y_i = \gamma_0 + (\alpha_0 + \alpha_1Z_{1,i} + u_i)X_{1,i} + \epsilon_i
$$
$$
= \gamma_0 + \alpha_0X_{1,i} + \alpha_1Z_{1,i}X_{1,i} + u_iX_{1,i} + \epsilon_i.
$$
If we assume $u_iX_{1,i},\epsilon_i\perp X_{1,i},Z_{1,i}X_{1,i}$, then we can identify $\alpha_0$ and $\alpha_1$ by regression. The partial effect of $X_1$ on $Y$ is
$$
\frac{\partial Y}{\partial X_1} = \alpha_0 + \alpha_1Z_1 + u,
$$
and if we further assume that $E[u]=0$, then
$$
E\left[\frac{\partial Y}{\partial X_1}\right] = \alpha_0 + \alpha_1E[Z_1].
$$
That is, regression will allow us to identify the "average marginal effect" of $X_1$ on $Y$. If $Z_1$ were binary, as in your problem set on the heterogeneous effect of the returns to schooling, then $\alpha_0 + \alpha_1$ would be the average partial effect for those with $Z_1=1$ and $\alpha_0$ the average partial effect for those with $Z_1=0$.
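
The interaction regression can be sketched in a few lines. The example below is a hedged illustration, not the problem-set code: the data are simulated, and the parameter values and variable names are assumptions. It regresses $Y$ on a constant, $X_1$, and $Z_1X_1$, then forms the average marginal effect $\alpha_0 + \alpha_1E[Z_1]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed parameter values for the illustration.
alpha0, alpha1 = 0.10, -0.03

z = rng.binomial(1, 0.3, size=n).astype(float)  # binary group indicator
x = rng.normal(loc=12.0, scale=2.0, size=n)     # e.g. years of schooling
u = rng.normal(scale=0.02, size=n)              # mean-zero slope heterogeneity
eps = rng.normal(scale=0.5, size=n)

gamma1 = alpha0 + alpha1 * z + u
y = 1.0 + gamma1 * x + eps

# Regress y on a constant, x, and the interaction z * x.
design = np.column_stack([np.ones(n), x, z * x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a0_hat, a1_hat = coef[1], coef[2]

ame = a0_hat + a1_hat * z.mean()  # estimated average marginal effect
print(f"alpha0 estimate: {a0_hat:.4f}")
print(f"alpha1 estimate: {a1_hat:.4f}")
print(f"average marginal effect: {ame:.4f}  (mean of gamma1: {gamma1.mean():.4f})")
```

The design matrix mirrors the equation in the text, which has no separate level term in $Z_1$; a specification with a main effect for $Z_1$ would add one more column.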

# IV with Heterogeneous Effects: The LATE Estimand
Now consider the setup where $X_1=D$ is binary and we have a binary instrument for $D$ called $Z$. We are interested in using $Z$ to estimate an average causal effect of $D$ on $Y$ using the IV estimand, and we will derive here the population over which that average is taken. Let $Y_d$ be the counterfactual outcome when treatment is $D=d$ and $D_z$ be the counterfactual treatment when the instrument is $Z=z$. Thus, $Y=DY_1 + (1-D)Y_0$ and $D=ZD_1 + (1-Z)D_0$. We make the following LATE assumptions:
1. Exogeneity: $Y_1,Y_0,D_1,D_0\perp Z$
2. Relevance: $P(D_1\neq D_0)>0$. This just means that the instrument influences some decisions.
3. Uniformity (also called monotonicity): $P(D_1\geq D_0)=1$, the instrument influences everyone in the same direction (in this case, toward treatment). This rules out the case where $D_1=0$ but $D_0=1$.

Let us define the never takers as $N=(D_0=0,D_1=0)$, the compliers as $C=(D_1=1,D_0=0)$, the always takers as $A=(D_1=1,D_0=1)$ and the defiers as $DEF = (D_0=1,D_1=0)$. The uniformity (monotonicity) condition means $P(DEF)=0$.

Now consider the IV estimand:
$$
\beta_1^{IV} = \frac{\frac{Cov[Y,Z]}{Var[Z]}}{\frac{Cov[D,Z]}{Var[Z]}}
$$
$$
= \frac{E[Y|Z=1] - E[Y|Z=0]}{E[D|Z=1] - E[D|Z=0]}.
$$
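Consider the numerator. By the Law of Iterated Expectations, together with exogeneity (so that the probabilities of each type do not depend on $Z$) and $P(DEF)=0$,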
$$
E[Y|Z=1] = E[Y|Z=1,A]P(A) + E[Y|Z=1,N]P(N) + E[Y|Z=1,C]P(C)
$$
$$
=E[Y_1|A]P(A) + E[Y_0|N]P(N) + E[Y_1|C]P(C)
$$
and similarly,
$$
E[Y|Z=0] = E[Y_1|A]P(A) + E[Y_0|N]P(N) + E[Y_0|C]P(C)
$$
so that
$$
E[Y|Z=1]-E[Y|Z=0] = E[Y_1-Y_0|C]P(C).
$$

Now consider the denominator.
$$
E[D|Z=1] - E[D|Z=0] = P(D=1|Z=1) - P(D=1|Z=0) = P(A) + P(C) - P(A) = P(C).
$$
Thus, the IV estimand is
$$
\beta_1^{IV} = E[Y_1 - Y_0|C].
$$
That is, we compute the average treatment effect for the compliers: those people whose treatment decision was influenced by the instrument $Z$. Hence, we call this the Local Average Treatment Effect, because we only obtain an average treatment effect for a specific population. If our model is such that $Y=\gamma_0 + \gamma_1D + \epsilon$, where $\gamma_1$ is a random variable, then $Y_1-Y_0=\gamma_1$ and $\beta_1^{IV}=E[\gamma_1|C]$. Some important takeaways from this analysis include the following:
1. The population over which we take the average treatment effect depends on the instrument. If we use a different instrument the population of compliers will change.
2. This analysis provides intuition for the non-binary case as well. We can think of the IV estimand as an average causal effect over the population whose choice of "treatment" is influenced by the instrument.
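
The LATE result can be illustrated with simulated potential outcomes. The sketch below is an assumption-laden example rather than an analysis of real data: the type shares, the effect sizes, and the way baseline outcomes vary by type are all invented for illustration. It builds never takers, compliers, and always takers (no defiers), randomizes $Z$, and compares the ratio of differences in means with the average effect among compliers.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Assumed type shares: 0 = never taker, 1 = complier, 2 = always taker.
types = rng.choice(3, size=n, p=[0.3, 0.4, 0.3])
d0 = (types == 2).astype(int)  # treatment taken when Z = 0
d1 = (types >= 1).astype(int)  # treatment taken when Z = 1 (so D_1 >= D_0)

# Instrument assigned independently of types and potential outcomes.
z = rng.binomial(1, 0.5, size=n)
d = z * d1 + (1 - z) * d0

# Assumed heterogeneous effects whose mean differs by type.
effect_mean = np.array([0.0, 1.0, 3.0])
gamma1 = effect_mean[types] + rng.normal(size=n)
y0 = 1.0 + 0.5 * types + rng.normal(size=n)  # baseline outcome varies by type
y1 = y0 + gamma1
y = d * y1 + (1 - d) * y0

# IV estimand for a binary instrument: ratio of differences in means.
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = reduced_form / first_stage

late = gamma1[types == 1].mean()  # average effect among compliers
ate = gamma1.mean()               # average effect in the whole population

print(f"first stage (share of compliers): {first_stage:.3f}")
print(f"IV (Wald) estimate:               {wald:.3f}")
print(f"mean effect among compliers:      {late:.3f}")
print(f"mean effect in population:        {ate:.3f}")
```

Changing the assumed type shares or effect means changes the complier average, and the IV estimate moves with it rather than with the population average.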