In [12]:
import pandas as pd
import warnings
warnings.filterwarnings(action='ignore',category=FutureWarning)
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
%matplotlib inline
plt.rcParams["figure.figsize"] = (15,10)
plt.rcParams["xtick.labelsize"] = 16
plt.rcParams["ytick.labelsize"] = 16
plt.rcParams["axes.labelsize"] = 20
plt.rcParams['legend.fontsize'] = 20

## Helpful Links

Here are two links that will give you some background on Manski Bounds:

- https://mason.gmu.edu/~atabarro/ManskiBoundsSlides1.pdf

- https://stats.stackexchange.com/questions/96248/treatment-effect-bounds

## Manski Bounds

This notebook will go through the steps required to calculate a Manski Bound.

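Estimating the impact of a treatment is not straightforward, because we never observe what would have happened to the same individual under the alternative treatment status. Manski Bounds offer a way to get an interval that must contain the Average Treatment Effect (ATE), without assuming anything about the unobserved outcomes beyond their possible range.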

We need the following 5 pieces of information in order to calculate Manski Bounds:

- $P(Treated)$

- $P(Untreated) = 1 - P(Treated)$

- $E(Outcome)$

- $E(Outcome \ |\ Treated)$

- $E(Outcome \ |\ Untreated)$
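The five quantities listed above can be read straight off a DataFrame when both columns are 0/1 dummies, because the mean of a dummy is a proportion. The sketch below is only illustrative: the helper name `manski_inputs`, the column names `d` and `y`, and the toy data are all hypothetical and are not part of this notebook's dataset.

```python
import pandas as pd


def manski_inputs(df, treatment, outcome):
    """Return the five inputs for Manski bounds.

    Assumes both columns are coded 0/1 (hypothetical helper, for illustration).
    """
    p_treated = df[treatment].mean()  # share of rows with treatment == 1
    return {
        "p_treated": p_treated,
        "p_untreated": 1 - p_treated,
        "e_outcome": df[outcome].mean(),
        "e_outcome_treated": df.loc[df[treatment] == 1, outcome].mean(),
        "e_outcome_untreated": df.loc[df[treatment] == 0, outcome].mean(),
    }


# Toy data invented for this example: 3 treated rows, 5 untreated rows.
toy = pd.DataFrame({"d": [1, 1, 1, 0, 0, 0, 0, 0],
                    "y": [1, 0, 0, 1, 1, 1, 0, 0]})
inputs = manski_inputs(toy, "d", "y")
print(inputs)
```

The same helper could be pointed at the `breakfast` and `malnutrition` columns once the data is loaded.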

In [23]:
data = pd.read_stata("Malnutrition.dta")
data.head()

Unnamed: 0,breakfast,malnutrition
0,1.0,1.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0
4,1.0,1.0


Above, we have two dummy variables. 1.0 in the breakfast column means that a child goes to a school with a breakfast program. 1.0 in the malnutrition column means that a child suffers from malnutrition.

We are going to assume perfect compliance in this example. The breakfast column is our treatment. Let's now calculate the information we need for the construction of a Manski bound.

$P(Treated) = \frac{500}{1000} = 0.5$

$P(Untreated) = \frac{500}{1000} = 0.5$

In [24]:
data['breakfast'].value_counts() # Counts for treated and Untreated.

0.0    500
1.0    500
Name: breakfast, dtype: int64

Recall that $E(Outcome) = P(Outcome = 1)$ if the outcome is a binary variable.

$E(Outcome) = \frac{500}{1000} = 0.5$

In [25]:
data['malnutrition'].value_counts() # Counts for Outcomes

0.0    500
1.0    500
Name: malnutrition, dtype: int64

$E(Outcome \ |\ Treated) = \frac{50}{500} = 0.1$

$E(Outcome \ |\ Untreated) = \frac{450}{500} = 0.9$

In [26]:
data[data['breakfast'] == 1.0]["malnutrition"].value_counts()

0.0    450
1.0     50
Name: malnutrition, dtype: int64

In [27]:
data[data['breakfast'] == 0.0]["malnutrition"].value_counts()

1.0    450
0.0     50
Name: malnutrition, dtype: int64
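As a cross-check on the counts above, the conditional expectations can also be obtained in one step with a group-wise mean. The sketch below does not read `Malnutrition.dta`; it rebuilds a stand-in frame (`demo`, a hypothetical name) from the four cell counts printed above, so it only assumes those counts are right.

```python
import pandas as pd

# Cell counts copied from the value_counts() outputs above:
# (breakfast, malnutrition) -> number of children
counts = {(1.0, 1.0): 50, (1.0, 0.0): 450, (0.0, 1.0): 450, (0.0, 0.0): 50}

# Expand the counts into one row per child (stand-in for the real dataset).
rows = [(b, m) for (b, m), n in counts.items() for _ in range(n)]
demo = pd.DataFrame(rows, columns=["breakfast", "malnutrition"])

p_treated = demo["breakfast"].mean()        # P(Treated)
e_outcome = demo["malnutrition"].mean()     # E(Outcome)
cond_means = demo.groupby("breakfast")["malnutrition"].mean()

print(p_treated, e_outcome)
print(cond_means)  # index 0.0 -> E(Outcome | Untreated), 1.0 -> E(Outcome | Treated)
```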

## Manski Calculation

Let's recall the notation:

- $P(Treated) = P(D=1)$

- $P(Untreated) = P(D=0)$

- $E(Outcome) = E(Y_i)$

- $E(Outcome \ |\ Treated) = E(Y_{1i} \ |\ D=1)$

- $E(Outcome \ |\ Untreated) = E(Y_{0i} \ |\ D=0)$


Furthermore, let's recall the equation for the ATE:

$ATE = E(Y_{1i}) - E(Y_{0i})$

$E(Y_{1i})$ - this is the expected outcome if everyone in the population were treated

$E(Y_{0i})$ - this is the expected outcome if no one in the population were treated

Now, let's get into the calculation required for $E(Y_{1i})$:

$E(Y_{1i}) = P(D=1) \cdot P(Y_{1i}\ |\ D = 1) + P(D=0) \cdot P(Y_{1i}\ |\ D = 0)$

Notice that the **counterfactual** expression is $P(Y_{1i}\ |\ D = 0)$ - this expression represents the expected outcome under treatment for those who were not actually given treatment.

That is to say, this expression answers the question: what would have happened, with regard to the outcome, to the individuals in the untreated group, had they been given treatment? This is a counterfactual, as we will never observe this outcome in reality.

This is where the concept of the Manski bound comes into play. Manski posits the following:

- Assume the smallest possible value for this counterfactual. This gives the **lower bound** for $E(Y_{1i})$

- Assume the largest possible value for this counterfactual. This gives the **upper bound** for $E(Y_{1i})$

Let's plug and chug:

$E(Y_{1i}) = (0.5)(0.1) + (0.5)P(Y_{1i}\ |\ D = 0)$

Now, we know that $P(Y_{1i}\ |\ D = 0)$ is a probability, so it must lie in $[0,1]$:

- $E(Y_{1i})^{LB} = (0.5)(0.1) + (0.5)(0) = 0.05$

- $E(Y_{1i})^{UB} = (0.5)(0.1) + (0.5)(1) = 0.55$

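Following identical logic, let's do the same for $E(Y_{0i})$. Here $E(Y_{0i}) = P(D=0) \cdot P(Y_{0i}\ |\ D = 0) + P(D=1) \cdot P(Y_{0i}\ |\ D = 1)$, and the counterfactual term is $P(Y_{0i}\ |\ D = 1)$: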

- $E(Y_{0i})^{LB} = (0.5)(0.9) + (0.5)(0) = 0.45$

- $E(Y_{0i})^{UB} = (0.5)(0.9) + (0.5)(1) = 0.95$
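The arithmetic for the bounds on $E(Y_{1i})$ and $E(Y_{0i})$ can be reproduced with a few lines of Python. This is a minimal sketch, assuming the outcome is bounded between `y_min` and `y_max` (0 and 1 for a binary outcome); the function name `potential_outcome_bounds` and its parameters are hypothetical, introduced only for this illustration.

```python
def potential_outcome_bounds(p_observed, mean_observed, y_min=0.0, y_max=1.0):
    """Bound the mean of one potential outcome.

    p_observed    : share of the sample for which this potential outcome is observed
    mean_observed : mean outcome within that observed group
    y_min, y_max  : smallest and largest values the outcome can take (assumed known)
    """
    p_missing = 1.0 - p_observed
    # Replace the unobservable counterfactual mean by its extreme values.
    lower = p_observed * mean_observed + p_missing * y_min
    upper = p_observed * mean_observed + p_missing * y_max
    return lower, upper


# Numbers from this notebook: P(D=1) = P(D=0) = 0.5,
# E(Y | D=1) = 0.1 and E(Y | D=0) = 0.9.
y1_lb, y1_ub = potential_outcome_bounds(0.5, 0.1)
y0_lb, y0_ub = potential_outcome_bounds(0.5, 0.9)
print(round(y1_lb, 2), round(y1_ub, 2))
print(round(y0_lb, 2), round(y0_ub, 2))
```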


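Recall that the $ATE$ is the difference $E(Y_{1i}) - E(Y_{0i})$. The largest possible $ATE$ therefore pairs the upper bound of $E(Y_{1i})$ with the lower bound of $E(Y_{0i})$, and the smallest possible $ATE$ pairs the lower bound of $E(Y_{1i})$ with the upper bound of $E(Y_{0i})$. As such: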

- $ATE^{UB} = E(Y_{1i})^{UB} - E(Y_{0i})^{LB} = (0.55) - (0.45) = 0.1$

- $ATE^{LB} = E(Y_{1i})^{LB} - E(Y_{0i})^{UB} = (0.05) - (0.95) = -0.9$


Manski Bound = $(ATE^{LB}, ATE^{UB}) = (-0.9,0.1)$
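For reference, the whole calculation can be wrapped in a single function. The sketch below is one possible way to organize it, not a library routine: `manski_ate_bounds` and its arguments are hypothetical names, and the outcome is assumed to lie in `[y_min, y_max]`.

```python
def manski_ate_bounds(p_treated, mean_treated, mean_untreated,
                      y_min=0.0, y_max=1.0):
    """Worst-case (Manski) bounds on the ATE for a bounded outcome."""
    p_untreated = 1.0 - p_treated

    # E(Y1): observed for the treated, counterfactual for the untreated.
    y1_lb = p_treated * mean_treated + p_untreated * y_min
    y1_ub = p_treated * mean_treated + p_untreated * y_max

    # E(Y0): observed for the untreated, counterfactual for the treated.
    y0_lb = p_untreated * mean_untreated + p_treated * y_min
    y0_ub = p_untreated * mean_untreated + p_treated * y_max

    return y1_lb - y0_ub, y1_ub - y0_lb


ate_lb, ate_ub = manski_ate_bounds(0.5, 0.1, 0.9)
print(round(ate_lb, 2), round(ate_ub, 2))
print(round(ate_ub - ate_lb, 2))  # width of the interval
```

One thing the formula makes visible: the width of the interval works out to `y_max - y_min` regardless of the data, so for a binary outcome the no-assumption bounds always have width 1 and always contain zero.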

## Interpretation

From the above bound, we can say that the breakfast program could reduce the probability of malnutrition by as much as 90 percentage points, increase it by as much as 10 percentage points, or change it by any amount within that range. Because the interval contains zero, the bounds alone do not tell us the sign of the effect.