# What to cover

## Potential Outcomes Framework
## Ignorability/Exchangeability
## Identifiability
## Positivity
## Unconfoundedness
## Consistency

## 1. Potential Outcomes Framework

In causal inference, our main goal is to compare two states of the world: one where an intervention (our treatment) happened, and one where it did not. If everything else is the same in both states, any difference in the outcome between them can be attributed to the intervention.
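
The two states of the world can be written down compactly. The notation below is an assumption (the text above does not fix any symbols): $T_i$ is the treatment indicator for unit $i$, and $Y_i(1)$ and $Y_i(0)$ are that unit's outcomes with and without the intervention.

$$
\tau_i = Y_i(1) - Y_i(0), \qquad \text{ATE} = \mathbb{E}\left[Y(1) - Y(0)\right]
$$

$$
Y_i = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)
$$

The second line says that for each unit we only ever observe one of the two potential outcomes, which is why the individual effect $\tau_i$ cannot be computed directly from data.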





In [117]:
import pandas as pd
import numpy as np

np.random.seed(1)

# Defining the number of observations
n_observations = 10000


# Creating person IDs and assigning each person to a group (30% fall in group 1)
person_id = list(range(n_observations))
group = np.random.binomial(1, p=0.3, size=n_observations)

# Creating DataFrame
data = pd.DataFrame(
    {
        'person_id': person_id,
        'group': group
    }
).sort_values('group')

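# Creating Intervention: rows are sorted by group, so the first block covers
# group 0 (90% treated) and the second block covers group 1 (10% treated)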
data['intervention'] = np.concatenate(
    (
        np.random.binomial(1, p=0.9, size=len(group[group==0])),
        np.random.binomial(1, p=0.1, size=len(group[group==1]))
    )
)
 

# Defining Outcome
data['outcome'] = (
    60  # Intercept
    - 20 * data.group # Effect of being in the group
    + 10 * data.intervention  # Effect of having intervention
    + np.random.normal(0, 5, n_observations) # Noise
)

data.sample(frac=1).head(10)

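| | person_id | group | intervention | outcome |
|---|---|---|---|---|
| 6817 | 6817 | 0 | 1 | 69.759907 |
| 3445 | 3445 | 0 | 1 | 68.792428 |
| 1592 | 1592 | 0 | 1 | 72.983914 |
| 7408 | 7408 | 0 | 1 | 73.653991 |
| 9516 | 9516 | 0 | 0 | 61.110772 |
| 6714 | 6714 | 0 | 1 | 67.698347 |
| 6021 | 6021 | 0 | 1 | 73.372315 |
| 9418 | 9418 | 1 | 0 | 52.27439 |
| 9451 | 9451 | 1 | 0 | 43.14341 |
| 5260 | 5260 | 1 | 0 | 50.260817 |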


In [103]:
# Naive estimate: difference in mean outcome between treated and non-treated units
np.mean(data[data.intervention == 1]['outcome']) - np.mean(data[data.intervention == 0]['outcome'])

5.3661729840215315
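
The naive difference in means does not recover the effect of 10 built into the simulation, because `group` influences both who receives the intervention and the outcome itself. One possible way to see this is to compare treated and non-treated units within each group and then average those differences by group share. The sketch below re-creates the simulation under assumptions of its own: the `np.random.default_rng` seed and the names `df`, `naive_diff`, and `adjusted_diff` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical re-creation of the simulation; the seed is an arbitrary choice
rng = np.random.default_rng(0)
n = 10_000

group = rng.binomial(1, 0.3, size=n)
# Group 0 is treated with probability 0.9, group 1 with probability 0.1
intervention = rng.binomial(1, np.where(group == 0, 0.9, 0.1))
outcome = 60 - 20 * group + 10 * intervention + rng.normal(0, 5, size=n)
df = pd.DataFrame({'group': group, 'intervention': intervention, 'outcome': outcome})

# Naive comparison: ignores that the two groups have very different treatment rates
by_treatment = df.groupby('intervention')['outcome'].mean()
naive_diff = by_treatment.loc[1] - by_treatment.loc[0]

# Stratified comparison: treated-minus-untreated difference within each group,
# weighted by the share of units in that group
cell_means = (
    df.groupby(['group', 'intervention'])['outcome']
    .mean()
    .unstack('intervention')
)
within_group_diff = cell_means[1] - cell_means[0]
group_share = df['group'].value_counts(normalize=True).sort_index()
adjusted_diff = (within_group_diff * group_share).sum()

print(f"naive: {naive_diff:.2f}, adjusted: {adjusted_diff:.2f}")
```

The stratified estimate should land close to 10, while the naive one is pulled away from it by the imbalance between groups. The exact values depend on the seed.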

In [99]:
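# Each unit reveals only the potential outcome of the treatment it received;
# the other one is missing (NaN keeps the columns numeric, unlike None)
data['treated_outcome'] = np.where(
    data.intervention == 1, data.outcome, np.nan
)

data['non_treated_outcome'] = np.where(
    data.intervention == 0, data.outcome, np.nan
)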

data.sample(frac=1).head(10)

| | person_id | group | intervention | outcome | treated_outcome | non_treated_outcome |
|---|---|---|---|---|---|---|
| 2349 | 2349 | 0 | 0 | 61.02711 | | 61.02711 |
| 7501 | 7501 | 0 | 0 | 62.304947 | | 62.304947 |
| 6879 | 6879 | 1 | 1 | 59.451299 | 59.451299 | |
| 5429 | 5429 | 1 | 1 | 51.93208 | 51.93208 | |
| 6284 | 6284 | 1 | 1 | 50.802477 | 50.802477 | |
| 4225 | 4225 | 0 | 0 | 58.939222 | | 58.939222 |
| 1688 | 1688 | 1 | 1 | 51.686074 | 51.686074 | |
| 5805 | 5805 | 0 | 0 | 52.64319 | | 52.64319 |
| 417 | 417 | 0 | 0 | 49.95756 | | 49.95756 |
| 6951 | 6951 | 0 | 0 | 61.943601 | | 61.943601 |


In [100]:
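# Simulating both potential outcomes for every unit, which is only possible
# because we control the data-generating process. Each column draws fresh noise,
# so these values do not match the observed `outcome` column row by row.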
data['treated_outcome'] = (
    60  # Intercept
    - 20 * data.group # Effect of being in the group
    + 10 * 1  # Effect of having intervention
    + np.random.normal(0, 5, n_observations) # Noise
)

data['non_treated_outcome'] = (
    60  # Intercept
    - 20 * data.group # Effect of being in the group
    + 10 * 0  # Effect of having intervention
    + np.random.normal(0, 5, n_observations) # Noise
)

data.sample(frac=1).head(10)

| | person_id | group | intervention | outcome | treated_outcome | non_treated_outcome |
|---|---|---|---|---|---|---|
| 9406 | 9406 | 0 | 1 | 76.500766 | 66.822935 | 60.439053 |
| 8535 | 8535 | 0 | 1 | 57.660051 | 69.658358 | 51.831571 |
| 5116 | 5116 | 1 | 1 | 52.422235 | 47.134493 | 35.366701 |
| 4073 | 4073 | 0 | 1 | 67.565761 | 71.277531 | 59.561752 |
| 3669 | 3669 | 0 | 0 | 56.659744 | 71.934533 | 57.828974 |
| 664 | 664 | 0 | 0 | 58.703103 | 65.165936 | 61.601707 |
| 4376 | 4376 | 0 | 1 | 77.768707 | 69.900374 | 56.627836 |
| 8671 | 8671 | 1 | 1 | 46.044704 | 45.279236 | 41.404327 |
| 4064 | 4064 | 0 | 1 | 83.408217 | 67.335576 | 59.713257 |
| 4939 | 4939 | 0 | 0 | 59.117568 | 73.678347 | 59.274609 |


In [101]:
# Average treatment effect: how many more years units have when treated than when not treated
np.mean(data.treated_outcome - data.non_treated_outcome)

9.937352487275337
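
In the cell above, `treated_outcome` and `non_treated_outcome` each draw their own noise, so for a given row the observed `outcome` is not equal to either of them. A variation that makes the Consistency assumption from the outline hold row by row is to generate the potential outcomes first and then select the observed outcome from them. The sketch below is one way to do that under stated assumptions: the seed, the shared `noise` vector, and the name `ate` are illustrative choices, and sharing the noise makes every individual effect exactly 10, which is a simplification.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an arbitrary choice
n = 10_000

group = rng.binomial(1, 0.3, size=n)
intervention = rng.binomial(1, np.where(group == 0, 0.9, 0.1))
# One noise draw per unit, shared by both potential outcomes (an assumption)
noise = rng.normal(0, 5, size=n)

df = pd.DataFrame({'group': group, 'intervention': intervention})
df['non_treated_outcome'] = 60 - 20 * df['group'] + noise
df['treated_outcome'] = df['non_treated_outcome'] + 10

# Consistency: the observed outcome is the potential outcome
# that corresponds to the treatment actually received
df['outcome'] = np.where(
    df['intervention'] == 1,
    df['treated_outcome'],
    df['non_treated_outcome'],
)

# Average treatment effect computed from the (normally unobservable) potential outcomes
ate = (df['treated_outcome'] - df['non_treated_outcome']).mean()
print(f"ATE: {ate:.2f}")
```

With this construction, any estimator applied to `outcome` can be checked against an `ate` that refers to the same units and the same noise.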

In [118]:
# Running regression to evaluate the effect
import statsmodels.formula.api as smf

model = smf.ols(formula='outcome ~ intervention + group', data=data)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                outcome   R-squared:                       0.878
Model:                            OLS   Adj. R-squared:                  0.878
Method:                 Least Squares   F-statistic:                 2.389e+04
Date:                Mon, 08 May 2023   Prob (F-statistic):               0.00
Time:                        15:46:11   Log-Likelihood:                -30260.
No. Observations:               10000   AIC:                         6.053e+04
Df Residuals:                    9996   BIC:                         6.056e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             59.7191      0