# Causal Machine Learning Tutorial

This tutorial demonstrates two causal machine learning approaches: targeted maximum likelihood estimation (TMLE) and double/debiased machine learning (DML). Both are used to estimate the average treatment effect (ATE) of a binary treatment.  
  
It draws heavily on two earlier tutorials: https://migariane.github.io/TMLE.nb.html, which implements TMLE in `R`, and https://github.com/matthewvowels1/TargetedLearningTutorial, which implements TMLE for the same problem in `Python`. Here both methods are implemented in `Python`, and the performance of TMLE and DML is compared.  
  
See the paper here: *Luque-Fernandez MA, Schomaker M, Rachet B, Schnitzer ME. Targeted maximum likelihood estimation for a binary treatment: A tutorial. Statistics in Medicine. 2018; 37: 2530–2546. https://doi.org/10.1002/sim.7628*

In [73]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

## Data Generating Function
  
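Causal ML is about inferring parameters of a data generating process. Here the process is known: we copy the data generating function from the original paper, so the true treatment effect is available for comparison with each estimate.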

In [74]:
def sigmoid(x):
    return 1/(1 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log(1 - p)

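def generate_data(n):
    """Simulate n units with four covariates, a binary treatment A and a binary outcome Y."""
    np.random.seed(1)  # fixed seed so the tutorial is reproducible

    # Covariates: two binary, two continuous (rounded to 3 decimals)
    w1 = np.random.binomial(1, 0.5, n)
    w2 = np.random.binomial(1, 0.65, n)
    w3 = np.round(np.random.uniform(0, 4, n), 3)
    w4 = np.round(np.random.uniform(0, 5, n), 3)

    # Propensity score: probability of treatment given the covariates
    p_A = sigmoid(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4 + 0.15*w2*w4)
    A = np.random.binomial(1, p_A, n)

    # Potential outcomes under treatment (Y1) and under control (Y0);
    # treatment adds +1 on the logit scale
    p_y1 = sigmoid(-1 + 1 - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4)
    p_y0 = sigmoid(-1 + 0 - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4)
    Y1 = np.random.binomial(1, p_y1, n)
    Y0 = np.random.binomial(1, p_y0, n)

    # Observed outcome: the potential outcome matching the treatment received
    Y = Y1*A + Y0*(1 - A)

    cols = ['w1', 'w2', 'w3', 'w4', 'A', 'Y', 'Y1', 'Y0']
    df = pd.DataFrame([w1, w2, w3, w4, A, Y, Y1, Y0]).T
    df.columns = cols
    return df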

In [75]:
df = generate_data(10000)

In [76]:
# Add columns for the two interventions we will compare:
# everyone untreated (A0 = 0) versus everyone treated (A1 = 1)

df['A0'] = 0
df['A1'] = 1

In [77]:
true_psi = (df['Y1'] - df['Y0']).mean()  # true ATE in this sample, from the simulated potential outcomes
print(f'True Psi = {true_psi}')

True Psi = 0.2026
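
Note that `true_psi` is the average over one realised draw of the binary potential outcomes, so it includes Bernoulli sampling noise. Because the data generating function is known, the expected ATE can also be approximated directly from the outcome probabilities, as the mean of `p_y1 - p_y0`. The sketch below does this by Monte Carlo. The helper name `expected_ate` and the use of a separate random generator are assumptions for illustration; the coefficients are copied from `generate_data`.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def expected_ate(n, seed=0):
    """Hypothetical helper: Monte Carlo average of p_y1 - p_y0."""
    rng = np.random.default_rng(seed)
    # Covariates drawn as in generate_data (separate generator, so not the same rows).
    w1 = rng.binomial(1, 0.5, n)
    w2 = rng.binomial(1, 0.65, n)
    w3 = np.round(rng.uniform(0, 4, n), 3)
    w4 = np.round(rng.uniform(0, 5, n), 3)
    # Shared linear predictor; the treatment adds +1 on the logit scale.
    base = -1 - 0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4 + 0.15*w2*w4
    return float(np.mean(sigmoid(base + 1) - sigmoid(base)))


population_ate = expected_ate(1_000_000)
print(f"Expected ATE = {population_ate:.4f}")
```

Comparing this value with `true_psi` gives a rough sense of how much of any estimator's error is due to sampling noise in the simulated outcomes rather than to the estimator.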


In [78]:
print(df.head())

    w1   w2     w3     w4    A    Y   Y1   Y0  A0  A1
0  0.0  0.0  3.223  1.797  0.0  1.0  1.0  1.0   0   1
1  1.0  1.0  2.166  4.725  1.0  1.0  1.0  1.0   0   1
2  0.0  0.0  3.516  4.177  1.0  1.0  1.0  1.0   0   1
3  0.0  0.0  3.790  0.605  1.0  1.0  1.0  1.0   0   1
4  0.0  1.0  3.331  1.879  1.0  1.0  1.0  0.0   0   1


## TMLE  
  
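The steps below implement the TMLE method. As a baseline, we first fit a naive ordinary least squares (OLS) regression of the outcome on the covariates and the treatment.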

In [79]:
X_cols = ['w1', 'w2', 'w3', 'w4', 'A']
y_cols = ['Y']

In [87]:
X = sm.add_constant(df[X_cols], prepend=True)

# Naive estimate: regress the outcome on the covariates and treatment with
# ordinary least squares (a linear probability model, since Y is binary)
mod = sm.OLS(df[y_cols], X)
res = mod.fit()
res.summary()



Dep. Variable:                    Y   R-squared:                 0.121
Model:                          OLS   Adj. R-squared:            0.121
Method:               Least Squares   F-statistic:               275.3
Date:              Wed, 20 Nov 2024   Prob (F-statistic):    9.98e-277
Time:                      11:14:48   Log-Likelihood:          -5591.5
No. Observations:             10000   AIC:                     11200.0
Df Residuals:                  9994   BIC:                     11240.0
Df Model:                         5
Covariance Type:          nonrobust

           coef   std err         t     P>|t|    [0.025    0.975]
const    0.3007     0.014    21.734     0.000     0.274     0.328
w1      -0.0248     0.008    -2.922     0.003    -0.041    -0.008
w2       0.1184     0.009    13.340     0.000     0.101     0.136
w3       0.0444     0.004    12.113     0.000     0.037     0.052
w4       0.0505     0.003    16.988     0.000     0.045     0.056
A        0.2034     0.009    21.966     0.000     0.185     0.222

Omnibus:           1600.679   Durbin-Watson:           2.034
Prob(Omnibus):          0.0   Jarque-Bera (JB):     1287.363
Skew:                -0.786   Prob(JB):            2.84e-280
Kurtosis:             2.215   Cond. No.                 13.2


In [88]:
biased_psi = res.params.A

In [89]:
print('Biased estimate of Psi (coefficient):', biased_psi)
print('Amount of naive bias:', np.abs(biased_psi - true_psi))

naive_relative_bias = ((biased_psi - true_psi) / true_psi)*100
print('Relative naive bias:', naive_relative_bias, '%')

Biased estimate of Psi (coefficient): 0.20336342193773688
Amount of naive bias: 0.0007634219377368745
Relative naive bias: 0.3768124075700269 %
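
The OLS estimate above already adjusts for the covariates, which helps explain why its bias is small here. For contrast, an unadjusted comparison would simply subtract the mean outcome of the untreated units from that of the treated units. A minimal sketch follows, assuming a hypothetical helper named `difference_in_means` and a tiny toy DataFrame used only to show the arithmetic.

```python
import pandas as pd


def difference_in_means(data, treatment="A", outcome="Y"):
    """Hypothetical helper: unadjusted E[Y | A=1] - E[Y | A=0]."""
    treated = data.loc[data[treatment] == 1, outcome].mean()
    control = data.loc[data[treatment] == 0, outcome].mean()
    return float(treated - control)


# Toy data (an assumption for illustration only).
toy = pd.DataFrame({
    "A": [1, 1, 1, 1, 0, 0, 0, 0],
    "Y": [1, 1, 1, 0, 1, 0, 0, 0],
})
toy_diff = difference_in_means(toy)
print(toy_diff)  # 0.75 - 0.25 = 0.5
```

On the tutorial's data the call would be `difference_in_means(df)`. Because `w2`, `w3` and `w4` raise both the probability of treatment and the probability of the outcome in `generate_data`, this unadjusted contrast would be expected to overstate the treatment effect relative to the covariate-adjusted estimates.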
