## Causal Inference and Machine Learning

You will revisit the exercise in Week 6 about estimating the effect of the mindset intervention (Athey and Wager, 2019). Download the synthetic data `synthetic_mindset_data.csv` from our Week 6 module. More data descriptions are available in our computer lab exercise questions.

In this assignment, you are asked to implement the machine learning estimation of the average treatment effect over all schools using various machine learning estimators.

In [13]:
%matplotlib inline

import numpy as np
import pandas as pd

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier, GradientBoostingRegressor, AdaBoostClassifier
from tqdm import tqdm

import tensorflow
import random as python_random
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.regularizers import l2

from tensorflow.keras import layers, optimizers, regularizers
from keras_tuner.tuners import RandomSearch
from sklearn.model_selection import train_test_split

In [2]:
# Import and clean the data as in the computer lab

data_original = pd.read_csv('synthetic_mindset_data.csv').rename(columns={'Z':'D'})
n = data_original.shape[0]
data = pd.get_dummies(data_original,columns=['C1','XC'])
X = data.iloc[:,3:]
Y = data.Y
D = data.D
schoolid = data.schoolid-1 # then the schoolid starts from 0
K = 76 # Number of schools 
delta_school_dml=np.zeros((K,)) # Initialize the school-specific estimates

# Initialize the dict objects to store estimates.

delta_debias={}
delta_raw={}

### Exercise 1

Estimate the average treatment effects by using the random forest (of 200 trees) algorithm including a variable selection as proposed by Athey and Wager (2019). Note that you can extract the feature importance directly from sklearn. You should always use the debiased ML estimator across clusters, as in the Week 6 computer lab.

In [4]:
# Initialize models

n_estimators = 500 # Number of trees to estimate 

rf_mu = RandomForestRegressor(
    n_estimators=n_estimators,     
    random_state=42,
)

rf_p = RandomForestClassifier(
    n_estimators=n_estimators,     
    random_state=42,
)

# Estimate the treatment effect for each school

for k in tqdm(range(K)):
    
    # Fit pilot random forests
    
    selector = (schoolid == k)
    rf_mu.fit(X[~selector], Y[~selector])
    rf_p.fit(X[~selector], D[~selector])
    
    # Select the subset of variables with importance higher than their average
    # Repeat this procedure for each school
    
    importances_mu = rf_mu.feature_importances_
    importances_p = rf_p.feature_importances_
    selected_features_mu = X.columns[importances_mu > np.mean(importances_mu)]
    selected_features_p = X.columns[importances_p > np.mean(importances_p)]
    
    # Fit random forests again, using only the selected variables
    
    rf_mu.fit(X.loc[~selector, selected_features_mu], Y[~selector])
    rf_p.fit(X.loc[~selector, selected_features_p], D[~selector])
    Y_res = Y[selector].values-rf_mu.predict(X.loc[selector, selected_features_mu])
    v = D[selector].values-rf_p.predict_proba(X.loc[selector, selected_features_p])[:,1]
    
    # Store the estimated treatment effect for each school
    
    delta_school_dml[k] = np.mean(Y_res*v)/np.mean(v**2) # Debiased estimator


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 76/76 [01:50<00:00,  1.46s/it]


In [5]:
# Save your estimate here

delta_debias['RF'] = np.mean(delta_school_dml)
print(delta_debias['RF'])

0.24199960349182964


### Exercise 2

Estimate the average treatment effects by using:

- Least-squares boosting for estimation of $\mu^{(0)}(x) = E[Y^{(0)}_i|X_i = x]$, using a learning rate of no more than 0.1.
- AdaBoost for estiamtion of $p(x) = P(D_i = 1|X_i = x)$

You don't need to consider variable selection. Keep the number of bases to no more than 500 for each model.

You should always use the debiased ML estimator across clusters, as in the Week 6 computer lab.

In [6]:
# Initialize the models

n_estimators = 500 

gb_mu = GradientBoostingRegressor(
    n_estimators=n_estimators, 
    learning_rate=0.1, 
    random_state=42
)

ada_p = AdaBoostClassifier(
    n_estimators=n_estimators, 
    random_state=42
)

# Estimate the treatment effect for each school

for k in tqdm(range(K)):
    
    # Selector for current school
    
    selector = (schoolid == k)

    # Fit models
    
    gb_mu.fit(X[~selector], Y[~selector])
    ada_p.fit(X[~selector], D[~selector])

    # Predict outcomes and treatment probabilities for each school
    
    Y_res = Y[selector].values-gb_mu.predict(X[selector])
    v = D[selector].values-ada_p.predict_proba(X[selector])[:,1]
    
    # Store the estimated treatment effect for each school
    
    delta_school_dml[k] = np.mean(Y_res*v)/np.mean(D[selector].values*v) # Debiased estimator

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 76/76 [00:08<00:00,  9.22it/s]


In [7]:
# Estimates

delta_debias['Boost']=np.mean(delta_school_dml)
print(delta_debias['Boost'])

0.2124460216317207


### Exercise 3

Estimate the average treatment effects by estimating $\mu^{(0)}(x) = E[Y^{(0)}_i|X_i = x]$ and $p(x) = P(D_i = 1|X_i = x)$ with two separate neural networks.

Use no more than 6 neurons in each hidden layer, no more than 3 hidden layers, and a weight decay penalty parameter no larger than 0.1. Use ReLU for deep neural nets but sigmoid for shallow nets.

You should always use the debiased ML estimator across clusters, as in the Week 6 computer lab.

We perform a Grid Search to find the optimal number of hidden layers and units for each layer.

In [8]:
# Split the data into training and validation sets

X_train, X_val, Y_train, Y_val, D_train, D_val = train_test_split(X, Y, D, test_size=0.2, random_state=42)

def build_model(hp):
    
    model = Sequential()

    # Choose the number of hidden layers (1, 2, or 3)
    
    num_hidden_layers = hp.Int('num_hidden_layers', 1, 3)

    for i in range(num_hidden_layers):
        
        # Choose the number of neurons (1 to 6)
        
        units = hp.Int(f'units_layer_{i}', min_value=1, max_value=6)

        # Add hidden layers with ReLU activation and L2 regularization
        
        model.add(layers.Dense(units=units, activation='relu',
                               kernel_regularizer=regularizers.l2(0.1)))

    # Output layer with linear activation
    
    model.add(layers.Dense(1, activation='linear'))

    # Compile the model
    
    model.compile(
        optimizer=optimizers.Adam(learning_rate=hp.Choice('learning_rate', values=[1e-3, 1e-4])),
        loss='mean_absolute_error',
        metrics=['mean_absolute_error']
    )

    return model

# Set up the RandomSearch tuner

tuner = RandomSearch(
    build_model,
    objective='val_mean_absolute_error',
    max_trials=10,  
    executions_per_trial=3,  
    directory='my_dir',
    project_name='hidden_layer_tuning'
)

# Perform the search

tuner.search(
    X_train, Y_train,
    epochs=20,  
    batch_size=32,  
    validation_data=(X_val, Y_val),
    verbose=0
)

# Get the best hyperparameters

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparameters")
print(f"Number of hidden layers: {best_hps.get('num_hidden_layers')}")
for i in range(best_hps.get('num_hidden_layers')):
    print(f"Units in layer {i}: {best_hps.get(f'units_layer_{i}')}")
print(f"Learning rate: {best_hps.get('learning_rate')}")

# Build the model with the best hyperparameters

best_model = tuner.hypermodel.build(best_hps)

# Train the best model

history = best_model.fit(
    X_train, Y_train,
    epochs=50,  
    batch_size=32,
    validation_data=(X_val, Y_val),
    verbose=0
)

# Evaluate the best model on validation data

val_loss, val_mae = best_model.evaluate(X_val, Y_val, verbose=0)
print(f"Validation Loss: {val_loss}, Validation MAE: {val_mae}")

Best hyperparameters
Number of hidden layers: 1
Units in layer 0: 6
Learning rate: 0.001
Validation Loss: 0.4469817280769348, Validation MAE: 0.44183072447776794


In [9]:
# Initialize the models

def create_model(input_dim, output_activation, loss, weight_decay=0.1, units_layer_0=4):
    model = Sequential()
    model.add(Dense(units=units_layer_0, activation='relu', input_dim=input_dim, kernel_regularizer=l2(weight_decay)))
    model.add(Dense(units=1, activation=output_activation, kernel_regularizer=l2(weight_decay)))
    model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
    return model

# Models for mu0 and p

model_mu = create_model(input_dim=X.shape[1], output_activation='linear', loss='mean_squared_error', weight_decay=0.1, units_layer_0=4)
model_p = create_model(input_dim=X.shape[1], output_activation='sigmoid', loss='binary_crossentropy', weight_decay=0.1, units_layer_0=4)

delta_school_dml = np.zeros(K)

# Estimate the treatment effect for each school

for k in tqdm(range(K)):
    
    # Selector for the current school
    
    selector = (schoolid == k)

    # Fit models using training data excluding the current school
    
    model_mu.fit(X[~selector], Y[~selector], epochs=50, batch_size=32, verbose=0)
    model_p.fit(X[~selector], D[~selector], epochs=50, batch_size=32, verbose=0)

    # Predict outcomes and treatment probabilities
    
    Y_res = Y[selector].values - model_mu.predict(X[selector]).flatten()
    v = D[selector].values - model_p.predict(X[selector]).flatten()

    # Store the estimated treatment effect for each school
    
    delta_school_dml[k] = np.mean(Y_res * v) / np.mean(D[selector].values * v)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
  0%|                                                                                                                               | 0/76 [00:00<?, ?it/s]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step 
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step 


  1%|█▌                                                                                                                     | 1/76 [00:10<13:17, 10.64s/it]

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 784us/step
[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 665us/step


  3%|███▏                                                                                                                   | 2/76 [00:20<12:21, 10.02s/it]

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


  4%|████▋                                                                                                                  | 3/76 [00:29<12:01,  9.89s/it]

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 782us/step
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 722us/step


  5%|██████▎                                                                                                                | 4/76 [00:39<11:44,  9.79s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


  7%|███████▊                                                                                                               | 5/76 [00:49<11:30,  9.72s/it]

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


  8%|█████████▍                                                                                                             | 6/76 [00:58<11:19,  9.70s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


  9%|██████████▉                                                                                                            | 7/76 [01:08<11:09,  9.70s/it]

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 11%|████████████▌                                                                                                          | 8/76 [01:18<10:59,  9.69s/it]

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 12%|██████████████                                                                                                         | 9/76 [01:27<10:50,  9.71s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 13%|███████████████▌                                                                                                      | 10/76 [01:37<10:38,  9.67s/it]

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step


 14%|█████████████████                                                                                                     | 11/76 [01:47<10:31,  9.71s/it]

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step


 16%|██████████████████▋                                                                                                   | 12/76 [01:57<10:23,  9.75s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 17%|████████████████████▏                                                                                                 | 13/76 [02:06<10:13,  9.74s/it]

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step


 18%|█████████████████████▋                                                                                                | 14/76 [02:16<10:03,  9.74s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 20%|███████████████████████▎                                                                                              | 15/76 [02:26<09:53,  9.73s/it]

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 922us/step
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 939us/step


 21%|████████████████████████▊                                                                                             | 16/76 [02:36<09:42,  9.70s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 22%|██████████████████████████▍                                                                                           | 17/76 [02:45<09:33,  9.72s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 24%|███████████████████████████▉                                                                                          | 18/76 [02:55<09:23,  9.71s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 25%|█████████████████████████████▌                                                                                        | 19/76 [03:05<09:14,  9.73s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 26%|███████████████████████████████                                                                                       | 20/76 [03:14<09:04,  9.72s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 28%|████████████████████████████████▌                                                                                     | 21/76 [03:24<08:54,  9.71s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 29%|██████████████████████████████████▏                                                                                   | 22/76 [03:34<08:46,  9.75s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 30%|███████████████████████████████████▋                                                                                  | 23/76 [03:44<08:36,  9.74s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 32%|█████████████████████████████████████▎                                                                                | 24/76 [03:53<08:26,  9.74s/it]

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 715us/step
[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 691us/step


 33%|██████████████████████████████████████▊                                                                               | 25/76 [04:03<08:13,  9.68s/it]

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 988us/step
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 946us/step


 34%|████████████████████████████████████████▎                                                                             | 26/76 [04:13<08:02,  9.66s/it]

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 36%|█████████████████████████████████████████▉                                                                            | 27/76 [04:22<07:53,  9.66s/it]

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 958us/step
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 987us/step


 37%|███████████████████████████████████████████▍                                                                          | 28/76 [04:32<07:43,  9.66s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 38%|█████████████████████████████████████████████                                                                         | 29/76 [04:42<07:35,  9.68s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 39%|██████████████████████████████████████████████▌                                                                       | 30/76 [04:51<07:25,  9.68s/it]

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step


 41%|████████████████████████████████████████████████▏                                                                     | 31/76 [05:01<07:16,  9.70s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 42%|█████████████████████████████████████████████████▋                                                                    | 32/76 [05:11<07:07,  9.71s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 43%|███████████████████████████████████████████████████▏                                                                  | 33/76 [05:21<06:57,  9.72s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 45%|████████████████████████████████████████████████████▊                                                                 | 34/76 [05:30<06:46,  9.68s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step


 46%|██████████████████████████████████████████████████████▎                                                               | 35/76 [05:40<06:37,  9.69s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step


 47%|███████████████████████████████████████████████████████▉                                                              | 36/76 [05:49<06:26,  9.65s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 49%|█████████████████████████████████████████████████████████▍                                                            | 37/76 [05:59<06:16,  9.66s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 50%|███████████████████████████████████████████████████████████                                                           | 38/76 [06:09<06:06,  9.64s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 51%|████████████████████████████████████████████████████████████▌                                                         | 39/76 [06:18<05:57,  9.66s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 53%|██████████████████████████████████████████████████████████████                                                        | 40/76 [06:28<05:47,  9.66s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 54%|███████████████████████████████████████████████████████████████▋                                                      | 41/76 [06:38<05:38,  9.67s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 55%|█████████████████████████████████████████████████████████████████▏                                                    | 42/76 [06:47<05:29,  9.69s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 57%|██████████████████████████████████████████████████████████████████▊                                                   | 43/76 [06:57<05:20,  9.71s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 58%|████████████████████████████████████████████████████████████████████▎                                                 | 44/76 [07:07<05:11,  9.72s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 59%|█████████████████████████████████████████████████████████████████████▊                                                | 45/76 [07:17<05:01,  9.72s/it]

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 61%|███████████████████████████████████████████████████████████████████████▍                                              | 46/76 [07:26<04:50,  9.70s/it]

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 686us/step
[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 704us/step


 62%|████████████████████████████████████████████████████████████████████████▉                                             | 47/76 [07:36<04:39,  9.64s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 63%|██████████████████████████████████████████████████████████████████████████▌                                           | 48/76 [07:46<04:29,  9.64s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 64%|████████████████████████████████████████████████████████████████████████████                                          | 49/76 [07:55<04:20,  9.65s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 66%|█████████████████████████████████████████████████████████████████████████████▋                                        | 50/76 [08:05<04:11,  9.67s/it]

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step


 67%|███████████████████████████████████████████████████████████████████████████████▏                                      | 51/76 [08:15<04:02,  9.68s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 68%|████████████████████████████████████████████████████████████████████████████████▋                                     | 52/76 [08:24<03:53,  9.72s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 70%|██████████████████████████████████████████████████████████████████████████████████▎                                   | 53/76 [08:34<03:43,  9.72s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 71%|███████████████████████████████████████████████████████████████████████████████████▊                                  | 54/76 [08:44<03:33,  9.72s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 72%|█████████████████████████████████████████████████████████████████████████████████████▍                                | 55/76 [08:54<03:24,  9.73s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 74%|██████████████████████████████████████████████████████████████████████████████████████▉                               | 56/76 [09:03<03:14,  9.72s/it]

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 75%|████████████████████████████████████████████████████████████████████████████████████████▌                             | 57/76 [09:13<03:04,  9.70s/it]

[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 615us/step
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 596us/step


 76%|██████████████████████████████████████████████████████████████████████████████████████████                            | 58/76 [09:22<02:52,  9.60s/it]

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 78%|███████████████████████████████████████████████████████████████████████████████████████████▌                          | 59/76 [09:32<02:43,  9.59s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 79%|█████████████████████████████████████████████████████████████████████████████████████████████▏                        | 60/76 [09:42<02:34,  9.63s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 80%|██████████████████████████████████████████████████████████████████████████████████████████████▋                       | 61/76 [09:51<02:24,  9.64s/it]

[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 504us/step
[1m17/17[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 468us/step


 82%|████████████████████████████████████████████████████████████████████████████████████████████████▎                     | 62/76 [10:01<02:13,  9.56s/it]

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


 83%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                    | 63/76 [10:10<02:04,  9.61s/it]

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 694us/step
[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 658us/step


 84%|███████████████████████████████████████████████████████████████████████████████████████████████████▎                  | 64/76 [10:20<01:54,  9.58s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 86%|████████████████████████████████████████████████████████████████████████████████████████████████████▉                 | 65/76 [10:30<01:45,  9.62s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 87%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍               | 66/76 [10:39<01:36,  9.63s/it]

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 88%|████████████████████████████████████████████████████████████████████████████████████████████████████████              | 67/76 [10:49<01:26,  9.62s/it]

[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
[1m4/4[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


 89%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▌            | 68/76 [10:59<01:17,  9.64s/it]

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 922us/step
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 967us/step


 91%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▏          | 69/76 [11:08<01:07,  9.62s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▋         | 70/76 [11:18<00:57,  9.65s/it]

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step


 93%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏       | 71/76 [11:28<00:48,  9.67s/it]

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 910us/step
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 904us/step


 95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▊      | 72/76 [11:37<00:38,  9.63s/it]

[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 853us/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 846us/step


 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎    | 73/76 [11:47<00:28,  9.61s/it]

[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 841us/step
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 849us/step


 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉   | 74/76 [11:56<00:19,  9.59s/it]

[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 75/76 [12:06<00:09,  9.60s/it]

[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 76/76 [12:15<00:00,  9.68s/it]


In [10]:
# Save your estimate here

delta_debias['NN'] = np.mean(delta_school_dml)
print(delta_debias['NN'])

0.26989146250125645


### Exercise 4

Repeat Exercise 1, but now _without_ using debiased methods:

- Estimate the regression functions $\mu^{(0)}(x) = E[Y^{(0)}_i|X_i = x]$ and $p(x) = P(D_i = 1|X_i = x)$ by pooling ALL observations.
- Estimate the treatment effects for each school.
- Average the estimated treatment effects over schools.

In [11]:
n_estimators = 200

rf_mu = RandomForestRegressor(
    n_estimators=n_estimators,     
    random_state=42,
)

rf_p = RandomForestClassifier(
    n_estimators=n_estimators,     
    random_state=42,
)

delta_school_dml = np.zeros(K)

# Fit the models using observations over all schools

rf_mu.fit(X, Y)
rf_p.fit(X, D)

# Select the subset of variables with importance higher than their average

importances_mu = rf_mu.feature_importances_
importances_p = rf_p.feature_importances_

selected_features_mu = X.columns[importances_mu > np.mean(importances_mu)]
selected_features_p = X.columns[importances_p > np.mean(importances_p)]

# Fit random forests again, using only the selected variables

rf_mu.fit(X.loc[:,selected_features_mu], Y)
rf_p.fit(X.loc[:,selected_features_p], D)


# Estimate the treatment effect for each school

for k in tqdm(range(K)):
    
    selector = (schoolid == k)

    # Calculate the treatment effect
    
    Y_res = Y[selector].values-rf_mu.predict(X.loc[selector, selected_features_mu])
    D_k = D[selector].values
    
    
    # Store the estimated treatment effect for each school
    
    delta_school_dml[k] = np.mean(Y_res * D_k) / np.mean(D_k**2) # Conventional LS estimator

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 1358.53it/s]


In [12]:
# Save your estimate here

delta_raw['RF'] = np.mean(delta_school_dml)
print(delta_raw['RF'])

0.13165539876210328
