# Bayesian Causal Inference in Non-Randomized Experiments

**Author**: Leo Guelman

* [1. Problem Statement](#problem1)
 * [1.1 The National Study of Learning Mindsets](#mindsets11)
 * [1.2 Data Description](#data12)
 * [1.3 The Questions](#questions13)
* [2. Analysis](#analysis2) 
 * [2.1 Imports](#imports21)
 * [2.2 Get Data](#data22)

* [References](#ref)
 
 


# 1. Problem Statement <a class="anchor" id="problem1"></a>

## 1.1 The National Study of Learning Mindsets <a class="anchor" id="mindsets11"></a>

We look at the causal inference challenge presented by the *National Study of Learning Mindsets* (Yeager et al., 2019) from a Bayesian perspective. 

The NSLM is a randomized experiment designed to assess the effectiveness of an intervention that aims to improve students' academic outcomes by instilling a *growth mindset*. A *growth mindset* is the belief that people can develop intelligence, as opposed to a *fixed mindset*, which sees intelligence as an innate trait that is fixed at birth.

The original study was a randomized experiment with students from 76 schools drawn from the national probability sample of U.S. public schools. In addition to assessing the average treatment effect (ATE), the study was designed to estimate the degree of heterogeneity in the treatment effect across both students and schools. 

A synthetic dataset was generated to mimic the original data, but with the goal of creating an observational study that includes confounding effects not present in the original randomized experiment. Besides this difference, the synthetic data resembles the real NSLM data in terms of covariate distribution, data structures, and effect sizes. 

During the 2018 Atlantic Causal Inference Conference, eight groups of participants were invited to analyze the synthetic data to address the questions of average treatment effect and treatment effect variation in non-randomized experimental settings. Participants employed a diverse set of methods, ranging from matching and flexible outcome modeling to semiparametric estimation and ensemble approaches. In this study, we employ an alternative approach founded on Bayesian inference principles.

## 1.2 Data Description <a class="anchor" id="data12"></a>

The analysis is based on the synthetic dataset of $n=10{,}391$ children from a sample of $J=76$ schools. For each child $i \in \{1, \ldots, n\}$, we observe a binary treatment indicator $Z_i$, a real-valued outcome $Y_i$, and 10 categorical or real-valued covariates, as outlined in the table below. For a full description of the data-generating process, refer to Carvalho et al. (2019).


| Variable | Description |
| :--- | :--- |
| S3 | Student’s self-reported expectations for success in the future, a proxy for prior achievement, measured prior to random assignment |
| C1 | Categorical variable for student race/ethnicity |
| C2 | Categorical variable for student identified gender |
| C3 | Categorical variable for student first-generation status, i.e., first in family to go to college |
| XC | School-level categorical variable for urbanicity of the school, i.e., rural, suburban, etc. |
| X1 | School-level mean of students’ fixed mindsets, reported prior to random assignment |
| X2 | School achievement level, as measured by test scores and college preparation for the previous 4 cohorts of students |
| X3 | School racial/ethnic minority composition, i.e., percentage of student body that is Black, Latino, or Native American |
| X4 | School poverty concentration, i.e., percentage of students who are from families whose incomes fall below the federal poverty line |
| X5 | School size, i.e., total number of students in all four grade levels in the school |
| Y | Post-treatment outcome, a continuous measure of achievement |
| Z | Treatment, i.e., receipt of the intervention |





## 1.3 The Questions <a class="anchor" id="questions13"></a>

The two questions we are aiming to address as part of this study are the following:

1. Was the mindset intervention effective in improving student achievement?
2. Was the effect of the intervention moderated by school level achievement (X2) or pre-existing mindset norms (X1)? In particular there are two competing hypotheses about how X2 moderates the effect of the intervention: Either it is largest in middle-achieving schools (a "Goldilocks effect") or is decreasing in school-level achievement.
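
Both questions can be stated compactly in potential-outcomes notation. The notation is an assumption introduced here for illustration: $Y_i(1)$ and $Y_i(0)$ denote the outcomes child $i$ would have with and without the intervention, and $x_1$, $x_2$ are values of the school-level covariates X1 and X2.

$$
\tau = \mathbb{E}\left[Y_i(1) - Y_i(0)\right], \qquad
\tau(x_1, x_2) = \mathbb{E}\left[Y_i(1) - Y_i(0) \mid \mathrm{X1}_i = x_1,\ \mathrm{X2}_i = x_2\right]
$$

Question 1 asks whether $\tau > 0$. Question 2 asks whether $\tau(x_1, x_2)$ varies with its arguments. Under the "Goldilocks" hypothesis it is largest at intermediate values of $x_2$, whereas under the competing hypothesis it decreases monotonically in $x_2$.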


# 2. Analysis <a class="anchor" id="analysis2"></a>

## 2.1 Imports <a class="anchor" id="imports21"></a>

In [None]:
import os
os.chdir('/Users/lguelman/Library/Mobile Documents/com~apple~CloudDocs/LG_Files/Development/BCI/python')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Apply the style first; calling plt.style.use afterwards would overwrite the custom settings
plt.style.use('fivethirtyeight')
parameters = {'figure.figsize': (8, 4),
              'font.size': 8, 
              'axes.labelsize': 12}
plt.rcParams.update(parameters)

import pystan
import multiprocessing
import stan_utility
import arviz as az

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier


import seaborn as sns

from acic_utils import preprocess_p_scores, stan_model_summary

## 2.2 Get Data  <a class="anchor" id="data22"></a>

In [None]:
df = pd.read_csv("../data/synthetic_data.csv")
df
df.info()
df.describe()

## 2.3 Assessing Balance of Covariates

Covariate balance is the degree to which the distribution of covariates is similar across levels of the treatment. Here we assess whether treatment assignment was effectively randomized across observational units or whether there are selection effects. To that end, we use *prognostic scores* (Hansen 2008).

The prognostic score is defined as the predicted outcome under the control condition, reflecting the baseline "risk". It is estimated by fitting a model of the outcome in the control group and then using that model to predict the outcome under the control condition for all individuals. The standardized difference in mean prognostic scores between the treatment and control groups is then used as a measure of covariate balance. 

Here we use a Bayesian linear regression model to obtain a posterior distribution of the standardized difference in mean prognostic scores between the treatment and control groups.
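
The three steps of the prognostic-score balance check can be illustrated on toy data. The sketch below is a simplification under stated assumptions: it uses ordinary least squares rather than a Bayesian model, so it yields a single point estimate instead of a posterior distribution, and the simulated data and variable names are hypothetical rather than taken from the NSLM dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical, not the NSLM data): the first covariate drives both
# treatment uptake and the outcome, so the two groups are imbalanced.
n = 2000
x = rng.normal(size=(n, 3))
p_treat = 1.0 / (1.0 + np.exp(-x[:, 0]))  # selection on x[:, 0]
z = rng.binomial(1, p_treat)
y = 1.0 + x @ np.array([0.8, 0.3, 0.0]) + 0.25 * z + rng.normal(scale=0.5, size=n)

# Step 1: fit the outcome model on control units only.
X_design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X_design[z == 0], y[z == 0], rcond=None)

# Step 2: predict the outcome under control for every unit.
prog_score = X_design @ coef

# Step 3: standardized difference in mean prognostic scores.
diff = prog_score[z == 1].mean() - prog_score[z == 0].mean()
std_diff = diff / prog_score.std()
print(f"Standardized mean difference: {std_diff:.2f}")
```

A value near zero would be consistent with randomized assignment. In this toy setup treatment uptake depends on a covariate that also predicts the outcome, so the difference is clearly positive.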

We first pre-process the data (encoding categorical features and scaling).

In [None]:
X, z, y = preprocess_p_scores(df)

print("Features dimension:", X.shape)
print("Treatment dimension:", z.shape)
print("Response dimension:", y.shape)
print("Number of treated / control units:", sum(z), "/", X.shape[0]-sum(z))
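
The helper `preprocess_p_scores` is imported from `acic_utils` and its source is not shown here. As a rough illustration of what "encoding categorical features and scaling" could involve, the following hypothetical stand-in one-hot encodes the categorical columns and standardizes the rest. The function name, the choice of which columns to treat as categorical (taken from the table in Section 1.2), and the toy data are all assumptions.

```python
import numpy as np
import pandas as pd

def preprocess_sketch(df, categorical=("C1", "C2", "C3", "XC"),
                      treatment="Z", outcome="Y"):
    """Hypothetical stand-in for preprocess_p_scores: scale, then one-hot encode."""
    covariates = df.drop(columns=[treatment, outcome])
    numeric = [c for c in covariates.columns if c not in categorical]
    # Standardize numeric columns to zero mean and unit (sample) standard deviation.
    scaled = (covariates[numeric] - covariates[numeric].mean()) / covariates[numeric].std()
    # One-hot encode categoricals, dropping one level per variable so the
    # encoded columns are not collinear with an intercept.
    dummies = pd.get_dummies(covariates[list(categorical)].astype("category"),
                             drop_first=True)
    X = pd.concat([scaled, dummies], axis=1).to_numpy(dtype=float)
    return X, df[treatment].to_numpy(), df[outcome].to_numpy()

# Tiny made-up frame using a subset of the column names from the data table.
toy = pd.DataFrame({
    "S3": [5, 6, 4, 7], "C1": [1, 2, 1, 3], "C2": [1, 2, 2, 1], "C3": [0, 1, 0, 1],
    "XC": [0, 1, 2, 1], "X1": [0.1, -0.3, 0.5, 0.0],
    "Y": [0.2, 0.5, -0.1, 0.9], "Z": [0, 1, 0, 1],
})
X_toy, z_toy, y_toy = preprocess_sketch(toy)
print(X_toy.shape, z_toy.shape, y_toy.shape)
```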

Now we fit the model in Stan. The Stan code is stored separately in `stan_linear_reg.stan` within the `stan` folder.

In [None]:
n, p = X[z==0,:].shape # Fit model using control units only

stan_data_dict = {'N': n,
                  'K': p,
                  'x': X[z==0,:],
                  'y': y[z==0],
                  'N_new': X.shape[0],
                  'x_new': X
                  }

sm = pystan.StanModel('../stan/stan_linear_reg.stan') 
multiprocessing.set_start_method("fork", force=True)
fit = sm.sampling(data=stan_data_dict, iter=1000, chains=4)
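
The contents of `stan_linear_reg.stan` are not shown. Judging only from the keys of the data dictionary (`N`, `K`, `x`, `y`, `N_new`, `x_new`) and the `prog_scores` quantity extracted later, a program along the following lines would be compatible. This is a hypothetical sketch, not the author's file: the parameter names, the priors, and the choice to define `prog_scores` as the linear predictor are assumptions.

```python
import re

# Hypothetical Stan program. Only the names N, K, x, y, N_new, x_new and
# prog_scores come from the notebook; everything else is an assumption.
stan_linear_reg_sketch = """
data {
  int<lower=0> N;            // number of control units
  int<lower=0> K;            // number of features
  matrix[N, K] x;            // control-unit features
  vector[N] y;               // control-unit outcomes
  int<lower=0> N_new;        // number of units to predict for
  matrix[N_new, K] x_new;    // features for all units
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 5);
  y ~ normal(alpha + x * beta, sigma);
}
generated quantities {
  vector[N_new] prog_scores = alpha + x_new * beta;
}
"""

# Pull the variable names out of the data block; they must match the keys
# of the dictionary passed to the sampler.
data_block = re.search(r"data\s*\{(.*?)\}", stan_linear_reg_sketch, flags=re.S).group(1)
declared = [line.split("//")[0].strip().rstrip(";").split()[-1]
            for line in data_block.splitlines() if ";" in line]
print(declared)
```

Because the model is fitted on control units only while `x_new` holds every unit, each posterior draw produces a predicted outcome under control for all students.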

In [None]:
fit_summary = stan_model_summary(fit)
fit_summary

From the analysis below, notice that students with the highest potential outcomes under control are more likely to receive treatment. Thus, we proceed with the analysis as an observational study rather than a randomized one. 

**Add issues based on Bayesian Tree paper**

In [None]:
# Extract prognostic scores; after transposing, rows are units and columns are MCMC draws
samples = fit.extract(permuted=True)
prog_scores = samples['prog_scores'].T

# Compute the raw and standardized mean differences in scores for each draw
mcmc_samples = prog_scores.shape[1]
prog_scores_diff = np.zeros(mcmc_samples)  # was used below without being initialized
prog_scores_std_diff = np.zeros(mcmc_samples)

for s in range(mcmc_samples):
    prog_scores_diff[s] = np.mean(prog_scores[z==1,s]) - np.mean(prog_scores[z==0,s])
    prog_scores_std_diff[s] = prog_scores_diff[s] / np.std(prog_scores[:,s])

plt.hist(prog_scores_std_diff, bins=30)
plt.title("Standardized mean difference in prognostic scores", fontsize=12)
plt.show()
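
The per-draw computation above can also be written without a loop, and the resulting draws summarized numerically alongside the histogram. The sketch below runs on a simulated units-by-draws matrix, a hypothetical stand-in for `prog_scores`. The summary statistics chosen here (posterior mean, 95% interval, probability of a positive difference) are one reasonable option, not something the analysis prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the (units x draws) matrix of prognostic scores.
n_units, n_draws = 500, 200
z_sim = rng.binomial(1, 0.4, size=n_units)
unit_level = rng.normal(size=n_units) + 0.5 * z_sim
scores_sim = unit_level[:, None] + rng.normal(scale=0.1, size=(n_units, n_draws))

# One standardized difference per posterior draw, computed column-wise.
diff = scores_sim[z_sim == 1].mean(axis=0) - scores_sim[z_sim == 0].mean(axis=0)
std_diff_draws = diff / scores_sim.std(axis=0)

# Posterior summaries of the standardized difference.
post_mean = std_diff_draws.mean()
lower, upper = np.percentile(std_diff_draws, [2.5, 97.5])
prob_positive = (std_diff_draws > 0).mean()
print(f"mean={post_mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f}), "
      f"P(diff > 0)={prob_positive:.2f}")
```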


In [None]:
prog_scores_df = pd.DataFrame({'prog_score_mean': np.mean(prog_scores, axis =1),'z':z})
prog_scores_df['prog_score_mean_quantile']= pd.qcut(prog_scores_df['prog_score_mean'], 
                                                    q = 5, labels = False)+1
prog_scores_df.groupby(['prog_score_mean_quantile'])['z'].mean().plot(xticks=list(range(1,6)), 
                                                                     xlabel='Mean Prognostic Score (Quantile)',
                                                                     ylabel='Proportion treated')

# References <a class="anchor" id="ref"></a>

Yeager, D.S., Hanselman, P., Walton, G.M. et al. A national experiment reveals where a growth mindset improves achievement. Nature 573, 364–369 (2019). https://doi.org/10.1038/s41586-019-1466-y

Carvalho, C., Feller, A., Murray, J., Woody, S., and Yeager, D. Assessing Treatment Effect Variation in Observational Studies: Results from a Data Challenge (2019). https://arxiv.org/abs/1907.07592

Hansen, B.B. The Prognostic Analogue of the Propensity Score. Biometrika 95 (2), 481–88 (2008). https://doi.org/10.1093/biomet/asn004