In [2]:
pip install -U DoubleML


In [4]:
pip install SyncRNG


In [5]:
import random
import re
from itertools import chain

import numpy as np
import pandas as pd
import patsy
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import linalg
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests
from SyncRNG import SyncRNG

from CTL.causal_tree_learn import CausalTree
from sklearn.model_selection import train_test_split

# HTE I: Binary treatment

Source RMD file: [link](https://docs.google.com/uc?export=download&id=1FSUi4WLfYYKnvWsNWypiQORhkqf5IlFP)

In the previous chapter, we learned how to estimate the effect of a binary treatment averaged over the entire population. However, the average may obscure important details about how different individuals react to the treatment. In this chapter, we will learn how to estimate the **conditional average treatment effect (CATE)**,
\begin{equation}
  (\#eq:cate)
  \tau(x) := \E[Y_i(1) - Y_i(0) | X_i = x],
\end{equation}
which is a "localized" version of the average treatment effect conditional on a vector of observable characteristics. 

It's often the case that \@ref(eq:cate) is too general to be immediately useful, especially when the observable covariates are high-dimensional. It can be hard to estimate reliably without making strong modeling assumptions, and hard to summarize in a useful manner after estimation. In such situations, we will instead try to estimate treatment effect averages for simpler groups
\begin{equation}
  (\#eq:cate-g)
  \E[Y_i(1) - Y_i(0) | G_i = g],
\end{equation}
where $G_i$ indexes subgroups of interest. Below you'll learn how to estimate and test hypotheses about pre-defined subgroups, and also how to discover subgroups of interest from the data. In this tutorial, you will learn how to use estimates of \@ref(eq:cate) to suggest relevant subgroups $G_i$ (and in the next chapters you will find out other uses of \@ref(eq:cate) in policy learning and evaluation).
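To make \@ref(eq:cate-g) concrete, here is a minimal sketch on simulated data (not the GSS dataset). It assumes a randomized experiment with treatment probability 1/2 and a single binary subgroup indicator; the helper `diff_in_means` and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated randomized experiment (all names and numbers are illustrative)
g = rng.integers(0, 2, size=n)        # subgroup indicator G_i
w = rng.integers(0, 2, size=n)        # treatment W_i, assigned with probability 1/2
tau = np.where(g == 1, -0.2, -0.4)    # true subgroup treatment effects
y = 0.5 + tau * w + rng.normal(scale=0.5, size=n)


def diff_in_means(y, w):
    """Difference-in-means estimate of the average effect and its standard error."""
    y1, y0 = y[w == 1], y[w == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return est, se


# One estimate per subgroup: E[Y(1) - Y(0) | G = k]
group_effects = {k: diff_in_means(y[g == k], w[g == k]) for k in (0, 1)}
for k, (est, se) in group_effects.items():
    print(f"G={k}: estimate {est:.3f} (se {se:.3f})")
```

Because treatment is randomized, a plain difference in means within each subgroup is enough here; in an observational setting this simple estimator would generally not be valid.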

We'll continue using the abridged version of the General Social Survey (GSS) [(Smith, 2016)](https://gss.norc.org/Documents/reports/project-reports/GSSProject%20report32.pdf) dataset that was introduced in the previous chapter. In this dataset, individuals were sent to treatment or control with equal probability, so we are in a randomized setting. However, many of the techniques and code shown below should also work in an observational setting provided that unconfoundedness and overlap are satisfied (these assumptions were defined in the previous chapter).

As with other chapters in this tutorial, the code below should still work if you replace the next snippet with one that loads a different dataset, provided that you update the key variables `treatment`, `outcome`, and `covariates` below. Also, please make sure to read the comments, as there may be subtle differences depending on whether your dataset was created in a randomized or observational setting.

In [309]:
data = pd.read_csv( "https://docs.google.com/uc?id=1kSxrVci_EUcSr_Lg1JKk1l7Xd5I9zfRC&export=download" )

n = data.shape[0]

# Treatment: does the gov't spend too much on "welfare" (1) or "assistance to the poor" (0)
treatment = "w"

# Outcome: 1 for 'yes', 0 for 'no'
outcome = "y"

# Additional covariates
covariates = ["age", "polviews", "income", "educ", "marital", "sex"]

## Pre-specified hypotheses

We will begin by learning how to test pre-specified null hypotheses of the form
\begin{equation}
  (\#eq:hte-hyp)
  H_{0}: \E[Y(1) - Y(0) | G_i = 1] = \E[Y(1) - Y(0) | G_i = 0].
\end{equation}

That is, the treatment effect is the same regardless of membership in some group $G_i$. Importantly, for now we’ll assume that the group $G_i$ was **pre-specified** -- it was decided _before_ looking at the data.

In a randomized setting, if both the treatment $W_i$ and group membership $G_i$ are binary, we can write
\begin{equation}
  (\#eq:linear)
  \E[Y_i(W_i)|G_i] = \E[Y_i|W_i, G_i] = \beta_0 + \beta_w W_i + \beta_g G_i + \beta_{wg} W_i G_i
\end{equation}

<font size=1>
When $W_i$ and $G_i$ are binary, this decomposition is true without loss of generality. Why?
</font>
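One way to get a feel for this claim is a quick numerical check. The sketch below uses simulated data with made-up cell probabilities (not the GSS dataset) and plain `numpy` least squares, and compares the fitted coefficients of the fully interacted regression with the four sample cell means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated binary treatment, binary group, and binary outcome.
# The four success probabilities are arbitrary, illustrative numbers.
w = rng.integers(0, 2, size=n)
g = rng.integers(0, 2, size=n)
prob = np.array([[0.48, 0.32],
                 [0.10, 0.06]])[w, g]   # indexed as prob[w, g]
y = rng.binomial(1, prob)

# OLS of y on (1, w, g, w*g): the fully interacted ("saturated") model
design = np.column_stack([np.ones(n), w, g, w * g])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, bw, bg, bwg = beta

# Sample mean of y in each (w, g) cell
m = {(a, b): y[(w == a) & (g == b)].mean() for a in (0, 1) for b in (0, 1)}

print("beta_0                  :", b0, "vs cell mean (w=0, g=0):", m[(0, 0)])
print("beta_0 + beta_w         :", b0 + bw, "vs cell mean (w=1, g=0):", m[(1, 0)])
print("beta_0 + beta_g         :", b0 + bg, "vs cell mean (w=0, g=1):", m[(0, 1)])
print("sum of all coefficients :", b0 + bw + bg + bwg, "vs cell mean (w=1, g=1):", m[(1, 1)])
```

With two binary variables there are only four cells and the model has four free coefficients, so the regression can match every cell mean exactly, whatever those means happen to be.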

This allows us to write the average potential outcomes within each group as
\begin{equation}
  (\#eq:decomp)
  \begin{aligned}
    \E[Y(1) | G_i=1] &= \beta_0 + \beta_w + \beta_g + \beta_{wg}, \\
    \E[Y(1) | G_i=0] &= \beta_0 + \beta_w,  \\
    \E[Y(0) | G_i=1] &= \beta_0 + \beta_g,  \\
    \E[Y(0) | G_i=0] &= \beta_0.
  \end{aligned}
\end{equation}

Rewriting the null hypothesis \@ref(eq:hte-hyp) in terms of the decomposition \@ref(eq:decomp), the treatment effect is $\beta_w + \beta_{wg}$ when $G_i = 1$ and $\beta_w$ when $G_i = 0$, so the null boils down to a test about the coefficient on the interaction: $\beta_{wg} = 0$. Here’s an example that tests whether the treatment effect is the same for "conservative" (`polviews` < 4) and "liberal" (`polviews` $\geq$ 4) individuals.

In [315]:
# Only valid in randomized settings

# Suppose this group was defined prior to collecting the data
data["conservative"] = np.multiply(data.polviews < 4, 1)  # a binary group
group = 'conservative'

# Recall from last chapter -- this is equivalent to running a t-test
fmla = 'y ~ w*conservative'
ols = smf.ols(fmla, data=data).fit(cov_type='HC2')
# print(ols.summary())
# Test each coefficient against zero (one constraint per coefficient)
hypotheses = 'Intercept=0, w=0, conservative=0, w:conservative=0'
t_test = ols.t_test(hypotheses)
print(t_test.summary(xname=list(ols.summary2().tables[1].index)))

                               Test for Constraints                               
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          0.4836      0.005     95.127      0.000       0.474       0.494
w                 -0.3789      0.006    -64.657      0.000      -0.390      -0.367
conservative      -0.1590      0.009    -17.195      0.000      -0.177      -0.141
w:conservative     0.1160      0.010     11.185      0.000       0.096       0.136


<font size=1>
Interpret the results above. The coefficient $\beta_{wg}$ is denoted by `w:conservative`. Can we detect a difference in treatment effect for conservative vs liberal individuals? For whom is the effect larger?
</font>



## Data-driven hypotheses

Pre-specifying hypotheses prior to looking at the data is in general good practice to avoid "p-hacking" (e.g., slicing the data into different subgroups until a significant result is found). However, valid tests can also be obtained by **sample splitting**: we can use a subset of the sample to find promising subgroups, then test hypotheses about these subgroups in the remaining sample. This kind of sample splitting for hypothesis testing is called **honesty**.
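The idea can be sketched on simulated data as follows. This is only an illustration under stated assumptions: a randomized experiment with treatment probability 1/2, five hypothetical candidate binary groups of which only one (column 3) truly modifies the effect, and a hypothetical helper `interaction` that compares difference-in-means estimates across the two levels of a group.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, k = 20_000, 5

# Simulated randomized experiment; only candidate group 3 changes the effect.
G = rng.integers(0, 2, size=(n, k))
w = rng.integers(0, 2, size=n)
y = 0.5 - 0.3 * w + 0.15 * w * G[:, 3] + rng.normal(scale=0.5, size=n)


def interaction(y, w, g):
    """Effect in g == 1 minus effect in g == 0, with its standard error."""
    est, var = 0.0, 0.0
    for g_val, g_sign in ((1, 1.0), (0, -1.0)):
        for w_val, w_sign in ((1, 1.0), (0, -1.0)):
            cell = y[(g == g_val) & (w == w_val)]
            est += g_sign * w_sign * cell.mean()
            var += cell.var(ddof=1) / len(cell)
    return est, math.sqrt(var)


# Split once, at random, into a discovery half and a testing half.
idx = rng.permutation(n)
disc, test = idx[: n // 2], idx[n // 2:]

# Discovery half: search freely, keep the candidate with the largest |z|.
z_disc = []
for j in range(k):
    est_j, se_j = interaction(y[disc], w[disc], G[disc, j])
    z_disc.append(abs(est_j / se_j))
chosen = int(np.argmax(z_disc))

# Testing half: a single test about the group chosen above.
est, se = interaction(y[test], w[test], G[test, chosen])
p_value = math.erfc(abs(est / se) / math.sqrt(2))  # two-sided normal p-value
print(f"chosen group: {chosen}, estimate: {est:.3f}, p-value: {p_value:.2g}")
```

The search over candidates only ever sees the discovery half, so the final p-value is computed on data that played no role in choosing the hypothesis.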

### Via causal trees

**Causal trees** [(Athey and Imbens, PNAS, 2016)](https://www.pnas.org/content/pnas/113/27/7353.full.pdf) are an intuitive algorithm that is available in the randomized setting to discover subgroups with different treatment effects.

At a high level, the idea is to divide the sample into three subsets (not necessarily of equal size). The `splitting` subset is used to fit a decision tree whose objective is modified to maximize heterogeneity in treatment effect estimates across leaves. The `estimation` subset is then used to produce a valid estimate of the treatment effect at each leaf of the fitted tree. Finally, a `test` subset can be used to validate the tree estimates.
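The three-subset workflow can be sketched with scikit-learn on simulated data. Note that this is a simplified stand-in, not the causal tree algorithm itself: instead of the modified splitting objective, it fits an ordinary regression tree to a transformed outcome whose conditional mean equals the treatment effect when the treatment probability is 1/2. The data-generating process, the tree settings, and the helper `leaf_effects` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 30_000

# Simulated randomized experiment: the effect depends on X[:, 0] only.
X = rng.uniform(size=(n, 3))
w = rng.integers(0, 2, size=n)
tau = np.where(X[:, 0] > 0.5, -0.1, -0.5)
y = 0.2 * X[:, 1] + tau * w + rng.normal(scale=0.3, size=n)

# Three subsets: splitting, estimation, and test (one third each).
idx_split, idx_rest = train_test_split(np.arange(n), test_size=2 / 3, random_state=0)
idx_est, idx_test = train_test_split(idx_rest, test_size=0.5, random_state=0)

# Splitting subset: with P(W = 1) = 1/2, y * (w - 0.5) / 0.25 has
# conditional mean tau(x), so a regression tree on it looks for effect heterogeneity.
y_star = y[idx_split] * (w[idx_split] - 0.5) / 0.25
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=500, random_state=0)
tree.fit(X[idx_split], y_star)


def leaf_effects(idx):
    """Difference in means within each leaf, using only the rows in idx."""
    leaves = tree.apply(X[idx])
    effects = {}
    for leaf in np.unique(leaves):
        rows = idx[leaves == leaf]
        treated, control = y[rows][w[rows] == 1], y[rows][w[rows] == 0]
        effects[int(leaf)] = treated.mean() - control.mean()
    return effects


effects_est = leaf_effects(idx_est)    # honest estimates, one per leaf
effects_test = leaf_effects(idx_test)  # validation on untouched data
for leaf in effects_est:
    print(f"leaf {leaf}: estimation {effects_est[leaf]:.3f}, test {effects_test[leaf]:.3f}")
```

The tree structure is learned from the splitting subset alone, and the leaf-level effects are then re-estimated on rows the tree has never seen, which is what keeps those estimates honest.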

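The next snippet uses the `CausalTree` class from the Python `CTL` package (imported above), which plays the role of the `honest.causalTree` function from the R [`causalTree`](https://github.com/susanathey/causalTree) package. For more details on the underlying method, see the [causalTree documentation](https://github.com/susanathey/causalTree/blob/master/briefintro.pdf).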

In [102]:
data

Unnamed: 0,X,y,w,age,polviews,income,educ,marital,sex,conservative
0,1,0,0,28,4,11,14,5,1,0
1,2,1,0,54,6,12,16,2,2,0
2,3,1,0,44,2,12,16,5,2,1
3,6,0,0,47,1,5,10,4,1,1
4,7,0,1,19,4,9,10,5,2,0
...,...,...,...,...,...,...,...,...,...,...
28648,36497,0,0,62,5,12,16,1,1,0
28649,36498,1,0,66,7,9,12,2,2,0
28650,36499,0,1,54,3,11,12,4,2,1
28651,36500,0,0,57,3,6,16,3,2,1


In [103]:
X = data[['age','polviews', 'income','educ','marital','sex']]
y = data['y']
# Note: this rebinds `treatment` from the column name "w" to the treatment column itself
treatment = data['w']

In [104]:
columns = X.columns
X = X.values
y = y.values
treatment = treatment.values

In [298]:
# CL-honest

cthl = CausalTree(honest=True, min_size=1, split_size=0.33)
cthl.fit(X, y, treatment)
cthl.prune()
cthl.plot_tree(features=columns, filename="bin_tree_honest_1", show_effect=True, alpha = 0)


In [299]:
# hold out a validation (test) portion
train_x, val_x, train_y, val_y, train_t, val_t = train_test_split(
    X, y, treatment, random_state=724, shuffle=True, test_size=0.33)
# get honest/estimation portion
train_x, est_x, train_y, est_y, train_t, est_t = train_test_split(
    train_x, train_y, train_t, shuffle=True, random_state=724, test_size=0.5)

In [300]:
est_x.shape

(9599, 6)

In [301]:
cthl_predict = cthl.predict(est_x)
cthl_predict

array([-0.3699006 , -0.3699006 , -0.3699006 , ..., -0.25010086,
       -0.3699006 , -0.3699006 ])

In [302]:
np.unique(cthl_predict)

array([-0.66666667, -0.5       , -0.43380279, -0.40642303, -0.40613027,
       -0.37373737, -0.3699006 , -0.30637293, -0.25967213, -0.25010086,
       -0.20362903, -0.17777778, -0.1375    , -0.08333333,  0.        ])

In [303]:
num_leaves = len(np.unique(cthl_predict))
print(num_leaves)

15


In [304]:
labels = list(range(1, len(np.unique(cthl_predict)) + 1))

In [305]:
labels

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

In [306]:
predict = pd.DataFrame({"predict": cthl_predict})
predict['leaves'] = pd.Categorical(predict.predict)

In [307]:
predict['leaves'] = predict['leaves'].cat.rename_categories(labels)
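To see what the relabeling does, here is a small sketch on a made-up array standing in for the tree predictions (the values are hypothetical). `pd.Categorical` sorts the distinct values in ascending order, so after renaming, leaf 1 corresponds to the most negative predicted effect.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tree predictions (values are made up)
toy_predict = np.array([-0.37, -0.25, -0.37, -0.41, -0.25])

# Categories are the sorted distinct values: [-0.41, -0.37, -0.25]
leaves = pd.Categorical(toy_predict)
print(list(leaves.categories))

# Rename the sorted categories to 1, 2, 3, ...
labels = list(range(1, len(leaves.categories) + 1))
leaves = leaves.rename_categories(labels)
print(list(leaves))  # -> [2, 3, 2, 1, 3]
```

Since each leaf of the tree produces a single predicted value, grouping rows by distinct prediction recovers leaf membership, assuming no two leaves happen to share exactly the same estimate.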

In [308]:
predict

Unnamed: 0,predict,leaves
0,-0.369901,7
1,-0.369901,7
2,-0.369901,7
3,-0.369901,7
4,-0.369901,7
...,...,...
9594,-0.406130,5
9595,-0.433803,3
9596,-0.250101,10
9597,-0.369901,7
