# Proximal Causal Inference Implementation
The following code recovers the average causal effect of a treatment A on an outcome Y in the proximal causal inference setup, where W and Z are proxies of an unmeasured confounder U, by fitting two regressions in sequence. Here, U, W, and Z are binary variables, so the first-stage model for W is a logistic regression. Y, on the other hand, is a continuous variable, so the second-stage model is an ordinary linear regression (a GLM with a Gaussian family). The data are simulated with a true effect of A on Y equal to 1.3, so the estimate can be checked against that value.
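
The derivation below is a sketch of why the second regression can recover the effect. It assumes the structure used in the simulation that follows: Y is linear in A and U, W depends on U only, and Z is related to Y only through U. The symbols $\beta_A$, $\beta_U$, $p_0$, and $p_1$ are introduced here for illustration and do not appear in the code.

```latex
\begin{aligned}
E[Y \mid A, Z] &= \beta_A A + \beta_U \, E[U \mid A, Z], \\
E[W \mid A, Z] &= p_0 + (p_1 - p_0) \, E[U \mid A, Z], \qquad p_u = P(W = 1 \mid U = u), \\
\Rightarrow \quad E[Y \mid A, Z] &= \beta_A A + \frac{\beta_U}{p_1 - p_0} \left( E[W \mid A, Z] - p_0 \right).
\end{aligned}
```

Under these assumptions, the coefficient on A in a regression of Y on A and $E[W \mid A, Z]$ targets $\beta_A$, provided that $p_1 \neq p_0$, that Z carries information about U, and that the first-stage model approximates $E[W \mid A, Z]$ reasonably well. In the simulation, $\beta_A = 1.3$, $\beta_U = 1.4$, $p_0 = \mathrm{expit}(0) = 0.5$, and $p_1 = \mathrm{expit}(1.3)$.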

In [30]:
import pandas as pd
import numpy as np
from scipy.special import expit
import statsmodels.api as sm

np.random.seed(0)
size = 10000
verbose = True

# U = np.random.normal(0, 1, size)
U = np.random.binomial(1, 0.48, size)

# W = np.random.normal(0, 1, size) + 1.3*U
W = np.random.binomial(1, expit(1.3*U), size)

# Z = np.random.normal(0, 1, size) + 2*U
Z = np.random.binomial(1, expit(2*U), size)

A = np.random.binomial(1, expit(0.8*U), size)
if verbose:
    print(np.mean(A))

Y = np.random.normal(0, 1, size) + 1.3*A + 1.4*U

data = pd.DataFrame({"U": U, "W": W, "A": A, "Y": Y, "Z": Z})

0.6007


In [32]:
# Stage 1: logistic regression of the proxy W on the treatment A and the proxy Z
model1 = sm.GLM.from_formula(formula="W~A+Z", data=data, family=sm.families.Binomial()).fit()
print(model1.params)

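# Predicted probability that W = 1 given A and Z
What = model1.predict(data)
data["What"] = What

# Stage 2: linear regression of Y on A and the stage 1 prediction;
# the coefficient on A estimates the causal effect (true value 1.3)
model2 = sm.GLM.from_formula(formula="Y~A+What", data=data, family=sm.families.Gaussian()).fit()
print(model2.params["A"])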

Intercept    0.061537
A            0.204127
Z            0.505602
dtype: float64
1.3468085291324319
