In [2]:
import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal
import scipy.sparse as sp
from statsmodels.api import Logit

import gurobipy as gp
from gurobipy import GRB

This notebook implements the simulation section of Yu's paper, "Balancing Weights for Causal Inference in Observational Factorial Studies" (Yu 5.1).

In [27]:
rseed = 42
np.random.seed(rseed)
n = 500 # set as 500, 1000, 2000
rho = 0 # set as 0, 0.2, 0.4 for each scenario

Following the specification in Yu: "the treatment assignment mechanism for $Z_{ik}$ is independent across $k$'s and satisfies a logistic regression that $P(Z_{ik} = 1) = \frac{1}{1+\exp(-\beta_k^T X_i)}$ where $\beta_1 = (\frac{1}{4}, \frac{2}{4}, 0, \frac{3}{4}, 1)$, $\beta_2 = (\frac{3}{4}, \frac{1}{4}, 1, 0, \frac{2}{4})$, $\beta_3 = (1, 0, \frac{3}{4}, \frac{2}{4}, \frac{1}{4})$."

This treatment assignment gives each of the $2^3=8$ treatment combinations positive probability, so all eight groups are expected to be non-empty and observed, as the paper's proposed weighting estimators require. We additionally assume conditional independence of the factors given the covariates, and that only the main effects of the three treatments are non-negligible.

In [33]:
# Define mean vector (mu)
mu = np.array([0.1, 0.1, 0.1, 0, 0])

# Define the 5x5 covariance matrix (Sigma) according to the paper's specification:
#   - ones on the diagonal, correlation coefficient rho everywhere else
# dtype=float keeps the matrix floating-point even when rho is the integer 0
Sigma = np.full((5, 5), rho, dtype=float)
np.fill_diagonal(Sigma, 1)

# Generate covariates (X) from multivariate normal
np.random.seed(rseed)
X = multivariate_normal.rvs(mean=mu, cov=Sigma, size=n)

In [34]:
# Define beta coefficients for the treatment assignment models (chosen so that all treatment combinations are observed)
beta_1 = np.array([1/4, 2/4, 0, 3/4, 1])
beta_2 = np.array([3/4, 1/4, 1, 0, 2/4])
beta_3 = np.array([1, 0, 3/4, 2/4, 1/4])

# Logistic function to generate treatment assignments
def logistic_prob(X, beta):
    return 1 / (1 + np.exp(-X @ beta))

# Generate treatment assignments independently
Z1 = np.random.binomial(1, logistic_prob(X, beta_1), size=n)
Z2 = np.random.binomial(1, logistic_prob(X, beta_2), size=n)
Z3 = np.random.binomial(1, logistic_prob(X, beta_3), size=n)
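
The claim that all eight treatment combinations are observed can be checked directly on a simulated sample. The following is a minimal, self-contained sketch: it re-creates the design with a local `np.random.default_rng` generator (an assumption made so the block runs on its own; the notebook cells use `np.random.seed`), and the names `cell`, `counts`, and `all_observed` are illustrative only.

```python
import numpy as np

# Self-contained re-creation of the design (n = 500, rho = 0).
# Using a local Generator is an assumption; the notebook uses np.random.seed.
rng = np.random.default_rng(42)
n, rho = 500, 0.0
mu = np.array([0.1, 0.1, 0.1, 0.0, 0.0])
Sigma = np.full((5, 5), rho, dtype=float)
np.fill_diagonal(Sigma, 1.0)
X = rng.multivariate_normal(mu, Sigma, size=n)

# Rows are beta_1, beta_2, beta_3 from the specification above.
betas = np.array([
    [1/4, 2/4, 0, 3/4, 1],
    [3/4, 1/4, 1, 0, 2/4],
    [1, 0, 3/4, 2/4, 1/4],
])
probs = 1.0 / (1.0 + np.exp(-X @ betas.T))  # shape (n, 3): P(Z_ik = 1)
Z = rng.binomial(1, probs)                   # shape (n, 3): independent draws

# Encode each (Z1, Z2, Z3) combination as an integer 0..7 and count units per cell.
cell = Z @ np.array([4, 2, 1])
counts = np.bincount(cell, minlength=8)
for code, count in enumerate(counts):
    combo = tuple(int(bit) for bit in np.binary_repr(code, width=3))
    print(f"(Z1, Z2, Z3) = {combo}: {count} units")

all_observed = bool((counts > 0).all())
print("All 8 treatment combinations observed:", all_observed)
```

If any cell count were zero, the corresponding treatment group would be empty and the sample would need to be redrawn (or `n` increased) before estimation.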

For our problem setup, we next consider three outcome models, each with errors following a standard normal distribution.

An additive outcome:
$Y_{i1} = 2\sum_{k=1}^5 X_{ik} + \sum_{j=1}^3 Z_{ij} + \epsilon_{i1}$

A heterogeneous treatment effect outcome:
$Y_{i2} = 2\sum_{k=1}^5 X_{ik} + \sum_{k=1}^5 X_{ik} \sum_{j=1}^3 Z_{ij} + \epsilon_{i2}$

A misspecified outcome:
$Y_{i3} = \sin(X_{i1}) + \cos(X_{i2}) + (\min(1, X_{i1}) + X_{i2})Z_{i1} + \sum_{k=1}^5 X_{ik} \sum_{j=2}^3 Z_{ij} + \epsilon_{i3}$
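
The three outcome equations translate almost term by term into NumPy. The sketch below is one possible way to generate them; the function name `simulate_outcomes`, the `(n, 3)` layout of `Z`, and the choice to draw the three error terms independently of each other are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def simulate_outcomes(X, Z, rng):
    """Return (Y1, Y2, Y3) for covariates X (n x 5) and binary treatments Z (n x 3).

    Hypothetical helper: errors are drawn as independent N(0, 1) columns,
    one per outcome model (an assumption).
    """
    n = X.shape[0]
    eps = rng.standard_normal((n, 3))
    x_sum = X.sum(axis=1)  # sum_k X_ik
    z_sum = Z.sum(axis=1)  # sum_j Z_ij

    # Additive outcome
    Y1 = 2 * x_sum + z_sum + eps[:, 0]
    # Heterogeneous treatment effect outcome
    Y2 = 2 * x_sum + x_sum * z_sum + eps[:, 1]
    # Misspecified outcome (nonlinear in X_i1 and X_i2)
    Y3 = (np.sin(X[:, 0]) + np.cos(X[:, 1])
          + (np.minimum(1, X[:, 0]) + X[:, 1]) * Z[:, 0]
          + x_sum * Z[:, 1:].sum(axis=1)
          + eps[:, 2])
    return Y1, Y2, Y3

# Small usage example on toy inputs (not the notebook's X and Z).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((10, 5))
Z_demo = rng.integers(0, 2, size=(10, 3))
Y1_demo, Y2_demo, Y3_demo = simulate_outcomes(X_demo, Z_demo, rng)
print(Y1_demo.shape, Y2_demo.shape, Y3_demo.shape)
```

In the notebook, the same function could be called with `np.column_stack([Z1, Z2, Z3])` as the treatment matrix.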

Additionally, four estimators are implemented for each main effect $\tau_k$, $k = 1, 2, 3$, under each outcome model:
- The additive regression estimator
- The interaction regression estimator
- The weighting estimator under the general additive model assumption, with covariate basis functions $h_s(X) = X_s$, $s = 1, \ldots, 5$
- The weighting estimator with balance constraints under the outcome model specification with treatment effect heterogeneity, using the same basis functions
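
As a rough illustration of the two regression estimators in the list above, the sketch below fits them by ordinary least squares with `np.linalg.lstsq`. The exact specifications are assumptions: the additive version regresses $Y$ on an intercept, $X$, and the 0/1-coded $Z$; the interaction version additionally interacts each $Z_k$ with the centered covariates. The paper's own specifications may differ, and the function names are hypothetical. Under the additive outcome $Y_{i1}$, every true main effect equals 1, which gives a simple sanity check.

```python
import numpy as np

def additive_regression_effects(Y, X, Z):
    """OLS of Y on [1, X, Z]; returns the coefficients on the columns of Z."""
    n, K = Z.shape
    design = np.column_stack([np.ones(n), X, Z])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[-K:]

def interaction_regression_effects(Y, X, Z):
    """OLS of Y on [1, Xc, Z, Z_k * Xc] with centered covariates Xc (assumed form).

    Returns the coefficients on the columns of Z, i.e. the estimated
    effects evaluated at the sample mean of X.
    """
    n, K = Z.shape
    p = X.shape[1]
    Xc = X - X.mean(axis=0)
    interactions = np.column_stack([Z[:, [k]] * Xc for k in range(K)])
    design = np.column_stack([np.ones(n), Xc, Z, interactions])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1 + p : 1 + p + K]

# Self-contained demo on the additive outcome Y1 (n = 2000, rho = 0).
rng = np.random.default_rng(42)
n = 2000
mu = np.array([0.1, 0.1, 0.1, 0.0, 0.0])
X = rng.multivariate_normal(mu, np.eye(5), size=n)
betas = np.array([
    [1/4, 2/4, 0, 3/4, 1],
    [3/4, 1/4, 1, 0, 2/4],
    [1, 0, 3/4, 2/4, 1/4],
])
Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ betas.T)))
Y1 = 2 * X.sum(axis=1) + Z.sum(axis=1) + rng.standard_normal(n)

tau_additive = additive_regression_effects(Y1, X, Z)
tau_interaction = interaction_regression_effects(Y1, X, Z)
print("additive regression estimates:   ", np.round(tau_additive, 3))
print("interaction regression estimates:", np.round(tau_interaction, 3))
```

Both sets of estimates should land close to 1 for $Y_{i1}$, since both regressions are correctly specified for the additive outcome. The weighting estimators are not sketched here because they require solving a constrained optimization problem.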

https://par.nsf.gov/servlets/purl/10337012 - Regression based causal inference with factorial experiments