## Average Treatment Effect on Patients' Outcome

Import required libraries.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from causalinference import CausalModel

Load the dataset and record the type of each column (binary, categorical, or numeric).

In [2]:
# Load the dataset
df = pd.read_csv("mydata.csv")

# Column types; W is the treatment indicator and Y is the outcome
binary_columns = ['X5', 'W', 'Y']
categorical_columns = ['X6', 'X8']
numeric_columns = ['X1', 'X2', 'X3', 'X4', 'X7', 'X9']

Define the covariates and the treatment indicator. The data are not split into treatment and control groups by hand; the treatment indicator is passed to the causal model later.

In [3]:
# Define covariates and treatment indicator
covariates = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9']
treatment_indicator = 'W'
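
As a quick consistency check, the typed column lists should cover exactly the covariates once the treatment and outcome columns are set aside. The sketch below repeats the lists from the cells above; the name `outcome` is introduced here for illustration and assumes that 'Y' is the outcome column.

```python
# Column lists repeated from the cells above.
binary_columns = ['X5', 'W', 'Y']
categorical_columns = ['X6', 'X8']
numeric_columns = ['X1', 'X2', 'X3', 'X4', 'X7', 'X9']
covariates = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9']
treatment_indicator = 'W'
outcome = 'Y'  # assumption: 'Y' is the outcome column

typed_columns = binary_columns + categorical_columns + numeric_columns

# No column should be listed under two types.
duplicates = {c for c in typed_columns if typed_columns.count(c) > 1}

# Every covariate should have a type, and every typed column other than
# the treatment and the outcome should be a covariate.
expected = set(typed_columns) - {treatment_indicator, outcome}
missing = expected - set(covariates)
untyped = set(covariates) - expected

print("duplicates:", duplicates)
print("typed but not in covariates:", missing)
print("covariates without a type:", untyped)
```

All three sets are empty when the lists are consistent.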

Create the preprocessing step: standardize the numeric columns, one-hot encode the categorical columns, and pass the remaining binary covariate (X5) through unchanged.

In [4]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_columns),
        ('cat', OneHotEncoder(), categorical_columns),
    ],
    remainder='passthrough',
    # Always return a dense array so that pd.DataFrame(X) works downstream
    sparse_threshold=0
)
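
To see what such a transformer produces, here is a minimal sketch on a small made-up table. The column names (`age`, `dose`, `ward`, `smoker`) and values are invented for illustration and are not taken from mydata.csv. Setting `sparse_threshold=0` forces a dense array in this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data: two numeric, one categorical, one binary column.
toy = pd.DataFrame({
    "age": [30.0, 45.0, 60.0, 50.0],
    "dose": [1.0, 2.0, 3.0, 2.0],
    "ward": ["a", "b", "c", "a"],
    "smoker": [0, 1, 0, 1],
})

toy_preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age", "dose"]),
        ("cat", OneHotEncoder(), ["ward"]),
    ],
    remainder="passthrough",  # 'smoker' is appended unchanged
    sparse_threshold=0,       # always return a dense array
)

toy_X = np.asarray(toy_preprocessor.fit_transform(toy), dtype=float)

# Output columns follow the transformer order: scaled numeric columns,
# then one indicator per category level, then the passthrough column.
print(toy_X.shape)
print(toy_X.round(2))
```

The result has two scaled columns, three indicator columns (one per level of `ward`), and the untouched binary column.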

Apply the preprocessor to the covariates and store the result in a dataframe X, whose underlying array is passed to the "causalinference" package.

In [5]:
# Transform X for causalinference package requirements
X = df[covariates]
X = preprocessor.fit_transform(X)
X = pd.DataFrame(X)

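Instantiate the causal model with the outcome Y, the treatment indicator W, and the preprocessed covariates X.

In [6]:
# Build the causal model from the outcome, treatment, and covariate arrays
cm = CausalModel(Y=df['Y'].values, D=df[treatment_indicator].values, X=X.values)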
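
Because the model is built from plain arrays, it can help to check them first. The helper below is a hypothetical sketch, not part of "causalinference". It assumes a 1-D outcome, a 1-D 0/1 treatment indicator, a 2-D covariate matrix, and no missing values; these are sensible conventions for this analysis rather than documented requirements of the package.

```python
import numpy as np

def check_causal_inputs(Y, D, X):
    """Hypothetical helper: basic shape and value checks on Y, D, and X."""
    Y = np.asarray(Y, dtype=float)
    D = np.asarray(D)
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be 2-D (units x covariates)")
    n = X.shape[0]
    if Y.shape != (n,) or D.shape != (n,):
        raise ValueError("Y and D must be 1-D with one entry per row of X")
    if not np.isin(D, [0, 1]).all():
        raise ValueError("D must contain only 0 and 1")
    if np.isnan(Y).any() or np.isnan(X).any():
        raise ValueError("Y and X must not contain missing values")
    return Y, D.astype(int), X

# Made-up example arrays.
Y_ok, D_ok, X_ok = check_causal_inputs(
    Y=[1, 0, 1], D=[1, 0, 1], X=[[0.2, 1.0], [-0.5, 0.0], [1.3, 1.0]]
)
print(Y_ok.shape, D_ok.shape, X_ok.shape)
```

In the notebook, the same check could be applied to `df['Y'].values`, `df['W'].values`, and `X.values` before the model is built.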

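Estimate the propensity score by logistic regression with data-driven covariate selection (Imbens & Rubin, 2015), trim the sample using a data-driven propensity-score cutoff (Crump, Hotz, Imbens, & Mitnik, 2009), and estimate the average treatment effect (ATE) by nearest-neighbour matching. The results also report the average treatment effect on the controls (ATC) and on the treated (ATT), each with a standard error and a 95% confidence interval.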
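
Before the library calls, the ideas behind trimming and matching can be illustrated on synthetic data. The sketch below is a simplified stand-in, not the "causalinference" implementation. It assumes a known propensity score, a fixed cutoff `alpha=0.1`, Euclidean distance, a single nearest neighbour with replacement, and no bias adjustment. The function names and the simulated data, with a true effect of 2, are invented for illustration.

```python
import numpy as np

def trim_by_propensity(pscore, alpha=0.1):
    """Boolean mask keeping units with alpha <= pscore <= 1 - alpha."""
    return (pscore >= alpha) & (pscore <= 1.0 - alpha)

def nn_matching_ate(Y, D, X):
    """ATE by 1-nearest-neighbour matching with replacement (simplified)."""
    treated = np.flatnonzero(D == 1)
    control = np.flatnonzero(D == 0)
    # Squared Euclidean distances, shape (n_treated, n_control).
    diff = X[treated][:, None, :] - X[control][None, :, :]
    dist = (diff ** 2).sum(axis=2)
    nearest_control = control[dist.argmin(axis=1)]  # match for each treated unit
    nearest_treated = treated[dist.argmin(axis=0)]  # match for each control unit
    effects = np.empty(len(Y), dtype=float)
    # Treated units: observed outcome minus the matched control outcome.
    effects[treated] = Y[treated] - Y[nearest_control]
    # Control units: matched treated outcome minus the observed outcome.
    effects[control] = Y[nearest_treated] - Y[control]
    return effects.mean()

# Synthetic data: one confounder x, true treatment effect of 2.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
true_pscore = 1.0 / (1.0 + np.exp(-x))  # assumed known in this sketch
D = (rng.random(n) < true_pscore).astype(int)
Y = 2.0 * D + x + rng.normal(scale=0.5, size=n)

keep = trim_by_propensity(true_pscore, alpha=0.1)
ate_hat = nn_matching_ate(Y[keep], D[keep], x[keep].reshape(-1, 1))
print(f"units kept: {keep.sum()} of {n}; matching ATE estimate: {ate_hat:.3f}")
```

In the notebook itself the propensity score is estimated rather than known, and the cutoff is chosen by the library rather than fixed.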

In [7]:
cm.est_propensity_s()
cm.trim_s()
cm.est_via_matching()

print(cm.estimates)


Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.205      0.015     13.791      0.000      0.176      0.234
           ATC      0.217      0.017     12.790      0.000      0.184      0.250
           ATT      0.196      0.017     11.585      0.000      0.163      0.229
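
The reported intervals can be checked by hand. Assuming they are normal-approximation intervals, estimate ± 1.96 × standard error, which is consistent with the z and P>|z| columns, the sketch below recomputes them from the rounded table values. Small differences in the third decimal are expected because the inputs are already rounded.

```python
# Values copied from the results table above (rounded to three decimals):
# name -> (estimate, standard error, reported lower, reported upper)
table = {
    "ATE": (0.205, 0.015, 0.176, 0.234),
    "ATC": (0.217, 0.017, 0.184, 0.250),
    "ATT": (0.196, 0.017, 0.163, 0.229),
}

Z_975 = 1.96  # approximate 97.5th percentile of the standard normal

recomputed = {}
for name, (est, se, lo, hi) in table.items():
    calc_lo = est - Z_975 * se
    calc_hi = est + Z_975 * se
    recomputed[name] = (calc_lo, calc_hi)
    print(f"{name}: recomputed [{calc_lo:.3f}, {calc_hi:.3f}], "
          f"reported [{lo:.3f}, {hi:.3f}]")
```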



References:

Crump, R. K., Hotz, V. J., Imbens, G. W., & Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199. Retrieved from http://www.jstor.org/stable/27798811

Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge: Cambridge University Press.