## Sensitivity Analysis with the causalsens Package

In this notebook, we explore sensitivity analysis using the R [causalsens](https://cran.r-project.org/web/packages/causalsens/index.html) package. Sensitivity analysis quantifies how much bias unobserved confounding could introduce into a causal inference result. This matters because most causal inference algorithms rely on an ignorability assumption: conditional on the observed covariates, the treated units are comparable to the control units. In practice, we are often uncertain whether this assumption holds.

See the [paper](https://www.mattblackwell.org/files/papers/causalsens.pdf) for more technical details.
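
To make the ignorability concern concrete, here is a minimal simulation sketch, separate from the `whynot` experiment used below. All names and coefficients in it (`x`, `u`, `true_effect`, the logistic treatment model) are assumptions chosen for illustration. It shows that a regression adjusting only for the observed covariate can miss the true effect when an unobserved variable drives both treatment and outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_effect = 2.0  # assumed ground-truth treatment effect for this toy example

x = rng.normal(size=n)  # observed covariate
u = rng.normal(size=n)  # unobserved confounder (hypothetical)

# Treatment is more likely when x or u is large.
p_treat = 1.0 / (1.0 + np.exp(-(0.5 * x + 1.5 * u)))
z = rng.binomial(1, p_treat)

# The outcome depends on the treatment, on x, and on the unobserved u.
y = true_effect * z + x + 3.0 * u + rng.normal(size=n)


def ols_treatment_coef(columns, y, z):
    """Return the OLS coefficient on z after adjusting for the given columns."""
    design = np.column_stack([np.ones_like(y), z] + columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


naive = ols_treatment_coef([x], y, z)      # adjusts for x only
oracle = ols_treatment_coef([x, u], y, z)  # also adjusts for u (impossible in practice)

print(f"true effect:           {true_effect:.2f}")
print(f"adjusting for x only:  {naive:.2f}")
print(f"adjusting for x and u: {oracle:.2f}")
```

Since `u` is never observed in real data, the second regression is not available to an analyst. Sensitivity analysis instead asks how the estimate would move under different assumed strengths of confounding.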

We begin by importing standard data science libraries, the `whynot` package, and the `rpy2` modules needed to call the R package from the Jupyter notebook.

In [1]:
import matplotlib.pyplot as plt
import numpy as np

import whynot as wn

In [2]:
from rpy2.robjects.packages import importr
import rpy2.robjects as ro

package_name = "causalsens"

# Import the R package, installing it from CRAN first if it is not available
try:
    pkg = importr(package_name)
except Exception:
    ro.r(f'install.packages("{package_name}")')
    pkg = importr(package_name)

stats = importr('stats')
base = importr('base')
grdevices = importr('grDevices')

We work with the Opioid Unobserved Confounding Experiment. We generate our dataset and compute the estimated effects using the `causal_suite`.

In [3]:
num_samples = 100
exp = wn.opioid.UnobservedConfounding
dset = exp.run(num_samples=num_samples)
estimated_effects = wn.causal_suite(dset.covariates, dset.treatments, dset.outcomes)


We calculate the true average treatment effect.

In [4]:
np.mean(dset.true_effects)

-16590.944766502176

These are the estimates of the average treatment effect using the various algorithms in our causal suite.

In [5]:
for key in estimated_effects:
    print(key)
    print(estimated_effects[key].ate)

ols
-12815.999660766945
propensity_score_matching
9253.106692397596
propensity_weighted_ols
-15337.879272539343
causal_forest
-18293.209287837348
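
To compare these estimates with the true effect more directly, the short sketch below computes each estimator's error. The numbers are copied from the outputs above, so the comparison describes this single run of 100 samples only.

```python
# Values copied from the notebook outputs above (one run, 100 samples).
true_ate = -16590.944766502176
estimates = {
    "ols": -12815.999660766945,
    "propensity_score_matching": 9253.106692397596,
    "propensity_weighted_ols": -15337.879272539343,
    "causal_forest": -18293.209287837348,
}

# Signed error of each estimate relative to the true average treatment effect.
errors = {name: est - true_ate for name, est in estimates.items()}

# Print estimators from smallest to largest absolute error.
for name, err in sorted(errors.items(), key=lambda kv: abs(kv[1])):
    same_sign = (estimates[name] < 0) == (true_ate < 0)
    print(f"{name:28s} error = {err:10.1f}   same sign as truth: {same_sign}")

closest = min(errors, key=lambda name: abs(errors[name]))
```

In this run, propensity-weighted OLS lands closest to the true effect, while propensity score matching returns an estimate with the opposite sign.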


We place the data into an R data frame, fit a regression model for the outcomes and a propensity score model (a logistic regression), and define a grid of $\alpha$ values.

In [6]:
d = {'x1': dset.covariates[:, 0], 'x2': dset.covariates[:, 1], 'y': dset.outcomes, 'z': dset.treatments}
dataf = ro.DataFrame(d)

linear_model = stats.lm("y ~ x1 + x2 + z", data=dataf)
p_model = stats.glm("z ~ x1 + x2", data=dataf, family=stats.binomial())

alpha = np.arange(-4500, 4500, 250)
ll_sens = pkg.causalsens(linear_model, p_model, ro.Formula('~ x1 + x2'), data=dataf, alpha=alpha, confound=pkg.one_sided_att)
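
One detail of the $\alpha$ grid is worth checking: `np.arange` excludes its stop value, so the grid above runs from -4500 to 4250 rather than to 4500. The sketch below inspects the grid and shows one possible symmetric alternative; `alpha_symmetric` is a hypothetical name and is not used elsewhere in this notebook.

```python
import numpy as np

# The grid used in the notebook: the stop value 4500 is excluded.
alpha = np.arange(-4500, 4500, 250)
print(len(alpha), alpha.min(), alpha.max())

# alpha = 0 (no unobserved confounding) is part of the grid.
print(0 in alpha)

# Hypothetical alternative: extend the stop by one step so the grid is symmetric.
alpha_symmetric = np.arange(-4500, 4500 + 250, 250)
print(len(alpha_symmetric), alpha_symmetric.min(), alpha_symmetric.max())
```

Whether the asymmetry matters depends on how the plots are read. It only drops the single largest positive value of $\alpha$.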

We plot the estimated effect against the amount of raw confounding (in terms of $\alpha$), save the plot to a file, and display it with a Markdown image link. We can see that the true effect is contained in the confidence bounds for all values of $\alpha$.

In [7]:
grdevices.png(file="plots/amt_confounding.png", width=512, height=512)
ro.r.plot(ll_sens, type="raw", bty="n")
grdevices.dev_off();

![](plots/amt_confounding.png)
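
As a rough intuition for how such a curve can arise, the sketch below mimics the general outcome-adjustment idea: subtract an assumed confounding term from the outcomes, then re-estimate the effect for each $\alpha$. It uses toy data and plain NumPy, not `causalsens`. The function `confound_controls_only`, its functional form, and its sign convention are assumptions for illustration and may differ from what `one_sided_att` does, so the package source remains the reference.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                       # observed covariate
z = rng.binomial(1, 0.5, size=n)             # treatment indicator
y = 1.0 * z + 2.0 * x + rng.normal(size=n)   # toy outcome


def treatment_coef(y, z, x):
    """OLS coefficient on z in the regression y ~ 1 + z + x."""
    design = np.column_stack([np.ones(len(y)), z, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def confound_controls_only(alpha, z):
    """Hypothetical confounding function: shifts control outcomes only (assumed form)."""
    return -alpha * (1 - z)


alphas = np.linspace(-2.0, 2.0, 9)
baseline = treatment_coef(y, z, x)

# Re-estimate the effect on the adjusted outcomes for each assumed alpha.
curve = np.array(
    [treatment_coef(y - confound_controls_only(a, z), z, x) for a in alphas]
)

for a, est in zip(alphas, curve):
    print(f"alpha = {a:5.2f}   adjusted estimate = {est:6.3f}")
```

With this particular assumed form, the adjusted estimate is a straight line in $\alpha$ that passes through the unadjusted estimate at $\alpha = 0$. Other confounding functions give differently shaped curves.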

We also plot the estimated effect in terms of the variance explained by the confounding. Again, the true effect is contained in the confidence bounds for all values of $\alpha$.

In [8]:
grdevices.png(file="plots/var_confounding.png", width=512, height=512)
ro.r.plot(ll_sens, type="r.squared", bty="n")
grdevices.dev_off();

![](plots/var_confounding.png)
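
The plotting cells write into a `plots/` directory, and R's `png` device does not create missing directories. A small sketch of how to create the directory from Python before plotting, assuming the notebook's working directory is where `plots/` should live:

```python
from pathlib import Path

# Create the output directory for the figures if it does not exist yet.
plots_dir = Path("plots")
plots_dir.mkdir(parents=True, exist_ok=True)
print(plots_dir.resolve())
```

Running this once before the first `grdevices.png(...)` call is enough, and `exist_ok=True` makes it safe to re-run.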

Let's see what happens when we change the number of datapoints in our dataset.

In [9]:
num_samples = 500
exp = wn.opioid.UnobservedConfounding
dset = exp.run(num_samples=num_samples)
np.mean(dset.true_effects)

-16599.097457592823

In [10]:
def create_plot(dset, num_points):
    """Run the sensitivity analysis on the first num_points samples and save the raw-confounding plot."""
    # Build an R data frame from the first num_points samples
    d = {
        'x1': dset.covariates[:num_points, 0],
        'x2': dset.covariates[:num_points, 1],
        'y': dset.outcomes[:num_points],
        'z': dset.treatments[:num_points],
    }
    dataf = ro.DataFrame(d)

    # Outcome regression and logistic propensity score model
    linear_model = stats.lm("y ~ x1 + x2 + z", data=dataf)
    p_model = stats.glm("z ~ x1 + x2", data=dataf, family=stats.binomial())

    alpha = np.arange(-4500, 4500, 250)
    ll_sens = pkg.causalsens(linear_model, p_model, ro.Formula('~ x1 + x2'), data=dataf, alpha=alpha, confound=pkg.one_sided_att)

    # Save the plot of the estimated effect against raw confounding
    grdevices.png(file=f"plots/amt_confounding_{num_points}.png", width=512, height=512)
    ro.r.plot(ll_sens, type="raw", bty="n")
    grdevices.dev_off()

When we only have 100 points:

In [11]:
create_plot(dset, 100)

![](plots/amt_confounding_100.png)

When we only have 200 points:

In [12]:
create_plot(dset, 200)

![](plots/amt_confounding_200.png)

When we have 500 points:

In [13]:
create_plot(dset, 500)

![](plots/amt_confounding_500.png)

As we can see, the confidence intervals of the estimated effect shrink as the dataset grows. More data reduces sampling variability but does not remove the bias from unobserved confounding, so for 200 and 500 datapoints the confidence interval does not contain the true effect for many values of $\alpha$!
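
A back-of-the-envelope sketch of the shrinkage, assuming the usual $1/\sqrt{n}$ scaling of the standard error and a normal-approximation 95% interval. The value of `sigma` is a placeholder, so only the ratios between sample sizes are meaningful here, not the absolute widths.

```python
import math

sigma = 1.0  # assumed residual standard deviation (placeholder units)

# Half-width of a normal-approximation 95% interval for each sample size.
half_widths = {n: 1.96 * sigma / math.sqrt(n) for n in (100, 200, 500)}

for n, hw in half_widths.items():
    print(f"n = {n:3d}: 95% CI half-width ~ {hw:.3f} * sigma")

# How much narrower the interval is at n = 500 than at n = 100.
shrink = half_widths[100] / half_widths[500]
print(f"shrink factor from 100 to 500 points: {shrink:.2f}")
```

Under this scaling, going from 100 to 500 points narrows the interval by a factor of $\sqrt{5}$, while any bias from unobserved confounding stays where it was.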