# Sensitivity Analysis with the *causalsens* Package

In this notebook, we explore sensitivity analysis using the R [causalsens](https://cran.r-project.org/web/packages/causalsens/index.html) package. Sensitivity analysis attempts to quantify how much bias could be present in causal inference results. This matters because most causal inference algorithms rely on an ignorability assumption, under which the treated units are comparable to the control units. In practice, we are often uncertain whether this assumption holds.

See the [paper](https://www.mattblackwell.org/files/papers/causalsens.pdf) for more technical details.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import rpy2.robjects.packages as rpackages
import rpy2.robjects as ro
from rpy2.robjects import numpy2ri, pandas2ri
numpy2ri.activate()
pandas2ri.activate()

import whynot as wn



In [2]:
package_name = "causalsens"
try:
    causalsens = rpackages.importr(package_name)
except Exception:
    # The package is not available yet, so install it from CRAN and retry.
    utils = rpackages.importr("utils")
    utils.chooseCRANmirror(ind=1)
    utils.install_packages(package_name)
    causalsens = rpackages.importr(package_name)

stats = rpackages.importr('stats')
base = rpackages.importr('base')
grdevices = rpackages.importr('grDevices')



## Generating confounded data

We generate data with *unobserved* confounding using the `wn.opioid.UnobservedConfounding` experiment on the [opioid simulator](https://whynot-docs.readthedocs-hosted.com/en/latest/simulators.html#opioid-simulator).

In [3]:
dset = wn.opioid.UnobservedConfounding.run(num_samples=100)

In [4]:
print(f"True average treatment effect: {np.mean(dset.true_effects):.2f} opioid deaths")

True average treatment effect: -16589.00 opioid deaths


## Estimating treatment effects

We compute estimated treatment effects on this dataset using the collection of estimators provided in the `causal_suite`.

In [5]:
estimated_effects = wn.causal_suite(dset.covariates, dset.treatments, dset.outcomes)
for estimator, estimate in estimated_effects.items():
    print(f"{estimator}: {estimate.ate:.2f}")

ols: -14365.12
propensity_score_matching: -682.56
propensity_weighted_ols: -15721.77
causal_forest: -8832.66


## Running sensitivity analysis

We perform a sensitivity analysis using the *causalsens* package. To do this, we first fit a regression model for the outcomes, as well as a propensity score model (a logistic regression). We then compute interval bounds for the treatment effect, *accounting for unobserved confounding* via a *confounding function* that is parameterized by a single scalar, $\alpha$.

When $\alpha > 0$, observed potential outcomes $Y$ are on average higher than their counterfactuals for all $X$; when $\alpha < 0$, they are on average lower. The setting $\alpha = 0$ corresponds to no unobserved confounding.
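
To make the role of $\alpha$ concrete, here is a minimal NumPy sketch of the outcome adjustment as we understand it from the linked paper: each outcome is shifted by the confounding function $q$ times the probability of the opposite treatment arm. The helper `adjust_outcomes` and the toy arrays are hypothetical and are not part of *causalsens*; the sketch uses a constant confounding function $q = \alpha$, which is an assumption and may differ from the `one_sided_att` variant used in this notebook.

```python
import numpy as np


def adjust_outcomes(y, z, pscore, q):
    """Shift each outcome by q times the probability of the opposite treatment arm.

    Hypothetical helper for illustration only, not the causalsens implementation.
    """
    # Treated units (z == 1) use P(Z = 0 | X); control units use P(Z = 1 | X).
    prob_opposite_arm = np.where(z == 1, 1.0 - pscore, pscore)
    return y - q * prob_opposite_arm


# Toy data: outcomes, treatment indicators, and assumed-known propensity scores.
y = np.array([10.0, 12.0, 4.0, 6.0])
z = np.array([1, 1, 0, 0])
pscore = np.array([0.8, 0.6, 0.4, 0.2])

for alpha in (-2.0, 0.0, 2.0):
    q = np.full_like(y, alpha)  # constant confounding function q = alpha
    print(alpha, adjust_outcomes(y, z, pscore, q))
```

With $\alpha = 0$ the outcomes are returned unchanged, which matches the no-unobserved-confounding case described above.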

In [6]:
df = ro.DataFrame({
    'x1': dset.covariates[:, 0],
    'x2': dset.covariates[:, 1], 
    'y': dset.outcomes,
    'z': dset.treatments})

linear_model = stats.lm("y ~ x1 + x2 + z", data=df)
p_model = stats.glm("z ~ x1 + x2", data=df, family=stats.binomial())

alpha = np.arange(-4500, 4500, 250)
ll_sens = causalsens.causalsens(linear_model, p_model, ro.Formula('~ x1 + x2'),
                                data=df, alpha=alpha, confound=causalsens.one_sided_att)
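
The sweep over $\alpha$ can be pictured with a NumPy-only sketch: for each value of $\alpha$, adjust the outcomes, refit the outcome regression, and record the coefficient on the treatment together with an interval. This is a sketch under stated assumptions, not the *causalsens* implementation: the data are synthetic, the propensity score is taken as known rather than fitted with a logistic regression, the confounding function is the constant $q = \alpha$, and `ols_treatment_effect` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): one covariate, a known propensity score,
# and a treatment effect of 2.0.
n = 1000
x = rng.normal(size=n)
pscore = 1.0 / (1.0 + np.exp(-x))
z = rng.binomial(1, pscore)
y = 1.0 + 2.0 * z + x + rng.normal(size=n)


def ols_treatment_effect(y, x, z):
    """Return the OLS coefficient on z with a 95% normal-approximation interval."""
    design = np.column_stack([np.ones(len(y)), x, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(design.T @ design)[2, 2])
    return coef[2], coef[2] - 1.96 * se, coef[2] + 1.96 * se


alphas = np.arange(-1.0, 1.25, 0.25)
# Probability of the arm each unit did NOT receive.
prob_opposite_arm = np.where(z == 1, 1.0 - pscore, pscore)

results = []
for alpha in alphas:
    # Adjust outcomes with the constant confounding function q = alpha, then refit.
    y_adj = y - alpha * prob_opposite_arm
    results.append(ols_treatment_effect(y_adj, x, z))

estimates, lower, upper = (np.array(col) for col in zip(*results))
for alpha, est, lo, hi in zip(alphas, estimates, lower, upper):
    print(f"alpha={alpha:+.2f}: estimate={est:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Because OLS is linear in the outcome, the estimate in this sketch moves linearly with $\alpha$, and at $\alpha = 0$ it coincides with the unadjusted regression estimate.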

## Plotting sensitivity bounds

We plot the estimated effect against the raw amount of confounding (in terms of $\alpha$), write the plot to a file, and display it as a Markdown image. We can see that the true effect is contained in the confidence bounds for all values of $\alpha$.

In [7]:
grdevices.png(file="assets/sensitivity_plots/amt_confounding.png", width=512, height=512)
ro.r.plot(ll_sens, type="raw", bty="n")
grdevices.dev_off();

![](assets/sensitivity_plots/amt_confounding.png)

We also plot the estimated effect in terms of the variance explained by the confounding. Again, the true effect is contained in the confidence bounds for all values of $\alpha$.

In [8]:
grdevices.png(file="assets/sensitivity_plots/var_confounding.png", width=512, height=512)
ro.r.plot(ll_sens, type="r.squared", bty="n")
grdevices.dev_off();

![](assets/sensitivity_plots/var_confounding.png)

## Plotting sensitivity analysis as sample size varies

Let's see what happens when we change the number of datapoints in our dataset.

In [9]:
exp = wn.opioid.UnobservedConfounding
dset = exp.run(num_samples=500)
print(f"True effect: {np.mean(dset.true_effects):.2f}")

True effect: -16590.45


In [10]:
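def create_plot(dset, num_points):
    """Run the sensitivity analysis on the first num_points samples and save the plot."""
    # Build an R data frame from the first num_points samples.
    df = ro.DataFrame({
        'x1': dset.covariates[:num_points, 0],
        'x2': dset.covariates[:num_points, 1],
        'y': dset.outcomes[:num_points],
        'z': dset.treatments[:num_points]})

    # Outcome model (linear regression) and propensity score model (logistic regression).
    linear_model = stats.lm("y ~ x1 + x2 + z", data=df)
    p_model = stats.glm("z ~ x1 + x2", data=df, family=stats.binomial())

    # Sweep the confounding parameter alpha and compute the sensitivity bounds.
    alpha = np.arange(-4500, 4500, 250)
    ll_sens = causalsens.causalsens(linear_model, p_model, ro.Formula('~ x1 + x2'),
                                    data=df, alpha=alpha, confound=causalsens.one_sided_att)

    # Write the raw-confounding plot to a PNG file named after the sample size.
    grdevices.png(file="assets/sensitivity_plots/amt_confounding_" + str(num_points) + ".png",
                  width=512, height=512)
    ro.r.plot(ll_sens, type="raw", bty="n")
    grdevices.dev_off()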

When we only have 100 points:

In [11]:
create_plot(dset, 100)

![](assets/sensitivity_plots/amt_confounding_100.png)

When we only have 200 points:

In [12]:
create_plot(dset, 200)

![](assets/sensitivity_plots/amt_confounding_200.png)

When we have 500 points:

In [13]:
create_plot(dset, 500)

![](assets/sensitivity_plots/amt_confounding_500.png)

As we can see, the confidence intervals of the estimated effect shrink as the size of the dataset increases. More data reduces sampling uncertainty, but it does not reduce the bias from unobserved confounding. For 200 and 500 datapoints, the confidence interval does not contain the true effect for many values of $\alpha$!
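
The narrowing of the intervals can be reproduced in a small simulation that does not use the opioid simulator. The sketch below is illustrative only: the data-generating process, the confounder strength of 0.5, and the helper `naive_interval` are all hypothetical. It computes a naive difference-in-means estimate and the half-width of its 95% interval at the three sample sizes used above.

```python
import numpy as np


def naive_interval(n, rng, true_effect=2.0, confounding=0.5):
    """Return a difference-in-means estimate and its 95% interval half-width.

    Hypothetical data-generating process with one unobserved confounder u.
    """
    u = rng.normal(size=n)                        # unobserved confounder
    z = (u + rng.normal(size=n) > 0).astype(int)  # treatment depends on u
    y = true_effect * z + confounding * u + rng.normal(size=n)

    treated, control = y[z == 1], y[z == 0]
    estimate = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / treated.size
                 + control.var(ddof=1) / control.size)
    return estimate, 1.96 * se


rng = np.random.default_rng(0)
results = {n: naive_interval(n, rng) for n in (100, 200, 500)}
for n, (estimate, half_width) in results.items():
    print(f"n={n}: estimate={estimate:.2f}, 95% half-width={half_width:.2f}")
```

In this setup the half-width shrinks roughly in proportion to $1/\sqrt{n}$, while the gap between the naive estimate and `true_effect` is driven by the confounder and does not vanish as $n$ grows.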