## Sensitivity Analysis for Regression Models
Sensitivity analysis examines how robust a result is to possible unobserved confounding. Two methods are implemented:
1. [Cinelli & Hazlett's robustness value](https://carloscinelli.com/files/Cinelli%20and%20Hazlett%20(2020)%20-%20Making%20Sense%20of%20Sensitivity.pdf) 
    - This method only supports the linear regression estimator. <br>
    - The partial R^2 of treatment with outcome shows how strongly confounders explaining all the residual outcome variation would have to be associated with the treatment to eliminate the estimated effect.<br>
    - The robustness value measures the minimum strength of association that unobserved confounding would need to have with both treatment and outcome in order to change the conclusions.<br>
    - A robustness value close to 1 means the treatment effect can withstand strong confounders explaining almost all residual variation of the treatment and the outcome.<br>
    - A robustness value close to 0 means that even very weak confounders can change the results.<br>
    - Benchmarking examines the sensitivity of causal inferences to plausible strengths of the omitted confounders.<br>
2. [Ding & VanderWeele's E-Value](https://dash.harvard.edu/bitstream/handle/1/36874927/EValue_FinalSubmission.pdf)
    - This method supports linear and logistic regression. 
    - The E-value is the minimum strength of association on the risk ratio scale that an unmeasured confounder would need to have with both the treatment and the outcome, conditional on the measured covariates, to fully explain away a specific treatment-outcome association.
    - The minimum E-value is 1, which means that no unmeasured confounding is needed to explain away the observed association (i.e. the confidence interval crosses the null).
    - Higher E-values mean that stronger unmeasured confounding is needed to explain away the observed association. There is no maximum E-value. 
    - [McGowan & Greevy Jr's](https://arxiv.org/abs/2011.07030) Observed Covariate E-value benchmarks the E-value against the measured confounders. 
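
Both summary statistics have closed-form expressions in the cited papers. The sketch below is a hand-written illustration of those formulas, not DoWhy's implementation; the function names (`partial_r2`, `robustness_value`, `e_value`) are hypothetical. It assumes the t-statistic and residual degrees of freedom of the treatment coefficient are known, and that the E-value input is already on the risk ratio scale.

```python
import math


def partial_r2(t_value: float, dof: int) -> float:
    """Partial R^2 of the treatment with the outcome, from the treatment's t-statistic."""
    return t_value**2 / (t_value**2 + dof)


def robustness_value(t_value: float, dof: int, q: float = 1.0) -> float:
    """Robustness value RV_q (Cinelli & Hazlett) for reducing the estimate by a fraction q."""
    f_q = q * abs(t_value) / math.sqrt(dof)  # scaled partial Cohen's f of the treatment
    return 0.5 * (math.sqrt(f_q**4 + 4 * f_q**2) - f_q**2)


def e_value(risk_ratio: float) -> float:
    """E-value (Ding & VanderWeele) for an association on the risk ratio scale."""
    if risk_ratio <= 0:
        raise ValueError("risk_ratio must be positive")
    rr = max(risk_ratio, 1.0 / risk_ratio)  # protective effects are inverted first
    return rr + math.sqrt(rr * (rr - 1.0))


# Illustrative inputs only (not taken from the dataset in this notebook)
print(partial_r2(t_value=10.0, dof=100))
print(robustness_value(t_value=10.0, dof=100))
print(e_value(2.0))
```

A risk ratio of exactly 1 gives an E-value of 1, which matches the minimum described above.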

### Step 1: Load required packages

In [None]:
import os, sys
sys.path.append(os.path.abspath("../../../"))
import dowhy
from dowhy import CausalModel
import pandas as pd
import numpy as np
import dowhy.datasets 

# Config dict to set the logging level
import logging.config
DEFAULT_LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'loggers': {
        '': {
            'level': 'ERROR',
        },
    }
}

logging.config.dictConfig(DEFAULT_LOGGING)
# Disabling warnings output
import warnings
from sklearn.exceptions import DataConversionWarning
#warnings.filterwarnings(action='ignore', category=DataConversionWarning)

### Step 2: Load the dataset 
We create a dataset with linear relationships between common causes and treatment, and common causes and outcome. Beta is the true causal effect.

In [None]:
np.random.seed(100) 
data = dowhy.datasets.linear_dataset(beta=10,
                                     num_common_causes=7,
                                     num_samples=500,
                                     num_treatments=1,
                                     stddev_treatment_noise=10,
                                     stddev_outcome_noise=5
                                     )

In [None]:
model = CausalModel(
            data=data["df"],
            treatment=data["treatment_name"],
            outcome=data["outcome_name"],
            graph=data["gml_graph"],
            test_significance=None,
        )
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
data['df'].head()

### Step 3: Create Causal Model
Remove one of the common causes (`W4`) from both the data and the graph to simulate unobserved confounding.

In [None]:
data["df"] = data["df"].drop("W4", axis = 1)
graph_str = 'graph[directed 1node[ id "y" label "y"]node[ id "W0" label "W0"] node[ id "W1" label "W1"] node[ id "W2" label "W2"] node[ id "W3" label "W3"]  node[ id "W5" label "W5"] node[ id "W6" label "W6"]node[ id "v0" label "v0"]edge[source "v0" target "y"]edge[ source "W0" target "v0"] edge[ source "W1" target "v0"] edge[ source "W2" target "v0"] edge[ source "W3" target "v0"] edge[ source "W5" target "v0"] edge[ source "W6" target "v0"]edge[ source "W0" target "y"] edge[ source "W1" target "y"] edge[ source "W2" target "y"] edge[ source "W3" target "y"] edge[ source "W5" target "y"] edge[ source "W6" target "y"]]'
model = CausalModel(
            data=data["df"],
            treatment=data["treatment_name"],
            outcome=data["outcome_name"],
            graph=graph_str,
            test_significance=None,
        )
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
data['df'].head()
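
To see why a dropped common cause matters, here is a minimal standalone simulation that does not use DoWhy. It assumes a single confounder `w` with made-up coefficients, and the helpers `ols_slope` and `residualize` are hypothetical names. Regressing the outcome on the treatment alone gives a biased slope, while adjusting for `w` (by residualizing both variables on it) recovers the true effect.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residualize(v, w):
    """Residuals of v after regressing it on w (with an intercept)."""
    slope = ols_slope(w, v)
    mv, mw = sum(v) / len(v), sum(w) / len(w)
    return [vi - mv - slope * (wi - mw) for vi, wi in zip(v, w)]


rng = random.Random(0)
beta = 10.0  # true causal effect (assumed for this toy example)
n = 5000
w = [rng.gauss(0, 1) for _ in range(n)]                  # confounder
t = [wi + rng.gauss(0, 1) for wi in w]                   # treatment depends on w
y = [beta * ti + 5.0 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

naive = ols_slope(t, y)                                  # w omitted: biased upward
adjusted = ols_slope(residualize(t, w), residualize(y, w))  # w adjusted for

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

With these coefficients the naive slope converges to about 12.5 rather than 10, because the omitted confounder's contribution is attributed to the treatment.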

### Step 4: Identification

In [None]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

### Step 5: Estimation
Currently, only the linear regression estimator is supported for linear sensitivity analysis.

In [None]:
estimate = model.estimate_effect(identified_estimand,method_name="backdoor.linear_regression")
print(estimate)

### Step 6a: Refutation and Sensitivity Analysis - Method 1
<b>identified_estimand</b>: An instance of the IdentifiedEstimand class that describes which causal pathways are employed when the treatment affects the outcome<br>
<b>estimate</b>: An instance of CausalEstimate class. The estimate obtained from the estimator for the original data.<br>
<b>method_name</b>: Refutation method name <br>
<b>simulation_method</b>: "linear-partial-R2" for Linear Sensitivity Analysis<br>
<b>benchmark_common_causes</b>: Name of the covariates used to bound the strengths of unobserved confounder<br>
<b>percent_change_estimate</b>: The fraction by which the treatment estimate would have to be reduced to alter the results (default = 1). If percent_change_estimate = 1, the robustness value describes the strength of association that confounders need with treatment and outcome to reduce the estimate by 100%, i.e. bring it down to 0. <br>
<b>confounder_increases_estimate</b>: True implies that the confounder increases the absolute value of the estimate; False implies that it decreases it. Default is confounder_increases_estimate = False, i.e. the considered confounders pull the estimate towards zero<br>
<b>effect_fraction_on_treatment</b>: Strength of association between unobserved confounder and treatment compared to benchmark covariate<br>
<b>effect_fraction_on_outcome</b>: Strength of association between unobserved confounder and outcome compared to benchmark covariate<br>
<b>null_hypothesis_effect</b>: assumed effect under the null hypothesis (default = 0) <br>
<b>plot_estimate</b>: Generate a contour plot for the estimate while performing sensitivity analysis (default = True). To override the setting, set plot_estimate = False.

In [None]:
refute = model.refute_estimate(identified_estimand, estimate ,
                               method_name = "add_unobserved_common_cause",
                               simulation_method = "linear-partial-R2", 
                               benchmark_common_causes = ["W3"],
                               effect_fraction_on_treatment = [ 1,2,3]
                              )

The x axis shows hypothetical partial R2 values of unobserved confounder(s) with the treatment. The y axis shows hypothetical partial R2 values of unobserved confounder(s) with the outcome.<br>
The contour levels show the adjusted estimate or t-value that would be obtained if unobserved confounders with the given hypothetical partial R2 values were included in the full regression model. <br>
The red line is the critical threshold: confounders with such strength or stronger are sufficient to invalidate the research conclusions.
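
Each point on such a contour plot corresponds to a bias-adjusted estimate. The following sketch applies the bias and standard-error formulas from Cinelli & Hazlett (2020) by hand; it is an illustration under stated assumptions, not DoWhy's internal code, and the function names and numeric inputs are hypothetical.

```python
import math


def adjusted_estimate(estimate, se, dof, r2_treatment, r2_outcome, reduce=True):
    """Estimate after accounting for a confounder with the given partial R^2 values.

    r2_treatment: partial R^2 of the confounder with the treatment (x axis)
    r2_outcome:   partial R^2 of the confounder with the outcome (y axis)
    reduce:       True if the confounder pulls the estimate towards zero
    """
    bias = se * math.sqrt(dof) * math.sqrt(r2_outcome * r2_treatment / (1 - r2_treatment))
    sign = 1.0 if estimate >= 0 else -1.0
    magnitude = abs(estimate) - bias if reduce else abs(estimate) + bias
    return sign * magnitude


def adjusted_t_value(estimate, se, dof, r2_treatment, r2_outcome, reduce=True, h0=0.0):
    """t-value of the adjusted estimate against the null value h0."""
    adj_est = adjusted_estimate(estimate, se, dof, r2_treatment, r2_outcome, reduce)
    adj_se = se * math.sqrt((1 - r2_outcome) / (1 - r2_treatment)) * math.sqrt(dof / (dof - 1))
    return (adj_est - h0) / adj_se


# Illustrative numbers only: estimate 10 with standard error 1 and 100 degrees of freedom
est, se, dof = 10.0, 1.0, 100
print(adjusted_estimate(est, se, dof, r2_treatment=0.1, r2_outcome=0.1))
print(adjusted_t_value(est, se, dof, r2_treatment=0.1, r2_outcome=0.1))

# A confounder whose two partial R^2 values both equal the robustness value
# brings the estimate down to zero.
f = abs(est / se) / math.sqrt(dof)
rv = 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)
print(adjusted_estimate(est, se, dof, rv, rv))
```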

In [None]:
refute.stats

In [None]:
refute.benchmarking_results

#### Parameter List for plot function
<b>plot_type</b>: "estimate" or "t-value" <br>
<b>critical_value</b>: special reference value of the estimate or t-value that will be highlighted in the plot<br>
<b>x_limit</b>: plot's maximum x-axis value (default = 0.8) <br>
<b>y_limit</b>: plot's maximum y-axis value (default = 0.8) <br>
<b>num_points_per_contour</b>: number of points to calculate and plot for each contour line (default = 200) <br>
<b>plot_size</b>: tuple denoting the size of the plot (default = (7,7))<br>
<b>contours_color</b>: color of contour lines (default = blue). String or array. If array, lines will be plotted with the specified colors in ascending order.<br>
<b>critical_contour_color</b>: color of threshold line (default = red)<br>
<b>label_fontsize</b>: fontsize for labelling contours (default = 9)<br>
<b>contour_linewidths</b>: linewidths for contours (default = 0.75)<br>
<b>contour_linestyles</b>: linestyles for contours (default = "solid"). See: https://matplotlib.org/3.5.0/gallery/lines_bars_and_markers/linestyles.html<br>
<b>contours_label_color</b>: color of contour line label (default = black)<br>
<b>critical_label_color</b>: color of threshold line label (default = red)<br>
<b>unadjusted_estimate_marker</b>: marker type for unadjusted estimate in the plot (default = 'D'). See: https://matplotlib.org/stable/api/markers_api.html <br>
<b>unadjusted_estimate_color</b>: marker color for unadjusted estimate in the plot (default = "black")<br>
<b>adjusted_estimate_marker</b>: marker type for bias adjusted estimates in the plot (default = '^')<br>
<b>adjusted_estimate_color</b>: marker color for bias adjusted estimates in the plot (default = "red")<br>
<b>legend_position</b>: tuple denoting the position of the legend (default = (1.6, 0.6))<br>

In [None]:
refute.plot(plot_type = 't-value')

The t statistic is the coefficient divided by its standard error. The higher the t-value, the greater the evidence to reject the null hypothesis. <br>
According to the above plot, at the 5% significance level, the null hypothesis of zero effect would be rejected given the above confounders. 
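
As a small illustration of that decision rule, the sketch below computes a t statistic and compares it with a two-sided critical value. It assumes a large sample, so the normal quantile stands in for the Student t quantile; the function names and inputs are hypothetical and unrelated to the `refute` object.

```python
from statistics import NormalDist


def t_statistic(coefficient, standard_error, null_value=0.0):
    """Coefficient (minus the null value) divided by its standard error."""
    return (coefficient - null_value) / standard_error


def rejects_null(t_value, alpha=0.05):
    """Two-sided test using a normal approximation to the t distribution."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(t_value) > critical


t_val = t_statistic(coefficient=10.0, standard_error=2.0)
print(t_val, rejects_null(t_val))
```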

In [None]:
print(refute)

### Step 6b: Refutation and Sensitivity Analysis - Method 2
<b>simulation_method</b>: "e-value" for E-value

#### Parameter List for plot function
- <b>num_points_per_contour</b>: number of points to calculate and plot for each contour (Default = 200)
- <b>plot_size</b>: size of the plot (Default = (6.4,4.8))
- <b>contour_colors</b>: colors for point estimate and confidence limit contour (Default = ["blue", "red"])
- <b>xy_limit</b>: plot's maximum x and y value. Default is 2 x E-value. (Default = None)

In [None]:
refute = model.refute_estimate(identified_estimand, estimate ,
                               method_name = "add_unobserved_common_cause",
                               simulation_method = "e-value", 
                              )

- The x axis shows hypothetical values of the risk ratio of an unmeasured confounder at different levels of the treatment. The y axis shows hypothetical values of the risk ratio of the outcome at different levels of the confounder.
- Points lying on or above the blue contour represent combinations of these risk ratios that would tip (i.e. explain away) the point estimate. 
- Points lying on or above the red contour represent combinations of these risk ratios that would tip (i.e. explain away) the confidence interval.
- The green points are Observed Covariate E-values. These measure how much the limiting bound of the confidence interval changes on the E-value scale after a specific covariate is dropped and the estimator is re-fit. The covariate that corresponds to the largest Observed Covariate E-value is labeled. 
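
The contours follow from the bounding factor in Ding & VanderWeele's paper. The sketch below is a hand-written illustration with hypothetical function names; it assumes an observed risk ratio greater than 1 and checks whether a given pair of confounder risk ratios is strong enough to explain it away.

```python
import math


def bounding_factor(rr_treatment_confounder, rr_confounder_outcome):
    """Maximum bias factor a confounder with these two risk ratios can produce."""
    a, b = rr_treatment_confounder, rr_confounder_outcome
    return (a * b) / (a + b - 1)


def tips(observed_rr, rr_treatment_confounder, rr_confounder_outcome):
    """True if the confounder could fully explain away the observed risk ratio (> 1)."""
    return bounding_factor(rr_treatment_confounder, rr_confounder_outcome) >= observed_rr


observed_rr = 2.0  # illustrative value, not a result from this notebook
e_val = observed_rr + math.sqrt(observed_rr * (observed_rr - 1))

print(bounding_factor(e_val, e_val))  # equals the observed risk ratio up to rounding
print(tips(observed_rr, 4.0, 4.0))    # strong confounder: on or above the contour
print(tips(observed_rr, 2.0, 2.0))    # weaker confounder: below the contour
```

The E-value is the point on the contour where both risk ratios are equal, which is why the bounding factor evaluated there returns the observed risk ratio.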

In [None]:
refute.stats

In [None]:
refute.benchmarking_results

In [None]:
print(refute)

#### Sensitivity Analysis for dataset with no unobserved confounders
We now run the sensitivity analysis for the same dataset but without dropping any variable. <br>
The robustness value goes from 0.55 to 0.95, which means that the treatment effect can withstand strong confounders explaining almost all residual variation of the treatment and the outcome. <br>

In [None]:
np.random.seed(100) 
data = dowhy.datasets.linear_dataset(beta=10,
                                     num_common_causes=7,
                                     num_samples=500,
                                     num_treatments=1,
                                     stddev_treatment_noise=10,
                                     stddev_outcome_noise=1
                                     )

In [None]:
model = CausalModel(
            data=data["df"],
            treatment=data["treatment_name"],
            outcome=data["outcome_name"],
            graph=data["gml_graph"],
            test_significance=None,
        )
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
data['df'].head()

In [None]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

In [None]:
estimate = model.estimate_effect(identified_estimand,method_name="backdoor.linear_regression")
print(estimate)

In [None]:
refute = model.refute_estimate(identified_estimand, estimate ,
                               method_name = "add_unobserved_common_cause",
                               simulation_method = "linear-partial-R2", 
                               benchmark_common_causes = ["W3"],
                               effect_fraction_on_treatment = [ 1,2,3])

In [None]:
refute.plot(plot_type = 't-value')

In [None]:
print(refute)

In [None]:
refute.stats

In [None]:
refute.benchmarking_results