# Causal Estimate
Using Halerium Graphs

Author: {{ cookiecutter.author_name }}
Created: {{ cookiecutter.timestamp }}

## How to use the notebook

The following cells:
- specify the objective, variables, and variable types,
- read the dataset,
- set up the causal structure,
- identify, estimate, and refute the causal effect,
- present the results of the refutation tests.

By default, the notebook is set up to run with an example (wine quality). To see how it works, run the notebook without changing the code.

For your project, adjust the code in the linked cells with your objectives, variables, dataset etc. and then execute all cells in order.

Please refer to causal_estimate.board for detailed instructions. The headers in this notebook follow the cards on the board.

In [0]:
# <halerium id="2ee2df07-9220-46ca-aeb8-314fbb453b03">
# Link to causal_estimate.board
# </halerium id="2ee2df07-9220-46ca-aeb8-314fbb453b03">


## Imports

In [0]:
import numpy as np
import pandas as pd

from dowhy import CausalModel
import dowhy.causal_refuters as causal_refuters
import dowhy.datasets
import dowhy.api

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor

from statsmodels.api import OLS

import matplotlib.pyplot as plt

import warnings
warnings.simplefilter('ignore')

### 2. Import the Dataset

In [0]:
# <halerium id="81107ee7-188d-4398-8776-178cdcdbb917">
time_series = False
path = 'default example' # Specify the path of the data
# </halerium id="81107ee7-188d-4398-8776-178cdcdbb917">


Importing the dataset

In [0]:
if path == 'default example':
    path = 'https://raw.githubusercontent.com/erium/halerium-example-data/main/hypothesis_testing/WineQT.csv'

if time_series:
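    df = pd.read_csv(path, parse_dates=['date'], index_col='date')
else:
    # sep=None lets pandas detect the delimiter; this requires the Python parsing engine
    df = pd.read_csv(path, sep=None, engine='python')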
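
With `sep=None`, pandas detects the delimiter using Python's built-in `csv.Sniffer`. The sketch below shows that detection step in isolation. The semicolon-delimited sample is made up for illustration and is not taken from the example dataset.

```python
import csv

# Hypothetical sample text (assumption): a small semicolon-delimited file.
sample = (
    "fixed acidity;pH;quality\n"
    "7.4;3.51;5\n"
    "7.8;3.20;5\n"
    "11.2;3.16;6\n"
)

# csv.Sniffer inspects the text and guesses the dialect, including the delimiter.
dialect = csv.Sniffer().sniff(sample)
detected_delimiter = dialect.delimiter
print(repr(detected_delimiter))
```

If detection guesses wrong for your own file, passing an explicit `sep` to `pd.read_csv` is the more predictable option.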

Visualising the dataset

In [0]:
df

### 3. Model the Causal Structure
Calling `model.view_model()` renders the causal graph and saves it as a PNG file.

In [0]:
# Example: Does the pH level affect the quality of the wine?
# <halerium id="34d8feb6-0c2f-4d8c-aafe-4cf0818d2bf4">
is_treatment_binary = False
# </halerium id="34d8feb6-0c2f-4d8c-aafe-4cf0818d2bf4">


In [0]:
# Simpler example, using backdoor methods
# <halerium id="34d8feb6-0c2f-4d8c-aafe-4cf0818d2bf4">
model = CausalModel(
    data=df,
    treatment='pH',
    outcome='quality',
    common_causes=['fixed acidity', 'volatile acidity'],
    effect_modifiers=['residual sugar'])
model.view_model()
# </halerium id="34d8feb6-0c2f-4d8c-aafe-4cf0818d2bf4">
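
The role of `common_causes` can be illustrated on synthetic data. The sketch below is not the notebook's estimator. It uses plain least squares from the standard library, and all variable names and coefficients are assumptions chosen for the illustration. A common cause `z` drives both the treatment `t` and the outcome `y`, so the unadjusted slope overstates the true effect of 2.0, while adjusting for `z` recovers it.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (single regressor with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a least-squares fit on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]        # common cause
t = [zi + random.gauss(0, 1) for zi in z]         # treatment depends on z
# Outcome depends on both; the true effect of t is 2.0 (assumed for the demo).
y = [2.0 * ti + 3.0 * zi + random.gauss(0, 1) for ti, zi in zip(t, z)]

# Unadjusted regression of y on t picks up the path through z.
naive_effect = slope(t, y)

# Adjusting for z: remove z from both t and y, then regress the residuals.
adjusted_effect = slope(residuals(z, t), residuals(z, y))

print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

Leaving a true common cause out of `common_causes` would bias the estimate in the same way as the naive slope here.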


In [0]:
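# More complex example, using a causal graph with an instrumental variable (sulphates)
# Note: the node names below use underscores, while the dataset columns contain spaces.
# Rename the columns first so they match, e.g. df.columns = df.columns.str.replace(' ', '_')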
# <halerium id="34d8feb6-0c2f-4d8c-aafe-4cf0818d2bf4">
# causal_graph = """
# digraph {
# U[label="Unobserved Confounders"];
# sulphates->pH; residual_sugar->quality;
# fixed_acidity->pH; volatile_acidity -> pH;
# U->pH;U->quality;
# fixed_acidity->quality; volatile_acidity->quality; pH->quality; 
# }
# """

# model= CausalModel(
#         data = df,
#         graph=causal_graph.replace("\n", " "),
#         treatment='pH',
#         outcome='quality')
# </halerium id="34d8feb6-0c2f-4d8c-aafe-4cf0818d2bf4">
# model.view_model()

### 4. Identify, Estimate, Refute

#### Identify estimands

In [0]:
# Skips the warning asking about unobserved confounders
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
# <halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">
print(identified_estimand)
# </halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">


#### Estimation

In [0]:
estimate_methods = {}

#### Backdoor methods

In [0]:
from functions.causal_estimate import estimate_causal_effect

# <halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">
estimate_causal_effect(model, identified_estimand, 'linear', estimate_methods)
# </halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">


#### Propensity score methods
Only for binary treatments.

In [0]:
if is_treatment_binary:
    estimate_causal_effect(model, identified_estimand, 'strat', estimate_methods)
    estimate_causal_effect(model, identified_estimand, 'match', estimate_methods)
    estimate_causal_effect(model, identified_estimand, 'ipw', estimate_methods)
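
As a rough illustration of why propensity score methods need a binary treatment, here is a minimal sketch of inverse propensity weighting on synthetic data. It is not the implementation behind `estimate_causal_effect`. The data-generating process, the variable names, and the true effect of 2.0 are all assumptions made for the demo.

```python
import random

random.seed(1)
n = 20000
data = []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0                    # binary confounder
    t = 1 if random.random() < (0.8 if z else 0.2) else 0    # treatment is more likely when z = 1
    y = 2.0 * t + 3.0 * z + random.gauss(0, 1)               # true effect of t is 2.0
    data.append((z, t, y))

# Propensity score: estimated probability of treatment within each level of z.
propensity = {}
for level in (0, 1):
    group = [t for z, t, _ in data if z == level]
    propensity[level] = sum(group) / len(group)

# Weight each unit by the inverse probability of the treatment it actually received.
treated_mean = sum(t * y / propensity[z] for z, t, y in data) / n
control_mean = sum((1 - t) * y / (1 - propensity[z]) for z, t, y in data) / n
ipw_effect = treated_mean - control_mean

# Unweighted difference in means, for comparison.
treated = [y for _, t, y in data if t == 1]
control = [y for _, t, y in data if t == 0]
naive_diff = sum(treated) / len(treated) - sum(control) / len(control)

print(f"naive: {naive_diff:.2f}, IPW: {ipw_effect:.2f}")
```

The weights are only defined when each unit is either treated or untreated, which is why the cell above is guarded by `is_treatment_binary`.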

#### Instrumental Variable

In [0]:
estimate_causal_effect(model, identified_estimand, 'iv', estimate_methods)
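
The idea behind an instrumental variable estimate can be sketched with the simple ratio (Wald) estimator. This is an illustration on synthetic data, not the code that `estimate_causal_effect` runs. The instrument `z`, the unobserved confounder `u`, and the true effect of 2.0 are assumptions of the demo.

```python
import random


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(2)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument: affects y only through t
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
t = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * ti + 3.0 * ui + random.gauss(0, 1) for ti, ui in zip(t, u)]  # true effect is 2.0

# Plain regression of y on t is biased because u is not observed.
naive_effect = covariance(t, y) / covariance(t, t)

# Ratio estimator: effect of z on y divided by effect of z on t.
iv_effect = covariance(z, y) / covariance(z, t)

print(f"naive: {naive_effect:.2f}, IV: {iv_effect:.2f}")
```

The estimate is only meaningful if the causal model actually contains a valid instrument, that is, a variable that influences the treatment but has no other path to the outcome.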

#### Regression Discontinuity

In [0]:
estimate_causal_effect(model, identified_estimand, 'regdist', estimate_methods)
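
For intuition, a sharp regression discontinuity estimate can be sketched as the jump between two local linear fits at a cutoff. The example below uses synthetic data. The cutoff, bandwidth, and true jump of 2.0 are assumptions for illustration and do not reflect how the `'regdist'` option is configured.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of a least-squares line through the points."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b


random.seed(3)
cutoff, bandwidth = 0.0, 0.5      # assumed values for the demo
n = 10000
x = [random.uniform(-1, 1) for _ in range(n)]                   # running variable
y = [1.0 * xi + (2.0 if xi >= cutoff else 0.0) + random.gauss(0, 0.5)
     for xi in x]                                               # true jump at the cutoff is 2.0

# Keep only points near the cutoff and centre x on it,
# so each fitted intercept is the predicted outcome at the cutoff.
left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]

left_intercept, _ = fit_line([p[0] for p in left], [p[1] for p in left])
right_intercept, _ = fit_line([p[0] for p in right], [p[1] for p in right])

rd_effect = right_intercept - left_intercept
print(f"estimated jump at cutoff: {rd_effect:.2f}")
```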

In [0]:
estimate_methods

#### Refuting the estimate
Refutation methods are robustness tests that every correct estimator should pass.

If an estimator fails a refutation test (p-value < 0.05), the estimate or the assumptions behind it are likely flawed and should not be trusted without further investigation.
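
One common refutation idea is the placebo treatment test: replace the real treatment with a randomly permuted copy and re-estimate. A sound estimator should then report an effect close to zero. The sketch below shows that logic on synthetic data with a plain least-squares slope. It is an assumption-laden illustration and not necessarily what `refute_causal_estimate` runs.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (single regressor with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(4)
n = 2000
t = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * ti + random.gauss(0, 1) for ti in t]   # true effect is 2.0 (assumed)

original_effect = slope(t, y)

# Re-estimate with a shuffled (placebo) treatment many times.
placebo_effects = []
for _ in range(200):
    placebo_t = t[:]
    random.shuffle(placebo_t)     # breaks any link between treatment and outcome
    placebo_effects.append(slope(placebo_t, y))

mean_placebo = sum(placebo_effects) / len(placebo_effects)
print(f"original: {original_effect:.2f}, mean placebo: {mean_placebo:.3f}")
```

If the placebo estimates were clearly different from zero, that would indicate the estimator is picking up something other than the treatment effect.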

In [0]:
from functions.causal_estimate import refute_causal_estimate

refute_data = {}
# <halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">
refute_causal_estimate(model, identified_estimand, estimate_methods, refute_data, is_treatment_binary)
# </halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">


In [0]:
from functions.causal_estimate import show_refute_results

# <halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">
show_refute_results(refute_data)
# </halerium id="7c339497-523a-4c86-9382-6765cc7f73cc">
