# DoWhy: Different estimation methods for causal inference
This is a quick introduction to the DoWhy causal inference library.
We will load a sample dataset and use different methods to estimate the causal effect of a (pre-specified) treatment variable on a (pre-specified) outcome variable.

First, let us add the required path for Python to find the DoWhy code and load the required packages.

In [1]:
import os, sys
sys.path.append(os.path.abspath("../../"))

In [2]:
import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy.do_why import CausalModel
import dowhy.datasets 

Let us first load a dataset. For simplicity, we simulate a dataset with linear relationships between common causes and treatment, and common causes and outcome. 

Beta is the true causal effect. It is set to 10 in the call below, so each estimate can be compared against that value.

In [3]:
data = dowhy.datasets.linear_dataset(beta=10,
        num_common_causes=5,
        num_instruments=2,
        num_samples=10000,
        treatment_is_binary=True)
df = data["df"]


Note that the data is stored in a pandas DataFrame (`df`).
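
To make the structure of such a dataset concrete, here is a minimal sketch of a linear data-generating process with a binary treatment. The function `simulate_linear_data` and its coefficient ranges are hypothetical and are not DoWhy's implementation; instruments are left out for brevity. The naive difference in means is printed to show why adjusting for common causes matters.

```python
import numpy as np

def simulate_linear_data(beta=10.0, num_common_causes=5, num_samples=10000, seed=0):
    """Toy linear data generator (hypothetical; not DoWhy's own code)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(num_samples, num_common_causes))    # common causes
    c_treat = rng.uniform(0.5, 1.5, size=num_common_causes)  # X -> treatment
    c_out = rng.uniform(0.5, 1.5, size=num_common_causes)    # X -> outcome
    propensity = 1.0 / (1.0 + np.exp(-(X @ c_treat)))
    v = rng.binomial(1, propensity)                          # binary treatment
    y = beta * v + X @ c_out + rng.normal(size=num_samples)  # linear outcome
    return X, v, y

X, v, y = simulate_linear_data()

# Comparing raw group means ignores the common causes, so it overstates beta here.
naive_difference = y[v == 1].mean() - y[v == 0].mean()
print("Naive difference in means:", naive_difference)
```

Because the common causes push both the treatment and the outcome upward in this sketch, the naive difference lands above the true effect of 10.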

## Identifying the causal estimand

We now input a causal graph in the DOT graph format.

In [4]:
# With graph
model = CausalModel(
        data=df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["dot_graph"],
        instruments=data["instrument_names"],
        logging_level=logging.INFO
        )

Model to find the causal effect of treatment v on outcome y


In [5]:
model.view_model()

<img src="causal_model.png">

This displays the causal graph. The next steps are identifying the causal estimand and estimating the effect.

In [6]:
identified_estimand = model.identify_effect()
print(identified_estimand)

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:{'X1', 'Z0', 'X2', 'X4', 'Unobserved Confounders', 'X0', 'X3', 'Z1'}


{'observed': 'yes'}
{'observed': 'yes'}
{'observed': 'yes'}
{'observed': 'yes'}
{'observed': 'no'}
There are unobserved common causes. Causal effect cannot be identified.
WARN: Do you want to continue by ignoring these unobserved confounders? [y/n] y


INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]


Estimand type: ate
### Estimand : 1
Estimand name: iv
No such variable found!
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X1,Z0,X2,X4,X0,X3,Z1))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X1,Z0,X2,X4,X0,X3,Z1,U) = P(y|v,X1,Z0,X2,X4,X0,X3,Z1)
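
One way to picture a backdoor adjustment set is as the variables that are ancestors of both the treatment and the outcome once the treatment's outgoing edges are removed. The sketch below applies that idea to a small hand-written graph. The edge list, the `ancestors` helper, and the rule itself are illustrative assumptions, not DoWhy's identification code.

```python
def ancestors(edges, node):
    """Return every ancestor of `node` in a DAG given as (parent, child) pairs."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# Hypothetical toy graph: X0, X1 and U are common causes, Z0 is an instrument.
edges = [("X0", "v"), ("X0", "y"), ("X1", "v"), ("X1", "y"),
         ("U", "v"), ("U", "y"), ("Z0", "v"), ("v", "y")]
treatment, outcome = "v", "y"

# Drop edges leaving the treatment so only non-causal (backdoor) paths remain.
backdoor_edges = [(a, b) for a, b in edges if a != treatment]
common_causes = ancestors(edges, treatment) & ancestors(backdoor_edges, outcome)
observed_common_causes = common_causes - {"U"}  # U is assumed unobserved
print(sorted(common_causes), sorted(observed_common_causes))
```

In this toy graph the instrument `Z0` is excluded because it reaches the outcome only through the treatment.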



## Method 1: Regression

We will use linear regression, regressing the outcome on the treatment and the identified common causes.

In [7]:
causal_estimate_reg = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression",
        test_significance=True)
print(causal_estimate_reg)
print("Causal Estimate is " + str(causal_estimate_reg.value))

LinearRegressionEstimator


INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X1+Z0+X2+X4+X0+X3+Z1


*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
No such variable found!
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X1,Z0,X2,X4,X0,X3,Z1))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X1,Z0,X2,X4,X0,X3,Z1,U) = P(y|v,X1,Z0,X2,X4,X0,X3,Z1)

## Realized estimand
b: y~v+X1+Z0+X2+X4+X0+X3+Z1
## Estimate
Value: 10.000000000000018

## Statistical Significance
p-value: 0.0

Causal Estimate is 10.0
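
As a rough illustration of the regression approach, the sketch below fits ordinary least squares with NumPy on simulated data and reads the effect off the treatment coefficient. The data-generating process and variable names are assumptions made for this example; this is not the code behind `backdoor.linear_regression`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
X = rng.normal(size=(n, 2))                                   # observed common causes
v = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X.sum(axis=1))))   # binary treatment
y = 10.0 * v + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)

# Regress y on an intercept, the treatment, and the common causes.
design = np.column_stack([np.ones(n), v, X])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
regression_estimate = coefficients[1]  # coefficient on the treatment column
print("Regression estimate:", regression_estimate)
```

The estimate is only as good as the linearity assumption. Here the simulated outcome is linear by construction, so the coefficient recovers the true effect closely.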


## Method 2: Stratification

We will be using propensity scores to stratify units in the data.

In [8]:
causal_estimate_strat = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_stratification")
print(causal_estimate_strat)
print("Causal Estimate is " + str(causal_estimate_strat.value))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Stratification Estimator
INFO:dowhy.causal_estimator:b: y~v+X1+Z0+X2+X4+X0+X3+Z1


PropensityScoreStratificationEstimator
*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
No such variable found!
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X1,Z0,X2,X4,X0,X3,Z1))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X1,Z0,X2,X4,X0,X3,Z1,U) = P(y|v,X1,Z0,X2,X4,X0,X3,Z1)

## Realized estimand
b: y~v+X1+Z0+X2+X4+X0+X3+Z1
## Estimate
Value: 10.238528243078772

Causal Estimate is 10.2385282431
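
The idea behind stratification can be sketched in a few lines: estimate propensity scores, sort units into strata of similar score, and average the within-stratum differences between treated and control units. The helpers `fit_propensity` and `stratified_effect`, the choice of 50 strata, and the simulated data are assumptions for illustration; DoWhy's estimator may differ in its details.

```python
import numpy as np

def fit_propensity(X, v, num_iterations=25):
    """Estimate P(v=1 | X) with logistic regression fitted by Newton's method."""
    design = np.column_stack([np.ones(len(v)), X])
    w = np.zeros(design.shape[1])
    for _ in range(num_iterations):
        p = 1 / (1 + np.exp(-design @ w))
        gradient = design.T @ (v - p)
        hessian = (design * (p * (1 - p))[:, None]).T @ design
        w += np.linalg.solve(hessian, gradient)
    return 1 / (1 + np.exp(-design @ w))

def stratified_effect(y, v, propensity, num_strata=50):
    """Average the treated-minus-control difference over propensity strata."""
    order = np.argsort(propensity)
    effects, sizes = [], []
    for stratum in np.array_split(order, num_strata):
        treated = stratum[v[stratum] == 1]
        control = stratum[v[stratum] == 0]
        if len(treated) == 0 or len(control) == 0:
            continue  # a stratum needs both groups to be comparable
        effects.append(y[treated].mean() - y[control].mean())
        sizes.append(len(stratum))
    return np.average(effects, weights=sizes)

rng = np.random.default_rng(0)
n = 10000
X = rng.normal(size=(n, 2))
v = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X.sum(axis=1))))
y = 10.0 * v + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)

stratification_estimate = stratified_effect(y, v, fit_propensity(X, v))
print("Stratification estimate:", stratification_estimate)
```

More strata reduce the residual confounding inside each stratum but leave fewer units per comparison, so the number of strata is a bias-variance choice.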


## Method 3: Matching

We will be using propensity scores to match units in the data.

In [9]:
causal_estimate_match = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_matching")
print(causal_estimate_match)
print("Causal Estimate is " + str(causal_estimate_match.value))

PropensityScoreMatchingEstimator


INFO:dowhy.causal_estimator:INFO: Using Propensity Score Matching Estimator
INFO:dowhy.causal_estimator:b: y~v+X1+Z0+X2+X4+X0+X3+Z1


*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
No such variable found!
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X1,Z0,X2,X4,X0,X3,Z1))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X1,Z0,X2,X4,X0,X3,Z1,U) = P(y|v,X1,Z0,X2,X4,X0,X3,Z1)

## Realized estimand
b: y~v+X1+Z0+X2+X4+X0+X3+Z1
## Estimate
Value: 8.991647331885279

Causal Estimate is 8.991647331885279
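
A minimal sketch of one-to-one propensity score matching with replacement follows. Each treated unit is paired with the control unit whose score is closest, and vice versa, and the paired differences are averaged. All helper names and the simulated data are hypothetical, and the choice to average over both treated and control units is an assumption rather than a description of DoWhy's estimator.

```python
import numpy as np

def fit_propensity(X, v, num_iterations=25):
    """Estimate P(v=1 | X) with logistic regression fitted by Newton's method."""
    design = np.column_stack([np.ones(len(v)), X])
    w = np.zeros(design.shape[1])
    for _ in range(num_iterations):
        p = 1 / (1 + np.exp(-design @ w))
        hessian = (design * (p * (1 - p))[:, None]).T @ design
        w += np.linalg.solve(hessian, design.T @ (v - p))
    return 1 / (1 + np.exp(-design @ w))

def nearest_index(sorted_scores, queries):
    """Position in `sorted_scores` of the value closest to each query."""
    pos = np.clip(np.searchsorted(sorted_scores, queries), 1, len(sorted_scores) - 1)
    left, right = sorted_scores[pos - 1], sorted_scores[pos]
    return np.where(queries - left <= right - queries, pos - 1, pos)

def matching_effect(y, v, propensity):
    """Nearest-neighbour matching on the propensity score, with replacement."""
    treated, control = np.flatnonzero(v == 1), np.flatnonzero(v == 0)
    control_sorted = control[np.argsort(propensity[control])]
    treated_sorted = treated[np.argsort(propensity[treated])]
    match_for_treated = control_sorted[
        nearest_index(propensity[control_sorted], propensity[treated])]
    match_for_control = treated_sorted[
        nearest_index(propensity[treated_sorted], propensity[control])]
    treated_diffs = y[treated] - y[match_for_treated]
    control_diffs = y[match_for_control] - y[control]
    return (treated_diffs.sum() + control_diffs.sum()) / len(y)

rng = np.random.default_rng(0)
n = 10000
X = rng.normal(size=(n, 2))
v = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X.sum(axis=1))))
y = 10.0 * v + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)

matching_estimate = matching_effect(y, v, fit_propensity(X, v))
print("Matching estimate:", matching_estimate)
```

Sorting one group and using `np.searchsorted` keeps the nearest-neighbour lookup fast without building a full pairwise distance matrix.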


## Method 4: Weighting

We will be using (inverse) propensity scores to assign weights to units in the data.

In [10]:
causal_estimate_ipw = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_weighting")
print(causal_estimate_ipw)
print("Causal Estimate is " + str(causal_estimate_ipw.value))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y~v+X1+Z0+X2+X4+X0+X3+Z1


PropensityScoreWeightingEstimator
*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
No such variable found!
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X1,Z0,X2,X4,X0,X3,Z1))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X1,Z0,X2,X4,X0,X3,Z1,U) = P(y|v,X1,Z0,X2,X4,X0,X3,Z1)

## Realized estimand
b: y~v+X1+Z0+X2+X4+X0+X3+Z1
## Estimate
Value: 17.074885660950077

Causal Estimate is 17.074885661
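
Inverse propensity weighting can be sketched as follows: each treated unit is weighted by 1/p and each control unit by 1/(1-p), where p is its estimated propensity score, and the two weighted outcome means are compared. The normalized form used below, the clipping bounds, and the simulated data are assumptions chosen for this example and are not taken from DoWhy's estimator.

```python
import numpy as np

def fit_propensity(X, v, num_iterations=25):
    """Estimate P(v=1 | X) with logistic regression fitted by Newton's method."""
    design = np.column_stack([np.ones(len(v)), X])
    w = np.zeros(design.shape[1])
    for _ in range(num_iterations):
        p = 1 / (1 + np.exp(-design @ w))
        hessian = (design * (p * (1 - p))[:, None]).T @ design
        w += np.linalg.solve(hessian, design.T @ (v - p))
    return 1 / (1 + np.exp(-design @ w))

rng = np.random.default_rng(0)
n = 10000
X = rng.normal(size=(n, 2))
v = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X.sum(axis=1))))
y = 10.0 * v + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)

# Clip scores to guard against extreme weights (the bounds are an arbitrary choice).
p = np.clip(fit_propensity(X, v), 0.01, 0.99)
weights_treated = v / p
weights_control = (1 - v) / (1 - p)

# Normalized weighting: difference of the two weighted outcome means.
ipw_estimate = (np.sum(weights_treated * y) / np.sum(weights_treated)
                - np.sum(weights_control * y) / np.sum(weights_control))
print("Weighting estimate:", ipw_estimate)
```

Dividing by the sum of the weights instead of the sample size tends to make the estimate less sensitive to a few units with very large weights.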


## Method 5: Instrumental Variable

We will be using the Wald estimator for the provided instrumental variable.

In [11]:
causal_estimate_iv = model.estimate_effect(identified_estimand,
        method_name="iv.instrumental_variable",
        method_params={'iv_instrument_name': 'Z1'})
print(causal_estimate_iv)
print("Causal Estimate is " + str(causal_estimate_iv.value))



InstrumentalVariableEstimator
*** Causal Estimate ***

## Target estimand
None
## Realized estimand
None
## Estimate
Value: None

Causal Estimate is None
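
For reference, the textbook Wald estimator for a binary instrument divides the instrument's effect on the outcome by its effect on the treatment. The sketch below applies it to simulated data with an unobserved confounder `u`. The data-generating process and variable names are assumptions for illustration and do not reproduce the `iv.instrumental_variable` implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)                     # unobserved confounder
z = rng.binomial(1, 0.5, size=n)           # binary instrument, independent of u
v = (2.0 * z - 1.0 + 0.5 * u + rng.normal(size=n) > 0).astype(float)
y = 10.0 * v + 3.0 * u + rng.normal(size=n)

# Wald estimator: (effect of z on y) / (effect of z on v).
effect_on_outcome = y[z == 1].mean() - y[z == 0].mean()
effect_on_treatment = v[z == 1].mean() - v[z == 0].mean()
wald_estimate = effect_on_outcome / effect_on_treatment

# For contrast: the raw comparison is biased because u drives both v and y.
naive_difference = y[v == 1].mean() - y[v == 0].mean()
print("Wald estimate:", wald_estimate, "naive difference:", naive_difference)
```

The estimator relies on the instrument affecting the outcome only through the treatment. A weak instrument makes the denominator small and the estimate unstable.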


## Method 6: Regression Discontinuity

Internally, DoWhy converts this to an equivalent instrumental variable problem.

In [12]:
causal_estimate_regdist = model.estimate_effect(identified_estimand,
        method_name="iv.regression_discontinuity", 
        method_params={'rd_variable_name':'Z1',
                       'rd_threshold_value':0.5,
                       'rd_bandwidth': 0.1})
print(causal_estimate_regdist)
print("Causal Estimate is " + str(causal_estimate_regdist.value))



RegressionDiscontinuityEstimator
*** Causal Estimate ***

## Target estimand
None
## Realized estimand
None
## Estimate
Value: None

Causal Estimate is None
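
One common way to recast a regression discontinuity as an instrumental variable problem is to keep only the units within a bandwidth of the threshold and use "above the threshold" as a binary instrument. The sketch below does this with a Wald ratio. The function `rd_as_iv` and the simulated data are hypothetical; its arguments mirror `rd_variable_name`, `rd_threshold_value` and `rd_bandwidth` only by analogy, and it is not DoWhy's implementation.

```python
import numpy as np

def rd_as_iv(y, v, running, threshold, bandwidth):
    """Local Wald estimate using 'running > threshold' as the instrument."""
    window = np.abs(running - threshold) <= bandwidth
    above = running[window] > threshold
    y_local, v_local = y[window], v[window]
    effect_on_outcome = y_local[above].mean() - y_local[~above].mean()
    effect_on_treatment = v_local[above].mean() - v_local[~above].mean()
    return effect_on_outcome / effect_on_treatment

rng = np.random.default_rng(0)
n = 50000
u = rng.normal(size=n)                     # unobserved confounder
running = rng.uniform(0, 1, size=n)        # running variable
crossed = running > 0.5
# Fuzzy design: crossing the threshold makes treatment more likely, not certain.
v = (2.0 * crossed - 1.0 + 0.5 * u + rng.normal(size=n) > 0).astype(float)
y = 10.0 * v + 3.0 * u + 0.5 * running + rng.normal(size=n)

rd_estimate = rd_as_iv(y, v, running, threshold=0.5, bandwidth=0.1)
print("Regression discontinuity estimate:", rd_estimate)
```

A narrower bandwidth reduces the bias from the running variable's own effect on the outcome but leaves fewer units to estimate from.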
