# Causal inference in Python

It is critical to correctly predict and understand the causal effects of interventions on a system.

Without an A/B test, conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal reasoning.

## The need for causal inference

Predictive models uncover patterns that connect the inputs and outcome in observed data.

To intervene, however, we need to estimate the effect of changing an input from its current value, for which no data exists.

Such questions, involving estimating a counterfactual, are common in decision-making scenarios.

- __Will it work?__
    - _Does a proposed change to a system improve people’s outcomes?_

- __Why did it work?__
    - _What led to a change in a system’s outcome?_

- __What should we do?__
    - _What changes to a system are likely to improve outcomes for people?_

- __What is the overall effect?__
    - _How does the system interact with human behavior? What is the effect of a system’s recommendations on people’s activity?_
    
Answering these questions requires __causal reasoning__.
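
A minimal sketch of why observed data alone can mislead, using a hypothetical simulation that is not part of the DoWhy example. The structural equation, the confounder `w`, and all coefficients (true effect 10, confounder weight 5, treatment probabilities 0.8 and 0.2) are assumptions chosen for illustration. Because the simulation controls the data-generating process, it can compute both potential outcomes for every unit, which is exactly what real data never provides.

```python
import random
from statistics import mean


def outcome(v, w, noise, effect=10.0):
    """Hypothetical structural equation: y depends on treatment v and confounder w."""
    return effect * v + 5.0 * w + noise


rng = random.Random(0)
units = []
for _ in range(20000):
    w = int(rng.random() < 0.5)                    # confounder
    v = int(rng.random() < (0.8 if w else 0.2))    # treatment is more likely when w == 1
    units.append((w, v, rng.gauss(0, 1)))

# Observational contrast: compare units that happened to be treated with those that were not.
treated = [outcome(v, w, e) for w, v, e in units if v == 1]
control = [outcome(v, w, e) for w, v, e in units if v == 0]
observed_gap = mean(treated) - mean(control)

# Counterfactual contrast: evaluate BOTH potential outcomes for every unit.
# This is only possible here because the structural equation is known.
causal_effect = mean(outcome(1, w, e) - outcome(0, w, e) for w, _, e in units)

print(f"observed gap:  {observed_gap:.2f}")
print(f"causal effect: {causal_effect:.2f}")
```

Under these assumed parameters the observational gap is inflated by roughly 5 × (0.8 − 0.2) = 3 on top of the true effect of 10, because treated units are more likely to have `w = 1`.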

In [1]:
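import os, sys
# Local checkout of DoWhy; adjust or remove this path if DoWhy is installed as a package.
sys.path.append('/home/romain/Documents/codes/synchro/dowhy')

import numpy as np
import pandas as pd

import dowhy
# Newer DoWhy releases expose this class as `from dowhy import CausalModel`.
from dowhy.do_why import CausalModel
import dowhy.datasets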

In [2]:
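# Simulate a dataset whose true treatment effect is known (beta=10).
data = dowhy.datasets.linear_dataset(
    beta=10,                  # true causal effect of the treatment on the outcome
    num_common_causes=5,      # observed confounders X0..X4
    num_instruments=2,        # instruments Z0, Z1
    num_samples=92,
    treatment_is_binary=True)
df = data["df"]               # pandas DataFrame holding the simulated samples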

In [3]:
df

| | Z0 | Z1 | X0 | X1 | X2 | X3 | X4 | v | y |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.171664 | 2.750925 | -0.179560 | 0.760997 | -0.497898 | 1.327836 | 1.0 | 26.878091 |
| 1 | 0.0 | 0.339494 | -0.670123 | -0.963025 | 0.810983 | 0.599589 | 2.196018 | 1.0 | 11.135967 |
| 2 | 0.0 | 0.744906 | -0.386434 | -2.012331 | 1.285295 | 0.357852 | 0.919720 | 0.0 | -3.227954 |
| 3 | 0.0 | 0.335891 | 1.814055 | -0.804995 | 1.246481 | -0.300542 | -0.339240 | 1.0 | 19.218133 |
| 4 | 0.0 | 0.060123 | -1.451885 | -0.508208 | 1.812545 | -0.978404 | 0.058383 | 0.0 | -4.267763 |
| 5 | 0.0 | 0.755967 | -2.261807 | 0.048376 | 0.200055 | 0.113202 | 0.651812 | 1.0 | 1.389858 |
| 6 | 0.0 | 0.498633 | -0.928874 | 0.486958 | 0.379286 | 1.672187 | 0.594514 | 1.0 | 15.376126 |
| 7 | 1.0 | 0.750365 | 0.470555 | 2.099077 | 0.456135 | -0.546311 | 0.195304 | 1.0 | 23.175422 |
| 8 | 0.0 | 0.186100 | -1.349096 | -0.350527 | 2.137715 | -1.551084 | -0.377975 | 0.0 | -3.961116 |
| 9 | 0.0 | 0.435249 | -0.898436 | -0.129085 | -0.516610 | 1.658553 | 1.079288 | 1.0 | 9.319136 |


In [4]:
df.head()

| | Z0 | Z1 | X0 | X1 | X2 | X3 | X4 | v | y |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.171664 | 2.750925 | -0.17956 | 0.760997 | -0.497898 | 1.327836 | 1.0 | 26.878091 |
| 1 | 0.0 | 0.339494 | -0.670123 | -0.963025 | 0.810983 | 0.599589 | 2.196018 | 1.0 | 11.135967 |
| 2 | 0.0 | 0.744906 | -0.386434 | -2.012331 | 1.285295 | 0.357852 | 0.91972 | 0.0 | -3.227954 |
| 3 | 0.0 | 0.335891 | 1.814055 | -0.804995 | 1.246481 | -0.300542 | -0.33924 | 1.0 | 19.218133 |
| 4 | 0.0 | 0.060123 | -1.451885 | -0.508208 | 1.812545 | -0.978404 | 0.058383 | 0.0 | -4.267763 |


In [5]:
data['dot_graph']

'digraph { v ->y; U[label="Unobserved Confounders"]; U->v; U->y;X0-> v; X1-> v; X2-> v; X3-> v; X4-> v;X0-> y; X1-> y; X2-> y; X3-> y; X4-> y;Z0-> v; Z1-> v;}'
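
To make the structure of this DOT string easier to read, the sketch below extracts its edges with a regular expression and lists, for the treatment `v` and outcome `y`, which nodes point at both (candidate common causes) and which point only at `v` (candidate instruments). The helper `parents` and the "direct parents only" rule are simplifying assumptions for illustration; this is not DoWhy's identification logic, and DoWhy's own log output may classify variables differently.

```python
import re

# The DOT string returned by data['dot_graph'] in the cell above.
dot_graph = (
    'digraph { v ->y; U[label="Unobserved Confounders"]; U->v; U->y;'
    'X0-> v; X1-> v; X2-> v; X3-> v; X4-> v;'
    'X0-> y; X1-> y; X2-> y; X3-> y; X4-> y;Z0-> v; Z1-> v;}'
)

# Every "a -> b" pair, tolerating the uneven spacing around the arrow.
edges = re.findall(r"(\w+)\s*->\s*(\w+)", dot_graph)


def parents(node):
    """Hypothetical helper: nodes with a direct edge into `node`."""
    return {src for src, dst in edges if dst == node}


# Direct parents of both treatment and outcome.
common_causes = sorted(parents("v") & parents("y"))
# Direct parents of the treatment that do not point at the outcome.
instrument_candidates = sorted(parents("v") - parents("y"))

print(common_causes)
print(instrument_candidates)
```

A real instrument must also be unrelated to the unobserved confounders and affect the outcome only through the treatment, which a parent lookup alone cannot verify.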

In [6]:
# With graph
model = CausalModel(
    data=df,
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["dot_graph"],
)

Model to find the causal effect of treatment v on outcome y


In [7]:
model.view_model()

In [9]:
identified_estimand = model.identify_effect()
print(identified_estimand)

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:{'X0', 'Z1', 'X1', 'U', 'Z0', 'X2', 'X3', 'X4'}


{'observed': 'yes'}
{'observed': 'yes'}
{'observed': 'yes'}
{'label': 'Unobserved Confounders', 'observed': 'no'}
There are unobserved common causes. Causal effect cannot be identified.
WARN: Do you want to continue by ignoring these unobserved confounders? [y/n] y


INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z0', 'Z1']


Estimand type: ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X0,Z1,X1,Z0,X2,X3,X4))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X0,Z1,X1,Z0,X2,X3,X4,U) = P(y|v,X0,Z1,X1,Z0,X2,X3,X4)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z0)/Derivative(v, Z0))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z0,Z1)
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→v, then ¬(Z0,Z1→y)
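
For readers who prefer standard notation, the two estimand expressions printed above can be transcribed as follows. This is a reading of the printed output, not DoWhy's internal representation; the symbol $W$ is shorthand introduced here for the printed conditioning set.

$$
W = (X_0, Z_1, X_1, Z_0, X_2, X_3, X_4)
$$

$$
\text{backdoor:}\quad \frac{d}{dv}\,\mathbb{E}\left[\,y \mid W\,\right]
\qquad\qquad
\text{iv:}\quad \mathbb{E}\left[\frac{\partial y / \partial Z_0}{\partial v / \partial Z_0}\right]
$$

Since the treatment here is binary, the backdoor derivative is usually read as a difference of conditional means averaged over the adjustment set, assuming $W$ blocks all backdoor paths:

$$
\text{ATE} = \mathbb{E}_{W}\Big[\,\mathbb{E}[\,y \mid v = 1, W\,] - \mathbb{E}[\,y \mid v = 0, W\,]\,\Big]
$$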



In [10]:
# Estimate the backdoor estimand by regressing y on v and the adjustment set.
causal_estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression")
print(causal_estimate)
print("Causal Estimate is " + str(causal_estimate.value))

LinearRegressionEstimator


INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X0+Z1+X1+Z0+X2+X3+X4


*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d                                      
──(Expectation(y|X0,Z1,X1,Z0,X2,X3,X4))
dv                                     
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X0,Z1,X1,Z0,X2,X3,X4,U) = P(y|v,X0,Z1,X1,Z0,X2,X3,X4)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z0)/Derivative(v, Z0))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z0,Z1)
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→v, then ¬(Z0,Z1→y)

## Realized estimand
b: y~v+X0+Z1+X1+Z0+X2+X3+X4
## Estimate
Value: 9.999999999999993

Causal Estimate is 9.999999999999993
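
The realized estimand `y~v+X0+...` is a regression formula: the effect is read off as the coefficient on `v` once the confounders are included as regressors. The sketch below illustrates that idea on hypothetical synthetic data with two confounders and a true effect of 10. The helpers `solve_linear_system` and `ols`, the data-generating coefficients, and the sample size are all assumptions made here; this is a plain ordinary-least-squares fit written with the standard library, not DoWhy's `LinearRegressionEstimator`.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns, via the normal equations."""
    design = [[1.0] * len(y)] + columns  # intercept column first
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in design]
    return solve_linear_system(xtx, xty)


rng = random.Random(0)
n = 5000
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
# Treatment depends on the confounders, so treated and control units differ systematically.
v = [1.0 if a + b + rng.gauss(0, 1) > 0 else 0.0 for a, b in zip(x0, x1)]
# Assumed outcome model with a true treatment effect of 10.
y = [10.0 * t + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for t, a, b in zip(v, x0, x1)]

coefs = ols([v, x0, x1], y)   # [intercept, v, x0, x1]
effect = coefs[1]             # coefficient on the treatment
print(f"estimated effect: {effect:.2f}")
```

Dropping `x0` and `x1` from the call to `ols` would leave the confounding unadjusted and bias the coefficient on `v` upward in this setup.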


In [11]:
# Without graph: pass the common causes by name instead of a DOT graph
model = CausalModel(
    data=df,
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    common_causes=data["common_causes_names"],
)



Model to find the causal effect of treatment v on outcome y


In [12]:
model.view_model()

In [13]:
identified_estimand = model.identify_effect()

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:{'X0', 'X1', 'U', 'X2', 'X3', 'X4'}


{'observed': 'yes'}
{'observed': 'yes'}
{'label': 'Unobserved Confounders', 'observed': 'no'}
There are unobserved common causes. Causal effect cannot be identified.
WARN: Do you want to continue by ignoring these unobserved confounders? [y/n] y


INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]


In [14]:
# Same estimator as before, now also reporting a p-value for the estimate.
estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression",
        test_significance=True
        )
print(estimate)
print("Causal Estimate is " + str(estimate.value))

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X0+X1+X2+X3+X4


LinearRegressionEstimator
*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d                                
──(Expectation(y|X0,X1,X2,X3,X4))
dv                               
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X0,X1,X2,X3,X4,U) = P(y|v,X0,X1,X2,X3,X4)
### Estimand : 2
Estimand name: iv
No such variable found!

## Realized estimand
b: y~v+X0+X1+X2+X3+X4
## Estimate
Value: 10.0

## Statistical Significance
p-value: 0.008000000000000007

Causal Estimate is 10.0


In [15]:
# Refutation 1: add a randomly generated common cause; a robust estimate should barely change.
res_random = model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
print(res_random)

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X0+X1+X2+X3+X4+w_random


Refute: Add a Random Common Cause
Estimated effect:(10.0,)
New effect:(-1.1833754688326563,)



In [16]:
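# Refutation 2: replace the treatment with a permuted placebo; the effect should drop toward zero.
res_placebo = model.refute_estimate(identified_estimand, estimate,
        method_name="placebo_treatment_refuter", placebo_type="permute")
print(res_placebo)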

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~placebo+X0+X1+X2+X3+X4


Refute: Use a Placebo Treatment
Estimated effect:(10.0,)
New effect:(-1.3391101782696,)



In [17]:
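# Refutation 3: re-estimate on a random 90% subset; a robust estimate should stay close to the original.
res_subset = model.refute_estimate(identified_estimand, estimate,
        method_name="data_subset_refuter", subset_fraction=0.9)
print(res_subset)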

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X0+X1+X2+X3+X4


Refute: Use a subset of data
Estimated effect:(10.0,)
New effect:(-0.22918857020476516,)
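
The logic behind the three refutation checks can be sketched without DoWhy. The example below uses hypothetical synthetic data with one binary confounder and swaps in a stratification estimator for the linear regression used above, purely to stay within the standard library. The function `stratified_effect`, the data-generating parameters, and the variable names are assumptions for illustration, and the sketch does not reproduce the numbers printed by the notebook.

```python
import random
from statistics import mean


def stratified_effect(rows):
    """Backdoor adjustment by stratification; rows are (stratum_key, v, y) with v in {0, 1}."""
    strata = {}
    for key, v, y in rows:
        strata.setdefault(key, ([], []))[v].append(y)
    total, used = 0.0, 0
    for control, treated in strata.values():
        if control and treated:  # skip strata that lack one of the two arms
            size = len(control) + len(treated)
            total += (mean(treated) - mean(control)) * size
            used += size
    return total / used


rng = random.Random(1)
data = []
for _ in range(20000):
    w = int(rng.random() < 0.5)                    # confounder
    v = int(rng.random() < (0.8 if w else 0.2))    # confounded treatment
    y = 10.0 * v + 5.0 * w + rng.gauss(0, 1)       # assumed true effect: 10
    data.append((w, v, y))

original = stratified_effect(data)

# 1. Random common cause: also stratify on an unrelated random variable.
with_random = stratified_effect([((w, rng.randrange(2)), v, y) for w, v, y in data])

# 2. Placebo treatment: permute the treatment column so it carries no causal signal.
shuffled = [v for _, v, _ in data]
rng.shuffle(shuffled)
placebo = stratified_effect([(w, p, y) for (w, _, y), p in zip(data, shuffled)])

# 3. Data subset: re-estimate on a random 90% of the rows.
subset = stratified_effect(rng.sample(data, int(0.9 * len(data))))

print(f"original: {original:.2f}")
print(f"random common cause: {with_random:.2f}")
print(f"placebo: {placebo:.2f}")
print(f"subset: {subset:.2f}")
```

In this sketch the first and third checks should stay close to the original estimate and the placebo should land near zero; large departures from that pattern are the signal a refutation test is meant to surface.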

