# Testing Assumptions in a Causal Model with DoWhy: A Simple Example
This is a quick introduction to testing whether an assumed causal graph is consistent with a dataset.
We do so by deriving the conditional independences implied by the graph and checking whether they also hold in the data. Currently, partial correlation is used for these tests.

First, let us load all required packages.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import numpy as np
import pandas as pd
import os, sys
sys.path.append(os.path.abspath("../../../"))
import dowhy
from dowhy import CausalModel
import dowhy.datasets 

## Step 1: Load dataset

In [None]:
data = dowhy.datasets.linear_dataset(beta=10,
        num_common_causes=5,
        num_instruments = 2,
        num_effect_modifiers=1,
        num_samples=5000, 
        treatment_is_binary=True,
        stddev_treatment_noise=10,
        num_discrete_common_causes=1)

df = data["df"] #Insert dataset here
df.head()

Note that the data is loaded as a pandas DataFrame. At present, DoWhy only supports pandas DataFrames as input.

## Step 2: Input causal graph

We now input a causal graph. You can provide it in the GML graph format (recommended), in DOT format, or as the output from DAGitty.
To create the causal graph for your dataset, you can use a tool like [DAGitty](http://dagitty.net/dags.html#), which provides a GUI for constructing the graph. You can then export the graph string that it generates.

In [None]:
graph_string = """dag {
        W0 [pos="-2.200,-1.520"]
        W1 [pos="-1.457,-1.533"]
        W2 [pos="-0.763,-1.547"]
        W3 [pos="1.041,-1.587"]
        W4 [pos="1.510,-1.560"]
        X0 [pos="1.222,-0.625"]
        Z0 [pos="0.390,-1.601"]
        Z1 [pos="-0.176,-1.540"]
        v0 [pos="-0.219,-0.881"]
        y [pos="-0.144,-0.296"]
        W0 -> v0
        W0 -> y
        W1 -> v0
        W1 -> y
        W2 -> v0
        W2 -> y
        W3 -> v0
        W3 -> y
        W4 -> v0
        W4 -> y
        X0 -> y
        Z0 -> v0
        Z1 -> v0
        v0 -> y
        }"""
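
Which conditional independences does a graph like this imply? The sketch below is not DoWhy code; it is a minimal, standard-library illustration of d-separation using the moralized ancestral graph method. The names `EDGES` and `d_separated` are hypothetical and introduced only for this example; the edge list is copied from the graph string above.

```python
from itertools import combinations

# Edges of the Step 2 graph, written as (parent, child) pairs.
EDGES = [
    ("W0", "v0"), ("W0", "y"), ("W1", "v0"), ("W1", "y"),
    ("W2", "v0"), ("W2", "y"), ("W3", "v0"), ("W3", "y"),
    ("W4", "v0"), ("W4", "y"), ("X0", "y"),
    ("Z0", "v0"), ("Z1", "v0"), ("v0", "y"),
]


def d_separated(edges, x, y, given=()):
    """Return True if x and y are d-separated by `given` in the DAG `edges`."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)

    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: drop edge directions and connect parents sharing a child.
    neighbours = {node: set() for node in keep}

    def link(a, b):
        neighbours[a].add(b)
        neighbours[b].add(a)

    for child in keep:
        child_parents = parents.get(child, set())
        for p in child_parents:
            link(p, child)
        for p, q in combinations(child_parents, 2):
            link(p, q)

    # 3. Remove the conditioning set and check whether x can still reach y.
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node] - blocked - seen:
            if nxt == y:
                return False
            seen.add(nxt)
            stack.append(nxt)
    return True


print(d_separated(EDGES, "W3", "Z0"))           # no conditioning set
print(d_separated(EDGES, "W3", "Z0", ("v0",)))  # conditioning on a common child
```

In this graph `W3` and `Z0` share only the child `v0`, so they are independent marginally but become dependent once `v0` is conditioned on. Implications of this kind are what the graph test compares against the data.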

## Step 3: Create Causal Model

In [None]:
model=CausalModel(
        data = df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=graph_string
        )

In [None]:
model.view_model()

In [None]:
from IPython.display import Image, display
display(Image(filename="causal_model.png"))

## Step 4: Testing for Conditional Independence
We can check whether the assumptions encoded in the graph hold for the data using `model.refute_graph(k, method_name, independence_constraints)`.
Each test has the form X ⫫ Y | Z, where X and Y are single variables and Z is a conditioning set that can contain k variables.
The default method is "partial_correlation", and k defaults to 1 unless specified.

In [None]:
# Change k to test conditional independence given a different number of variables
refuter_object = model.refute_graph(k=2, method_name="partial_correlation")

In [None]:
print(refuter_object)
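
To make the idea of partial correlation concrete, here is a minimal standard-library sketch. It is not DoWhy's implementation; it uses the textbook recursive formula, and the names `pearson`, `partial_corr`, and the synthetic columns `a`, `b`, `c` are assumptions made for illustration. In the toy data `c` causes both `a` and `b`, so the two are correlated marginally but should be close to uncorrelated once `c` is conditioned on.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(data, x, y, given=()):
    """Partial correlation of columns x and y given the columns in `given`."""
    if not given:
        return pearson(data[x], data[y])
    # Remove one conditioning variable at a time (recursive formula).
    z, rest = given[0], tuple(given[1:])
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic fork: c is a common cause of a and b.
rng = random.Random(0)
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [2 * ci + rng.gauss(0, 1) for ci in c]
b = [-3 * ci + rng.gauss(0, 1) for ci in c]
toy = {"a": a, "b": b, "c": c}

marginal = partial_corr(toy, "a", "b")
conditional = partial_corr(toy, "a", "b", ("c",))
print(f"corr(a, b)     = {marginal:.3f}")
print(f"corr(a, b | c) = {conditional:.3f}")
```

Note that a partial correlation near zero indicates conditional independence only under roughly linear relationships, which is why the choice of test method matters.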

### Testing for a set of edges
We can also test a specific set of conditional independences. The input has to be a list of tuples of the form: <br>
[( x1, y1, (z1, z2)), <br>
 ( x2, y2, (z3, z4)),<br>
 ( x3, y3, (z5,)),<br>
 ( x4, y4, ())<br>
 ]

In [None]:
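# Each constraint is (x, y, conditioning_set); an empty tuple means no conditioning
independence_constraints = [
    ('W3', 'Z0', ()),
    ('W0', 'Z0', ()),
    ('W3', 'Z0', ('W1', 'W2')),
]
print(model.refute_graph(method_name="partial_correlation",
                         independence_constraints=independence_constraints))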
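
One easy mistake with this input format is the single-element conditioning set: in Python, `('W1')` is just the string `'W1'`, while `('W1',)` is a tuple. The sketch below uses a hypothetical helper, `make_constraint` (not part of DoWhy), to build entries in the shape shown above.

```python
def make_constraint(x, y, given=()):
    """Hypothetical helper: build one (x, y, conditioning_tuple) entry."""
    if isinstance(given, str):
        # A bare string is treated as a single conditioning variable.
        given = (given,)
    return (x, y, tuple(given))


constraints = [
    make_constraint("W3", "Z0"),
    make_constraint("W0", "Z0"),
    make_constraint("W3", "Z0", ["W1", "W2"]),
    make_constraint("W3", "Z0", "W1"),
]

for constraint in constraints:
    print(constraint)
```

A list built this way has the same shape as the hand-written one and could be passed as `independence_constraints`.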

## Testing with a wrong graph input

In [None]:
graph_string = """dag {
        W0 [pos="-2.200,-1.520"]
        W1 [pos="-1.457,-1.533"]
        W2 [pos="-0.763,-1.547"]
        W3 [pos="1.041,-1.587"]
        W4 [pos="1.510,-1.560"]
        X0 [pos="1.222,-0.625"]
        Z0 [pos="0.390,-1.601"]
        Z1 [pos="-0.176,-1.540"]
        v0 [pos="-0.219,-0.881"]
        y [pos="-0.144,-0.296"]
        W0 -> v0
        W0 -> y
        W1 -> v0
        W1 -> y
        W2 -> v0
        W2 -> y
        W3 -> v0
        W3 -> y
        W4 -> v0
        X0 -> Z0
        Z0 -> Z1
        }"""

In [None]:
model = CausalModel(
            data=df,
            treatment=data["treatment_name"],
            outcome=data["outcome_name"],
            graph=graph_string,
            proceed_when_unidentifiable=True,
            test_significance=None
        )

In [None]:
model.view_model()

In [None]:
from IPython.display import Image, display
display(Image(filename="causal_model.png"))

In [None]:
refuter_object = model.refute_graph(k=1,method_name = "partial_correlation")

We can see that, because we input the wrong graph, many of the conditional independences it implies do not hold in the data.

In [None]:
print(refuter_object)
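
Deciding whether an independence "holds" requires turning a partial correlation into a pass/fail decision. One common approach is Fisher's z-transform, sketched below with the standard library only. Whether DoWhy uses exactly this test is not stated here, so treat the function name `fisher_z_pvalue` and the 0.05 threshold as assumptions for illustration.

```python
import math


def fisher_z_pvalue(r, n_samples, n_given):
    """Two-sided p-value for the null hypothesis 'partial correlation is zero'."""
    dof = n_samples - n_given - 3
    if dof <= 0:
        raise ValueError("not enough samples for this conditioning set size")
    # Clip to keep atanh finite when r is numerically +/-1.
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(dof)
    # 2 * (1 - Phi(|z|)) written via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # assumed significance level
for r in (0.01, 0.10):
    p = fisher_z_pvalue(r, n_samples=5000, n_given=2)
    verdict = "consistent with independence" if p >= ALPHA else "independence rejected"
    print(f"r = {r:.2f}: {verdict}")
```

With 5000 samples, even a modest partial correlation is enough to reject independence, which is consistent with a misspecified graph failing many of its implied tests.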