# Mediators and Confounders

In a directed acyclic graph (DAG) representing a causal model, we can use the `backdoor criterion` to identify sets of confounders to adjust for and the `frontdoor criterion` to identify sets of mediators. A backdoor path from $T$ to $Y$ is any path between them that starts with an arrow pointing into $T$. A set of variables $C$ satisfies the backdoor criterion relative to $T$ and $Y$ if the following are true:

- $C$ blocks all backdoor paths from $T$ to $Y$
- $C$ does not contain any descendants of $T$
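
The two backdoor conditions can be checked mechanically. The sketch below is one possible way to do it in plain Python, not part of the notebook: it uses the standard reformulation that $C$ blocks every backdoor path exactly when $C$ d-separates $T$ from $Y$ in the graph with all edges leaving $T$ removed, and it tests d-separation on the moralized ancestral graph. The function names and the edge-list representation are assumptions made for illustration.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges (excluding `node`)."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def ancestors_of(edges, nodes):
    """The given nodes together with all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def d_separated(edges, x, y, given):
    """True if `given` d-separates x from y (moralized ancestral graph test)."""
    keep = ancestors_of(edges, {x, y} | set(given))
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    # Moralize: link every pair of parents sharing a child, then drop directions.
    links = {frozenset(e) for e in sub}
    for _, child in sub:
        parents = [p for p, c in sub if c == child]
        links |= {frozenset((a, b)) for a in parents for b in parents if a != b}
    # Search from x in the undirected graph without passing through `given`.
    seen, stack = {x}, [x]
    while stack:
        current = stack.pop()
        for link in links:
            if current in link:
                (other,) = link - {current}
                if other not in seen and other not in given:
                    seen.add(other)
                    stack.append(other)
    return y not in seen


def satisfies_backdoor(edges, t, y, adjust):
    """Check both backdoor conditions for the adjustment set `adjust`."""
    adjust = set(adjust)
    if adjust & descendants(edges, t):
        return False  # condition 2: no descendants of T
    no_outgoing = [(p, c) for p, c in edges if p != t]  # keep only backdoor paths
    return d_separated(no_outgoing, t, y, adjust)  # condition 1


# Illustrative DAG: C -> T -> M -> Y and C -> Y.
edges = [("C", "T"), ("T", "M"), ("C", "Y"), ("M", "Y")]
print(satisfies_backdoor(edges, "T", "Y", {"C"}))
print(satisfies_backdoor(edges, "T", "Y", set()))
```

In this illustrative graph, `{"C"}` passes because it blocks the only backdoor path $T \leftarrow C \rightarrow Y$, while the empty set fails because that path stays open.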

A set of variables $M$ satisfies the frontdoor criterion relative to $T$ and $Y$ if the following are true:

- All directed paths from $T$ to $Y$ go through $M$
- There is no unblocked backdoor path from $T$ to $M$
- All backdoor paths from $M$ to $Y$ are blocked by $T$

If $(T, C, Y)$ satisfy the backdoor criterion, then the backdoor adjustment formula may be used to estimate the causal effect of $T$ on $Y$.

- $P(y|do(t)) = \sum_c P(y|t,c) P(c)$
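
As a rough illustration of the backdoor formula, the sketch below estimates $P(y|do(t))$ from raw records by replacing each probability with a relative frequency. The function name `backdoor_estimate`, the column names, and the toy counts are hypothetical and chosen only so the arithmetic is easy to follow; strata in which $T=t$ never occurs are skipped, which silently treats $P(y|t,c)$ as 0 there.

```python
from collections import Counter


def backdoor_estimate(records, t, y, treatment="T", outcome="Y", adjust=("C",)):
    """Estimate P(y | do(t)) = sum_c P(y | t, c) P(c) from a list of dict records."""
    def stratum_of(record):
        return tuple(record[name] for name in adjust)

    n = len(records)
    total = 0.0
    for c, n_c in Counter(stratum_of(r) for r in records).items():
        treated = [r for r in records if stratum_of(r) == c and r[treatment] == t]
        if not treated:
            continue  # P(y | t, c) cannot be estimated in this stratum
        p_y_given_tc = sum(r[outcome] == y for r in treated) / len(treated)
        total += p_y_given_tc * n_c / n  # weight by P(c)
    return total


def stratum(c, t, n, n_y_true):
    """Build n toy records with C=c and T=t, of which n_y_true have Y='true'."""
    return ([{"C": c, "T": t, "Y": "true"}] * n_y_true
            + [{"C": c, "T": t, "Y": "false"}] * (n - n_y_true))


# Hypothetical counts: treatment is more common when C is 'true'.
toy = (stratum("false", "true", 4, 1) + stratum("false", "false", 6, 3)
       + stratum("true", "true", 8, 6) + stratum("true", "false", 2, 1))

treated = [r for r in toy if r["T"] == "true"]
naive = sum(r["Y"] == "true" for r in treated) / len(treated)  # P(y | t)
adjusted = backdoor_estimate(toy, "true", "true")              # P(y | do(t))
print(naive, adjusted)
```

On these toy counts the naive conditional $P(y|t)$ is $7/12$, whereas the adjusted estimate is $0.25 \cdot 0.5 + 0.75 \cdot 0.5 = 0.5$; the gap comes from $C$ influencing both treatment and outcome.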

If $(T, M, Y)$ satisfy the frontdoor criterion, then the frontdoor adjustment formula may be used to estimate the causal effect of $T$ on $Y$.

- $P(y|do(t)) = \sum_m \big( P(m|t) \sum_{t'} P(y|m,t') P(t') \big)$

In [1]:
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

C = BbnNode(Variable(0, 'confounder', ['false', 'true']), [0.8, 0.2])
T = BbnNode(Variable(1, 'treatment', ['false', 'true']), [0.8, 0.2, 0.2, 0.8])
M = BbnNode(Variable(2, 'mediator', ['false', 'true']), [0.75, 0.25, 0.1, 0.9])
Y = BbnNode(Variable(3, 'output', ['false', 'true']), [0.99, 0.01, 0.6, 0.4, 0.55, 0.45, 0.2, 0.8])

bbn = Bbn() \
    .add_node(C) \
    .add_node(T) \
    .add_node(M) \
    .add_node(Y) \
    .add_edge(Edge(C, T, EdgeType.DIRECTED)) \
    .add_edge(Edge(T, M, EdgeType.DIRECTED)) \
    .add_edge(Edge(C, Y, EdgeType.DIRECTED)) \
    .add_edge(Edge(M, Y, EdgeType.DIRECTED))
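
Because this network contains both an observed confounder and a mediator, both adjustment formulas apply to it, and they should agree with the interventional distribution obtained by dropping $P(t|c)$ from the factorization. The sketch below checks this exactly in plain Python, without pybbn. It assumes that each flat probability list above is read one row per parent configuration, with the node's own states varying fastest and, for `output`, the `confounder` state varying slower than the `mediator` state; that layout is an assumption, not something the notebook states.

```python
from itertools import product

STATES = ("false", "true")

# CPTs copied from the notebook cell above (layout assumed, see the text).
p_c = {"false": 0.8, "true": 0.2}
p_t_given_c = {"false": {"false": 0.8, "true": 0.2},
               "true": {"false": 0.2, "true": 0.8}}
p_m_given_t = {"false": {"false": 0.75, "true": 0.25},
               "true": {"false": 0.1, "true": 0.9}}
p_y_given_cm = {
    ("false", "false"): {"false": 0.99, "true": 0.01},
    ("false", "true"): {"false": 0.6, "true": 0.4},
    ("true", "false"): {"false": 0.55, "true": 0.45},
    ("true", "true"): {"false": 0.2, "true": 0.8},
}

NAMES = ("c", "t", "m", "y")

# Full joint P(c, t, m, y) from the DAG factorization.
joint = {
    (c, t, m, y): p_c[c] * p_t_given_c[c][t] * p_m_given_t[t][m] * p_y_given_cm[(c, m)][y]
    for c, t, m, y in product(STATES, repeat=4)
}


def prob(**fixed):
    """Marginal probability of an assignment such as prob(t="true", y="true")."""
    return sum(p for key, p in joint.items()
               if all(key[NAMES.index(k)] == v for k, v in fixed.items()))


def cond(target, given):
    """P(target | given), both passed as dicts of variable name -> state."""
    return prob(**target, **given) / prob(**given)


def truncated(t, y):
    """Reference value: drop P(t | c) from the factorization and fix T = t."""
    return sum(p_c[c] * p_m_given_t[t][m] * p_y_given_cm[(c, m)][y]
               for c, m in product(STATES, repeat=2))


def backdoor(t, y):
    """sum_c P(y | t, c) P(c)"""
    return sum(cond({"y": y}, {"t": t, "c": c}) * prob(c=c) for c in STATES)


def frontdoor(t, y):
    """sum_m P(m | t) sum_t' P(y | m, t') P(t')"""
    return sum(
        cond({"m": m}, {"t": t})
        * sum(cond({"y": y}, {"m": m, "t": t2}) * prob(t=t2) for t2 in STATES)
        for m in STATES
    )


for t in STATES:
    print(t, truncated(t, "true"), backdoor(t, "true"), frontdoor(t, "true"))
```

Under the assumed layout, all three columns coincide for each value of $T$ up to floating-point rounding, which is what the two criteria promise for this graph.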

In [2]:
# Compile the network into a join tree for exact inference.
# InferenceController is already imported in the first cell.
join_tree = InferenceController.apply(bbn)

In [3]:
import pandas as pd
from pybbn.sampling.sampling import LogicSampler

sampler = LogicSampler(bbn)
samples = sampler.get_samples(n_samples=1_000, seed=37)

df = pd.DataFrame(samples).rename(columns={0: 'C', 1: 'T', 2: 'M', 3: 'Y'})

In [4]:
df 

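| | C | T | M | Y |
|---|---|---|---|---|
| 0 | true | true | true | true |
| 1 | false | false | false | false |
| 2 | false | false | true | true |
| 3 | false | true | true | false |
| 4 | false | false | false | false |
| ... | ... | ... | ... | ... |
| 995 | false | false | false | false |
| 996 | false | false | false | false |
| 997 | false | false | true | true |
| 998 | false | false | false | false |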


In [22]:
import itertools

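def safe_divide(num, den):
    """Return num / den, or 0 when the denominator is 0 (no matching rows)."""
    if den == 0:
        return 0
    return num / den

def get_filter(X, x):
    """Build a DataFrame.query string such as 'T=="true" and M=="false"'."""
    return ' and '.join(f'{_X}=="{_x}"' for _X, _x in zip(X, x))

def get_domain_product_filters(df, X):
    """Return one query string per combination of the observed values of the columns in X."""
    domains = [sorted(df[x].unique()) for x in X]
    return [get_filter(X, values) for values in itertools.product(*domains)]

def get_marg_prob(df, X, x):
    """Estimate P(X=x) as the fraction of rows matching the assignment."""
    f = get_filter(X, x)
    N = df.shape[0]
    n = df.query(f).shape[0]
    return safe_divide(n, N)

def get_cond_prob(df, X, x, Y, y):
    """Estimate P(X=x | Y=y) as a ratio of row counts."""
    num_f = get_filter(X + Y, x + y)
    den_f = get_filter(Y, y)

    n = df.query(num_f).shape[0]
    d = df.query(den_f).shape[0]
    return safe_divide(n, d)

def get_intv_prob(df, T, M, Y):
    # TODO: interventional probability; the loop below was left unfinished.
    # for m in get_domain_product_filters(df, 
    raise NotImplementedError('get_intv_prob is not implemented yet')
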
get_marg_prob(df, ['T'], ['true'])
get_cond_prob(df, ['Y'], ['true'], ['M', 'T'], ['true', 'true'])
get_domain_product_filters(df, ['T'])
get_domain_product_filters(df, ['T', 'M'])
get_domain_product_filters(df, ['M'])

['M=="false"', 'M=="true"']
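
The notebook does not give `get_intv_prob` a working body. Judging only from its signature `(df, T, M, Y)`, it may have been meant to evaluate the frontdoor formula on the samples; that reading is an assumption. The sketch below is one way such an estimator could look, written against a plain list of dict records so that it runs without pandas. The name `frontdoor_estimate`, its parameters, and the toy counts are hypothetical, and a zero denominator yields 0 in the same spirit as `safe_divide`.

```python
from collections import Counter


def frontdoor_estimate(records, t, y, treatment="T", mediator="M", outcome="Y"):
    """Estimate P(y | do(t)) = sum_m P(m | t) sum_t' P(y | m, t') P(t')."""
    n = len(records)
    t_counts = Counter(r[treatment] for r in records)
    tm_counts = Counter((r[treatment], r[mediator]) for r in records)
    tmy_counts = Counter((r[treatment], r[mediator], r[outcome]) for r in records)
    m_values = {r[mediator] for r in records}

    def ratio(num, den):
        return num / den if den else 0.0

    total = 0.0
    for m in m_values:
        p_m_given_t = ratio(tm_counts[(t, m)], t_counts[t])
        # Inner sum runs over every observed treatment value t2, weighted by P(t2).
        inner = sum(
            ratio(tmy_counts[(t2, m, y)], tm_counts[(t2, m)]) * n_t2 / n
            for t2, n_t2 in t_counts.items()
        )
        total += p_m_given_t * inner
    return total


def cell(t, m, n, n_y_true):
    """Build n toy records with T=t and M=m, of which n_y_true have Y='true'."""
    return ([{"T": t, "M": m, "Y": "true"}] * n_y_true
            + [{"T": t, "M": m, "Y": "false"}] * (n - n_y_true))


# Hypothetical counts, chosen so the result is easy to verify by hand.
toy = (cell("false", "false", 8, 2) + cell("false", "true", 2, 1)
       + cell("true", "false", 2, 1) + cell("true", "true", 8, 6))

print(frontdoor_estimate(toy, "true", "true"))
```

For the toy counts the inner sums are 0.375 for `M == "false"` and 0.625 for `M == "true"`, so the estimate for `do(T="true")` is $0.2 \cdot 0.375 + 0.8 \cdot 0.625 = 0.575$. To apply the same function to the sampled data, the records can be obtained with `df.to_dict('records')`.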

In [12]:
df['T'].unique()

array(['true', 'false'], dtype=object)