# EpiRust DAG Analysis Demo

This notebook demonstrates causal inference and DAG (Directed Acyclic Graph) analysis capabilities in EpiRust, inspired by R's `dagitty`. We'll explore:

1. Creating and visualizing DAGs
2. Finding adjustment sets
3. Analyzing causal paths
4. Testing conditional independence

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from epirust.dag import DAGAnalyzer

# Set random seed for reproducibility
np.random.seed(42)

## Example 1: Classic Confounding

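Let's create a classic confounding scenario in epidemiology: the relationship between coffee consumption and cancer, confounded by smoking. Smoking influences both coffee consumption and cancer risk, so it opens a non-causal path between the exposure and the outcome.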

In [None]:
# Create a DAG
dag = DAGAnalyzer.new()
dag.add_node("smoking", {"type": "confounder"})
dag.add_node("coffee", {"type": "exposure"})
dag.add_node("cancer", {"type": "outcome"})

# Add edges
dag.add_edge("smoking", "coffee")
dag.add_edge("smoking", "cancer")
dag.add_edge("coffee", "cancer")

# Visualize the DAG
dag.plot()

## Find Minimal Adjustment Sets

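To estimate the causal effect of coffee on cancer, we need to block the backdoor path `coffee <- smoking -> cancer`. In this DAG the only variable on that path is `smoking`, so we expect `{smoking}` to be the single minimal adjustment set: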

In [None]:
# Find minimal adjustment sets
adjustment_sets = dag.find_adjustment_sets("coffee", "cancer")
print("Minimal adjustment sets:")
for adj_set in adjustment_sets:
    print(f"- {adj_set}")
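
The backdoor criterion behind this result can be checked by hand. The following is a minimal sketch in plain Python, independent of EpiRust: the edge list and the helper names (`simple_paths`, `is_blocked`, `satisfies_backdoor`) are hypothetical and are not part of the `DAGAnalyzer` API. It enumerates every path between exposure and outcome, keeps those that start with an arrow into the exposure, and tests whether the candidate set blocks each one.

```python
# Hypothetical edge list mirroring the coffee/smoking/cancer DAG above.
EDGES = [("smoking", "coffee"), ("smoking", "cancer"), ("coffee", "cancer")]


def simple_paths(edges, start, end):
    """Yield every simple path between start and end, ignoring edge direction."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:
                stack.append(path + [nxt])


def descendants(edges, node):
    """Return all nodes reachable from node along directed edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    seen, todo = set(), [node]
    while todo:
        for child in children.get(todo.pop(), ()):
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return seen


def is_blocked(edges, path, z):
    """Apply the d-separation rules to a single path given conditioning set z."""
    edge_set = set(edges)
    for a, m, b in zip(path, path[1:], path[2:]):
        if (a, m) in edge_set and (b, m) in edge_set:
            # Collider: blocks unless it or one of its descendants is in z.
            if m not in z and not (descendants(edges, m) & z):
                return True
        elif m in z:
            # Chain or fork: blocked by conditioning on the middle node.
            return True
    return False


def satisfies_backdoor(edges, exposure, outcome, z):
    """Check the backdoor criterion for candidate adjustment set z."""
    z = set(z)
    if z & descendants(edges, exposure):
        return False  # adjusting for a descendant of the exposure is not allowed
    edge_set = set(edges)
    for path in simple_paths(edges, exposure, outcome):
        starts_with_arrow_into_exposure = (path[1], exposure) in edge_set
        if starts_with_arrow_into_exposure and not is_blocked(edges, path, z):
            return False
    return True


print(satisfies_backdoor(EDGES, "coffee", "cancer", {"smoking"}))  # True
print(satisfies_backdoor(EDGES, "coffee", "cancer", set()))        # False
```

Path enumeration is exponential in the worst case, so this approach is only meant for small teaching graphs like the ones in this notebook.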

## Example 2: Mediation Analysis

Let's analyze a mediation scenario: the effect of diet on heart disease, mediated by blood pressure, with age confounding the relationship between diet and heart disease.

In [None]:
# Create mediation DAG
med_dag = DAGAnalyzer.new()
med_dag.add_node("diet", {"type": "exposure"})
med_dag.add_node("blood_pressure", {"type": "mediator"})
med_dag.add_node("heart_disease", {"type": "outcome"})
med_dag.add_node("age", {"type": "confounder"})

# Add edges
med_dag.add_edge("diet", "blood_pressure")
med_dag.add_edge("blood_pressure", "heart_disease")
med_dag.add_edge("diet", "heart_disease")
med_dag.add_edge("age", "diet")
med_dag.add_edge("age", "heart_disease")

# Visualize
med_dag.plot()

In [None]:
# List the causal paths that carry the direct and the indirect effect
direct_effect = med_dag.get_paths("diet", "heart_disease", exclude_through=["blood_pressure"])
indirect_effect = med_dag.get_paths("diet", "heart_disease", must_pass_through=["blood_pressure"])

print("Direct effect paths:")
for path in direct_effect:
    print(f"- {' -> '.join(path)}")

print("\nIndirect effect paths:")
for path in indirect_effect:
    print(f"- {' -> '.join(path)}")
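
To make the direct/indirect split concrete, here is a small sketch in plain Python that enumerates the directed paths from `diet` to `heart_disease` and partitions them by whether they pass through the mediator. The edge list copies the DAG above; the helper `directed_paths` is hypothetical and is not an EpiRust function.

```python
# Hypothetical edge list mirroring the mediation DAG above.
EDGES = [
    ("diet", "blood_pressure"),
    ("blood_pressure", "heart_disease"),
    ("diet", "heart_disease"),
    ("age", "diet"),
    ("age", "heart_disease"),
]


def directed_paths(edges, source, target):
    """Yield every directed path from source to target via depth-first search."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)

    def walk(path):
        if path[-1] == target:
            yield path
            return
        for child in children.get(path[-1], []):
            if child not in path:  # guard against revisiting a node
                yield from walk(path + [child])

    yield from walk([source])


mediator = "blood_pressure"
paths = list(directed_paths(EDGES, "diet", "heart_disease"))
direct = [p for p in paths if mediator not in p]
indirect = [p for p in paths if mediator in p]

for label, group in (("Direct", direct), ("Indirect", indirect)):
    for p in group:
        print(f"{label}: {' -> '.join(p)}")
```

Note that `age` never appears on a directed path out of `diet`: it matters for confounding adjustment, not for the decomposition of the effect into direct and indirect parts.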

## Example 3: Instrumental Variables

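Let's explore an instrumental variable (IV) scenario. Here a genetic variant serves as an instrument for the effect of alcohol consumption on blood pressure, with lifestyle as a confounder of that effect. This is the design used in Mendelian randomization studies.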

In [None]:
# Create IV DAG
iv_dag = DAGAnalyzer.new()
iv_dag.add_node("genetic_variant", {"type": "instrument"})
iv_dag.add_node("alcohol_consumption", {"type": "exposure"})
iv_dag.add_node("blood_pressure", {"type": "outcome"})
iv_dag.add_node("lifestyle", {"type": "confounder"})

# Add edges
iv_dag.add_edge("genetic_variant", "alcohol_consumption")
iv_dag.add_edge("alcohol_consumption", "blood_pressure")
iv_dag.add_edge("lifestyle", "alcohol_consumption")
iv_dag.add_edge("lifestyle", "blood_pressure")

# Visualize
iv_dag.plot()

In [None]:
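# Verify IV assumptions
# Relevance: the instrument must affect the exposure.
print("IV Relevance:", iv_dag.has_path("genetic_variant", "alcohol_consumption"))
# Exclusion restriction: no path from instrument to outcome that bypasses the exposure.
print("IV Exclusion:", not iv_dag.has_path("genetic_variant", "blood_pressure",
                                           exclude_through=["alcohol_consumption"]))
# Independence: no open path between the instrument and the confounder.
# Assumes the third argument of has_unblocked_path is the conditioning set (empty here).
# Conditioning on alcohol_consumption instead would open the collider path via lifestyle.
print("IV Independence:", not iv_dag.has_unblocked_path("genetic_variant", "lifestyle", []))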
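
The three IV conditions can also be phrased as d-separation statements. The sketch below is plain Python and does not use EpiRust: `ancestors_of` and `d_separated` are hypothetical helpers that implement the standard moralization test (restrict to the ancestors of the involved nodes, connect co-parents, drop edge directions, remove the conditioning set, then check connectivity). The exclusion check follows the usual graphical criterion of testing the instrument against the outcome in the graph with the exposure's outgoing edges removed.

```python
# Hypothetical edge list mirroring the IV DAG above.
EDGES = [
    ("genetic_variant", "alcohol_consumption"),
    ("alcohol_consumption", "blood_pressure"),
    ("lifestyle", "alcohol_consumption"),
    ("lifestyle", "blood_pressure"),
]


def ancestors_of(edges, nodes):
    """Return the given nodes together with all of their ancestors."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    seen, todo = set(nodes), list(nodes)
    while todo:
        for p in parents.get(todo.pop(), ()):
            if p not in seen:
                seen.add(p)
                todo.append(p)
    return seen


def d_separated(edges, x, y, z=()):
    """Test whether x and y are d-separated given z (moralization method)."""
    z = set(z)
    keep = ancestors_of(edges, {x, y} | z)
    adj = {n: set() for n in keep}
    parents = {n: set() for n in keep}
    for u, v in edges:
        if u in keep and v in keep:
            adj[u].add(v)
            adj[v].add(u)
            parents[v].add(u)
    # Moralize: connect every pair of parents that share a child.
    for ps in parents.values():
        for a in ps:
            adj[a] |= ps - {a}
    # Search from x in the undirected graph, never stepping onto a node in z.
    seen, todo = {x}, [x]
    while todo:
        for n in adj[todo.pop()]:
            if n not in z and n not in seen:
                seen.add(n)
                todo.append(n)
    return y not in seen


exposure, outcome, instrument = "alcohol_consumption", "blood_pressure", "genetic_variant"
without_exposure_effect = [(u, v) for u, v in EDGES if u != exposure]

relevance = not d_separated(EDGES, instrument, exposure)
independence = d_separated(EDGES, instrument, "lifestyle")
exclusion = d_separated(without_exposure_effect, instrument, outcome)
print(relevance, independence, exclusion)  # True True True

# Conditioning on the exposure opens the collider path through lifestyle.
print(d_separated(EDGES, instrument, outcome, {exposure}))  # False
```

The last line shows why "instrument independent of outcome given exposure" is not a valid test of the exclusion restriction in this graph: the exposure is a collider between the instrument and the confounder.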

## Example 4: Time-Varying Confounding

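Finally, let's examine a scenario with time-varying confounding, common in longitudinal studies. Treatment, confounder, and outcome are each measured at two time points, and each variable at time 1 influences its counterpart at time 2.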

In [None]:
# Create time-varying DAG
tv_dag = DAGAnalyzer.new()

# Add nodes for different time points
for t in [1, 2]:
    tv_dag.add_node(f"treatment_{t}", {"type": "exposure", "time": t})
    tv_dag.add_node(f"confounder_{t}", {"type": "confounder", "time": t})
    tv_dag.add_node(f"outcome_{t}", {"type": "outcome", "time": t})

# Add edges
# Within time 1
tv_dag.add_edge("confounder_1", "treatment_1")
tv_dag.add_edge("confounder_1", "outcome_1")
tv_dag.add_edge("treatment_1", "outcome_1")

# Time 1 to time 2
tv_dag.add_edge("confounder_1", "confounder_2")
tv_dag.add_edge("treatment_1", "treatment_2")
tv_dag.add_edge("outcome_1", "outcome_2")

# Within time 2
tv_dag.add_edge("confounder_2", "treatment_2")
tv_dag.add_edge("confounder_2", "outcome_2")
tv_dag.add_edge("treatment_2", "outcome_2")

# Visualize
tv_dag.plot()

In [None]:
# Analyze time-varying effects
for t in [1, 2]:
    print(f"\nTime point {t}:")
    adjustment_sets = tv_dag.find_adjustment_sets(f"treatment_{t}", f"outcome_{t}")
    print(f"Adjustment sets for treatment effect at time {t}:")
    for adj_set in adjustment_sets:
        print(f"- {adj_set}")
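
Adjustment sets are only meaningful if the graph built in the loop really is acyclic and no edge points backward in time. The following sketch checks both properties using the standard library's `graphlib` module (Python 3.9+). The edge list copies the DAG above; `topological_order`, `is_acyclic`, and `time_of` are hypothetical helpers, and `time_of` assumes the `name_t` naming scheme used in this example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical edge list mirroring the time-varying DAG above.
EDGES = [
    ("confounder_1", "treatment_1"),
    ("confounder_1", "outcome_1"),
    ("treatment_1", "outcome_1"),
    ("confounder_1", "confounder_2"),
    ("treatment_1", "treatment_2"),
    ("outcome_1", "outcome_2"),
    ("confounder_2", "treatment_2"),
    ("confounder_2", "outcome_2"),
    ("treatment_2", "outcome_2"),
]


def topological_order(edges):
    """Return the nodes ordered so that every parent precedes its children."""
    predecessors = {}
    for u, v in edges:
        predecessors.setdefault(u, set())
        predecessors.setdefault(v, set()).add(u)
    # static_order() raises CycleError if the graph contains a cycle.
    return list(TopologicalSorter(predecessors).static_order())


def is_acyclic(edges):
    """Return True if the edge list describes a DAG."""
    try:
        topological_order(edges)
    except CycleError:
        return False
    return True


def time_of(node):
    """Extract the time index from a node name such as 'treatment_2'."""
    return int(node.rsplit("_", 1)[1])


order = topological_order(EDGES)
respects_time = all(time_of(u) <= time_of(v) for u, v in EDGES)

print("Acyclic:", is_acyclic(EDGES))
print("Edges respect time order:", respects_time)
print("One valid causal ordering:", " < ".join(order))
```

Running a check like this before computing adjustment sets catches accidental feedback edges, for example an edge from `outcome_2` back to `confounder_1`, which would make the graph cyclic.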

## Conclusion

This notebook demonstrated EpiRust's DAG analysis capabilities for various epidemiological scenarios:

1. Classic confounding analysis
2. Mediation analysis with direct and indirect effects
3. Instrumental variable analysis
4. Time-varying confounding

These tools help researchers:
- Identify proper adjustment sets
- Validate causal assumptions
- Analyze complex causal pathways
- Handle time-varying relationships