# Chapter 3: Building a Causal Graphical Model

This chapter covers:

- Building a causal DAG to model a Data Generating Process (DGP)
- Using your causal graph as a communication, computation, and reasoning tool
- Building a causal DAG in pgmpy and Pyro
- Training a probabilistic machine learning model using the causal DAG as a scaffold

## The Data Generating Process (DGP)

A **Data Generating Process (DGP)** is the underlying mechanism that produces observed data. In causal inference, we model the DGP to understand not just correlations, but the actual causal relationships between variables.

Below is a simple example: the "broken window" scenario. Jenny and Brian may each throw a rock at a window, and the window breaks if the strength of the impact exceeds the window's strength. The function below represents the *true* causal mechanism: it shows how causes (throwing rocks) produce an effect (a broken window).

In [1]:
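def true_dgp(jenny_inclination, brian_inclination, window_strength):
    """Return (jenny_throws_rock, brian_throws_rock, window_breaks)."""
    # Each person throws a rock only if their inclination exceeds 0.5.
    jenny_throws_rock = jenny_inclination > 0.5
    brian_throws_rock = brian_inclination > 0.5
    # Two rocks hit harder than one; no rocks means no impact.
    if jenny_throws_rock and brian_throws_rock:
        strength_of_impact = 0.8
    elif jenny_throws_rock or brian_throws_rock:
        strength_of_impact = 0.6
    else:
        strength_of_impact = 0.0
    # The window breaks only if the impact exceeds its strength.
    window_breaks = strength_of_impact > window_strength
    return jenny_throws_rock, brian_throws_rock, window_breaks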
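
To see the DGP in action, one option is to feed it random inputs and tabulate the outcomes. The sketch below is illustrative only: it assumes the three inputs are independent draws from Uniform(0, 1), which the text does not specify, and the helper name `simulate` and the fixed seed are hypothetical. The function is repeated so the block runs on its own.

```python
import random


def true_dgp(jenny_inclination, brian_inclination, window_strength):
    # Same mechanism as above, repeated so this block is self-contained.
    jenny_throws_rock = jenny_inclination > 0.5
    brian_throws_rock = brian_inclination > 0.5
    if jenny_throws_rock and brian_throws_rock:
        strength_of_impact = 0.8
    elif jenny_throws_rock or brian_throws_rock:
        strength_of_impact = 0.6
    else:
        strength_of_impact = 0.0
    window_breaks = strength_of_impact > window_strength
    return jenny_throws_rock, brian_throws_rock, window_breaks


def simulate(n, seed=0):
    """Draw n samples; inputs are ASSUMED to be independent Uniform(0, 1)."""
    rng = random.Random(seed)
    return [true_dgp(rng.random(), rng.random(), rng.random()) for _ in range(n)]


samples = simulate(10_000)
break_rate = sum(breaks for _, _, breaks in samples) / len(samples)
print(f"Estimated P(window breaks) = {break_rate:.3f}")
```

Under the uniform assumption the exact probability is 0.25 × 0.8 + 0.5 × 0.6 = 0.5, so the estimate should land close to that value.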

## Building a Causal DAG in pgmpy

A **Directed Acyclic Graph (DAG)** represents causal relationships as directed edges between nodes. Each edge `(A, B)` means "A causes B" or "A directly influences B."

We'll model a transportation survey with the following variables:
- **A** (Age): young, adult, old
- **S** (Sex): M, F
- **E** (Education): high school, university
- **O** (Occupation): employed, self-employed
- **R** (Residence size): small, big
- **T** (Transportation): car, train, other

The causal structure encodes our domain knowledge:
- Age and Sex influence Education level
- Education influences both Occupation and Residence size
- Occupation and Residence size together influence Transportation choice

```
    A     S
     \   /
       E
      / \
     O   R
      \ /
       T
```

In [18]:
from pgmpy.models import DiscreteBayesianNetwork

model = DiscreteBayesianNetwork(
    [("A", "E"), ("S", "E"), ("E", "O"), ("E", "R"), ("O", "T"), ("R", "T")]
)
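
Before fitting anything, it can help to sanity-check the edge list itself. The following is a minimal standard-library sketch, independent of pgmpy, that derives each node's parents from the same edges and confirms the graph is acyclic using Kahn's algorithm. The helper names `parents_of` and `topological_order` are hypothetical.

```python
from collections import defaultdict, deque

edges = [("A", "E"), ("S", "E"), ("E", "O"), ("E", "R"), ("O", "T"), ("R", "T")]


def parents_of(edges):
    """Map each node to the sorted list of its direct causes."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, [])  # make sure root nodes appear too
        parents.setdefault(effect, []).append(cause)
    return {node: sorted(ps) for node, ps in parents.items()}


def topological_order(edges):
    """Return a causal ordering via Kahn's algorithm; raise if a cycle exists."""
    parents = parents_of(edges)
    indegree = {node: len(ps) for node, ps in parents.items()}
    children = defaultdict(list)
    for cause, effect in edges:
        children[cause].append(effect)
    # Start from the nodes that have no parents.
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(indegree):
        raise ValueError("edge list contains a cycle, so it is not a DAG")
    return order


parents = parents_of(edges)
order = topological_order(edges)
print(parents)
print(order)
```

The parent sets recovered here are the conditioning sets that appear in each node's conditional distribution, so a mistake in the edge list shows up immediately.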


## Loading Observational Data

Now we load survey data that was generated according to (or at least consistent with) our causal model. Each row represents one person's responses.

In [19]:
import pandas as pd

url = 'https://raw.githubusercontent.com/altdeep/causalML/master/datasets/transportation_survey.csv'
data = pd.read_csv(url)
data

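| Unnamed: 0 | A | S | E | O | R | T |
|---|---|---|---|---|---|---|
| 0 | adult | F | high | emp | small | train |
| 1 | young | M | high | emp | big | car |
| 2 | adult | M | uni | emp | big | other |
| 3 | old | F | uni | emp | big | car |
| 4 | young | F | uni | emp | big | car |
| ... | ... | ... | ... | ... | ... | ... |
| 495 | young | M | high | emp | big | other |
| 496 | adult | M | high | emp | big | car |
| 497 | young | M | high | emp | small | train |
| 498 | young | M | high | emp | small | car |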


## Learning Conditional Probability Distributions (CPDs)

The `fit()` method learns the **Conditional Probability Distributions (CPDs)** from data. In a Bayesian network, each node has a CPD that specifies:

- For **root nodes** (no parents): P(X) — the marginal distribution
- For **child nodes**: P(X | Parents(X)) — the conditional distribution given parents

These CPDs are also called **Causal Markov Kernels** because they encode the local causal mechanism at each node. Together, they define the full joint distribution:

$$P(A, S, E, O, R, T) = P(A) \cdot P(S) \cdot P(E|A,S) \cdot P(O|E) \cdot P(R|E) \cdot P(T|O,R)$$

This factorization follows from the **Causal Markov Condition**: each variable is independent of its non-descendants given its parents.
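
One practical payoff of this factorization is a smaller model. As a rough illustration, the sketch below counts free parameters for an unrestricted joint table versus the factorized form, using the variable cardinalities listed earlier (3 for A, 2 each for S, E, O, and R, and 3 for T). The helper `free_parameters` is hypothetical and simply counts (k - 1) free probabilities per combination of parent values.

```python
from math import prod

cardinality = {"A": 3, "S": 2, "E": 2, "O": 2, "R": 2, "T": 3}
parents = {"A": [], "S": [], "E": ["A", "S"], "O": ["E"], "R": ["E"], "T": ["O", "R"]}


def free_parameters(node):
    """(k - 1) free probabilities for each combination of parent values."""
    parent_configs = prod(cardinality[p] for p in parents[node])
    return (cardinality[node] - 1) * parent_configs


factorized = sum(free_parameters(node) for node in cardinality)
full_joint = prod(cardinality.values()) - 1

print(f"Factorized model: {factorized} free parameters")
print(f"Unrestricted joint: {full_joint} free parameters")
```

The factorized model needs 21 free parameters, compared with 143 for the unrestricted joint distribution over the same six variables.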

In [21]:
model.fit(data)
causal_markov_kernels = model.get_cpds()
print(causal_markov_kernels)

INFO:pgmpy: Datatype (N=numerical, C=Categorical Unordered, O=Categorical Ordered) inferred from data: 
 {'A': 'C', 'S': 'C', 'E': 'C', 'O': 'C', 'R': 'C', 'T': 'C'}


[<TabularCPD representing P(A:3) at 0x12a87aa50>, <TabularCPD representing P(E:2 | A:3, S:2) at 0x12a87bad0>, <TabularCPD representing P(S:2) at 0x12a879850>, <TabularCPD representing P(O:2 | E:2) at 0x12a87a270>, <TabularCPD representing P(R:2 | E:2) at 0x12a878ef0>, <TabularCPD representing P(T:3 | O:2, R:2) at 0x12a87a3f0>]
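
For discrete data, estimating a CPD from observed frequencies amounts to counting and normalizing. The sketch below illustrates the idea for P(R | E) in plain Python, using only the first five rows shown in the data preview above, so the resulting numbers will not match those fitted on the full dataset. The helper `estimate_cpd` is hypothetical and is not a description of pgmpy's internals.

```python
from collections import Counter

# First five rows of the preview, as (E, R) pairs.
rows = [("high", "small"), ("high", "big"), ("uni", "big"), ("uni", "big"), ("uni", "big")]


def estimate_cpd(pairs):
    """Estimate P(child | parent) as relative frequencies of (parent, child) pairs."""
    joint_counts = Counter(pairs)
    parent_counts = Counter(parent for parent, _ in pairs)
    return {
        (child, parent): count / parent_counts[parent]
        for (parent, child), count in joint_counts.items()
    }


cpd_R_given_E = estimate_cpd(rows)
for (r, e), p in sorted(cpd_R_given_E.items()):
    print(f"P(R={r} | E={e}) = {p:.2f}")
```

Combinations that never occur in the sample, such as a small residence with university education in these five rows, simply get no entry here, which corresponds to an estimated probability of zero.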


## Examining a Conditional Probability Table

Let's look at the CPD for Transportation (T), which depends on Occupation (O) and Residence size (R).

This table shows P(T | O, R): the probability of each transportation mode given a person's occupation and residence size. For example:
- Employed people with big residences mostly travel by car (about 70%)
- Self-employed people with small residences travel by car in every observed case (estimated probability 1.0), which likely reflects a small number of such respondents rather than a hard rule

Each column sums to 1.0, because each column is a probability distribution over T for one combination of parent values.

In [22]:
# Look up the CPD by node name rather than relying on list order.
cmk_T = model.get_cpds("T")
print(cmk_T)

+----------+---------------------+-----+--------------------+----------+
| O        | O(emp)              | ... | O(self)            | O(self)  |
+----------+---------------------+-----+--------------------+----------+
| R        | R(big)              | ... | R(big)             | R(small) |
+----------+---------------------+-----+--------------------+----------+
| T(car)   | 0.7034313725490197  | ... | 0.4444444444444444 | 1.0      |
+----------+---------------------+-----+--------------------+----------+
| T(other) | 0.13480392156862744 | ... | 0.3333333333333333 | 0.0      |
+----------+---------------------+-----+--------------------+----------+
| T(train) | 0.16176470588235295 | ... | 0.2222222222222222 | 0.0      |
+----------+---------------------+-----+--------------------+----------+
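
The claim that each column is a probability distribution can be checked directly. This small sketch copies the three columns visible in the printed table (the elided `...` columns are left out rather than guessed) and verifies that each sums to 1 within floating-point tolerance. The dictionary name `visible_columns` is hypothetical.

```python
from math import isclose

# (O, R) -> probabilities for T = car, other, train, copied from the output above.
visible_columns = {
    ("emp", "big"): [0.7034313725490197, 0.13480392156862744, 0.16176470588235295],
    ("self", "big"): [0.4444444444444444, 0.3333333333333333, 0.2222222222222222],
    ("self", "small"): [1.0, 0.0, 0.0],
}

column_sums = {key: sum(probs) for key, probs in visible_columns.items()}
for key, total in column_sums.items():
    print(key, "sums to 1:", isclose(total, 1.0))
```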


## Summary

In this notebook, we:

1. **Defined a DGP** as a function that encodes causal mechanisms
2. **Built a causal DAG** using pgmpy's `DiscreteBayesianNetwork`, specifying directed edges that represent causal relationships
3. **Learned CPDs from data** using the `fit()` method, which estimates conditional probabilities from observed frequencies
4. **Examined the learned parameters** to understand how parent variables influence child variables

### Key Concepts

- **Causal DAG**: A directed acyclic graph where edges represent direct causal influence
- **Causal Markov Condition**: Each variable is independent of its non-descendants given its parents
- **CPD/Causal Markov Kernel**: The local conditional distribution P(X | Parents(X)) at each node
- **Factorization**: The joint distribution factors into a product of CPDs following the graph structure

### Why Causal Graphs Matter

Unlike purely statistical models that capture correlations, causal graphs let us:
- **Predict interventions**: What happens if we *set* a variable to a value (formalized by the do-operator)
- **Reason about counterfactuals**: What *would have* happened under different circumstances
- **Identify confounders**: Which variables we must control for to estimate causal effects
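
To make the first bullet concrete: one common way to represent an intervention on a DAG is "graph surgery", in which the intervened variable's incoming edges are removed so that its usual causes no longer influence it. The following is a minimal sketch on the transportation DAG's edge list. The helper name `intervene` is hypothetical, and the block only manipulates the graph structure, not the fitted CPDs.

```python
edges = [("A", "E"), ("S", "E"), ("E", "O"), ("E", "R"), ("O", "T"), ("R", "T")]


def intervene(edges, target):
    """Return the edge list after do(target): drop every edge pointing into target."""
    return [(cause, effect) for cause, effect in edges if effect != target]


edges_do_E = intervene(edges, "E")
print(edges_do_E)
```

After intervening on E, Age and Sex no longer point into Education, while Education's effects on Occupation and Residence size remain in place.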