# ECE 493 - Probabilistic Reasoning and Decision Making

## Probability

1. *See [Probability and Statistics for Computer Science](http://jacobpark.me/Programming-Notebook/Books/Probability%20and%20Statistics%20for%20Computer%20Science/Probability%20and%20Statistics%20for%20Computer%20Science.html).*
2. *See [Stanford CS228 - Probability Review](https://ermongroup.github.io/cs228-notes/preliminaries/probabilityreview/).*

## Bayesian Networks

### Probabilistic Modeling

- **Bayesian Networks**: A directed acyclic graph $G = (V, E)$ with
    - A random variable $x_i$ for each node $i \in V$
    - A conditional probability distribution (CPD) $p(x_{i} \mid x_{A_i})$ per node, specifying the probability of $x_i$ conditioned on the values of its parents $x_{A_i}$ (called ancestor variables in these notes).
- **Chain Rule**: Any joint distribution can be factored, without independence assumptions, as
$$p(x_{1}, x_{2}, \dots, x_{n}) = p(x_{1}) p(x_{2} \mid x_{1}) \cdots p(x_{n} \mid x_{n - 1}, \dots, x_{2}, x_{1})$$
- **Compact Bayesian Network**: Avoid Large Arity Factors by Conditioning Each Factor Only on Its Ancestor Variables $x_{A_i}$
$$p(x_{i} \mid x_{i - 1}, \dots, x_{1}) = p(x_{i} \mid x_{A_i})$$
    - *Allow*: $x_{A_i} \subseteq \{x_{i - 1}, \dots, x_{1}\}$
- **Probability Tables**: For discrete variables, each factor $p(x_{i} \mid x_{A_i})$ can be stored as a table
    - *Rows*: Values of $x_{A_i}$
    - *Columns*: Values of $x_i$
    - *Entries*: Probabilities $p(x_{i} \mid x_{A_i})$
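
As a minimal sketch of how such tables combine into a joint probability, the snippet below stores one table per node and multiplies one entry per node. The network (`rain`, `sprinkler`, `wet`), the dictionary layout, and all numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical network: rain -> sprinkler, and (rain, sprinkler) -> wet.
# parents[node] lists the variables x_{A_i} that the node's table is conditioned on.
parents = {"rain": (), "sprinkler": ("rain",), "wet": ("rain", "sprinkler")}

# cpt[node][parent_values][value] = p(node = value | parents = parent_values)
# Each inner dict is one table row; all numbers are made up.
cpt = {
    "rain": {(): {True: 0.2, False: 0.8}},
    "sprinkler": {
        (True,): {True: 0.01, False: 0.99},
        (False,): {True: 0.4, False: 0.6},
    },
    "wet": {
        (True, True): {True: 0.99, False: 0.01},
        (True, False): {True: 0.8, False: 0.2},
        (False, True): {True: 0.9, False: 0.1},
        (False, False): {True: 0.0, False: 1.0},
    },
}


def joint(assignment):
    """Return p(x_1, ..., x_n) as the product of one table entry per node."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob


# p(rain=False) * p(sprinkler=True | rain=False) * p(wet=True | rain=False, sprinkler=True)
p = joint({"rain": False, "sprinkler": True, "wet": True})
print(p)
```

Nested dictionaries keep the row/column reading of the table explicit; a dense array indexed by value would be the more usual choice for larger models.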

#### Space Complexity of Bayesian Networks

- *Probability Table Space Complexity*: If each variable takes $d$ values and has at most $k$ ancestors, a single table has $O(d^{k + 1})$ entries
- *Compact Bayesian Network Space Complexity*: $O(nd^{k + 1})$ for $n$ variables, one table per variable
- *Naive (Full Joint Table) Space Complexity*: $O(d^{n})$
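
To make the gap concrete, here is a small sketch that evaluates both bounds. The function names and the values $n = 20$, $d = 2$, $k = 3$ are illustrative assumptions, not figures from the course.

```python
def compact_table_entries(n, d, k):
    """Upper bound for the compact form: n tables of at most d**k rows and d columns."""
    return n * d ** (k + 1)


def full_joint_entries(n, d):
    """Entries in a single table over all n variables."""
    return d ** n


n, d, k = 20, 2, 3  # illustrative values only
print(compact_table_entries(n, d, k))  # 320
print(full_joint_entries(n, d))        # 1048576
```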

#### Interpretation of Bayesian Networks

- A probability $p$ factorizes over a DAG $G$ if it can be decomposed into a product of factors, as specified by $G$.
- A Bayesian network represents a probability distribution formed via a product of smaller, locally conditional probability distributions (one for each variable).
- A Bayesian network introduces assumptions that certain variables are independent.

#### Validity of Bayesian Networks

1. Is every probability non-negative?
2. Is the sum over all variables equal to one?
3. Is the directed graph acyclic?
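
These checks can be sketched for a discrete network as follows. The sketch assumes a hypothetical dictionary layout (`parents` maps each node to its parent tuple, `cpt` maps each node to rows of conditional distributions). It tests normalization row by row: when the graph is acyclic and every row sums to one, the product of the tables also sums to one.

```python
def is_valid_network(parents, cpt, tol=1e-9):
    """Check non-negativity, row normalization, and acyclicity."""
    # Checks 1 and 2: every entry is non-negative and every row sums to one.
    for rows in cpt.values():
        for dist in rows.values():
            if any(prob < 0 for prob in dist.values()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    # Check 3: repeatedly remove nodes with no parent left in the graph.
    remaining = set(parents)
    while remaining:
        ready = {n for n in remaining if not (set(parents[n]) & remaining)}
        if not ready:
            return False  # every remaining node waits on another one: a cycle
        remaining -= ready
    return True


# Hypothetical two-node network a -> b with made-up numbers.
parents = {"a": (), "b": ("a",)}
cpt = {
    "a": {(): {0: 0.3, 1: 0.7}},
    "b": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.5, 1: 0.5}},
}
print(is_valid_network(parents, cpt))
```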

### Graphical Representation

- Directed Acyclic Graphs
- *Vertices*: Variables $x_i$
- *Edges*: Dependency Relationships

### $3$-Variable Independencies in Directed Graphs

- Let $I(p)$ be the set of all independencies that hold for a joint distribution $p$.
- Let $G$ be a Bayesian network with three nodes: $A$, $B$, and $C$.

#### Common Parent

- If $G$ is of the form $A \leftarrow B \rightarrow C$,
    - If $B$ is observed, then $A \perp C \mid B$
    - If $B$ is unobserved, then $A \not\perp C$
- **Intuition**: $B$ contains all the information that determines the outcomes of $A$ and $C$; once it is observed, nothing else affects the outcomes of $A$ and $C$.
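
The observed case follows directly from the factorization. Writing lowercase $a$, $b$, $c$ for values of $A$, $B$, $C$ (a notational assumption) and assuming $p(b) > 0$, the common-parent structure gives $p(a, b, c) = p(b) p(a \mid b) p(c \mid b)$, so:

$$p(a, c \mid b) = \frac{p(a, b, c)}{p(b)} = \frac{p(b) \, p(a \mid b) \, p(c \mid b)}{p(b)} = p(a \mid b) \, p(c \mid b)$$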


#### Cascade

- If $G$ is of the form $A \rightarrow B \rightarrow C$,
    - If $B$ is observed, then $A \perp C \mid B$
    - If $B$ is unobserved, then $A \not\perp C$
- **Intuition**: $A$ influences $C$ only through $B$; once $B$ is observed, $A$ carries no further information about $C$'s outcome.
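
A short derivation makes the observed case explicit. With lowercase letters for values (a notational assumption) and $p(a, b) > 0$, the cascade factorizes as $p(a, b, c) = p(a) p(b \mid a) p(c \mid b)$, so:

$$p(c \mid a, b) = \frac{p(a, b, c)}{p(a, b)} = \frac{p(a) \, p(b \mid a) \, p(c \mid b)}{p(a) \, p(b \mid a)} = p(c \mid b)$$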
    
#### V-Structure

- If $G$ is $A \rightarrow C \leftarrow B$, then knowing $C$ couples $A$ and $B$.
    - If $C$ is unobserved, then $A \perp B$
    - If $C$ is observed, then $A \not\perp B \mid C$
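
A numeric sketch of this coupling, under made-up assumptions: $A$ and $B$ are independent fair coins and $C$ is their logical OR. The joint factorizes before $C$ is observed and stops factorizing once $C = 1$ is known.

```python
from itertools import product

# Hypothetical model: A and B are independent fair coins, C = A or B.
p_a = {0: 0.5, 1: 0.5}
p_b = {0: 0.5, 1: 0.5}


def p_c_given(c, a, b):
    """Deterministic CPD p(c | a, b) for C = A or B."""
    return 1.0 if c == (a | b) else 0.0


# Full joint table built from the v-structure factorization p(a) p(b) p(c | a, b).
joint = {
    (a, b, c): p_a[a] * p_b[b] * p_c_given(c, a, b)
    for a, b, c in product((0, 1), repeat=3)
}
NAMES = ("a", "b", "c")


def prob(**fixed):
    """Sum the joint over all entries consistent with the fixed values."""
    return sum(
        p
        for values, p in joint.items()
        if all(values[NAMES.index(k)] == v for k, v in fixed.items())
    )


# C unobserved: p(a=1, b=1) matches p(a=1) p(b=1).
marginal_joint = prob(a=1, b=1)
marginal_product = prob(a=1) * prob(b=1)

# C observed as 1: p(a=1, b=1 | c=1) no longer matches the product of conditionals.
cond_joint = prob(a=1, b=1, c=1) / prob(c=1)
cond_product = (prob(a=1, c=1) / prob(c=1)) * (prob(b=1, c=1) / prob(c=1))

print(marginal_joint, marginal_product)
print(cond_joint, cond_product)
```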

### $n$-Variable Independencies in Directed Graphs

#### $d$-separation (Directed)

- Two sets of variables $Q$ and $W$ are $d$-separated given observed variables $O$ if no active path connects them.

#### Active Path

- An undirected path in the Bayesian Network structure $G$ is called active given observed variables $O$ if for every consecutive triple of variables $X$, $Y$, $Z$ on the path, one of the following holds:
    - $X \leftarrow Y \leftarrow Z$, and $Y$ is unobserved ($Y \notin O$)
    - $X \rightarrow Y \rightarrow Z$, and $Y$ is unobserved ($Y \notin O$)
    - $X \leftarrow Y \rightarrow Z$, and $Y$ is unobserved ($Y \notin O$)
    - $X \rightarrow Y \leftarrow Z$, and $Y$ or at least one of its descendants is observed
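
The definition can be turned into a brute-force check for small graphs. The sketch below enumerates simple undirected paths between two single nodes and applies the triple rules above. The function name and edge-list format are assumptions, and the enumeration is exponential in the worst case, so this is not an efficient algorithm.

```python
def is_d_separated(edges, x, y, observed):
    """Return True if no active path connects x and y given `observed`.

    edges: iterable of (parent, child) pairs of a DAG.
    Assumes x and y are distinct and not themselves observed.
    """
    edges = set(edges)
    observed = set(observed)
    nodes = {n for e in edges for n in e} | {x, y}
    children = {n: {c for p, c in edges if p == n} for n in nodes}
    neighbours = {n: children[n] | {p for p, c in edges if c == n} for n in nodes}

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def triple_is_active(a, b, c):
        if (a, b) in edges and (c, b) in edges:  # collider a -> b <- c
            return b in observed or bool(descendants(b) & observed)
        return b not in observed  # cascade (either direction) or common parent

    def active_path_exists(path):
        last = path[-1]
        if last == y:
            return True
        for nxt in neighbours[last] - set(path):
            # Each extension checks the newest consecutive triple on the path.
            if len(path) >= 2 and not triple_is_active(path[-2], last, nxt):
                continue
            if active_path_exists(path + [nxt]):
                return True
        return False

    return not active_path_exists([x])


# Hypothetical graph: A -> C <- B, C -> D.
example_edges = [("A", "C"), ("B", "C"), ("C", "D")]
print(is_d_separated(example_edges, "A", "B", set()))   # collider blocks the path
print(is_d_separated(example_edges, "A", "B", {"D"}))   # observed descendant activates it
```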

### Independence Maps

- Let $I(G) = \{(X \perp Y \mid Z) : X, Y \text{ are } d\text{-sep given } Z\}$ be the set of independence statements implied by $d$-separation in $G$.
- If $p$ factorizes over $G$, then $I(G) \subseteq I(p)$; in this case, $G$ is called an $I$-map for $p$.

### Equivalence of Bayesian Networks

- $G_1$ and $G_2$ are $I$-equivalent if they encode the same independencies: $I(G_1) = I(G_2)$.
    - Equivalently, the $d$-separation statements that hold in $G_1$ are exactly those that hold in $G_2$.
    - A sufficient condition: they have the same skeleton and the same v-structures.
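
The skeleton and v-structure condition is easy to test mechanically. The sketch below treats a v-structure as a collider whose two parents are not adjacent, which is an assumption about the intended definition. The function names and edge-list format are likewise hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Colliders a -> c <- b whose parents a and b are not adjacent (assumed definition)."""
    edges = set(edges)
    skel = skeleton(edges)
    nodes = {n for e in edges for n in e}
    found = set()
    for c in nodes:
        node_parents = sorted(p for p, child in edges if child == c)
        for a, b in combinations(node_parents, 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), c))
    return found


def same_skeleton_and_v_structures(edges_1, edges_2):
    """Check the skeleton and v-structure condition for two DAGs given as edge lists."""
    return (
        skeleton(edges_1) == skeleton(edges_2)
        and v_structures(edges_1) == v_structures(edges_2)
    )


cascade = [("A", "B"), ("B", "C")]        # A -> B -> C
common_parent = [("B", "A"), ("B", "C")]  # A <- B -> C
collider = [("A", "B"), ("C", "B")]       # A -> B <- C

print(same_skeleton_and_v_structures(cascade, common_parent))
print(same_skeleton_and_v_structures(cascade, collider))
```

This matches the three-variable cases above: the cascade and common-parent graphs both encode $A \perp C \mid B$, while the collider on the same skeleton does not.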