## Undirected Graphical Models

An undirected graphical model is a set of r.v.'s described by an undirected graph. The (undirected) edges represent __probabilistic interactions__ between neighboring variables, as opposed to the conditional dependence encoded by directed edges in a DAGM.

### Definitions
__Global Markov Property $G$__ - $X_A\perp X_B \mid X_C$ iff $X_C$ separates $X_A$ from $X_B$, i.e. every path in the graph between $A$ and $B$ goes through $C$.

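__Local Markov Property (Markov Blanket) $L$__ - The Markov blanket $mb(t)$ is the set of nodes that renders a node $t$ conditionally independent of all the other nodes in the graph:
$$t\perp (V-cl(t)) \mid mb(t)$$
where $cl(t) = mb(t)\cup \{t\}$ is the closure of $t$. In an undirected graph, $mb(t)$ is the set of immediate neighbors of $t$.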

__Pairwise (Markov) Property $P$__ - Two nodes $s,t$ are conditionally independent given all the other nodes iff there is no edge between them.
$$s\perp t \mid (V - \{s,t\}) \Leftrightarrow G(s,t)=0$$
where $G(s,t)$ is the number of edges between $s$ and $t$, so $G(s,t)=0$ means the two nodes are not directly connected.
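
The separation test behind the global Markov property can be sketched as a graph search: remove the nodes in $C$ and check whether any node of $B$ is still reachable from $A$. The function name `is_separated`, the adjacency-dict representation, and the 4-cycle example are assumptions made for illustration only.

```python
from collections import deque

def is_separated(adj, A, B, C):
    """Return True if every path from A to B passes through C.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    A, B, C = set(A), set(B), set(C)
    # Start the search from the nodes of A that are not blocked themselves.
    frontier = deque(A - C)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in B:
            return False  # reached B without crossing C
        for nbr in adj[node]:
            # Nodes in C are never expanded, which "removes" them from the graph.
            if nbr not in seen and nbr not in C:
                seen.add(nbr)
                frontier.append(nbr)
    return True

# Hypothetical 4-cycle: 1 - 2 - 4 - 3 - 1
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
sep_both = is_separated(adj, {1}, {4}, {2, 3})  # both paths blocked
sep_one = is_separated(adj, {1}, {4}, {2})      # path through 3 stays open
```

Here conditioning on $\{2,3\}$ separates node 1 from node 4, which is also what the pairwise property says, since there is no edge between them. Conditioning on $\{2\}$ alone does not.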

# Exact Inference

## Problem Setting
Let $X_E$ be the observed evidence, $X_F$ be the unobserved variables we want to infer, and $X_R = X - \{X_F, X_E\}$ be all the remaining variables.

Then, in a probabilistic graphical model, we need to marginalize out all of $X_R$ to obtain the joint distribution over the evidence and the variables of interest.
$$p(X_F, X_E) = \sum_{X_R} p(X_F, X_E, X_R)$$
and the inference will be 
$$p(X_F|X_E) = \frac{p(X_F, X_E)}{p(X_E)} = \frac{p(X_F, X_E)}{\sum_{X_F}p(X_F, X_E)}$$
where the marginal probability of the evidence is 
$$p(X_E) = \sum_{X_F, X_R} p(X_F, X_E, X_R)$$
However, if $|X_R|$ is large, this computation is expensive: with $k$ states per variable, the sum over $X_R$ ranges over $k^{|X_R|}$ joint configurations. For continuous variables the sums become integrals, which are even more computationally intensive.
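
As a minimal sketch of inference by direct enumeration, the following assumes one discrete variable in each of $X_F$, $X_E$, $X_R$ and a randomly generated joint table. The array names and the value of `k` are illustrative, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3  # assumed number of states per variable

# Hypothetical joint table, indexed as joint[f, e, r] = p(X_F=f, X_E=e, X_R=r).
joint = rng.random((k, k, k))
joint /= joint.sum()

p_FE = joint.sum(axis=2)   # marginalize out X_R      -> p(X_F, X_E)
p_E = p_FE.sum(axis=0)     # marginalize out X_F      -> p(X_E)
p_F_given_E = p_FE / p_E   # divide column e by p(E=e) -> p(X_F | X_E)
```

Each column of `p_F_given_E` is a distribution over $X_F$ for one value of the evidence. The full table has $k^3$ entries here and grows by a factor of $k$ with every additional variable, which is the blow-up described above.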

## Variable Elimination
### Simple Example: Chain
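Consider a chain $A\rightarrow B \rightarrow C\rightarrow D$ with $X_F = \{D\}, X_E = \{\}, X_R = \{A,B,C\}$.
The joint factorizes as $p(A,B,C,D) = p(A)p(B|A)p(C|B)p(D|C)$, so that 
\begin{align*}
p(D) &= \sum_{A,B,C}p(A,B,C,D) \\
&= \sum_C\sum_B\sum_A p(A)p(B|A)p(C|B)p(D|C)&\sim O(k^n)\\
&= \sum_Cp(D|C)\sum_Bp(C|B)\sum_Ap(A)p(B|A)\\
&= \sum_C p(D|C)\sum_Bp(C|B)p(B)\\
&= \sum_C p(D|C)p(C)&\sim O(nk^2)
\end{align*}
where $n = 4$ is the number of variables in the chain and $k$ is the number of states each variable takes. Summing the full joint naively touches all $k^n$ configurations, whereas pushing each sum inward costs one $O(k^2)$ matrix-vector product per eliminated variable.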
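
The chain computation can be checked numerically. The sketch below assumes random conditional probability tables with $k=4$ states, stored so that `pB_A[b, a]` $= p(B=b \mid A=a)$. The helper `random_cpt` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4  # assumed number of states per variable

def random_cpt(shape):
    """Random table normalized over axis 0 (the child variable)."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

pA = random_cpt((k,))
pB_A = random_cpt((k, k))  # pB_A[b, a] = p(B=b | A=a)
pC_B = random_cpt((k, k))  # pC_B[c, b] = p(C=c | B=b)
pD_C = random_cpt((k, k))  # pD_C[d, c] = p(D=d | C=c)

# Naive: build the full k^4 joint, then sum out A, B, C.
joint = np.einsum('a,ba,cb,dc->abcd', pA, pB_A, pC_B, pD_C)
pD_naive = joint.sum(axis=(0, 1, 2))

# Variable elimination: one matrix-vector product per eliminated variable.
pB = pB_A @ pA   # sum_A p(A) p(B|A)
pC = pC_B @ pB   # sum_B p(B) p(C|B)
pD = pD_C @ pC   # sum_C p(C) p(D|C)
```

Both routes give the same $p(D)$, but the second never materializes a table larger than $k \times k$.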

### Intermediate Factors
Consider the distribution given by 
$$p(X,A,B,C) = p(X)p(A|X)p(B|A)p(C|B,X)$$

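Define a __factor__ $\phi$ as a function that is not necessarily a normalized distribution but describes the local relationship between r.v.'s. Rewriting each conditional as a factor over its scope and pushing the sum over $X$ inward produces a new intermediate factor $\tau$: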
\begin{align*}
p(A,B,C) &= \sum_X p(X)p(A|X)p(B|A)p(C|B,X)\\
&= \sum_{X}\phi(X) \phi(A,X)\phi(A,B)\phi(X,B,C)\\
&= \phi(A,B)\sum_X \phi(X)\phi(A,X)\phi(X,B,C)\\
&= \phi(A,B) \tau(A,B,C)
\end{align*}
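
A small numerical sketch of this step, assuming random non-negative factor tables with $k=3$ states and the index conventions noted in the comments (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3  # assumed number of states per variable

# Unnormalized factors; the index order is an assumption of this sketch.
phi_X = rng.random(k)             # phi(X),     indexed [x]
phi_AX = rng.random((k, k))       # phi(A,X),   indexed [a, x]
phi_AB = rng.random((k, k))       # phi(A,B),   indexed [a, b]
phi_XBC = rng.random((k, k, k))   # phi(X,B,C), indexed [x, b, c]

# Eliminate X: multiply only the factors that mention X, then sum over x.
tau = np.einsum('x,ax,xbc->abc', phi_X, phi_AX, phi_XBC)

# phi(A,B) does not depend on X, so it multiplies the result afterwards.
result = phi_AB[:, :, None] * tau

# Reference: sum over X with all four factors inside the sum.
direct = np.einsum('x,ax,ab,xbc->abc', phi_X, phi_AX, phi_AB, phi_XBC)
```

The intermediate factor `tau` has scope $(A,B,C)$ and is not a probability distribution. It is simply the table left behind once $X$ has been summed out.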

### Sum-product Inference
Computing $P(Y)$ for directed and undirected models is done with the __sum-product__ inference algorithm 
$$\tau(Y) = \sum_z\prod_{\phi\in \Phi}\phi(z_{Scope[\phi]\cap Z}, y_{Scope[\phi]\cap Y}), \forall Y$$
where $\Phi$ is the set of factors (potentials), $Z$ is the set of variables being summed out, and $Scope[\phi]$ is the set of variables that $\phi$ depends on.
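
For small discrete models, the sum-product expression can be written almost literally with `numpy.einsum`: every factor contributes its scope as a subscript string, and every variable missing from the output subscripts is summed out. The function `sum_product` and the one-letter variable naming are assumptions of this sketch. It computes the product of all factors in one call and makes no attempt to pick an efficient elimination order.

```python
import numpy as np

def sum_product(factors, query):
    """Compute tau(Y) = sum_z prod_phi phi(...) for table factors.

    factors: list of (table, scope) pairs, where scope is a string with one
             lowercase letter per axis of the table, e.g. 'ab'.
    query:   string of the letters to keep (Y); all other letters are summed.
    """
    spec = ','.join(scope for _, scope in factors) + '->' + query
    return np.einsum(spec, *(table for table, _ in factors))

rng = np.random.default_rng(2)
phi1 = rng.random((2, 3))   # scope 'ab'
phi2 = rng.random((3, 4))   # scope 'bc'

tau_c = sum_product([(phi1, 'ab'), (phi2, 'bc')], 'c')   # sums out a and b
tau_ac = sum_product([(phi1, 'ab'), (phi2, 'bc')], 'ac') # sums out b only
```

If the factors are the conditional distributions of a directed model, `tau` is already the marginal $P(Y)$. For unnormalized potentials it must still be divided by its sum.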

### Complexity of Variable Elimination Ordering
The complexity is 
$$O(mk^{N_{max}})$$
- $m$ is the number of initial factors
- $k$ is the number of states each random variable takes (assuming equal)
- $N_{max} = \max_i N_i$ where $N_i$ is the number of r.v.'s inside the $i$-th sum $\sum_i$
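
Since $N_{max}$ depends on the order in which variables are eliminated, it can be useful to compute it for a candidate ordering before doing any arithmetic. The sketch below works purely on factor scopes. The function name `max_sum_scope` and the set-based representation are assumptions for illustration.

```python
def max_sum_scope(scopes, order):
    """Return (N_max, [N_1, ..., N_m]) for eliminating `order` in sequence.

    scopes: iterable of variable collections, one per initial factor.
    order:  variables to sum out, in elimination order.
    """
    factors = [frozenset(s) for s in scopes]
    sizes = []
    for var in order:
        # All factors mentioning `var` end up inside the sum over `var`.
        touching = [f for f in factors if var in f]
        joined = frozenset().union(*touching)
        sizes.append(len(joined))  # N_i: number of r.v.'s inside this sum
        # Replace them by the new intermediate factor over the remaining scope.
        factors = [f for f in factors if var not in f]
        factors.append(joined - {var})
    return max(sizes), sizes

# Scopes of p(A), p(B|A), p(C|B), p(D|C) for the chain A -> B -> C -> D.
chain = [{'A'}, {'A', 'B'}, {'B', 'C'}, {'C', 'D'}]
good = max_sum_scope(chain, ['A', 'B', 'C'])  # eliminate from the root
bad = max_sum_scope(chain, ['C', 'B', 'A'])   # eliminate from the far end
```

For the chain, eliminating $A, B, C$ keeps every sum over two variables ($N_{max}=2$), while eliminating $C, B, A$ creates intermediate factors over three variables ($N_{max}=3$).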