# Definition

A structured probabilistic model, or graphical model, is a graph $G = \langle V, E \rangle$.  
It represents the joint probability distribution over several variables in a structured way, which reduces the number of parameters needed to describe that distribution.  

$V$ is the set of vertices; each vertex represents a random variable.  
$E$ is the set of edges; each edge represents a direct interaction between the two variables it connects. The absence of an edge between two vertices indicates that the two corresponding variables are independent given the other variables.
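
To make the parameter saving concrete, here is a minimal sketch that counts free parameters for an unrestricted joint table and for a chain-structured model. The chain structure, the function names, and the choice of $k$ states per variable are assumptions made for illustration only.

```python
def full_table_params(n, k=2):
    """Free parameters of an unrestricted joint table over n variables with k states each."""
    # k**n entries, minus one because the table must sum to 1.
    return k ** n - 1


def chain_params(n, k=2):
    """Free parameters of a chain x1 -> x2 -> ... -> xn (assumed structure).

    p(x1) needs k - 1 numbers; each p(x_i | x_{i-1}) needs k * (k - 1).
    """
    return (k - 1) + (n - 1) * k * (k - 1)


for n in (3, 10, 20):
    print(n, full_table_params(n), chain_params(n))
```

The full table grows exponentially with the number of variables, while the chain grows linearly.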

# Directed models

In directed models, the graph is directed, and each vertex carries a factor: the conditional probability distribution over $x_i$ given the parents of $x_i$, denoted $\mathrm{Pa}_G(x_i)$. The joint distribution is the product of these factors:
$$p(x) = \prod_i p(x_i|\mathrm{Pa}_G(x_i))$$
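
As an illustration of this factorization, the sketch below builds the joint distribution of a three-variable chain $a \to b \to c$ from its conditional tables. The graph and all probability values are hypothetical, chosen only to show the mechanics.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain a -> b -> c.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # indexed as [a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # indexed as [b][c]


def joint(a, b, c):
    """p(a, b, c) = p(a) * p(b | a) * p(c | b): one factor per vertex."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


# Because every factor is itself normalized, the product sums to 1
# (up to floating-point error) without any extra normalizing constant.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(total)
```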

# Undirected models

In undirected models, the graph is undirected.  
They are also called Markov random fields (MRFs).  

Two vertices $X$ and $Y$ are adjacent if there is an edge between them: $X \sim Y$.  
A path $X_1,\dots,X_n$ is a sequence of vertices in which consecutive vertices are joined: $X_{i-1} \sim X_i$ for $i=2,\dots,n$.  
A complete graph is a graph with every pair of vertices joined by an edge.  
A subgraph $U \subseteq V$ is a subset of vertices together with the edges between them.  

In a Markov graph, the absence of an edge between two vertices means that the corresponding random variables are conditionally independent given all the other variables:
$$\text{No edge joining $X$ and $Y$} \iff X \perp Y | \text{rest}$$

Let $A$, $B$, and $C$ be subgraphs. $C$ separates $A$ and $B$ if every path between a vertex of $A$ and a vertex of $B$ intersects $C$.
$$\text{$C$ separates $A$ and $B$} \implies A \perp B | C$$
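
Separation can be checked with a graph search that is forbidden from entering $C$. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each vertex to the set of its neighbours; the function name `separates` and the example graph are hypothetical.

```python
from collections import deque


def separates(adj, A, B, C):
    """Return True if every path from a vertex in A to a vertex in B passes through C."""
    A, B, C = set(A), set(B), set(C)
    start = A - C
    visited = set(start)
    queue = deque(start)
    # Breadth-first search from A that never steps onto a vertex of C.
    while queue:
        v = queue.popleft()
        if v in B:
            return False  # reached B while avoiding C
        for w in adj[v]:
            if w not in visited and w not in C:
                visited.add(w)
                queue.append(w)
    return True


# Chain a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(separates(adj, {"a"}, {"d"}, {"c"}))  # True
print(separates(adj, {"a"}, {"d"}, set()))  # False
```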

A clique is a complete subgraph. A clique is said to be maximal if no other vertex can be added to it and still yield a clique.  
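
One way to enumerate maximal cliques is the Bron-Kerbosch recursion. The sketch below uses its basic form without pivoting, again assuming an adjacency dictionary of neighbour sets; the function name and the example graph are illustrative, not part of the original notes.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored vertices.
        if not p and not x:
            cliques.append(r)  # nothing can be added, so r is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), frozenset(adj), frozenset())
    return cliques


# Triangle a - b - c with a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(maximal_cliques(adj))
```

On this graph the maximal cliques are $\{a, b, c\}$ and $\{c, d\}$; the pair $\{a, b\}$ is a clique but not a maximal one, since $c$ can still be added.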

A probability density function over the graph $G$ can be represented as:
$$f(x) = \frac{1}{Z} \prod_{i} \phi^{(i)}(C^{(i)})$$

Each $C^{(i)}$ is a maximal clique of the undirected graph, associated with a factor $\phi^{(i)}(C^{(i)})$. These factors are just non-negative functions; there is no guarantee that they sum to $1$.  
They capture dependence within $C^{(i)}$ by scoring certain configurations higher than others.  

In order to get a probability distribution, we divide by $Z$, the normalizing constant, also called the partition function:
$$Z = \sum_{x \in X} \prod_{i} \phi^{(i)}(C^{(i)})$$
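
The following sketch computes $Z$ by brute force for a small undirected chain $a - b - c$ with binary variables, whose maximal cliques are $\{a, b\}$ and $\{b, c\}$. The factor values are hypothetical, and enumerating every configuration is only feasible for very small models.

```python
from itertools import product


# Hypothetical factors: each one scores agreement within its clique higher.
def phi_ab(a, b):
    return 2.0 if a == b else 1.0


def phi_bc(b, c):
    return 3.0 if b == c else 1.0


def unnormalized(a, b, c):
    """Product of the clique factors; non-negative but not normalized."""
    return phi_ab(a, b) * phi_bc(b, c)


# Partition function: sum of the unnormalized product over all configurations.
states = list(product((0, 1), repeat=3))
Z = sum(unnormalized(*x) for x in states)


def f(a, b, c):
    """Normalized probability of one configuration."""
    return unnormalized(a, b, c) / Z


print(Z)           # 24.0
print(f(0, 0, 0))  # 0.25
```

The unnormalized scores sum to 24 rather than 1, which is why the division by $Z$ is needed.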