# Dual representation of Markov random fields

Markov random fields (MRFs) are well suited to modelling image textures because they naturally capture local dependencies.

<img src = 'illustrations/markov-random-field-i.png' width=100%>


The celebrated Hammersley-Clifford theorem fixes the format in which the corresponding probability distribution must be sought:

\begin{align*}
p[\boldsymbol{x}|\omega]=\frac{1}{Z(\omega)}\cdot\exp\Biggl(-\sum_{c\in\textsf{MaxClique}}\Psi_c(\boldsymbol{x}_c,\omega)\Biggr) 
\end{align*} 
where 
* $\omega$ is the set of model parameters
* $Z(\omega)$ is a normalising constant
* $\textsf{MaxClique}$ is the set of maximal cliques in the Markov random field
* $\Psi_c$ is defined on the variables $x_i$ in the clique $c$ 

In the following, we show that this formalisation leads to a multivariate normal distribution. We also explore how it is connected to the pixel prediction formalisation in the notebook [02_texture_synthesis.ipynb](./02_texture_synthesis.ipynb).

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import sklearn
import sys

from pandas import Series
from pandas import DataFrame
from typing import List,Tuple

from pandas import Categorical
from pandas.api.types import CategoricalDtype

from tqdm import tnrange#, tqdm_notebook
from plotnine import *

# Local imports
from common import *
from convenience import *

## I.  Normal distribution as a resulting probability assignment 

The Hammersley-Clifford theorem leaves a lot of freedom in how one can specify the full distribution $p[\boldsymbol{x}|\omega]$. 
In principle, we could search for optimal potential functions $\Psi_c$ by estimating 

\begin{align*}
\Pr[x_i|(x_j)_{j\in\mathsf{Neighbours}(X_i)}]
\end{align*}

for all pixels $x_i$ and then fix discrete sub-potentials $\Psi_c$ that lead to the estimated conditional probabilities. 
However, the amount of data needed to get reliable estimates for sub-potentials is immense. 
Hence, the classical approach is to severely restrict the shape of sub-potentials.
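
To give a feeling for the scale of the problem, the following back-of-the-envelope count shows how large a fully discrete conditional table would be. The 8-bit greyscale range and the four-pixel neighbourhood are assumptions made for illustration only.

```python
# Rough size of a fully discrete table for Pr[x_i | neighbours].
# Assumptions (not from the text): 8-bit greyscale pixels, four neighbours.
levels = 256
neighbours = 4

# Number of distinct neighbourhood configurations.
contexts = levels ** neighbours
# Each configuration needs (levels - 1) free probabilities.
free_params = contexts * (levels - 1)

print(f"{contexts:,} neighbourhood configurations")
print(f"{free_params:,} free parameters in the conditional table")
```

Even a large collection of images contains only a tiny fraction of these configurations, which is why restricting the shape of the sub-potentials is attractive.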


If we define all individual potentials $\Psi_c$ as quadratic forms over $\boldsymbol{x}_c$, the resulting distribution $p[\boldsymbol{x}|\omega]$ will be a multivariate normal distribution. 
This allows us to obtain analytical solutions for basic inference tasks that otherwise generally require complex MCMC simulation algorithms.
The main aim of this tutorial is to demonstrate the remarkable properties of the multivariate normal distribution that provide the formulae needed for the analytical solution.

### Four-dimensional multivariate normal distribution as a solution to 2 x 2 MRF 

The $2\times 2$ Markov random field has four edges that are also maximal cliques: $(X_1, X_2)$, $(X_2,X_3)$, $(X_3, X_4)$ and $(X_4, X_1)$.

<img src = 'illustrations/markov-random-field-iv.png' width=100%>

Let us consider the subpotential $\Psi_1$ corresponding to the first edge $(X_1,X_2)$. If we restrict $\Psi_1(x_1,x_2)$ to quadratic forms then the search space is

\begin{align*}
\Psi_1(x_1,x_2)= a_{11} x_1^2+2a_{12}x_1x_2+ a_{22}x_2^2\enspace
\end{align*}
for $a_{11},a_{12},a_{22}\in\mathbb{R}$. 
In most applications, we expect neighbouring pixels to be similar, $x_1\approx x_2$, and thus we can restrict the search space even more 

\begin{align*}
\Psi_1(x_1,x_2)= \alpha(x_{1}-x_2)^2=\alpha x_1^2-2\alpha x_1x_2+\alpha x_2^2\enspace
\end{align*}

for $\alpha\in\mathbb{R}^+$.
Analogous reasoning for the other edges leads to the following subpotentials

\begin{align*}
\Psi_1(x_1,x_2)&= \alpha_1(x_{1}-x_2)^2=\alpha_1 x_1^2-2\alpha_1 x_1x_2+\alpha_1 x_2^2\\
\Psi_2(x_2,x_3)&= \alpha_2(x_{2}-x_3)^2=\alpha_2 x_2^2-2\alpha_2 x_2x_3+\alpha_2 x_3^2\\
\Psi_3(x_3,x_4)&= \alpha_3(x_{3}-x_4)^2=\alpha_3 x_3^2-2\alpha_3 x_3x_4+\alpha_3 x_4^2\\
\Psi_4(x_4,x_1)&= \alpha_4(x_{4}-x_1)^2=\alpha_4 x_4^2-2\alpha_4 x_4x_1+\alpha_4 x_1^2\\
\end{align*}

and thus the probability of the entire MRF is

\begin{align*}
p[x_1,x_2,x_3,x_4|\alpha_1,\alpha_2,\alpha_3,\alpha_4]&=
\frac{1}{Z(\alpha_1,\alpha_2,\alpha_3,\alpha_4)}\cdot\exp\bigl(-\Psi(x_1,x_2,x_3,x_4)\bigr)\\
\end{align*}

where

\begin{align*}
\Psi(x_1,x_2,x_3,x_4)=
(\alpha_1+\alpha_4) x_1^2+
(\alpha_1+\alpha_2) x_2^2+
(\alpha_2+\alpha_3) x_3^2+
(\alpha_3+\alpha_4) x_4^2
-2\alpha_1 x_1x_2
-2\alpha_2 x_2x_3
-2\alpha_3 x_3x_4
-2\alpha_4 x_4x_1\enspace.
\end{align*}


If we write the potential energy $\Psi(\boldsymbol{x})$ using matrix operations we get

\begin{align*}
\Psi(\mathbf{x})=
\mathbf{x}^T
\begin{pmatrix}
\alpha_1+\alpha_4 & -\alpha_1 & 0 & -\alpha_4\\
-\alpha_1 &\alpha_1+\alpha_2 & -\alpha_2 & 0 \\
0 &-\alpha_2 &\alpha_2+\alpha_3 & -\alpha_3 \\
-\alpha_4 & 0 &-\alpha_3 &\alpha_3+\alpha_4 \\
\end{pmatrix}
\mathbf{x} = \mathbf{x}^T A\mathbf{x} 
\end{align*}
for a coefficient matrix $A$.
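
As a quick sanity check, the sketch below builds $A$ for some arbitrarily chosen edge weights and confirms numerically that $\mathbf{x}^T A\mathbf{x}$ equals the sum of the four edge potentials. The function names and the numeric values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def coefficient_matrix(a1, a2, a3, a4):
    """Coefficient matrix A of the 2 x 2 MRF with edge weights a1..a4."""
    return np.array([
        [a1 + a4, -a1,     0.0,     -a4],
        [-a1,     a1 + a2, -a2,     0.0],
        [0.0,     -a2,     a2 + a3, -a3],
        [-a4,     0.0,     -a3,     a3 + a4],
    ])

def potential_energy(x, a1, a2, a3, a4):
    """Sum of the four edge potentials Psi_1..Psi_4."""
    x1, x2, x3, x4 = x
    return (a1 * (x1 - x2) ** 2 + a2 * (x2 - x3) ** 2
            + a3 * (x3 - x4) ** 2 + a4 * (x4 - x1) ** 2)

alphas = (0.5, 1.0, 1.5, 2.0)           # illustrative edge weights
x = np.array([0.3, -1.2, 0.7, 2.0])     # illustrative pixel values
A = coefficient_matrix(*alphas)

quadratic_form = x @ A @ x
edge_sum = potential_energy(x, *alphas)
print(np.isclose(quadratic_form, edge_sum))
```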
As a four-dimensional multivariate normal distribution has density 

\begin{align*}
p(\boldsymbol{x}|\boldsymbol{\mu},\boldsymbol{\Sigma})\propto \exp\Biggl(-\frac{1}{2}\cdot(\boldsymbol{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\Biggr)\enspace,
\end{align*}

we get that our probability assignment indeed corresponds to a multivariate normal distribution with parameters $\boldsymbol{\mu}=\boldsymbol{0}$ and $\boldsymbol{\Sigma}^{-1}=2\cdot A$.
An analogous derivation and shape matching with the multivariate normal distribution can be done for any Markov random field with quadratic potentials. That is, based on some intuition, we directly fix the inverse covariance matrix of the multivariate normal distribution and then determine the scaling factor $Z(\omega)$ from the density formula.
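
The shape matching can be sketched numerically as follows. One caveat: with pure difference potentials every row of $A$ sums to zero, so $A$ is singular and cannot be inverted as it stands. The sketch therefore adds a small diagonal term `eps`, which is an assumption introduced here purely to obtain a proper distribution; the edge weights and the test point are likewise illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

a1, a2, a3, a4 = 0.5, 1.0, 1.5, 2.0    # illustrative edge weights
eps = 0.1                               # assumed extra diagonal penalty

A = np.array([
    [a1 + a4, -a1,     0.0,     -a4],
    [-a1,     a1 + a2, -a2,     0.0],
    [0.0,     -a2,     a2 + a3, -a3],
    [-a4,     0.0,     -a3,     a3 + a4],
])
# Rows of A sum to zero, so A alone is singular.
row_sums = A @ np.ones(4)
A_proper = A + eps * np.eye(4)

# Shape matching: Sigma^{-1} = 2 * A.
precision = 2.0 * A_proper
sigma = np.linalg.inv(precision)
sigma = (sigma + sigma.T) / 2.0         # remove round-off asymmetry

# Normalising constant read off from the Gaussian density formula.
Z = np.sqrt((2.0 * np.pi) ** 4 * np.linalg.det(sigma))

x = np.array([0.3, -0.2, 0.1, 0.4])
density_mrf = np.exp(-x @ A_proper @ x) / Z
density_mvn = multivariate_normal(mean=np.zeros(4), cov=sigma).pdf(x)
print(np.isclose(density_mrf, density_mvn, rtol=1e-8, atol=0.0))
```

The comparison with `scipy.stats.multivariate_normal` only confirms that the MRF density and the normal density coincide for this particular choice of parameters.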


## II. Linear regression model for smallest neighbourhood

Let us consider the four-pixel neighbourhood of a particular pixel
<img src = 'illustrations/markov-random-field-v.png' width=30%>

The linear model formulation implies 

\begin{align*}
y=\beta_1x_1+\beta_2x_2+ \beta_3x_3+\beta_4x_4+\beta_0+\varepsilon,
\qquad \varepsilon\sim\mathcal{N}(0,\sigma^2)
\end{align*}

The Hammersley-Clifford theorem gives 

\begin{align}
p[x_1,x_2,x_3,x_4, y]\propto\mathrm{exp}\left(-\psi_1(x_1,y)-\psi_2(x_2,y)-\psi_3(x_3,y)-\psi_4(x_4, y)\right)
\end{align}

from which we can conclude 

\begin{align*}
 p[y|x_1,x_2,x_3,x_4]&=\frac{p[y,x_1,x_2,x_3,x_4]}{p[x_1,x_2,x_3,x_4]}
 \propto p[y,x_1,x_2,x_3,x_4]
\end{align*}
Consequently, we can express

\begin{align*}
 p[y|x_1,x_2,x_3,x_4]&\propto \mathrm{exp}\left(-\psi_1(x_1,y)-\psi_2(x_2,y)-\psi_3(x_3,y)-\psi_4(x_4, y)\right)
\end{align*}

The probabilistic model, on the other hand, gives the following chain, where proportionality is understood with respect to $y$ and all factors that do not depend on $y$ are absorbed into the constant:

\begin{align}
 p[y|x_1,x_2,x_3,x_4]&\propto \mathrm{exp}\left(-\frac{(y-\beta_1x_1-\beta_2x_2- \beta_3x_3-\beta_4x_4-\beta_0)^2}{2\sigma^2}\right)\\
 &\propto\mathrm{exp}\left(-\frac{y^2-2\beta_1x_1 y-2\beta_2x_2y-2\beta_3x_3y-2\beta_4x_4y-2\beta_0y}{2\sigma^2}\right)\\
 &\propto\mathrm{exp}\left(-\frac{(y-4\beta_1x_1)^2 +(y-4\beta_2x_2)^2+ (y-4\beta_3x_3)^2+(y-4\beta_4x_4)^2-8\beta_0y}{8\sigma^2}\right)\\
 &\propto\mathrm{exp}\left(-\frac{(y-4\beta_1x_1-\beta_0)^2 +(y-4\beta_2x_2-\beta_0)^2+ (y-4\beta_3x_3-\beta_0)^2+(y-4\beta_4x_4-\beta_0)^2}{8\sigma^2}\right)
\end{align}

As a result, we obtain the desired correspondence

\begin{align*}
\psi_1(x_1,y)&=\frac{(y-4\beta_1x_1-\beta_0)^2}{8\sigma^2}\\
\psi_2(x_2,y)&=\frac{(y-4\beta_2x_2-\beta_0)^2}{8\sigma^2}\\
\psi_3(x_3,y)&=\frac{(y-4\beta_3x_3-\beta_0)^2}{8\sigma^2}\\
\psi_4(x_4,y)&=\frac{(y-4\beta_4x_4-\beta_0)^2}{8\sigma^2}
\end{align*}
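
The correspondence can be checked numerically: for fixed neighbour values, the sum of the four sub-potentials should differ from the regression exponent only by a term that does not depend on $y$. The coefficient values below are arbitrary illustrations.

```python
import numpy as np

beta = np.array([0.2, 0.3, 0.1, 0.4])   # illustrative regression coefficients
beta0, sigma = 0.5, 1.3                  # illustrative intercept and noise scale
x = np.array([1.0, -0.5, 2.0, 0.7])      # fixed neighbour values

def regression_energy(y):
    """Exponent of the linear-model density, without the minus sign."""
    return (y - beta @ x - beta0) ** 2 / (2 * sigma ** 2)

def potential_sum(y):
    """psi_1 + psi_2 + psi_3 + psi_4 evaluated at the same point."""
    return np.sum((y - 4 * beta * x - beta0) ** 2) / (8 * sigma ** 2)

ys = np.linspace(-3.0, 3.0, 7)
gaps = np.array([potential_sum(y) - regression_energy(y) for y in ys])

# The gap is the same for every y, so both forms give the same p[y | x].
print(np.allclose(gaps, gaps[0]))
```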

Thus we can estimate the sub-potentials through the linear regression model given enough samples of independent patches from the image.  
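
A minimal sketch of such an estimation is given below. It uses a synthetic stand-in texture, and the function name `fit_four_neighbour_model` and the `step` parameter are hypothetical. Taking centre pixels three apart keeps the plus-shaped patches disjoint, which only approximates the independence assumed above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in texture (assumption): locally averaged noise.
# Replace with a real greyscale image array.
noise = rng.normal(size=(64, 64))
image = (noise + np.roll(noise, 1, axis=0) + np.roll(noise, -1, axis=0)
         + np.roll(noise, 1, axis=1) + np.roll(noise, -1, axis=1)) / 5.0

def fit_four_neighbour_model(img, step=3):
    """Least-squares fit of y = b1*x1 + ... + b4*x4 + b0 + eps."""
    rows = np.arange(1, img.shape[0] - 1, step)
    cols = np.arange(1, img.shape[1] - 1, step)
    r, c = np.meshgrid(rows, cols, indexing="ij")
    y = img[r, c].ravel()
    design = np.column_stack([
        img[r - 1, c].ravel(),   # upper neighbour
        img[r + 1, c].ravel(),   # lower neighbour
        img[r, c - 1].ravel(),   # left neighbour
        img[r, c + 1].ravel(),   # right neighbour
        np.ones(y.size),         # intercept column
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    # Residual standard deviation, corrected for the five fitted parameters.
    sigma_hat = residuals.std(ddof=design.shape[1])
    return coef[:4], coef[4], sigma_hat

beta_hat, beta0_hat, sigma_hat = fit_four_neighbour_model(image)
print(beta_hat, beta0_hat, sigma_hat)
```

The fitted values can then be substituted into the expressions for $\psi_1,\ldots,\psi_4$ above.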


# Home exercises


## 6.1 Correspondence with homogeneous MRF for 3 x 3 grid* (<font color='red'>4p</font>)


In the simplest homogeneous MRF model, each node is influenced by its four closest neighbours, where 

* the deviations from the mean pixel intensity are penalised by $\frac{1}{2}\cdot\delta x_{ij}^2$
* the differences in the horizontal direction are penalised by $\frac{1}{2}\cdot\alpha (x_{ij}-x_{i,j+1})^2$
* the differences in the vertical direction are penalised by $\frac{1}{2}\cdot\beta (x_{ij}-x_{i+1,j})^2$

Express the density function $p[x_0,x_1, \ldots, x_8]$ up to a multiplicative constant for the $3\times 3$ grid depicted below 
<img src = 'illustrations/markov-random-field-ii.png' width=100%>

As before, express the conditional probability $p[x_4|x_0,\ldots,x_3,x_5,\ldots,x_8]$ up to a multiplicative constant using the linear regression formulation. 
This will lead to constraints on the linear regression formulation. 
* Define the corresponding linear regression task (<font color='red'>1p</font>).
* Solve it for non-overlapping moss textures (<font color='red'>1p</font>).
* Solve the direct maximum likelihood estimation for moss textures (<font color='red'>1p</font>).
* Show that the results are equal (<font color='red'>1p</font>).

## 6.2 Correspondence with homogeneous MRF for n x n grid* (<font color='red'>5p</font>)

Find a maximum likelihood solution for the parameters $\alpha, \beta, \delta$ for a general $n\times n$ grid. The main issue is that the normalising factor $Z(\alpha, \beta, \delta)$ depends on the parameters we want to fit and thus cannot be treated as a constant. Nevertheless, you can still estimate $x_{ij}$ through a linear model using neighbouring points. Since a point can be located at either end of an edge, you get different formalisations. Find a heuristic solution to consolidate these tasks.
