## Restricted Boltzmann Machine (RBM)
A [restricted Boltzmann machine (RBM)](https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.

A standard RBM has the following structure:
<img src="../rbm-diagram.png" alt="rbm-diagram">
where
* $v\in\mathbb{R}^D$ denotes the visible units
* $h\in\mathbb{R}^H$ denotes the hidden units

Assume $v$ and $h$ take binary values in $\{0,1\}$. The RBM then defines an energy function (similar to that of a Hopfield network)
$$
E(v,h) = -a^Tv -b^Th - v^TWh
$$
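where $a\in\mathbb{R}^D$ and $b\in\mathbb{R}^H$ are the visible and hidden biases and $W\in\mathbb{R}^{D\times H}$ is the weight matrix. The energy lets us model the joint distribution of $(v,h)$, normalized by the partition function $Z=\sum_{v,h}e^{-E(v,h)}$, i.e.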
$$
P(v,h) = \frac{1}{Z}e^{-E(v,h)}
$$
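As a sanity check on the two formulas above, here is a minimal NumPy sketch that evaluates the energy and builds the full joint table by brute-force enumeration. The function names (`energy`, `joint_table`), the tiny sizes and the random parameters are illustrative assumptions, not part of the note; enumeration is only feasible for very small $D$ and $H$ because there are $2^{D+H}$ states.

```python
import itertools
import numpy as np

def energy(v, h, a, b, W):
    """E(v, h) = -a^T v - b^T h - v^T W h for a single configuration."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def joint_table(a, b, W):
    """Return ({(v, h): P(v, h)}, Z) by enumerating every binary state.

    Hypothetical helper for illustration; cost grows as 2^(D + H).
    """
    D, H = W.shape
    states = [
        (v, h)
        for v in itertools.product([0, 1], repeat=D)
        for h in itertools.product([0, 1], repeat=H)
    ]
    # Unnormalized weights e^{-E(v, h)} for every state.
    weights = np.array(
        [np.exp(-energy(np.array(v), np.array(h), a, b, W)) for v, h in states]
    )
    Z = weights.sum()  # partition function
    probs = {state: w / Z for state, w in zip(states, weights)}
    return probs, Z

# Tiny RBM with arbitrary (assumed) parameters.
rng = np.random.default_rng(0)
D, H = 3, 2
a = rng.normal(size=D)
b = rng.normal(size=H)
W = rng.normal(size=(D, H))

probs, Z = joint_table(a, b, W)
print(len(probs), sum(probs.values()))
```

The table has one entry per $(v,h)$ pair and its values sum to one up to floating-point error, which is exactly what dividing by $Z$ guarantees.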
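Because the graph above is bipartite (there are no connections within the visible layer or within the hidden layer), the units in one layer are conditionally independent given the other layer, so the conditional probabilities factorize: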
$$
\begin{array}{rl}
P(v|h) &= \prod_{i}P(v_i|h)\\
P(h|v) &= \prod_{j}P(h_j|v)
\end{array}
$$
where the individual probabilities are given by the logistic sigmoid $\sigma(x)=1/(1+e^{-x})$:
$$
\begin{array}{rl}
P(v_i=1|h) &= \sigma\left(a_i + \sum_{j=1}^{H}w_{i,j}h_j\right)\\
P(h_j=1|v) &= \sigma\left(b_j + \sum_{i=1}^Dw_{i,j}v_i\right)
\end{array}
$$
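The conditionals above translate almost directly into vectorized NumPy. The sketch below is a hedged illustration: the function names are assumptions, and `gibbs_step` (sample $h$ given $v$, then $v$ given $h$) is one common way to use these conditionals that the note itself does not describe.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, b, W):
    """P(h_j = 1 | v) for every j; v has shape (D,) or (n, D)."""
    return sigmoid(b + v @ W)

def p_v_given_h(h, a, W):
    """P(v_i = 1 | h) for every i; h has shape (H,) or (n, H)."""
    return sigmoid(a + h @ W.T)

def gibbs_step(v, a, b, W, rng):
    """Sample h ~ P(h | v), then v' ~ P(v | h). Illustrative helper."""
    ph = p_h_given_v(v, b, W)
    h = (rng.random(ph.shape) < ph).astype(float)   # Bernoulli draw per hidden unit
    pv = p_v_given_h(h, a, W)
    v_new = (rng.random(pv.shape) < pv).astype(float)  # Bernoulli draw per visible unit
    return v_new, h

# Example with arbitrary (assumed) parameters: D = 3 visible, H = 2 hidden units.
rng = np.random.default_rng(0)
a = rng.normal(size=3)
b = rng.normal(size=2)
W = rng.normal(size=(3, 2))
v = np.array([1.0, 0.0, 1.0])
v_new, h = gibbs_step(v, a, b, W, rng)
```

Since each layer factorizes given the other, a whole layer can be sampled in one vectorized draw instead of unit by unit.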

In this note, we look at the following task:
* given an unlabelled dataset $\left\{v^{(i)}\right\}_{i=1}^n$
* given an RBM configuration, e.g. the number of hidden units $H$

The goal is to learn the parameters $a$, $b$ and $W$ of the joint distribution directly from the dataset $\left\{v^{(i)}\right\}_{i=1}^n$.
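
One common way to make this goal precise is maximum likelihood on the visible units. This objective is an assumption here, since the note does not state it explicitly. Because only $v$ is observed, the hidden units are summed out of the joint distribution:
$$
\begin{array}{rl}
P(v) &= \sum_{h}P(v,h) = \frac{1}{Z}\sum_{h}e^{-E(v,h)}\\
(a^*, b^*, W^*) &= \arg\max_{a,b,W}\sum_{i=1}^{n}\log P\left(v^{(i)}\right)
\end{array}
$$
Note that $Z$ depends on $a$, $b$ and $W$ and sums over all $2^{D+H}$ configurations, so it cannot be computed exactly for realistic sizes.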

We will use the MNIST dataset for this task, and later go through some applications of RBMs.