Restricted Boltzmann Machines
=================

A [Restricted Boltzmann machine (RBM)](https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs are a particular form of [Boltzmann machine](https://en.wikipedia.org/wiki/Boltzmann_machine) subject to a **restriction**: there are no connections between units within the same group, so the network forms a bipartite graph (see below).

An RBM is made of two layers, each with a certain number of units. The input units are called the **visible units** of the RBM because their states are observed. The feature detectors correspond to the non-observed **hidden units**. The two layers are connected through a matrix of weights. Units within the same layer are not connected to each other, which means that the network forms a [bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph).

Energy of a configuration
-----------------------------

The visible and hidden units are often organised as vectors, and a pair of visible-hidden vectors is called a **configuration**. A joint configuration of the visible and hidden units has an **energy** (see Hopfield, 1982) given by:

$$E(v,h) = -a^{T}v -b^{T}h -v^{T} Wh $$

where $W$ is the matrix of weights (size $m \times n$, for $m$ visible and $n$ hidden units) whose entry $W_{ij}$ is associated with the connection between visible unit $v_i$ and hidden unit $h_j$, $a$ is the vector of bias weights for the visible units, and $b$ is the vector of bias weights for the hidden units.
This definition of energy has the same form as the one used in [Hopfield networks](https://en.wikipedia.org/wiki/Hopfield_network).
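
As a minimal sketch of the energy formula above, the snippet below evaluates $E(v,h)$ with NumPy for a toy RBM. The function name `energy`, the choice of binary (0/1) units, and all parameter values are assumptions made for illustration only; they do not come from a particular library or dataset.

```python
import numpy as np

def energy(v, h, W, a, b):
    """Energy E(v, h) = -a^T v - b^T h - v^T W h of one joint configuration."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Toy RBM with m = 3 visible and n = 2 hidden units (arbitrary values).
W = np.array([[0.5, -0.2],
              [0.1,  0.3],
              [-0.4, 0.2]])      # weights, shape (m, n)
a = np.array([0.1, -0.1, 0.0])   # visible biases, shape (m,)
b = np.array([0.2, -0.3])        # hidden biases, shape (n,)

# One joint configuration; binary units are assumed here.
v = np.array([1, 0, 1])
h = np.array([0, 1])

E = energy(v, h, W, a, b)
print(E)  # approximately 0.2
```

Here $a^{T}v = 0.1$, $b^{T}h = -0.3$ and $v^{T}Wh = 0$, so the energy is $-0.1 + 0.3 - 0 = 0.2$.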


Probability distributions over units
------------------------------------------

Using the energy, it is possible to define probability distributions over the visible and hidden units. The [joint probability](https://en.wikipedia.org/wiki/Joint_probability_distribution) of every possible pair of a visible and a hidden vector is defined as follows:

$$ p(v,h) = \frac{1}{Z} e^{-E(v,h)}$$

where the **partition function** $Z$ is a normalisation factor, given by summing $e^{-E(v,h)}$ over all possible pairs of visible and hidden vectors: $Z = \sum_{v,h} e^{-E(v,h)}$.
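
For a very small RBM the partition function can be computed by brute force, which makes the definition concrete. The sketch below assumes binary (0/1) units and uses hypothetical helper names (`partition_function`, `joint_probability`) and random parameters. The enumeration visits $2^{m+n}$ configurations, so it only works for toy sizes.

```python
import itertools
import numpy as np

def energy(v, h, W, a, b):
    """Energy E(v, h) = -a^T v - b^T h - v^T W h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def partition_function(W, a, b):
    """Sum exp(-E(v, h)) over every pair of binary visible/hidden vectors."""
    m, n = W.shape
    Z = 0.0
    for v in itertools.product([0, 1], repeat=m):
        for h in itertools.product([0, 1], repeat=n):
            Z += np.exp(-energy(np.array(v), np.array(h), W, a, b))
    return Z

def joint_probability(v, h, W, a, b, Z):
    """p(v, h) = exp(-E(v, h)) / Z."""
    return np.exp(-energy(v, h, W, a, b)) / Z

# Arbitrary toy parameters: m = 3 visible units, n = 2 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 2))
a = rng.normal(scale=0.5, size=3)
b = rng.normal(scale=0.5, size=2)

Z = partition_function(W, a, b)

# Sanity check: the joint probabilities of all configurations sum to one.
total = sum(
    joint_probability(np.array(v), np.array(h), W, a, b, Z)
    for v in itertools.product([0, 1], repeat=3)
    for h in itertools.product([0, 1], repeat=2)
)
print(Z, total)
```

With all weights and biases set to zero, every configuration has energy 0 and $Z$ equals the number of configurations, $2^{m+n}$.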

Using the energy, it is also possible to compute the [marginal probability distribution](https://en.wikipedia.org/wiki/Marginal_distribution) over the visible units, obtained by summing the joint probability over all possible hidden vectors:

$$ p(v) = \frac{1}{Z} \sum_{h} e^{-E(v,h)}$$
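
The marginal over the visible units can be illustrated in the same brute-force way: for each visible vector, sum $e^{-E(v,h)}$ over all hidden vectors and divide by $Z$. This is only a sketch under the assumption of binary (0/1) units; the helper names (`binary_vectors`, `marginal_visible`) and the random parameters are hypothetical.

```python
import itertools
import numpy as np

def energy(v, h, W, a, b):
    """Energy E(v, h) = -a^T v - b^T h - v^T W h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def binary_vectors(size):
    """All 2**size binary vectors of the given length."""
    return [np.array(bits) for bits in itertools.product([0, 1], repeat=size)]

def marginal_visible(W, a, b):
    """Return every binary visible vector together with its probability p(v)."""
    m, n = W.shape
    visibles = binary_vectors(m)
    hiddens = binary_vectors(n)
    # Unnormalised marginal: sum over h of exp(-E(v, h)) for each v.
    unnormalised = np.array([
        sum(np.exp(-energy(v, h, W, a, b)) for h in hiddens)
        for v in visibles
    ])
    Z = unnormalised.sum()  # partition function
    return visibles, unnormalised / Z

# Arbitrary toy parameters: m = 3 visible units, n = 2 hidden units.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.5, size=(3, 2))
a = rng.normal(scale=0.5, size=3)
b = rng.normal(scale=0.5, size=2)

visibles, p_v = marginal_visible(W, a, b)
for v, p in zip(visibles, p_v):
    print(v, round(float(p), 4))
```

Because $Z$ is the sum of the unnormalised terms over all visible vectors, the resulting values of $p(v)$ sum to one.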



Resources
------------

A Practical Guide to Training Restricted Boltzmann Machines by Geoffrey Hinton [[pdf]](https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf)