Merge pull request #3197 from sanuj/cookbook
Add GMM cookbook page
karlnapf committed May 30, 2016
2 parents 8932637 + 1226872 commit b703d3f
Showing 4 changed files with 103 additions and 0 deletions.
53 changes: 53 additions & 0 deletions doc/cookbook/source/examples/clustering/gmm.rst
@@ -0,0 +1,53 @@
=======================
Gaussian Mixture Models
=======================

A Gaussian mixture model is a probabilistic model that assumes that data are generated from a finite mixture of Gaussians with unknown parameters. The model likelihood can be written as:

.. math::

    p(x|\theta) = \sum_{i=1}^{K}{\pi_i \mathcal{N}(x|\mu_i, \Sigma_i)}

where :math:`p(x|\theta)` is the probability density of :math:`x` given the parameters :math:`\theta:=\{\pi_i, \mu_i, \Sigma_i\}_{i=1}^K`, :math:`K` denotes the number of mixture components, :math:`\pi_i` denotes the weight of the :math:`i`-th component, and :math:`\mathcal{N}(x|\mu_i, \Sigma_i)` denotes a multivariate normal distribution with mean vector :math:`\mu_i` and covariance matrix :math:`\Sigma_i`.
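
As an illustration of the formula above, the following minimal Python sketch evaluates the mixture density in the univariate case, where each covariance matrix reduces to a scalar variance. It is not Shogun code: the function names and the parameter values are hypothetical and chosen only for this example.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate normal N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """p(x | theta) = sum_i pi_i * N(x | mu_i, var_i)."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical parameters for K = 2 components; the weights sum to one.
weights = [0.3, 0.7]
means = [-1.0, 2.0]
variances = [0.5, 1.5]
density = mixture_pdf(0.0, weights, means, variances)
```

Because the weights sum to one and each component is a normalized density, the mixture is itself a normalized density.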

The expectation maximization (EM) algorithm is used to learn the parameters of the model. It iteratively maximizes a lower bound on the log-likelihood, typically converging to a local, not necessarily global, maximum of the likelihood.
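
To make the two EM steps concrete, here is a minimal Python sketch for a univariate mixture. It is an illustration under simplifying assumptions, not Shogun's implementation: the data, the starting parameters, the function names and the variance floor are all hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate normal N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(data, weights, means, variances):
    """One EM iteration for a univariate Gaussian mixture."""
    k = len(weights)
    # E-step: responsibilities resp[n][i] = p(component i | x_n, current parameters).
    resp = []
    for x in data:
        joint = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate parameters from responsibility-weighted statistics.
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        n_i = sum(r[i] for r in resp)
        mu = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var = sum(r[i] * (x - mu) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / len(data))
        new_m.append(mu)
        new_v.append(max(var, 1e-6))  # variance floor: an assumption to avoid collapse
    return new_w, new_m, new_v

def log_likelihood(data, weights, means, variances):
    """Sum over the data of log p(x | theta)."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

# Hypothetical data with two well-separated groups, and a rough initial guess.
data = [-2.1, -1.9, -2.0, -1.8, 3.0, 3.2, 2.9, 3.1]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
for _ in range(20):
    params = em_step(data, *params)
```

Each iteration should not decrease the log-likelihood, which is a convenient sanity check when experimenting with a sketch like this.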

See Chapter 20 in :cite:`barber2012bayesian` for a detailed introduction.

-------
Example
-------

We start by creating CDenseFeatures (here 64-bit floats, aka RealFeatures) as follows:

.. sgexample:: gmm.sg:create_features

We initialize :sgclass:`GMM`, passing the desired number of mixture components.

.. sgexample:: gmm.sg:create_gmm_instance

We provide training features to the :sgclass:`GMM` object, train it using the EM algorithm, and sample a data point from the trained model.

.. sgexample:: gmm.sg:train_sample
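
Sampling from a mixture can be pictured as a two-stage draw: first pick a component according to the weights, then draw from that component's Gaussian. The Python sketch below shows this for the univariate case. It is only an illustration of the idea and makes no claim about how ``sample()`` is implemented in Shogun; the function name and all parameter values are hypothetical.

```python
import random

def sample_gmm(weights, means, stddevs, rng=random):
    """Draw one sample from a univariate Gaussian mixture (ancestral sampling)."""
    # Pick a component index with probability proportional to its weight ...
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # ... then draw from that component's Gaussian.
    return rng.gauss(means[i], stddevs[i])

# Hypothetical two-component mixture with well-separated means.
rng = random.Random(0)
samples = [sample_gmm([0.3, 0.7], [-5.0, 5.0], [0.5, 0.5], rng)
           for _ in range(1000)]
```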

We extract parameters such as the mixture weights :math:`\pi`, and the mean :math:`\mu_i` and covariance :math:`\Sigma_i` of any component, from the trained model.

.. sgexample:: gmm.sg:extract_params

For a given point, we obtain the log-likelihood of belonging to each cluster and the log-likelihood of being generated by this model.

.. sgexample:: gmm.sg:cluster_output
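
One plausible reading of these quantities, stated here as an assumption rather than as a description of Shogun's return value, is that the per-cluster score of a point :math:`x` is :math:`\log(\pi_i \mathcal{N}(x|\mu_i, \Sigma_i))` and the model score is :math:`\log p(x|\theta)`. The univariate Python sketch below computes both, using the log-sum-exp trick for the latter; the function names and parameter values are hypothetical.

```python
import math

def log_normal_pdf(x, mu, var):
    """Log-density of a univariate normal N(x | mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def cluster_log_likelihoods(x, weights, means, variances):
    """Per-component log(pi_i * N(x | mu_i, var_i)), then log p(x | theta) last."""
    per_component = [math.log(w) + log_normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
    # log-sum-exp: subtract the maximum before exponentiating for stability.
    top = max(per_component)
    total = top + math.log(sum(math.exp(v - top) for v in per_component))
    return per_component + [total]

# Hypothetical parameters; the most likely cluster has the largest score.
scores = cluster_log_likelihoods(1.0, [0.3, 0.7], [-1.0, 2.0], [0.5, 1.5])
best_component = max(range(2), key=lambda i: scores[i])
```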

We can also use the Split-Merge Expectation-Maximization (SMEM) algorithm :cite:`ueda2000smem` for training.

.. sgexample:: gmm.sg:training_smem

----------
References
----------
:wiki:`Mixture_model`

:wiki:`Expectation–maximization_algorithm`

.. bibliography:: ../../references.bib
    :filter: docname in docnames
9 changes: 9 additions & 0 deletions doc/cookbook/source/index.rst
@@ -40,3 +40,12 @@ Gaussian Processes
   :glob:

   examples/gaussian_processes/**

Clustering
----------

.. toctree::
   :maxdepth: 1
   :glob:

   examples/clustering/**
10 changes: 10 additions & 0 deletions doc/cookbook/source/references.bib
@@ -24,3 +24,13 @@ @book{Rasmussen2005GPM
year = {2005},
publisher = {The MIT Press}
}
@article{ueda2000smem,
title={SMEM Algorithm for Mixture Models},
author={N. Ueda and R. Nakano and Z. Ghahramani and G.E. Hinton},
journal={Neural Computation},
volume={12},
number={9},
pages={2109--2128},
year={2000},
publisher={MIT Press}
}
31 changes: 31 additions & 0 deletions examples/meta/src/clustering/gmm.sg
@@ -0,0 +1,31 @@
CSVFile f_feats_train("../../data/classifier_4class_2d_linear_features_train.dat")

#![create_features]
RealFeatures features_train(f_feats_train)
#![create_features]

#![create_gmm_instance]
int num_components = 3
GMM gmm(num_components)
#![create_gmm_instance]

#![train_sample]
gmm.set_features(features_train)
gmm.train_em()
RealVector output = gmm.sample()
#![train_sample]

#![extract_params]
int component_num = 1
RealVector nth_mean = gmm.get_nth_mean(component_num)
RealMatrix nth_cov = gmm.get_nth_cov(component_num)
RealVector coef = gmm.get_coef()
#![extract_params]

#![cluster_output]
RealVector log_likelihoods = gmm.cluster(nth_mean)
#![cluster_output]

#![training_smem]
gmm.train_smem()
#![training_smem]
