Merge pull request #3197 from sanuj/cookbook

Add GMM cookbook page
shogun-toolbox · May 30, 2016 · b703d3f
2 parents 8932637 + 1226872

Showing 4 changed files with 103 additions and 0 deletions.
diff --git a/doc/cookbook/source/examples/clustering/gmm.rst b/doc/cookbook/source/examples/clustering/gmm.rst
@@ -0,0 +1,53 @@
+=======================
+Gaussian Mixture Models
+=======================
+
+A Gaussian mixture model is a probabilistic model that assumes that data are generated from a finite mixture of Gaussians with unknown parameters. The model likelihood can be written as:
+
+.. math::
+
+    p(x|\theta) = \sum_{i=1}^{K}{\pi_i \mathcal{N}(x|\mu_i, \Sigma_i)}
+
+where :math:`p(x|\theta)` is the probability density of :math:`x` given the parameters :math:`\theta:=\{\pi_i, \mu_i, \Sigma_i\}_{i=1}^K`, :math:`K` is the number of mixture components, :math:`\pi_i` is the weight of the :math:`i`-th component (the weights are non-negative and sum to one), and :math:`\mathcal{N}(x|\mu_i, \Sigma_i)` denotes a multivariate normal distribution with mean vector :math:`\mu_i` and covariance matrix :math:`\Sigma_i`.
+
+The parameters of the model are learned with the expectation-maximization (EM) algorithm, which iteratively maximizes a lower bound on the log-likelihood and typically converges to a local maximum of the likelihood.
+
+See Chapter 20 in :cite:`barber2012bayesian` for a detailed introduction.
+
+-------
+Example
+-------
+
+We start by creating CDenseFeatures (here 64-bit floats, aka RealFeatures):
+
+.. sgexample:: gmm.sg:create_features
+
+We initialize :sgclass:`GMM`, passing the desired number of mixture components.
+
+.. sgexample:: gmm.sg:create_gmm_instance
+
+We provide training features to the :sgclass:`GMM` object, train it with the EM algorithm, and sample a data point from the trained model.
+
+.. sgexample:: gmm.sg:train_sample
+
+We extract the mixture weights :math:`\pi` of all components, and the mean :math:`\mu_i` and covariance :math:`\Sigma_i` of any component :math:`i`, from the trained model.
+
+.. sgexample:: gmm.sg:extract_params
+
+We obtain the log-likelihoods of a point belonging to each cluster, together with the log-likelihood of it being generated by the model.
+
+.. sgexample:: gmm.sg:cluster_output
+
+We can also train with the split-and-merge expectation-maximization (SMEM) algorithm :cite:`ueda2000smem`.
+
+.. sgexample:: gmm.sg:training_smem
+
+----------
+References
+----------
+:wiki:`Mixture_model`
+
+:wiki:`Expectation–maximization_algorithm`
+
+.. bibliography:: ../../references.bib
+    :filter: docname in docnames
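The mixture likelihood and EM training described in gmm.rst can be illustrated with a minimal pure-Python sketch. It is not Shogun's implementation: it assumes one-dimensional data, so each covariance Σ_i reduces to a scalar variance, and the initialisation (means drawn from the data, one shared variance, equal weights), the iteration count and the `min_var` floor are illustrative choices. `normal_pdf`, `log_likelihood` and `train_em` are hypothetical helper names.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the univariate normal N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """sum_n log p(x_n | theta), where p(x | theta) = sum_i pi_i N(x | mu_i, var_i)."""
    total = 0.0
    for x in data:
        p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(p)
    return total


def train_em(data, k, iterations=100, min_var=1e-6, seed=0):
    """Fit a K-component 1-D GMM with EM; returns (weights, means, variances, history)."""
    rng = random.Random(seed)
    n = len(data)
    # Initialisation (an assumption): means drawn from the data, shared variance, equal weights.
    means = rng.sample(data, k)
    mean_all = sum(data) / n
    var_all = max(sum((x - mean_all) ** 2 for x in data) / n, min_var)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(iterations):
        # E-step: responsibilities r_ni = pi_i N(x_n | mu_i, var_i) / p(x_n | theta).
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(joint)
            resp.append([j / s for j in joint])
        # M-step: re-estimate weight, mean and variance of each component.
        for i in range(k):
            nk = sum(r[i] for r in resp)
            weights[i] = nk / n
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / nk
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / nk
            variances[i] = max(var, min_var)
        history.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, history


if __name__ == "__main__":
    # Hypothetical data: three well-separated 1-D clusters.
    rng = random.Random(42)
    data = [rng.gauss(mu, 1.0) for mu in (-5.0, 0.0, 5.0) for _ in range(100)]
    weights, means, variances, history = train_em(data, k=3, iterations=50)
    print("weights:", [round(w, 3) for w in weights])
    print("means:", [round(m, 3) for m in means])
    print("log-likelihood:", round(history[0], 2), "->", round(history[-1], 2))
```

As long as the variance floor does not bind, no EM iteration can decrease the log-likelihood, so `history` is non-decreasing; the result is still only a local maximum that depends on the initial means. A more robust version would compute the responsibilities in log space to avoid underflow.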
diff --git a/doc/cookbook/source/index.rst b/doc/cookbook/source/index.rst
@@ -40,3 +40,12 @@ Gaussian Processes
    :glob:
 
    examples/gaussian_processes/**
+
+Clustering
+----------
+
+.. toctree::
+   :maxdepth: 1
+   :glob:
+
+   examples/clustering/**
diff --git a/doc/cookbook/source/references.bib b/doc/cookbook/source/references.bib
@@ -24,3 +24,13 @@ @book{Rasmussen2005GPM
   year = {2005},
   publisher = {The MIT Press}
 }
+@article{ueda2000smem,
+  title={SMEM Algorithm for Mixture Models},
+  author={N. Ueda and R. Nakano and Z. Ghahramani and G.E. Hinton},
+  journal={Neural Computation},
+  volume={12},
+  number={9},
+  pages={2109--2128},
+  year={2000},
+  publisher={MIT Press}
+}
diff --git a/examples/meta/src/clustering/gmm.sg b/examples/meta/src/clustering/gmm.sg
@@ -0,0 +1,31 @@
+CSVFile f_feats_train("../../data/classifier_4class_2d_linear_features_train.dat")
+
+#![create_features]
+RealFeatures features_train(f_feats_train)
+#![create_features]
+
+#![create_gmm_instance]
+int num_components = 3
+GMM gmm(num_components)
+#![create_gmm_instance]
+
+#![train_sample]
+gmm.set_features(features_train)
+gmm.train_em()
+RealVector output = gmm.sample()
+#![train_sample]
+
+#![extract_params]
+int component_num = 1
+RealVector nth_mean = gmm.get_nth_mean(component_num)
+RealMatrix nth_cov = gmm.get_nth_cov(component_num)
+RealVector coef = gmm.get_coef()
+#![extract_params]
+
+#![cluster_output]
+RealVector log_likelihoods = gmm.cluster(nth_mean)
+#![cluster_output]
+
+#![training_smem]
+gmm.train_smem()
+#![training_smem]
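The `sample()` and `cluster()` calls in gmm.sg can be illustrated with the following pure-Python sketch for full-covariance components. It is not Shogun code: sampling uses the standard ancestral approach (choose component i with probability π_i, then draw from its Gaussian through a Cholesky factor of Σ_i), and the layout of the `cluster` result, one log-score log(π_i N(x|μ_i, Σ_i)) per component followed by log p(x|θ), is an assumption based on the cookbook description. All function names and the parameter values in the usage block are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_normal_pdf(x, mean, cov):
    """log N(x | mean, cov), computed via the Cholesky factor of cov."""
    d = len(x)
    L = cholesky(cov)
    # Forward substitution solves L z = x - mean, so z.z = (x - mean)^T cov^{-1} (x - mean).
    z = []
    for i in range(d):
        z.append((x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def sample(weights, means, covs, rng):
    """Ancestral sampling: pick component i with probability pi_i, then draw from N(mu_i, Sigma_i)."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    L = cholesky(covs[i])
    eps = [rng.gauss(0.0, 1.0) for _ in means[i]]
    return [means[i][r] + sum(L[r][c] * eps[c] for c in range(r + 1)) for r in range(len(eps))]


def cluster(point, weights, means, covs):
    """Per-component log(pi_i N(x | mu_i, Sigma_i)), followed by log p(x | theta) via log-sum-exp."""
    scores = [math.log(w) + log_normal_pdf(point, m, c) for w, m, c in zip(weights, means, covs)]
    top = max(scores)
    total = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores + [total]


if __name__ == "__main__":
    # Hypothetical 2-D parameters standing in for get_coef(), get_nth_mean() and get_nth_cov().
    weights = [0.5, 0.3, 0.2]
    means = [[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]]
    covs = [[[1.0, 0.2], [0.2, 1.0]], [[0.5, 0.0], [0.0, 0.5]], [[2.0, -0.3], [-0.3, 1.0]]]
    rng = random.Random(0)
    print(sample(weights, means, covs, rng))
    print(cluster(means[1], weights, means, covs))
```

Working in log space inside `cluster` avoids the underflow that summing very small densities directly would cause. `log_normal_pdf` refactors the covariance on every call, which a real implementation would cache after training.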