Add kmeans cookbook page

shogun-toolbox · May 11, 2016 · 2f7f73f
1 parent 112dd79

Showing 2 changed files with 70 additions and 0 deletions.
diff --git a/doc/cookbook/source/examples/classifier/kmeans.rst b/doc/cookbook/source/examples/classifier/kmeans.rst
@@ -0,0 +1,47 @@
+==========================
+:math:`k`-means clustering
+==========================
+:math:`k`-means clustering aims to partition :math:`n` observations into :math:`k` (:math:`\leq n`) clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
+
+The :math:`n` observations are represented by :math:`n` :math:`d`-dimensional real vectors, :math:`\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{R}^d`.
+
+
+
+In :math:`k`-means clustering, the :math:`n` observations are partitioned into :math:`k` (:math:`\leq n`) sets :math:`\mathbf{S} = \{S_1, S_2, \ldots, S_k\}` so as to minimize the within-cluster sum of squares (WCSS), i.e. the sum of squared Euclidean distances from each point to the mean of its own cluster.
+In other words, its objective is to find:
+
+.. math::
+   \underset{\mathbf{S}}{\arg\min} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\| \mathbf{x} - \boldsymbol{\mu}_i \right\|^{2}
+
+where :math:`\boldsymbol{\mu}_i` is the mean of the points in :math:`S_i`.
+
+-------
+Example
+-------
+Imagine we have a file with training data. We create :sgclass:`CDenseFeatures` (here 64-bit floats, a.k.a. RealFeatures) as follows:
+
+.. sgexample:: kmeans.sg:create_features
+
+In order to run :sgclass:`CKMeans`, we need to choose a distance, for example :sgclass:`CEuclideanDistance` or another subclass of :sgclass:`CDistance`. The distance is initialized with the data we want to cluster.
+
+.. sgexample:: kmeans.sg:choose_distance
+
+Once we have chosen a distance, we create an instance of the :sgclass:`CKMeans` classifier.
+We explicitly set the number of clusters :math:`k` to 3 and choose Lloyd's method as the training method.
+
+.. sgexample:: kmeans.sg:create_instance
+
+Then we train the model on the dataset:
+
+.. sgexample:: kmeans.sg:train_and_apply
+
+Finally, we can extract the center and radius of each cluster:
+
+.. sgexample:: kmeans.sg:extract_centers_and_radius
+
+----------
+References
+----------
+:wiki:`K-means_clustering`
+
+:wiki:`Lloyd's_algorithm`
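
The WCSS objective on the page above is what Lloyd's method (the training method chosen in the example) decreases step by step. It alternates two steps:

- an assignment step, which moves every point to its nearest center;
- an update step, which moves every center to the mean of its points.

Neither step can increase the WCSS, so the loop can stop once the assignments no longer change. Below is a minimal, dependency-free Python sketch of that loop. It is an illustration, not Shogun's implementation. The helper names `squared_distance`, `nearest`, `mean`, `wcss`, and `lloyd_kmeans` are hypothetical. Seeding the centers with the first k points is an assumption (Shogun's CKMeans may initialize differently). An empty cluster simply keeps its previous center.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to x (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: squared_distance(x, centers[i]))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of d-dimensional points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(points: Sequence[Point], labels: List[int], centers: List[Point]) -> float:
    """Within-cluster sum of squares: sum of ||x - mu_i||^2 over all x in S_i."""
    return sum(squared_distance(x, centers[c]) for x, c in zip(points, labels))


def lloyd_kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Hypothetical helper: Lloyd's method, seeded with the first k points."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= n")
    centers: List[Point] = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose center is nearest.
        new_labels = [nearest(x, centers) for x in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move either
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for i in range(k):
            members = [tuple(x) for x, c in zip(points, labels) if c == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = mean(members)
    return centers, labels
```

For example, `lloyd_kmeans([(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)], 2)` stops after a single update. It returns the centers `(0.0, 0.5)` and `(10.0, 10.5)`, the labels `[0, 1, 0, 1]`, and a WCSS of `1.0`.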
diff --git a/examples/meta/src/classifier/kmeans.sg b/examples/meta/src/classifier/kmeans.sg
@@ -0,0 +1,23 @@
+CSVFile f_feats_train("../../data/classifier_binary_2d_linear_features_train.dat")
+
+#![create_features]
+RealFeatures features_train(f_feats_train)
+#![create_features]
+
+#![choose_distance]
+EuclideanDistance distance(features_train, features_train)
+#![choose_distance]
+
+#![create_instance]
+KMeans kmeans(3, distance, enum EKMeansMethod.KMM_LLOYD)
+#![create_instance]
+
+#![train_and_apply]
+kmeans.train()
+#![train_and_apply]
+
+#![extract_centers_and_radius]
+RealMatrix c = kmeans.get_cluster_centers()
+RealVector r = kmeans.get_radiuses()
+#![extract_centers_and_radius]
+
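
The `extract_centers_and_radius` snippet reads back one center and one radius per cluster via `get_cluster_centers()` and `get_radiuses()`. The minimal pure-Python sketch below makes those two quantities concrete without requiring Shogun: it derives them from a set of points and their cluster labels.

- The helper `centers_and_radii` is hypothetical.
- Defining a cluster's radius as the distance from its center to its farthest member is an assumption made for illustration. Shogun's `get_radiuses()` may use a different definition.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centers_and_radii(
    points: Sequence[Point], labels: Sequence[int], k: int
) -> Tuple[List[Point], List[float]]:
    """Hypothetical helper: per-cluster center (mean) and radius.

    The radius here is the largest Euclidean distance from a cluster's center
    to one of its members. This is an assumed definition for illustration;
    Shogun's get_radiuses() may compute something different.
    """
    centers: List[Point] = []
    radii: List[float] = []
    for i in range(k):
        members = [p for p, c in zip(points, labels) if c == i]
        if not members:
            raise ValueError(f"cluster {i} has no members")
        n = len(members)
        # Center: component-wise mean of the cluster's members.
        center = tuple(sum(coords) / n for coords in zip(*members))
        # Radius: Euclidean distance from the center to the farthest member.
        radius = max(math.dist(center, m) for m in members)
        centers.append(center)
        radii.append(radius)
    return centers, radii
```

For instance, take the points `(0, 0)`, `(2, 0)`, `(10, 10)`, `(10, 12)` with labels `[0, 0, 1, 1]`. The sketch returns the centers `(1.0, 0.0)` and `(10.0, 11.0)`, each with radius `1.0`.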