Merge pull request #3183 from OXPHOS/cookbook
Add kmeans page to cookbook
karlnapf committed Jun 15, 2016
2 parents 3998da8 + 9950533 commit 59c003f
Showing 7 changed files with 85 additions and 100 deletions.
2 changes: 1 addition & 1 deletion data
56 changes: 56 additions & 0 deletions doc/cookbook/source/examples/clustering/kmeans.rst
@@ -0,0 +1,56 @@
=======
K-means
=======
:math:`k`-means clustering aims to partition :math:`n` observations into :math:`k\leq n` clusters (sets :math:`\mathbf{S} = \{S_1, \dots, S_k\}`),
such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster.

In other words, its objective is to minimize:

.. math::

    \argmin_\mathbf{S} \sum_{i=1}^{k}\sum_{\mathbf{x}\in S_i}\left \|\mathbf{x} - \boldsymbol{\mu}_i \right \|^{2}

where :math:`\boldsymbol{\mu}_i` is the mean of points in :math:`S_i`.

See Chapter 20 in :cite:`barber2012bayesian` for a detailed introduction.
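
As an illustration of the objective above, here is a minimal NumPy sketch that evaluates the within-cluster sum of squares for a given assignment. It is not part of Shogun; the function name ``kmeans_objective``, the row-per-observation layout, and the toy data are assumptions made for this example only.

```python
import numpy as np

def kmeans_objective(X, labels, k):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for i in range(k):
        members = X[labels == i]        # points assigned to cluster S_i
        if len(members) == 0:
            continue                    # an empty cluster contributes nothing
        mu_i = members.mean(axis=0)     # cluster mean mu_i
        total += ((members - mu_i) ** 2).sum()
    return total

# Hypothetical toy data: two tight groups on a line.
X = np.array([[0.0], [2.0], [10.0], [12.0]])
good = np.array([0, 0, 1, 1])   # groups nearby points together
bad = np.array([0, 1, 0, 1])    # mixes the two groups

obj_good = kmeans_objective(X, good, 2)
obj_bad = kmeans_objective(X, bad, 2)
```

The assignment that groups nearby points gives a much smaller objective value than the one that mixes the groups, which is exactly what :math:`k`-means tries to exploit.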

-------
Example
-------
Imagine we have a file with training data. We create :sgclass:`CDenseFeatures` (here 64-bit floats, a.k.a. RealFeatures) as

.. sgexample:: kmeans.sg:create_features

In order to run :sgclass:`CKMeans`, we need to choose a distance, for example :sgclass:`CEuclideanDistance` or another sub-class of :sgclass:`CDistance`. The distance is initialized with the data we want to cluster.

.. sgexample:: kmeans.sg:choose_distance
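
To make the role of the distance concrete, the following NumPy sketch computes the pairwise Euclidean distances between two sets of points. It only illustrates the quantity involved and is not how Shogun computes it; the helper name ``pairwise_euclidean`` and the row-per-observation layout are assumptions of this example.

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Return D with D[i, j] = ||A[i] - B[j]||_2 for row-vector inputs."""
    diff = A[:, None, :] - B[None, :, :]    # shape (len(A), len(B), dim)
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical toy points; the distance between them is 5 (a 3-4-5 triangle).
points = np.array([[0.0, 0.0], [3.0, 4.0]])
D = pairwise_euclidean(points, points)
```

Passing the same set twice, as the example above does with the training features, yields a symmetric matrix with zeros on the diagonal.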

Once we have chosen a distance, we create an instance of :sgclass:`CKMeans`.
We explicitly set :math:`k`, the number of clusters we expect, to 2 and pass it to :sgclass:`CKMeans`. In this example, we apply Lloyd's method for :math:`k`-means clustering.

.. sgexample:: kmeans.sg:create_instance_lloyd
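
For readers unfamiliar with Lloyd's method, here is a minimal NumPy sketch of its two alternating steps (assign each point to the nearest center, then move each center to the mean of its points). It is a textbook-style illustration, not Shogun's implementation; the function name ``lloyd_kmeans``, the random initialisation from data points, and the toy data are assumptions.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center (Euclidean) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=100, seed=1):
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points (one common choice).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)          # assignment step
        new_centers = centers.copy()
        for i in range(k):                           # update step
            members = X[labels == i]
            if len(members) > 0:                     # keep the old center if empty
                new_centers[i] = members.mean(axis=0)
        if np.allclose(new_centers, centers):        # stop once centers are stable
            break
        centers = new_centers
    return centers, nearest_center(X, centers)

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = lloyd_kmeans(X, k=2)
```

On data this well separated the procedure recovers the two groups, but in general Lloyd's method only finds a local minimum that depends on the initial centers.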

Then we train the model:

.. sgexample:: kmeans.sg:train_dataset

We can extract the center and radius of each cluster:

.. sgexample:: kmeans.sg:extract_centers_and_radius


Mini-batch :math:`k`-means clustering is also supported, via :sgclass:`CKMeansMiniBatch`.
We create an instance and provide the batch size and the number of iterations.

.. sgexample:: kmeans.sg:create_instance_mb
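
The idea behind the mini-batch variant can be sketched in NumPy as follows: each iteration draws a small random batch and nudges the nearest center towards every batch point with a per-center step size that shrinks as the center sees more points. This is a generic illustration and may differ from what Shogun does internally; the function name ``minibatch_kmeans``, its parameters, and the toy data are assumptions.

```python
import numpy as np

def minibatch_kmeans(X, k, batch_size, n_iter, seed=1):
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)    # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Assign every batch point to its nearest current center.
        dists = np.linalg.norm(batch[:, None, :] - centers[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        # Move each selected center towards its point with step size 1/count.
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = (1.0 - eta) * centers[j] + eta * x
    return centers

# Hypothetical toy data; batch size and iteration count mirror the example above.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers_mb = minibatch_kmeans(X, k=2, batch_size=4, n_iter=1000)
```

Because each update touches only a handful of points, the cost per iteration is independent of the data set size, at the price of noisier centers than a full Lloyd iteration.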

Then we train the model and extract the centers and radii as described above.

----------
References
----------
:wiki:`K-means_clustering`

:wiki:`Lloyd's_algorithm`

.. bibliography:: ../../references.bib
    :filter: docname in docnames
28 changes: 28 additions & 0 deletions examples/meta/src/clustering/kmeans.sg
@@ -0,0 +1,28 @@
CSVFile f_feats_train("../../data/classifier_binary_2d_linear_features_train.dat")
Math:init_random(1)

#![create_features]
RealFeatures features_train(f_feats_train)
#![create_features]

#![choose_distance]
EuclideanDistance distance(features_train, features_train)
#![choose_distance]

#![create_instance_lloyd]
KMeans kmeans(2, distance)
#![create_instance_lloyd]

#![train_dataset]
kmeans.train()
#![train_dataset]

#![extract_centers_and_radius]
RealMatrix c = kmeans.get_cluster_centers()
RealVector r = kmeans.get_radiuses()
#![extract_centers_and_radius]

#![create_instance_mb]
KMeansMiniBatch kmeans_mb(2, distance)
kmeans_mb.set_mb_params(4, 1000)
#![create_instance_mb]
25 changes: 0 additions & 25 deletions examples/undocumented/csharp_modular/clustering_kmeans_modular.cs

This file was deleted.

31 changes: 0 additions & 31 deletions examples/undocumented/java_modular/clustering_kmeans_modular.java

This file was deleted.

19 changes: 0 additions & 19 deletions examples/undocumented/octave_modular/clustering_kmeans_modular.m

This file was deleted.

24 changes: 0 additions & 24 deletions examples/undocumented/python_modular/clustering_kmeans_modular.py

This file was deleted.
