squash - hierarchical clustering cookbook page
OXPHOS committed May 30, 2016
1 parent 32a7336 commit 45dc68a
Showing 7 changed files with 57 additions and 120 deletions.
34 changes: 34 additions & 0 deletions doc/cookbook/source/examples/clustering/hierarchical.rst
@@ -0,0 +1,34 @@
=======================
Hierarchical Clustering
=======================

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters.
We use a "bottom-up" (agglomerative) approach: each observation starts in its own cluster, and pairs of clusters are merged step by step as we move up the hierarchy.

The merges are chosen greedily.
We start by computing the pairwise distance matrix between all observations. Then, at each step, we merge the pair of clusters that are closest to each other.
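
The greedy procedure above can be sketched in plain Python. This is a minimal from-scratch illustration, not Shogun's implementation: the function name ``agglomerate`` is hypothetical, and single linkage (the distance between the closest members of two clusters) is an assumed choice, since this page does not state which linkage :sgclass:`CHierarchical` uses. New clusters are numbered ``n + k`` purely as an illustrative convention.

.. code-block:: python

   import math
   from itertools import combinations


   def agglomerate(points, merges):
       """Greedy bottom-up clustering with single linkage (assumed).

       Returns one (cluster_a, cluster_b, distance) tuple per merge step.
       Ids 0..n-1 are the original points; the cluster created at step k
       gets id n + k (a SciPy-style convention, assumed for illustration).
       """
       n = len(points)
       if not 0 <= merges < n:
           raise ValueError("merges must lie in [0, n - 1]")
       # Pairwise distance matrix between the original observations.
       dist = [[math.dist(p, q) for q in points] for p in points]
       # Each active cluster id maps to the indices of its member points.
       clusters = {i: [i] for i in range(n)}
       history = []
       for step in range(merges):
           best = None
           for a, b in combinations(sorted(clusters), 2):
               # Single linkage: distance between the two closest members.
               d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
               if best is None or d < best[2]:
                   best = (a, b, d)
           a, b, d = best
           # Greedy step: replace the closest pair by their union.
           clusters[n + step] = clusters.pop(a) + clusters.pop(b)
           history.append((a, b, d))
       return history


   for merge in agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)], merges=3):
       print(merge)

On these four toy points, the first two merges join the two nearby pairs, and the third merge joins the two resulting clusters.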

-------
Example
-------

Imagine we have files with the training data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) as:

.. sgexample:: hierarchical.sg:create_features

In order to run :sgclass:`CHierarchical`, we need to choose a distance, for example :sgclass:`CEuclideanDistance` or another subclass of :sgclass:`CDistance`. The distance is initialized with the data we want to cluster, passed as both its left- and right-hand side.

.. sgexample:: hierarchical.sg:choose_distance
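
Because the same features are passed on both sides, the distance is evaluated between every pair of observations, which gives the pairwise distance matrix the merging starts from. A plain-Python sketch of that matrix, using a hypothetical ``pairwise_distances`` helper that does not call Shogun:

.. code-block:: python

   import math


   def pairwise_distances(lhs, rhs):
       """Euclidean distances: entry [i][j] compares lhs[i] with rhs[j]."""
       return [[math.dist(p, q) for q in rhs] for p in lhs]


   points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
   # Same data on both sides, mirroring distance(features_train, features_train).
   matrix = pairwise_distances(points, points)
   for row in matrix:
       print(row)

With identical inputs the result is symmetric with zeros on the diagonal, so only one triangle carries information for the merging step.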

We then create an instance of the :sgclass:`CHierarchical` clustering machine, passing the number of merge steps to perform during training together with the distance.

.. sgexample:: hierarchical.sg:create_instance
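
Each merge joins two clusters into one, so the number of merges controls how far up the hierarchy training goes. As a property of the bottom-up procedure itself (not a statement about how :sgclass:`CHierarchical` validates its argument), for :math:`n` observations and :math:`m` merge steps (``merges`` in the example) the number of clusters left after training is

.. math::

   k = n - m, \qquad 0 \le m \le n - 1,

so choosing :math:`m = n - 1` builds the complete hierarchy, ending in a single cluster.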

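Then we run the clustering by training the model:

.. sgexample:: hierarchical.sg:train_model

After training, we can extract, for each merge step, the two clusters that were merged as well as the distance between them: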

.. sgexample:: hierarchical.sg:extract_results
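
The merge pairs are enough to recover a flat clustering. The sketch below uses a hypothetical helper, ``flat_labels``, that is not part of Shogun. It assumes each pair holds two cluster ids, that ids ``0..n-1`` denote the original observations, and that the cluster created at step ``k`` receives id ``n + k`` (a SciPy-style convention; check how :sgclass:`CHierarchical` numbers new clusters before relying on it). It replays the merges and labels every observation with the cluster it ends up in.

.. code-block:: python

   def flat_labels(n, pairs):
       """Replay merge steps and return one cluster label per observation.

       n     -- number of original observations (ids 0..n-1)
       pairs -- list of (cluster_a, cluster_b) merges, in merge order;
                the cluster built at step k is assumed to get id n + k.
       """
       # Members of every currently active cluster, keyed by cluster id.
       members = {i: [i] for i in range(n)}
       for step, (a, b) in enumerate(pairs):
           members[n + step] = members.pop(a) + members.pop(b)
       # Number the surviving clusters 0, 1, 2, ... in order of their ids.
       labels = [0] * n
       for label, cluster_id in enumerate(sorted(members)):
           for point in members[cluster_id]:
               labels[point] = label
       return labels


   # Four observations, two merges: {0, 1} and {2, 3}.
   print(flat_labels(4, [(0, 1), (2, 3)]))

The two merges in the usage line leave two clusters, so the four observations receive the labels ``[0, 0, 1, 1]``.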

----------
References
----------
:wiki:`Hierarchical_clustering`
23 changes: 23 additions & 0 deletions examples/meta/src/clustering/hierarchical.sg
@@ -0,0 +1,23 @@
CSVFile f_feats_train("../../data/classifier_4class_2d_linear_features_train.dat")

#![create_features]
RealFeatures features_train(f_feats_train)
#![create_features]

#![choose_distance]
EuclideanDistance distance(features_train, features_train)
#![choose_distance]

#![create_instance]
int merges = 3
Hierarchical hierarchical(merges, distance)
#![create_instance]

#![train_model]
hierarchical.train()
#![train_model]

#![extract_results]
RealVector d = hierarchical.get_merge_distances()
IntMatrix cp = hierarchical.get_cluster_pairs()
#![extract_results]
