Hierarchical Clustering

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. We apply a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are subsequently merged.

The merges are determined in a greedy manner. We start by constructing a pairwise distance matrix. Then, at each step, the two clusters with the smallest distance between them are merged, and this is repeated until the requested number of merges has been performed.
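The greedy procedure can be illustrated outside of Shogun with a minimal pure-Python sketch. It is not the CHierarchical implementation: the function name ``agglomerate``, the ``merges`` parameter, and the use of single linkage (cluster distance = distance between the closest members) are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def agglomerate(points, merges):
    """Greedy bottom-up clustering; single linkage is an assumed choice.

    Returns one (cluster_a, cluster_b, distance) record per merge.
    Cluster ids 0..n-1 are the input points; each merge creates id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(points))}
    history = []
    next_id = len(points)
    for _ in range(min(merges, len(points) - 1)):
        # Search all pairs of live clusters for the smallest linkage distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Merge the winning pair into a new cluster and record the step.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        history.append((a, b, d))
        next_id += 1
    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
history = agglomerate(points, merges=3)
for a, b, d in history:
    print(f"merged {a} and {b} at distance {d:.3f}")
```

On these four points the two closest observations (ids 0 and 1) are merged first, then ids 2 and 3, and finally the two resulting clusters. The sketch recomputes distances at every step for clarity; a practical implementation would reuse the pairwise distance matrix.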

Example

Imagine we have files with the training data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) as:

hierarchical.sg:create_features

In order to run CHierarchical, we need to choose a distance, for example CEuclideanDistance, or another sub-class of CDistance. The distance is initialized with the data we want to cluster.

hierarchical.sg:choose_distance
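To make the role of the distance concrete, the following is a small pure-Python sketch of the pairwise Euclidean distance matrix that the merging procedure starts from. It does not use Shogun's CEuclideanDistance; the helper name ``pairwise_euclidean`` and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def pairwise_euclidean(points):
    """Return the symmetric n x n matrix of Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The distance is symmetric, so fill both triangles at once.
            matrix[i][j] = matrix[j][i] = dist(points[i], points[j])
    return matrix


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = pairwise_euclidean(points)
for row in D:
    print(row)
```

Swapping in a different distance only requires replacing the call to ``dist``, which mirrors how a different sub-class of CDistance could be passed to CHierarchical.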

We then create an instance of CHierarchical by specifying the number of merge steps we expect to perform during training.

hierarchical.sg:create_instance

We can extract the information of the two merged elements, as well as the distance between them in each merging step:

hierarchical.sg:extract_results

References

Hierarchical_clustering