Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. We apply a "bottom-up" (agglomerative) approach: each observation starts in its own cluster, and pairs of clusters are subsequently merged.
The merges are chosen greedily. We start by constructing a pairwise distance matrix. Then, at each iteration, the two clusters with the smallest distance between them are merged.
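The greedy procedure can be sketched in plain Python. This is only an illustration of the idea, not the CHierarchical implementation: the function names `euclidean` and `agglomerate` are hypothetical, the distance between two clusters is assumed to be the smallest distance between any two of their members (single linkage), and a merged cluster is assumed to receive a fresh integer id.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, merge_steps):
    """Greedy bottom-up clustering (illustrative sketch, single linkage assumed).

    Returns a list of (cluster_a, cluster_b, distance) tuples, one per merge.
    """
    # Each observation starts in its own cluster, keyed by its index.
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)  # assumption: merged clusters get fresh ids
    merges = []
    for _ in range(merge_steps):
        if len(clusters) < 2:
            break  # nothing left to merge
        best = None
        ids = sorted(clusters)
        # Find the pair of clusters with the smallest distance.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = min(
                    euclidean(points[p], points[q])
                    for p in clusters[a]
                    for q in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the closest pair into a new cluster.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Made-up toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 5), (5, 7)]
merges = agglomerate(points, merge_steps=3)
```

For clarity, the sketch recomputes distances from the raw points in every iteration instead of maintaining the pairwise distance matrix, which is simpler to read but slower.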
Imagine we have files with the training data. We create CDenseFeatures (here 64-bit floats, a.k.a. RealFeatures) as:
hierarchical.sg:create_features
In order to run CHierarchical, we need to choose a distance, for example CEuclideanDistance, or another subclass of CDistance. The distance is initialized with the data we want to cluster.
hierarchical.sg:choose_distance
We then create an instance of CHierarchical by specifying the number of merge steps to perform during training.
hierarchical.sg:create_instance
For each merge step, we can extract the pair of merged elements and the distance between them:
hierarchical.sg:extract_results
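One way to use this information is to rebuild the cluster memberships after a given number of merges. The sketch below is a rough illustration in plain Python, not part of the library: the helper `clusters_after` is hypothetical, the `pairs` and `distances` values are made-up example data, and it assumes that the cluster created in merge step `s` is referred to by the id `num_points + s`. Check how the extracted pairs are actually encoded before relying on that convention.

```python
def clusters_after(num_points, pairs, k):
    """Return the clusters (lists of observation indices) after the first k merges.

    Assumption: the cluster created in merge step s has id num_points + s.
    """
    members = {i: [i] for i in range(num_points)}
    for step, (a, b) in enumerate(pairs[:k]):
        members[num_points + step] = sorted(members.pop(a) + members.pop(b))
    return sorted(members.values())


# Made-up example output for four observations and three merge steps.
pairs = [(0, 1), (2, 3), (4, 5)]
distances = [1.0, 2.0, 6.4]

# Keep only the merges whose distance does not exceed a chosen threshold.
threshold = 3.0
k = sum(1 for d in distances if d <= threshold)
flat_clusters = clusters_after(4, pairs, k)
```

Counting merges below a threshold like this assumes the merge distances do not decrease from step to step.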