Genie: A new, fast, and outlier resistant hierarchical clustering algorithm

Genie (R Package)

A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm

The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage clustering algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms, so it usually does not reflect the true underlying data structure unless the clusters are well separated. To overcome these limitations, we proposed a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not increase drastically above a given threshold.

Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of clustering quality while retaining the speed of single linkage. The algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix in order to obtain a desired clustering.
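
The linkage rule described above can be illustrated with a deliberately naive sketch. It is written in plain Python rather than R, and it is not the package's implementation: the function names `gini`, `single_link`, and `genie_sketch` are hypothetical, the threshold of 0.3 is only an illustrative value, and the rule used when the threshold is exceeded (restrict the merge to pairs that involve a smallest cluster) is our reading of the criterion, so the paper cited below remains the authoritative description. The sketch also recomputes distances by brute force, so it shows the logic only, not the speed or memory properties.

```python
from itertools import combinations
import math


def gini(sizes):
    """Normalized Gini index of cluster sizes (0 when all sizes are equal)."""
    n = len(sizes)
    if n < 2:
        return 0.0
    diff = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return diff / ((n - 1) * sum(sizes))


def single_link(a, b, points):
    """Single linkage distance: the closest pair of points across two clusters."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def genie_sketch(points, k, threshold=0.3):
    """Merge clusters until k remain, keeping the Gini index of sizes in check.

    Assumption: when the Gini index exceeds `threshold`, only pairs that
    include a cluster of the current minimum size are candidates for merging.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        sizes = [len(c) for c in clusters]
        pairs = list(combinations(range(len(clusters)), 2))
        if gini(sizes) > threshold:
            smallest = min(sizes)
            pairs = [(p, q) for p, q in pairs
                     if smallest in (sizes[p], sizes[q])]
        # Among the candidate pairs, merge the closest one (single linkage).
        p, q = min(pairs, key=lambda pq: single_link(
            clusters[pq[0]], clusters[pq[1]], points))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # safe: p < q, so index p is unaffected
    return clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(genie_sketch(points, k=2))  # [[0, 1, 2], [3, 4, 5]]
```

Setting `threshold=1.0` in this sketch never triggers the restriction, so it falls back to plain single linkage, which makes it easy to compare the two behaviours on the same data.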

A detailed description of the algorithm can be found in:

Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences, 2016, doi:10.1016/j.ins.2016.05.003.

Authors: Marek Gagolewski, Maciej Bartoszuk, and Anna Cena

Homepage: http://www.gagolewski.com/software/genie/

CRAN entry: http://cran.r-project.org/web/packages/genie/