By “agglomerative” we mean a clustering technique in which we start with each individual data point as its own “cluster” and, step by step, “agglomerate” (merge) these small clusters into larger ones.

The opposite strategy is divisive clustering, which begins with a single giant cluster containing every point and repeatedly splits it.
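To make the bottom-up idea concrete, here is a minimal, deliberately naive sketch in plain Python. The function name `agglomerate` and the choice of single linkage (the distance between two clusters is the distance between their closest pair of members) are our own illustrative assumptions; library implementations such as `scipy`'s are far more efficient and support other linkage rules.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Naive single-linkage agglomerative clustering (roughly O(n^3), illustration only)."""
    # Start with every point in its own cluster; clusters hold point indices.
    clusters = [[i] for i in range(len(points))]

    def single_link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest under the linkage rule.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i; j > i, so deleting j leaves i in place.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerate(points, 2))  # [[0, 1], [2, 3]]
```

Running it with `n_clusters=1` keeps merging until a single cluster holds every point, which is the root of the dendrogram.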



## Some real data



By now surely you’re getting tired of this iris data set?



```python
from sklearn.datasets import load_iris

# X: 150 x 4 feature matrix; y: species labels 0, 1 or 2
X, y = load_iris(return_X_y=True)
```

## Draw a dendrogram



Like so many things, drawing a dendrogram is quite easy, this time with `scipy.cluster.hierarchy` rather than `scikit-learn`.



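```python
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Build the merge tree; by default linkage uses single linkage with Euclidean distance.
linked = linkage(X)
# Draw the tree, labelling each leaf with its true species.
dendrogram(linked, labels=y)
plt.show()
```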

And like so many things, `scipy` offers many parameters which greatly modify the output, such as the linkage method and how the leaf labels are drawn.



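```python
# 'ward' merges the pair of clusters whose union least increases within-cluster variance.
linked = linkage(X, 'ward')
# leaf_rotation=0 keeps leaf labels horizontal; leaf_font_size shrinks them.
dendrogram(linked, labels=y, leaf_rotation=0, leaf_font_size=6)
plt.show()
```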
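With 150 leaves these plots are crowded. One more `dendrogram` parameter, `truncate_mode='lastp'`, collapses the tree so that only the last `p` merged clusters are drawn as leaves; `p=12` below is an arbitrary choice of ours. With `show_leaf_counts=True`, each collapsed leaf is labelled with the number of points it contains. `dendrogram` also returns a dictionary describing what it drew, whose `'leaves'` entry lists the leaf nodes.

```python
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
linked = linkage(X, 'ward')

# Draw only the last 12 merged clusters as leaves instead of all 150 samples.
R = dendrogram(linked, truncate_mode='lastp', p=12, show_leaf_counts=True)
plt.show()

print(len(R['leaves']))  # number of leaves actually drawn
```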

## Perform agglomerative clustering



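```python
from sklearn.cluster import AgglomerativeClustering

# Merge clusters bottom-up until 3 remain (scikit-learn's default linkage is 'ward').
ac = AgglomerativeClustering(n_clusters=3)
ac.fit(X)
ac.labels_  # cluster ID (0, 1 or 2) assigned to each sample
```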
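`AgglomerativeClustering` hands back a flat labelling with `n_clusters` groups. A comparable flat clustering can be read off the `scipy` tree by cutting it with `fcluster`; `criterion='maxclust'` chooses the cut that leaves at most `t` clusters, numbered from 1. Since `AgglomerativeClustering` also defaults to ward linkage, we would expect, but do not guarantee, that the two agree up to relabelling.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
linked = linkage(X, 'ward')

# Cut the ward tree so that at most 3 flat clusters remain (IDs start at 1).
flat = fcluster(linked, t=3, criterion='maxclust')
ari = adjusted_rand_score(y, flat)
print(flat)
print(ari)
```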

How did we do?



```python
from sklearn.metrics import adjusted_rand_score

# 1.0 means perfect agreement with the species labels; values near 0 mean chance level.
adjusted_rand_score(y, ac.labels_)
```
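A useful property of the adjusted Rand index is that it only looks at which points are grouped together, not at the names of the groups. That matters here because a clustering algorithm has no idea that species 0 is “setosa”. A tiny example on hand-made labels:

```python
from sklearn.metrics import adjusted_rand_score

# Same grouping with the cluster IDs renamed: still perfect agreement.
same = adjusted_rand_score([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
# A grouping that splits both true pairs scores below 0, i.e. worse than chance.
crossed = adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1])
print(same)     # 1.0
print(crossed)  # approximately -0.5
```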

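```python
from sklearn.metrics import confusion_matrix

# Rows are the true species, columns the cluster IDs. Cluster IDs are arbitrary,
# so a good clustering may put its large counts off the diagonal.
confusion_matrix(y, ac.labels_)
```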
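Because the cluster IDs are arbitrary, the raw confusion matrix is easier to read once each cluster is paired with a species. One common approach, our addition rather than part of these notes, is the Hungarian algorithm via `scipy.optimize.linear_sum_assignment` (its `maximize` argument needs SciPy 1.4 or later): it picks the species-to-cluster pairing that puts the most points on the diagonal, and the matched fraction then reads like an accuracy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

cm = confusion_matrix(y, labels)
# Choose the species -> cluster pairing that maximizes the number of matched points.
rows, cols = linear_sum_assignment(cm, maximize=True)
mapping = dict(zip(cols, rows))  # cluster ID -> species ID
aligned = np.array([mapping[c] for c in labels])

accuracy = cm[rows, cols].sum() / cm.sum()
print(cm[:, cols])  # columns reordered so matched counts sit on the diagonal
print(accuracy)
```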

Let’s **look at our data**.



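```python
import matplotlib.pyplot as plt

# Colour each point by its true species and annotate it with its cluster ID.
# Column 1 is sepal width, column 2 is petal length.
plt.scatter(X[:, 1], X[:, 2], c=y)
for i, label in enumerate(ac.labels_):
    plt.annotate(str(label), (X[i, 1], X[i, 2]))
plt.xlabel("sepal width (cm)")
plt.ylabel("petal length (cm)")
plt.show()
```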

## Is this really better than k-means?



Let’s do some k-means!



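```python
from sklearn.cluster import KMeans

# n_init=10 runs 10 random initialisations and keeps the best one;
# random_state makes the result repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
kmeans.labels_
```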

We could judge this with a metric, such as the adjusted Rand index we used above.



```python
from sklearn.metrics import adjusted_rand_score
adjusted_rand_score(y, kmeans.labels_)
```
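Agglomerative clustering also depends on its linkage criterion, so a fairer comparison scores several linkages next to k-means. This is a sketch of our own: the four linkage names are the ones scikit-learn’s `AgglomerativeClustering` accepts, `random_state=0` is an arbitrary choice, and we make no claim here about which method comes out ahead on iris.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

scores = {}
# Agglomerative clustering with each linkage criterion scikit-learn supports.
for method in ("ward", "complete", "average", "single"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(X)
    scores[f"agglomerative/{method}"] = adjusted_rand_score(y, labels)

# k-means with a fixed seed so the score is repeatable.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
scores["k-means"] = adjusted_rand_score(y, km_labels)

# Print the methods from best to worst agreement with the species labels.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:25s} {score:.3f}")
```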

But we could also just look at some pictures.



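```python
import matplotlib.pyplot as plt

# Same plot as before, now annotated with the k-means cluster IDs.
plt.scatter(X[:, 1], X[:, 2], c=y)
for i, label in enumerate(kmeans.labels_):
    plt.annotate(str(label), (X[i, 1], X[i, 2]))
plt.xlabel("sepal width (cm)")
plt.ylabel("petal length (cm)")
plt.show()
```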
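The two pictures can also be compared numerically. Computing the adjusted Rand index between the agglomerative and k-means labellings, instead of against the species, measures how consistently the two methods group the same points together. A minimal sketch that recomputes both clusterings so it runs on its own; `random_state=0` is our arbitrary choice.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
ac_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 1.0 would mean both methods produce the same partition (up to renaming IDs);
# values near 0 would mean they agree no more than chance.
agreement = adjusted_rand_score(ac_labels, km_labels)
print(agreement)
```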