# Hierarchical agglomerative clustering

<justify>Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering or HAC. Top-down clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached.</justify>

![https://python-graph-gallery.com/wp-content/uploads/401_custom_Dendrogram1.png](https://python-graph-gallery.com/wp-content/uploads/401_custom_Dendrogram1.png)
 <center>Figure 1. Dendrogram</center>

An HAC clustering is typically visualized as a dendrogram, as shown in Figure 1. Each merge is represented by a horizontal line. The y-coordinate of the horizontal line is the similarity of the two clusters that were merged, where documents are viewed as singleton clusters.

# Agglomerative Techniques


## Single-link

<justify>In single-link clustering or single-linkage clustering, the similarity of two clusters is the similarity of their most similar members (see Figure 2). This single-link merge criterion is local. We pay attention solely to the area where the two clusters come closest to each other. Other, more distant parts of the cluster and the clusters’ overall structure are not taken into account.</justify>

<img src="https://1.bp.blogspot.com/-FbHLDDIzKrw/X54d6LznF1I/AAAAAAAAGnA/JR3tK75cQ7cPYfJLjifuj7r8JnJ5HwlAwCLcBGAsYHQ/s320/single%2Blink%2BHAC.PNG" width="450">

<center>Figure 2. Single-link: maximum similarity</center>

## Complete-link clustering

<justify>In complete-link clustering or complete-linkage clustering, the similarity of two clusters is the similarity of their most dissimilar members (see Figure 3). This is equivalent to choosing the cluster pair whose merge has the smallest diameter. This complete-link merge criterion is non-local; the entire structure of the clustering can influence merge decisions. This results in a preference for compact clusters with small diameters over long, straggly clusters, but also causes sensitivity to outliers. A single document far from the center can increase the diameters of candidate merge clusters dramatically and completely change the final clustering.</justify>


<img src="https://1.bp.blogspot.com/-CdvmBikHl-4/X54eys7GmPI/AAAAAAAAGnI/7Ot7I3jVBS4IHaS0CVfJlqiJEQ5wnyB3QCLcBGAsYHQ/s320/comple%2Blink.PNG" width="450">

<center>Figure 3. Complete-link: minimum similarity</center>

<img src="https://1.bp.blogspot.com/-mionHirutig/X54mIqkdPyI/AAAAAAAAGnU/WoI4AmCRzCMe8g_CuxVzwRmlcG9Vem0NACLcBGAsYHQ/s320/Centroid%2Baverage%2Binter%2Bsimilarity.PNG" width="450">

<center>Figure 4. Centroid: average inter-similarity</center>



<img src="https://1.bp.blogspot.com/-967l7xcrFBc/X54mIjpFOfI/AAAAAAAAGnY/yWqlZFWlXYc_0kxS3fGZ3KEjS56yBJ8kgCLcBGAsYHQ/s320/Group-average%2Baverage%2Bof%2Ball%2Bsimilarities.PNG" width="430">

<center>Figure 5. Group-average: average of all similarities</center>


Figures 2 to 5 show the different notions of cluster similarity used by the four HAC algorithms. An inter-similarity is a similarity between two documents from different clusters.


<img src="https://1.bp.blogspot.com/-xpA2ViC7FD4/X54rh1ybghI/AAAAAAAAGnw/M8FjmFue12knAxekl6-W_a-3ihsDHFOGwCLcBGAsYHQ/w588-h256/A%2Bsingle-link%2B%2528left%2529%2Band%2Bcomplete-link%2B%2528right%2529.PNG" width="500">

<center>Figure 6. Single-link (left) and complete-link (right) clustering of eight documents. The ellipses correspond to successive clustering stages. Left: The single-link similarity of the two upper two-point clusters is the similarity of d2 and d3 (solid line), which is greater than the single-link similarity of the two left two-point clusters (dashed line). Right: The complete-link similarity of the two upper two-point clusters is the similarity of d1 and d4 (dashed line), which is smaller than the complete-link similarity of the two left two-point clusters (solid line).</center>


<justify>Figure 6 depicts a single-link and a complete-link clustering of eight documents. The first four steps, each producing a cluster consisting of a pair of two documents, are identical. Then single-link clustering joins the upper two pairs (and after that the lower two pairs) because, under the maximum-similarity definition of cluster similarity, those two clusters are closest. Complete-link clustering joins the left two pairs (and then the right two pairs) because those are the closest pairs according to the minimum-similarity definition of cluster similarity.</justify>
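
The contrast described for Figure 6 can be reproduced with a small sketch. The coordinates below are made up for illustration (they are not the documents from the figure), and the code works with Euclidean distances rather than similarities, so "most similar" becomes "closest". It assumes SciPy's `linkage` and `fcluster` functions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2 x 4 grid: d1..d4 on the top row, d5..d8 on the bottom row.
# Neighbours inside a pair are 1.0 apart, the two pairs of a row are 1.5 apart,
# and the two rows are 2.0 apart.
xs = [0.0, 1.0, 2.5, 3.5]
docs = np.array([[x, 2.0] for x in xs] + [[x, 0.0] for x in xs])

# Cut each hierarchy into two flat clusters and compare the groupings.
single_labels = fcluster(linkage(docs, method="single"), t=2, criterion="maxclust")
complete_labels = fcluster(linkage(docs, method="complete"), t=2, criterion="maxclust")

print("single-link:  ", single_labels)    # groups by row: d1..d4 vs d5..d8
print("complete-link:", complete_labels)  # groups by side: left four vs right four
```

With these made-up coordinates, single-link chains along each row because the closest members of the two row pairs are only 1.5 apart, while complete-link prefers the more compact left and right blocks.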

## Space and Time Complexity of Hierarchical Clustering

According to [2], the space and time complexity of the hierarchical clustering technique are as follows:

* <justify>Space complexity: The space required for hierarchical clustering is very high when the number of data points is large, because the similarity matrix has to be stored in RAM. The space complexity is on the order of the square of n.</justify>

* **Space complexity** = O(n²), where n is the number of data points.
* Time complexity: Since we have to perform n iterations and, in each iteration, scan and update the similarity matrix, the time complexity is also very high. For a straightforward implementation, it is on the order of the cube of n.

* **Time complexity** = O(n³), where n is the number of data points.
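
To get a feel for the quadratic space cost, the sketch below counts the pairwise distances that have to be stored. It assumes SciPy's `pdist`, which returns the n(n-1)/2 distances as a condensed vector, and 8-byte floating-point values; `condensed_size` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

def condensed_size(n):
    """Number of pairwise distances among n points: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Ten random 2-D points, only to check the count against pdist.
points = np.random.default_rng(0).random((10, 2))
distances = pdist(points)  # condensed pairwise-distance vector
print(len(distances), condensed_size(10))

# Rough memory estimate, assuming float64 distances (8 bytes each).
for n in (1_000, 10_000, 100_000):
    gigabytes = condensed_size(n) * 8 / 1e9
    print(f"n = {n:>7}: about {gigabytes:.2f} GB")
```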

# Limitations of Hierarchical Clustering

* Hierarchical clustering does not directly optimize a mathematical objective.
* Each approach to calculating the similarity between clusters has its own disadvantages.
* Hierarchical clustering has high space and time complexity. Hence, this clustering algorithm cannot be used on very large datasets.

# How to Implement Hierarchical Agglomerative Clustering

1. At the start, treat each data point as one cluster. Therefore, the number of clusters at the start is K, where K is an integer representing the number of data points.
1. Form a cluster by joining the two closest data points, resulting in K-1 clusters.
1. Form more clusters by joining the two closest clusters, resulting in K-2 clusters.
1. Repeat the previous step until one big cluster is formed.
1. Once a single cluster is formed, the dendrogram is used to divide the data into multiple clusters, depending on the problem. The examples below show how to draw and read a dendrogram.
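
The steps above can be sketched as a deliberately naive loop. This is only an illustration, not an efficient implementation: `naive_hac` is a hypothetical helper, and it assumes Euclidean distance with the single-link rule (distance between the closest members) for choosing which clusters to merge.

```python
import numpy as np

def naive_hac(points, n_clusters=1):
    """Naive single-link agglomerative clustering (illustrative only)."""
    points = np.asarray(points, dtype=float)
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    merges = []
    # Steps 2-4: repeatedly merge the two closest clusters.
    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# The same ten points used in Example 1 (0-based indices here).
X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 30],
              [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])
clusters, merges = naive_hac(X, n_clusters=2)
print(clusters)
print(merges[0])  # the first merge that was performed
```

Stopping at `n_clusters=2` plays the role of cutting the dendrogram; with `n_clusters=1` the loop runs until a single cluster remains.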

There are different ways to measure the distance between clusters. The point-to-point distance itself can be the Euclidean or the Manhattan distance. The following are some options for measuring the distance between two clusters:

1. Measure the distance between the closest points of two clusters.
1. Measure the distance between the farthest points of two clusters.
1. Measure the distance between the centroids of two clusters.
1. Measure the distance between all possible combinations of points between the two clusters and take the mean.
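
As a rough illustration of these four options, the sketch below computes each of them for two tiny, made-up clusters. It assumes SciPy's `cdist` with its default Euclidean metric; passing `metric="cityblock"` would use the Manhattan distance instead.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two made-up clusters of two points each.
cluster_a = np.array([[0.0, 0.0], [0.0, 2.0]])
cluster_b = np.array([[3.0, 0.0], [3.0, 2.0]])

# All point-to-point distances between the two clusters.
pairwise = cdist(cluster_a, cluster_b)

closest = pairwise.min()    # option 1: closest points
farthest = pairwise.max()   # option 2: farthest points
centroid = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))  # option 3
average = pairwise.mean()   # option 4: mean over all cross-cluster pairs

print(closest, farthest, centroid, average)
```

These correspond to the single-link, complete-link, centroid and group-average notions of cluster similarity described earlier, expressed as distances.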

# Example 1: Clustering a NumPy array of data points

In [None]:
import numpy as np

X = np.array([[5,3],
    [10,15],
    [15,12],
    [24,10],
    [30,30],
    [85,70],
    [71,80],
    [60,78],
    [70,55],
    [80,91],])

Let's plot the above data points



In [None]:
import matplotlib.pyplot as plt

labels = range(1, 11)
plt.figure(figsize=(10, 7))
plt.subplots_adjust(bottom=0.1)
plt.scatter(X[:,0],X[:,1], label='True Position')

for label, x, y in zip(labels, X[:, 0], X[:, 1]):
    plt.annotate(
        label,
        xy=(x, y), xytext=(-3, 3),
        textcoords='offset points', ha='right', va='bottom')
plt.show()





<left>Figure 7. Scatter plot of the data points</left>


In Figure 7 we can see two clusters: the first at the bottom left, consisting of points 1-5, and the second at the top right, consisting of points 6-10.

Let's draw the dendrogram for our data points.

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt

linked = linkage(X, 'single')

labelList = range(1, 11)

plt.figure(figsize=(10, 7))
dendrogram(linked,
            orientation='top',
            labels=labelList,
            distance_sort='descending',
            show_leaf_counts=True)
plt.show()



<left>Figure 8. Dendrogram of the data points (single linkage)</left>


The algorithm starts by finding the two points that are closest to each other based on Euclidean distance. Looking back at Figure 7, points 2 and 3 are closest to each other in the bottom-left group, while points 7 and 8 are closest to each other in the top-right group. Therefore, each of these pairs is merged into a cluster early on. In Figure 8, you can see that the dendrogram joins point 2 with 3, and 8 with 7. The vertical height of each join shows the Euclidean distance between the merged points. From Figure 8, it can be seen that the Euclidean distance between points 8 and 7 is greater than the distance between points 2 and 3. The next step is to join each cluster formed this way to its nearest cluster or point, which in turn results in another cluster.
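
The merge order and heights read off the dendrogram can also be inspected directly in the matrix returned by `linkage`. The following is a small sketch that rebuilds the same `X` array so it can run on its own; it additionally uses SciPy's `fcluster` to cut the tree into two flat clusters, which is an extra step not used elsewhere in this notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 30],
              [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])

Z = linkage(X, "single")

# Each row of Z is one merge: the two cluster ids joined, the distance
# between them, and the size of the new cluster. Ids 0-9 are the original
# points (labelled 1-10 in the plots); ids >= 10 are clusters formed earlier.
for left, right, dist, size in Z:
    print(f"merge {int(left):2d} + {int(right):2d}  distance = {dist:6.2f}  size = {int(size)}")

# Cut the tree into two flat clusters.
flat = fcluster(Z, t=2, criterion="maxclust")
print(flat)
```

The first row of `Z` corresponds to the lowest join in the dendrogram, and the distance column gives the height at which each join is drawn.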

## Example 2: Practice with the Anime Dataset

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering

In [None]:
anime = pd.read_csv("../input/anime-recommendations-database/anime.csv")
rating = pd.read_csv("../input/anime-recommendations-database/rating.csv")

In [None]:
anime.head()

Modeling user rating

In [None]:
rating.head()

In [None]:
cnt_pro = rating['rating'].value_counts()
plt.figure(figsize=(6,4))
sns.barplot(x=cnt_pro.index, y=cnt_pro.values, alpha=0.8)  # recent seaborn versions require x/y as keywords
plt.ylabel('Number of rating', fontsize=12)
plt.xlabel('rating', fontsize=12)
plt.xticks(rotation=80)
plt.show();

Here we compute rating statistics per user: first group the ratings by user_id, then take the max and mean rating of each user.

In [None]:
#display(rating[["user_id","rating",]].groupby(["user_id"]).agg(["max",'mean']).style.background_gradient(cmap="copper"))

In [None]:
Mean_rate = rating.groupby(['user_id']).mean().reset_index()
Mean_rate['mean_rating'] = Mean_rate['rating']
Mean_rate.drop(['anime_id','rating'],axis=1, inplace=True)


In [None]:
Mean_rate.head()

In [None]:
user = pd.merge(rating, Mean_rate, on='user_id')
user.head()


In [None]:
user = user.drop(user[user.rating < user.mean_rating].index)
user

In [None]:
user[user['user_id']==2].head(10)


In [None]:
user[user['user_id']==1].head(10)

# Merge Dataset

Hierarchical clustering has high space and time complexity, so it cannot be used on very large datasets. We therefore reduce the data: in this work we only keep the rows whose user_id is at most 10000.

In [None]:
Data = pd.merge(anime, user, on='anime_id')
Data = Data[Data.user_id <= 10000]
Data.head(10)

In [None]:
Data.info()

In [None]:
len(Data['anime_id'].unique())


In [None]:
len(Data['user_id'].unique())


Show which anime each user likes.


In [None]:
user_anime = pd.crosstab(Data['user_id'], Data['name'])
user_anime.head(10)

# Agglomerative Clustering

The `scipy.cluster.hierarchy` module provides a `dendrogram` function, which takes the value returned by the `linkage` function of the same module. The `linkage` function takes the dataset and the linkage method as parameters. We use 'ward' as the method since, at each step, it merges the pair of clusters that leads to the smallest increase in within-cluster variance.

In [None]:
import scipy.cluster.hierarchy as shc

plt.figure(figsize=(10, 7))
plt.title("User Dendrogram")
dend = shc.dendrogram(shc.linkage(user_anime, method='ward'))
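
To make the Ward criterion a bit more concrete, the sketch below measures how much the within-cluster sum of squared distances grows when two tiny, made-up clusters are merged, and compares it with the height of the corresponding merge reported by `linkage`. The helper `sse` is hypothetical, and the relation "squared Ward height equals twice the increase" is stated as an assumption about SciPy's formula that the code checks numerically for this toy case only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    return float(((points - points.mean(axis=0)) ** 2).sum())

# Two made-up clusters of two points each.
a = np.array([[0.0, 0.0], [0.0, 1.0]])
b = np.array([[5.0, 0.0], [5.0, 1.0]])

# Increase in within-cluster sum of squares caused by merging a and b.
increase = sse(np.vstack([a, b])) - sse(a) - sse(b)

# Height of the last merge in a Ward hierarchy over the same four points.
Z = linkage(np.vstack([a, b]), method="ward")
print(increase, Z[-1, 2] ** 2 / 2)
```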

Once the number of clusters has been chosen from the dendrogram (ten here), the next step is to group the data points into these ten clusters.

In [None]:
from sklearn.cluster import AgglomerativeClustering

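# Ward linkage only works with Euclidean distance (the default), so the
# `affinity` argument, which newer scikit-learn versions no longer accept, is dropped.
cluster = AgglomerativeClustering(n_clusters=10, linkage='ward')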
cluster.fit_predict(user_anime)


The returned array contains the cluster label assigned to each data point.

In [None]:
plt.figure(figsize=(10, 7))
plt.scatter(user_anime.iloc[:,0], user_anime.iloc[:,1], c=cluster.labels_, cmap='rainbow')

References

* [[1](https://nlp.stanford.edu/IR-book/pdf/17hier.pdf)] Hierarchical clustering. Draft of April 1, 2009. Cambridge University Press.
* [[2](https://towardsdatascience.com/understanding-the-concept-of-hierarchical-clustering-technique-c6e8243758ec)] Time complexity of hierarchical clustering.
* [[3](https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/)] Hierarchical clustering.

# Work in progress :)