# Resources

1. The links in this lecture
2. [Cheat sheet for implementing 7 methods for selecting the optimal number of clusters in Python](https://towardsdatascience.com/cheat-sheet-to-implementing-7-methods-for-selecting-optimal-number-of-clusters-in-python-898241e1d6ad)

# [What is Hierarchical Clustering (HC)?][1]

Hierarchical Clustering is an unsupervised learning algorithm that groups similar points of unlabeled data into a hierarchy of nested clusters. Unlike K-Means, it does not require the number of clusters K to be chosen in advance, and its output is a dendrogram: a tree that shows the order in which clusters are merged (or split).


# [Types of HC][2]

1. Agglomerative: bottom-up approach. Start with many small clusters (one per data point) and merge them to create bigger clusters.
2. Divisive: top-down approach. Start with a single cluster, then break it up into smaller clusters.

## [Linkage (distance) concept][3]

<img style="float:center" src="./images/Linkages.png" alt="drawing" height="200" width="300"/>

1. Single Linkage – the distance between two clusters is defined as the shortest distance between two points, one from each cluster.
<img style="float:center" src="./images/minLink.png" alt="drawing" height="100" width="200"/>



2. Complete Linkage – the distance between two clusters is defined as the longest distance between two points, one from each cluster.
<img style="float:center" src="./images/maxLink.png" alt="drawing" height="100" width="200"/>



3. Average Linkage – the distance between two clusters is defined as the average of the distances between every point in one cluster and every point in the other cluster.
<img style="float:center" src="./images/avgLink.png" alt="drawing" height="200" width="200"/>

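4. Ward’s Method: at each step, merge the pair of clusters whose union causes the smallest increase in the total within-cluster sum of squared distances to the cluster centroids (i.e. the smallest increase in within-cluster variance). Unlike the three linkages above, which use distances between individual points, Ward’s method works with squared distances to cluster centroids.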
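To make the four criteria concrete, here is a minimal sketch that scores one hypothetical pair of 2-D clusters `A` and `B` (made-up points, Euclidean distance assumed) under each definition. Single, complete and average linkage are the min, max and mean of the pairwise distance table; the Ward value is the increase in within-cluster sum of squares caused by the merge, which also equals the closed form $\frac{|A||B|}{|A|+|B|}\lVert c_A - c_B \rVert^2$ with $c_A, c_B$ the centroids.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two small hypothetical clusters of 2-D points
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])

# Euclidean distance for every pair (one point from A, one from B)
D = cdist(A, B)  # shape (len(A), len(B))

single = D.min()     # shortest pairwise distance
complete = D.max()   # longest pairwise distance
average = D.mean()   # mean over all len(A) * len(B) pairs

def sse(points):
    """Sum of squared distances from each point to its cluster centroid."""
    return ((points - points.mean(axis=0)) ** 2).sum()

# Ward: increase in total within-cluster SSE if A and B are merged
ward_increase = sse(np.vstack([A, B])) - sse(A) - sse(B)

# Same quantity via the centroid-based closed form
ward_closed_form = len(A) * len(B) / (len(A) + len(B)) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)

print(single, complete, average, ward_increase, ward_closed_form)
```

For these points the script prints `3.0 6.0 4.5 20.25 20.25`, showing how the same pair of clusters can look close under single linkage and far apart under complete linkage.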


[1]:https://codingwithalex.com/hierarchical-clustering/
[2]:https://towardsdatascience.com/machine-learning-algorithms-part-12-hierarchical-agglomerative-clustering-example-in-python-1e18e0075019
[3]:https://towardsdatascience.com/understanding-the-concept-of-hierarchical-clustering-technique-c6e8243758ec

# Some pros and cons of Hierarchical Clustering
## Pros
1. No assumption of a particular number of clusters (unlike k-means)
2. May correspond to meaningful taxonomies

## Cons
1. Once a decision is made to combine two clusters, it can’t be undone
2. Too slow for large data sets: $O(n^2 \log n)$


# Divisive Hierarchical Clustering


<img style="float:center" src="./images/AG1.png" alt="drawing" height="100" width="200"/>

<img style="float:center" src="./images/AG2.png" alt="drawing" height="100" width="200"/>


# [Agglomerative Hierarchical Clustering Algorithm][1]

1. Start with N separate clusters, one for each data point (N represents the number of points in your data).
2. Compute the proximity (distance) matrix between all pairs of clusters.
3. Merge the two clusters that are closest to each other according to a linkage criterion (e.g. single linkage).
4. Recompute the distances between the new cluster and the remaining clusters.
5. Repeat steps 3 and 4 until only one cluster of size N remains. The output is a dendrogram (tree hierarchy of clusters).
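The five steps above can be sketched directly in Python. This is a deliberately naive teaching version (roughly $O(n^3)$, recomputing every cluster distance on each pass); the function name `naive_agglomerative` and the `LINKAGES` table are hypothetical helpers, not part of SciPy or scikit-learn.

```python
import numpy as np
from scipy.spatial.distance import cdist

# How each linkage summarises the table of point-to-point distances
LINKAGES = {"single": np.min, "complete": np.max, "average": np.mean}

def naive_agglomerative(X, linkage="single"):
    """Return merges as (cluster_a, cluster_b, distance) tuples, in merge order.

    Clusters are frozensets of 0-based point indices. Slow teaching sketch only.
    """
    combine = LINKAGES[linkage]
    # Step 1: every point starts as its own cluster
    clusters = [frozenset([i]) for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        # Steps 2 and 4: (re)compute the linkage distance for every pair of clusters
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine(cdist(X[list(clusters[i])], X[list(clusters[j])]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Step 3: merge the closest pair and record the merge height
        merges.append((clusters[i], clusters[j], float(d)))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # Step 5: the loop ends when one cluster holds all N points
    return merges

# Usage on four 1-D points, stored as an (n, 1) array because cdist expects 2-D input
X = np.array([[0.0], [1.0], [5.0], [6.5]])
for a, b, d in naive_agglomerative(X):
    print(sorted(a), "+", sorted(b), "at distance", d)
```

The recorded merge heights are exactly what a dendrogram plots on its vertical axis. In practice, use `scipy.cluster.hierarchy.linkage` or scikit-learn's `AgglomerativeClustering`, which implement the same idea far more efficiently.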


1. Step 1:
<img style="float:center" src="./images/AHC.png" alt="drawing" height="100" width="200"/>

2. Final HC:
<img style="float:center" src="./images/AHCR.png" alt="drawing" height="100" width="200"/>

[1]:https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/





# [Example][1]

1. Step 1: Data table:
<img style="float:center" src="./images/Matrix.png" alt="drawing" height="100" width="200"/>

2. Step 2: Proximity Matrix:
<img style="float:center" src="./images/proximity matrix.png" alt="drawing" height="100" width="200"/>

3. Step 3: Merge the closest pair, points 1 and 2:
<img style="float:center" src="./images/merge_2.png" alt="drawing" height="100" width="200"/>

4. Step 4: Proximity matrix after merging points 1 and 2:
<img style="float:center" src="./images/merge_2_max.png" alt="drawing" height="100" width="200"/>

5. Final HC:
<img style="float:center" src="./images/FinalHC.png" alt="drawing" height="100" width="200"/>

[1]:https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/


# [Let us practice][1]

[1]:https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/

In [None]:
import numpy as np

X = np.array([[5,3],
    [10,15],
    [15,12],
    [24,10],
    [30,30],
    [85,70],
    [71,80],
    [60,78],
    [70,55],
    [80,91],])

In [None]:
import matplotlib.pyplot as plt

labels = range(1, 11)
plt.figure(figsize=(10, 7))
plt.subplots_adjust(bottom=0.1)
plt.scatter(X[:,0],X[:,1], label='True Position')

for label, x, y in zip(labels, X[:, 0], X[:, 1]):
    plt.annotate(
        label,
        xy=(x, y), xytext=(-3, 3),
        textcoords='offset points', ha='right', va='bottom')
plt.show()

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt

linked = linkage(X, 'single')

labelList = range(1, 11)

plt.figure(figsize=(10, 7))
dendrogram(linked,
            orientation='top',
            labels=labelList,
            distance_sort='descending',
            show_leaf_counts=True)
plt.show()
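The dendrogram above is drawn from the linkage matrix `linked`. As a short sketch of how to read that matrix: per the SciPy documentation, each of its n − 1 rows records one merge as `[index_a, index_b, distance, number_of_points]`, where indices below n are original points and index n + k is the cluster created in row k. Note that the indices are 0-based, while the plot above labels the points 1 to 10.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The same ten points used above
X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 30],
              [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])

Z = linkage(X, 'single')
n = len(X)
for step, (a, b, dist, count) in enumerate(Z):
    # Indices < n are original (0-based) points; index n + step is the cluster made here
    print(f"step {step}: merge {int(a)} and {int(b)} at distance {dist:.2f} "
          f"-> cluster {n + step} with {int(count)} points")
```

With single linkage the distance column never decreases, and the last row always contains all n points.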

In [None]:
help(dendrogram)

# How to select K using dendrogram

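## Cut the tree at different heights (e.g. to get 2, 3, or 4 clusters)

Draw a horizontal line across the dendrogram: the number of vertical branches it crosses is the number of clusters K. A common heuristic is to place the cut where the vertical gap between consecutive merges is largest.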
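The cut can also be done in code. This sketch uses SciPy's `fcluster` on the single-linkage tree of the ten example points: `criterion='maxclust'` asks for at most K flat clusters, while `criterion='distance'` cuts at a fixed height `t`. The height of 30 is an arbitrary value picked for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The same ten points used above
X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 30],
              [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])
Z = linkage(X, 'single')

# Option 1: request at most K clusters; SciPy chooses the cut height
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion='maxclust')  # cluster labels start at 1
    print(f"K={k}:", labels)

# Option 2: cut at a fixed height; points joined below height t share a cluster
labels_at_30 = fcluster(Z, t=30, criterion='distance')
print("cut at height 30:", labels_at_30)
```

For K=2 the cut separates points 1 to 5 from points 6 to 10 (0-based indices 0 to 4 and 5 to 9), matching the two visible groups in the scatter plot.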

In [None]:
# Using scikit-learn
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
import numpy as np

In [None]:
X = np.array([[5,3],
    [10,15],
    [15,12],
    [24,10],
    [30,30],
    [85,70],
    [71,80],
    [60,78],
    [70,55],
    [80,91],])

In [None]:
from sklearn.cluster import AgglomerativeClustering

# `affinity` was renamed to `metric` in scikit-learn 1.2 and removed in 1.4
cluster = AgglomerativeClustering(n_clusters=2, metric='euclidean', linkage='ward')
cluster.fit_predict(X)

In [None]:
print(cluster.labels_)

In [None]:
plt.scatter(X[:,0],X[:,1], c=cluster.labels_, cmap='rainbow')
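scikit-learn and SciPy number their clusters differently (scikit-learn from 0, SciPy's `fcluster` from 1), so raw labels from the two libraries rarely match even when the grouping is identical. This sketch fits Ward linkage in both libraries on the ten example points and compares the resulting partitions rather than the label values; `as_partition` is a hypothetical helper written for this comparison.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, fcluster

# The same ten points used above
X = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 30],
              [85, 70], [71, 80], [60, 78], [70, 55], [80, 91]])

# scikit-learn: fit and cut in one step (labels start at 0)
sk_labels = AgglomerativeClustering(n_clusters=2, linkage='ward').fit_predict(X)
# SciPy: build the full tree, then cut it into 2 flat clusters (labels start at 1)
sp_labels = fcluster(linkage(X, 'ward'), t=2, criterion='maxclust')

def as_partition(labels):
    """Return the clustering as a set of groups of point indices, ignoring label numbering."""
    labels = np.asarray(labels)
    return {frozenset(np.flatnonzero(labels == lab).tolist()) for lab in np.unique(labels)}

print(as_partition(sk_labels) == as_partition(sp_labels))
```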

In [None]:
# Using the Iris data:
import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display

In [None]:
from sklearn import datasets

# import some data to play with
iris = datasets.load_iris()
feat = iris.feature_names
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target
y_name = ['Setosa', 'Versicolour', 'Virginica']

In [None]:
from sklearn.cluster import AgglomerativeClustering
clustering = AgglomerativeClustering(linkage="ward", n_clusters=3)
clustering.fit(X)

In [None]:
# MinMax scale the data so that it fits nicely onto the 0.0->1.0 axes of the plot.
from sklearn import preprocessing
X_plot = preprocessing.MinMaxScaler().fit_transform(X)

colours = 'rbg'
for i in range(X.shape[0]):
    plt.text(X_plot[i, 0], X_plot[i, 1], str(clustering.labels_[i]),
             color=colours[y[i]],
             fontdict={'weight': 'bold', 'size': 9}
        )

plt.xticks([])
plt.yticks([])
plt.axis('off')
plt.show()

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage

linkage_matrix = linkage(X, 'ward')
figure = plt.figure(figsize=(7.5, 5))
dendrogram(
    linkage_matrix,
    color_threshold=0,
)
plt.title('Hierarchical Clustering Dendrogram (Ward)')
plt.xlabel('sample index')
plt.ylabel('distance')
plt.tight_layout()
plt.show()