Q1: What is Hierarchical Clustering and How is it Different from Other Clustering Techniques?
Hierarchical clustering is a method that builds a hierarchy of clusters by either merging smaller clusters into larger ones (agglomerative) or splitting larger clusters into smaller ones (divisive). Unlike K-Means, hierarchical clustering does not require specifying the number of clusters (K) beforehand and produces a tree-like structure (dendrogram) that shows relationships between clusters.

📌 Key Differences from Other Clustering Techniques:

Feature	Hierarchical Clustering	K-Means Clustering	DBSCAN
Number of Clusters	Not predefined	Predefined (K)	Not required
Cluster Shape	Depends on linkage (single linkage follows elongated shapes; Ward's favors compact ones)	Mostly spherical	Arbitrary
Scalability	Slow for large data	Fast for large data	Slower than K-Means
Handles Noise	Limited	Poorly	Yes
Output Structure	Dendrogram (Tree)	Fixed K clusters	Flat clusters plus noise points
Q2: What are the Two Main Types of Hierarchical Clustering Algorithms?
There are two main types of hierarchical clustering:

Agglomerative Hierarchical Clustering (Bottom-Up Approach)

Starts with each data point as its own cluster.
Merges the closest clusters iteratively until only one cluster remains.
The more commonly used approach because it is simple to implement.
🔹 Example: Merging customers based on purchase behavior.

Divisive Hierarchical Clustering (Top-Down Approach)

Starts with one large cluster containing all data points.
Splits clusters recursively until each point is its own cluster.
Less common due to higher computational complexity.
🔹 Example: Splitting species based on genetic characteristics.
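
The bottom-up procedure can be sketched in a few lines of plain Python. This is a minimal, deliberately naive illustration that assumes single linkage and Euclidean distance; the function name `agglomerative` and the sample points are made up for this example, and real projects would normally use a library implementation instead.

```python
from itertools import combinations
from math import dist

def agglomerative(points):
    """Naive single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of point indices.
    """
    def gap(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(dist(points[i], points[j]) for i in a for j in b)

    clusters = [(i,) for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        merges.append((a, b, gap(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
for a, b, d in agglomerative(points):
    print(a, "+", b, "at distance", d)
```

The loop runs until one cluster remains, so the returned list always has one entry fewer than the number of points. Those merge distances are exactly the heights drawn in a dendrogram.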

Q3: How to Determine the Distance Between Two Clusters in Hierarchical Clustering?
Cluster distance is calculated using linkage methods, which define how clusters are merged:

Linkage Type	Definition
Single Linkage	Distance between the closest points of two clusters. (Leads to elongated clusters)
Complete Linkage	Distance between the farthest points of two clusters. (Encourages compact clusters)
Average Linkage	Mean of all pairwise distances between points in the two clusters. (Balanced approach)
Centroid Linkage	Distance between the centroids (means) of two clusters.
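Ward’s Method	Merges the pair of clusters that gives the smallest increase in total within-cluster variance. (Assumes Euclidean distance; tends to produce compact, similarly sized clusters)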
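
To make the table concrete, here is a small sketch of each linkage rule as a plain Python function, assuming Euclidean distance between points. The function names and the two sample clusters are illustrative only. For Ward's method, the function returns the increase in within-cluster sum of squares that a merge would cause, which is the quantity Ward's method minimizes at each step.

```python
from math import dist
from statistics import mean

def single_linkage(a, b):
    return min(dist(p, q) for p in a for q in b)   # closest pair

def complete_linkage(a, b):
    return max(dist(p, q) for p in a for q in b)   # farthest pair

def average_linkage(a, b):
    return mean(dist(p, q) for p in a for q in b)  # mean over all pairs

def centroid(cluster):
    return tuple(mean(coords) for coords in zip(*cluster))

def centroid_linkage(a, b):
    return dist(centroid(a), centroid(b))

def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2

cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
for rule in (single_linkage, complete_linkage, average_linkage,
             centroid_linkage, ward_increase):
    print(rule.__name__, round(rule(cluster_a, cluster_b), 3))
```

On the same two clusters the rules give noticeably different values, which is why the choice of linkage changes the shape of the resulting dendrogram.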
📌 Common Distance Metrics Used:

Euclidean Distance (Default) → $d(A,B) = \sqrt{\sum_i (A_i - B_i)^2}$
Manhattan Distance → $d(A,B) = \sum_i |A_i - B_i|$
Cosine Distance → 1 minus the cosine similarity; depends only on the angle between vectors, not their magnitudes
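
The three metrics are short enough to write out directly. The sketch below uses only the standard library; the function names and sample vectors are chosen for illustration. Because clustering needs a distance rather than a similarity, the cosine version returns 1 minus the cosine similarity.

```python
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)  # undefined for all-zero vectors

a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(euclidean(a, b))        # 5.0
print(manhattan(a, b))        # 7.0
print(cosine_distance(a, b))
```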
Q4: How to Determine the Optimal Number of Clusters in Hierarchical Clustering?
Since hierarchical clustering does not require K beforehand, you determine the optimal number after constructing the dendrogram.

🔹 Common Methods to Find Optimal Clusters:

Dendrogram Cutting:

Find the largest vertical gap in the dendrogram, i.e. the tallest height range in which no merge occurs.
Draw a horizontal line through that gap; the number of branches it crosses is the number of clusters.
Silhouette Score:

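Measures how well each point fits its assigned cluster compared with the nearest other cluster (range -1 to 1).
Compute the average score for several candidate cuts; higher values indicate better clustering quality.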
Elbow Method on Linkage Distances:

Plot the merge (linkage) distance vs. number of clusters.
Find the "elbow" where the distance stops decreasing significantly as more clusters are allowed.
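
The dendrogram-cutting rule can be automated once the merge heights are known. The sketch below assumes you already have the list of merge distances (one per merge) and that they are monotone, which holds for single, complete, average, and Ward linkage but not always for centroid linkage. The function name and the sample heights are hypothetical.

```python
def clusters_from_largest_gap(merge_heights):
    """Suggest a cluster count from the largest jump between merges.

    merge_heights: the n - 1 merge distances of a dendrogram over n points.
    Returns (number_of_clusters, cut_height).
    """
    heights = sorted(merge_heights)
    n_points = len(heights) + 1
    gaps = [hi - lo for lo, hi in zip(heights, heights[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)  # gap after merge i
    cut_height = (heights[i] + heights[i + 1]) / 2   # middle of the gap
    # Merges 0..i happen below the cut; each one reduces the count by one.
    n_clusters = n_points - (i + 1)
    return n_clusters, cut_height

# Hypothetical merge heights for a six-point dendrogram.
heights = [1.0, 2.0, 2.5, 3.0, 9.0]
print(clusters_from_largest_gap(heights))  # (2, 6.0)
```

This is a heuristic, not a guarantee: it is worth checking the suggested cut against the silhouette score or domain knowledge before settling on it.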
Q5: What are Dendrograms and How Are They Useful?
A dendrogram is a tree-like diagram that visualizes hierarchical clustering results.

📌 Key Insights from Dendrograms:

Shows relationships between clusters.
Helps identify the best number of clusters.
Reveals outliers (long branches that don't merge early).
🔹 Example: Customer Segmentation

A dendrogram groups customers by purchasing behavior.
Cutting at an appropriate level provides optimal customer segments.
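
In practice the dendrogram is usually built with a library. The sketch below assumes NumPy and SciPy are installed and uses `scipy.cluster.hierarchy`; the customer data is invented for illustration, and real features on different scales should normally be standardized first.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical customers: (orders per year, average basket value).
X = np.array(
    [[2, 20], [3, 22], [2, 25], [20, 200], [22, 210], [21, 190]],
    dtype=float,
)

Z = linkage(X, method="ward")                    # (n - 1) x 4 merge table
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
tree = dendrogram(Z, no_plot=True)               # layout only, nothing is drawn

print(labels)        # one cluster id per customer
print(tree["ivl"])   # leaf order along the dendrogram's x-axis
```

Dropping `no_plot=True` draws the tree with matplotlib, if it is available. Each row of `Z` records the two clusters merged, the merge height, and the size of the new cluster.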
Q6: Can Hierarchical Clustering Be Used for Both Numerical and Categorical Data?
✅ Yes, but the distance metrics must be adapted:

Data Type	Common Distance Metric
Numerical Data	Euclidean, Manhattan, Cosine Distance
Categorical Data	Jaccard Distance (1 - Jaccard similarity), Hamming Distance
Mixed Data	Gower’s Distance (Handles categorical + numerical)
📌 Example:

Numerical (Age, Income): Use Euclidean Distance.
Categorical (Gender, Product Category): One-hot encode, then use Jaccard or Hamming Distance.
Mixed Data (Age + Product Category): Use Gower’s Distance.
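
For mixed data, Gower's distance averages a per-feature dissimilarity: a range-normalized absolute difference for numerical features and a simple match/mismatch for categorical ones. The sketch below is a simplified, unweighted version with no missing-value handling; the function name, record layout, and the assumed age range are illustrative.

```python
def gower_distance(x, y, numeric_ranges):
    """Unweighted Gower distance between two records stored as dicts.

    numeric_ranges maps each numerical feature to its (max - min) over the
    dataset; every other feature is treated as categorical.
    """
    total = 0.0
    for feature in x:
        if feature in numeric_ranges:
            # Range-normalized absolute difference, in [0, 1].
            total += abs(x[feature] - y[feature]) / numeric_ranges[feature]
        else:
            # Simple matching: 0 if equal, 1 otherwise.
            total += 0.0 if x[feature] == y[feature] else 1.0
    return total / len(x)

# Hypothetical records; ages in the dataset are assumed to span 18 to 78.
a = {"age": 30, "category": "books"}
b = {"age": 45, "category": "games"}
c = {"age": 45, "category": "books"}
ranges = {"age": 60}

print(gower_distance(a, b, ranges))  # 0.625
print(gower_distance(a, c, ranges))  # 0.125
```

The resulting pairwise distances can be fed to any linkage that accepts a precomputed distance matrix, such as single, complete, or average linkage.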
Q7: How to Use Hierarchical Clustering to Detect Outliers?
Hierarchical clustering helps detect outliers by identifying data points that merge late in the dendrogram.

🔹 Steps to Detect Outliers:

Construct a dendrogram using hierarchical clustering.
Identify points that merge at extreme distances.
Points that remain isolated for long are likely outliers.
📌 Example: Fraud Detection

In credit card transactions, outliers might be isolated purchases that merge only at high distances.
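
The steps above can be sketched without building the full tree. Under single linkage, the height at which a point first merges equals the distance to its nearest neighbor, so points with an unusually large nearest-neighbor distance are the ones that "merge late". The function names, the threshold rule (a multiple of the median), and the sample transactions below are assumptions made for illustration.

```python
from math import dist
from statistics import median

def first_merge_heights(points):
    """Height at which each point first merges under single linkage
    (equal to the distance to its nearest neighbor)."""
    return [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def late_mergers(points, factor=3.0):
    """Indices of points whose first merge height exceeds factor * median."""
    heights = first_merge_heights(points)
    threshold = factor * median(heights)
    return [i for i, h in enumerate(heights) if h > threshold]

# Hypothetical transactions: (amount, hour of day), assumed already scaled.
transactions = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 7.5)]
print(late_mergers(transactions))  # [4]
```

The `factor` value is a tuning knob rather than a standard constant; a small group of outliers that sit close to each other would merge early with one another and would need a check on cluster-level merge heights instead.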