# Customer Segmentation using Hierarchical Clustering
##### Businesses often struggle to understand their diverse customer base and to tailor marketing strategies effectively. Traditional demographic grouping (like age or gender alone) may not reveal the true behavioral patterns in customer spending.
##### This project aims to use unsupervised machine learning, specifically Hierarchical Clustering, to automatically group customers based on their Annual Income and Spending Score.
##### These clusters will help identify meaningful customer segments, such as "Luxury Shoppers", "Budget-Conscious Customers", or "Moderate Spenders", without any prior labels or supervision.

### 🎯 Project Objective
##### To apply Hierarchical Clustering to mall customer data and visualize the natural groupings of customers based on income and spending habits.
##### The insights from these clusters can assist businesses in:
##### - Designing targeted marketing campaigns
##### - Improving customer satisfaction
##### - Enhancing product recommendations and promotional strategies

### 📁 Dataset
##### - File: Mall_Customers.csv
##### - Source: Kaggle – Mall Customer Segmentation Data
##### - Columns:
#####     - CustomerID
#####     - Gender
#####     - Age
#####     - Annual Income (k$)
#####     - Spending Score (1-100)

### 🪴 Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.preprocessing import StandardScaler

### 📂 Load the Dataset

In [None]:
df = pd.read_csv("./data/Mall_Customers.csv")
df.head()

### 📘 Explore the Data

In [None]:
print("Dataset shape:", df.shape)
print("\nMissing values:\n", df.isnull().sum())
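##### As an optional sanity check, the sketch below verifies that the columns listed in the Dataset section are present, that CustomerID has no duplicates, and that Spending Score stays within its documented 1-100 range. `check_mall_data` is a hypothetical helper, and the three-row DataFrame is made-up sample data standing in for Mall_Customers.csv.

```python
import pandas as pd

EXPECTED_COLUMNS = ["CustomerID", "Gender", "Age",
                    "Annual Income (k$)", "Spending Score (1-100)"]


def check_mall_data(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found (empty if all checks pass).

    Hypothetical helper: the checks mirror the Dataset section's column list,
    not an official schema."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns
    if df["CustomerID"].duplicated().any():
        problems.append("duplicate CustomerID values")
    if not df["Spending Score (1-100)"].between(1, 100).all():
        problems.append("Spending Score outside 1-100")
    if df[EXPECTED_COLUMNS].isnull().any().any():
        problems.append("missing values present")
    return problems


# Made-up rows for illustration only (not taken from the real dataset)
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 52],
    "Annual Income (k$)": [15, 60, 120],
    "Spending Score (1-100)": [39, 150, 20],
})
print(check_mall_data(sample))  # ['Spending Score outside 1-100']
```

##### On the real data, `check_mall_data(df)` returning an empty list means none of these checks failed.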

### ⚙️ Select Relevant Features
##### We'll use Annual Income (k$) and Spending Score (1-100) for clustering, because together they capture how much a customer earns and how much they spend.

In [None]:
X = df[['Annual Income (k$)', 'Spending Score (1-100)']].values

### 🧮 Standardize the Data
##### Standardizing each feature to zero mean and unit variance keeps the feature with the larger numeric spread from dominating the Euclidean distances that Ward linkage relies on.

In [None]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
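##### For reference, `StandardScaler` computes the z-score z = (x - mean) / std for each column, using the population standard deviation (`ddof=0`). The sketch below checks this on four made-up customers and shows the effect on distances: in raw units the pair that differs by 55 k$ of income looks farther apart than the pair that differs by 42 spending points, but after scaling the order flips, because spending has the smaller spread in this toy data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up (income in k$, spending score) pairs, for illustration only
X_demo = np.array([[15.0, 39.0],
                   [16.0, 81.0],
                   [70.0, 40.0],
                   [137.0, 18.0]])

# Manual z-score with the population standard deviation (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=0)
scaled = StandardScaler().fit_transform(X_demo)
print(np.allclose(manual, scaled))  # True


def dist(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two customers."""
    return float(np.linalg.norm(a - b))


# Customers 0 and 1 differ mostly in spending; 0 and 2 mostly in income
print("raw:   ", dist(X_demo[0], X_demo[1]), dist(X_demo[0], X_demo[2]))
print("scaled:", dist(scaled[0], scaled[1]), dist(scaled[0], scaled[2]))
```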

### 🌳 Create the Dendrogram
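##### We use Ward's method, which at each step merges the two clusters whose union least increases the total within-cluster variance, to compute the linkage matrix. The resulting dendrogram helps us choose a reasonable number of clusters by eye.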

In [None]:
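# Ward linkage on the standardized features
Z = linkage(X_scaled, method='ward')

plt.figure(figsize=(10, 6))
dendrogram(Z, no_labels=True)  # one label per customer would overlap, so hide them
plt.title("Dendrogram for Mall Customers")
plt.xlabel("Customers")
plt.ylabel("Ward linkage distance")
plt.show()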
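##### To make the merge rule concrete: Ward's method joins the two clusters whose union increases the total within-cluster sum of squares the least. For clusters A and B with sizes n_A, n_B and centroids c_A, c_B, that increase is n_A·n_B / (n_A + n_B) · ||c_A - c_B||², and SciPy's merge height works out to sqrt(2 × increase), which equals the plain Euclidean distance when two single customers merge. The brute-force sketch below (`naive_ward_heights` and `ward_height` are hypothetical helpers, far slower than SciPy) reproduces the heights from `linkage(..., method='ward')` on random points.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def ward_height(a: np.ndarray, b: np.ndarray) -> float:
    """Merge height for clusters a and b (rows are points), computed as
    sqrt(2 * increase in within-cluster sum of squares)."""
    na, nb = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    increase = na * nb / (na + nb) * float(diff @ diff)
    return float(np.sqrt(2.0 * increase))


def naive_ward_heights(X: np.ndarray) -> np.ndarray:
    """Brute-force Ward agglomeration; returns merge heights in merge order."""
    clusters = [X[i:i + 1] for i in range(len(X))]  # start with singletons
    heights = []
    while len(clusters) > 1:
        # Find the pair whose merge increases the sum of squares the least
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                h = ward_height(clusters[i], clusters[j])
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        heights.append(h)
    return np.array(heights)


rng = np.random.default_rng(0)
points = rng.normal(size=(8, 2))
print(np.allclose(naive_ward_heights(points), linkage(points, method="ward")[:, 2]))
```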

### ✂️ Fit Hierarchical Clustering and Assign Clusters
##### After inspecting the dendrogram, we cut the tree into 5 clusters. With `criterion='maxclust'`, `fcluster` picks the lowest cut height that yields no more than `t` flat clusters.

In [None]:
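# Rebuild the Ward linkage on the standardized features
Z = linkage(X_scaled, method='ward')
# Cut the tree into at most 5 flat clusters; fcluster labels start at 1
clusters = fcluster(Z, t=5, criterion='maxclust')

df['Cluster'] = clusters
df.head()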
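##### The choice of 5 comes from reading the dendrogram by eye. One common rule of thumb, sketched below as an assumption rather than the notebook's method, is to cut inside the largest jump between consecutive merge heights and pass that height to `fcluster(..., criterion='distance')`. `suggest_k_by_gap` is a hypothetical helper, demonstrated on three made-up, well-separated blobs.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def suggest_k_by_gap(Z: np.ndarray, max_k: int = 10):
    """Largest-gap heuristic: find the biggest jump between consecutive
    merge heights (considering only 2..max_k clusters) and cut inside it.
    Returns (k, threshold)."""
    heights = Z[:, 2]          # non-decreasing for Ward linkage
    n = len(heights) + 1       # number of original observations
    best = None
    for i in range(len(heights) - 1):
        k = n - (i + 1)        # clusters remaining after merges 0..i
        gap = heights[i + 1] - heights[i]
        if 2 <= k <= max_k and (best is None or gap > best[0]):
            best = (gap, k, (heights[i] + heights[i + 1]) / 2)
    return best[1], best[2]


# Made-up data: three well-separated blobs of 20 points each
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
pts = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z_demo = linkage(pts, method="ward")
k, threshold = suggest_k_by_gap(Z_demo)
labels = fcluster(Z_demo, t=threshold, criterion="distance")
print(k, len(np.unique(labels)))
```

##### On the notebook data, `suggest_k_by_gap(Z)` can be compared with the 5 clusters chosen above; the two need not agree, since the heuristic looks only at the single largest gap.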

### 🎨 Visualize Final Clusters
##### We plot the two clustering features again in their original units, colored by cluster label.

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(
    X[:, 0], X[:, 1],
    c=df['Cluster'], cmap='rainbow', s=100, alpha=0.8
)
plt.title("Customer Segments (Hierarchical Clustering)")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score (1-100)")
plt.show()
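##### The scatter plot is a visual check. A numeric complement is the mean silhouette coefficient, which measures how much closer each point is to its own cluster than to the nearest other cluster (range -1 to 1, higher is better). The sketch below uses scikit-learn's `silhouette_score` to compare several `maxclust` cuts of one Ward tree; `silhouette_by_k` is a hypothetical helper and the blobs are made-up data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score


def silhouette_by_k(X: np.ndarray, ks=range(2, 9)) -> dict:
    """Mean silhouette score for several maxclust cuts of one Ward tree."""
    Z = linkage(X, method="ward")
    scores = {}
    for k in ks:
        labels = fcluster(Z, t=k, criterion="maxclust")
        # silhouette_score needs at least 2 distinct labels
        if len(np.unique(labels)) >= 2:
            scores[k] = float(silhouette_score(X, labels))
    return scores


# Made-up data: three well-separated blobs
rng = np.random.default_rng(7)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
blobs = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = silhouette_by_k(blobs)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

##### With the notebook's data, `silhouette_by_k(X_scaled)` gives scores to weigh against the 5-cluster choice; the silhouette is one signal among several, not a definitive answer.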

### 📊 Analyze the Clusters
##### We compare the mean annual income and spending score of each cluster.

In [None]:
summary = df.groupby('Cluster')[['Annual Income (k$)', 'Spending Score (1-100)']].mean()
summary
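##### To turn the averages into readable segment descriptions, one option is to tag each cluster by whether its mean income and spending sit above or below the overall mean, and to report cluster sizes alongside. This rule and the `profile_clusters` helper are illustrative assumptions; mapping a tag such as "higher income, higher spending" to a business name like "Luxury Shoppers" remains the analyst's judgment. The demo uses made-up rows.

```python
import pandas as pd

INCOME = "Annual Income (k$)"
SCORE = "Spending Score (1-100)"


def profile_clusters(df: pd.DataFrame, cluster_col: str = "Cluster") -> pd.DataFrame:
    """Per-cluster means and sizes, plus a rough descriptive tag.

    The tag compares each cluster mean with the overall mean of that column;
    this rule is an illustrative assumption, not a standard method."""
    prof = df.groupby(cluster_col)[[INCOME, SCORE]].mean()
    prof["Size"] = df[cluster_col].value_counts().sort_index()
    inc = (prof[INCOME] > df[INCOME].mean()).map(
        {True: "higher income", False: "lower income"})
    spend = (prof[SCORE] > df[SCORE].mean()).map(
        {True: "higher spending", False: "lower spending"})
    prof["Tag"] = inc + ", " + spend
    return prof


# Made-up rows for illustration only
demo = pd.DataFrame({
    INCOME: [20, 25, 120, 130, 60],
    SCORE: [80, 75, 85, 90, 50],
    "Cluster": [1, 1, 2, 2, 3],
})
print(profile_clusters(demo))
```

##### In the notebook, `profile_clusters(df)` works on the `Cluster` column created earlier.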