# 🧠 Customer Segmentation Tool

## 🔍 Project Overview

This project is part of the **CodeClause Internship (Golden Level)**. The goal is to develop a **Customer Segmentation Tool** using clustering algorithms. The project also includes designing a **UI for data input**, applying **machine learning (K-Means)** to segment customers based on their behavior, and evaluating the results.

### 📅 Duration

- **Start Date**: 01 June 2025  
- **End Date**: 30 June 2025  
- **Assigned to**: Samira Yousefzadeh

## 📚 Table of Contents
1. [Introduction](#introduction)
2. [Importing Libraries](#importing-libraries)
3. [Loading Dataset](#loading-dataset)
4. [Exploratory Data Analysis](#exploratory-data-analysis)
5. [Data Cleaning](#data-cleaning)
6. [Feature Engineering](#feature-engineering)
7. [Train-Test Split](#train-test-split)
8. [Modeling with K-Means](#modeling-with-k-means)
9. [Evaluation and Insights](#evaluation-and-insights)
10. [Model Saving](#model-saving)
11. [Conclusion](#conclusion)

## 1. 🧠 Introduction
Customer segmentation is a process of dividing customers into groups based on common characteristics. In this project, we’ll use clustering (specifically K-Means) to group similar customers together for better targeting and marketing.

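To make the idea concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the standard iteration behind K-Means. It repeats two steps until the assignments stop changing:

1. Assign each point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.

The function name `kmeans_lloyd` and the toy points are illustrative only. This is not scikit-learn's implementation, which adds k-means++ initialization and multiple restarts.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative helper): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random (a copy, not a view)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy "customers": (annual income, spending score) forming two obvious groups
X = np.array([[15, 80], [16, 78], [17, 82],
              [70, 20], [72, 18], [75, 22]], dtype=float)
centroids, labels = kmeans_lloyd(X, k=2)
print(labels)
print(centroids)
```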
## 2. 📦 Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
import warnings
warnings.filterwarnings('ignore')

## 3. 📂 Loading Dataset

In [None]:
df = pd.read_csv('Mall_Customers.csv')
df.head()

## 4. 📊 Exploratory Data Analysis

In [None]:
df.info()  # prints the summary itself; it returns None, so wrapping it in print() adds a stray "None"
df.describe()

In [None]:
sns.pairplot(df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])
plt.show()

In [None]:
plt.figure(figsize=(8,5))
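# numeric_only=True skips the text 'Gender' column; pandas >= 2.0 raises an error without it
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')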
plt.title('Correlation Matrix')
plt.show()

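Correlation is only defined for numeric columns, and `Gender` is still text at this stage. The sketch below shows two common ways to restrict `DataFrame.corr` to numeric data:

- pass `numeric_only=True`, or
- select the numeric columns first with `select_dtypes`.

The frame uses the dataset's column names, but its values are made up for illustration. Dropping `CustomerID` in the second option is an optional choice, since it is only a row identifier.

```python
import pandas as pd

# Made-up rows that reuse the Mall_Customers.csv column names
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [25, 40, 33, 52],
    "Annual Income (k$)": [20, 55, 60, 90],
    "Spending Score (1-100)": [70, 45, 50, 15],
})

# Option 1: let pandas skip non-numeric columns (Gender)
corr_a = sample.corr(numeric_only=True)

# Option 2: select numeric columns explicitly and drop the row identifier
numeric = sample.select_dtypes(include="number").drop(columns="CustomerID")
corr_b = numeric.corr()

print(corr_b.round(2))
```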
## 5. 🧹 Data Cleaning

In [None]:
df.isnull().sum()  # Check for missing values

In [None]:
# Drop CustomerID for modeling
df_model = df.drop('CustomerID', axis=1)

## 6. ⚙️ Feature Engineering

In [None]:
# Encode Gender
df_model['Gender'] = df_model['Gender'].map({'Male': 0, 'Female': 1})

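`Series.map` with a dictionary turns any value that is not a key into `NaN`. A stray spelling such as `' female'` or `'MALE'` would therefore silently produce missing values, which `StandardScaler` and `KMeans` later reject.

The guard sketched below normalizes the text first and fails loudly on anything unexpected. It uses a hypothetical helper, `encode_gender`, and the cleanup rules (strip whitespace, title-case) are assumptions about what dirty input might look like.

```python
import pandas as pd

GENDER_CODES = {"Male": 0, "Female": 1}

def encode_gender(s: pd.Series) -> pd.Series:
    """Hypothetical helper: map gender labels to 0/1, failing on unknown values."""
    cleaned = s.astype(str).str.strip().str.title()  # ' female' -> 'Female', 'MALE' -> 'Male'
    encoded = cleaned.map(GENDER_CODES)
    unknown = cleaned[encoded.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unexpected Gender values: {list(unknown)}")
    return encoded.astype(int)

raw = pd.Series(["Male", "Female", " female", "MALE"])
print(raw.map(GENDER_CODES).tolist())  # plain map: the two unmatched spellings become NaN
print(encode_gender(raw).tolist())
```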
In [None]:
# Scaling
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_model)

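`StandardScaler` rescales each column to zero mean and unit variance, `z = (x - mean) / std`, using the population standard deviation (`ddof=0`). This matters for K-Means because it relies on Euclidean distance: without scaling, a column with a larger numeric range would dominate the distances.

The check below reproduces the scaler by hand with NumPy on made-up numbers. The scaler is named `demo_scaler` so it does not overwrite the notebook's `scaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up columns: Age, Annual Income (k$), Spending Score (1-100)
X = np.array([[25, 20, 70],
              [40, 55, 45],
              [33, 60, 50],
              [52, 90, 15]], dtype=float)

demo_scaler = StandardScaler()
X_sklearn = demo_scaler.fit_transform(X)

# Manual standardization with the population std (ddof=0), as StandardScaler uses
X_manual = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

print(np.allclose(X_sklearn, X_manual))  # True
print(X_sklearn.mean(axis=0).round(6), X_sklearn.std(axis=0).round(6))
```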
## 7. 🧪 Train-Test Split

A train-test split is not typical for clustering. K-Means is unsupervised, so all of the data is used to fit the model. A held-out split is optional and mainly useful for visualization or for checking how stable the clusters are.

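Although K-Means needs no train-test split, a held-out split can still help gauge how stable the segments are. The sketch below works as follows:

1. Fit K-Means on a training split and, separately, on all rows.
2. Assign the held-out rows with each model's `predict`.
3. Compare the two assignments with the adjusted Rand index, which ignores how cluster IDs happen to be numbered.

Values near 1 suggest stable segments. The synthetic `make_blobs` data stands in for the scaled customer features. The split size, seeds, and number of blobs are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the four scaled customer features
X, _ = make_blobs(n_samples=200, centers=5, n_features=4, random_state=0)

X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

full_model = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
train_model = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X_train)

# Label the held-out rows with both models and compare the two partitions
ari = adjusted_rand_score(full_model.predict(X_test), train_model.predict(X_test))
print(f"Adjusted Rand index on held-out rows: {ari:.3f}")
```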
## 8. 🧩 Modeling with K-Means

In [None]:
inertia = []
silhouette = []
k_range = range(2, 11)

for k in k_range:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)  # explicit n_init: the default changed in scikit-learn 1.4
    kmeans.fit(df_scaled)
    inertia.append(kmeans.inertia_)
    silhouette.append(silhouette_score(df_scaled, kmeans.labels_))

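`inertia_` is the within-cluster sum of squared distances: for every point, the squared distance to the centroid of the cluster it is assigned to, summed over all points. It generally keeps falling as k grows, which is why it is read for an elbow rather than simply minimized.

The sketch below recomputes it by hand on synthetic data to confirm the definition.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=1)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Squared distance from each point to the centroid of its assigned cluster, summed
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()

print(km.inertia_, manual_inertia)
```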
In [None]:
plt.plot(k_range, inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.grid(True)
plt.show()

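Reading the elbow by eye is subjective. One common heuristic, similar in spirit to the Kneedle method:

1. Normalize both axes.
2. Draw the straight line joining the first and last points of the curve.
3. Pick the k whose point lies farthest from that line.

It is sketched below with a hypothetical helper, `find_elbow`. Treat its answer as a suggestion to cross-check against the silhouette plot. The inertia values in the usage lines are made up.

```python
import numpy as np

def find_elbow(ks, inertias):
    """Hypothetical helper: k whose point is farthest from the chord
    joining the first and last points of the normalized elbow curve."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(inertias, dtype=float)
    # Normalize both axes to [0, 1] so neither scale dominates
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    # Unit vector along the chord from the first point to the last point
    start = np.array([x[0], y[0]])
    chord = np.array([x[-1], y[-1]]) - start
    chord /= np.linalg.norm(chord)
    # Perpendicular distance of every point from the chord (2-D cross product)
    pts = np.column_stack([x, y]) - start
    dist = np.abs(pts[:, 0] * chord[1] - pts[:, 1] * chord[0])
    return list(ks)[int(dist.argmax())]

# Made-up inertia values with a clear bend at k = 4
ks = range(2, 11)
inertias = [900, 600, 300, 270, 245, 225, 210, 198, 188]
print(find_elbow(ks, inertias))  # 4
```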
In [None]:
plt.plot(k_range, silhouette, marker='s')
plt.title('Silhouette Score')
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.grid(True)
plt.show()

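For each point, the silhouette is `s = (b - a) / max(a, b)`:

- `a` is the mean distance to the other points in its own cluster.
- `b` is the mean distance to the points of the nearest other cluster.

`silhouette_score` averages `s` over all points, so values near 1 indicate tight, well-separated clusters. The NumPy sketch below implements that definition with a hypothetical helper, `silhouette_manual`, and checks it against scikit-learn on synthetic data. It assumes every cluster has at least two points; scikit-learn defines `s = 0` for singleton clusters.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def silhouette_manual(X, labels):
    """Hypothetical helper: mean silhouette, assuming no singleton clusters."""
    # Full pairwise Euclidean distance matrix
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        # a: mean distance to the other members of i's cluster (D[i, i] = 0 is excluded by the -1)
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

X, y = make_blobs(n_samples=60, centers=3, random_state=7)
print(silhouette_manual(X, y), silhouette_score(X, y))
```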
In [None]:
# Final KMeans model with k = 5 clusters
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
kmeans.fit(df_scaled)
df['Cluster'] = kmeans.labels_

In [None]:
sns.scatterplot(data=df, x='Annual Income (k$)', y='Spending Score (1-100)', hue='Cluster', palette='tab10')
plt.title('Customer Segments')
plt.show()

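The scatter plot above shows only two of the four features the model was trained on. `PCA`, imported in section 2 but otherwise unused, can project all scaled features onto two components for a fuller picture.

The sketch below uses synthetic `make_blobs` data as a stand-in. In the notebook you would pass `df_scaled` and `kmeans.labels_` instead.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four scaled customer features
X, _ = make_blobs(n_samples=200, centers=5, n_features=4, random_state=0)
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_scaled)

# Project all features onto the first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap="tab10", s=20)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title(f"Clusters in PCA space ({explained:.0%} of variance shown)")
plt.show()
```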
## 9. 🧾 Evaluation and Insights

In [None]:
# Map the cluster centers from scaled units back to the original feature units
cluster_centers = scaler.inverse_transform(kmeans.cluster_centers_)
pd.DataFrame(cluster_centers, columns=df_model.columns)

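The scaler is a linear transform, and each K-Means center is essentially the mean of its members in scaled space. The inverse-transformed centers should therefore closely match the per-cluster means computed directly in the original units.

A `groupby` profile, sketched below, also reports cluster sizes, which the centers alone do not show. With Gender encoded as Male=0 and Female=1, its mean is the share of female customers in each cluster. The column names mirror the dataset, but the rows and cluster labels are invented.

```python
import pandas as pd

# Made-up rows: Gender already encoded (Male=0, Female=1) plus a Cluster label
data = pd.DataFrame({
    "Gender": [0, 1, 1, 0, 1, 1],
    "Age": [25, 31, 28, 52, 47, 60],
    "Annual Income (k$)": [80, 85, 78, 30, 35, 28],
    "Spending Score (1-100)": [85, 90, 80, 20, 25, 15],
    "Cluster": [0, 0, 0, 1, 1, 1],
})

# Mean of every feature per cluster, in original units
profile = data.groupby("Cluster").mean()
# The Gender mean is the share of customers coded 1 (Female) in each cluster
profile = profile.rename(columns={"Gender": "Share Female"})
profile["Size"] = data["Cluster"].value_counts().sort_index()
print(profile.round(2))
```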
## 10. 💾 Model Saving

In [None]:
import joblib
joblib.dump(kmeans, 'kmeans_model.pkl')
joblib.dump(scaler, 'scaler.pkl')

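To reuse the saved artifacts, new customers must pass through the same scaler before `predict`. One way to avoid keeping two files in sync is to wrap both steps in a scikit-learn `Pipeline` and save a single object, which then applies the scaling automatically. This is an alternative to the two-file approach above.

Assumptions in the sketch below:

- The training data is synthetic.
- The file name `segmentation_pipeline.pkl` is hypothetical and is written to a temporary directory.
- The column order (Gender, Age, income, spending score) matches `df_model`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training rows in the assumed column order:
# Gender (0/1), Age, Annual Income (k$), Spending Score (1-100)
rng = np.random.default_rng(0)
X_train = np.column_stack([
    rng.integers(0, 2, 200),
    rng.integers(18, 70, 200),
    rng.integers(15, 140, 200),
    rng.integers(1, 101, 200),
]).astype(float)

# Scaler and model travel together, so predict() always rescales first
pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=5, n_init=10, random_state=42))
pipeline.fit(X_train)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segmentation_pipeline.pkl")  # hypothetical file name
    joblib.dump(pipeline, path)
    loaded = joblib.load(path)

new_customers = np.array([[1, 30, 80, 75], [0, 55, 40, 20]], dtype=float)
print(loaded.predict(new_customers))
```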
## 11. ✅ Conclusion

In [None]:
print("Successfully built a customer segmentation model using KMeans. Saved model and scaler.")