# <span style=" text-align:center; color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ Customer Segmentation Using K-Means Clustering</span>


<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxWHHYqf0Klt1-euLEnH5EGkPHlCMbcIhsPkSGlD-4vK7t29CpkjJXm09E23lCZK8Kj8IM-xsEps8L9CeE2iCU-bqFV9SUDC9crtPiujr121oYykVokczfCoW-9fUZJqjggK4Wx5Od__gl2TBxkZYli1swJVbIgB7MmHSCBtpQxpsdODD881ZpS5sMKBw/s792/customer_segmenation.png" alt="Customer Segmentation" style="width: 100%; max-width: 800px; height: auto; display: block; margin: 0 auto; border: solid 2px #007BFF; box-shadow: 5px 5px 10px rgba(0, 0, 0, 0.2); border-radius: 20px;">


# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ Table of Contents</span>

<div style="border: solid 2px #007BFF; background-color:aliceblue; padding:15px;margin:0;font-family:Georgia; margin:20px;">  
    
* [1. Introduction](#1)    
* [2. Data Loading and Exploration](#2)    
* [3. Data Preprocessing](#3)
* [4. Applying K-Means Clustering](#4)  
* [5. Analyzing Segments](#5)
* [6. Conclusion](#6)

</div>

<a id="1"></a>
# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ 1. Introduction</span>

<p style="font-size: 18px; line-height: 1.5; font-family: Arial, sans-serif; color: #333; font-family:Georgia; border: solid #007BFF; padding: 15px;">
    In retail, understanding what customers like and how they shop is important. A common way to build that understanding is to group customers with similar habits. K-means clustering is one popular method for this: it partitions customers into k groups so that the customers within each group are similar on the chosen features.
    <br><br>
    In this project, we use K-means clustering to group a retail store's customers by two features: annual income and spending score (a 1-100 rating in the dataset). We hope the resulting groups tell us more about the different types of customers and how they behave when shopping.
    <br><br>
    Our goal is to answer a few important questions:
    <br><br>
    <strong>1. What kinds of customer groups do we have?</strong>
    <br>
    <strong>2. How do these groups differ in terms of how much they earn and spend?</strong>
    <br>
    <strong>3. What can we learn from these groups to help the store improve its business?</strong>
</p>
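
To make the idea concrete, here is a minimal sketch of the basic K-means loop (Lloyd's algorithm) in NumPy. It is an illustration only, not the scikit-learn implementation used later in this notebook: the function name `kmeans_sketch`, the random initialization, and the six toy customers are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then move centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (simpler than k-means++).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: (annual income in k$, spending score) for six shoppers.
toy = np.array([[15, 80], [18, 75], [16, 82], [90, 15], [95, 20], [88, 12]], dtype=float)
labels, centroids = kmeans_sketch(toy, k=2)
print(labels)
print(centroids)
```

The first three shoppers (low income, high score) should end up in one group and the last three (high income, low score) in the other. Which group is numbered 0 depends on the random start.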


In [None]:
import warnings

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')


<a id="2"></a>

# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ 2. Data Loading and Exploration</span>


In [None]:
df = pd.read_csv('/kaggle/input/customer-segmentation-tutorial-in-python/Mall_Customers.csv')

In [None]:
df.head(5)

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.isnull().sum()

In [None]:
# CustomerID is only a row identifier and carries no information for clustering
df = df.drop('CustomerID', axis=1)

In [None]:
# Select features for clustering
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

In [None]:
X

In [None]:
import seaborn as sns

# Scatter plot of the two clustering features before any scaling
plt.figure(figsize=(10, 6))
sns.scatterplot(data=X, x="Annual Income (k$)", y="Spending Score (1-100)")
plt.show()

<a id="3"></a>

# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ 3. Data Preprocessing</span>

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Rescale both features to the [0, 1] range so that neither dominates the distance calculation
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
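
As a quick illustration of what the scaling step does, the sketch below applies the min-max formula by hand and compares the result with `MinMaxScaler`. The three rows in `X_demo` are made-up values standing in for the two features, not rows from the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values standing in for (annual income, spending score).
X_demo = np.array([[15.0, 39.0], [60.0, 50.0], [137.0, 99.0]])

# Min-max scaling by hand: (x - column_min) / (column_max - column_min).
col_min = X_demo.min(axis=0)
col_max = X_demo.max(axis=0)
manual = (X_demo - col_min) / (col_max - col_min)

# The same transformation via scikit-learn.
scaled = MinMaxScaler().fit_transform(X_demo)

print(manual)
print(np.allclose(manual, scaled))
```

After scaling, the smallest value in each column maps to 0 and the largest to 1, so income and spending score contribute on a comparable scale when K-means measures distances.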

In [None]:
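# Choosing the number of clusters (Elbow Method):
# fit K-means for k = 1..10 and record the within-cluster sum of squares (WCSS)
wcss = []
for i in range(1, 11):
    # n_init is set explicitly because its default differs between scikit-learn versions
    kmeans = KMeans(n_clusters=i, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)
print(wcss)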

In [None]:
# Plotting the Elbow Method graph with styling
plt.figure(figsize=(10, 6)) 
plt.plot(range(1, 11), wcss, marker='o', linestyle='--', color='b')  
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('Within-Cluster Sum of Squares (WCSS)')
plt.xticks(range(1, 11))  
plt.grid(True) 
plt.show()
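
For readers wondering what the WCSS values on the y-axis measure, the sketch below recomputes the quantity by hand and compares it with scikit-learn's `inertia_` attribute. It uses synthetic points from `make_blobs` as a stand-in, so the numbers are not those of the mall dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (not the mall dataset): 200 points around 5 centers.
X_demo, _ = make_blobs(n_samples=200, centers=5, cluster_std=0.6, random_state=42)

km = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42).fit(X_demo)

# WCSS: squared distance from each point to the centroid of its own cluster, summed.
assigned_centroids = km.cluster_centers_[km.labels_]
wcss_manual = ((X_demo - assigned_centroids) ** 2).sum()

print(wcss_manual, km.inertia_)
```

WCSS can only shrink as more clusters are added, which is why the elbow plot looks for the point where the decrease levels off rather than for the minimum.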


<a id="4"></a>

# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ 4. Applying K-Means Clustering</span>

In [None]:
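# Train the final K-means model with 5 clusters
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
kmeans.fit(X_scaled)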
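
Once a model is fitted, its labels can be attached to the customers, and its centroids, which live in the scaled [0, 1] space, can be mapped back to the original units. The sketch below shows one way to do this. The `demo` frame is randomly generated stand-in data and the names `demo`, `model`, and `centers` are assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the two features (not the real mall data).
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'Annual Income (k$)': rng.uniform(15, 137, size=200),
    'Spending Score (1-100)': rng.uniform(1, 99, size=200),
})

scaler = MinMaxScaler()
demo_scaled = scaler.fit_transform(demo.to_numpy())
model = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42).fit(demo_scaled)

# One label per customer, in the same row order as the input.
demo['Cluster'] = model.labels_

# Centroids are in scaled space; map them back to k$ and score units.
centers = pd.DataFrame(
    scaler.inverse_transform(model.cluster_centers_),
    columns=['Annual Income (k$)', 'Spending Score (1-100)'],
)
print(centers.round(1))
print(demo['Cluster'].value_counts().sort_index())
```

Reporting centroids in original units makes them easier to read alongside a plot whose axes show income in k$ and the raw spending score.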

<a id="5"></a>

# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ 5. Analyzing Segments</span>

In [None]:
plt.figure(figsize=(10, 8))
for cluster_label in range(5):  # Loop through each cluster label
    cluster_points = X[kmeans.labels_ == cluster_label]
    # Centroid in original units: the mean position of the points assigned to this cluster
    centroid = cluster_points.mean(axis=0)
    plt.scatter(cluster_points['Annual Income (k$)'], cluster_points['Spending Score (1-100)'],
                s=50, label=f'Cluster {cluster_label + 1}')  # Plot points for the current cluster
    # Index the centroid by column name, and label it once so the legend has a single centroid entry
    plt.scatter(centroid['Annual Income (k$)'], centroid['Spending Score (1-100)'], s=300, c='black',
                marker='*', label='Centroids' if cluster_label == 0 else None)
plt.title('Clusters of Customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
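
A per-cluster summary table can complement the plot when describing segments. The sketch below is one possible way to build it with `groupby`. It runs on randomly generated stand-in data, so the figures it prints say nothing about the real customers, and the column names `customers`, `mean_income`, and `mean_spending` are chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (not the real mall dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Annual Income (k$)': rng.uniform(15, 137, size=200),
    'Spending Score (1-100)': rng.uniform(1, 99, size=200),
})

scaled = MinMaxScaler().fit_transform(demo.to_numpy())
# Add 1 so cluster numbers start at 1, matching the plot legend.
demo['Cluster'] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(scaled) + 1

# Size, mean income, and mean spending score of each segment.
summary = demo.groupby('Cluster').agg(
    customers=('Annual Income (k$)', 'size'),
    mean_income=('Annual Income (k$)', 'mean'),
    mean_spending=('Spending Score (1-100)', 'mean'),
).round(1)
print(summary)
```

Reading the means next to the segment sizes helps check whether a description such as "high income, low spending" fits the numbers, and how many customers each segment covers.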


<a id="6"></a>

# <span style="color: #007BFF; font-size: 24px; font-weight: bold; border: 2px solid #007BFF; padding: 10px; background-color: #D0C5D8; display: block; width: 100%;">➡️ 6. Conclusion</span>

<p style="font-size: 18px; line-height: 1.5; font-family: Arial, sans-serif; color: #333; font-family:Georgia; border: solid #007BFF; padding: 15px;">
    The analysis shows that the store's customers can be grouped into five clusters, or segments, for targeted marketing.
    <br><br>
    <strong style="color: blue;">Cluster 1 (Blue):</strong> These are low-income customers with high spending scores. One possible reason they spend more at the store despite earning less is that they enjoy and are satisfied with its services.
    <br><br>
    <strong style="color: orange;">Cluster 2 (Orange):</strong> These customers have a higher income but do not spend much at the store. One possible reason is that they are not satisfied with its services. They are an ideal group for the marketing team to target because they have the potential to bring in more profit.
    <br><br>
    <strong style="color: green;">Cluster 3 (Green):</strong> These are high-income earners with high spending scores. They bring in profit. Discounts and other offers targeted at this group could increase their spending further and raise the store's profit.
    <br><br>
    <strong style="color: red;">Cluster 4 (Red):</strong> These are average-income earners with average spending scores. They are cautious with their spending at the store.
    <br><br>
    <strong style="color: purple;">Cluster 5 (Purple):</strong> These are low-income earners with low spending scores. A likely reason is that people with low income tend to purchase fewer items at the store.
</p>
