# K-Means Clustering

# Importing the libraries

In [None]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

In [None]:
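# Load the customer data (the CSV file must be in the working directory)
dataset = pd.read_csv('Mall_Customers.csv')
# Columns 3 and 4 hold Annual Income (k$) and Spending Score (1-100)
X = dataset.iloc[:, [3, 4]].values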

# Main Objective
The main objective of this analysis is to apply K-Means clustering to the Mall Customers dataset and identify distinct customer segments based on annual income and spending score. The resulting segments are intended to support targeted marketing and personalized customer engagement.


# Data Set Description
The dataset consists of customer information from a mall, including Annual Income (k$) and Spending Score (1-100). The analysis aims to uncover natural groupings or clusters within the data, enabling businesses to tailor their approaches to different customer segments based on their spending behavior and financial indicators.


# Data Exploration and Preprocessing
No explicit preprocessing is performed in the provided code: the two selected columns are passed to K-Means as raw, unscaled values.
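
The checks that would usually precede clustering can be sketched as follows. This is a minimal illustration, not part of the original notebook: the `demo` DataFrame is a hypothetical stand-in generated at random so that the snippet runs without `Mall_Customers.csv`, and only the two column names come from the dataset description.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Mall_Customers.csv so the sketch runs on its own;
# only the two column names are taken from the dataset description.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Annual Income (k$)': rng.integers(15, 140, size=50),
    'Spending Score (1-100)': rng.integers(1, 101, size=50),
})

# Count missing values per column; scikit-learn's KMeans rejects NaN inputs.
missing_per_column = demo.isnull().sum()

# Summary statistics show whether the two features span comparable ranges.
summary = demo.describe()

print(missing_per_column)
print(summary.loc[['min', 'max', 'mean', 'std']])
```

If the two features turn out to have similar ranges, as the column names suggest here, clustering on raw values is a defensible choice; with very different scales, standardization would be worth considering.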

# Elbow Method for Optimal Cluster Selection


In [None]:
from sklearn.cluster import KMeans

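# Fit K-Means for k = 1..10 and record the within-cluster sum of squares (WCSS)
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is scikit-learn's name for WCSS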


# Plotting the Elbow Method

In [None]:
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
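
The WCSS values on the y-axis are what scikit-learn exposes as `inertia_`: the sum of squared Euclidean distances from each sample to the centroid of its assigned cluster. The sketch below recomputes that quantity with NumPy and compares it with `inertia_`. The `points` array is synthetic (an assumption for illustration, not the mall data), and `n_init=10` is set explicitly only to keep the behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points (an assumption for illustration, not the mall data).
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=(20, 20), scale=3, size=(30, 2)),
    rng.normal(loc=(60, 70), scale=3, size=(30, 2)),
    rng.normal(loc=(90, 20), scale=3, size=(30, 2)),
])

model = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42).fit(points)

# Look up the centroid assigned to each point, then sum the squared distances.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_wcss = np.sum((points - assigned_centroids) ** 2)

print(manual_wcss, model.inertia_)
```

The two printed numbers should agree up to floating-point error, which is why a flattening WCSS curve means that extra clusters no longer tighten the groups by much.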

# Training K-Means Model


In [None]:
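# Train the final model with the number of clusters chosen from the elbow plot
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
# fit_predict returns the cluster index (0 to 4) assigned to each customer
y_kmeans = kmeans.fit_predict(X)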

# Visualizing Clusters


In [None]:
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s=100, c='green', label='Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s=100, c='cyan', label='Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s=100, c='magenta', label='Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()

# Recommended Model

Based on the Elbow Method, a K-Means model with 5 clusters is recommended. This value is taken as the elbow of the WCSS curve, the point beyond which adding clusters reduces the within-cluster sum of squares only marginally. The resulting model provides a meaningful segmentation of customers by annual income and spending behavior.
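
Reading an elbow from a plot is partly subjective, so a second criterion can help corroborate the choice. One option, not used in the original notebook, is the silhouette score from `sklearn.metrics`, which ranges from -1 to 1 and is higher when clusters are compact and well separated. The sketch below assumes a synthetic array `X_demo` in place of the real feature matrix `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X (an assumption): five loose 2-D groups.
rng = np.random.default_rng(0)
centers = [(25, 20), (25, 80), (55, 50), (90, 20), (90, 80)]
X_demo = np.vstack([rng.normal(loc=c, scale=5, size=(40, 2)) for c in centers])

# The silhouette score needs at least 2 clusters, so k starts at 2.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, init='k-means++', n_init=10,
                    random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# The k with the highest average silhouette is the candidate to compare
# against the elbow read from the WCSS plot.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On the real data, replacing `X_demo` with `X` would show whether the silhouette criterion agrees with the elbow at 5 clusters.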


# Key Findings and Insights
The analysis identified five customer segments that differ in annual income and spending score. Each cluster represents a distinct profile that can be addressed with its own marketing strategy. For instance, Cluster 1 (red) may represent high spenders with high income, while Cluster 3 (green) may include customers with moderate income and moderate spending. Because K-Means assigns cluster indices arbitrarily, these interpretations should be confirmed against the scatter plot or the cluster centroids.
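
One way to back such interpretations with numbers is to summarize each cluster by its mean income, mean spending score, and size. The sketch below is an illustration under stated assumptions: the `features` DataFrame is randomly generated rather than loaded from the mall data, so the printed profile says nothing about the actual segments.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the two features (an assumption, not the mall data).
rng = np.random.default_rng(1)
features = pd.DataFrame({
    'Annual Income (k$)': rng.uniform(15, 140, size=200),
    'Spending Score (1-100)': rng.uniform(1, 100, size=200),
})

model = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
features['cluster'] = model.fit_predict(features.to_numpy())

# Mean income and mean spending score of each cluster.
profile = features.groupby('cluster').agg(
    mean_income=('Annual Income (k$)', 'mean'),
    mean_score=('Spending Score (1-100)', 'mean'),
)
# Number of customers in each cluster.
profile['customers'] = features.groupby('cluster').size()

print(profile.round(1))
```

Applied to the real data, a table like this makes it possible to label each cluster (for example "high income, low spending") from its measured averages instead of from its color in the plot.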


# Suggestions for Next Steps

To enhance the model, future steps may involve incorporating additional features, such as customer preferences or demographics. Regularly updating the dataset and retraining the model will keep the segments relevant over time. Exploring other clustering algorithms is also worthwhile. Because clustering is unsupervised, there is no accuracy to measure against labels; alternatives can instead be compared with internal metrics such as the silhouette score, or by checking how stable the segments remain on newly collected data.
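
When features with different units are added, K-Means tends to be dominated by whichever feature has the largest numeric range, so standardizing first is a common precaution. The sketch below shows one possible setup using `StandardScaler` in a scikit-learn pipeline. The data is synthetic and the `Age` column is a hypothetical extra feature chosen for illustration. Keeping `n_clusters=5` is also an assumption: with new features the number of clusters would need to be selected again.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; the 'Age' column is a hypothetical extra feature.
rng = np.random.default_rng(2)
customers = pd.DataFrame({
    'Age': rng.integers(18, 71, size=150),
    'Annual Income (k$)': rng.uniform(15, 140, size=150),
    'Spending Score (1-100)': rng.uniform(1, 100, size=150),
})

# Standardize each feature to zero mean and unit variance before clustering,
# so that no single feature dominates the Euclidean distances.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42),
)
segments = pipeline.fit_predict(customers)

# Number of customers assigned to each segment.
print(np.bincount(segments))
```

Wrapping the scaler and the model in one pipeline means that retraining on updated data reapplies the same preprocessing automatically.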