<a href="https://colab.research.google.com/github/linhb03/Ai118Project/blob/dev/Mall_Customers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project uses the "Mall Customer Segmentation" dataset, which is publicly available on Kaggle. The dataset contains anonymized records of mall customers: a customer ID, gender, age, annual income (in k$), and a calculated spending score on a 1-100 scale.
Source: https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python
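
The notebook below loads the file through the Colab upload widget. As a minimal sketch of the same loading and column-name cleanup outside Colab, the example here reads the CSV with pandas directly. The helper name `load_customers` is hypothetical, and the raw header spelling in `SAMPLE_CSV` is an assumption inferred from the cleaned column names printed by the notebook; the five data rows are the ones shown in the notebook output.

```python
import io

import pandas as pd

# First rows as shown in the notebook output. The raw header capitalization
# is an assumption inferred from the cleaned column names.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""


def load_customers(source):
    """Read a customer CSV and normalize column names to lower snake_case.

    `source` can be a file path or any file-like object accepted by pandas.
    """
    df = pd.read_csv(source)
    # Same cleanup as the notebook: trim, lowercase, spaces -> underscores.
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    return df


sample_df = load_customers(io.StringIO(SAMPLE_CSV))
print(sample_df.columns.tolist())
```

With a local copy of the file, `load_customers("Mall_Customers.csv")` would replace the in-memory sample, assuming the download keeps that file name.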



In [3]:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from google.colab import files

print("Please upload your customer data CSV file.")
uploaded = files.upload()
file_name = next(iter(uploaded))
df = pd.read_csv(file_name)

# Normalize column names: trim whitespace, lowercase, replace spaces with underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

print("\nData uploaded and cleaned successfully:")
print("New column names:", df.columns.tolist())
print(df.head())
print("\n---")

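# Numeric features used for clustering (customerid and gender are left out).
features_to_cluster = ['age', 'annual_income_(k$)', 'spending_score_(1-100)']
X = df[features_to_cluster]

# Standardize to zero mean and unit variance so no feature dominates the distance metric.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)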

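# Elbow method: fit K-Means for K = 1..7 and record the inertia
# (within-cluster sum of squared distances) for each K.
inertia = []
K = range(1, 8)
for k in K:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init='auto')
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)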

plt.figure(figsize=(8, 5))
plt.plot(K, inertia, 'bo-')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal K')
# The figure is written to disk and closed, so it is not displayed inline.
plt.savefig('elbow_plot.png')
plt.close()

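# K is fixed at 3 here; adjust it after inspecting elbow_plot.png.
optimal_k = 3
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init='auto')
# Fit the final model and attach each customer's cluster label to the DataFrame.
df['cluster'] = kmeans.fit_predict(X_scaled)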

print("\nClustering complete:")
print(df.head())
print("\n---")

print("Cluster Analysis:")
print(df.groupby('cluster')[features_to_cluster].mean())
print("\n---")

# 2-D view of the clusters. The model was fit on three features,
# so this plot shows only the income/spending projection.
if len(features_to_cluster) >= 2:
    plt.figure(figsize=(10, 6))

    feature1_name = 'annual_income_(k$)'
    feature2_name = 'spending_score_(1-100)'

    # Color each point by its assigned cluster label.
    plt.scatter(df[feature1_name], df[feature2_name], c=df['cluster'], cmap='viridis')
    plt.xlabel(feature1_name)
    plt.ylabel(feature2_name)
    plt.title('Customer Clusters')
    plt.savefig('customer_clusters.png')
    plt.close()
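
The cell above reads K off an elbow plot and then sets it by hand. As a complementary check, one could compare mean silhouette scores across candidate K values. The following is only a sketch and not part of the original notebook: `silhouette_by_k` is a hypothetical helper, and the data is a synthetic stand-in generated with `make_blobs`, not the mall dataset. It also uses `n_init=10` rather than `'auto'` so that it runs on older scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(X_scaled, k_values, random_state=42):
    """Return {k: mean silhouette score} for each candidate k (k must be >= 2)."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        labels = model.fit_predict(X_scaled)
        # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
        scores[k] = silhouette_score(X_scaled, labels)
    return scores


# Synthetic stand-in data (NOT the mall dataset): 200 points, 3 features.
X_demo, _ = make_blobs(
    n_samples=200, n_features=3, centers=4, cluster_std=0.6, random_state=0
)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

scores = silhouette_by_k(X_demo_scaled, range(2, 8))
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k by silhouette:", best_k)
```

In the notebook itself, the same helper could be called as `silhouette_by_k(X_scaled, range(2, 8))`. The silhouette score is undefined for a single cluster, which is why the range starts at 2 rather than 1.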


Please upload your customer data CSV file.


Saving Mall_Customers.csv to Mall_Customers (2).csv

Data uploaded and cleaned successfully:
New column names: ['customerid', 'gender', 'age', 'annual_income_(k$)', 'spending_score_(1-100)']
   customerid  gender  age  annual_income_(k$)  spending_score_(1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40

---

Clustering complete:
   customerid  gender  age  annual_income_(k$)  spending_score_(1-100)  \
0           1    Male   19                  15                      39   
1           2    Male   21                  15                      81   
2           3  Female   20                  16                       6   
3           4  Female   23                  16              