https://www.statology.org/performing-cluster-analysis-in-python-a-step-by-step-tutorial/
For this tutorial, we will use the Mall Customers dataset, which is publicly available from several GitHub repositories, for instance this one: https://github.com/kennedykwangari/Mall-Customer-Segmentation-Data/blob/master/Mall_Customers.csv
The dataset describes customers of a shopping mall by gender, age, annual income, and spending score. The spending score is an indicator from 1 to 100 of how much money the customer spends in the mall. Our aim is to find and analyze hidden groups of customers with similar traits or patterns.

In [4]:
# Import necessary libraries
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset from URL
url = 'https://raw.githubusercontent.com/kennedykwangari/Mall-Customer-Segmentation-Data/master/Mall_Customers.csv'
df = pd.read_csv(url)

# Quick glimpse of the data
df.head()


   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40


In [5]:
# Select the features used for clustering: Age, Annual Income, and Spending Score
# Store the selected columns in a new DataFrame named 'X'
X = df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']]

# Standardize the features with scikit-learn's StandardScaler so that each one
# has zero mean and unit variance and contributes equally to the distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
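
To make the scaling step concrete, here is a minimal sketch that reproduces what StandardScaler computes by hand. It uses only the five rows shown in the data preview above rather than the full dataset, and the names `toy`, `manual`, and `scaled` are illustrative, not part of the tutorial's code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The five preview rows: Age, Annual Income (k$), Spending Score (1-100)
toy = np.array([[19, 15, 39],
                [21, 15, 81],
                [20, 16, 6],
                [23, 16, 77],
                [31, 17, 40]], dtype=float)

# Standardization: subtract each column's mean, then divide by its
# population standard deviation (ddof=0, which is NumPy's default)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

# The same transformation through scikit-learn
scaled = StandardScaler().fit_transform(toy)

print(np.allclose(manual, scaled))   # do both approaches agree?
print(scaled.mean(axis=0))           # per-column means after scaling
print(scaled.std(axis=0))            # per-column standard deviations after scaling
```

Because K-means relies on Euclidean distances, leaving the features on their original scales would let the one with the largest numeric range dominate the clustering.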

In [6]:
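# Determine the optimal number of clusters (K) using the Elbow Method:
# fit K-means for K = 1..10 and record the inertia of each fit
inertia = []
for k in range(1, 11):
    # n_init is set explicitly because its default changed from 10 to 'auto' in scikit-learn 1.4
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)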

# Plot the elbow method to decide on the best 'K'
plt.figure(figsize=(8, 5))
plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()
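
The quantity plotted on the y-axis, inertia, is the sum of squared distances from each point to the centroid of its assigned cluster. The following sketch recomputes it by hand on synthetic data to show what `inertia_` measures. The three blobs and the names `points`, `model`, and `manual_inertia` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: three hypothetical blobs of 50 points each
rng = np.random.default_rng(0)
blob_centers = [[0, 0], [5, 5], [0, 5]]
points = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                    for c in blob_centers])

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(points)

# Look up the centroid assigned to every point, then sum the squared distances
assigned = model.cluster_centers_[model.labels_]
manual_inertia = ((points - assigned) ** 2).sum()

print(manual_inertia, model.inertia_)
```

Inertia always decreases as K grows, so the elbow method looks for the K after which the decrease flattens out rather than for the minimum.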




In [None]:
# Apply K-means with K=5
# n_init is set explicitly because its default changed from 10 to 'auto' in scikit-learn 1.4
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
kmeans.fit(X_scaled)

# Add the cluster identifiers as a new attribute in the original data
df['Cluster'] = kmeans.labels_
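
Once the labels are attached, one simple way to analyze the groups is to compare per-cluster averages and sizes. The sketch below does this on a hypothetical six-row miniature of the DataFrame. The values and the names `toy_df`, `profile`, and `sizes` are made up for illustration and are not results from the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the tutorial's DataFrame, with cluster labels
# already assigned (values are invented for illustration)
toy_df = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 17, 78, 81, 87],
    'Spending Score (1-100)': [81, 77, 76, 16, 5, 10],
    'Cluster': [0, 0, 0, 1, 1, 1],
})

# Average of every feature within each cluster
profile = toy_df.groupby('Cluster').mean()

# Number of customers per cluster
sizes = toy_df['Cluster'].value_counts().sort_index()

print(profile)
print(sizes)
```

On the tutorial's data, the same two calls applied to `df` (restricted to the numeric columns) would summarize each of the five segments.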

In [None]:
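# Visualize the clusters using annual income and spending score
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Annual Income (k$)', y='Spending Score (1-100)',
                hue='Cluster', palette='viridis', s=100)

# The centroids live in the standardized feature space, so convert them back
# to the original units before plotting them on the same axes
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
# Columns follow the order of X: 0 = Age, 1 = Annual Income, 2 = Spending Score
plt.scatter(centroids[:, 1], centroids[:, 2], s=300, c='red', label='Centroids')
plt.title('Customer Segments Based on Income and Spending Score')
plt.legend()
plt.show()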