# K-means Clustering Example with the Mall Customers Dataset

## Introduction

In this activity, we will perform a K-means clustering analysis using the Mall Customers dataset. The objective is to cluster the customers into distinct groups based on their spending patterns. This exercise will help you understand the process of implementing and evaluating a K-means clustering model.

## The Dataset

The Mall Customers dataset contains the following features:
- `CustomerID`: Unique ID assigned to the customer
- `Gender`: Gender of the customer
- `Age`: Age of the customer
- `Annual Income (k$)`: Annual income of the customer in thousands of dollars
- `Spending Score (1-100)`: Score assigned by the mall based on the customer's behavior and spending habits


## Objective

You will:

1. Load and explore the dataset.
2. Preprocess the data.
3. Determine the optimal number of clusters using the elbow method.
4. Implement the K-means clustering algorithm.
5. Evaluate the model's performance.
6. Visualize the clusters.

Let's get started!

## Import necessary libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

## Load the Mall Customers dataset

In [None]:
# Load the Mall Customers dataset
data = pd.---(---)
# Display the first few rows of the dataset
---

## Data Preprocessing

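- Select the relevant features and standardize them using `StandardScaler`. K-means relies on Euclidean distance, so features measured on larger scales would otherwise dominate the clustering.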

In [None]:
# TODO: Select relevant features and standardize them
features = [---]
data_features = data[features]

In [None]:
# Standardize the features
scaler = StandardScaler()
data_scaled = scaler.fit_transform(---)
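
To see what standardization does, here is a minimal sketch on a small made-up array. The values and the names `toy` and `scaler_demo` are illustrative assumptions, not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two columns on noticeably different scales
toy = np.array([[15.0, 39.0],
                [16.0, 81.0],
                [17.0, 6.0],
                [137.0, 83.0]])

scaler_demo = StandardScaler()
toy_scaled = scaler_demo.fit_transform(toy)

# After scaling, each column has mean ~0 and standard deviation ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print("column means:", col_means)
print("column stds: ", col_stds)

# StandardScaler is equivalent to (x - mean) / std,
# using the population standard deviation (ddof=0)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)
print("matches manual formula:", np.allclose(manual, toy_scaled))
```

After scaling, both columns contribute on a comparable scale to the distances that K-means computes.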

## Determining the Optimal Number of Clusters

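- Use the elbow method to find the optimal number of clusters: plot the inertia (the within-cluster sum of squared distances to the nearest centroid) for a range of cluster counts and look for the point where the curve stops dropping sharply.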

In [None]:
# TODO: Use the elbow method to find the optimal number of clusters
# HINT: Iterate over a range of cluster numbers and compute the inertia for each
inertia = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=--, random_state=42)
    kmeans.fit(---)
    inertia.append(kmeans.inertia_)

# Plot the elbow curve
plt.figure(figsize=(8, 4))
plt.plot(range(1, 11), inertia, 'bo-')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal Number of Clusters')
plt.show()
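
The sketch below shows how inertia behaves as the number of clusters grows. It uses synthetic data from `make_blobs` rather than the Mall Customers data, and the names `X_demo`, `ks`, and `inertias` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 blobs (an assumption for this demo only)
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X_demo)
    inertias.append(model.inertia_)

# How much inertia falls each time one more cluster is added
drops = -np.diff(inertias)
for k, drop in zip(ks[1:], drops):
    print(f"k={k}: inertia dropped by {drop:.1f}")
```

The elbow is the value of `k` after which the drops become small compared with the earlier ones. On real data the bend is often less clear-cut, so it helps to combine it with another measure.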

## Implementing K-means Clustering

- Create an instance of `KMeans` with the optimal number of clusters and fit it to the scaled data.

In [None]:
# TODO: Create an instance of KMeans with the optimal number of clusters and fit it to the scaled data
# HINT: Use the class `KMeans` from sklearn.cluster
optimal_clusters = --
kmeans = ---(n_clusters=--, random_state=42)

# Fit the model to the scaled data
# HINT: Use the method `fit_predict`, which fits the model and returns a cluster label for each row
data['cluster'] = kmeans.---(---)
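
As a rough illustration of what `fit_predict` returns, the sketch below runs it on synthetic data. `X_demo` and `model` are hypothetical names, and the choice of 3 clusters is an assumption for the demo only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 blobs (demo only, not the Mall Customers data)
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=1)

model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X_demo)

# fit_predict returns one integer label per row; the same labels
# are also stored on the fitted model as `labels_`
same = np.array_equal(labels, model.labels_)
sizes = np.bincount(labels)
print("labels match model.labels_:", same)
print("cluster sizes:", sizes)
```

The label values (0, 1, 2, ...) are arbitrary identifiers, so they carry no ordering between clusters.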

## Evaluating the Model

- Calculate the silhouette score for the model. The score ranges from -1 to 1.

In [None]:
# TODO: Calculate the silhouette score for the model
# HINT: Use the function `silhouette_score` from sklearn.metrics
silhouette_avg = ---(---, ---)

print(f'Silhouette Score: {silhouette_avg}')
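
The silhouette score can also be used to compare several candidate cluster counts. The following sketch does this on synthetic data; `X_demo`, `scores`, and `best_k` are illustrative names, and the range of `k` values is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (demo only, not the Mall Customers data)
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 7):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)
    print(f"k={k}: silhouette score = {scores[k]:.3f}")

# Candidate with the highest average silhouette score
best_k = max(scores, key=scores.get)
print("highest-scoring k:", best_k)
```

A higher score suggests tighter, better-separated clusters, but it is only one signal and should be read alongside the elbow curve and the cluster plot.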

## Visualize the clusters

In [None]:
plt.scatter(data['Annual Income (k$)'], data['Spending Score (1-100)'], c=data['cluster'], cmap='viridis')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('K-means Clustering of Mall Customers')
plt.show()
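
When the model is fitted on standardized data, its centroids are expressed in scaled units, while the plot axes use the original units. One possible way to bring the centroids back to the original scale is `inverse_transform`, sketched here on made-up data. The two groups and the names `X_raw`, `scaler_demo`, and `centers_original` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: two groups of (income, spending score) pairs
group_a = rng.normal(loc=[30, 80], scale=3, size=(50, 2))
group_b = rng.normal(loc=[90, 20], scale=3, size=(50, 2))
X_raw = np.vstack([group_a, group_b])

scaler_demo = StandardScaler()
X_scaled = scaler_demo.fit_transform(X_raw)

model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_scaled)

# cluster_centers_ are in scaled units; map them back to the original units
centers_original = scaler_demo.inverse_transform(model.cluster_centers_)
print(centers_original)
```

The recovered centroids could then be overlaid on the scatter plot with a second `plt.scatter` call, using their two columns as x and y coordinates.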

## Discussion Questions

1. What does the silhouette score indicate about the quality of the clusters?
2. How well do the clusters correspond to different types of customers based on their spending patterns?
3. Are there any improvements you can suggest for the clustering model?