<a href="https://www.kaggle.com/code/sahilkhan70/customer-segmentation-analysis?scriptVersionId=205695767" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [2]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/customer-personality-analysis/marketing_campaign.csv


<h1 style="
    display: inline-block;
    color: black; 
    background-color: orange; 
    padding: 10px 20px; 
    border-radius: 8px; 
    box-shadow: 3px 3px 8px rgba(0, 0, 0, 0.3);
    text-align: center;
    font-size: 20px;
    font-weight: bold;
    margin: 0;
">
    Introduction to Customer Segmentation
</h1>
<br>
<h3 style="font-size: 16px; font-weight: bold;">💡 Problem Statement:</h3>
<p style="font-size: 14px;">In today's competitive marketplace, businesses generate massive amounts of customer data, and understanding this data is crucial to staying ahead. However, not all customers behave the same way: they differ in preferences, buying patterns, and responses to marketing strategies.</p>

<h3 style="font-size: 16px; font-weight: bold;">🔍 Need for Customer Segmentation:</h3>
<p style="font-size: 14px;">Customer segmentation is the process of dividing customers into distinct groups based on specific characteristics and behaviors. This allows businesses to:</p>

<ul style="font-size: 14px;">
  <li>🔹 Identify high-value customers.</li>
  <li>🔹 Tailor marketing campaigns and offers.</li>
  <li>🔹 Improve customer retention and satisfaction.</li>
  <li>🔹 Optimize resource allocation by focusing on key segments.</li>
</ul>

<p style="font-size: 14px;">The goal of this project is to use data-driven techniques to segment a customer base effectively, providing actionable insights for the business to target each group appropriately.</p>

<h3 style="font-size: 16px; font-weight: bold;">📊 Project Overview:</h3>
<p style="font-size: 14px;">In this project, we will:</p>

<ul style="font-size: 14px;">
  <li>🟢 Explore and preprocess the dataset.</li>
  <li>🟢 Apply clustering algorithms to segment customers.</li>
  <li>🟢 Visualize and interpret the resulting segments.</li>
  <li>🟢 Provide business recommendations based on the segmentation.</li>
</ul>
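<p style="font-size: 14px;">The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the notebook's actual pipeline: the two features (income and yearly spend), the three generated groups, and the choice of <code>n_clusters=3</code> are all assumptions made for the example.</p>

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for customer features (hypothetical: income, yearly spend).
rng = np.random.default_rng(42)
centers = np.array([[30_000, 200], [60_000, 800], [90_000, 1_500]])
X = np.vstack([
    rng.normal(loc=c, scale=[3_000, 60], size=(50, 2)) for c in centers
])

# Scale the features so that income does not dominate the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# k=3 is assumed here only because the toy data was generated with three groups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# The silhouette score (range -1 to 1) summarizes how well separated the clusters are.
score = silhouette_score(X_scaled, labels)
print(f"clusters: {np.unique(labels).size}, silhouette: {score:.2f}")
```

<p style="font-size: 14px;">On real customer data the number of clusters is not known in advance, so it would typically be chosen by comparing scores across several candidate values.</p>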

<h1 style="
    display: inline-block;
    color: white; 
    background-color: #4A90E2; 
    padding: 10px 20px; 
    border-radius: 8px; 
    box-shadow: 3px 3px 8px rgba(0, 0, 0, 0.3);
    text-align: center;
    font-size: 20px;
    font-weight: bold;
    margin: 0;
">
    Importing Required Libraries
</h1>
<br>


In [3]:
# Data Manipulation and Analysis
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import style

# Machine Learning Libraries
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Data splitting and nonlinear dimensionality reduction (t-SNE)
from sklearn.model_selection import train_test_split
from sklearn.manifold import TSNE

# Warnings handling
import warnings
warnings.filterwarnings('ignore')

# Plotting Style
style.use('ggplot')

<h1 style="
    display: inline-block;
    color: white; 
    background-color: #4A90E2; 
    padding: 10px 20px; 
    border-radius: 8px; 
    box-shadow: 3px 3px 8px rgba(0, 0, 0, 0.3);
    text-align: center;
    font-size: 20px;
    font-weight: bold;
    margin: 0;
">
    Step 1: Understanding the Data
</h1>
<br>
<p style="font-size: 16px; font-weight: bold;">📝 Purpose:</p> 
<p style="font-size: 14px;">We begin by exploring the dataset: its structure and the types of data it contains. This step is crucial for identifying which features matter for segmentation.</p>

<ul style="font-size: 14px;">
  <li>Explore the dataset columns.</li>
  <li>Check for missing values and data types.</li>
</ul>
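<p style="font-size: 14px;">As a small sketch of these two checks, the snippet below builds a per-column overview of data types and missing-value counts. The <code>sample</code> frame and its values are hypothetical toy data used only to keep the example self-contained; the same calls apply to the full dataset.</p>

```python
import pandas as pd

# Hypothetical toy frame; the None in "Income" is inserted purely for illustration.
sample = pd.DataFrame({
    "Year_Birth": [1960, 1975, 1988],
    "Education": ["Graduation", "PhD", "Master"],
    "Income": [50000.0, None, 42000.0],
})

# One row per column: its dtype and how many values are missing.
summary = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isnull().sum(),
})
print(summary)
```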

In [4]:
# Load the dataset. Despite the .csv extension, the file is tab-separated (TSV), so pass sep="\t"
df = pd.read_csv("/kaggle/input/customer-personality-analysis/marketing_campaign.csv", sep = "\t")

In [5]:
# Shape of the dataset
print(f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.")

The dataset has 2240 rows and 29 columns.


In [6]:
df

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,...,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
0,5524,1957,Graduation,Single,58138.0,0,0,04-09-2012,58,635,...,7,0,0,0,0,0,0,3,11,1
1,2174,1954,Graduation,Single,46344.0,1,1,08-03-2014,38,11,...,5,0,0,0,0,0,0,3,11,0
2,4141,1965,Graduation,Together,71613.0,0,0,21-08-2013,26,426,...,4,0,0,0,0,0,0,3,11,0
3,6182,1984,Graduation,Together,26646.0,1,0,10-02-2014,26,11,...,6,0,0,0,0,0,0,3,11,0
4,5324,1981,PhD,Married,58293.0,1,0,19-01-2014,94,173,...,5,0,0,0,0,0,0,3,11,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2235,10870,1967,Graduation,Married,61223.0,0,1,13-06-2013,46,709,...,5,0,0,0,0,0,0,3,11,0
2236,4001,1946,PhD,Together,64014.0,2,1,10-06-2014,56,406,...,7,0,0,0,1,0,0,3,11,0
2237,7270,1981,Graduation,Divorced,56981.0,0,0,25-01-2014,91,908,...,6,0,1,0,0,0,0,3,11,0
2238,8235,1956,Master,Together,69245.0,0,1,24-01-2014,8,428,...,3,0,0,0,0,0,0,3,11,0
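<p style="font-size: 14px;">The preview shows <code>Dt_Customer</code> stored as text in day-month-year order (for example <code>21-08-2013</code>). A minimal sketch of converting such strings to datetimes is given below, using three values copied from the preview; whether and how the notebook converts this column later is not shown here, so treat it as one possible approach.</p>

```python
import pandas as pd

# Three Dt_Customer values copied from the preview above.
dates = pd.Series(["04-09-2012", "08-03-2014", "21-08-2013"], name="Dt_Customer")

# An explicit format avoids any day/month ambiguity for values such as 04-09-2012.
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

print(parsed.dt.month.tolist())  # [9, 3, 8]
```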


In [7]:
# Summary statistics for the numeric columns
df.describe()

Unnamed: 0,ID,Year_Birth,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,...,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
count,2240.0,2240.0,2216.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,...,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0
mean,5592.159821,1968.805804,52247.251354,0.444196,0.50625,49.109375,303.935714,26.302232,166.95,37.525446,...,5.316518,0.072768,0.074554,0.072768,0.064286,0.013393,0.009375,3.0,11.0,0.149107
std,3246.662198,11.984069,25173.076661,0.538398,0.544538,28.962453,336.597393,39.773434,225.715373,54.628979,...,2.426645,0.259813,0.262728,0.259813,0.245316,0.114976,0.096391,0.0,0.0,0.356274
min,0.0,1893.0,1730.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
25%,2828.25,1959.0,35303.0,0.0,0.0,24.0,23.75,1.0,16.0,3.0,...,3.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
50%,5458.5,1970.0,51381.5,0.0,0.0,49.0,173.5,8.0,67.0,12.0,...,6.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
75%,8427.75,1977.0,68522.0,1.0,1.0,74.0,504.25,33.0,232.0,50.0,...,7.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
max,11191.0,1996.0,666666.0,2.0,2.0,99.0,1493.0,199.0,1725.0,259.0,...,20.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,11.0,1.0
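<p style="font-size: 14px;">In the summary above, <code>Z_CostContact</code> and <code>Z_Revenue</code> have a standard deviation of 0, so they hold a single value (3 and 11) and cannot help separate customers. A short sketch of detecting such constant columns follows, using the first five rows from the preview; dropping them is a suggestion here, not necessarily what the notebook does next.</p>

```python
import pandas as pd

# First five rows of four columns, copied from the dataset preview.
sample = pd.DataFrame({
    "Income": [58138.0, 46344.0, 71613.0, 26646.0, 58293.0],
    "Z_CostContact": [3, 3, 3, 3, 3],
    "Z_Revenue": [11, 11, 11, 11, 11],
    "Response": [1, 0, 0, 0, 0],
})

# A column with exactly one distinct value carries no information for clustering.
constant_cols = [c for c in sample.columns if sample[c].nunique(dropna=False) == 1]
reduced = sample.drop(columns=constant_cols)

print(constant_cols)  # ['Z_CostContact', 'Z_Revenue']
```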


In [8]:
# Count the null values in each column
df.isnull().sum()

ID                      0
Year_Birth              0
Education               0
Marital_Status          0
Income                 24
Kidhome                 0
Teenhome                0
Dt_Customer             0
Recency                 0
MntWines                0
MntFruits               0
MntMeatProducts         0
MntFishProducts         0
MntSweetProducts        0
MntGoldProds            0
NumDealsPurchases       0
NumWebPurchases         0
NumCatalogPurchases     0
NumStorePurchases       0
NumWebVisitsMonth       0
AcceptedCmp3            0
AcceptedCmp4            0
AcceptedCmp5            0
AcceptedCmp1            0
AcceptedCmp2            0
Complain                0
Z_CostContact           0
Z_Revenue               0
Response                0
dtype: int64
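<p style="font-size: 14px;">Only <code>Income</code> has missing values (24 of 2240 rows). One common way to handle this is median imputation, which is less sensitive than the mean to extreme values such as the maximum income of 666666 seen in the summary. The sketch below uses four incomes from the preview plus one <code>NaN</code> inserted for illustration; it is an assumption about a reasonable next step, not the notebook's confirmed method.</p>

```python
import numpy as np
import pandas as pd

# Four incomes from the preview; the NaN is inserted purely for illustration.
income = pd.Series([58138.0, 46344.0, np.nan, 26646.0, 58293.0], name="Income")

n_missing = int(income.isnull().sum())

# Series.median() skips NaN by default, so it is computed from the observed values only.
filled = income.fillna(income.median())

print(n_missing, filled.tolist())
```

<p style="font-size: 14px;">Because so few rows are affected, dropping them with <code>dropna(subset=["Income"])</code> would be a reasonable alternative.</p>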