# Mall Customer Segmentation Analysis

## Dataset Overview
This dataset contains 200 mall customers with the following features:

- Gender
- Age
- Annual Income (k$)
- Spending Score (1-100)

## Data Preparation

In [1]:
# Step 1: Import libraries
import pandas as pd
import numpy as np

In [4]:
# xlrd is only required by pd.read_excel for legacy .xls files; pd.read_csv does not use it
!pip install xlrd

Defaulting to user installation because normal site-packages is not writeable
Collecting xlrd
  Downloading xlrd-2.0.2-py2.py3-none-any.whl (96 kB)
Installing collected packages: xlrd
Successfully installed xlrd-2.0.2


In [10]:
# Step 2: Load and inspect the dataset
# The file has an .xls extension but contains CSV text, so read_csv is the right reader.
data = pd.read_csv("Mall_Customers.xls")
print("Initial shape:", data.shape)
data.info()  # info() prints its own report and returns None, so it is not wrapped in print()
print(data.head())

Initial shape: (200, 5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              200 non-null    int64 
 1   Gender                  200 non-null    object
 2   Age                     200 non-null    int64 
 3   Annual Income (k$)      200 non-null    int64 
 4   Spending Score (1-100)  200 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 7.9+ KB
   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
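
The `.xls` extension on `Mall_Customers.xls` is misleading, which is why `read_csv` succeeds above. One way to avoid guessing is to inspect the file's leading bytes before choosing a reader. The following is a minimal sketch; the helper names `looks_like_excel` and `load_table` are hypothetical and not part of the original notebook. It relies on the fact that legacy `.xls` files start with the OLE2 compound-file signature and `.xlsx` files start with a ZIP signature.

```python
import pandas as pd

# Leading bytes of real Excel files: OLE2 container (.xls) and ZIP archive (.xlsx).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_excel(path):
    """Return True if the file starts with an Excel container signature."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    return head.startswith(OLE2_MAGIC) or head.startswith(ZIP_MAGIC)


def load_table(path):
    """Hypothetical helper: pick the pandas reader from the file's content."""
    if looks_like_excel(path):
        return pd.read_excel(path)  # needs xlrd (.xls) or openpyxl (.xlsx)
    return pd.read_csv(path)        # plain text, whatever the extension says
```

With this in place, `load_table("Mall_Customers.xls")` would fall through to `read_csv` for a text file and only require `xlrd` when the file is a genuine legacy workbook.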


In [11]:
# Step 3: Drop duplicates
data.drop_duplicates(inplace=True)
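
Called without arguments, `drop_duplicates` only removes rows that match in every column, so a repeated record stored under a different `CustomerID` would survive. The sketch below uses a toy frame (the rows are made up for illustration, not taken from the dataset) to count duplicates before dropping them and to show how the `subset` argument changes what counts as a duplicate.

```python
import pandas as pd

# Toy rows for illustration only; rows 0 and 1 differ only in CustomerID,
# while rows 2 and 3 are identical in every column.
toy = pd.DataFrame({
    "CustomerID": [1, 2, 3, 3],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Age": [19, 19, 20, 20],
})

# Rows identical in every column.
full_dupes = int(toy.duplicated().sum())

# Rows identical once the identifier is ignored.
feature_cols = ["Gender", "Age"]
feature_dupes = int(toy.duplicated(subset=feature_cols).sum())

# Same behavior as the notebook step, but returning a new frame.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(full_dupes, feature_dupes, deduped.shape)
```

Whether feature-level matches should be dropped is a judgment call: two different customers can legitimately share the same gender, age, income, and score.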

In [12]:
data

     CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0             1    Male   19                  15                      39
1             2    Male   21                  15                      81
2             3  Female   20                  16                       6
3             4  Female   23                  16                      77
4             5  Female   31                  17                      40
..          ...     ...  ...                 ...                     ...
195         196  Female   35                 120                      79
196         197  Female   45                 126                      28
197         198    Male   32                 126                      74
198         199    Male   32                 137                      18


In [14]:
# Step 4: Handle null values
# info() above reports no missing values, so dropna() leaves all 200 rows intact.
data.dropna(inplace=True)  # alternatively, data.ffill() would fill gaps instead of dropping rows
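
The `info()` output earlier reports 200 non-null values in every column, so `dropna` changes nothing for this dataset. For data that does contain gaps, it helps to count them before choosing between dropping and filling. A small sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; the real dataset has no missing entries.
toy = pd.DataFrame({
    "age": [19.0, np.nan, 20.0, 23.0],
    "annual_income": [15.0, 15.0, np.nan, 16.0],
})

missing_per_column = toy.isna().sum()   # count gaps column by column
dropped = toy.dropna()                  # keep only fully populated rows
filled = toy.ffill()                    # carry the previous row's value forward

print(missing_per_column.to_dict())
print(len(dropped), int(filled.isna().sum().sum()))
```

Dropping loses half of the toy rows here, while forward filling keeps all of them at the cost of inventing values, which is only reasonable when row order carries meaning.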

In [15]:
# Step 5: Standardize column names
data.columns = data.columns.str.strip().str.lower().str.replace(' ', '_')
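
The chain above lowercases names and replaces spaces but leaves punctuation in place, which is why later output shows names such as `annual_income_(k$)`. If such names are inconvenient, one possible alternative is to collapse every run of non-alphanumeric characters into a single underscore. This is a sketch of that alternative, not what the notebook does; `snake_case` is a hypothetical helper.

```python
import re

import pandas as pd


def snake_case(name):
    """Hypothetical helper: lowercase and replace non-alphanumeric runs with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


raw = pd.Index(["CustomerID", "Gender", "Age",
                "Annual Income (k$)", "Spending Score (1-100)"])

# What the notebook's chain produces.
notebook_style = raw.str.strip().str.lower().str.replace(" ", "_")

# The alternative: punctuation-free names that also work as attributes.
alternative = raw.map(snake_case)

print(list(notebook_style))
print(list(alternative))
```

The trade-off is that the alternative drops the unit hint `(k$)` from the name, so the unit would need to be documented elsewhere.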

In [18]:
# Step 6: Check for inconsistent formats
# Example: normalize the capitalization of gender labels (e.g. "male" -> "Male")
data['gender'] = data['gender'].str.title()
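
`str.title()` only normalizes capitalization; the column keeps the `object` dtype. If a true categorical dtype is wanted, a conversion along these lines would do it. The toy values are made up to show mixed casing and stray whitespace.

```python
import pandas as pd

# Made-up labels with inconsistent casing and whitespace.
gender = pd.Series(["male", "FEMALE", " Female", "Male "])

cleaned = gender.str.strip().str.title()   # -> "Male" / "Female"
categorical = cleaned.astype("category")   # store as a categorical dtype

print(categorical.cat.categories.tolist())
print(categorical.value_counts().to_dict())
```

A categorical column also makes unexpected labels easy to spot, since any typo would show up as an extra category.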


In [19]:

# Step 7: Review cleaned data
print("\nCleaned shape:", data.shape)
print(data.describe(include='all'))


Cleaned shape: (200, 5)
        customerid  gender         age  annual_income_(k$)  \
count   200.000000     200  200.000000          200.000000   
unique         NaN       2         NaN                 NaN   
top            NaN  Female         NaN                 NaN   
freq           NaN     112         NaN                 NaN   
mean    100.500000     NaN   38.850000           60.560000   
std      57.879185     NaN   13.969007           26.264721   
min       1.000000     NaN   18.000000           15.000000   
25%      50.750000     NaN   28.750000           41.500000   
50%     100.500000     NaN   36.000000           61.500000   
75%     150.250000     NaN   49.000000           78.000000   
max     200.000000     NaN   70.000000          137.000000   

        spending_score_(1-100)  
count               200.000000  
unique                     NaN  
top                        NaN  
freq                       NaN  
mean                 50.200000  
std                  25.823522  

In [21]:
# Optional: Save cleaned version
data.to_csv("Cleaned_Mall_Customers.csv", index=False)
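
Since the saved file is meant to be reused, a quick round-trip check can confirm that nothing changed on the way to disk. A minimal sketch using a temporary directory and a two-row stand-in frame (the values mirror the first rows shown earlier):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two-row stand-in for the cleaned frame.
cleaned = pd.DataFrame({
    "customerid": [1, 2],
    "gender": ["Male", "Male"],
    "age": [19, 21],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "Cleaned_Mall_Customers.csv"
    cleaned.to_csv(out_path, index=False)   # index=False avoids an extra index column
    reloaded = pd.read_csv(out_path)

# Raises AssertionError if values, column names, or dtypes differ.
pd.testing.assert_frame_equal(cleaned, reloaded)
```

Without `index=False`, the reloaded frame would gain an `Unnamed: 0` column and the comparison would fail.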

In [22]:
data

     customerid  gender  age  annual_income_(k$)  spending_score_(1-100)
0             1    Male   19                  15                      39
1             2    Male   21                  15                      81
2             3  Female   20                  16                       6
3             4  Female   23                  16                      77
4             5  Female   31                  17                      40
..          ...     ...  ...                 ...                     ...
195         196  Female   35                 120                      79
196         197  Female   45                 126                      28
197         198    Male   32                 126                      74
198         199    Male   32                 137                      18
