# Python Practice Notebook 2 - Shop Customer Analysis

## Business Task

A mid-sized retail chain has implemented a membership card system to better understand how their customers behave. Every time a customer visits the store, their information gets recorded: demographics, profession, spending patterns, family size, and income level.

The company has collected 2,000 customer profiles, but it lacks the analytics capabilities to identify high-value customers, understand spending behavior, target promotions effectively, or spot groups at risk of churn.

**Analyze customer traits and behavior to help the business optimize marketing, segmentation, product offerings, and customer retention strategies.**

## Import Libraries & Load the Dataset

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("data/customers_dataset.csv")
df.head()

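|   | CustomerID | Gender | Age | Annual Income ($) | Spending Score (1-100) | Profession | Work Experience | Family Size |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |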


In [28]:
# Rows and columns
df.shape

(2000, 8)

In [29]:
# Basic statistics
df.describe()

|   | CustomerID | Age | Annual Income ($) | Spending Score (1-100) | Work Experience | Family Size |
|---|---|---|---|---|---|---|
| count | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 | 2000.0 |
| mean | 1000.5 | 48.96 | 110731.8215 | 50.9625 | 4.1025 | 3.7685 |
| std | 577.494589 | 28.429747 | 45739.536688 | 27.934661 | 3.922204 | 1.970749 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 500.75 | 25.0 | 74572.0 | 28.0 | 1.0 | 2.0 |
| 50% | 1000.5 | 48.0 | 110045.0 | 50.0 | 3.0 | 4.0 |
| 75% | 1500.25 | 73.0 | 149092.75 | 75.0 | 7.0 | 5.0 |
| max | 2000.0 | 99.0 | 189974.0 | 100.0 | 17.0 | 9.0 |


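**Why is the minimum value in the Age column 0? 🤔 The dataset contains unrealistic values: `Annual Income ($)` and `Spending Score (1-100)` also have a minimum of 0, even though the score is supposed to range from 1 to 100.**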
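A minimal sketch of how these suspicious values could be flagged with boolean masks. It assumes a hypothetical minimum plausible cardholder age of 18 (`MIN_AGE` is an illustrative choice, not something the dataset defines). The small `sample` frame copies a few rows shown in this notebook so the snippet runs on its own; in the notebook you would apply the same masks to `df`.

```python
import pandas as pd

# A few rows shown in this notebook (original column names); use `df` in practice.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 203, 211, 212],
    "Age": [19, 21, 16, 1, 0],
    "Annual Income ($)": [15000, 35000, 60000, 57000, 22000],
    "Spending Score (1-100)": [39, 81, 0, 93, 92],
})

# Assumption: hypothetical minimum plausible age for a membership cardholder.
MIN_AGE = 18

# One boolean mask per kind of suspicious value.
too_young = sample["Age"] < MIN_AGE
zero_income = sample["Annual Income ($)"] == 0
score_out_of_range = ~sample["Spending Score (1-100)"].between(1, 100)

# Rows that fail at least one check.
suspicious = sample[too_young | zero_income | score_out_of_range]
print(suspicious)
print("Customers under", MIN_AGE, ":", int(too_young.sum()))
```

Keeping the masks separate makes it easy to count each problem on its own before deciding whether to drop, cap, or impute the affected rows.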

In [27]:
# Basic info
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   CustomerID              2000 non-null   int64 
 1   Gender                  2000 non-null   object
 2   Age                     2000 non-null   int64 
 3   Annual Income ($)       2000 non-null   int64 
 4   Spending Score (1-100)  2000 non-null   int64 
 5   Profession              1965 non-null   object
 6   Work Experience         2000 non-null   int64 
 7   Family Size             2000 non-null   int64 
dtypes: int64(6), object(2)
memory usage: 205.2+ KB


**The Profession column has null values. We need to handle them. ✂️**
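Two common options, sketched on a small hypothetical frame with one missing value: fill the gap with an explicit `"Unknown"` label (an illustrative placeholder, not a category that exists in the data) or drop the affected rows. With 35 missing values out of 2,000 rows, dropping would remove 1.75% of customers, while filling keeps them in the income and spending analyses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `df`: one Profession value is missing.
sample = pd.DataFrame({
    "Profession": ["Healthcare", "Engineer", np.nan, "Lawyer"],
    "SpendingScore": [39, 81, 6, 77],
})

# Option 1: keep every customer and mark the gap with a placeholder label.
filled = sample.assign(Profession=sample["Profession"].fillna("Unknown"))

# Option 2: drop only the rows where Profession is missing.
dropped = sample.dropna(subset=["Profession"])

print(filled)
print(dropped)
```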

In [26]:
# check for null values
df.isnull().sum()

CustomerID                 0
Gender                     0
Age                        0
Annual Income ($)          0
Spending Score (1-100)     0
Profession                35
Work Experience            0
Family Size                0
dtype: int64

In [24]:
# Rename the columns for convenience
df2 = df.rename(columns={"Annual Income ($)": "AnnualIncome",
                         "Spending Score (1-100)": "SpendingScore",
                         "Work Experience": "WorkExperience",
                         "Family Size": "FamilySize"})
df2.head()

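|   | CustomerID | Gender | Age | AnnualIncome | SpendingScore | Profession | WorkExperience | FamilySize |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 19 | 15000 | 39 | Healthcare | 1 | 4 |
| 1 | 2 | Male | 21 | 35000 | 81 | Engineer | 3 | 3 |
| 2 | 3 | Female | 20 | 86000 | 6 | Engineer | 1 | 1 |
| 3 | 4 | Female | 23 | 59000 | 77 | Lawyer | 0 | 2 |
| 4 | 5 | Female | 31 | 38000 | 40 | Entertainment | 2 | 6 |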


**Now the column names are clean and simple to work with! 🎉😎 The missing Profession values and the unrealistic ages still need handling.**

## Questions And Solutions

### Q1. Which age groups make up most of our customers?

In [38]:
# Customers aged 16 or younger
df2[df2['Age'] <= 16]


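|   | CustomerID | Gender | Age | AnnualIncome | SpendingScore | Profession | WorkExperience | FamilySize |
|---|---|---|---|---|---|---|---|---|
| 202 | 203 | Female | 16 | 60000 | 0 | Engineer | 6 | 8 |
| 210 | 211 | Female | 1 | 57000 | 93 | Engineer | 1 | 2 |
| 211 | 212 | Female | 0 | 22000 | 92 | Artist | 2 | 1 |
| 228 | 229 | Male | 0 | 33000 | 64 | Marketing | 1 | 1 |
| 229 | 230 | Male | 15 | 94000 | 30 | Healthcare | 7 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1974 | 1975 | Female | 14 | 153145 | 59 | Healthcare | 8 | 6 |
| 1979 | 1980 | Male | 0 | 165321 | 93 | Doctor | 8 | 1 |
| 1980 | 1981 | Female | 10 | 86925 | 76 | Artist | 7 | 2 |
| 1984 | 1985 | Female | 2 | 153622 | 51 | Lawyer | 6 | 6 |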


**WHY do the **
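The filter above only lists customers aged 16 or younger. To answer Q1 itself, one option is to bin `Age` into brackets with `pd.cut` and count customers per bracket. A minimal sketch: the bracket edges are hypothetical, and the ages come from the rows displayed in this notebook (mostly the under-16 rows), so the counts only illustrate the mechanics. In the notebook, pass `df2["Age"]` instead.

```python
import pandas as pd

# Ages from the rows displayed above; in the notebook use df2["Age"].
ages = pd.Series([19, 21, 20, 23, 31, 16, 1, 0, 0, 15, 14, 0, 10, 2], name="Age")

# Hypothetical bracket edges; right=False makes each bin [left, right).
bins = [0, 18, 26, 36, 51, 66, 100]
labels = ["0-17", "18-25", "26-35", "36-50", "51-65", "66-99"]
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)

# Customers per bracket (largest first) and their share of the total.
counts = age_group.value_counts()
share = (counts / counts.sum() * 100).round(1)
print(pd.DataFrame({"customers": counts, "percent": share}))
```

With `right=False`, age 0 falls into `0-17` and the maximum age of 99 from `df.describe()` falls into `66-99`, so no customer is left out of the bins.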

### Q2. Do men or women spend more?
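One way to compare genders is to aggregate `SpendingScore` per group. Note that this measures the 1-100 spending score, not dollars spent. A minimal sketch on the five rows from `df2.head()`; run it on `df2` for the real answer. Showing the median and group size next to the mean helps spot skew or very small groups.

```python
import pandas as pd

# The five rows from df2.head(); in the notebook use df2 itself.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "SpendingScore": [39, 81, 6, 77, 40],
})

# Mean, median and number of customers per gender.
by_gender = sample.groupby("Gender")["SpendingScore"].agg(["mean", "median", "count"])
print(by_gender)
```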

### Q3. Which income bracket dominates our customer base?
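Fixed-width income brackets make the answer easy to read. A minimal sketch: the $50k bracket edges are a hypothetical choice, picked so the top edge sits above the maximum income of 189,974 in `df.describe()`, and the incomes are taken from the rows displayed in this notebook. In the notebook, pass `df2["AnnualIncome"]` instead.

```python
import pandas as pd

# Incomes from the rows displayed in this notebook; use df2["AnnualIncome"] in practice.
income = pd.Series([15000, 35000, 86000, 59000, 38000, 60000, 57000,
                    22000, 33000, 94000, 153145, 165321, 86925, 153622])

# Hypothetical $50k brackets; right=False keeps an income of 0 in the first bin.
bins = [0, 50_000, 100_000, 150_000, 200_000]
labels = ["<50k", "50k-100k", "100k-150k", "150k-200k"]
bracket = pd.cut(income, bins=bins, labels=labels, right=False)

# Customers per bracket and the bracket with the most customers.
counts = bracket.value_counts()
print(counts)
print("Dominant bracket:", counts.idxmax())
```

Quantile-based bins (`pd.qcut`) would split customers into equal-sized groups instead, which is useful for segmentation but cannot show which bracket dominates.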

### Q4. Who are the “high-potential but low-spend” customers?
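One possible definition, which is an assumption rather than anything stated in the business task: customers in the top quarter of `AnnualIncome` but the bottom quarter of `SpendingScore`. Both cut-offs come from `Series.quantile`, so they adapt to whatever data is passed in. The sample rows are copied from this notebook; running the same filter on `df2` would give the real target list.

```python
import pandas as pd

# Rows displayed in this notebook; use df2 in the notebook itself.
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 203, 211, 212, 229, 230, 1975, 1980, 1981, 1985],
    "AnnualIncome": [15000, 35000, 86000, 59000, 38000, 60000, 57000,
                     22000, 33000, 94000, 153145, 165321, 86925, 153622],
    "SpendingScore": [39, 81, 6, 77, 40, 0, 93, 92, 64, 30, 59, 93, 76, 51],
})

# Assumed thresholds: top 25% of income, bottom 25% of spending score.
income_cut = sample["AnnualIncome"].quantile(0.75)
score_cut = sample["SpendingScore"].quantile(0.25)

# Customers meeting both conditions, richest first.
high_potential_low_spend = sample[
    (sample["AnnualIncome"] >= income_cut) & (sample["SpendingScore"] <= score_cut)
].sort_values("AnnualIncome", ascending=False)

print(f"Income >= {income_cut:,.0f} and score <= {score_cut:.0f}")
print(high_potential_low_spend)
```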

### Q5. Which professions spend the most?
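Grouping by `Profession` and sorting by the mean score ranks the professions. A minimal sketch on the five `df2.head()` rows; run it on `df2`, ideally after the missing professions have been handled, for the real ranking. The `count` column matters because a profession with very few customers can top the list by chance.

```python
import pandas as pd

# The five rows from df2.head(); in the notebook use df2.
sample = pd.DataFrame({
    "Profession": ["Healthcare", "Engineer", "Engineer", "Lawyer", "Entertainment"],
    "SpendingScore": [39, 81, 6, 77, 40],
})

# Average score and group size per profession, highest average first.
# groupby skips NaN keys by default, so unfilled missing professions are excluded.
by_profession = (
    sample.groupby("Profession")["SpendingScore"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(by_profession)
```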

### Q6. Do younger customers spend more?
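One way to test this is a correlation between `Age` and `SpendingScore`, after removing the implausible ages flagged earlier. A minimal sketch: the cut-off of 18 is a hypothetical choice, and the rows come from this notebook (the five `head()` rows plus customers 211 and 212). Spearman's coefficient is computed here as the Pearson correlation of the ranks, which is how it is defined.

```python
import pandas as pd

# Rows 1-5 plus two implausible-age rows (211, 212) shown above; use df2 in the notebook.
sample = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 1, 0],
    "SpendingScore": [39, 81, 6, 77, 40, 93, 92],
})

# Assumption: drop ages below a hypothetical minimum of 18 before measuring the trend.
adults = sample[sample["Age"] >= 18]

# Pearson measures a linear trend; Spearman (Pearson on ranks) a monotonic one.
# A negative value would mean younger customers tend to have higher scores.
pearson = adults["Age"].corr(adults["SpendingScore"])
spearman = adults["Age"].rank().corr(adults["SpendingScore"].rank())
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

Five rows are far too few to draw a conclusion; the point is the filter-then-correlate pattern, which carries over directly to `df2`.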

### Q7. Do highly experienced workers spend less?
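A sketch that bands `WorkExperience` and compares the average `SpendingScore` per band, plus an overall correlation. The band edges are hypothetical (the upper edge of 18 just clears the maximum of 17 years in `df.describe()`), and the rows are the ones displayed in this notebook, several of which have the implausible ages flagged earlier, so treat the numbers as a demonstration only.

```python
import pandas as pd

# WorkExperience and SpendingScore from the rows displayed in this notebook.
sample = pd.DataFrame({
    "WorkExperience": [1, 3, 1, 0, 2, 6, 1, 2, 1, 7, 8, 8, 7, 6],
    "SpendingScore": [39, 81, 6, 77, 40, 0, 93, 92, 64, 30, 59, 93, 76, 51],
})

# Hypothetical bands: 0-2, 3-5 and 6-17 years (right=False keeps left edges).
sample["ExperienceBand"] = pd.cut(
    sample["WorkExperience"], bins=[0, 3, 6, 18],
    labels=["0-2", "3-5", "6+"], right=False,
)

# observed=True lists only the bands that actually occur.
by_band = sample.groupby("ExperienceBand", observed=True)["SpendingScore"].agg(["mean", "count"])
print(by_band)

# A negative coefficient would support "more experience, lower score".
print("Correlation:", round(sample["WorkExperience"].corr(sample["SpendingScore"]), 2))
```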

### Q8. Do larger families spend more?
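Grouping by `FamilySize` directly works because it is a small integer (1 to 9 in `df.describe()`). A minimal sketch on the rows displayed in this notebook; the `count` column matters here because some sizes appear only once in such a small sample. In the notebook, run the same groupby on `df2`.

```python
import pandas as pd

# FamilySize and SpendingScore from the rows displayed in this notebook; use df2 in practice.
sample = pd.DataFrame({
    "FamilySize": [4, 3, 1, 2, 6, 8, 2, 1, 1, 2, 6, 1, 2, 6],
    "SpendingScore": [39, 81, 6, 77, 40, 0, 93, 92, 64, 30, 59, 93, 76, 51],
})

# Average score and number of customers for each family size (sorted by size).
by_size = sample.groupby("FamilySize")["SpendingScore"].agg(["mean", "count"])
print(by_size)
```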