---
title: "Customer Shopping Behaviour Analysis"
author: "Karandeep Singh"
description: "Market Analysis"
date: "2025-03-25"
categories: [SQL, Python]
type: website
html: 
toc: True
toc-title: "On this page"
page-layout: full
code-summary: "Show SQL Query"
code-links:
   - icon: github
     text: "Project Repository"
     href: "https://github.com/gitbykaran/Customer-Shopping-Insights"
---

![](project-image.jpg)


# How does a customer shop?

## Project Overview

This project analyzes how customers shop at a retail store. We load and clean the raw data with the Pandas library in Python, write it to a MySQL database, and then answer each business question with a SQL query.

## Importing Libraries

In [None]:
import pandas as pd
import numpy as np 
import duckdb as db

## Data Preprocessing

In [None]:
df = pd.read_csv('shopping_trends.csv')
df.head()

shop = df.copy()
shop.columns
shop.info()
shop.isna().sum()
shop.describe()

shop.columns = shop.columns.str.replace(' ', '_')
shop.rename(columns={'Purchase_Amount_(USD)': 'Purchase_Amount_USD'}, inplace=True)
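
The two renaming steps above can be folded into one reusable rule. The following is a minimal sketch, not the project's actual code: the helper name `clean_column` is hypothetical, and the raw names `Customer ID` and `Review Rating` are assumed from the cleaned names that appear in the later queries.

```python
import re


def clean_column(name: str) -> str:
    """Return a SQL-friendly column name: underscores instead of spaces, no parentheses."""
    name = name.strip().replace(" ", "_")
    # Drop every character that is not a letter, digit, or underscore.
    return re.sub(r"[^0-9A-Za-z_]", "", name)


# Assumed raw header names, for illustration only.
raw_columns = ["Customer ID", "Purchase Amount (USD)", "Review Rating"]
cleaned = [clean_column(c) for c in raw_columns]
print(cleaned)  # ['Customer_ID', 'Purchase_Amount_USD', 'Review_Rating']
```

In the notebook, the same helper could be applied with `shop.columns = [clean_column(c) for c in shop.columns]`.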

## Connecting to Database

In [None]:
from sqlalchemy import create_engine as ce
engine = ce('mysql+pymysql://root:Karandeep2417@localhost:3306/datawarehouseanalytics')
conn = engine.connect()

shop.to_sql('shopping_trends', conn, if_exists='replace', index=False)
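
The connection string above embeds the database password directly in the notebook. One common alternative is to read the credentials from environment variables and assemble the URL at run time. The sketch below assumes hypothetical variable names (`SHOP_DB_USER`, `SHOP_DB_PASSWORD`, and so on) and a hypothetical helper `build_mysql_url`; only the URL layout is taken from the cell above.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ) -> str:
    """Assemble a mysql+pymysql URL from environment-style settings."""
    user = env.get("SHOP_DB_USER", "root")
    password = env["SHOP_DB_PASSWORD"]  # required: raises KeyError when unset
    host = env.get("SHOP_DB_HOST", "localhost")
    port = env.get("SHOP_DB_PORT", "3306")
    database = env.get("SHOP_DB_NAME", "datawarehouseanalytics")
    # Escape characters such as '@' or '/' that would otherwise break the URL.
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# Demonstration with an explicit mapping instead of the real environment.
print(build_mysql_url({"SHOP_DB_PASSWORD": "p@ss"}))
# mysql+pymysql://root:p%40ss@localhost:3306/datawarehouseanalytics
```

The returned string can be passed to `create_engine` in place of the literal URL.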

In [None]:
#| echo: false 
%load_ext sql 
%sql mysql+pymysql://root:Karandeep2417@localhost/datawarehouseanalytics
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

In [None]:
#| echo: false
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = 0
%config SqlMagic.displaycon = False

## Exploratory Data Analysis

#### Age & Gender Distribution of Customers

In [None]:
#| code-fold: true
%%sql
SELECT 
Gender,
AVG(Purchase_Amount_USD) AS avg_usd_spent,
SUM(Purchase_Amount_USD) AS total_usd_spent
FROM shopping_trends
GROUP BY 1;

|   | Gender | avg_usd_spent | total_usd_spent |
|---|--------|---------------|-----------------|
| 0 | Male   | 59.5362       | 157890          |
| 1 | Female | 60.2492       | 75191           |


<br>

In [None]:
#| code-fold: true
%%sql
select 
CASE WHEN Age BETWEEN 18 AND 35 THEN "Young" 
	 WHEN Age BETWEEN 36 AND 55 THEN "Middle Age"
	 ELSE "Elderly"
END AS age_group,
COUNT(*) AS customer_count,
AVG(Purchase_Amount_USD) AS avg_usd_spent,
SUM(Purchase_Amount_USD) AS total_usd_spent    
from shopping_trends
GROUP BY 1;

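|   | age_group  | customer_count | avg_usd_spent | total_usd_spent |
|---|------------|----------------|---------------|-----------------|
| 0 | Middle Age | 1482           | 59.9548       | 88853           |
| 1 | Young      | 1313           | 60.1462       | 78972           |
| 2 | Elderly    | 1105           | 59.0552       | 65256           |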
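
A `CASE` expression assigns each row to the first branch that matches, so the bucket boundaries depend on branch order. The sketch below mirrors the age bucketing in plain Python to make the edges explicit; the function name `age_group` is hypothetical. Note that the `ELSE` branch also catches any age below 18, which would be labelled "Elderly" if such rows existed.

```python
def age_group(age: int) -> str:
    """Mirror the SQL CASE expression: the first matching branch wins."""
    if 18 <= age <= 35:
        return "Young"
    if 36 <= age <= 55:
        return "Middle Age"
    # Everything else, including ages under 18, falls through to the ELSE branch.
    return "Elderly"


for age in (17, 18, 35, 36, 55, 56):
    print(age, age_group(age))
```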


<br>

#### Customer Purchase Behaviors based on location, gender, and subscription status.


In [None]:
#| code-fold: true
%%sql
SELECT 
Location,
Gender,
Subscription_Status,
count(*) Customer_Count,
avg(Purchase_Amount_USD) Avg_Dollar_Spent
from shopping_trends
group by 1,2,3
ORDER BY 1 , 2 desc, 3 desc

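|     | Location  | Gender | Subscription_Status | Customer_Count | Avg_Dollar_Spent |
|-----|-----------|--------|---------------------|----------------|------------------|
| 0   | Alabama   | Male   | Yes                 | 22             | 63.4091          |
| 1   | Alabama   | Male   | No                  | 40             | 54.7000          |
| 2   | Alabama   | Female | No                  | 27             | 62.1481          |
| 3   | Alaska    | Male   | Yes                 | 18             | 59.3889          |
| 4   | Alaska    | Male   | No                  | 30             | 68.9333          |
| ... | ...       | ...    | ...                 | ...            | ...              |
| 145 | Wisconsin | Male   | No                  | 37             | 56.5405          |
| 146 | Wisconsin | Female | No                  | 25             | 53.4000          |
| 147 | Wyoming   | Male   | Yes                 | 20             | 58.0000          |
| 148 | Wyoming   | Male   | No                  | 31             | 67.0323          |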


<br>

#### Most preferred product categories and colors by age group

In [None]:
#| code-fold: true
%%sql
with preferred_product as
(select 
CASE WHEN Age BETWEEN 18 AND 35 THEN "Young" 
	 WHEN Age BETWEEN 36 AND 55 THEN "Middle Age"
	 ELSE "Elderly"
END AS age_group,
Category,
Color,
Count(*) total_purchased_items
from shopping_trends
group by 1,2,3
order by 1,2,4 desc),
ranked as 
(
select 
*,
rank() over(partition by age_group, Category order by total_purchased_items desc) as rnk
from  preferred_product
)
select 
age_group,
Category,
Color,
total_purchased_items
from ranked where rnk = 1;


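|   | age_group  | Category    | Color     | total_purchased_items |
|---|------------|-------------|-----------|-----------------------|
| 0 | Elderly    | Accessories | Olive     | 21                    |
| 1 | Elderly    | Clothing    | Charcoal  | 30                    |
| 2 | Elderly    | Footwear    | Olive     | 9                     |
| 3 | Elderly    | Footwear    | Turquoise | 9                     |
| 4 | Elderly    | Footwear    | Violet    | 9                     |
| 5 | Elderly    | Footwear    | Lavender  | 9                     |
| 6 | Elderly    | Footwear    | Maroon    | 9                     |
| 7 | Elderly    | Outerwear   | Teal      | 9                     |
| 8 | Middle Age | Accessories | Peach     | 24                    |
| 9 | Middle Age | Accessories | Red       | 24                    |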
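
Several groups in this result return more than one color because `RANK()` gives tied rows the same rank, so every color tied for first place passes the `rnk = 1` filter. The sketch below reproduces that tie behavior in plain Python. The five Footwear counts come from the output above; the `Gray` entry and the helper name `rank_desc` are hypothetical, added only to show what happens to the row after a tie.

```python
def rank_desc(counts: dict) -> dict:
    """Return SQL RANK()-style ranks when ordering by count, descending."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_count, prev_rank = None, 0
    for position, (key, count) in enumerate(ordered, start=1):
        # Tied counts share a rank; the next distinct count skips ahead.
        rank = prev_rank if count == prev_count else position
        ranks[key] = rank
        prev_count, prev_rank = count, rank
    return ranks


elderly_footwear = {
    "Olive": 9, "Turquoise": 9, "Violet": 9, "Lavender": 9, "Maroon": 9,
    "Gray": 7,  # hypothetical row, not taken from the dataset
}
ranks = rank_desc(elderly_footwear)
winners = [color for color, r in ranks.items() if r == 1]
print(winners)        # the five tied colors
print(ranks["Gray"])  # 6, because RANK() leaves a gap after ties
```

If exactly one row per group is wanted, `ROW_NUMBER()` with an explicit tie-breaker in the `ORDER BY` is the usual substitute.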


<br>

#### Do customers with higher previous purchases tend to spend more per transaction?



In [None]:
#| code-fold: true
%%sql
select 
Customer_ID,
avg(Purchase_Amount_USD) avg_dollar_spent_per_transaction,
sum(Previous_Purchases) previous_purchase
from shopping_trends
group by 1
order by 3 desc

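|      | Customer_ID | avg_dollar_spent_per_transaction | previous_purchase |
|------|-------------|----------------------------------|-------------------|
| 0    | 3257        | 26.0000                          | 50                |
| 1    | 3262        | 52.0000                          | 50                |
| 2    | 3394        | 90.0000                          | 50                |
| 3    | 3436        | 91.0000                          | 50                |
| 4    | 3438        | 62.0000                          | 50                |
| ...  | ...         | ...                              | ...               |
| 3895 | 3729        | 31.0000                          | 1                 |
| 3896 | 3792        | 51.0000                          | 1                 |
| 3897 | 3803        | 72.0000                          | 1                 |
| 3898 | 3865        | 99.0000                          | 1                 |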
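
A sorted listing like the one above does not by itself answer the question: the visible rows with 50 previous purchases range from 26 to 91 dollars, and those with 1 previous purchase range from 31 to 99. A correlation coefficient summarizes the relationship in one number. The sketch below is a plain-Python Pearson correlation; the function name `pearson` is hypothetical, and the two small input lists are toy data chosen for illustration, not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy data only: a perfectly linear relationship gives a coefficient of 1.
toy_previous = [1, 2, 3, 4, 5]
toy_spend = [10, 20, 30, 40, 50]
print(pearson(toy_previous, toy_spend))
```

A value near 0 on the real columns would indicate that purchase history says little about spend per transaction. With the query result loaded as a DataFrame, pandas offers the same calculation through `Series.corr`.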


<br>

#### Average purchase amount per category, and the product types that generate the most revenue


In [None]:
#| code-fold: true
%%sql
select 
Category,
avg(Purchase_Amount_USD) avg_dollar_spent
from shopping_trends
group by 1 
order by 1

|   | Category    | avg_dollar_spent |
|---|-------------|------------------|
| 0 | Accessories | 59.8387          |
| 1 | Clothing    | 60.0253          |
| 2 | Footwear    | 60.2554          |
| 3 | Outerwear   | 57.1728          |


<br>

In [None]:
#| code-fold: true
%%sql
select 
Item_Purchased as Product_Type,
sum(Purchase_Amount_USD) Revenue_Generated
from shopping_trends
group by 1 
order by 2 desc

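|   | Product_Type | Revenue_Generated |
|---|--------------|-------------------|
| 0 | Blouse       | 10410             |
| 1 | Shirt        | 10332             |
| 2 | Dress        | 10320             |
| 3 | Pants        | 10090             |
| 4 | Jewelry      | 10010             |
| 5 | Sunglasses   | 9649              |
| 6 | Belt         | 9635              |
| 7 | Scarf        | 9561              |
| 8 | Sweater      | 9462              |
| 9 | Shorts       | 9433              |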


#### Impact of purchase frequency on total spending.



In [None]:
#| code-fold: true
%%sql
select 
Frequency_of_Purchases,
sum(Purchase_Amount_USD) Dollar_Spent
from shopping_trends
group by 1
order by 2 desc; 

|   | Frequency_of_Purchases | Dollar_Spent |
|---|------------------------|--------------|
| 0 | Every 3 Months         | 35088        |
| 1 | Annually               | 34419        |
| 2 | Quarterly              | 33771        |
| 3 | Bi-Weekly              | 33200        |
| 4 | Monthly                | 32810        |
| 5 | Fortnightly            | 32007        |
| 6 | Weekly                 | 31786        |
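
Some of these labels appear to describe the same cadence: "Every 3 Months" and "Quarterly", and possibly "Bi-Weekly" and "Fortnightly". The sketch below merges them before comparing totals, using the figures from the output above. The mapping is an assumption: in particular, it reads "Bi-Weekly" as every two weeks rather than twice a week, which the dataset does not confirm.

```python
# Totals copied from the query output above.
totals = {
    "Every 3 Months": 35088,
    "Annually": 34419,
    "Quarterly": 33771,
    "Bi-Weekly": 33200,
    "Monthly": 32810,
    "Fortnightly": 32007,
    "Weekly": 31786,
}

# Assumed synonyms; adjust if the labels mean something else in the source data.
CANONICAL = {"Every 3 Months": "Quarterly", "Fortnightly": "Bi-Weekly"}

merged = {}
for label, dollars in totals.items():
    key = CANONICAL.get(label, label)
    merged[key] = merged.get(key, 0) + dollars

for label, dollars in sorted(merged.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<10} {dollars}")
```

Under this mapping, Quarterly (68859) and Bi-Weekly (65207) total roughly twice the other groups, which reflects the merged labels rather than a difference in spending behavior.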


<br>

#### Most preferred payment method and its impact on total spending



In [None]:
#| code-fold: true
%%sql
select 
Payment_Method,
count(*) count,
sum(Purchase_Amount_USD) Total_Dollars_Spent
from shopping_trends
group by 1
order by 1 desc,2 desc;

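|   | Payment_Method | count | Total_Dollars_Spent |
|---|----------------|-------|---------------------|
| 0 | Venmo          | 653   | 39991               |
| 1 | PayPal         | 638   | 37449               |
| 2 | Debit Card     | 633   | 37118               |
| 3 | Credit Card    | 696   | 42567               |
| 4 | Cash           | 648   | 38833               |
| 5 | Bank Transfer  | 632   | 37123               |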
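
The result above is ordered by payment method name, so the most preferred method has to be read off the `count` column. The sketch below re-sorts the same rows by transaction count and adds the average spend per transaction; the variable names are hypothetical and the numbers are copied from the output above.

```python
# (method, transaction count, total dollars) copied from the query output above.
rows = [
    ("Venmo", 653, 39991),
    ("PayPal", 638, 37449),
    ("Debit Card", 633, 37118),
    ("Credit Card", 696, 42567),
    ("Cash", 648, 38833),
    ("Bank Transfer", 632, 37123),
]

# Sort by how often each method is used, most frequent first.
by_count = sorted(rows, key=lambda r: r[1], reverse=True)
most_preferred = by_count[0][0]
print(most_preferred)  # Credit Card

# Average spend per transaction shows whether a method's total is driven by volume alone.
for method, count, total in by_count:
    print(f"{method:<14} {count:>4} {total / count:.2f}")
```

In SQL, the equivalent change is ordering by the count column instead of the method name.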


<br>

#### What are the Seasonal Product Trends?


In [None]:
#| code-fold: true
%%sql
with trends as 
(select 
Season,
Item_Purchased as Product,
count(*) Items_Purchased
from shopping_trends
group by 1,2
order by 1,3 desc
),
ranked as (
select 
*,
rank() over(PARTITION BY Season order by Items_Purchased desc) as rnk
from trends
)
select Season,Product,Items_Purchased 
from ranked
where rnk between 1 and 3;

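|   | Season | Product | Items_Purchased |
|---|--------|---------|-----------------|
| 0 | Fall   | Jacket  | 54              |
| 1 | Fall   | Hat     | 50              |
| 2 | Fall   | Handbag | 48              |
| 3 | Spring | Sweater | 52              |
| 4 | Spring | Shorts  | 47              |
| 5 | Spring | Blouse  | 46              |
| 6 | Spring | Coat    | 46              |
| 7 | Spring | Skirt   | 46              |
| 8 | Summer | Pants   | 50              |
| 9 | Summer | Jewelry | 47              |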


<br>

#### Impact of discounts and promo codes on sales



In [None]:
#| code-fold: true
%%sql
select 
Discount_Applied,
Promo_Code_Used ,
count(*) Count,
sum(Purchase_Amount_USD) Sales,
avg(Purchase_Amount_USD) Avg_Dollar_Spent
from shopping_trends
GROUP BY 1,2

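|   | Discount_Applied | Promo_Code_Used | Count | Sales  | Avg_Dollar_Spent |
|---|------------------|-----------------|-------|--------|------------------|
| 0 | Yes              | Yes             | 1677  | 99411  | 59.2791          |
| 1 | No               | No              | 2223  | 133670 | 60.1305          |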


<br>

#### Impact of Subscription Status on Spending Habits



In [None]:
#| code-fold: true
%%sql
select 
Subscription_Status, 
Gender,
count(*) Cust_Count,
avg(Purchase_Amount_USD) Avg_Dollar_Spent,
sum(Purchase_Amount_USD) Revenue
from shopping_trends
group by 1,2

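|   | Subscription_Status | Gender | Cust_Count | Avg_Dollar_Spent | Revenue |
|---|---------------------|--------|------------|------------------|---------|
| 0 | Yes                 | Male   | 1053       | 59.4919          | 62645   |
| 1 | No                  | Male   | 1599       | 59.5654          | 95245   |
| 2 | No                  | Female | 1248       | 60.2492          | 75191   |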


<br>

#### Impact of shipping type on total purchase value



In [None]:
#| code-fold: true
%%sql
select
Shipping_Type,
count(*) Count,
sum(Purchase_Amount_USD) Total_Purchase_Value,
avg(Purchase_Amount_USD) Avg_Purchase_Value
from shopping_trends
group by 1 
order by 3 desc;

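|   | Shipping_Type  | Count | Total_Purchase_Value | Avg_Purchase_Value |
|---|----------------|-------|----------------------|--------------------|
| 0 | Free Shipping  | 675   | 40777                | 60.4104            |
| 1 | Express        | 646   | 39067                | 60.4752            |
| 2 | Store Pickup   | 650   | 38931                | 59.8938            |
| 3 | Standard       | 654   | 38233                | 58.4602            |
| 4 | 2-Day Shipping | 627   | 38080                | 60.7337            |
| 5 | Next Day Air   | 648   | 37993                | 58.6312            |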


<br>

#### Correlation between review rating and avg purchase value.
 


In [None]:
#| code-fold: true
%%sql
select
Category,
Item_Purchased Product,
round(avg(Review_Rating),2) Avg_Review ,
AVG(Purchase_Amount_USD) avg_spent
from shopping_trends
group by 1,2
order by 1,4 desc;

|   | Category    | Product    | Avg_Review | avg_spent |
|---|-------------|------------|------------|-----------|
| 0 | Accessories | Scarf      | 3.7        | 60.8981   |
| 1 | Accessories | Hat        | 3.81       | 60.8766   |
| 2 | Accessories | Gloves     | 3.86       | 60.55     |
| 3 | Accessories | Backpack   | 3.75       | 60.3916   |
| 4 | Accessories | Sunglasses | 3.74       | 59.9317   |
| 5 | Accessories | Belt       | 3.76       | 59.8447   |
| 6 | Accessories | Jewelry    | 3.76       | 58.538    |
| 7 | Accessories | Handbag    | 3.78       | 57.8889   |
| 8 | Clothing    | T-shirt    | 3.78       | 62.9116   |
| 9 | Clothing    | Dress      | 3.75       | 62.1687   |


<br>

#### Free Shipping vs Express (Insights)



In [None]:
#| code-fold: true
%%sql
with shipping as
(select
Shipping_Type,
count(*) Count,
avg(Purchase_Amount_USD) Avg_Purchase_Value
from shopping_trends
group by 1 
order by 3 desc
)
select * from shipping
where Shipping_Type in('Free Shipping','Express')

|   | Shipping_Type | Count | Avg_Purchase_Value |
|---|---------------|-------|--------------------|
| 0 | Express       | 646   | 60.4752            |
| 1 | Free Shipping | 675   | 60.4104            |
