# DESCRIPTIVE ANALYTICS: "WHAT HAPPENED?"

## E-commerce Customer Analytics - Part 1 of 4

OBJECTIVE: Understand historical patterns and trends in our e-commerce data
- Sales performance over time
- Customer behavior patterns  
- Product performance analysis
- Revenue trends and seasonality

### 1. DATA LOADING & INITIAL EXPLORATION

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set styling
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("🔍 DESCRIPTIVE ANALYTICS: What Happened?")
print("="*50)

🔍 DESCRIPTIVE ANALYTICS: What Happened?


In [5]:
print("\n📊 STEP 1: Loading E-commerce Data")
print("-" * 30)

# Load datasets
customers = pd.read_csv('dataset/customers.csv')
products = pd.read_csv('dataset/products.csv')
transactions = pd.read_csv('dataset/transactions.csv')
campaigns = pd.read_csv('dataset/marketing_campaigns.csv')
tickets = pd.read_csv('dataset/support_tickets.csv')

# Convert date columns
transactions['transaction_date'] = pd.to_datetime(transactions['transaction_date'])
customers['registration_date'] = pd.to_datetime(customers['registration_date'])

print(f"✅ Loaded {len(customers):,} customers")
print(f"✅ Loaded {len(products):,} products")
print(f"✅ Loaded {len(transactions):,} transactions")
print(f"✅ Loaded {len(campaigns):,} marketing campaigns")
print(f"✅ Loaded {len(tickets):,} support tickets")

# Basic data overview
print(f"\n📅 Transaction Date Range: {transactions['transaction_date'].min().date()} to {transactions['transaction_date'].max().date()}")
print(f"💰 Total Revenue: ${transactions['total_amount'].sum():,.2f}")
print(f"🛒 Average Order Value: ${transactions['total_amount'].mean():.2f}")


📊 STEP 1: Loading E-commerce Data
------------------------------
✅ Loaded 5,000 customers
✅ Loaded 1,000 products
✅ Loaded 50,000 transactions
✅ Loaded 20 marketing campaigns
✅ Loaded 2,000 support tickets

📅 Transaction Date Range: 2022-01-01 to 2024-12-30
💰 Total Revenue: $2,277,263.58
🛒 Average Order Value: $45.55
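
Before aggregating, it may be worth confirming that the tables join cleanly. The following is a minimal sketch on toy frames rather than the notebook's data. The column names mirror the ones used above, while `check_integrity` and all toy values are hypothetical.

```python
import pandas as pd

def check_integrity(transactions: pd.DataFrame, customers: pd.DataFrame) -> dict:
    """Return simple data-quality counts for a transactions/customers pair."""
    return {
        # Rows with at least one missing value
        "rows_with_nulls": int(transactions.isna().any(axis=1).sum()),
        # Repeat occurrences of a transaction_id (the first occurrence is not counted)
        "duplicate_ids": int(transactions["transaction_id"].duplicated().sum()),
        # Transactions whose customer_id has no match in the customers table
        "orphan_customers": int(
            (~transactions["customer_id"].isin(customers["customer_id"])).sum()
        ),
    }

# Toy data (hypothetical values, for illustration only)
toy_customers = pd.DataFrame({"customer_id": [1, 2]})
toy_transactions = pd.DataFrame({
    "transaction_id": [10, 11, 11, 12],
    "customer_id": [1, 2, 2, 3],
    "total_amount": [20.0, None, 35.5, 12.0],
})

report = check_integrity(toy_transactions, toy_customers)
print(report)
```

Non-zero counts would suggest cleaning the data before trusting the revenue totals.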


### 2. SALES PERFORMANCE ANALYSIS

In [6]:
print("\n\n📈 STEP 2: Sales Performance Analysis")
print("-" * 35)

# Monthly sales trend
transactions['year_month'] = transactions['transaction_date'].dt.to_period('M')
monthly_sales = transactions.groupby('year_month').agg({
    'total_amount': 'sum',
    'transaction_id': 'count',
    'customer_id': 'nunique'
}).round(2)

monthly_sales.columns = ['Total Revenue', 'Total Orders', 'Unique Customers']
monthly_sales['Avg Order Value'] = (monthly_sales['Total Revenue'] / monthly_sales['Total Orders']).round(2)

print("📊 Monthly Sales Summary (Last 6 months):")
print(monthly_sales.tail(6))

# Revenue by year
yearly_revenue = transactions.groupby(transactions['transaction_date'].dt.year)['total_amount'].sum()
print(f"\n💰 Revenue by Year:")
for year, revenue in yearly_revenue.items():
    print(f"   {year}: ${revenue:,.2f}")

# Growth rates
revenue_growth = yearly_revenue.pct_change() * 100
print(f"\n📈 Year-over-Year Growth:")
for year, growth in revenue_growth.dropna().items():
    print(f"   {year}: {growth:+.1f}%")



📈 STEP 2: Sales Performance Analysis
-----------------------------------
📊 Monthly Sales Summary (Last 6 months):
            Total Revenue  Total Orders  Unique Customers  Avg Order Value
year_month                                                                
2024-07          62317.78          1421              1212            43.85
2024-08          67406.56          1447              1261            46.58
2024-09          64634.70          1385              1208            46.67
2024-10          62254.32          1449              1248            42.96
2024-11          64275.24          1354              1190            47.47
2024-12          63223.42          1353              1192            46.73

💰 Revenue by Year:
   2022: $760,491.67
   2023: $767,265.27
   2024: $749,506.64

📈 Year-over-Year Growth:
   2023: +0.9%
   2024: -2.3%
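
As a worked check of the growth figures, the sketch below recomputes them from the yearly totals printed above. `pct_change()` is the same call used in the cell; the hand-written line for 2023 is only there to show the formula.

```python
import pandas as pd

# Yearly revenue as printed in the output above
yearly_revenue = pd.Series({2022: 760491.67, 2023: 767265.27, 2024: 749506.64})

# pct_change() computes (current - previous) / previous for each year
growth_pct = yearly_revenue.pct_change() * 100

# The same calculation written out by hand for 2023
manual_2023 = (767265.27 - 760491.67) / 760491.67 * 100

print(growth_pct.round(1).to_dict())
print(round(manual_2023, 1))
```

The first year has no predecessor, so its growth is `NaN`, which is why the cell calls `dropna()` before printing.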


### 3. CUSTOMER BEHAVIOR ANALYSIS

In [7]:
print("\n\n👥 STEP 3: Customer Behavior Analysis")
print("-" * 35)

# Customer segments analysis
segment_analysis = customers.groupby('customer_segment').agg({
    'customer_id': 'count',
    'total_spent': 'mean',
    'total_transactions': 'mean',
    'avg_order_value': 'mean',
    'is_churned': 'mean'
}).round(2)

segment_analysis.columns = ['Customer Count', 'Avg Total Spent', 'Avg Transactions', 'Avg Order Value', 'Churn Rate']
print("🎯 Customer Segment Analysis:")
print(segment_analysis)

# Age group analysis
# pd.cut intervals are right-inclusive: (0, 25], (25, 35], ..., (55, 100]
customers['age_group'] = pd.cut(customers['age'],
                               bins=[0, 25, 35, 45, 55, 100],
                               labels=['18-25', '26-35', '36-45', '46-55', '56+'])

age_analysis = customers.groupby('age_group').agg({
    'customer_id': 'count',
    'total_spent': 'mean',
    'is_churned': 'mean'
}).round(2)

print(f"\n👶 Customer Age Group Analysis:")
print(age_analysis)

# Top spending customers
top_customers = customers.nlargest(10, 'total_spent')[['customer_id', 'first_name', 'last_name', 'total_spent', 'total_transactions']]
print(f"\n🏆 Top 10 Customers by Spending:")
print(top_customers)



👥 STEP 3: Customer Behavior Analysis
-----------------------------------
🎯 Customer Segment Analysis:
                  Customer Count  Avg Total Spent  Avg Transactions  \
customer_segment                                                      
Budget                      1306           450.56             10.08   
Premium                      763           457.40              9.99   
Regular                     2931           457.12              9.97   

                  Avg Order Value  Churn Rate  
customer_segment                               
Budget                      44.99        0.31  
Premium                     45.66        0.29  
Regular                     45.90        0.29  

👶 Customer Age Group Analysis:
           customer_id  total_spent  is_churned
age_group                                      
18-25             1125       456.91        0.27
26-35             1514       450.14        0.32
36-45             1470       456.94        0.30
46-55              697      
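
The age bins are easier to read once it is clear which side of each edge a boundary age falls on. A small sketch with the same bin edges as the cell above; the open-ended label is written here as `56+` to match the edge, which is a choice made for this illustration.

```python
import pandas as pd

bins = [0, 25, 35, 45, 55, 100]
labels = ["18-25", "26-35", "36-45", "46-55", "56+"]

# Boundary ages chosen to show which side of each edge they land on
ages = pd.Series([25, 26, 55, 56])
groups = pd.cut(ages, bins=bins, labels=labels)  # right=True by default

print(groups.astype(str).tolist())
```

With the default `right=True`, age 25 lands in the first bin and age 55 in the `46-55` bin.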

### 4. PRODUCT PERFORMANCE ANALYSIS

In [8]:
print("\n\n📦 STEP 4: Product Performance Analysis")
print("-" * 35)

# Product sales analysis
product_sales = transactions.groupby('product_id').agg({
    'quantity': 'sum',
    'total_amount': 'sum',
    'transaction_id': 'count'
}).round(2)

product_sales.columns = ['Total Quantity Sold', 'Total Revenue', 'Number of Orders']
product_sales = product_sales.merge(products[['product_id', 'product_name', 'category', 'price']],
                                   on='product_id', how='left')

# Top products by revenue
top_products_revenue = product_sales.nlargest(10, 'Total Revenue')
print("💰 Top 10 Products by Revenue:")
print(top_products_revenue[['product_name', 'category', 'Total Revenue', 'Total Quantity Sold']])

# Category performance
category_performance = product_sales.groupby('category').agg({
    'Total Revenue': 'sum',
    'Total Quantity Sold': 'sum',
    'Number of Orders': 'sum'
}).round(2)

category_performance = category_performance.sort_values('Total Revenue', ascending=False)
print(f"\n🏷️ Category Performance:")
print(category_performance)




📦 STEP 4: Product Performance Analysis
-----------------------------------
💰 Top 10 Products by Revenue:
                                          product_name       category  \
656                  Balanced methodical workforce Bar  Home & Garden   
228                   Mandatory holistic adapter Agent         Beauty   
154             Switchable non-volatile protocol There  Home & Garden   
857        Virtual motivating process improvement Sure    Electronics   
324                     Profound secondary access Need         Beauty   
43            Polarized bi-directional core Conference       Clothing   
512                 Adaptive intermediate solution Age          Books   
382      Quality-focused bottom-line flexibility Bring         Beauty   
26               Enhanced encompassing approach Nature           Toys   
658  Synchronized 3rdgeneration collaboration Consider     Automotive   

     Total Revenue  Total Quantity Sold  
656       26171.31                   97  
228  
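
A left merge leaves `NaN` in `category` for any product ID missing from the product table, and those rows then drop out of a `groupby('category')`. The sketch below shows one way to surface such rows, using toy frames with hypothetical values. `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Toy frames (hypothetical values) mirroring the columns used above
toy_sales = pd.DataFrame({"product_id": [1, 2, 3], "Total Revenue": [100.0, 50.0, 75.0]})
toy_products = pd.DataFrame({"product_id": [1, 2], "category": ["Books", "Toys"]})

merged = toy_sales.merge(
    toy_products,
    on="product_id",
    how="left",
    validate="many_to_one",  # raises MergeError if product_id repeats in toy_products
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Product IDs that had sales but no matching product record
unmatched = merged.loc[merged["_merge"] == "left_only", "product_id"].tolist()
print(unmatched)
```

An empty list would mean every sold product has a category, so category totals add up to total revenue.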

### 5. SEASONAL TRENDS ANALYSIS

In [9]:
print("\n\n🗓️ STEP 5: Seasonal Trends Analysis")
print("-" * 30)

# Monthly seasonality
transactions['month'] = transactions['transaction_date'].dt.month
monthly_seasonality = transactions.groupby('month').agg({
    'total_amount': ['sum', 'mean', 'count']
}).round(2)

monthly_seasonality.columns = ['Total Revenue', 'Avg Order Value', 'Order Count']
print("📅 Monthly Seasonality Pattern:")
print(monthly_seasonality)

# Day of week analysis
transactions['day_of_week'] = transactions['transaction_date'].dt.day_name()
dow_analysis = transactions.groupby('day_of_week')['total_amount'].agg(['sum', 'count', 'mean']).round(2)
dow_analysis.columns = ['Total Revenue', 'Order Count', 'Avg Order Value']

# Reorder by actual day sequence
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_analysis = dow_analysis.reindex(day_order)
print(f"\n📆 Day of Week Analysis:")
print(dow_analysis)



🗓️ STEP 5: Seasonal Trends Analysis
------------------------------
📅 Monthly Seasonality Pattern:
       Total Revenue  Avg Order Value  Order Count
month                                             
1          191258.79            46.63         4102
2          177334.73            45.40         3906
3          191082.02            44.36         4308
4          193666.29            46.58         4158
5          193188.23            45.21         4273
6          186654.21            45.07         4141
7          192916.83            45.48         4242
8          193376.14            45.43         4257
9          188815.05            46.05         4100
10         189841.40            44.73         4244
11         187481.20            46.23         4055
12         191648.69            45.48         4214

📆 Day of Week Analysis:
             Total Revenue  Order Count  Avg Order Value
day_of_week                                             
Monday           322213.55         7164        
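
Calendar months differ in length, so monthly totals alone can overstate seasonality. A rough sketch that divides the totals printed above by the number of days each month contributes across 2022-2024. It assumes every month is fully covered, although the data ends on 2024-12-30.

```python
import calendar
import pandas as pd

# Total revenue per calendar month, copied from the output above
monthly_revenue = pd.Series({
    1: 191258.79, 2: 177334.73, 3: 191082.02, 4: 193666.29,
    5: 193188.23, 6: 186654.21, 7: 192916.83, 8: 193376.14,
    9: 188815.05, 10: 189841.40, 11: 187481.20, 12: 191648.69,
})

# Days contributed by each calendar month across 2022-2024
# (assumption: full coverage of every month)
days = pd.Series({
    m: sum(calendar.monthrange(y, m)[1] for y in (2022, 2023, 2024))
    for m in range(1, 13)
})

revenue_per_day = (monthly_revenue / days).round(2)
print(revenue_per_day)
```

February has the lowest total of the twelve months, yet its per-day revenue comes out higher than January's, so the dip reflects the shorter month more than weaker sales.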

### 6. CUSTOMER LIFECYCLE ANALYSIS

In [10]:
print("\n\n🔄 STEP 6: Customer Lifecycle Analysis")
print("-" * 35)

# Customer acquisition by month
customers['reg_year_month'] = pd.to_datetime(customers['registration_date']).dt.to_period('M')
acquisition = customers.groupby('reg_year_month').size()
print("📈 Customer Acquisition Trend (Last 6 months):")
print(acquisition.tail(6))

# Churn analysis
churn_rate = customers['is_churned'].mean() * 100
active_customers = customers[customers['is_churned'] == 0]
churned_customers = customers[customers['is_churned'] == 1]

print(f"\n⚠️ Churn Analysis:")
print(f"   Overall Churn Rate: {churn_rate:.1f}%")
print(f"   Active Customers: {len(active_customers):,}")
print(f"   Churned Customers: {len(churned_customers):,}")

# Days since last purchase distribution
print(f"\n📊 Days Since Last Purchase Distribution:")
print(customers['days_since_last_purchase'].describe())



🔄 STEP 6: Customer Lifecycle Analysis
-----------------------------------
📈 Customer Acquisition Trend (Last 6 months):
reg_year_month
2025-03    170
2025-04    142
2025-05    141
2025-06    137
2025-07    139
2025-08     70
Freq: M, dtype: int64

⚠️ Churn Analysis:
   Overall Churn Rate: 29.7%
   Active Customers: 3,513
   Churned Customers: 1,487

📊 Days Since Last Purchase Distribution:
count    5000.000000
mean      339.819200
std       108.883665
min       231.000000
25%       262.000000
50%       307.000000
75%       382.250000
max      1128.000000
Name: days_since_last_purchase, dtype: float64
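
The final month in the acquisition table (2025-08, 70 registrations) is about half of the preceding months, which may simply mean the data ends partway through August. Assuming that is the cause, a trend reading could exclude the incomplete month. The helper below is a hypothetical heuristic, shown on toy dates.

```python
import pandas as pd

def complete_month_counts(dates: pd.Series) -> pd.Series:
    """Count rows per month, dropping the final month if it looks partial."""
    periods = dates.dt.to_period("M")
    counts = dates.groupby(periods).size()
    last_period = counts.index[-1]
    # Heuristic: if the latest observed date falls before the month's last day,
    # treat that month as incomplete
    if dates.max().normalize() < last_period.end_time.normalize():
        counts = counts.iloc[:-1]
    return counts

# Toy registration dates (hypothetical): July is complete, August stops on the 5th
toy_dates = pd.Series(pd.to_datetime(["2025-07-01", "2025-07-31", "2025-08-05"]))
result = complete_month_counts(toy_dates)
print(result)
```

The heuristic would also drop a complete month that happens to have no rows on its last day, so an explicit data cutoff date is preferable when one is known.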


### 7. REVENUE CONCENTRATION ANALYSIS

In [11]:
print("\n\n💎 STEP 7: Revenue Concentration Analysis")
print("-" * 40)

# Pareto analysis (80/20 rule)
customers_sorted = customers.sort_values('total_spent', ascending=False).reset_index(drop=True)
customers_sorted['cumulative_revenue'] = customers_sorted['total_spent'].cumsum()
customers_sorted['cumulative_revenue_pct'] = (customers_sorted['cumulative_revenue'] / customers_sorted['total_spent'].sum()) * 100
customers_sorted['customer_pct'] = ((customers_sorted.index + 1) / len(customers_sorted)) * 100

# Find 80% revenue point
pct_80_revenue = customers_sorted[customers_sorted['cumulative_revenue_pct'] <= 80].shape[0]
pct_80_customers = (pct_80_revenue / len(customers_sorted)) * 100

print(f"📊 Pareto Analysis (80/20 Rule):")
print(f"   Top {pct_80_customers:.1f}% of customers generate 80% of revenue")
print(f"   This is {pct_80_revenue:,} customers out of {len(customers_sorted):,} total")



💎 STEP 7: Revenue Concentration Analysis
----------------------------------------
📊 Pareto Analysis (80/20 Rule):
   Top 63.0% of customers generate 80% of revenue
   This is 3,148 customers out of 5,000 total
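
The `<= 80` filter counts customers up to, but not including, the one who pushes the cumulative share past 80%, so it can undercount by one. On 5,000 customers the difference is immaterial, but the threshold can be located directly. A minimal sketch with NumPy; `customers_for_share` and the toy values are hypothetical.

```python
import numpy as np

def customers_for_share(spend, share=0.80):
    """Smallest number of top spenders whose combined spend reaches `share`."""
    ordered = np.sort(np.asarray(spend, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(ordered) / ordered.sum()
    # First position where the cumulative share is >= share, as a 1-based count
    position = int(np.searchsorted(cumulative, share, side="left")) + 1
    return min(position, len(ordered))

# Toy spend values (hypothetical): cumulative shares are 40%, 70%, 90%, 100%
toy_spend = [40, 30, 20, 10]
print(customers_for_share(toy_spend))
```

For these toy values the `<= 80` filter would return two customers, who together account for only 70% of spend.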


### 8. BUSINESS METRICS SUMMARY

In [12]:
print("\n\n📋 STEP 8: Key Business Metrics Summary")
print("-" * 40)

# Calculate key metrics
total_revenue = transactions['total_amount'].sum()
total_orders = len(transactions)
total_customers = len(customers)
active_customers_count = len(customers[customers['is_churned'] == 0])
avg_order_value = transactions['total_amount'].mean()
avg_customer_value = customers['total_spent'].mean()

# Recent period metrics (last 30 days)
recent_date = transactions['transaction_date'].max()
last_30_days = recent_date - timedelta(days=30)
recent_transactions = transactions[transactions['transaction_date'] >= last_30_days]

recent_revenue = recent_transactions['total_amount'].sum()
recent_orders = len(recent_transactions)
recent_customers = recent_transactions['customer_id'].nunique()

print(f"🎯 KEY BUSINESS METRICS")
print(f"{'='*25}")
print(f"📊 Overall Performance:")
print(f"   Total Revenue: ${total_revenue:,.2f}")
print(f"   Total Orders: {total_orders:,}")
print(f"   Total Customers: {total_customers:,}")
print(f"   Active Customers: {active_customers_count:,}")
print(f"   Average Order Value: ${avg_order_value:.2f}")
print(f"   Average Customer Value: ${avg_customer_value:.2f}")
print(f"   Churn Rate: {churn_rate:.1f}%")

print(f"\n📈 Last 30 Days:")
print(f"   Revenue: ${recent_revenue:,.2f}")
print(f"   Orders: {recent_orders:,}")
print(f"   Customers with Purchases: {recent_customers:,}")
print(f"   Daily Avg Revenue: ${recent_revenue/30:.2f}")



📋 STEP 8: Key Business Metrics Summary
----------------------------------------
🎯 KEY BUSINESS METRICS
📊 Overall Performance:
   Total Revenue: $2,277,263.58
   Total Orders: 50,000
   Total Customers: 5,000
   Active Customers: 3,513
   Average Order Value: $45.55
   Average Customer Value: $455.45
   Churn Rate: 29.7%

📈 Last 30 Days:
   Revenue: $66,217.01
   Orders: 1,405
   Customers with Purchases: 1,227
   Daily Avg Revenue: $2207.23
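
One detail of the recent-period window: if `transaction_date` holds date-only values (an assumption, since the raw column is not shown), the filter `>= max - 30 days` keeps both endpoints and therefore spans 31 calendar dates, while the daily average divides by 30. The toy data below is hypothetical and only illustrates the count.

```python
import pandas as pd

# Toy data (hypothetical): one transaction of 1.0 per day
toy = pd.DataFrame({
    "transaction_date": pd.date_range("2024-11-01", "2024-12-30", freq="D"),
    "total_amount": 1.0,
})

latest = toy["transaction_date"].max()
cutoff = latest - pd.Timedelta(days=30)

inclusive = toy[toy["transaction_date"] >= cutoff]  # both endpoints kept
exclusive = toy[toy["transaction_date"] > cutoff]   # cutoff day dropped

n_inclusive = inclusive["transaction_date"].nunique()
n_exclusive = exclusive["transaction_date"].nunique()
print(n_inclusive, n_exclusive)
```

Using a strict `>` comparison, or dividing by the number of distinct dates in the window, keeps the numerator and denominator consistent.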


### 9. INSIGHTS & RECOMMENDATIONS FOR NEXT STEPS

In [13]:
print(f"\n\n💡 KEY INSIGHTS FROM DESCRIPTIVE ANALYSIS")
print("="*50)

print("✅ WHAT WE DISCOVERED:")
print("   1. Revenue is nearly flat: +0.9% in 2023, -2.3% in 2024, with no strong monthly seasonality")
print("   2. Customer segments look alike: avg spend $451-$457, churn 29-31%")
print("   3. Product categories perform differently")
print("   4. Overall churn of 29.7% indicates retention opportunities")
print("   5. Revenue is less concentrated than 80/20: the top 63% of customers generate 80%")

print(f"\n🔍 QUESTIONS FOR NEXT ANALYSIS (Diagnostic):")
print("   ❓ WHY did revenue dip 2.3% in 2024?")
print("   ❓ WHY do customers churn?")
print("   ❓ WHY do some products underperform?")
print("   ❓ WHY do customer segments look so similar?")

print(f"\n➡️  NEXT: Diagnostic Analytics - Understanding the 'WHY'")
print("="*50)

# Optional: Save summary metrics for next notebooks
summary_metrics = {
    'total_revenue': total_revenue,
    'total_customers': total_customers,
    'churn_rate': churn_rate,
    'avg_order_value': avg_order_value,
    'analysis_date': datetime.now().date()
}

# Save for next notebooks
import json
with open('descriptive_summary.json', 'w') as f:
    json.dump(summary_metrics, f, default=str)

print("\n✅ Descriptive analysis complete! Summary saved for next analysis.")



💡 KEY INSIGHTS FROM DESCRIPTIVE ANALYSIS
✅ WHAT WE DISCOVERED:
   1. Revenue is nearly flat: +0.9% in 2023, -2.3% in 2024, with no strong monthly seasonality
   2. Customer segments look alike: avg spend $451-$457, churn 29-31%
   3. Product categories perform differently
   4. Overall churn of 29.7% indicates retention opportunities
   5. Revenue is less concentrated than 80/20: the top 63% of customers generate 80%

🔍 QUESTIONS FOR NEXT ANALYSIS (Diagnostic):
   ❓ WHY did revenue dip 2.3% in 2024?
   ❓ WHY do customers churn?
   ❓ WHY do some products underperform?
   ❓ WHY do customer segments look so similar?

➡️  NEXT: Diagnostic Analytics - Understanding the 'WHY'

✅ Descriptive analysis complete! Summary saved for next analysis.
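
Because `json.dump(..., default=str)` writes the date as text, a later notebook reading the file gets a string back rather than a date. The sketch below shows a round trip under that assumption. The metric values are stand-ins and the file is written to a temporary directory so nothing real is overwritten.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

# Stand-in values (hypothetical) with the same keys as summary_metrics
summary = {
    "total_revenue": 1000.0,
    "total_customers": 10,
    "churn_rate": 30.0,
    "avg_order_value": 50.0,
    "analysis_date": date(2025, 1, 1),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "descriptive_summary.json"
    with open(path, "w") as f:
        json.dump(summary, f, default=str)  # the date is written as "2025-01-01"
    with open(path) as f:
        loaded = json.load(f)

# Convert the ISO string back into a date object
loaded["analysis_date"] = date.fromisoformat(loaded["analysis_date"])
print(loaded)
```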
