# Customer Behavior Descriptive Analysis

**Course**: IIMK's Professional Certificate in Data Science and Artificial Intelligence for Managers  
**Student Name**: Lalit Nayyar  
**Email ID**: lalitnayyar@gmail.com  
**Assignment Name**: Week 4: Required Assignment 4.1

## Descriptive Analytics Section
In this section, we'll perform detailed descriptive analytics to understand customer behavior patterns and trends.

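```python
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime

# Set visualization style
# (the bare 'seaborn' Matplotlib style was deprecated in Matplotlib 3.6 and later removed,
# so the theme and palette are set through seaborn directly)
sns.set_theme(palette='husl')
```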
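The analysis cells below depend on `clean_data()`, which is defined in the first notebook and is not reproduced here. If that notebook is not at hand, the following is a minimal, hypothetical stand-in. It assumes the standard Online Retail columns (`InvoiceNo`, `Quantity`, `UnitPrice`, `CustomerID`) and treats invoice numbers starting with `C` as cancellations. The cleaning rules are assumptions, and the original function may differ.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the first notebook's clean_data(); rules are assumptions."""
    out = df.copy()
    # Rows without a customer ID cannot be attributed to any customer
    out = out.dropna(subset=['CustomerID'])
    # Cancelled invoices carry a 'C' prefix on InvoiceNo in this dataset
    out = out[~out['InvoiceNo'].astype(str).str.startswith('C')]
    # Keep only positive quantities and prices (drops returns and zero-price adjustments)
    out = out[(out['Quantity'] > 0) & (out['UnitPrice'] > 0)]
    # Remove exact duplicate invoice lines; .copy() avoids chained-assignment warnings below
    out = out.drop_duplicates().copy()
    # Line-level revenue, used by the later cells as TotalAmount
    out['TotalAmount'] = out['Quantity'] * out['UnitPrice']
    # Whole-number IDs read more cleanly than 17850.0
    out['CustomerID'] = out['CustomerID'].astype(int)
    return out
```

Each rule is a separate, commented step, so it can be dropped or tightened to match whatever the first notebook actually does.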

### 1. Purchase Frequency Analysis

```python
# Load and prepare the data
df = pd.read_excel('Online Retail.xlsx')
# clean_data() is defined in the first notebook of this series; run that cell first.
# The cells below expect it to return one row per invoice line with a TotalAmount column.
df_clean = clean_data(df)

# Customer purchase frequency: number of distinct invoices per customer
# (value_counts() on CustomerID would count invoice lines, not purchases)
customer_frequency = df_clean.groupby('CustomerID')['InvoiceNo'].nunique()

print("Purchase Frequency Statistics:")
print("-" * 50)
print(f"Average purchases per customer: {customer_frequency.mean():.2f}")
print(f"Median purchases per customer: {customer_frequency.median():.2f}")

# Visualize purchase frequency distribution
plt.figure(figsize=(10, 6))
sns.histplot(customer_frequency, bins=50)
plt.title('Distribution of Purchase Frequency per Customer')
plt.xlabel('Number of Purchases (Invoices)')
plt.ylabel('Number of Customers')
plt.show()
```
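Counting rows per customer and counting distinct invoices per customer give different answers, because one invoice usually spans several product lines. The toy comparison below uses made-up rows to show the gap. `purchase_frequency` is a hypothetical helper name.

```python
import pandas as pd


def purchase_frequency(df: pd.DataFrame) -> pd.DataFrame:
    """Per-customer counts of invoice lines versus distinct invoices."""
    return pd.DataFrame({
        # Every row is one product line on an invoice
        'InvoiceLines': df['CustomerID'].value_counts(),
        # Distinct invoice numbers are the actual purchase occasions
        'Invoices': df.groupby('CustomerID')['InvoiceNo'].nunique(),
    })


# Illustrative rows only: customer 1 buys three products on one invoice,
# customer 2 buys one product on each of two invoices
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2, 2],
    'InvoiceNo': ['A', 'A', 'A', 'B', 'C'],
})
print(purchase_frequency(toy))
```

By line count, customer 1 looks like the heavier buyer (3 vs 2). By purchases, customer 1 made fewer (1 vs 2).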

### 2. Popular Products Analysis

```python
# Analyze top selling products
product_sales = df_clean.groupby('Description').agg({
    'Quantity': 'sum',
    'TotalAmount': 'sum',
    'InvoiceNo': 'nunique'  # distinct invoices containing the product
}).rename(columns={'InvoiceNo': 'TransactionCount'})

# Sort by quantity sold
top_products_by_quantity = product_sales.sort_values('Quantity', ascending=False).head(10)

# Visualize top products
plt.figure(figsize=(12, 6))
sns.barplot(data=top_products_by_quantity.reset_index(),
            x='Quantity',
            y='Description')
plt.title('Top 10 Products by Quantity Sold')
plt.xlabel('Total Quantity Sold')
plt.ylabel('Product')
plt.show()
```
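Ranking by units sold favors cheap, high-volume items, so a ranking by revenue can produce a different top 10. The sketch below ranks the same product table by any chosen metric. It assumes the `TotalAmount` column created during cleaning. `rank_products` and the toy product names are hypothetical.

```python
import pandas as pd


def rank_products(df: pd.DataFrame, metric: str, n: int = 10) -> pd.DataFrame:
    """Top-n products by 'Quantity', 'Revenue' or 'Invoices'."""
    sales = df.groupby('Description').agg(
        Quantity=('Quantity', 'sum'),
        Revenue=('TotalAmount', 'sum'),
        Invoices=('InvoiceNo', 'nunique'),
    )
    # nlargest keeps the n rows with the highest values of the chosen metric
    return sales.nlargest(n, metric)


# Illustrative rows: a cheap item sold in bulk and an expensive item sold rarely
toy = pd.DataFrame({
    'Description': ['PAPER CUP', 'PAPER CUP', 'OAK CABINET'],
    'InvoiceNo': ['A', 'B', 'C'],
    'Quantity': [500, 300, 2],
    'TotalAmount': [50.0, 30.0, 400.0],
})
print(rank_products(toy, 'Quantity', n=1).index.tolist())
print(rank_products(toy, 'Revenue', n=1).index.tolist())
```

On the notebook data, `rank_products(df_clean, 'Revenue')` could be passed to the same `sns.barplot` call as above, with `x='Revenue'`.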

### 3. Temporal Purchase Patterns

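```python
# Add datetime components
df_clean['InvoiceDate'] = pd.to_datetime(df_clean['InvoiceDate'])
df_clean['YearMonth'] = df_clean['InvoiceDate'].dt.to_period('M').astype(str)
df_clean['Month'] = df_clean['InvoiceDate'].dt.month
df_clean['DayOfWeek'] = df_clean['InvoiceDate'].dt.day_name()
df_clean['Hour'] = df_clean['InvoiceDate'].dt.hour

# Monthly sales trend, grouped by year-month rather than month number:
# the data runs from December 2010 to December 2011, so grouping on Month alone
# would merge both Decembers into one point
monthly_sales = df_clean.groupby('YearMonth')['TotalAmount'].sum().reset_index()

plt.figure(figsize=(12, 6))
sns.lineplot(data=monthly_sales, x='YearMonth', y='TotalAmount', marker='o')
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Total Sales Amount')
plt.xticks(rotation=45)
plt.show()
```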
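The cell above also derives `DayOfWeek` and `Hour` but does not use them. The sketch below shows one way to summarize revenue along those two axes. It keeps weekdays in calendar order rather than alphabetical order. `weekday_hour_revenue` and the toy timestamps are hypothetical.

```python
import pandas as pd

DAY_ORDER = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']


def weekday_hour_revenue(df: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Total revenue by weekday (calendar order) and by hour of day."""
    dates = pd.to_datetime(df['InvoiceDate'])
    # reindex puts weekdays in calendar order; days with no sales become 0
    by_day = (df.groupby(dates.dt.day_name())['TotalAmount'].sum()
                .reindex(DAY_ORDER, fill_value=0))
    # Hours sorted numerically so a line or bar chart reads left to right
    by_hour = df.groupby(dates.dt.hour)['TotalAmount'].sum().sort_index()
    return by_day, by_hour


# Illustrative rows: two sales on Monday 3 Jan 2011, one on Friday 7 Jan 2011
toy = pd.DataFrame({
    'InvoiceDate': ['2011-01-03 09:15', '2011-01-03 14:00', '2011-01-07 09:45'],
    'TotalAmount': [10.0, 20.0, 5.0],
})
by_day, by_hour = weekday_hour_revenue(toy)
print(by_day)
print(by_hour)
```

With the notebook data, `sns.barplot(x=by_day.index, y=by_day.values)` would chart the weekday profile.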

### 4. Customer Spending Analysis

```python
# Calculate customer spending metrics.
# TransactionCount counts distinct invoices, and AverageTransactionValue is spend per invoice
# (a plain mean of TotalAmount would give the average invoice line, not the average transaction)
customer_spending = df_clean.groupby('CustomerID').agg(
    TotalSpent=('TotalAmount', 'sum'),
    TransactionCount=('InvoiceNo', 'nunique'),
    TotalItems=('Quantity', 'sum'),
)
customer_spending['AverageTransactionValue'] = (
    customer_spending['TotalSpent'] / customer_spending['TransactionCount']
)
customer_spending = customer_spending.round(2)

print("Customer Spending Statistics:")
print("-" * 50)
print(customer_spending.describe())

# Visualize distribution of customer spending
plt.figure(figsize=(10, 6))
sns.histplot(data=customer_spending, x='TotalSpent', bins=50)
plt.title('Distribution of Customer Total Spending')
plt.xlabel('Total Amount Spent')
plt.ylabel('Number of Customers')
plt.show()
```
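The histogram shows the shape of spending but not how concentrated it is. One common summary is the share of revenue that comes from the top slice of customers. The sketch below computes that share from the `TotalSpent` column. The 20% cut-off is an arbitrary, adjustable assumption, and `revenue_concentration` is a hypothetical helper.

```python
import pandas as pd


def revenue_concentration(total_spent: pd.Series, top_share: float = 0.2) -> float:
    """Fraction of total revenue contributed by the top `top_share` of customers."""
    ranked = total_spent.sort_values(ascending=False)
    # At least one customer is always counted in the top group
    n_top = max(1, round(len(ranked) * top_share))
    return float(ranked.iloc[:n_top].sum() / ranked.sum())


# Illustrative spend values for five customers
spend = pd.Series([100.0, 50.0, 30.0, 10.0, 10.0],
                  index=[101, 102, 103, 104, 105], name='TotalSpent')
print(revenue_concentration(spend))  # top 1 of 5 customers -> 0.5
```

On the notebook data, this would be `revenue_concentration(customer_spending['TotalSpent'])`. To see the most valuable customers themselves, use `customer_spending.nlargest(10, 'TotalSpent')`.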

### 5. Key Insights Summary

The descriptive analyses above are organized around three kinds of questions. The answers themselves are the figures and summary tables produced by each cell:

**What:**
- Most popular products and their sales volumes
- Distribution of transaction values
- Customer spending patterns

**Which:**
- Which products are bestsellers
- Which months show highest sales
- Which customers are most valuable (by spending)

**How Many:**
- Average purchases per customer
- Total transactions per product
- Distribution of order quantities
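The last item, the distribution of order quantities, is not computed in the cells above. The minimal sketch below assumes that an "order" means one `InvoiceNo`. It totals units and value per invoice and returns their summary statistics. `order_size_summary` and the toy rows are hypothetical.

```python
import pandas as pd


def order_size_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summary statistics of units and value per invoice."""
    orders = df.groupby('InvoiceNo').agg(
        Units=('Quantity', 'sum'),          # items per order
        OrderValue=('TotalAmount', 'sum'),  # revenue per order
    )
    return orders.describe()


# Illustrative rows: invoice A has two lines, invoice B has one
toy = pd.DataFrame({
    'InvoiceNo': ['A', 'A', 'B'],
    'Quantity': [2, 3, 10],
    'TotalAmount': [4.0, 6.0, 5.0],
})
print(order_size_summary(toy))
```

On the notebook data, `order_size_summary(df_clean)` gives the per-order counterpart of the per-customer statistics printed earlier.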