# Customer Behavior Diagnostic Analysis

## Understanding the 'Why' Behind Customer Behavior Trends

In this notebook, we perform diagnostic analytics to understand the reasons behind the patterns identified in our descriptive analysis. We focus on four 'Why' questions and use correlation analysis, seasonal aggregation, and customer-level aggregation to investigate them.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime

# Set visualization style (seaborn defaults with the 'husl' palette)
sns.set_theme()
sns.set_palette('husl')

# Load and prepare the data
df = pd.read_excel('Online Retail.xlsx')
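# clean_data comes from the first notebook and must be defined or imported in this session;
# later cells also rely on the TotalAmount column it is expected to add
df_clean = clean_data(df)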
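
The cell above depends on `clean_data`, which lives in the first notebook and is not shown here. The block below is a minimal stand-in to define before running that cell, assuming the standard Online Retail columns (`InvoiceNo`, `Description`, `Quantity`, `UnitPrice`, `InvoiceDate`, `CustomerID`). The filtering rules and the `TotalAmount` definition are assumptions for illustration, not the original implementation.

```python
import pandas as pd

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the first notebook's cleaning function."""
    out = df.copy()
    # Assumption: rows without a customer or a product description are unusable
    out = out.dropna(subset=['CustomerID', 'Description'])
    # Assumption: non-positive quantities or prices are returns or adjustments
    out = out[(out['Quantity'] > 0) & (out['UnitPrice'] > 0)].copy()
    out['InvoiceDate'] = pd.to_datetime(out['InvoiceDate'])
    # Line-item revenue, the column later cells refer to as TotalAmount
    out['TotalAmount'] = out['Quantity'] * out['UnitPrice']
    return out

# Tiny synthetic frame to exercise the function
sample = pd.DataFrame({
    'InvoiceNo': ['1', '1', '2', '3'],
    'Description': ['MUG', 'LAMP', None, 'MUG'],
    'Quantity': [2, -1, 5, 3],
    'UnitPrice': [1.5, 10.0, 2.0, 1.5],
    'InvoiceDate': ['2011-01-05', '2011-01-05', '2011-02-01', '2011-03-10'],
    'CustomerID': [100.0, 100.0, 101.0, None],
})
cleaned_sample = clean_data(sample)
print(cleaned_sample[['InvoiceNo', 'Description', 'TotalAmount']])
```

If the real function applies different rules (for example, keeping returns), the downstream totals will differ, so it is worth checking against the first notebook.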

### 1. Why do some products sell better than others?
Let's start with the relationship between unit price and sales volume. Seasonality is covered in the next section.

In [None]:
# Analyze price points vs. sales volume
product_analysis = df_clean.groupby('Description').agg({
    'Quantity': 'sum',
    'UnitPrice': 'mean',
    'TotalAmount': 'sum'
}).reset_index()

# Calculate correlation between price and quantity
correlation = stats.pearsonr(product_analysis['UnitPrice'], product_analysis['Quantity'])

plt.figure(figsize=(10, 6))
sns.scatterplot(data=product_analysis, x='UnitPrice', y='Quantity')
plt.title('Price vs. Sales Volume Relationship')
plt.xlabel('Unit Price')
plt.ylabel('Total Quantity Sold')
plt.text(0.05, 0.95, f'Correlation: {correlation[0]:.2f}\np-value: {correlation[1]:.4f}',
         transform=plt.gca().transAxes)
plt.show()
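
Pearson correlation measures linear association only, and retail price and quantity data tend to be heavily skewed. As a complementary check, a rank-based Spearman correlation can be computed alongside it. The sketch below uses synthetic data as a stand-in for `product_analysis`; the `demo` frame and its distribution parameters are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for product_analysis: skewed prices, inverse demand
rng = np.random.default_rng(0)
price = rng.lognormal(mean=1.0, sigma=1.0, size=200)
quantity = 1000 / price + rng.normal(0, 5, size=200)
demo = pd.DataFrame({'UnitPrice': price, 'Quantity': quantity})

# Pearson captures linear association; Spearman captures any monotonic one
pearson_r, pearson_p = stats.pearsonr(demo['UnitPrice'], demo['Quantity'])
spearman_rho, spearman_p = stats.spearmanr(demo['UnitPrice'], demo['Quantity'])

print(f'Pearson r:    {pearson_r:.2f} (p={pearson_p:.4f})')
print(f'Spearman rho: {spearman_rho:.2f} (p={spearman_p:.4f})')
```

A large gap between the two coefficients suggests the relationship is monotonic but not linear, in which case a log-scaled scatter plot may be easier to read.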

### 2. Why do customer purchase patterns vary across different times?

In [None]:
# Derive month and meteorological season (Northern Hemisphere) from the invoice date
df_clean['Month'] = pd.to_datetime(df_clean['InvoiceDate']).dt.month
df_clean['Season'] = df_clean['Month'].map(
    {1: 'Winter', 2: 'Winter', 3: 'Spring', 4: 'Spring', 
     5: 'Spring', 6: 'Summer', 7: 'Summer', 8: 'Summer',
     9: 'Fall', 10: 'Fall', 11: 'Fall', 12: 'Winter'})

# Total quantity per product within each season, then the five best sellers per season
seasonal_category_sales = df_clean.groupby(['Season', 'Description'])['Quantity'].sum().reset_index()
top_products_per_season = seasonal_category_sales.sort_values('Quantity', ascending=False).groupby('Season').head(5)

plt.figure(figsize=(15, 8))
sns.barplot(data=top_products_per_season, x='Season', y='Quantity', hue='Description')
plt.title('Top Products by Season')
plt.xticks(rotation=45)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
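
A bar chart shows which products lead in each season, but not whether the product mix differs across seasons by more than chance. One possible check is a chi-square test of independence on a season-by-product contingency table. The sketch below uses a small synthetic frame (`demo` is an assumption for illustration). Note that it treats each unit sold as an independent observation, which is a simplification for real order data.

```python
import pandas as pd
from scipy import stats

# Synthetic line items: product A sells mostly in winter, B mostly in summer
demo = pd.DataFrame({
    'Season': ['Winter'] * 100 + ['Summer'] * 100,
    'Description': ['A'] * 80 + ['B'] * 20 + ['A'] * 30 + ['B'] * 70,
    'Quantity': 1,
})

# Rows are seasons, columns are products, cells are total units sold
contingency = pd.crosstab(demo['Season'], demo['Description'],
                          values=demo['Quantity'], aggfunc='sum').fillna(0)

# Test whether product mix is independent of season
chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
print(contingency)
print(f'chi2={chi2:.1f}, dof={dof}, p={p_value:.4g}')
```

On the real data it would make sense to restrict the table to the top products first, since thousands of sparse product columns make the test unreliable.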

### 3. Why do some customers spend more than others?

In [None]:
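# Analyze customer purchasing behavior factors
# Each row of df_clean is a line item, so aggregate to one row per order first;
# otherwise "per order" averages would actually be per line item
order_totals = df_clean.groupby(['CustomerID', 'InvoiceNo']).agg(
    OrderItems=('Quantity', 'sum'),
    OrderValue=('TotalAmount', 'sum')
).reset_index()

customer_analysis = order_totals.groupby('CustomerID').agg(
    PurchaseFrequency=('InvoiceNo', 'nunique'),  # Number of distinct orders
    TotalItems=('OrderItems', 'sum'),            # Total items bought
    AvgItemsPerOrder=('OrderItems', 'mean'),     # Average items per order
    TotalSpent=('OrderValue', 'sum'),            # Total spent
    AvgOrderValue=('OrderValue', 'mean')         # Average order value
)

# Product variety: number of distinct products per customer (aligned on CustomerID)
customer_analysis['ProductVariety'] = df_clean.groupby('CustomerID')['Description'].nunique()
customer_analysis = customer_analysis.round(2)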

# Calculate correlations between spending and other factors
spending_correlations = customer_analysis.corr()['TotalSpent'].sort_values(ascending=False)

print("Correlations with Total Spending:")
print("-" * 50)
print(spending_correlations)

# Visualize relationship between purchase frequency and total spending
plt.figure(figsize=(10, 6))
sns.scatterplot(data=customer_analysis, x='PurchaseFrequency', y='TotalSpent')
plt.title('Purchase Frequency vs Total Spending')
plt.xlabel('Number of Purchases')
plt.ylabel('Total Amount Spent')
plt.show()
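
Correlations summarize the whole customer base in one number. To see how high spenders differ from low spenders, another option is to split customers into spending quartiles and compare the typical value of each factor per tier. This is a sketch on synthetic data; the `demo` frame, the tier labels, and the choice of the median are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for customer_analysis
rng = np.random.default_rng(1)
n = 400
freq = rng.integers(1, 30, size=n)
avg_order = rng.gamma(shape=2.0, scale=50.0, size=n)
demo = pd.DataFrame({
    'PurchaseFrequency': freq,
    'AvgOrderValue': avg_order,
    'TotalSpent': freq * avg_order,
})

# Split customers into four equally sized tiers by total spending
demo['SpendTier'] = pd.qcut(demo['TotalSpent'], q=4,
                            labels=['Q1 (lowest)', 'Q2', 'Q3', 'Q4 (highest)'])

# Median of each candidate driver within each tier
tier_profile = demo.groupby('SpendTier', observed=True)[
    ['PurchaseFrequency', 'AvgOrderValue']].median()
print(tier_profile)
```

If one column rises sharply from Q1 to Q4 while another stays flat, the rising column is the more plausible driver of spending differences.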

### 4. Why do some customers show higher loyalty?

In [None]:
# Analyze customer loyalty factors
df_clean['PurchaseMonth'] = pd.to_datetime(df_clean['InvoiceDate']).dt.to_period('M')

# Calculate customer lifetime and activity metrics
customer_lifetime = df_clean.groupby('CustomerID').agg({
    'PurchaseMonth': ['nunique', 'min', 'max'],
    'InvoiceNo': 'nunique',  # Distinct invoices; 'count' would count line items
    'TotalAmount': 'sum'
}).reset_index()

customer_lifetime.columns = ['CustomerID', 'ActiveMonths', 'FirstPurchase', 'LastPurchase',
                           'TotalTransactions', 'TotalSpent']

# Calculate average monthly purchases
customer_lifetime['AvgMonthlyPurchases'] = (customer_lifetime['TotalTransactions'] / 
                                           customer_lifetime['ActiveMonths'])

# Visualize relationship between activity duration and spending
plt.figure(figsize=(10, 6))
sns.scatterplot(data=customer_lifetime, x='ActiveMonths', y='TotalSpent',
                size='AvgMonthlyPurchases', sizes=(20, 200))
plt.title('Customer Loyalty Analysis')
plt.xlabel('Number of Active Months')
plt.ylabel('Total Amount Spent')
plt.show()
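
`ActiveMonths` alone does not distinguish a customer who bought in three consecutive months from one who bought in three months spread over a year. One way to separate the two is to compare active months with tenure, the span between first and last purchase. The sketch below assumes monthly `Period` columns like `FirstPurchase` and `LastPurchase` above; `TenureMonths` and `ActivityRate` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Synthetic stand-in for customer_lifetime
demo = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'ActiveMonths': [3, 3, 1],
    'FirstPurchase': pd.PeriodIndex(['2011-01', '2011-01', '2011-06'], freq='M'),
    'LastPurchase': pd.PeriodIndex(['2011-03', '2011-12', '2011-06'], freq='M'),
})

first, last = demo['FirstPurchase'], demo['LastPurchase']

# Tenure in calendar months, counting both the first and last month
demo['TenureMonths'] = ((last.dt.year - first.dt.year) * 12
                        + (last.dt.month - first.dt.month) + 1)

# Share of tenure months with at least one purchase (1.0 = active every month)
demo['ActivityRate'] = demo['ActiveMonths'] / demo['TenureMonths']
print(demo[['CustomerID', 'ActiveMonths', 'TenureMonths', 'ActivityRate']])
```

Customers 1 and 2 both have three active months, but customer 1 bought every month of a short tenure while customer 2 bought in only a quarter of a year-long one.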

## Diagnostic Analytics Summary

The analysis above examines four groups of factors that help explain patterns in customer behavior:

1. **Product Performance Factors**:
   - Relationship between unit price and sales volume (price sensitivity)
   - Seasonal influence on product popularity
   - Product category preferences

2. **Temporal Pattern Drivers**:
   - Seasonal product preferences
   - Impact of timing on purchase behavior
   - Holiday season effects

3. **Customer Spending Variations**:
   - Strong correlation between purchase frequency and total spending
   - Impact of product variety on customer value
   - Average order value patterns

4. **Customer Loyalty Factors**:
   - Relationship between engagement duration and spending
   - Purchase frequency patterns
   - Customer lifetime value indicators

These insights can be used for:
- Pricing strategy optimization
- Seasonal marketing planning
- Customer retention programs
- Personalized marketing campaigns