# 🏆 Feb 09: Week 7 Final Practice - E-Commerce Analytics

Welcome to your final Week 7 challenge! You'll analyze e-commerce data and create a comprehensive visualization report.

## Setup and Data Generation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Set style
sns.set_theme(style="whitegrid", palette="husl")
%matplotlib inline

# Set random seed for reproducibility
np.random.seed(42)

In [None]:
# Generate realistic e-commerce dataset
n_orders = 1000

# Generate dates (last 12 months)
start_date = datetime(2024, 2, 1)
dates = [start_date + timedelta(days=np.random.randint(0, 365)) for _ in range(n_orders)]

# Categories with different popularity
categories = np.random.choice(
    ['Electronics', 'Clothing', 'Home', 'Books', 'Sports'],
    n_orders,
    p=[0.3, 0.25, 0.2, 0.15, 0.1]
)

# Regions
regions = np.random.choice(['North', 'South', 'East', 'West'], n_orders)

# Sales amounts (vary by category)
sales = []
for cat in categories:
    if cat == 'Electronics':
        sales.append(np.random.gamma(2, 150))
    elif cat == 'Clothing':
        sales.append(np.random.gamma(2, 50))
    elif cat == 'Home':
        sales.append(np.random.gamma(2, 100))
    elif cat == 'Books':
        sales.append(np.random.gamma(2, 20))
    else:  # Sports
        sales.append(np.random.gamma(2, 80))

# Quantities
quantities = np.random.randint(1, 10, n_orders)

# Customer ages
ages = np.random.choice(
    np.concatenate([
        np.random.normal(25, 5, 300).astype(int),
        np.random.normal(40, 8, 400).astype(int),
        np.random.normal(60, 7, 300).astype(int)
    ]),
    n_orders
)
ages = np.clip(ages, 18, 80)

# Customer type
customer_types = np.random.choice(['New', 'Returning'], n_orders, p=[0.3, 0.7])

# Create DataFrame
df = pd.DataFrame({
    'OrderDate': dates,
    'Category': categories,
    'Region': regions,
    'Sales': sales,
    'Quantity': quantities,
    'CustomerAge': ages,
    'CustomerType': customer_types
})

# Sort by date
df = df.sort_values('OrderDate').reset_index(drop=True)

# Add month column for analysis
df['Month'] = df['OrderDate'].dt.to_period('M')

print("Dataset created successfully!")
print(f"\nShape: {df.shape}")
print("\nFirst few rows:")
df.head(10)

## Step 1: Data Exploration

Before visualizing, let's understand our data.

In [None]:
# Basic information
print("=" * 50)
print("DATASET INFORMATION")
print("=" * 50)
df.info()  # info() prints its report directly and returns None, so don't wrap it in print()
print("\n" + "=" * 50)
print("MISSING VALUES")
print("=" * 50)
print(df.isnull().sum())
print("\n" + "=" * 50)
print("BASIC STATISTICS")
print("=" * 50)
print(df.describe())
print("\n" + "=" * 50)
print("CATEGORICAL DISTRIBUTIONS")
print("=" * 50)
print("\nCategories:")
print(df['Category'].value_counts())
print("\nRegions:")
print(df['Region'].value_counts())
print("\nCustomer Types:")
print(df['CustomerType'].value_counts())

## Task 1: Time Series Analysis

**Question**: What is the overall sales trend over time? Are there seasonal patterns?

**Your Task**: Create a line chart showing monthly sales trends.
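
If the grouping step is unfamiliar, here is a minimal sketch of the pattern on a small made-up DataFrame. The name `toy` and its values are hypothetical and only for illustration; your solution should use `df`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data: five orders spread over three months
toy = pd.DataFrame({
    "OrderDate": pd.to_datetime(
        ["2024-02-03", "2024-02-20", "2024-03-05", "2024-03-28", "2024-04-11"]
    ),
    "Sales": [100.0, 50.0, 80.0, 20.0, 60.0],
})

# Group by calendar month and sum the sales in each month
toy["Month"] = toy["OrderDate"].dt.to_period("M")
monthly_sales = toy.groupby("Month")["Sales"].sum()

# Convert the PeriodIndex to timestamps so matplotlib treats the x-axis as dates
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly_sales.index.to_timestamp(), monthly_sales.values, marker="o")
ax.set_title("Monthly Sales (toy data)")
ax.set_xlabel("Month")
ax.set_ylabel("Total sales")
fig.autofmt_xdate()
```

The `to_timestamp()` call is one way to handle period-valued months; converting them to strings would also work for a simple chart.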

In [None]:
# Your code here
# Hint: Group by month and sum sales, then create a line chart


## Task 2: Category Performance

**Question**: Which product categories generate the most revenue?

**Your Task**: Create a bar chart comparing total sales by category.
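
The group, sum, sort, and plot sequence can be sketched as follows. The `toy` DataFrame and its numbers are made up for illustration and are not the notebook's dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
toy = pd.DataFrame({
    "Category": ["Books", "Home", "Books", "Sports", "Home"],
    "Sales": [20.0, 120.0, 35.0, 90.0, 60.0],
})

# Total sales per category, largest first so the bars read left to right
category_sales = (
    toy.groupby("Category")["Sales"].sum().sort_values(ascending=False)
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(category_sales.index, category_sales.values)
ax.set_title("Total Sales by Category (toy data)")
ax.set_xlabel("Category")
ax.set_ylabel("Total sales")
```

Sorting before plotting is a design choice: an ordered bar chart makes the ranking readable at a glance.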

In [None]:
# Your code here
# Hint: Group by category, sum sales, sort, and create a bar chart


## Task 3: Regional Analysis

**Question**: How do sales vary across regions? Are there regional preferences?

**Your Task**: Create visualizations showing regional performance.
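
For the region-by-category view, one possible approach is to reshape the data into a table with one row per region and one column per category, then let pandas draw the bars. This is a sketch on made-up data (`toy` is hypothetical), not the only way to do it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
toy = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Category": ["Books", "Home", "Books", "Home", "Home"],
    "Sales": [10.0, 40.0, 25.0, 30.0, 20.0],
})

# Rows = regions, columns = categories, cells = summed sales
region_category = toy.pivot_table(
    index="Region", columns="Category", values="Sales",
    aggfunc="sum", fill_value=0,
)

# Grouped bars by default; pass stacked=True for a stacked bar chart
fig, ax = plt.subplots(figsize=(6, 4))
region_category.plot(kind="bar", ax=ax)
ax.set_title("Sales by Region and Category (toy data)")
ax.set_ylabel("Total sales")
```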

In [None]:
# Your code here - Part A: Total sales by region


In [None]:
# Your code here - Part B: Sales by region and category (grouped or stacked bar)


## Task 4: Customer Demographics

**Question**: What is the age distribution? How does spending vary by age?

**Your Task**: Create visualizations exploring customer age patterns.
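
A histogram depends heavily on its bin edges. The sketch below uses explicit edges on a made-up list of ages (the values and the edges are assumptions chosen for illustration) and keeps the counts that `ax.hist` returns so you can check them.

```python
import matplotlib.pyplot as plt

# Hypothetical ages and bin edges
ages = [19, 22, 25, 34, 38, 41, 47, 59, 63, 70]
bin_edges = [18, 30, 45, 60, 80]

fig, ax = plt.subplots(figsize=(6, 4))
# hist returns the count per bin, the edges used, and the bar artists
counts, edges, bars = ax.hist(ages, bins=bin_edges, edgecolor="black")
ax.set_title("Customer Age Distribution (toy data)")
ax.set_xlabel("Age")
ax.set_ylabel("Number of customers")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.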

In [None]:
# Your code here - Part A: Age distribution (histogram)


In [None]:
# Your code here - Part B: Age vs Sales (scatter plot or box plot by age groups)


## Task 5: Customer Behavior

**Question**: How do new vs returning customers differ?

**Your Task**: Compare new and returning customer behavior.

In [None]:
# Your code here - Part A: Customer type distribution (pie or bar)


In [None]:
# Your code here - Part B: Average sales by customer type (bar chart)


## Task 6: Relationships

**Question**: What relationships exist between variables?

**Your Task**: Explore correlations using scatter plots.
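
A scatter plot with a fitted line pairs well with a numeric correlation coefficient. Here is a minimal sketch using seaborn's `regplot` and pandas' `Series.corr` on made-up numbers (`toy` is hypothetical, and its strong correlation is built in on purpose).

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical toy data with an obvious upward trend
toy = pd.DataFrame({
    "CustomerAge": [20, 30, 40, 50, 60],
    "Sales": [100.0, 140.0, 150.0, 190.0, 200.0],
})

# Pearson correlation coefficient between the two columns
r = toy["CustomerAge"].corr(toy["Sales"])

fig, ax = plt.subplots(figsize=(6, 4))
# regplot draws the points, a linear fit, and a shaded confidence band
sns.regplot(data=toy, x="CustomerAge", y="Sales", ax=ax)
ax.set_title(f"Age vs. Sales (toy data, r = {r:.2f})")
```

Putting `r` in the title keeps the visual impression and the number side by side, which helps when the trend is weak.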

In [None]:
# Your code here - Part A: Quantity vs Sales


In [None]:
# Your code here - Part B: Age vs Sales with regression line


## Task 7: Comprehensive Dashboard

**Your Task**: Create a professional dashboard combining 4-6 key visualizations.

**Requirements**:
- Use subplots to create a multi-panel figure
- Include the most important insights
- Professional styling and formatting
- Clear titles and labels
- Logical layout
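
The multi-panel structure can be sketched as below. This is a 2 x 2 skeleton on made-up data (`toy` is hypothetical), meant only to show how `plt.subplots` returns a grid of axes that you fill one by one; your dashboard should use `df` and your own choice of panels.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
toy = pd.DataFrame({
    "Month": ["2024-02", "2024-02", "2024-03", "2024-03", "2024-04", "2024-04"],
    "Category": ["Books", "Home", "Books", "Sports", "Home", "Sports"],
    "CustomerAge": [23, 41, 35, 58, 64, 29],
    "Sales": [20.0, 120.0, 35.0, 90.0, 60.0, 75.0],
})

# axes is a 2 x 2 NumPy array of Axes objects, indexed as axes[row, col]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

monthly = toy.groupby("Month")["Sales"].sum()
axes[0, 0].plot(monthly.index, monthly.values, marker="o")
axes[0, 0].set_title("Monthly sales")

by_category = toy.groupby("Category")["Sales"].sum().sort_values(ascending=False)
axes[0, 1].bar(by_category.index, by_category.values)
axes[0, 1].set_title("Sales by category")

axes[1, 0].hist(toy["CustomerAge"], bins=5)
axes[1, 0].set_title("Customer age distribution")

axes[1, 1].scatter(toy["CustomerAge"], toy["Sales"])
axes[1, 1].set_title("Age vs. sales")

# One overall title, then let matplotlib space the panels
fig.suptitle("E-Commerce Dashboard (toy data)", fontsize=14)
fig.tight_layout()
```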

In [None]:
# Your code here - Create a comprehensive dashboard
# Suggested layout: 2 rows x 3 columns or 3 rows x 2 columns


## Task 8: Insights Summary

**Your Task**: Write a summary of your key findings.

Use the markdown cell below to document:
1. **Top 3 Insights**: What are the most important findings?
2. **Recommendations**: What actions should the business take?
3. **Next Steps**: What additional analysis would be valuable?

### 📊 Key Insights

1. **[Your insight here]**
   - Supporting evidence from your visualizations

2. **[Your insight here]**
   - Supporting evidence from your visualizations

3. **[Your insight here]**
   - Supporting evidence from your visualizations

### 💡 Recommendations

1. **[Your recommendation]**
   - Why this matters

2. **[Your recommendation]**
   - Why this matters

3. **[Your recommendation]**
   - Why this matters

### 🔍 Next Steps

1. **[Additional analysis needed]**
2. **[Additional analysis needed]**
3. **[Additional analysis needed]**

## 🌟 Bonus Challenges

If you've completed all tasks, try these advanced challenges:

### Bonus 1: Correlation Heatmap

Create a heatmap showing correlations between numerical variables.
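
As a starting point, here is a sketch that computes a correlation matrix with `DataFrame.corr` and draws it with seaborn's `heatmap`. The `toy` values are made up; the column names mirror the numeric columns of the dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical toy data
toy = pd.DataFrame({
    "Sales": [120.0, 80.0, 100.0, 140.0, 60.0, 95.0],
    "Quantity": [3, 1, 2, 5, 1, 4],
    "CustomerAge": [25, 40, 33, 58, 62, 47],
})

# Pairwise Pearson correlations between the numeric columns
corr = toy[["Sales", "Quantity", "CustomerAge"]].corr()

fig, ax = plt.subplots(figsize=(5, 4))
# Fix the color scale to [-1, 1] so 0 sits at the center of the diverging map
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Heatmap (toy data)")
```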

In [None]:
# Your code here


### Bonus 2: Advanced Segmentation

Create customer segments based on age and analyze their preferences.
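
One way to build segments is `pd.cut` followed by `pd.crosstab`. The sketch below uses the age bands 18-30, 31-45, 46-60, and 61+ on made-up data (`toy` is hypothetical); the outer bin edges 17 and 120 are assumptions chosen so that every age from 18 upward lands in a band.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
toy = pd.DataFrame({
    "CustomerAge": [22, 30, 35, 44, 52, 65],
    "Category": ["Books", "Clothing", "Books", "Home", "Home", "Books"],
})

# Bins are right-inclusive: (17, 30], (30, 45], (45, 60], (60, 120]
toy["AgeGroup"] = pd.cut(
    toy["CustomerAge"],
    bins=[17, 30, 45, 60, 120],
    labels=["18-30", "31-45", "46-60", "61+"],
)

# Order counts per segment and category, then each segment's category share
segment_counts = pd.crosstab(toy["AgeGroup"], toy["Category"])
segment_share = pd.crosstab(toy["AgeGroup"], toy["Category"], normalize="index")

fig, ax = plt.subplots(figsize=(6, 4))
segment_share.plot(kind="bar", stacked=True, ax=ax)
ax.set_title("Category Share by Age Group (toy data)")
ax.set_ylabel("Share of orders")
```

Normalizing by row makes segments of different sizes comparable, since each bar then sums to 1.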

In [None]:
# Your code here
# Hint: Create age groups (18-30, 31-45, 46-60, 61+) and analyze by category


### Bonus 3: Statistical Analysis

Add statistical annotations to your visualizations (mean lines, confidence intervals, etc.).
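
As an illustration, a mean line and a confidence band can be added with `axhline` and `axhspan`. The sketch below uses made-up sales values and a normal-approximation interval (mean ± 1.96 × standard error), which is an assumption and is crude for very small samples.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
sales = pd.Series([120.0, 80.0, 100.0, 140.0, 60.0], name="Sales")

# Sample mean and an approximate 95% interval for the mean
mean_sales = sales.mean()
half_width = 1.96 * sales.sem()  # sem() is the standard error of the mean
ci_low, ci_high = mean_sales - half_width, mean_sales + half_width

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(range(len(sales)), sales)
# Dashed line at the mean, shaded band for the interval
ax.axhline(mean_sales, color="red", linestyle="--", label=f"Mean = {mean_sales:.0f}")
ax.axhspan(ci_low, ci_high, color="red", alpha=0.15, label="Approx. 95% CI of the mean")
ax.set_title("Sales per Order with Mean and CI (toy data)")
ax.set_xlabel("Order")
ax.set_ylabel("Sales")
ax.legend()
```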

In [None]:
# Your code here


## 🎉 Congratulations!

You've completed Week 7: Data Visualization!

### What You've Learned

✅ **Matplotlib Basics**: Complete control over visualizations  
✅ **Line & Bar Charts**: The workhorses of data visualization  
✅ **Seaborn**: Beautiful statistical graphics with less code  
✅ **Chart Selection**: Choosing the right visualization for your data  
✅ **Real-World Analysis**: Applying skills to business problems  

### Skills Acquired

- Creating professional, publication-quality visualizations
- Selecting appropriate chart types for different data stories
- Customizing plots for clarity and impact
- Building comprehensive dashboards
- Extracting and communicating insights from data

### Next Steps

Week 8 will focus on **Mini Projects** where you'll apply everything you've learned to real-world datasets!

Keep practicing and happy visualizing! 📊✨