# Exploratory Data Analysis - MMM Project

This notebook demonstrates the hybrid approach: reusable analysis functions live in `src/` modules, and the notebook calls them for interactive exploration and visualization.

## Setup and Imports


In [None]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import our functional modules
import sys
sys.path.append('../src')

from pip_ai_mmm_test.data.loaders import (
    load_marketing_spend_data,
    load_business_metrics_data,
    validate_data_quality,
    get_data_summary
)

from pip_ai_mmm_test.analysis.eda import (
    plot_time_series,
    plot_channel_spend_distribution,
    plot_correlation_heatmap,
    plot_missing_data_patterns,
    analyze_channel_performance,
    detect_seasonality,
    enable_xkcd_style,
    disable_xkcd_style
)

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline


## XKCD Style Plotting

Our EDA functions now support XKCD hand-drawn style. Pass `xkcd_style=True` to an individual plotting function, or call `enable_xkcd_style()` to enable it globally.
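A per-plot toggle like this can be built on `plt.xkcd()`, which matplotlib provides as a context manager. The sketch below is only an assumption about how such a parameter might work; it is not the implementation in `pip_ai_mmm_test.analysis.eda`, and `plot_bars` is a hypothetical helper introduced for illustration.

```python
import contextlib

import matplotlib.pyplot as plt


def plot_bars(labels, values, title="", xkcd_style=False):
    """Hypothetical helper: draw a bar chart, optionally in XKCD style."""
    # plt.xkcd() changes rcParams and restores them when the `with` block exits;
    # nullcontext() leaves the current style untouched.
    style = plt.xkcd() if xkcd_style else contextlib.nullcontext()
    with style:
        fig, ax = plt.subplots(figsize=(8, 4))
        ax.bar(labels, values)
        ax.set_title(title)
        fig.tight_layout()
    return fig
```

Because the style is scoped to the `with` block, a call such as `plot_bars(["TV", "Digital"], [1000, 500], xkcd_style=True)` should not affect plots created afterwards.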


In [None]:
# Example: XKCD Style Plotting
# enable_xkcd_style and disable_xkcd_style are already imported in the setup cell

# Option 1: Enable XKCD style globally
# enable_xkcd_style()

# Option 2: Use xkcd_style=True parameter in individual functions
# This gives you more control over which plots use XKCD style

# For demonstration, let's create some sample data
# (pd and np are already imported in the setup cell)

# Create sample marketing data
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
channels = ['TV', 'Digital', 'Print', 'Radio', 'Events']
np.random.seed(42)

sample_data = []
for date in dates:
    for channel in channels:
        spend = np.random.exponential(1000) if channel == 'TV' else np.random.exponential(500)
        sample_data.append({
            'date': date,
            'channel': channel,
            'spend': spend
        })

spend_df = pd.DataFrame(sample_data)
print(f"Created sample data with {len(spend_df)} rows")


In [None]:
# Compare normal vs XKCD style plots

# Normal style
fig1 = plot_channel_spend_distribution(
    spend_df,
    channel_col='channel',
    spend_col='spend',
    title="Normal Style - Marketing Spend by Channel",
    xkcd_style=False
)
plt.show()

# XKCD style
fig2 = plot_channel_spend_distribution(
    spend_df,
    channel_col='channel',
    spend_col='spend',
    title="XKCD Style - Marketing Spend by Channel",
    xkcd_style=True
)
plt.show()


In [None]:
# More XKCD examples

# Time series with XKCD style
daily_spend = spend_df.groupby('date')['spend'].sum().reset_index()
fig3 = plot_time_series(
    daily_spend,
    date_col='date',
    value_cols=['spend'],
    title="XKCD Style - Daily Marketing Spend",
    xkcd_style=True
)
plt.show()

# Missing data patterns with XKCD style
# Add some missing data for demonstration
spend_df_with_missing = spend_df.copy()
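# Fix random_state so the same 100 rows are blanked on every run
missing_idx = spend_df_with_missing.sample(100, random_state=42).index
spend_df_with_missing.loc[missing_idx, 'spend'] = np.nan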

fig4 = plot_missing_data_patterns(
    spend_df_with_missing,
    title="XKCD Style - Missing Data Patterns",
    xkcd_style=True
)
plt.show()


## XKCD Style Usage Guide

### Two Ways to Use XKCD Style:

1. **Individual Plot Control**: Pass `xkcd_style=True` to a single plotting function
2. **Global Style**: Call `enable_xkcd_style()` to apply the style to all subsequent plots, and `disable_xkcd_style()` to revert
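
One way a global toggle could work across notebook cells is to hold the `plt.xkcd()` context open in a `contextlib.ExitStack` and close it later. This is a sketch under that assumption; the names `enable_xkcd_globally` and `disable_xkcd_globally` are hypothetical stand-ins, and the project's `enable_xkcd_style()` / `disable_xkcd_style()` may be implemented differently.

```python
import contextlib

import matplotlib.pyplot as plt

# Holds the active XKCD context so it can be undone from a later cell.
_xkcd_stack = contextlib.ExitStack()


def enable_xkcd_globally():
    """Hypothetical global toggle: keep plt.xkcd() active until disabled."""
    _xkcd_stack.enter_context(plt.xkcd())


def disable_xkcd_globally():
    """Restore the rcParams that were in effect before XKCD mode was enabled."""
    _xkcd_stack.close()
```

Closing the stack undoes every pending `enable_xkcd_globally()` call, so the pair behaves like a `with plt.xkcd():` block that spans several cells.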

### When to Use XKCD Style:
- ✅ **Presentations** - More engaging and memorable
- ✅ **Stakeholder Reports** - Friendly, approachable appearance  
- ✅ **Public Dashboards** - Less intimidating than formal charts
- ❌ **Technical Documentation** - Stick to normal style for precision
- ❌ **Academic Papers** - Use standard matplotlib styling


In [None]:
# Global XKCD Style Example
print("🎨 Demonstrating Global XKCD Style...")

# Enable global XKCD style
enable_xkcd_style()

# Create a simple plot that will automatically use XKCD style
fig, ax = plt.subplots(figsize=(10, 6))
x = np.linspace(0, 10, 100)
y = np.sin(x) * np.exp(-x/5)

ax.plot(x, y, linewidth=3, color='xkcd:blue')
ax.set_title('Global XKCD Style - Sinusoidal Decay', fontsize=16, fontweight='bold')
ax.set_xlabel('Time', fontsize=12)
ax.set_ylabel('Amplitude', fontsize=12)
ax.grid(True, alpha=0.3)

# Add fun annotation
ax.annotate('Peak Value!', 
           xy=(x[np.argmax(y)], np.max(y)), 
           xytext=(20, 20), 
           textcoords='offset points',
           bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.8),
           arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.2'))

plt.tight_layout()
plt.show()

# Disable global XKCD style
disable_xkcd_style()
print("📊 Back to normal matplotlib styling")


## Data Loading and Validation

The loaders in `pip_ai_mmm_test.data.loaders` keep data handling consistent across analyses.
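
As an illustration of the kind of check a validation step might run on the spend table, here is a minimal sketch. `check_spend_frame` is a hypothetical function, and the column names `date`, `channel`, and `spend` are taken from the sample data in this notebook; the real `validate_data_quality` may check different things and return a different structure.

```python
import pandas as pd


def check_spend_frame(df, required=("date", "channel", "spend")):
    """Hypothetical quality check; not the project's validate_data_quality."""
    missing_cols = [c for c in required if c not in df.columns]
    if missing_cols:
        # Without the required columns, the remaining checks cannot run.
        return {"missing_columns": missing_cols}
    return {
        "missing_columns": [],
        "n_rows": len(df),
        "null_spend": int(df["spend"].isna().sum()),
        "negative_spend": int((df["spend"] < 0).sum()),
        "duplicate_rows": int(df.duplicated(subset=["date", "channel"]).sum()),
    }


demo = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"]),
    "channel": ["TV", "Digital", "TV"],
    "spend": [1200.0, None, -50.0],
})
report = check_spend_frame(demo)
```

Returning a plain dictionary keeps the result easy to print in a notebook and easy to assert on in a unit test.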


In [None]:
# Load your actual data here
# spend_data_path = '../data/raw/marketing_spend.csv'
# spend_df = load_marketing_spend_data(spend_data_path)

# metrics_data_path = '../data/raw/business_metrics.csv'
# metrics_df = load_business_metrics_data(metrics_data_path)

# For this template, we'll use the sample data created above
print("Using sample data:")
print(f"Marketing spend data shape: {spend_df.shape}")
print(f"Sample channels: {spend_df['channel'].unique()}")
print(f"Date range: {spend_df['date'].min()} to {spend_df['date'].max()}")


## Side-by-Side Style Comparison

Compare normal vs XKCD styles for the same data.


In [None]:
# Side-by-side comparison: Normal vs XKCD Style

# 1. Channel Spend Distribution Comparison
print("📊 Channel Spend Distribution - Normal Style")
fig1 = plot_channel_spend_distribution(
    spend_df,
    channel_col='channel',
    spend_col='spend',
    title="Normal Style - Marketing Spend by Channel",
    xkcd_style=False
)
plt.show()

print("🎨 Channel Spend Distribution - XKCD Style")
fig2 = plot_channel_spend_distribution(
    spend_df,
    channel_col='channel',
    spend_col='spend',
    title="XKCD Style - Marketing Spend by Channel",
    xkcd_style=True
)
plt.show()


In [None]:
# 2. Time Series Comparison
daily_spend = spend_df.groupby('date')['spend'].sum().reset_index()

print("📊 Time Series - Normal Style")
fig3 = plot_time_series(
    daily_spend,
    date_col='date',
    value_cols=['spend'],
    title="Normal Style - Daily Marketing Spend",
    xkcd_style=False
)
plt.show()

print("🎨 Time Series - XKCD Style")
fig4 = plot_time_series(
    daily_spend,
    date_col='date',
    value_cols=['spend'],
    title="XKCD Style - Daily Marketing Spend",
    xkcd_style=True
)
plt.show()


## Key Insights and Next Steps

### Key Findings:
1. **Data Quality**: [Summarize validation results]
2. **Channel Performance**: [Summarize performance analysis]
3. **Seasonality**: [Summarize seasonality findings]
4. **Correlations**: [Summarize correlation insights]

### Style Recommendations:
- **Use Normal Style** for technical analysis and model development
- **Use XKCD Style** for stakeholder presentations and reports
- **Mix Both** - Normal for analysis, XKCD for final presentations

### Next Steps:
1. Feature engineering based on findings
2. Model development
3. Validation and testing
4. Create presentation-ready XKCD plots for stakeholders

---

**Note**: This notebook demonstrates the hybrid approach where:
- **Functions in `src/`** provide reusable, testable code
- **Notebooks** provide interactive exploration and visualization
- **XKCD Style** makes presentations more engaging and memorable
- **Templates** ensure consistency across analyses
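
To make the "reusable, testable" point concrete, here is a small sketch of a function that could live in `src/` next to a test for it. Both `channel_spend_share` and its test are hypothetical examples and are not part of the existing package.

```python
import pandas as pd


def channel_spend_share(df, channel_col="channel", spend_col="spend"):
    """Return each channel's share of total spend as a Series summing to 1."""
    totals = df.groupby(channel_col)[spend_col].sum()
    return totals / totals.sum()


def test_channel_spend_share():
    # Small hand-built frame so the expected shares are easy to verify.
    df = pd.DataFrame({
        "channel": ["TV", "TV", "Digital"],
        "spend": [100.0, 200.0, 100.0],
    })
    shares = channel_spend_share(df)
    assert shares["TV"] == 0.75
    assert shares["Digital"] == 0.25
```

The notebook would then import the function and call it on real data, while a test runner exercises the same code on small fixed inputs.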

