# Funnel Decomposition Analysis - Comprehensive Demo

This notebook demonstrates both decomposition methodologies in the funnel decomposition package, plus the weekly and multi-lender analyses built on them:

## Available Methods

1. **Symmetric Decomposition** (Order-independent midpoint methodology)
2. **Hierarchical Decomposition** (Sequential waterfall methodology)
3. **Weekly Analysis** (Using flexible date_column parameter)
4. **Multi-Lender Analysis** (Aggregate lender-level decomposition)

## Decomposition Effects

Both methodologies decompose the change in bookings into six effects (symmetric decomposition adds a seventh, an interaction residual):
- **Volume Effect**: Change in total application volume
- **Mix Effect**: Change in segment distribution
- **Straight Approval Effect**: Change in straight approval rates
- **Conditional Approval Effect**: Change in conditional approval rates
- **Straight Booking Effect**: Change in straight booking rates
- **Conditional Booking Effect**: Change in conditional booking rates

## Shared Visualization Engine

All methods use the same visualization engine, producing:
- Waterfall grids (2×2 layout)
- Dimensional stacked waterfalls
- Dimension drilldown charts
- Multi-lender comparison charts

## Setup

In [None]:
import sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

# Add src to path
sys.path.insert(0, str(Path.cwd().parent / 'src'))

import symmetric_decomposition_calculator
import hier_decomposition_calculator
import visualization_engine

# Configure matplotlib for inline display
%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 12)
plt.rcParams['figure.dpi'] = 100

## Load Data

The dataset contains:
- **Time periods**: 24 months (2023-01-01 to 2024-12-01)
- **Lenders**: ACA, ALY, CAP
- **Dimensions**: FICO bands (High/Med/Low/Null), offer competition tier, product line
- **Funnel metrics**: Applications, approval rates, booking rates
- **Segments**: 24 segments per month per lender (4 FICO bands × 3 offer tiers × 2 product lines)
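A quick way to confirm that a file matches this description is to count distinct segments per (month, lender) pair. The sketch below is a hypothetical helper, not part of the package. It assumes the dimension columns are named `fico_bands`, `offer_comp_tier`, and `prod_line` (the names used in the drilldown cells of this notebook) and that the level labels match the ones listed here.

```python
from itertools import product

import pandas as pd

# Dimension levels as listed in this notebook (assumed to match the CSV values)
FICO_BANDS = ["High", "Med", "Low", "Null"]
OFFER_TIERS = ["solo_offer", "multi_best", "multi_other"]
PROD_LINES = ["Used", "VMax"]
EXPECTED_SEGMENTS = len(FICO_BANDS) * len(OFFER_TIERS) * len(PROD_LINES)  # 4 x 3 x 2 = 24
DIMENSIONS = ["fico_bands", "offer_comp_tier", "prod_line"]


def check_segment_grid(df: pd.DataFrame, date_column: str = "month_begin_date") -> pd.DataFrame:
    """Hypothetical helper: return (date, lender) groups without exactly 24 distinct segments."""
    keys = [date_column, "lender"]
    counts = (
        df.drop_duplicates(keys + DIMENSIONS)  # count each segment once per group
        .groupby(keys)
        .size()
        .rename("num_segments")
        .reset_index()
    )
    return counts[counts["num_segments"] != EXPECTED_SEGMENTS]


# Toy frame: ACA has the full grid, ALY is missing its last segment
rows = [
    {"month_begin_date": pd.Timestamp("2023-06-01"), "lender": lender,
     "fico_bands": f, "offer_comp_tier": o, "prod_line": p}
    for lender in ["ACA", "ALY"]
    for f, o, p in product(FICO_BANDS, OFFER_TIERS, PROD_LINES)
]
demo = pd.DataFrame(rows[:-1])
print(check_segment_grid(demo))
```

Against the real data, `check_segment_grid(df_monthly)` should return an empty frame if every month and lender has the full grid. Note that a lender missing from a month entirely will not show up, since there is no group to count.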

In [None]:
# Load monthly data
data_path = Path.cwd().parent / 'data' / 'funnel_data_mock_v2.csv'
df_monthly = pd.read_csv(data_path)
df_monthly['month_begin_date'] = pd.to_datetime(df_monthly['month_begin_date'])

print(f"Loaded {len(df_monthly)} rows of monthly data")
print(f"\nLenders: {', '.join(sorted(df_monthly['lender'].unique()))}")
print(f"Date range: {df_monthly['month_begin_date'].min().date()} to {df_monthly['month_begin_date'].max().date()}")
print(f"Unique months: {df_monthly['month_begin_date'].nunique()}")
print(f"Rows per month: {len(df_monthly) // df_monthly['month_begin_date'].nunique()}")

# Show sample
print("\nSample data:")
df_monthly.head(10)

---

# 1. Symmetric Decomposition (Order-Independent)

## Methodology

**Symmetric decomposition** uses a **midpoint approach** where all effects are calculated independently using **average values** from both periods:

1. **Volume Effect**: `ΔA × p_avg × conversion_avg`
2. **Mix Effect**: `A_avg × Δp × conversion_avg`
3. **Straight Approval Effect**: `A_avg × p_avg × Δr_str × b_str_avg`
4. **Conditional Approval Effect**: `A_avg × p_avg × Δr_cond × b_cond_avg`
5. **Straight Booking Effect**: `A_avg × p_avg × r_str_avg × Δb_str`
6. **Conditional Booking Effect**: `A_avg × p_avg × r_cond_avg × Δb_cond`
7. **Interaction Effect**: Residual (actual change minus effects 1 through 6), which ensures perfect reconciliation

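Where `x_avg = (x_1 + x_2) / 2` for each quantity `x`, and `conversion = r_str × b_str + r_cond × b_cond` is the segment's application-to-booking rate.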
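A minimal sketch of the midpoint formulas on a two-segment toy funnel. It assumes each segment's bookings equal `A × p × conversion` with `conversion = r_str × b_str + r_cond × b_cond` (the identity the six formulas above imply) and takes `conversion_avg` as the average of the two periods' conversion rates. `Seg` and `symmetric_effects` are illustrative names, not the package's API, and the numbers are made up.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """One segment in one period (illustrative fields, not the package's schema)."""
    share: float   # p: segment share of total applications
    r_str: float   # straight approval rate
    r_cond: float  # conditional approval rate
    b_str: float   # booking rate of straight approvals
    b_cond: float  # booking rate of conditional approvals

    @property
    def conversion(self) -> float:
        # Application-to-booking rate: r_str * b_str + r_cond * b_cond
        return self.r_str * self.b_str + self.r_cond * self.b_cond


EFFECTS = ["volume", "mix", "straight_approval", "conditional_approval",
           "straight_booking", "conditional_booking"]


def avg(x1: float, x2: float) -> float:
    return (x1 + x2) / 2


def symmetric_effects(apps_1, apps_2, segs_1, segs_2):
    """Midpoint decomposition summed over segments; interaction is the residual."""
    effects = dict.fromkeys(EFFECTS, 0.0)
    a_avg, d_apps = avg(apps_1, apps_2), apps_2 - apps_1
    actual_delta = 0.0
    for s1, s2 in zip(segs_1, segs_2):
        p_avg = avg(s1.share, s2.share)
        conv_avg = avg(s1.conversion, s2.conversion)
        base = a_avg * p_avg
        effects["volume"] += d_apps * p_avg * conv_avg
        effects["mix"] += a_avg * (s2.share - s1.share) * conv_avg
        effects["straight_approval"] += base * (s2.r_str - s1.r_str) * avg(s1.b_str, s2.b_str)
        effects["conditional_approval"] += base * (s2.r_cond - s1.r_cond) * avg(s1.b_cond, s2.b_cond)
        effects["straight_booking"] += base * avg(s1.r_str, s2.r_str) * (s2.b_str - s1.b_str)
        effects["conditional_booking"] += base * avg(s1.r_cond, s2.r_cond) * (s2.b_cond - s1.b_cond)
        actual_delta += apps_2 * s2.share * s2.conversion - apps_1 * s1.share * s1.conversion
    # Residual that makes the seven effects sum exactly to the actual change
    effects["interaction"] = actual_delta - sum(effects.values())
    return effects, actual_delta


# Two-segment toy funnel: applications grow and the mix shifts toward segment 2
period_1 = [Seg(0.6, 0.50, 0.20, 0.40, 0.30), Seg(0.4, 0.30, 0.25, 0.35, 0.20)]
period_2 = [Seg(0.5, 0.55, 0.18, 0.42, 0.28), Seg(0.5, 0.32, 0.22, 0.33, 0.25)]
effects, actual_delta = symmetric_effects(10_000, 12_000, period_1, period_2)
for name, value in effects.items():
    print(f"{name:>22}: {value:+9.1f}")
print(f"{'sum of effects':>22}: {sum(effects.values()):+9.1f}  (actual change {actual_delta:+.1f})")
```

Swapping the two periods flips the sign of every effect while the averaged terms stay the same, which is the sense in which the method has no bias toward either period.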

## Key Advantages

- **Order-independent**: Results do not depend on the sequence in which effects are calculated
- **Balanced**: Averaging both periods means neither period serves as the baseline
- **Perfect reconciliation**: The interaction term absorbs the residual, so the effects sum exactly to the actual change

In [None]:
# Calculate symmetric decomposition for June 2023 → June 2024 (YoY)
results_sym = symmetric_decomposition_calculator.calculate_decomposition(
    df=df_monthly,
    date_a='2023-06-01',
    date_b='2024-06-01',
    lender='ACA'
)

print("Symmetric Decomposition Results:")
print(f"  Method: {results_sym.metadata['method']}")
print(f"  Period 1: {results_sym.metadata['date_a']}")
print(f"  Period 2: {results_sym.metadata['date_b']}")
print(f"  Period 1 bookings: {results_sym.metadata['period_1_total_bookings']:,.0f}")
print(f"  Period 2 bookings: {results_sym.metadata['period_2_total_bookings']:,.0f}")
print(f"  Delta bookings: {results_sym.metadata['delta_total_bookings']:+,.0f}")
print(f"  Number of segments: {results_sym.metadata['num_segments']}")

In [None]:
# View summary
print("\nSymmetric Decomposition Summary:")
results_sym.summary

In [None]:
# Create waterfall grid visualization
fig_sym = visualization_engine.create_waterfall_grid(
    summary=results_sym.summary,
    segment_detail=results_sym.segment_detail,
    lender='ACA'
)

plt.show()

---

# 2. Hierarchical Decomposition (Sequential Waterfall)

## Methodology

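**Hierarchical decomposition** uses a **sequential waterfall** where each effect builds on previous steps: factors already changed take their Period 2 values, while factors not yet changed keep their Period 1 values.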

1. **Volume Effect**: `ΔA × p[1] × conversion[1]` (using Period 1 values)
2. **Mix Effect**: `A[2] × Δp × conversion[1]` (using new apps, Period 1 rates)
3. **Straight Approval Effect**: `A[2] × p[2] × Δr_str × b_str[1]`
4. **Conditional Approval Effect**: `A[2] × p[2] × Δr_cond × b_cond[1]`
5. **Straight Booking Effect**: `A[2] × p[2] × r_str[2] × Δb_str`
6. **Conditional Booking Effect**: `A[2] × p[2] × r_cond[2] × Δb_cond`
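Running a toy funnel through the sequential formulas shows why no residual is needed: per segment, the six steps telescope to `A[2] × p[2] × conversion[2] - A[1] × p[1] × conversion[1]`. This sketch assumes segment bookings equal `A × p × (r_str × b_str + r_cond × b_cond)`; `Seg` and `hierarchical_effects` are illustrative names, not the package's API, and the inputs are made up.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """One segment in one period (illustrative fields, not the package's schema)."""
    share: float   # p: segment share of total applications
    r_str: float   # straight approval rate
    r_cond: float  # conditional approval rate
    b_str: float   # booking rate of straight approvals
    b_cond: float  # booking rate of conditional approvals

    @property
    def conversion(self) -> float:
        return self.r_str * self.b_str + self.r_cond * self.b_cond


def hierarchical_effects(apps_1, apps_2, segs_1, segs_2):
    """Sequential waterfall: changed factors use Period 2, unchanged factors use Period 1."""
    effects = dict.fromkeys(["volume", "mix", "straight_approval", "conditional_approval",
                             "straight_booking", "conditional_booking"], 0.0)
    actual_delta = 0.0
    for s1, s2 in zip(segs_1, segs_2):
        effects["volume"] += (apps_2 - apps_1) * s1.share * s1.conversion
        effects["mix"] += apps_2 * (s2.share - s1.share) * s1.conversion
        base = apps_2 * s2.share  # volume and mix have already moved to Period 2
        effects["straight_approval"] += base * (s2.r_str - s1.r_str) * s1.b_str
        effects["conditional_approval"] += base * (s2.r_cond - s1.r_cond) * s1.b_cond
        effects["straight_booking"] += base * s2.r_str * (s2.b_str - s1.b_str)
        effects["conditional_booking"] += base * s2.r_cond * (s2.b_cond - s1.b_cond)
        actual_delta += apps_2 * s2.share * s2.conversion - apps_1 * s1.share * s1.conversion
    return effects, actual_delta


period_1 = [Seg(0.6, 0.50, 0.20, 0.40, 0.30), Seg(0.4, 0.30, 0.25, 0.35, 0.20)]
period_2 = [Seg(0.5, 0.55, 0.18, 0.42, 0.28), Seg(0.5, 0.32, 0.22, 0.33, 0.25)]
effects, actual_delta = hierarchical_effects(10_000, 12_000, period_1, period_2)
for name, value in effects.items():
    print(f"{name:>22}: {value:+9.1f}")
print(f"{'sum of effects':>22}: {sum(effects.values()):+9.1f}  (actual change {actual_delta:+.1f})")
```

Reordering the steps (for example, booking rates before approval rates) would change how the total is split between effects; that order dependence is what the symmetric method avoids.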

## Key Advantages

- **Sequential logic**: Mirrors step-by-step business changes
- **Interpretable**: Each effect shows incremental impact
- **Perfect reconciliation**: The steps telescope to the actual change, so no interaction residual is needed
- **Traditional**: Familiar waterfall approach

In [None]:
# Calculate hierarchical decomposition for same period
results_hier = hier_decomposition_calculator.calculate_decomposition(
    df=df_monthly,
    date_a='2023-06-01',
    date_b='2024-06-01',
    lender='ACA'
)

print("Hierarchical Decomposition Results:")
print(f"  Period 1: {results_hier.metadata['date_a']}")
print(f"  Period 2: {results_hier.metadata['date_b']}")
print(f"  Period 1 bookings: {results_hier.metadata['period_1_total_bookings']:,.0f}")
print(f"  Period 2 bookings: {results_hier.metadata['period_2_total_bookings']:,.0f}")
print(f"  Delta bookings: {results_hier.metadata['delta_total_bookings']:+,.0f}")
print(f"  Number of segments: {results_hier.metadata['num_segments']}")

In [None]:
# View summary
print("\nHierarchical Decomposition Summary:")
results_hier.summary

In [None]:
# Create waterfall grid visualization
fig_hier = visualization_engine.create_waterfall_grid(
    summary=results_hier.summary,
    segment_detail=results_hier.segment_detail,
    lender='ACA'
)

plt.show()

## Compare Symmetric vs Hierarchical

Let's compare the two methodologies side-by-side:

In [None]:
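# Create comparison table
# how='left' keeps every symmetric row (including the interaction effect),
# even when the hierarchical summary has no matching row
comparison = pd.merge(
    results_sym.summary,
    results_hier.summary,
    on='effect_type',
    how='left',
    suffixes=('_symmetric', '_hierarchical')
)
comparison['difference'] = comparison['booking_impact_symmetric'] - comparison['booking_impact_hierarchical']
# Treat a zero hierarchical effect as NaN so the percentage is not ±inf
hier_abs = comparison['booking_impact_hierarchical'].abs().replace(0, float('nan'))
comparison['pct_difference'] = (comparison['difference'] / hier_abs * 100).round(1)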

print("\nSymmetric vs Hierarchical Comparison:")
print("=" * 100)
comparison

---

# 3. Weekly Analysis (Using date_column Parameter)

## New Feature: Flexible Date Column

All calculators now support a **`date_column` parameter** that allows you to:
- Analyze weekly data without renaming columns
- Use custom date column names
- Keep your code explicit and clear

**Default**: `date_column='month_begin_date'` (backward compatible)
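Conceptually, the parameter only changes which column is used to pick the two comparison periods. The sketch below illustrates that idea with a hypothetical `select_periods` helper; it is not the package's implementation, and it assumes a `lender` column and exact date matches.

```python
import pandas as pd


def select_periods(df, date_a, date_b, lender, date_column="month_begin_date"):
    """Hypothetical helper: one lender's rows for each comparison period."""
    if date_column not in df.columns:
        raise KeyError(f"date column {date_column!r} not in {list(df.columns)}")
    dates = pd.to_datetime(df[date_column])
    is_lender = df["lender"] == lender
    period_1 = df[is_lender & (dates == pd.Timestamp(date_a))]
    period_2 = df[is_lender & (dates == pd.Timestamp(date_b))]
    if period_1.empty or period_2.empty:
        raise ValueError(f"no {lender} rows for {date_a} and/or {date_b} in {date_column!r}")
    return period_1, period_2


# Weekly toy data: no renaming needed, just point date_column at the weekly column
weekly = pd.DataFrame({
    "week_begin_date": pd.to_datetime(["2023-06-26", "2023-06-26", "2024-06-24"]),
    "lender": ["ACA", "ALY", "ACA"],
    "num_tot_apps": [100, 80, 120],
})
p1, p2 = select_periods(weekly, "2023-06-26", "2024-06-24", "ACA", date_column="week_begin_date")
print(len(p1), len(p2), p2["num_tot_apps"].iloc[0])
```

Calling it without `date_column` on the weekly frame raises a `KeyError`, which is the failure the default `'month_begin_date'` would hit on weekly data.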

## Weekly Data Example

In [None]:
# Load weekly data
weekly_path = Path.cwd().parent / 'data' / 'funnel_data_mock_weekly.csv'
df_weekly = pd.read_csv(weekly_path)
df_weekly['week_begin_date'] = pd.to_datetime(df_weekly['week_begin_date'])

print(f"Loaded {len(df_weekly)} rows of weekly data")
print(f"\nDate range: {df_weekly['week_begin_date'].min().date()} to {df_weekly['week_begin_date'].max().date()}")
print(f"Unique weeks: {df_weekly['week_begin_date'].nunique()}")
print(f"Segments per week: {len(df_weekly) // df_weekly['week_begin_date'].nunique()}")

In [None]:
# Select weeks for analysis (week 26 vs week 78: 52 weeks apart if no weeks are missing)
# drop_duplicates() keeps pandas Timestamps, so .date() and .days work below
unique_weeks = sorted(df_weekly['week_begin_date'].drop_duplicates())
week_1 = unique_weeks[25]  # Week 26
week_2 = unique_weeks[77]  # Week 78

print("Comparing:")
print(f"  Week 1: {week_1.date()}")
print(f"  Week 2: {week_2.date()}")
print(f"  Time span: {(week_2 - week_1).days} days (~{(week_2 - week_1).days // 7} weeks)")

In [None]:
# Calculate symmetric decomposition with date_column parameter
results_weekly = symmetric_decomposition_calculator.calculate_decomposition(
    df=df_weekly,
    date_a=week_1,
    date_b=week_2,
    lender='ACA',
    date_column='week_begin_date'  # NEW: Specify the date column directly!
)

print("Weekly Symmetric Decomposition Results:")
print(f"  Method: {results_weekly.metadata['method']}")
print(f"  Period 1: {results_weekly.metadata['date_a']}")
print(f"  Period 2: {results_weekly.metadata['date_b']}")
print(f"  Period 1 bookings: {results_weekly.metadata['period_1_total_bookings']:,.0f}")
print(f"  Period 2 bookings: {results_weekly.metadata['period_2_total_bookings']:,.0f}")
print(f"  Delta bookings: {results_weekly.metadata['delta_total_bookings']:+,.0f}")
print("\nSummary:")
results_weekly.summary

In [None]:
# Create waterfall grid for weekly data
fig_weekly = visualization_engine.create_waterfall_grid(
    summary=results_weekly.summary,
    segment_detail=results_weekly.segment_detail,
    lender='ACA'
)

plt.show()

## Weekly Trend Analysis (Bonus)

With weekly data, we can analyze trends over time:

In [None]:
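# Calculate weekly totals for the lender analyzed above (ACA)
# Assumes num_tot_apps / num_tot_bks are lender-level weekly totals repeated on
# every segment row, so 'first' picks one copy per week
df_weekly_aca = df_weekly[df_weekly['lender'] == 'ACA']
weekly_totals = df_weekly_aca.groupby('week_begin_date').agg({
    'num_tot_apps': 'first',
    'num_tot_bks': 'first'
}).reset_index()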

# Create trend plot
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16, 8))

# Applications trend
ax1.plot(weekly_totals['week_begin_date'], weekly_totals['num_tot_apps'], 
         marker='o', markersize=3, linewidth=1.5, color='#2E86AB')
ax1.set_title('Weekly Applications Trend', fontsize=14, fontweight='bold')
ax1.set_ylabel('Applications', fontsize=12)
ax1.grid(True, alpha=0.3)
ax1.axvline(week_1, color='red', linestyle='--', alpha=0.5, label='Period 1')
ax1.axvline(week_2, color='green', linestyle='--', alpha=0.5, label='Period 2')
ax1.legend()

# Bookings trend
ax2.plot(weekly_totals['week_begin_date'], weekly_totals['num_tot_bks'], 
         marker='o', markersize=3, linewidth=1.5, color='#A23B72')
ax2.set_title('Weekly Bookings Trend', fontsize=14, fontweight='bold')
ax2.set_ylabel('Bookings', fontsize=12)
ax2.set_xlabel('Week', fontsize=12)
ax2.grid(True, alpha=0.3)
ax2.axvline(week_1, color='red', linestyle='--', alpha=0.5, label='Period 1')
ax2.axvline(week_2, color='green', linestyle='--', alpha=0.5, label='Period 2')
ax2.legend()

plt.tight_layout()
plt.show()

---

# 4. Multi-Lender Analysis

## Multi-Lender Decomposition

Analyze multiple lenders simultaneously with:
- **Aggregate view**: Overall performance across all lenders
- **Lender attribution**: See which lender drives each effect
- **Side-by-side comparison**: Overall vs by-lender breakdowns

This is useful for:
- Portfolio management
- Lender comparison and benchmarking
- Strategic resource allocation
- Root cause analysis (lender-specific vs systematic)
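One way to read lender attribution: if effects are computed per lender segment and then summed, each effect splits additively across lenders, and the lender columns add back up to the aggregate. This is a sketch under that assumption with made-up numbers. The long-format table is hypothetical (it reuses the `effect_type` and `booking_impact` column names from the comparison table earlier), and the package's own method may differ.

```python
import pandas as pd

# Hypothetical long-format effects: one row per lender x effect (segments already summed)
effects = pd.DataFrame({
    "lender": ["ACA", "ACA", "ALY", "ALY", "CAP", "CAP"],
    "effect_type": ["volume", "mix"] * 3,
    "booking_impact": [120.0, -30.0, 45.0, 10.0, -60.0, 5.0],
})

# Lender attribution: rows are effects, columns are lenders
by_lender = effects.pivot_table(index="effect_type", columns="lender",
                                values="booking_impact", aggfunc="sum", fill_value=0.0)

# Aggregate view: because the split is additive, lender columns sum to the overall effect
aggregate = by_lender.sum(axis=1).rename("all_lenders")
# Largest absolute contributor per effect (one lender dominating vs a broad-based shift)
top_driver = by_lender.abs().idxmax(axis=1).rename("top_driver")
print(pd.concat([by_lender, aggregate, top_driver], axis=1))
```

When one lender dominates an effect, the driver is likely lender-specific; when lenders move in the same direction with similar magnitudes, a systematic cause is more plausible.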

In [None]:
# Calculate multi-lender decomposition
results_multi = symmetric_decomposition_calculator.calculate_multi_lender_decomposition(
    df=df_monthly,
    date_a='2023-06-01',
    date_b='2024-06-01'
)

print("Multi-Lender Symmetric Decomposition Results:")
print(f"  Method: {results_multi.metadata['method']}")
print(f"  Period 1: {results_multi.metadata['date_a']}")
print(f"  Period 2: {results_multi.metadata['date_b']}")
print(f"  Lenders analyzed: {', '.join(results_multi.metadata['lenders'])}")
print(f"\nAggregate Results (All Lenders):")
print(f"  Period 1 bookings: {results_multi.metadata['aggregate_period_1_bookings']:,.0f}")
print(f"  Period 2 bookings: {results_multi.metadata['aggregate_period_2_bookings']:,.0f}")
print(f"  Delta bookings: {results_multi.metadata['aggregate_delta_bookings']:+,.0f}")

In [None]:
# View aggregate summary
print("\nAggregate Summary (All Lenders):")
results_multi.aggregate_summary

In [None]:
# Create lender waterfall grid (2-panel: Overall vs By Lender)
fig_multi = visualization_engine.create_lender_waterfall_grid(
    lender_summaries=results_multi.lender_summaries,
    aggregate_summary=results_multi.aggregate_summary,
    metadata=results_multi.metadata
)

plt.show()

In [None]:
# Print lender-level breakdowns
visualization_engine.print_lender_breakdowns(results_multi.lender_summaries)

In [None]:
# Create lender drilldown (each effect broken down by lender)
fig_lender_drill = visualization_engine.create_lender_drilldown(
    lender_summaries=results_multi.lender_summaries,
    date_a=results_multi.metadata['date_a'],
    date_b=results_multi.metadata['date_b']
)

plt.show()

---

# 5. Dimension Drilldowns

## Detailed Analysis by Dimension

For any decomposition result, you can create drilldown charts showing each effect broken down by:
- FICO bands (High/Med/Low/Null)
- Offer competition tier (solo_offer/multi_best/multi_other)
- Product line (Used/VMax)

### Example: FICO Band Drilldown

In [None]:
# Create FICO band drilldown
fig_fico = visualization_engine.create_dimension_drilldown(
    segment_detail=results_sym.segment_detail,
    dimension='fico_bands',
    lender='ACA'
)

plt.show()

### Example: Offer Comp Tier Drilldown

In [None]:
# Create Offer Comp Tier drilldown
fig_comp = visualization_engine.create_dimension_drilldown(
    segment_detail=results_sym.segment_detail,
    dimension='offer_comp_tier',
    lender='ACA'
)

plt.show()

### Example: Product Line Drilldown

In [None]:
# Create Product Line drilldown
fig_prod = visualization_engine.create_dimension_drilldown(
    segment_detail=results_sym.segment_detail,
    dimension='prod_line',
    lender='ACA'
)

plt.show()

---

# 6. Segment-Level Detail

## Detailed Segment Analysis

All decomposition results include segment-level detail showing:
- Period 1 and Period 2 metrics for each segment
- Delta values for all metrics
- Individual effect contributions by segment

This is useful for:
- Identifying specific segments driving changes
- Deep-dive analysis of particular dimension combinations
- Validation of aggregate results
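For the validation use case, a simple check is that segment effects add up to the actual booking change. The sketch assumes `total_effect` already includes every effect for the segment (including the interaction term in the symmetric method); `check_reconciliation` is a hypothetical helper, and the toy frame uses the column names from this section.

```python
import math

import pandas as pd


def check_reconciliation(segment_detail: pd.DataFrame, tol: float = 1e-6) -> float:
    """Hypothetical helper: raise if summed effects differ from the actual booking change."""
    actual = (segment_detail["period_2_segment_bookings"]
              - segment_detail["period_1_segment_bookings"]).sum()
    explained = segment_detail["total_effect"].sum()
    gap = explained - actual
    if not math.isclose(gap, 0.0, abs_tol=tol):
        raise ValueError(f"effects explain {explained:,.2f} but bookings changed by {actual:,.2f}")
    return gap


toy = pd.DataFrame({
    "period_1_segment_bookings": [100.0, 50.0],
    "period_2_segment_bookings": [120.0, 45.0],
    "total_effect": [20.0, -5.0],
})
print(check_reconciliation(toy))
```

In the notebook this would be `check_reconciliation(results_sym.segment_detail)`.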

In [None]:
# Show segment-level detail (first 10 segments)
print("Segment-Level Detail (First 10 Segments):")
results_sym.segment_detail.head(10)

In [None]:
# Example: Find top 5 segments by total effect
top_segments = results_sym.segment_detail.nlargest(5, 'total_effect')[[
    'fico_bands', 'offer_comp_tier', 'prod_line', 
    'period_1_segment_bookings', 'period_2_segment_bookings', 'total_effect'
]]

print("\nTop 5 Segments by Positive Impact:")
print(top_segments.to_string(index=False))

---

# 7. Exporting Results

## Export to CSV

Summary and segment-detail tables are pandas DataFrames, so they can be exported with `DataFrame.to_csv`:

In [None]:
# Export examples (uncomment to use)
# results_sym.summary.to_csv('symmetric_summary.csv', index=False)
# results_sym.segment_detail.to_csv('symmetric_segment_detail.csv', index=False)
# results_hier.summary.to_csv('hierarchical_summary.csv', index=False)
# results_multi.aggregate_summary.to_csv('multi_lender_aggregate.csv', index=False)

## Export Charts to PNG

See the **`chart_export.ipynb`** notebook for detailed examples of:
- Extracting individual charts from grids
- Exporting charts as PNG files
- Batch export functionality
- Organized directory structure for exports
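For a quick export without the full workflow, matplotlib's `Figure.savefig` is enough. The `export_figures` helper below is hypothetical. It assumes the `create_*` functions return `matplotlib.figure.Figure` objects, which the `fig_*` variable names suggest but the package documentation should confirm.

```python
import tempfile
from pathlib import Path

from matplotlib.figure import Figure


def export_figures(figures: dict, out_dir: "str | Path", dpi: int = 150) -> list:
    """Hypothetical helper: save each named figure as <out_dir>/<name>.png."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fig in figures.items():
        path = out_dir / f"{name}.png"
        fig.savefig(path, dpi=dpi, bbox_inches="tight")  # tight bbox trims surplus whitespace
        written.append(path)
    return written


# Self-contained demo with a throwaway figure and a temporary directory
demo_fig = Figure(figsize=(4, 3))
demo_fig.add_subplot().bar(["Period 1", "Period 2"], [100, 115])
with tempfile.TemporaryDirectory() as tmp:
    paths = export_figures({"demo_waterfall": demo_fig}, tmp)
    all_written = all(p.exists() for p in paths)
print([p.name for p in paths], all_written)
```

In the notebook, a call might look like `export_figures({"symmetric_aca": fig_sym, "hierarchical_aca": fig_hier}, "exports")`.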

---

# Summary

## Key Takeaways

### 1. **Multiple Methodologies**
- **Symmetric**: Order-independent, balanced, includes interaction effect
- **Hierarchical**: Sequential waterfall, step-by-step logic
- Both methods produce perfect reconciliation to actual booking changes

### 2. **Flexible Time Granularity**
- Monthly analysis (default: `date_column='month_begin_date'`)
- Weekly analysis (use: `date_column='week_begin_date'`)
- Any custom date column supported

### 3. **Multi-Lender Capabilities**
- Aggregate view across all lenders
- Lender attribution for each effect
- Side-by-side comparisons

### 4. **Rich Visualizations**
- Waterfall grids (2×2 dimensional breakdowns)
- Dimensional stacked waterfalls
- Dimension drilldown charts
- Multi-lender comparison charts
- All using shared visualization engine

### 5. **Detailed Analytics**
- Segment-level detail for deep dives
- Effect attribution by dimension
- Export capabilities for further analysis

## When to Use Each Method

**Use Symmetric when:**
- You want results that do not depend on the order of effects
- You are comparing many decompositions (periods, lenders) and want no period treated as the baseline
- You need the same methodology across analyses

**Use Hierarchical when:**
- Sequential logic matches business process
- Explaining step-by-step transformations
- Traditional waterfall approach is preferred

**Use Weekly when:**
- Need higher frequency monitoring
- Faster detection of trends
- Operational decision-making cycles

**Use Multi-Lender when:**
- Portfolio-level analysis
- Lender comparison and benchmarking
- Strategic resource allocation
- Distinguishing systematic vs lender-specific issues