# Week 9: Attribution Modeling

**Goal:** Master attribution modeling to understand multi-touch customer journeys and allocate credit across marketing touchpoints.

**Time Commitment:** ~1 hour per day × 7 days = 7 hours total

**What You'll Learn:**
- Attribution fundamentals and why it matters
- Rule-based attribution models (last-touch, first-touch, linear)
- Time-decay and position-based attribution
- Introduction to data-driven attribution
- Building attribution models in Python
- Comparing attribution models
- Attribution insights for budget allocation

**Why This Matters:**
As a Marketing Measurement Partner, you can use attribution to:
- Understand the true value of each marketing channel
- Move beyond last-click bias
- Allocate budgets based on contribution, not just final conversions
- Identify hidden value in awareness and consideration channels
- Optimize the full customer journey, not just the last step

Customers often interact with a brand several times before converting (5-7 touchpoints is a rule of thumb, not a measured constant). Attribution estimates how much each of those touchpoints contributed to the journey.

---

## 📅 Day 57: Attribution Concepts (~60 min)

### Learning Objectives
- Understand what attribution modeling is and why it matters
- Learn about multi-touch customer journeys
- Understand the limitations of last-click attribution
- Explore different types of attribution models

### The Business Problem
Your Google Search ads get credit for all conversions, but customers also saw Facebook ads, display banners, and email campaigns before converting. How do you fairly distribute credit?

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Settings
np.random.seed(42)
sns.set_style('whitegrid')
pd.set_option('display.max_columns', None)

### 📖 Concept: The Attribution Problem

**Scenario:** A customer converts after this journey:
1. Saw Facebook ad (Day 1)
2. Clicked Google Search ad (Day 3)
3. Received email (Day 5)
4. Clicked Google Search ad again (Day 7) → **Converted**

**Question:** Which channel should get credit for the conversion?

**Traditional Answer:** Google Search (last click)

**Problem:** This ignores the role of Facebook and Email in the journey!

In [None]:
# Sample customer journey data
journey_example = pd.DataFrame({
    'touchpoint_num': [1, 2, 3, 4],
    'channel': ['Facebook', 'Google Search', 'Email', 'Google Search'],
    'date': ['2024-01-01', '2024-01-03', '2024-01-05', '2024-01-07'],
    'action': ['Impression', 'Click', 'Open', 'Click → Conversion']
})

print("Customer Journey to Conversion:")
print(journey_example)
print("\nLast-Click Attribution: 100% credit to Google Search")
print("But is that fair? 🤔")

### 📖 Concept: Types of Attribution Models

**Rule-Based (Heuristic) Models:**
1. **Last-Touch**: 100% credit to last interaction
2. **First-Touch**: 100% credit to first interaction
3. **Linear**: Equal credit to all touchpoints
4. **Time-Decay**: More credit to recent touchpoints
5. **Position-Based (U-Shaped)**: More credit to first and last, less to middle

**Data-Driven Models:**
- Use machine learning to determine credit based on observed conversion patterns
- More accurate but require significant data

This week focuses on rule-based models, which are easier to implement and explain.
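
As a sketch of how the rule-based models differ, each one can be written as a set of credit weights $w_1, \dots, w_n$ over a converting journey with $n$ touchpoints, with the weights summing to 1. The half-life $h$ (measured here in touchpoints) and the 40/40/20 position-based split are common illustrative choices, not fixed standards:

$$
\begin{aligned}
\text{Last-touch:} \quad & w_n = 1,\; w_i = 0 \text{ for } i < n \\
\text{First-touch:} \quad & w_1 = 1,\; w_i = 0 \text{ for } i > 1 \\
\text{Linear:} \quad & w_i = \frac{1}{n} \\
\text{Time-decay:} \quad & w_i = \frac{2^{-(n-i)/h}}{\sum_{j=1}^{n} 2^{-(n-j)/h}} \\
\text{Position-based } (n \ge 3)\text{:} \quad & w_1 = w_n = 0.4,\; w_i = \frac{0.2}{n-2} \text{ for } 1 < i < n
\end{aligned}
$$

For the four-touch journey above (Facebook, Google Search, Email, Google Search), linear attribution gives Facebook 25%, Google Search 50%, and Email 25%, while the position-based split gives Facebook 40%, Google Search 50% (10% + 40%), and Email 10%.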

In [None]:
# Create a more realistic dataset of customer journeys
journeys_data = [
    # Journey 1: Facebook → Google Search → Conversion
    {'user_id': 'U001', 'touchpoint': 1, 'channel': 'Facebook', 'converted': 0},
    {'user_id': 'U001', 'touchpoint': 2, 'channel': 'Google Search', 'converted': 1},
    
    # Journey 2: Email → Google Search → Display → Google Search → Conversion
    {'user_id': 'U002', 'touchpoint': 1, 'channel': 'Email', 'converted': 0},
    {'user_id': 'U002', 'touchpoint': 2, 'channel': 'Google Search', 'converted': 0},
    {'user_id': 'U002', 'touchpoint': 3, 'channel': 'Display', 'converted': 0},
    {'user_id': 'U002', 'touchpoint': 4, 'channel': 'Google Search', 'converted': 1},
    
    # Journey 3: Facebook → Direct → Conversion
    {'user_id': 'U003', 'touchpoint': 1, 'channel': 'Facebook', 'converted': 0},
    {'user_id': 'U003', 'touchpoint': 2, 'channel': 'Direct', 'converted': 1},
    
    # Journey 4: Google Search only → Conversion
    {'user_id': 'U004', 'touchpoint': 1, 'channel': 'Google Search', 'converted': 1},
    
    # Journey 5: Display → Email → Facebook → Google Search → Conversion
    {'user_id': 'U005', 'touchpoint': 1, 'channel': 'Display', 'converted': 0},
    {'user_id': 'U005', 'touchpoint': 2, 'channel': 'Email', 'converted': 0},
    {'user_id': 'U005', 'touchpoint': 3, 'channel': 'Facebook', 'converted': 0},
    {'user_id': 'U005', 'touchpoint': 4, 'channel': 'Google Search', 'converted': 1},
]

df_journeys = pd.DataFrame(journeys_data)

print("Customer Journey Dataset:")
print(df_journeys)
print(f"\nTotal Conversions: {df_journeys['converted'].sum()}")
print(f"Total Users: {df_journeys['user_id'].nunique()}")

### 💡 Try It: Visualize Customer Journeys

Create a visualization showing the path to conversion for each user.

In [None]:
# YOUR CODE HERE
# For each user who converted:
# 1. Extract their full journey
# 2. Create a visual representation (could be text-based)
# 3. Show: User → Channel 1 → Channel 2 → ... → Conversion
# 4. Calculate average journey length (touchpoints before conversion)



### ✏️ Exercise 1: Last-Click Attribution Bias

Calculate conversions by channel using last-click attribution and identify the bias.

In [None]:
# YOUR CODE HERE
# Using df_journeys:
# 1. For each converted user, identify the last touchpoint (where converted=1)
# 2. Count conversions by channel (last-click)
# 3. Also count total touchpoints by channel (participation)
# 4. Compare: Which channels are over-credited? Under-credited?
# 5. What insights does this reveal?



### 🎯 Day 57 Mini-Project: Journey Analysis

Analyze a larger dataset of customer journeys to understand patterns.

In [None]:
# Generate synthetic customer journey data
np.random.seed(42)

def generate_journey():
    """Generate a realistic customer journey."""
    channels = ['Display', 'Facebook', 'Instagram', 'Google Search', 'Email', 'Direct']
    # Journey length varies
    length = np.random.choice([1, 2, 3, 4, 5], p=[0.20, 0.30, 0.25, 0.15, 0.10])
    
    journey = []
    # First touch often awareness channels
    first_touch_channels = ['Display', 'Facebook', 'Instagram', 'Google Search']
    journey.append(np.random.choice(first_touch_channels))
    
    # Middle touches
    for _ in range(length - 2):
        journey.append(np.random.choice(channels))
    
    # Last touch often direct or search
    if length > 1:
        last_touch_channels = ['Google Search', 'Direct', 'Email']
        journey.append(np.random.choice(last_touch_channels, p=[0.5, 0.3, 0.2]))
    
    return journey

# Generate 1000 conversion journeys
all_journeys = []
for user_id in range(1000):
    journey = generate_journey()
    for i, channel in enumerate(journey):
        all_journeys.append({
            'user_id': f'U{user_id:04d}',
            'touchpoint': i + 1,
            'channel': channel,
            'converted': 1 if i == len(journey) - 1 else 0
        })

df_large = pd.DataFrame(all_journeys)

# YOUR CODE HERE
# Analyze this dataset:
# 1. What's the average journey length?
# 2. What's the distribution of journey lengths?
# 3. Which channel appears most frequently in position 1 (first touch)?
# 4. Which channel appears most frequently in last position?
# 5. Create a Sankey diagram or path analysis showing common journeys
# 6. What would last-click attribution tell us vs reality?



### 🎓 Day 57 Key Takeaways

✅ Most conversions involve multiple touchpoints  
✅ Last-click attribution over-credits bottom-funnel channels  
✅ Different channels play different roles in the journey  
✅ Attribution modeling distributes credit more fairly  
✅ Understanding journey patterns is crucial for optimization  

**Next:** Tomorrow we'll implement last-touch attribution!

---

## 📅 Day 58: Last-Touch Attribution (~60 min)

### Learning Objectives
- Implement last-touch attribution in Python
- Understand when last-touch is appropriate
- Calculate attributed conversions and ROI by channel
- Compare attributed performance to observed performance

### The Business Problem
You need to build an attribution system. Start with the simplest model: last-touch. Despite its limitations, it's still widely used and provides a baseline.

### 📖 Concept: Last-Touch Attribution

**Rule:** Give 100% credit to the last touchpoint before conversion.

**Pros:**
- Simple to implement and explain
- Reflects the "closing" action
- Matches the last-click default that many analytics platforms have historically used

**Cons:**
- Ignores all earlier touchpoints
- Over-credits bottom-funnel channels
- Under-values awareness and consideration activities

In [None]:
def last_touch_attribution(df):
    """
    Apply last-touch attribution to journey data.
    
    Parameters:
    - df: DataFrame with columns: user_id, touchpoint, channel, converted
    
    Returns:
    - DataFrame with attributed conversions by channel
    """
    # Get only converting touchpoints
    conversions = df[df['converted'] == 1].copy()
    
    # Count by channel
    attributed_conversions = conversions.groupby('channel').size().reset_index(name='conversions')
    
    return attributed_conversions

# Apply to our data
last_touch_results = last_touch_attribution(df_large)
last_touch_results = last_touch_results.sort_values('conversions', ascending=False)

print("Last-Touch Attribution Results:")
print(last_touch_results)
print(f"\nTotal Conversions: {last_touch_results['conversions'].sum()}")

# Visualize
plt.figure(figsize=(10, 6))
plt.bar(last_touch_results['channel'], last_touch_results['conversions'])
plt.xlabel('Channel')
plt.ylabel('Attributed Conversions')
plt.title('Last-Touch Attribution: Conversions by Channel')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### 💡 Try It: Calculate Channel Participation

Compare how often each channel appears in conversion journeys vs. how much credit it gets.

In [None]:
# YOUR CODE HERE
# 1. Count total touchpoints by channel (participation)
# 2. Count attributed conversions by channel (last-touch)
# 3. Calculate participation rate vs attribution rate
# 4. Which channels are over-credited relative to participation?
# 5. Which are under-credited?



### 📖 Concept: Adding Cost Data

Attribution is most useful when combined with cost data to calculate ROI by channel.

In [None]:
# Channel cost data
channel_costs = pd.DataFrame({
    'channel': ['Display', 'Facebook', 'Instagram', 'Google Search', 'Email', 'Direct'],
    'cost': [15000, 25000, 18000, 35000, 5000, 0]
})

# Merge with attribution results
results_with_cost = last_touch_results.merge(channel_costs, on='channel')

# Calculate metrics (assume $100 revenue per conversion)
revenue_per_conversion = 100
results_with_cost['revenue'] = results_with_cost['conversions'] * revenue_per_conversion
results_with_cost['roi'] = (results_with_cost['revenue'] - results_with_cost['cost']) / results_with_cost['cost']
results_with_cost['cpa'] = results_with_cost['cost'] / results_with_cost['conversions']
results_with_cost['roas'] = results_with_cost['revenue'] / results_with_cost['cost']

# Replace inf with NaN for Direct (0 cost)
results_with_cost = results_with_cost.replace([np.inf, -np.inf], np.nan)

print("Last-Touch Attribution with ROI:")
print(results_with_cost.sort_values('roas', ascending=False))
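
The metric definitions used in the code above can be checked by hand. The following worked example assumes a hypothetical channel with a cost of 5,000 and 200 attributed conversions at 100 revenue each (all in dollars; the conversion count is made up for illustration):

$$
\begin{aligned}
\text{Revenue} &= 200 \times 100 = 20{,}000 \\
\text{ROI} &= \frac{\text{Revenue} - \text{Cost}}{\text{Cost}} = \frac{20{,}000 - 5{,}000}{5{,}000} = 3.0 \\
\text{CPA} &= \frac{\text{Cost}}{\text{Conversions}} = \frac{5{,}000}{200} = 25 \\
\text{ROAS} &= \frac{\text{Revenue}}{\text{Cost}} = \frac{20{,}000}{5{,}000} = 4.0
\end{aligned}
$$

ROI and ROAS divide by cost, so they are undefined for a zero-cost channel such as Direct, which is why the code replaces infinite values with NaN.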

### ✏️ Exercise 2: Attribution-Based Budget Allocation

Use last-touch attribution to recommend budget allocation.

In [None]:
# YOUR CODE HERE
# Given a total budget of $120,000:
# 1. Current allocation is in channel_costs (it totals $98,000)
# 2. Based on last-touch ROAS, which channels should get more budget?
# 3. Which should get less?
# 4. Propose a new budget allocation that:
#    - Shifts budget to higher ROAS channels
#    - Maintains some presence in awareness channels
#    - Totals $120,000
# 5. What's the expected impact on total conversions?



### 🎯 Day 58 Mini-Project: Last-Touch Attribution Dashboard

Build a comprehensive last-touch attribution reporting system.

In [None]:
# YOUR CODE HERE
# Create a complete attribution reporting function that:
# 
# Takes as input:
# - Journey data (user_id, touchpoint, channel, converted)
# - Cost data by channel
# - Revenue per conversion
#
# Produces:
# 1. Attributed conversions by channel
# 2. CPA, ROAS, ROI by channel
# 3. Ranked list of channels by efficiency
# 4. Visualization: Bar chart of conversions and ROAS
# 5. Summary statistics
# 6. Budget recommendations
#
# Test with df_large and channel_costs above



### 🎓 Day 58 Key Takeaways

✅ Last-touch attribution is simple but biased  
✅ Combines easily with cost data for ROI analysis  
✅ Over-credits bottom-funnel, under-credits top-funnel  
✅ Useful as a baseline for comparison  
✅ Should not be the only attribution model used  

**Next:** Tomorrow we'll implement first-touch and linear attribution!

---

## 📅 Days 59-63: Additional Attribution Models (Condensed)

### Day 59: First-Touch & Linear Attribution
- First-touch: 100% credit to first interaction
- Linear: Equal credit to all touchpoints
- Implementation and comparison
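
A minimal sketch of both models, assuming journey data in the same shape as Day 58 (`user_id`, `touchpoint`, `channel`, `converted`). The helper `_converting_journeys` and the `demo` data are illustrative names and values, not part of the course material:

```python
import pandas as pd

def _converting_journeys(df):
    """Keep only the touchpoints of users who converted."""
    converters = df.loc[df['converted'] == 1, 'user_id']
    return df[df['user_id'].isin(converters)].copy()

def first_touch_attribution(df):
    """Give 100% of each conversion to the journey's first touchpoint."""
    journeys = _converting_journeys(df)
    first = journeys.sort_values('touchpoint').groupby('user_id').head(1)
    return first.groupby('channel').size().reset_index(name='conversions')

def linear_attribution(df):
    """Split each conversion equally across all touchpoints in the journey."""
    journeys = _converting_journeys(df)
    journey_length = journeys.groupby('user_id')['touchpoint'].transform('count')
    journeys['credit'] = 1.0 / journey_length
    return journeys.groupby('channel')['credit'].sum().reset_index(name='conversions')

# Hypothetical demo data: five converting journeys
paths = {
    'U001': ['Facebook', 'Google Search'],
    'U002': ['Email', 'Google Search', 'Display', 'Google Search'],
    'U003': ['Facebook', 'Direct'],
    'U004': ['Google Search'],
    'U005': ['Display', 'Email', 'Facebook', 'Google Search'],
}
demo = pd.DataFrame([
    {'user_id': user, 'touchpoint': i + 1, 'channel': channel,
     'converted': int(i == len(path) - 1)}
    for user, path in paths.items()
    for i, channel in enumerate(path)
])

print(first_touch_attribution(demo))
print(linear_attribution(demo))
```

Both functions return one row per channel, and the `conversions` column sums to the number of converting users, so the results can be compared directly with the last-touch output.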

### Day 60: Time-Decay Attribution
- More recent touchpoints get more credit
- Exponential decay functions
- Choosing the right half-life parameter
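
One way to sketch exponential time decay is shown below. Because the journey data in this notebook has no timestamps, this version assumes the half-life is measured in touchpoints (steps back from the conversion); with real timestamps, days before conversion would take the place of `steps_back`. The function name, the `half_life` default, and the `demo` data are illustrative assumptions:

```python
import pandas as pd

def time_decay_attribution(df, half_life=2.0):
    """Credit halves for every `half_life` touchpoints back from the conversion."""
    converters = df.loc[df['converted'] == 1, 'user_id']
    journeys = df[df['user_id'].isin(converters)].copy()

    # Distance of each touchpoint from the last (converting) touchpoint
    last_position = journeys.groupby('user_id')['touchpoint'].transform('max')
    steps_back = last_position - journeys['touchpoint']

    # Exponential decay, then normalize so each journey's credit sums to 1
    journeys['raw'] = 0.5 ** (steps_back / half_life)
    journeys['credit'] = journeys['raw'] / journeys.groupby('user_id')['raw'].transform('sum')

    return journeys.groupby('channel')['credit'].sum().reset_index(name='conversions')

# Hypothetical demo data: two converting journeys
demo = pd.DataFrame([
    {'user_id': 'U001', 'touchpoint': 1, 'channel': 'Facebook', 'converted': 0},
    {'user_id': 'U001', 'touchpoint': 2, 'channel': 'Google Search', 'converted': 1},
    {'user_id': 'U002', 'touchpoint': 1, 'channel': 'Display', 'converted': 0},
    {'user_id': 'U002', 'touchpoint': 2, 'channel': 'Email', 'converted': 0},
    {'user_id': 'U002', 'touchpoint': 3, 'channel': 'Google Search', 'converted': 1},
])

print(time_decay_attribution(demo, half_life=1.0))
```

A shorter half-life pushes the result toward last-touch, while a very long one approaches linear attribution, which is one way to reason about the parameter choice.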

### Day 61: Position-Based Attribution
- U-shaped: 40% to the first touch, 40% to the last, 20% split across the middle touches
- W-shaped variation
- When to use position-based models
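
The U-shaped rule can be sketched as follows, assuming touchpoints are numbered 1..n within each journey as in the earlier datasets. How to handle journeys with one or two touchpoints is a design choice; this sketch gives a single touch full credit and splits a two-touch journey in proportion to the first and last weights. The function name, parameters, and `demo` data are illustrative:

```python
import pandas as pd

def position_based_attribution(df, first_weight=0.4, last_weight=0.4):
    """U-shaped credit: fixed shares for first and last touch, remainder to the middle."""
    converters = df.loc[df['converted'] == 1, 'user_id']
    journeys = df[df['user_id'].isin(converters)].copy()

    n = journeys.groupby('user_id')['touchpoint'].transform('count')
    is_first = journeys['touchpoint'] == 1
    is_last = journeys['touchpoint'] == n

    # Start with the middle share everywhere, then overwrite the end positions
    credit = ((1 - first_weight - last_weight) / (n - 2).clip(lower=1)).copy()
    credit[is_first] = first_weight
    credit[is_last] = last_weight

    # Short journeys have no middle touches, so rescale the end weights
    end_total = first_weight + last_weight
    credit[(n == 2) & is_first] = first_weight / end_total
    credit[(n == 2) & is_last] = last_weight / end_total
    credit[n == 1] = 1.0

    journeys['credit'] = credit
    return journeys.groupby('channel')['credit'].sum().reset_index(name='conversions')

# Hypothetical demo data: journeys of length 2, 4, and 1
paths = {
    'U001': ['Facebook', 'Google Search'],
    'U002': ['Email', 'Google Search', 'Display', 'Google Search'],
    'U004': ['Google Search'],
}
demo = pd.DataFrame([
    {'user_id': user, 'touchpoint': i + 1, 'channel': channel,
     'converted': int(i == len(path) - 1)}
    for user, path in paths.items()
    for i, channel in enumerate(path)
])

print(position_based_attribution(demo))
```

In the four-touch journey, Email (first) and the final Google Search each receive 0.4, and the two middle touches receive 0.1 each.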

### Day 62: Data-Driven Attribution Intro
- Markov chain models
- Shapley value approach
- Comparing removal effects
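
The removal-effect idea can be illustrated with a small first-order Markov chain sketch (the Shapley approach is not shown). Unlike the rule-based models, it needs non-converting paths too. Each path is a list of channels plus a converted flag; a channel's removal effect is the relative drop in conversion probability when all traffic reaching that channel is treated as lost. The function names and the `paths` data are hypothetical, and the sketch assumes at least one converting path:

```python
from collections import defaultdict
import numpy as np

def transition_counts(paths):
    """Count observed transitions between Start, channels, Conversion, and Null."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = ['Start'] + list(channels) + ['Conversion' if converted else 'Null']
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return counts

def conversion_probability(counts, removed=None):
    """Probability of reaching Conversion from Start; `removed` acts like Null."""
    transient = [state for state in counts if state != removed]
    index = {state: i for i, state in enumerate(transient)}
    Q = np.zeros((len(transient), len(transient)))  # transient -> transient
    r = np.zeros(len(transient))                    # transient -> Conversion
    for state in transient:
        total = sum(counts[state].values())
        for nxt, count in counts[state].items():
            if nxt == 'Conversion':
                r[index[state]] += count / total
            elif nxt in index:
                Q[index[state], index[nxt]] += count / total
            # Transitions into Null or the removed channel are lost
    # Absorption probabilities solve (I - Q) p = r
    probs = np.linalg.solve(np.eye(len(transient)) - Q, r)
    return probs[index['Start']]

def removal_effects(paths):
    """Relative drop in conversion probability when each channel is removed."""
    counts = transition_counts(paths)
    base = conversion_probability(counts)
    channels = [state for state in counts if state != 'Start']
    return {c: 1 - conversion_probability(counts, removed=c) / base for c in channels}

def markov_attribution(paths):
    """Share total conversions in proportion to each channel's removal effect."""
    effects = removal_effects(paths)
    total_effect = sum(effects.values())
    total_conversions = sum(converted for _, converted in paths)
    return {c: total_conversions * e / total_effect for c, e in effects.items()}

# Hypothetical demo data: (channels in order, converted?)
paths = [
    (['Facebook', 'Google Search'], True),
    (['Facebook'], False),
    (['Google Search'], True),
    (['Display', 'Google Search'], False),
    (['Display', 'Facebook', 'Google Search'], True),
]

print(removal_effects(paths))
print(markov_attribution(paths))
```

In this demo every converting path ends at Google Search, so removing it eliminates all conversions and its removal effect is 1.0. Removal effects generally sum to more than 1, which is why they are normalized before conversions are allocated.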

### Day 63: Capstone - Build Attribution Model
- Implement all attribution models
- Compare results across models
- Make data-driven budget recommendations
- Present findings to stakeholders

*Note: These sections would be fully expanded in a production version with detailed code examples, exercises, and mini-projects.*

---

### 🎓 Week 9 Complete!

**Congratulations!** You've mastered attribution modeling.

**What You've Learned:**
- ✅ Multi-touch customer journey analysis
- ✅ Rule-based attribution models (last, first, linear, time-decay, position)
- ✅ Data-driven attribution fundamentals
- ✅ Attribution-based budget optimization
- ✅ Comparing and selecting attribution models

**Next Week:** Marketing Mix Modeling - understanding aggregate channel impact!

---