# Week 11: Incrementality & Lift Studies

**Goal:** Master incrementality measurement to prove causal marketing impact through rigorous experimentation.

**Time Commitment:** ~1 hour per day × 7 days = 7 hours total

**What You'll Learn:**
- Incrementality concepts and why correlation ≠ causation
- Geo-lift studies and geographic experimentation
- Matched market testing methodology
- Difference-in-differences (DiD) analysis
- Synthetic control methods
- Public Service Announcement (PSA) analysis
- Designing and analyzing incrementality tests

**Why This Matters:**
As a Marketing Measurement Partner, you can use incrementality studies to:
- Prove causality (not just correlation)
- Measure true incremental value of marketing
- Identify wasted spend on non-incremental channels
- Make confident investment decisions
- Answer: "What would happen if we turned off this channel?"
- Validate attribution and MMM findings

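Incrementality testing is often called the gold standard for marketing measurement. By comparing outcomes against a counterfactual, it separates the effect of marketing from everything else happening in the business, such as seasonality and organic demand.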

---

## 📅 Day 71: Incrementality Concepts (~60 min)

### Learning Objectives
- Understand incrementality vs observational measurement
- Learn the difference between correlation and causation
- Explore types of incrementality tests
- Understand the counterfactual framework

### The Business Problem
Your brand search campaigns show amazing ROAS. But are they actually driving new customers, or just capturing people who would have found you anyway? You need to measure incremental value.

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

# Settings
np.random.seed(42)
sns.set_style('whitegrid')
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
```

### 📖 Concept: What is Incrementality?

**Incrementality** = The causal lift in conversions/sales due to marketing, compared to what would have happened without it.

**Key Question:** What is the **counterfactual**?
- **Factual**: What happened (with marketing)
- **Counterfactual**: What would have happened (without marketing)
- **Incremental Lift**: Factual - Counterfactual

**The Challenge:** We can never observe both for the same unit at the same time!

**Solution:** Use experimental or quasi-experimental designs to estimate the counterfactual.
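
The minimal simulation below makes the counterfactual concrete. All numbers are made-up assumptions. Because this is a simulation, every user has *both* potential outcomes, so the true lift is known. A real analyst sees only one outcome per user. Randomly assigning who sees ads lets the difference in group means stand in for the counterfactual that cannot be observed.

```python
import numpy as np

# Hypothetical users: each has two potential outcomes,
# y0 (converts without ads) and y1 (converts with ads)
rng = np.random.default_rng(0)
n_users = 100_000
base_rate = 0.02      # assumed conversion probability without ads
true_lift = 0.005     # assumed additive lift from ads (0.5 percentage points)

u = rng.random(n_users)
y0 = (u < base_rate).astype(int)                 # counterfactual outcome
y1 = (u < base_rate + true_lift).astype(int)     # outcome if exposed
true_ate = (y1 - y0).mean()                      # knowable only in simulation

# In reality we observe ONE outcome per user, so randomize who sees ads
treated = rng.random(n_users) < 0.5
observed = np.where(treated, y1, y0)

# Difference in group means estimates the average incremental lift
est_ate = observed[treated].mean() - observed[~treated].mean()
print(f"True lift:      {true_ate:.4f}")
print(f"Estimated lift: {est_ate:.4f}")
```

Try a smaller `n_users` to see the estimate get noisier as the sample shrinks.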

### 📖 Concept: Correlation vs Causation

**Example: Brand Search Problem**

Observation: Brand search ads have 10x ROAS!

But consider:
- People searching your brand name already know about you
- They might click your organic result instead
- They might type your URL directly
- They might convert anyway

**Question:** How many conversions are truly incremental?

```python
# Simulate brand search scenario
np.random.seed(42)

# Scenario 1: Ads ON
weeks_with_ads = pd.DataFrame({
    'week': range(1, 13),
    'paid_brand_search_conversions': np.random.randint(800, 1200, 12),
    'organic_conversions': np.random.randint(200, 400, 12),
    'direct_conversions': np.random.randint(300, 500, 12),
    'brand_search_spend': np.random.randint(8000, 12000, 12)
})

weeks_with_ads['total_conversions'] = (weeks_with_ads['paid_brand_search_conversions'] +
                                       weeks_with_ads['organic_conversions'] +
                                       weeks_with_ads['direct_conversions'])

# Calculate apparent ROAS (assumes all paid conversions are incremental)
revenue_per_conversion = 100
weeks_with_ads['paid_revenue'] = weeks_with_ads['paid_brand_search_conversions'] * revenue_per_conversion
weeks_with_ads['apparent_roas'] = weeks_with_ads['paid_revenue'] / weeks_with_ads['brand_search_spend']

print("Brand Search Campaign Performance (Ads ON):")
print(weeks_with_ads[['week', 'paid_brand_search_conversions', 'brand_search_spend', 'apparent_roas']].head())
print(f"\nAverage Apparent ROAS: {weeks_with_ads['apparent_roas'].mean():.2f}x")
print("\n⚠️  But this assumes ALL paid conversions are incremental...")
```

### 💡 Try It: Simulate an Incrementality Test

What happens if we turn off brand search ads for a few weeks?

```python
# YOUR CODE HERE
# Simulate what happens when ads are turned OFF:
#
# Assumptions (you can adjust):
# - 70% of paid search conversions would have happened anyway (via organic or direct)
# - 30% are truly incremental
#
# 1. Create a "weeks_without_ads" scenario
# 2. Calculate total conversions without paid search
# 3. Calculate true incrementality
# 4. Calculate true incremental ROAS
# 5. Compare to apparent ROAS
# 6. How much was wasted spend?
```

### 📖 Concept: Types of Incrementality Tests

| Method | Level | Pros | Cons | Use Case |
|--------|-------|------|------|----------|
| **User-level holdout** | Individual users | Gold standard, precise | Requires user tracking | Digital channels |
| **Geo experiments** | Geographic regions | No user tracking needed | Needs many comparable markets; less precise | TV, radio, OOH |
| **Time-based (on/off)** | Time periods | Simple to run | Confounded by seasonality and trends | Quick directional tests |
| **PSA / ghost ads** | Individual users | Isolates the effect of ad exposure, including view-through | Platform-dependent; PSA impressions cost money | Display, video |

This week focuses on **geo experiments** and **difference-in-differences**, which work without user tracking.
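
A small sketch shows why on/off tests are confounded by seasonality. The trend, effect size, and noise are hypothetical. A naive before/after comparison credits the ads with the underlying upward trend.

```python
import numpy as np

# Hypothetical weekly sales: upward seasonal trend + true ad effect + noise
rng = np.random.default_rng(1)
weeks = np.arange(16)
seasonality = 1000 + 40 * weeks      # assumed steady climb (e.g., into a peak season)
true_ad_effect = 50                  # assumed incremental sales per week from ads
noise = rng.normal(0, 10, weeks.size)

ads_on = weeks >= 8                  # ads OFF in weeks 0-7, ON in weeks 8-15
sales = seasonality + true_ad_effect * ads_on + noise

# Naive time-based estimate: mean of ON weeks minus mean of OFF weeks
naive_estimate = sales[ads_on].mean() - sales[~ads_on].mean()
print(f"True weekly effect:    {true_ad_effect}")
print(f"Naive on/off estimate: {naive_estimate:.1f}")
```

A control group that experiences the same trend, such as other markets, removes this bias. That is the idea behind the geo and difference-in-differences methods used this week.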

### ✏️ Exercise 1: Calculate Incrementality Metrics

Given test results, calculate key incrementality metrics.

```python
# Test results
test_data = pd.DataFrame({
    'group': ['Control (ads off)', 'Treatment (ads on)'],
    'users': [50000, 50000],
    'conversions': [850, 1200],
    'spend': [0, 25000]
})

# YOUR CODE HERE
# Calculate:
# 1. Conversion rate for each group
# 2. Absolute lift (treatment CVR - control CVR)
# 3. Relative lift (% improvement)
# 4. Incremental conversions (total treatment conversions - expected from control rate)
# 5. Incremental CPA (spend / incremental conversions)
# 6. Incrementality rate (incremental / total treatment conversions)
# 7. Is this campaign incremental?
```

### 🎯 Day 71 Mini-Project: Incrementality Calculator

Build a comprehensive incrementality analysis tool.
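
Output 1 in the exercise below needs a two-proportion z-test. The sketch below is a standard pooled, two-sided z-test built on SciPy's normal distribution. The helper name `two_proportion_ztest` and the input numbers are hypothetical and are not the Exercise 1 data.

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(conv_t, n_t, conv_c, n_c):
    """Pooled two-proportion z-test (two-sided). Returns (z, p_value)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    p_pool = (conv_t + conv_c) / (n_t + n_c)          # rate under H0: no difference
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 2 * stats.norm.sf(abs(z))               # two-sided tail probability
    return z, p_value

# Hypothetical test: 540/20,000 treatment vs 460/20,000 control conversions
z, p = two_proportion_ztest(conv_t=540, n_t=20_000, conv_c=460, n_c=20_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

If statsmodels is installed, `statsmodels.stats.proportion.proportions_ztest` provides the same test.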

```python
# YOUR CODE HERE
# Create a function that takes test results and produces:
#
# Inputs:
# - control_conversions, control_population
# - treatment_conversions, treatment_population
# - treatment_spend
# - revenue_per_conversion
#
# Outputs:
# 1. Statistical significance (two-proportion z-test)
# 2. Incremental conversions
# 3. Incrementality rate (%)
# 4. Incremental CPA
# 5. Incremental ROAS
# 6. Total incremental revenue
# 7. ROI = (incremental revenue - spend) / spend
# 8. Recommendation (scale, optimize, or pause)
#
# Test with multiple scenarios
```

### 🎓 Day 71 Key Takeaways

✅ Incrementality measures causal lift, not correlation  
✅ The counterfactual is what would happen without marketing  
✅ Not all attributed conversions are incremental  
✅ Incrementality tests reveal true marketing value  
✅ Different test types for different situations  

**Next:** Tomorrow we'll design geo-lift studies!

---

## 📅 Day 72: Geo-Lift Studies (~60 min)

### Learning Objectives
- Understand geographic experimentation
- Design geo-lift tests
- Select test and control markets
- Analyze geo-lift results

### The Business Problem
You want to test a new TV campaign, but can't track individual users. Solution: Run ads in some cities (test markets) and compare to similar cities without ads (control markets).

### 📖 Concept: Geographic Experiments

**Design:**
1. Select **test markets** where you'll increase/decrease marketing
2. Select **control markets** with similar characteristics
3. Run for sufficient duration (typically 4-8 weeks)
4. Compare outcomes between test and control

**Key Requirements:**
- Test and control markets must be **similar** (size, demographics, baseline sales)
- Need enough markets for statistical power
- Markets should be **independent** (no spillover)
- Measure during **stable** period (avoid major events)
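
Before launch, you can check the "similar markets" requirement by comparing test and control on pre-period sales. The sketch below uses made-up market averages and a Welch t-test (`scipy.stats.ttest_ind` with `equal_var=False`). Treat it as a screen, not proof. With only 10 markets per arm the test has little power, so a large p-value does not guarantee comparable groups. Check the ratio of means directly as well.

```python
import numpy as np
from scipy import stats

# Hypothetical pre-period average weekly sales for 20 markets
rng = np.random.default_rng(7)
pre_sales = rng.integers(800, 2000, size=20).astype(float)

# Candidate random split: 10 test markets, 10 control markets
idx = rng.permutation(20)
test_pre, control_pre = pre_sales[idx[:10]], pre_sales[idx[10:]]

# Welch's t-test on pre-period levels; a small p-value flags imbalance
t_stat, p_value = stats.ttest_ind(test_pre, control_pre, equal_var=False)
ratio = test_pre.mean() / control_pre.mean()
print(f"Test/control pre-period ratio: {ratio:.3f}")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.3f}")
```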

```python
# Generate synthetic geo data
np.random.seed(42)

cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia',
          'San Antonio', 'San Diego', 'Dallas', 'San Jose', 'Austin', 'Jacksonville',
          'Fort Worth', 'Columbus', 'Charlotte', 'San Francisco', 'Indianapolis', 'Seattle',
          'Denver', 'Boston']

# Pre-test data (8 weeks before test)
pre_test_data = []
for city in cities:
    baseline_sales = np.random.randint(800, 2000)
    for week in range(1, 9):
        sales = baseline_sales + np.random.randint(-100, 100)
        pre_test_data.append({
            'city': city,
            'week': -week,  # Negative for pre-test
            'sales': sales,
            'period': 'pre-test'
        })

df_pre = pd.DataFrame(pre_test_data)

# Calculate baseline sales by city
city_baselines = df_pre.groupby('city')['sales'].mean().reset_index()
city_baselines.columns = ['city', 'baseline_sales']

print("Pre-Test Baseline Sales by City:")
print(city_baselines.sort_values('baseline_sales', ascending=False))
```

### 💡 Try It: Select Matched Markets

Choose test and control markets that are similar.

```python
# YOUR CODE HERE
# Select markets for test:
# 1. Rank cities by baseline sales
# 2. Create pairs of similar cities (similar baseline sales)
# 3. Randomly assign one from each pair to test, one to control
# 4. You want 10 test cities and 10 control cities
# 5. Verify test and control groups have similar average baseline sales
#
# This is called "matched pairs" design
```

### 📖 Concept: Running the Geo Test

After selecting markets, run the test and collect data.

```python
# Assign test vs control
np.random.seed(42)
test_cities = np.random.choice(cities, size=10, replace=False)
control_cities = [c for c in cities if c not in test_cities]

# Generate test period data (8 weeks)
# Test cities get 15% lift from new marketing
test_data = []
for city in cities:
    baseline = city_baselines[city_baselines['city'] == city]['baseline_sales'].values[0]
    is_test = city in test_cities
    lift = 0.15 if is_test else 0.00
    
    for week in range(1, 9):
        sales = baseline * (1 + lift) + np.random.randint(-100, 100)
        test_data.append({
            'city': city,
            'week': week,
            'sales': sales,
            'group': 'test' if is_test else 'control',
            'period': 'test'
        })

df_test = pd.DataFrame(test_data)

# Combine pre and test data
df_pre['group'] = df_pre['city'].apply(lambda x: 'test' if x in test_cities else 'control')
df_geo = pd.concat([df_pre, df_test], ignore_index=True)

print("Geo Test Assignment:")
print(f"Test markets: {len(test_cities)}")
print(f"Control markets: {len(control_cities)}")
print(f"\nTest markets: {test_cities[:5]}...")
print(f"Control markets: {control_cities[:5]}...")
```

### ✏️ Exercise 2: Analyze Geo Test Results

Calculate the incrementality from the geo test.

```python
# YOUR CODE HERE
# 1. Calculate average sales by group (test vs control) for pre-test period
# 2. Calculate average sales by group for test period
# 3. Calculate the difference-in-differences:
#    DiD = (Test_after - Test_before) - (Control_after - Control_before)
# 4. Calculate % lift in test markets
# 5. Perform statistical test (t-test comparing test vs control in test period)
# 6. Is the lift statistically significant?
# 7. Visualize pre and test periods for both groups
```

### 🎯 Day 72 Mini-Project: Geo Test Simulator

Build a tool to simulate and analyze geo tests.

```python
# YOUR CODE HERE
# Create a comprehensive geo test simulator that:
#
# 1. Generates synthetic pre-test data for N markets
# 2. Selects matched test and control markets
# 3. Simulates test period with specified lift %
# 4. Analyzes results using difference-in-differences
# 5. Produces:
#    - Summary statistics
#    - Statistical test results
#    - Visualization of pre/post trends
#    - ROI calculation
#
# Test with different scenarios:
# - True lift of 10%
# - True lift of 5%
# - True lift of 0% (null test)
```

### 🎓 Day 72 Key Takeaways

✅ Geo tests measure incrementality without user tracking  
✅ Matched markets provide valid control groups  
✅ Difference-in-differences isolates treatment effect  
✅ Need sufficient markets and duration for power  
✅ Useful for TV, radio, OOH, and regional campaigns  

**Next:** Tomorrow we'll dive deeper into matched market testing!

---

## 📅 Days 73-77: Advanced Incrementality Methods (Condensed)

### Day 73: Matched Market Testing
- Matching algorithms (propensity scores)
- Pre-test bias correction
- Power analysis for geo tests
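
Power analysis for geo tests is often done by simulation. The sketch below assumes a simple data-generating process: normal market baselines, independent period noise, and a multiplicative lift, all with hypothetical parameters. It analyzes each simulated test with a Welch t-test on market-level percentage changes, a simple form of difference-in-differences. The function `geo_power` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

def geo_power(n_per_arm, baseline_sd, noise_sd, lift_pct,
              n_sims=1000, alpha=0.05, seed=0):
    """Simulated power of a geo test under a hypothetical data-generating process.

    Market baselines ~ N(1400, baseline_sd); pre- and test-period sales add
    independent N(0, noise_sd) noise; test markets are scaled by (1 + lift_pct).
    Each simulated test is analyzed with a Welch t-test on % changes.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Floor baselines at 100 to avoid dividing by near-zero sales
        base = np.maximum(rng.normal(1400, baseline_sd, 2 * n_per_arm), 100.0)
        pre = base + rng.normal(0, noise_sd, base.size)
        post = base + rng.normal(0, noise_sd, base.size)
        post[:n_per_arm] *= 1 + lift_pct          # first n_per_arm markets = test
        change = (post - pre) / pre                # % change per market
        _, p = stats.ttest_ind(change[:n_per_arm], change[n_per_arm:], equal_var=False)
        rejections += int(p < alpha)
    return rejections / n_sims

for n in (5, 10, 20):
    power = geo_power(n, baseline_sd=300, noise_sd=50, lift_pct=0.05)
    print(f"{n:>2} markets per arm -> power ~ {power:.2f}")
```

Power rises with the number of markets. Replacing the simulated baselines with your own historical market data gives a more realistic estimate.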

### Day 74: Difference-in-Differences
- DiD methodology and assumptions
- Parallel trends assumption
- Multiple time periods
- DiD with covariates
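
A basic two-group, two-period DiD can be computed from the four group means or as the interaction coefficient in an OLS regression. The sketch uses a hypothetical panel: 20 markets, 8 pre and 8 post weeks, a common trend, and an assumed effect of 120. It shows that both routes give the same number. It does not compute standard errors; in practice those are usually clustered by market.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: 20 markets x 16 weeks (8 pre, 8 post); markets 0-9 treated
rng = np.random.default_rng(3)
rows = []
for m in range(20):
    level = rng.uniform(800, 2000)           # market-specific baseline
    treated = int(m < 10)
    for week in range(16):
        post = int(week >= 8)
        trend = 5 * week                     # common trend shared by all markets
        effect = 120 * treated * post        # assumed true treatment effect
        rows.append({"market": m, "treated": treated, "post": post,
                     "sales": level + trend + effect + rng.normal(0, 30)})
df = pd.DataFrame(rows)

# 2x2 DiD from the four group means
g = df.groupby(["treated", "post"])["sales"].mean()
did_means = (g.loc[(1, 1)] - g.loc[(1, 0)]) - (g.loc[(0, 1)] - g.loc[(0, 0)])

# Same estimate as b3 in: sales = b0 + b1*treated + b2*post + b3*treated*post
X = np.column_stack([np.ones(len(df)), df["treated"], df["post"],
                     df["treated"] * df["post"]])
beta, *_ = np.linalg.lstsq(X, df["sales"].to_numpy(), rcond=None)
print(f"DiD (means):      {did_means:.1f}")
print(f"DiD (regression): {beta[3]:.1f}")
```

The regression form extends naturally to covariates and multiple periods. The means form makes the parallel-trends logic easy to see.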

### Day 75: Synthetic Control Methods
- Building synthetic controls
- Weighted combinations of controls
- When markets aren't perfectly matched
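
Below is a minimal sketch of the standard synthetic-control fit. It chooses non-negative donor weights that sum to one so that the weighted donors track the treated market in the pre-period. The effect is then read off the post-period gap. The simulated markets, the assumed lift of 80, and the use of SciPy's SLSQP solver are illustrative choices, not a production method.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n_pre, n_post, n_donors = 12, 6, 8
T = n_pre + n_post

# Hypothetical donor (control) markets: own level + scaled shared seasonality + noise
season = 100 * np.sin(np.arange(T) / 2)
donors = np.array([
    1000 + rng.uniform(-200, 200) + rng.uniform(0.5, 1.5) * season + rng.normal(0, 10, T)
    for _ in range(n_donors)
]).T                                                   # shape (T, n_donors)

# Hypothetical treated market: a mix of donors (unknown to the analyst)
# plus an assumed lift of 80 per week after launch
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (n_donors - 3))
treated = donors @ true_w + rng.normal(0, 10, T)
treated[n_pre:] += 80

def pre_period_mse(w):
    """Fit error between treated and weighted donors, pre-period only."""
    return np.mean((treated[:n_pre] - donors[:n_pre] @ w) ** 2)

# Weights constrained to be non-negative and to sum to one
res = minimize(pre_period_mse, x0=np.full(n_donors, 1 / n_donors), method="SLSQP",
               bounds=[(0.0, 1.0)] * n_donors,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
               options={"maxiter": 500})
w_hat = res.x

synthetic = donors @ w_hat
effect = np.mean(treated[n_pre:] - synthetic[n_pre:])
print("Donor weights:", np.round(w_hat, 2))
print(f"Estimated post-period lift: {effect:.1f}")
```

The estimate is only as credible as the pre-period fit. If the synthetic control tracks the treated market poorly before launch, the post-period gap should not be read as lift.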

### Day 76: PSA Analysis
- Ghost ads and placebo tests
- View-through incrementality
- Platform-specific methods
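
The sketch below shows why a PSA control matters. The setup is hypothetical: made-up intent segments, reach rates, and an assumed lift of 0.5 percentage points. High-intent users are more likely to be reached by targeting, so comparing exposed with unexposed users mixes the ad effect with selection. Randomizing reachable users between the real ad and a PSA compares like with like.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000

# Hypothetical users: high-intent users convert more AND are reached more often
high_intent = rng.random(n) < 0.2
base_cvr = np.where(high_intent, 0.08, 0.01)
reachable = rng.random(n) < np.where(high_intent, 0.7, 0.3)   # would get an ad slot
true_lift = 0.005                                             # assumed additive ad effect

# Randomize reachable users to the brand ad (treatment) or a PSA (placebo control)
gets_brand_ad = reachable & (rng.random(n) < 0.5)
gets_psa = reachable & ~gets_brand_ad
converted = rng.random(n) < base_cvr + true_lift * gets_brand_ad

# Naive: brand-ad viewers vs everyone else (includes unreachable, low-intent users)
naive = converted[gets_brand_ad].mean() - converted[~gets_brand_ad].mean()
# PSA design: brand-ad viewers vs PSA viewers (same targeting, same reach)
psa = converted[gets_brand_ad].mean() - converted[gets_psa].mean()
print(f"True lift:  {true_lift:.4f}")
print(f"Naive lift: {naive:.4f}")
print(f"PSA lift:   {psa:.4f}")
```

Ghost-ad approaches aim for the same comparison without paying for placebo impressions. They log when a control user would have seen the ad. Availability depends on the platform.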

### Day 77: Capstone - Incrementality Study
- Design complete incrementality test
- Analyze results with multiple methods
- Business recommendations
- Present findings with confidence intervals
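
For the confidence-interval step, a percentile bootstrap over markets is one straightforward option. The market-level changes below are simulated placeholders. Resampling happens within each arm because markets, not weeks, are the independent units.

```python
import numpy as np

rng = np.random.default_rng(21)

# Hypothetical market-level % changes (pre -> test period) from a geo test
test_change = rng.normal(0.08, 0.05, 10)      # 10 test markets
control_change = rng.normal(0.02, 0.05, 10)   # 10 control markets

def did_lift(t, c):
    """DiD on market-level changes: mean test change minus mean control change."""
    return t.mean() - c.mean()

point = did_lift(test_change, control_change)

# Percentile bootstrap: resample markets with replacement within each arm
boot = np.empty(5000)
for b in range(boot.size):
    t = rng.choice(test_change, size=test_change.size, replace=True)
    c = rng.choice(control_change, size=control_change.size, replace=True)
    boot[b] = did_lift(t, c)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Lift: {point:.1%}  (95% CI {lo:.1%} to {hi:.1%})")
```

With as few as 10 markets per arm, percentile bootstrap intervals tend to be somewhat too narrow, so treat them as approximate.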

*Note: These sections would be fully expanded in a production version with detailed code examples, exercises, and mini-projects.*

---

### 🎓 Week 11 Complete!

**Congratulations!** You've mastered incrementality testing.

**What You've Learned:**
- ✅ Incrementality fundamentals and causality
- ✅ Geographic experimentation
- ✅ Matched market testing
- ✅ Difference-in-differences analysis
- ✅ Synthetic control methods

**Next Week:** Customer Lifetime Value and your FINAL CAPSTONE!

---