# Marketing Campaign A/B Testing Analysis

## Business Problem

**Context:** A company wants to optimize its digital marketing strategy to improve customer acquisition and conversion rates. It has been running traditional marketing campaigns and wants to determine whether a new campaign approach delivers better results.

**Role:** As a data scientist, your goal is to design, run and analyze an A/B experiment that tests two different marketing campaign strategies to help the marketing team decide which approach to implement company-wide.

**Campaign Setup:**
- **Control Campaign:** The existing marketing approach that the company has been using
- **Test Campaign:** A new experimental marketing strategy with potentially different targeting, creative, or budget allocation

## Business Objectives

Help the marketing team decide which campaign strategy to use based on A/B test results to maximize:

1. **Customer acquisition efficiency**
2. **Conversion rates throughout the marketing funnel**  
3. **Return on advertising spend (ROAS)**
4. **Overall business impact**

## Key Business Questions

1. **Which campaign drives more purchases at a lower cost?**
2. **Which campaign has better conversion rates at each funnel stage?**
3. **What is the statistical significance of the performance difference?**
4. **What is the estimated business impact of choosing one campaign over the other?**
5. **Should the company scale the Test Campaign or stick with the Control Campaign?**

## Success Metrics

### Primary Metrics
- **Cost per Purchase** (Spend ÷ Purchases)
- **Conversion Rate** (Purchases ÷ Impressions)

### Secondary Metrics  
- **Click-through Rate** (Clicks ÷ Impressions)
- **Cost per Click** (Spend ÷ Clicks)
- **Purchase Rate from Clicks** (Purchases ÷ Clicks)
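
The ratio definitions above can be sketched as one small helper. This is an illustration only: the function name `ratio_metrics` is hypothetical, and the inputs are the first Control Campaign row (2019-08-01) from the dataset preview further down.

```python
def ratio_metrics(spend, impressions, clicks, purchases):
    """Return the primary and secondary ratio metrics for one row or one total."""
    return {
        "cost_per_purchase": spend / purchases,           # Spend ÷ Purchases
        "conversion_rate": purchases / impressions,       # Purchases ÷ Impressions
        "click_through_rate": clicks / impressions,       # Clicks ÷ Impressions
        "cost_per_click": spend / clicks,                 # Spend ÷ Clicks
        "purchase_rate_from_clicks": purchases / clicks,  # Purchases ÷ Clicks
    }


# First Control Campaign row (2019-08-01) from the dataset preview.
day_one = ratio_metrics(spend=2280, impressions=82702, clicks=7016, purchases=618)
for name, value in day_one.items():
    print(f"{name}: {value:.4f}")
```

Note that for cost metrics a lower value is better, while for rate metrics a higher value is better, so the sign of a difference has to be read per metric.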

### Business Impact Metrics
- **Total Purchases Generated**
- **Return on Ad Spend (ROAS)**
- **Customer Acquisition Cost**

## Dataset Overview

- **Time Period:** August 1-30, 2019 (30 days)
- **Campaigns:** Control vs Test Campaign
- **Data Structure:** Daily performance metrics including spend, impressions, reach, clicks, searches, content views, cart additions, and purchases

This analysis will provide data-driven recommendations for the marketing team's strategic decision on campaign implementation.

In [2]:
# Built-in libraries
from datetime import datetime
import random
import math

# Third-party libraries
import seaborn as sns
import pandas as pd
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import MultipleLocator
from statsmodels.stats.power import TTestIndPower, tt_ind_solve_power
from statsmodels.stats.weightstats import ttest_ind
from statsmodels.stats.proportion import proportions_chisquare, confint_proportions_2indep

# Disable warnings
from warnings import filterwarnings
filterwarnings('ignore')

# Set seed for np random
SEED = 123
np.random.seed(SEED)

## Load Data

In [18]:
# Load Data with correct separator
control_df = pd.read_csv('data/control_group.csv', sep=';')
test_df = pd.read_csv('data/test_group.csv', sep=';')
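
Why the separator matters can be seen on a two-line sample. This is a minimal sketch assuming the files use the semicolon-delimited layout implied by `sep=';'`. The sample string below is built from the first three columns of the first Control row and is not read from the real files.

```python
import io

import pandas as pd

# Two lines in a semicolon-delimited layout (first three columns only).
raw = "Campaign Name;Date;Spend [USD]\nControl Campaign;1.08.2019;2280\n"

default_sep = pd.read_csv(io.StringIO(raw))         # comma assumed: one wide column
semicolon = pd.read_csv(io.StringIO(raw), sep=";")  # three separate columns

print(default_sep.shape)
print(semicolon.columns.tolist())
```

With the default comma separator every row collapses into a single string column, which is why the explicit `sep=';'` is needed.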

In [13]:
control_df.head()

| | Campaign Name | Date | Spend [USD] | # of Impressions | Reach | # of Website Clicks | # of Searches | # of View Content | # of Add to Cart | # of Purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Control Campaign | 1.08.2019 | 2280 | 82702.0 | 56930.0 | 7016.0 | 2290.0 | 2159.0 | 1819.0 | 618.0 |
| 1 | Control Campaign | 2.08.2019 | 1757 | 121040.0 | 102513.0 | 8110.0 | 2033.0 | 1841.0 | 1219.0 | 511.0 |
| 2 | Control Campaign | 3.08.2019 | 2343 | 131711.0 | 110862.0 | 6508.0 | 1737.0 | 1549.0 | 1134.0 | 372.0 |
| 3 | Control Campaign | 4.08.2019 | 1940 | 72878.0 | 61235.0 | 3065.0 | 1042.0 | 982.0 | 1183.0 | 340.0 |
| 4 | Control Campaign | 5.08.2019 | 1835 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


In [20]:
print("Control Campaign Data:")
print(f"Shape: {control_df.shape}")
print("Columns:", control_df.columns.tolist())

Control Campaign Data:
Shape: (30, 10)
Columns: ['Campaign Name', 'Date', 'Spend [USD]', '# of Impressions', 'Reach', '# of Website Clicks', '# of Searches', '# of View Content', '# of Add to Cart', '# of Purchase']


In [23]:
print("\nTest Campaign Data:")
print(f"Shape: {test_df.shape}")
print("Columns:", test_df.columns.tolist())


Test Campaign Data:
Shape: (30, 10)
Columns: ['Campaign Name', 'Date', 'Spend [USD]', '# of Impressions', 'Reach', '# of Website Clicks', '# of Searches', '# of View Content', '# of Add to Cart', '# of Purchase']


In [24]:
test_df.head()

| | Campaign Name | Date | Spend [USD] | # of Impressions | Reach | # of Website Clicks | # of Searches | # of View Content | # of Add to Cart | # of Purchase |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Test Campaign | 1.08.2019 | 3008 | 39550 | 35820 | 3038 | 1946 | 1069 | 894 | 255 |
| 1 | Test Campaign | 2.08.2019 | 2542 | 100719 | 91236 | 4657 | 2359 | 1548 | 879 | 677 |
| 2 | Test Campaign | 3.08.2019 | 2365 | 70263 | 45198 | 7885 | 2572 | 2367 | 1268 | 578 |
| 3 | Test Campaign | 4.08.2019 | 2710 | 78451 | 25937 | 4216 | 2216 | 1437 | 566 | 340 |
| 4 | Test Campaign | 5.08.2019 | 2297 | 114295 | 95138 | 5863 | 2106 | 858 | 956 | 768 |


In [25]:
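# Clean column names for easier handling
def clean_column_names(df):
    # Work on a copy so the caller's frame is not modified in place
    df = df.copy()
    # regex=False matches each pattern literally on every pandas version
    df.columns = df.columns.str.replace('# of ', 'num_', regex=False)
    # The '[USD]' suffix is kept: the recorded output and later cells use 'Spend_[USD]'
    df.columns = df.columns.str.replace(' ', '_', regex=False)
    return df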

In [26]:
control_df = clean_column_names(control_df)
test_df = clean_column_names(test_df)

print("Cleaned columns:", control_df.columns.tolist())

Cleaned columns: ['Campaign_Name', 'Date', 'Spend_[USD]', 'num_Impressions', 'Reach', 'num_Website_Clicks', 'num_Searches', 'num_View_Content', 'num_Add_to_Cart', 'num_Purchase']
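
The cleaned list above still shows `Spend_[USD]`, brackets included. Brackets are regex metacharacters, so a pattern like `' [USD]'` strips the suffix only when it is matched literally. The sketch below uses the standard-library `re` module to show the difference. It assumes, without confirming it for this notebook's pandas version, that a regex-interpreting `str.replace` behaves the same way.

```python
import re

column = "Spend [USD]"

# As a regex, " [USD]" means: a space followed by ONE of the characters U, S, D.
# In "Spend [USD]" the space is followed by "[", so nothing matches.
as_regex = re.sub(" [USD]", "_usd", column)

# A literal replacement, or an escaped regex, matches the bracketed suffix.
as_literal = column.replace(" [USD]", "_usd")
as_escaped = re.sub(re.escape(" [USD]"), "_usd", column)

print(as_regex)    # unchanged
print(as_literal)  # suffix replaced
print(as_escaped)  # suffix replaced
```

In pandas, passing `regex=False` to `str.replace` requests the literal behaviour. Renaming the column this way would also mean updating every later reference to `'Spend_[USD]'`.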


In [27]:
# Convert date and check data types
control_df['Date'] = pd.to_datetime(control_df['Date'], format='%d.%m.%Y')
test_df['Date'] = pd.to_datetime(test_df['Date'], format='%d.%m.%Y')
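
The explicit `format='%d.%m.%Y'` tells pandas that the day comes first, so `1.08.2019` is read as 1 August rather than 8 January. Here is a minimal check on three date strings in the same layout. The standalone `raw_dates` series is illustrative, not part of the notebook.

```python
import pandas as pd

# First two and last dates of the experiment window, in the raw day.month.year layout.
raw_dates = pd.Series(["1.08.2019", "2.08.2019", "30.08.2019"])
parsed = pd.to_datetime(raw_dates, format="%d.%m.%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())

# Inclusive length of the window in days.
print((parsed.max() - parsed.min()).days + 1)
```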

In [28]:
# Basic info
print("\n=== CONTROL CAMPAIGN ===")
print(f"Date range: {control_df['Date'].min()} to {control_df['Date'].max()}")
print(f"Missing values:\n{control_df.isnull().sum()}")


=== CONTROL CAMPAIGN ===
Date range: 2019-08-01 00:00:00 to 2019-08-30 00:00:00
Missing values:
Campaign_Name         0
Date                  0
Spend_[USD]           0
num_Impressions       1
Reach                 1
num_Website_Clicks    1
num_Searches          1
num_View_Content      1
num_Add_to_Cart       1
num_Purchase          1
dtype: int64


In [29]:

print("\n=== TEST CAMPAIGN ===") 
print(f"Date range: {test_df['Date'].min()} to {test_df['Date'].max()}")
print(f"Missing values:\n{test_df.isnull().sum()}")


=== TEST CAMPAIGN ===
Date range: 2019-08-01 00:00:00 to 2019-08-30 00:00:00
Missing values:
Campaign_Name         0
Date                  0
Spend_[USD]           0
num_Impressions       0
Reach                 0
num_Website_Clicks    0
num_Searches          0
num_View_Content      0
num_Add_to_Cart       0
num_Purchase          0
dtype: int64


In [30]:
# Locate the rows with missing data
print("Missing data details:")
print("Control campaign missing data on:", control_df[control_df.isnull().any(axis=1)]['Date'].dt.date.tolist())

Missing data details:
Control campaign missing data on: [datetime.date(2019, 8, 5)]


In [31]:
# Drop the Control row with missing data (2019-08-05); Test has no gaps and is copied unchanged
control_clean = control_df.dropna()
test_clean = test_df.copy()

In [32]:
print("\nFinal datasets:")
print(f"Control: {len(control_clean)} days")  
print(f"Test: {len(test_clean)} days")


Final datasets:
Control: 29 days
Test: 30 days
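
Because Control now covers 29 days and Test 30, campaign totals are no longer computed over the same calendar days. One option, shown here as a sketch rather than as the notebook's method, is to restrict Test to the dates that survive in Control. The three rows reuse the 3 to 5 August purchase counts from the previews above; the names `control`, `test` and `test_aligned` are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(["2019-08-03", "2019-08-04", "2019-08-05"])

# Purchase counts for 3-5 August; Control has no data on 5 August.
control = pd.DataFrame({"Date": dates, "num_Purchase": [372.0, 340.0, float("nan")]})
test = pd.DataFrame({"Date": dates, "num_Purchase": [578, 340, 768]})

control_clean = control.dropna()

# Keep only the Test days that also exist in the cleaned Control data.
test_aligned = test[test["Date"].isin(control_clean["Date"])]

print(len(control_clean), len(test_aligned))
print(test["num_Purchase"].sum(), test_aligned["num_Purchase"].sum())
```

Per-day averages are less sensitive to the extra day than raw totals, so either aligning the dates or comparing daily means avoids crediting Test for an additional day of activity.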


In [34]:
# Calculate key business metrics
def calculate_metrics(df):
    """Aggregate daily rows into campaign-level totals and ratio metrics."""
    spend = df['Spend_[USD]'].sum()
    impressions = df['num_Impressions'].sum()
    clicks = df['num_Website_Clicks'].sum()
    purchases = df['num_Purchase'].sum()
    return {
        'total_spend': spend,
        'total_impressions': impressions,
        'total_clicks': clicks,
        'total_purchases': purchases,
        'ctr': clicks / impressions,                 # Clicks ÷ Impressions
        'conversion_rate': purchases / impressions,  # Purchases ÷ Impressions
        'cost_per_purchase': spend / purchases,      # Spend ÷ Purchases
        'avg_daily_purchases': df['num_Purchase'].mean()
    }

control_metrics = calculate_metrics(control_clean)
test_metrics = calculate_metrics(test_clean)
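
Key Business Question 2 asks about conversion at each funnel stage, which `calculate_metrics` does not break out. A possible sketch of step-to-step rates follows, using the first Control Campaign row (2019-08-01) from the preview. The helper `step_rates` and the stage ordering are assumptions made for illustration.

```python
# Stage counts for the first Control Campaign row (2019-08-01) in the preview.
funnel = [
    ("impressions", 82702),
    ("website_clicks", 7016),
    ("searches", 2290),
    ("view_content", 2159),
    ("add_to_cart", 1819),
    ("purchases", 618),
]


def step_rates(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (prev_name, name, count / prev_count)
        for (prev_name, prev_count), (name, count) in zip(stages, stages[1:])
    ]


rates = step_rates(funnel)
for prev_name, name, rate in rates:
    print(f"{prev_name} -> {name}: {rate:.4f}")
```

The stages are not strictly nested in this dataset: on 2019-08-04 the Control row reports more cart additions (1183) than content views (982), so a step rate can exceed 1 and should be read as a ratio of counts, not a probability.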

In [35]:
print("\n=== KEY BUSINESS METRICS ===")
for metric in control_metrics:
    print(f"{metric}:")
    print(f"  Control: {control_metrics[metric]:.4f}")
    print(f"  Test: {test_metrics[metric]:.4f}")
    print(f"  Difference: {((test_metrics[metric]/control_metrics[metric])-1)*100:.1f}%")
    print()


=== KEY BUSINESS METRICS ===
total_spend:
  Control: 66818.0000
  Test: 76892.0000
  Difference: 15.1%

total_impressions:
  Control: 3177233.0000
  Test: 2237544.0000
  Difference: -29.6%

total_clicks:
  Control: 154303.0000
  Test: 180970.0000
  Difference: 17.3%

total_purchases:
  Control: 15161.0000
  Test: 15637.0000
  Difference: 3.1%

ctr:
  Control: 0.0486
  Test: 0.0809
  Difference: 66.5%

conversion_rate:
  Control: 0.0048
  Test: 0.0070
  Difference: 46.5%

cost_per_purchase:
  Control: 4.4072
  Test: 4.9173
  Difference: 11.6%

avg_daily_purchases:
  Control: 522.7931
  Test: 521.2333
  Difference: -0.3%
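
The differences above are descriptive only; Key Business Question 3 asks whether they are statistically significant. As a rough first check, the conversion-rate gap can be put through a pooled two-proportion z-test on the campaign totals. This is a sketch under a strong assumption: each impression is treated as an independent trial, which ad impressions generally are not, since one person can see many. The helper `two_proportion_ztest` is hypothetical and uses only the standard library.

```python
import math


def two_proportion_ztest(successes_a, trials_a, successes_b, trials_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Purchases and impressions from the totals printed above (Control, then Test).
z, p_value = two_proportion_ztest(15161, 3177233, 15637, 2237544)
print(f"z = {z:.2f}, two-sided p = {p_value:.3g}")
```

With millions of impressions almost any gap is significant under this model, so the result mostly reflects sample size. A comparison of daily values (29 or 30 observations per campaign) is a more conservative complement, and it is consistent with the near-zero difference in `avg_daily_purchases`.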

