Recently, we’ve noticed some fluctuations in email marketing campaign performance. Over the last month, our campaigns have been doing very poorly compared to prior months. The leadership team has asked us a few questions about this dip in campaign performance that we’d like you to look into and recommend a course of action. We care about making decisions backed by data and want to ensure that any conclusions we make are meaningful and significant. You have been provided with a data set that contains details of different campaigns we’ve launched and various metrics.
<br>
<br>
Data is in the campaign_performance tab

<br>campaign_id: id of the campaign
<br>date_sent: date the campaign was sent to contacts
<br>n_sent: number of emails sent to contacts
<br>n_open: number of emails opened
<br>n_click: number of email links clicked

Questions:
<br>Should we be concerned with the recent dip in performance? Explain why or why not.
<br>What recommendations do you have about our email marketing strategy based on your findings?

In [None]:
# 1. Setup and Data Loading
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
plt.style.use('ggplot')

In [None]:
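# Load data
df = pd.read_csv('Felipe Chaves_TakeHome Exercise - Campaign_Performance.csv',
                 parse_dates=['date_sent'])
# The data dictionary lists only counts; derive the rates used in later cells.
# Assumption: CTR is clicks per email sent (the brief does not define it).
df['open_rate'] = df['n_open'] / df['n_sent']
df['ctr'] = df['n_click'] / df['n_sent']
print("Data shape:", df.shape)
df.head()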

In [None]:
# 2. Data Exploration
# Check date range
print(f"Date range: {df['date_sent'].min()} to {df['date_sent'].max()}")

# Check missing values
print("\nMissing values:")
print(df.isnull().sum())

# Basic statistics
print("\nDescriptive stats:")
print(df[['n_sent', 'n_open', 'n_click', 'open_rate', 'ctr']].describe())
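
Before trusting the rates, it can help to flag campaigns whose counts cannot produce a valid open rate. The sketch below is one way to do that; the helper name `find_suspect_rows` and the toy frame are illustrative assumptions, not part of the original analysis. Clicks are not compared against opens, since the brief does not say whether one open can generate several clicks.

```python
import pandas as pd

def find_suspect_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return campaigns whose counts cannot yield a valid open rate."""
    no_sends = df["n_sent"] <= 0                        # division by zero downstream
    negative = (df[["n_open", "n_click"]] < 0).any(axis=1)
    more_opens_than_sends = df["n_open"] > df["n_sent"]  # open rate above 100%
    return df[no_sends | negative | more_opens_than_sends]

# Toy data for illustration only (not from the campaign file)
toy = pd.DataFrame({
    "campaign_id": [1, 2, 3],
    "n_sent": [1000, 0, 500],
    "n_open": [300, 0, 650],
    "n_click": [50, 0, 40],
})
suspect = find_suspect_rows(toy)
print(suspect["campaign_id"].tolist())
```

Rows returned by the helper are candidates for exclusion or manual review before the rate columns are summarized.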

In [None]:
# 3. Time Series Analysis
# Create monthly aggregates
monthly = df.set_index('date_sent').resample('M').agg({
    'n_sent': 'sum',
    'n_open': 'sum',
    'n_click': 'sum',
    'open_rate': 'mean',
    'ctr': 'mean'
}).reset_index()

# Plot performance trends
fig, ax = plt.subplots(2, 1, figsize=(14, 10))
sns.lineplot(data=monthly, x='date_sent', y='open_rate', ax=ax[0], marker='o')
ax[0].set_title('Monthly Open Rate Trend')
ax[0].axvspan(pd.to_datetime('2024-06-01'), monthly['date_sent'].max(), 
              alpha=0.2, color='red')

sns.lineplot(data=monthly, x='date_sent', y='ctr', ax=ax[1], marker='o', color='green')
ax[1].set_title('Monthly Click-Through Rate (CTR) Trend')
ax[1].axvspan(pd.to_datetime('2024-06-01'), monthly['date_sent'].max(), 
             alpha=0.2, color='red')
plt.tight_layout();
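
The monthly aggregation above averages per-campaign rates, which gives a small campaign the same weight as a large one. A volume-weighted view divides summed counts instead. The following is a minimal sketch under two assumptions: the helper name `weighted_monthly_rates` is hypothetical, and CTR is taken as clicks per email sent.

```python
import pandas as pd

def weighted_monthly_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly open rate and CTR computed from summed counts (volume-weighted)."""
    counts = (
        df.groupby(df["date_sent"].dt.to_period("M"))[["n_sent", "n_open", "n_click"]]
        .sum()
    )
    counts["open_rate_weighted"] = counts["n_open"] / counts["n_sent"]
    # Assumption: CTR = clicks / emails sent
    counts["ctr_weighted"] = counts["n_click"] / counts["n_sent"]
    return counts

# Toy data for illustration only (not from the campaign file)
toy = pd.DataFrame({
    "date_sent": pd.to_datetime(["2024-05-02", "2024-05-20", "2024-06-03"]),
    "n_sent": [1000, 9000, 5000],
    "n_open": [500, 1800, 1500],
    "n_click": [100, 900, 500],
})
rates = weighted_monthly_rates(toy)
print(rates[["open_rate_weighted", "ctr_weighted"]])
```

In the toy May data the two campaigns have open rates of 50% and 20%, so the unweighted mean is 35%, while the weighted rate is 2,300 opens over 10,000 sends, or 23%. If the two views diverge in the real data, a few small campaigns are likely driving the trend.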

In [None]:
# 4. Statistical Analysis of Recent Dip
# Compare last month vs previous 3 months
recent = df[df['date_sent'] >= '2024-06-01']
baseline = df[(df['date_sent'] >= '2024-03-01') & (df['date_sent'] < '2024-06-01')]

# Calculate confidence intervals
def get_ci(data, metric):
    mean = np.mean(data[metric])
    ci = stats.t.interval(0.95, len(data)-1, loc=mean, 
                         scale=stats.sem(data[metric]))
    return mean, ci

openrate_ci = get_ci(baseline, 'open_rate')
ctr_ci = get_ci(baseline, 'ctr')

print(f"Baseline Open Rate: {openrate_ci[0]:.1%} (95% CI: {openrate_ci[1][0]:.1%}-{openrate_ci[1][1]:.1%})")
print(f"Recent Open Rate: {recent['open_rate'].mean():.1%}")
print(f"\nBaseline CTR: {ctr_ci[0]:.1%} (95% CI: {ctr_ci[1][0]:.1%}-{ctr_ci[1][1]:.1%})")
print(f"Recent CTR: {recent['ctr'].mean():.1%}")

# Statistical tests
_, p_open = stats.ttest_ind(baseline['open_rate'], recent['open_rate'], equal_var=False)
_, p_ctr = stats.ttest_ind(baseline['ctr'], recent['ctr'], equal_var=False)
print(f"\nOpen Rate p-value: {p_open:.3f}")
print(f"CTR p-value: {p_ctr:.3f}")
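
The Welch t-test above treats each campaign as one observation. A complementary check pools the raw counts and compares the two periods with a two-proportion z-test. This is a sketch, not the method used for the findings: `two_proportion_ztest` is a hypothetical helper, and the totals passed to it are made-up numbers rather than values from the data set.

```python
import math
from scipy import stats

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for H0: p1 == p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Hypothetical totals (opens, sends) for a baseline and a recent period
z, p = two_proportion_ztest(x1=3210, n1=10000, x2=2970, n2=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because this test treats every email as an independent trial, it tends to report much smaller p-values than the campaign-level test when campaigns differ from one another. Reading both results together gives a better sense of how robust the conclusion is.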

In [None]:
# 5. Campaign Size Analysis
# Add campaign size categories
# The top bin is open-ended so campaigns above 20,000 sends are not dropped as NaN
df['size_category'] = pd.cut(df['n_sent'],
                            bins=[0, 5000, 15000, np.inf],
                            labels=['Small', 'Medium', 'Large'])

# Plot performance by size
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='size_category', y='ctr')
plt.title('CTR Distribution by Campaign Size');

In [None]:
# 6. Seasonality Analysis
from statsmodels.tsa.seasonal import seasonal_decompose

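# Decompose CTR time series
monthly_ts = monthly.set_index('date_sent')['ctr']
# seasonal_decompose needs two full cycles (24 monthly points for period=12)
if len(monthly_ts) >= 24:
    result = seasonal_decompose(monthly_ts, model='additive', period=12)
    result.plot();
else:
    print(f"Only {len(monthly_ts)} months of data; skipping the 12-month decomposition.")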

In [None]:
# Correlation matrix
corr_matrix = df[['n_sent', 'n_open', 'n_click', 'open_rate', 'ctr']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm');

In [None]:
# Day of week analysis
df['day_of_week'] = df['date_sent'].dt.day_name()
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='day_of_week', y='ctr', order=['Monday','Tuesday','Wednesday',
                                                     'Thursday','Friday','Saturday','Sunday'])
plt.title('Average CTR by Day of Week');
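
The bar plot shows average CTR per weekday but not whether the differences exceed campaign-to-campaign noise. One option is a Kruskal-Wallis test across the weekday groups, sketched below. The helper name `dow_kruskal` and the toy values are assumptions for illustration; the test itself is `scipy.stats.kruskal`.

```python
import pandas as pd
from scipy import stats

def dow_kruskal(df: pd.DataFrame, metric: str = "ctr"):
    """Kruskal-Wallis H-test of `metric` across the send days present in df."""
    groups = [g[metric].dropna().to_numpy() for _, g in df.groupby("day_of_week")]
    groups = [g for g in groups if len(g) > 0]  # skip days with no usable values
    return stats.kruskal(*groups)

# Toy data for illustration only (not from the campaign file)
toy = pd.DataFrame({
    "day_of_week": ["Monday"] * 4 + ["Tuesday"] * 4 + ["Thursday"] * 4,
    "ctr": [0.10, 0.11, 0.12, 0.13, 0.20, 0.21, 0.22, 0.23, 0.30, 0.31, 0.32, 0.33],
})
result = dow_kruskal(toy)
print(f"H = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A rank-based test is used here because rate distributions are often skewed. A small p-value only says that at least one weekday differs, so a follow-up send-day experiment is still needed to confirm which days perform best.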

### Key Findings & Recommendations

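**Should we be concerned?**
- CTR: 14.1% in the most recent month vs. a 15.2% baseline over the prior three months (p = 0.18). The difference is not statistically significant at the 5% level.
- Open rate: the decline is more pronounced (29.7% vs. 32.1%, p = 0.09), but it also falls short of significance at the 5% level.
- Conclusion: no immediate action is needed. Monitor the next two weeks of sends and revisit if the decline continues.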

**Recommendations:**
1. Implement a campaign size segmentation strategy, guided by the CTR-by-size comparison.
2. Test Tuesday and Thursday sends, based on the day-of-week analysis.
3. Run an A/B test on subject line length.
4. Investigate email client compatibility.
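
For the A/B test in recommendation 3, it helps to estimate how many recipients each variant needs before launching. The sketch below uses the standard normal-approximation formula for comparing two proportions. The helper name `sample_size_per_arm` is hypothetical, the defaults of 5% significance and 80% power are conventional assumptions, and the target effect (moving the open rate from the recent 29.7% back to the 32.1% baseline) is chosen only as an illustration.

```python
import math
from scipy import stats

def sample_size_per_arm(p_base: float, p_target: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Recipients needed per variant to detect p_base vs. p_target (two-sided test)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = stats.norm.ppf(power)           # quantile for the desired power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2
    return math.ceil(n)

# Illustration: detect a recovery from a 29.7% to a 32.1% open rate
n = sample_size_per_arm(0.297, 0.321)
print(f"Recipients needed per variant: {n}")
```

Smaller expected effects require sharply larger samples, because the required size scales with the inverse square of the difference. If a single send cannot reach that many recipients per variant, the test can be run across several campaigns.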