# MARKETING CAMPAIGN PERFORMANCE ANALYSIS

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## DATA UNDERSTANDING

The dataset used for this analysis is the Marketing Campaign Dataset, which contains information about marketing campaigns across multiple channels, target audiences, and locations. It includes performance metrics such as conversion rate, acquisition cost, ROI, clicks, impressions, and an engagement score. Click-through rate (CTR) and cost per click (CPC) are not stored in the dataset; they are derived from these columns later in the analysis. 

In [None]:
#Load the dataset
df = pd.read_excel('marketing_campaign_dataset.xlsx')

In [None]:
# dataset preview
df.head()

Unnamed: 0,Campaign_ID,Company,Campaign_Type,Target_Audience,Duration,Channel_Used,Conversion_Rate,Acquisition_Cost,ROI,Location,Date,Clicks,Impressions,Engagement_Score,Customer_Segment
0,1,Innovate Industries,Email,Men 18-24,30 days,Google Ads,0.04,16174,6.29,Chicago,2021-01-01 00:00:00,506,1922,6,Health & Wellness
1,2,NexGen Systems,Email,Women 35-44,60 days,Google Ads,0.12,11566,5.61,New York,2021-02-01 00:00:00,116,7523,7,Fashionistas
2,3,Alpha Innovations,Influencer,Men 25-34,30 days,YouTube,0.07,10200,7.18,Los Angeles,2021-03-01 00:00:00,584,7698,1,Outdoor Adventurers
3,4,DataTech Solutions,Display,All Ages,60 days,YouTube,0.11,12724,5.55,Miami,2021-04-01 00:00:00,217,1820,7,Health & Wellness
4,5,NexGen Systems,Email,Men 25-34,15 days,YouTube,0.05,16452,6.5,Los Angeles,2021-05-01 00:00:00,379,4201,3,Health & Wellness



The dataset consists of the following key variables:  

`Campaign_ID:` Unique identifier for each campaign. 

`Company:` Name of the company running the campaign. 

`Campaign_Type:` Type of marketing campaign (Email, Influencer, Display, etc.). 

`Target_Audience:` Demographic group targeted by the campaign. 

`Duration:` Length of the campaign. 

`Channel_Used:` The platform where the campaign was run (Google Ads, YouTube, etc.). 

`Conversion_Rate:` Percentage of users who converted after engaging with the campaign. 

`Acquisition_Cost:` Cost incurred to acquire a customer. 

`ROI:` Return on investment for the campaign. 

`Location:` Geographic location where the campaign was run. 

`Date:` Date of the campaign. 

`Clicks:` Number of clicks the campaign received. 

`Impressions:` Number of times the campaign was viewed. 

`Engagement_Score:` A numerical score representing user engagement with the campaign. 

`Customer_Segment:` The market segment targeted by the campaign. 

In [None]:
# Convert 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
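
The `dayfirst` flag only changes how ambiguous date strings are read; values that are already timestamps pass through unchanged. A minimal sketch with made-up strings (not taken from the dataset) illustrates the difference:

```python
import pandas as pd

# Hypothetical ambiguous date strings, for illustration only
raw = pd.Series(["01-02-2021", "03-04-2021"])

day_first = pd.to_datetime(raw, dayfirst=True)     # read as DD-MM-YYYY
month_first = pd.to_datetime(raw, dayfirst=False)  # read as MM-DD-YYYY

print(day_first.dt.strftime("%Y-%m-%d").tolist())
print(month_first.dt.strftime("%Y-%m-%d").tolist())

# Values that are already datetimes are not reinterpreted
already_parsed = pd.Series(pd.to_datetime(["2021-02-01"]))
unchanged = pd.to_datetime(already_parsed, dayfirst=True)
```

If the source file stores dates as text, it is worth confirming which convention applies before relying on month-level results.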


In [None]:
df.head()

Unnamed: 0,Campaign_ID,Company,Campaign_Type,Target_Audience,Duration,Channel_Used,Conversion_Rate,Acquisition_Cost,ROI,Location,Date,Clicks,Impressions,Engagement_Score,Customer_Segment
0,1,Innovate Industries,Email,Men 18-24,30 days,Google Ads,0.04,16174,6.29,Chicago,2021-01-01,506,1922,6,Health & Wellness
1,2,NexGen Systems,Email,Women 35-44,60 days,Google Ads,0.12,11566,5.61,New York,2021-02-01,116,7523,7,Fashionistas
2,3,Alpha Innovations,Influencer,Men 25-34,30 days,YouTube,0.07,10200,7.18,Los Angeles,2021-03-01,584,7698,1,Outdoor Adventurers
3,4,DataTech Solutions,Display,All Ages,60 days,YouTube,0.11,12724,5.55,Miami,2021-04-01,217,1820,7,Health & Wellness
4,5,NexGen Systems,Email,Men 25-34,15 days,YouTube,0.05,16452,6.5,Los Angeles,2021-05-01,379,4201,3,Health & Wellness


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200005 entries, 0 to 200004
Data columns (total 15 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   Campaign_ID       200005 non-null  int64         
 1   Company           200005 non-null  object        
 2   Campaign_Type     200005 non-null  object        
 3   Target_Audience   200005 non-null  object        
 4   Duration          200005 non-null  object        
 5   Channel_Used      200005 non-null  object        
 6   Conversion_Rate   200005 non-null  float64       
 7   Acquisition_Cost  200005 non-null  int64         
 8   ROI               200005 non-null  float64       
 9   Location          200005 non-null  object        
 10  Date              200005 non-null  datetime64[ns]
 11  Clicks            200005 non-null  int64         
 12  Impressions       200005 non-null  int64         
 13  Engagement_Score  200005 non-null  int64         
 14  Cust

In [None]:
df.shape

(200005, 15)

In [None]:
df.dtypes

Campaign_ID                  int64
Company                     object
Campaign_Type               object
Target_Audience             object
Duration                    object
Channel_Used                object
Conversion_Rate            float64
Acquisition_Cost             int64
ROI                        float64
Location                    object
Date                datetime64[ns]
Clicks                       int64
Impressions                  int64
Engagement_Score             int64
Customer_Segment            object
dtype: object
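
`Duration` is stored as text (`object`), so it cannot be averaged or correlated as-is. One possible way to turn it into a number of days is sketched below, assuming every value follows the `"<n> days"` pattern seen in the preview; the name `duration_days` is hypothetical.

```python
import pandas as pd

# Sample values copied from the preview above
duration = pd.Series(["30 days", "60 days", "30 days", "60 days", "15 days"])

# Pull out the leading digits and convert them to integers
duration_days = duration.str.extract(r"(\d+)", expand=False).astype(int)

print(duration_days.tolist())
```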

## DATA EXPLORATION

In [None]:
df.describe()

Unnamed: 0,Campaign_ID,Conversion_Rate,Acquisition_Cost,ROI,Clicks,Impressions,Engagement_Score
count,200005.0,200005.0,200005.0,200005.0,200005.0,200005.0,200005.0
mean,100003.0,0.080069,12504.441794,5.002416,549.774591,5507.307107,5.494673
std,57736.614632,0.040602,4337.66321,1.734485,260.019354,2596.863794,2.872593
min,1.0,0.01,5000.0,2.0,100.0,1000.0,1.0
25%,50002.0,0.05,8740.0,3.5,325.0,3266.0,3.0
50%,100003.0,0.08,12497.0,5.01,550.0,5518.0,5.0
75%,150004.0,0.12,16264.0,6.51,775.0,7753.0,8.0
max,200005.0,0.15,20000.0,8.0,1000.0,10000.0,10.0


In [None]:
df.isnull().sum()

Campaign_ID         0
Company             0
Campaign_Type       0
Target_Audience     0
Duration            0
Channel_Used        0
Conversion_Rate     0
Acquisition_Cost    0
ROI                 0
Location            0
Date                0
Clicks              0
Impressions         0
Engagement_Score    0
Customer_Segment    0
dtype: int64

In [None]:
df.duplicated().sum()

0
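
A full-row duplicate check can return zero simply because `Campaign_ID` differs on every row (the summary statistics suggest it is a running identifier). A hedged sketch on a toy frame shows how excluding the identifier changes the result; the toy values are invented for illustration.

```python
import pandas as pd

# Toy frame for illustration: rows 0 and 1 differ only in Campaign_ID
toy = pd.DataFrame({
    "Campaign_ID": [1, 2, 3],
    "Company": ["A", "A", "B"],
    "ROI": [5.0, 5.0, 6.0],
})

# Full-row check: the unique ID hides the repeated content
full_row_dupes = toy.duplicated().sum()

# Check again on every column except the identifier
content_cols = toy.columns.drop("Campaign_ID").tolist()
content_dupes = toy.duplicated(subset=content_cols).sum()

print(full_row_dupes, content_dupes)
```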

In [None]:
# Set plot style via seaborn; the "seaborn-darkgrid" matplotlib style name is deprecated since matplotlib 3.6
sns.set_style("darkgrid")

In [None]:
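# Create subplots for outlier analysis
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, col in zip(axes, ['Acquisition_Cost', 'Clicks', 'Impressions']):  # assumed columns
    sns.boxplot(x=df[col], ax=ax)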

# Enhanced Exploratory Data Analysis (EDA)

## 1. Data Overview
We begin by understanding the structure, types, and key metrics of the dataset.

In [None]:
df.info()

In [None]:
df.describe()

## 2. Handling Missing Values
Checking for missing values. The earlier `df.isnull().sum()` output showed none in any column, so no imputation is required.

In [None]:
df.isnull().sum()

## 3. Data Distribution and Outliers
Visualizing numerical columns to understand distributions and detect outliers.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Plot distributions of key numerical metrics
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
sns.histplot(df['ROI'], bins=30, kde=True, ax=axes[0, 0])
axes[0, 0].set_title("ROI Distribution")

sns.histplot(df['Conversion_Rate'], bins=30, kde=True, ax=axes[0, 1])
axes[0, 1].set_title("Conversion Rate Distribution")

sns.boxplot(x=df['Acquisition_Cost'], ax=axes[1, 0])
axes[1, 0].set_title("Acquisition Cost Boxplot")

sns.boxplot(x=df['Clicks'], ax=axes[1, 1])
axes[1, 1].set_title("Clicks Boxplot")

plt.tight_layout()
plt.show()
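
Seaborn boxplots draw whiskers at 1.5 × IQR by default, so whether any outlier points appear can be worked out from the quartiles alone. The sketch below uses the quartiles and extremes reported by `df.describe()` earlier; `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and extremes taken from the df.describe() output above
stats = {
    "Acquisition_Cost": {"q1": 8740.0, "q3": 16264.0, "min": 5000.0, "max": 20000.0},
    "Clicks": {"q1": 325.0, "q3": 775.0, "min": 100.0, "max": 1000.0},
}

results = {}
for name, s in stats.items():
    lower, upper = iqr_fences(s["q1"], s["q3"])
    # A value counts as an outlier only if it lies beyond a fence
    has_outliers = s["min"] < lower or s["max"] > upper
    results[name] = (lower, upper, has_outliers)
    print(f"{name}: fences=({lower}, {upper}), outliers={has_outliers}")
```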

## 4. Campaign Performance Analysis
Analyzing which campaigns performed best based on key metrics.

In [None]:
# Top 10 campaigns by ROI
top_campaigns = df[['Campaign_ID', 'Company', 'ROI', 'Channel_Used']].sort_values(by='ROI', ascending=False).head(10)
top_campaigns
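
With ROI recorded to two decimals over a large number of rows, many campaigns may tie at the maximum, in which case `head(10)` returns an arbitrary subset of them. A minimal sketch on toy data shows one way to count ties and break them deterministically; using `Acquisition_Cost` as the tie-breaker is an assumption, not something the original analysis specifies.

```python
import pandas as pd

# Toy data for illustration; ROI values repeat on purpose
toy = pd.DataFrame({
    "Campaign_ID": [1, 2, 3, 4, 5],
    "ROI": [8.0, 7.5, 8.0, 8.0, 6.0],
    "Acquisition_Cost": [15000, 9000, 7000, 12000, 5000],
})

# How many campaigns share the maximum ROI?
n_at_max = (toy["ROI"] == toy["ROI"].max()).sum()

# Break ties deterministically: highest ROI first, then lowest acquisition cost
ranked = toy.sort_values(
    by=["ROI", "Acquisition_Cost"], ascending=[False, True]
).head(3)

print(n_at_max)
print(ranked["Campaign_ID"].tolist())
```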

In [None]:
# Campaign performance by marketing channel
plt.figure(figsize=(12, 5))
sns.boxplot(x='Channel_Used', y='ROI', data=df)
plt.xticks(rotation=45)
plt.title("ROI Distribution Across Marketing Channels")
plt.show()
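
A numeric summary alongside the boxplot makes channel differences easier to compare. The sketch below uses only the five preview rows shown earlier, so its numbers say nothing about the full dataset; `channel_summary` is a hypothetical name.

```python
import pandas as pd

# Channel and ROI values copied from the five preview rows above
sample = pd.DataFrame({
    "Channel_Used": ["Google Ads", "Google Ads", "YouTube", "YouTube", "YouTube"],
    "ROI": [6.29, 5.61, 7.18, 5.55, 6.50],
})

# Count, median and mean ROI per channel, best median first
channel_summary = (
    sample.groupby("Channel_Used")["ROI"]
    .agg(["count", "median", "mean"])
    .sort_values("median", ascending=False)
)

print(channel_summary)
```

Running the same aggregation on `df` would show whether the medians differ by more than the spread within each channel.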

## 5. Click-Through Rate (CTR) and Cost Per Click (CPC) Analysis
Evaluating the effectiveness of campaigns in driving engagement.

In [None]:
# Calculate CTR and CPC
df['CTR'] = df['Clicks'] / df['Impressions']
df['CPC'] = df['Acquisition_Cost'] / df['Clicks']

# Plot CTR by channel
plt.figure(figsize=(12, 5))
sns.boxplot(x='Channel_Used', y='CTR', data=df)
plt.xticks(rotation=45)
plt.title("Click-Through Rate (CTR) Across Channels")
plt.show()
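
As a worked example of the two formulas, the sketch below applies them to the first two preview rows. Note that the CPC formula treats `Acquisition_Cost` as total campaign spend, which is an assumption: the variable list describes it as the cost to acquire a customer.

```python
import pandas as pd

# Clicks, impressions and acquisition cost from the first two preview rows
sample = pd.DataFrame({
    "Clicks": [506, 116],
    "Impressions": [1922, 7523],
    "Acquisition_Cost": [16174, 11566],
})

sample["CTR"] = sample["Clicks"] / sample["Impressions"]       # clicks per impression
sample["CPC"] = sample["Acquisition_Cost"] / sample["Clicks"]  # cost per click

print(sample[["CTR", "CPC"]].round(3))
```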

## 6. Location-Based Insights
Understanding how marketing campaigns perform in different locations.

In [None]:
# Average ROI by location
location_performance = df.groupby('Location')['ROI'].mean().sort_values(ascending=False).head(10)

plt.figure(figsize=(12, 5))
sns.barplot(x=location_performance.index, y=location_performance.values)
plt.xticks(rotation=45)
plt.title("Top 10 Locations by Average ROI")
plt.show()

## Conclusion
- The best-performing campaigns combine high ROI with low acquisition costs.
- Google Ads and YouTube appear to have higher ROI and CTR than other channels.
- Some locations show higher average ROI than others, which may reflect better audience engagement, although no significance test was run.
- Further analysis can explore time-based trends and audience segmentation for deeper insights.
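
A time-based view could start from a monthly average of ROI. The following is a minimal sketch on invented values, assuming `Date` has already been converted to datetime; `monthly_roi` is a hypothetical name.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-25"]),
    "ROI": [4.0, 6.0, 5.0, 7.0],
})

# Group by calendar month and average ROI within each month
monthly_roi = toy.groupby(toy["Date"].dt.to_period("M"))["ROI"].mean()

print(monthly_roi)
```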

## Insights & Inferences

### ROI and Conversion Rate Distributions
- ROI ranges from 2.0 to 8.0, with a mean (5.00) almost equal to the median (5.01) and quartiles (3.50 and 6.51) evenly spaced around it, so the distribution is roughly symmetric rather than right-skewed.
- Conversion rate ranges from 0.01 to 0.15, with mean and median both at about 0.08 and quartiles at 0.05 and 0.12, so rates are spread across the whole range rather than concentrated at the low end.

### Acquisition Cost and Clicks
- Acquisition cost ranges from 5,000 to 20,000 with quartiles at 8,740 and 16,264 (IQR 7,524). The 1.5 × IQR fences fall at about -2,546 and 27,550, outside the observed range, so the boxplot should show no outlier points.
- Clicks range from 100 to 1,000 (median 550), indicating varying levels of audience engagement across campaigns.

### Campaign Performance by Channel
- Google Ads and YouTube show a higher median ROI than other marketing channels.
- Display is listed as a `Campaign_Type` rather than a `Channel_Used` value, so any difference in ROI spread for Display campaigns would need a separate breakdown by `Campaign_Type`; the channel boxplot cannot show it.

### Click-Through Rate (CTR) and Cost Per Click (CPC)
- CTR varies significantly across different channels, with some channels driving more engagement than others.
- CPC analysis shows that lower CPC does not always correspond to higher ROI, indicating that campaign effectiveness depends on multiple factors.
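
The relationship between CPC and ROI can be checked directly with a correlation coefficient. The sketch below computes the Pearson correlation on the five preview rows only, which is far too few to draw a conclusion; it is meant to show the calculation, not a result for the full dataset.

```python
import pandas as pd

# Values copied from the five preview rows above
sample = pd.DataFrame({
    "Clicks": [506, 116, 584, 217, 379],
    "Acquisition_Cost": [16174, 11566, 10200, 12724, 16452],
    "ROI": [6.29, 5.61, 7.18, 5.55, 6.50],
})

# Same CPC definition as in the analysis cell
sample["CPC"] = sample["Acquisition_Cost"] / sample["Clicks"]

# Pearson correlation between CPC and ROI (ranges from -1 to 1)
cpc_roi_corr = sample["CPC"].corr(sample["ROI"])

print(round(cpc_roi_corr, 3))
```

A coefficient near zero on the full data would support the point that CPC alone does not determine ROI.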

### Location-Based Insights
- Some locations exhibit significantly higher ROI than others, suggesting that geographical targeting can influence campaign success.
- Further analysis could help optimize location-based marketing strategies for improved performance.

## Summary of Findings
- High-performing campaigns share characteristics such as high CTR, moderate CPC, and well-targeted audiences.
- Certain channels consistently outperform others, emphasizing the importance of strategic channel selection.
- Location-based analysis reveals opportunities for geographic optimization of marketing efforts.
- Additional deep dives into specific customer segments may provide further actionable insights for improving marketing efficiency.
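
One possible starting point for a segment deep dive is a cross-tabulation of mean ROI by customer segment and channel. This sketch again uses only the five preview rows, so the cell values are illustrative; `segment_by_channel` is a hypothetical name.

```python
import pandas as pd

# Segment, channel and ROI values copied from the five preview rows above
sample = pd.DataFrame({
    "Customer_Segment": ["Health & Wellness", "Fashionistas", "Outdoor Adventurers",
                         "Health & Wellness", "Health & Wellness"],
    "Channel_Used": ["Google Ads", "Google Ads", "YouTube", "YouTube", "YouTube"],
    "ROI": [6.29, 5.61, 7.18, 5.55, 6.50],
})

# Rows: segments, columns: channels, cells: mean ROI (NaN where no campaigns exist)
segment_by_channel = sample.pivot_table(
    index="Customer_Segment", columns="Channel_Used", values="ROI", aggfunc="mean"
)

print(segment_by_channel.round(3))
```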