## Question 3: What are the most effective marketing channels and campaigns?
##### Import packages

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from q3 import *

##### Data cleaning

In [14]:
# Read data
df = pd.read_csv('final_data.csv', low_memory=False)

# Data cleaning
df['purchase_date'] = pd.to_datetime(df['purchase_date'])
df['month'] = df['purchase_date'].dt.month  # Extract the month
df['year'] = df['purchase_date'].dt.year  # Extract the year
df['campaign_name'] = df['campaign_key'].str[5:8]  # Extract the campaign code

# Map campaign codes to actual campaign names
campaign_mapping = {
    'EST': 'Easter Sale',
    'MID': 'Mid-Year Sale',
    'HLW': 'Halloween',
    'XMS': 'Christmas',
    'STD': "St. Patrick's Day",
    'CMO': 'Cinco de Mayo',
    'DRK': 'Drinks Bonanza',
    'FBD': 'Food and Beverage Day',
    'SMS': 'Super Mart Sale',
    'MMS': 'Markdown Mega Sale'
}
df['campaign_name'] = df['campaign_name'].map(campaign_mapping)

# Select required columns
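# .copy() makes the subset an independent DataFrame, so adding 'category' below
# does not trigger pandas' SettingWithCopyWarning
df = df[['customer_key', 'quantity_purchased', 'total_price', 'purchase_date',
         'description', 'revenue', 'campaign_key', 'mkt_chnl_key', 'month',
         'year', 'campaign_name']].copy()
df['category'] = df['description'].apply(extract_category)  # Add 'category' column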

# For AOV analysis, filter out rows from 2021 onwards
df2 = df[df['year'] < 2021] 

df2.head()

Unnamed: 0,customer_key,quantity_purchased,total_price,purchase_date,description,revenue,campaign_key,mkt_chnl_key,month,year,campaign_name,category
0,C001743,4,72.0,2014-05-25,Food - Chips,56.0,,,5,2014,,Food
1,C008827,11,77.0,2018-12-31,a. Beverage - Soda,22.0,2018-XMS-DEC,,12,2018,Christmas,Beverage
2,C008830,11,253.0,2015-12-21,Food - Healthy,143.0,2015-XMS-DEC,,12,2015,Christmas,Food
3,C004301,5,275.0,2014-05-25,Beverage - Energy/Protein,160.0,,,5,2014,,Beverage
4,C008848,10,150.0,2020-12-22,Dishware - Utensils,90.0,2020-XMS-DEC,,12,2020,Christmas,Dishware
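
The `category` column above comes from `extract_category`, which is imported from `q3` and not shown in this notebook. Judging only by the sample rows (`Food - Chips` becomes `Food`, and `a. Beverage - Soda` becomes `Beverage`), it might behave roughly like the sketch below. The function name `extract_category_sketch` and the handling of the leading `a. ` marker are assumptions for illustration, not the actual helper.

```python
import re


def extract_category_sketch(description):
    """Hypothetical stand-in for q3.extract_category (the real helper is not shown)."""
    # Missing descriptions (NaN or None) have no category
    if not isinstance(description, str):
        return None
    # Assumption: drop an optional single-letter list marker such as "a. "
    cleaned = re.sub(r"^[A-Za-z]\.\s+", "", description.strip())
    # Keep the text before the first " - " separator
    return cleaned.split(" - ", 1)[0].strip()


sample_descriptions = ["Food - Chips", "a. Beverage - Soda", "Dishware - Utensils"]
sample_categories = [extract_category_sketch(d) for d in sample_descriptions]
```

On a DataFrame, the same function could be applied with `df['description'].apply(extract_category_sketch)`.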


In [15]:
campaign_grouped = aov_analysis(df2)[2]
campaign_grouped

Unnamed: 0,year,campaign_name,revenue,customer_key,aov
0,2014,Christmas,441516.0,5099,86.588743
1,2014,Easter Sale,64532.0,1072,60.197761
2,2014,Halloween,115926.0,1941,59.724884
3,2014,Mid-Year Sale,144860.0,2731,53.042841
4,2015,Christmas,367689.0,4588,80.141456
5,2015,Easter Sale,58166.0,978,59.474438
6,2015,Mid-Year Sale,129731.0,2446,53.038021
7,2015,St. Patrick's Day,112089.0,1804,62.133592
8,2016,Christmas,363552.0,4512,80.574468
9,2016,Cinco de Mayo,127015.0,2071,61.330275


In [16]:
non_campaign_grouped = aov_analysis(df2)[3]
non_campaign_grouped

Unnamed: 0,year,revenue,customer_key,aov
0,2014,6824457.0,125279,54.47407
1,2015,7169941.0,133631,53.654773
2,2016,7186566.0,131948,54.465138
3,2017,7174335.0,132667,54.077766
4,2018,7135464.0,132086,54.02135
5,2019,7184018.0,132183,54.349031
6,2020,7261243.0,133454,54.410081
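
In both tables above, the `aov` column equals `revenue` divided by `customer_key` (for example, 441516.0 / 5099 ≈ 86.59 for Christmas 2014). The internals of `aov_analysis` are not shown, so the sketch below is only one way such a table could be built. It assumes that `customer_key` holds a count of transaction rows rather than distinct customers, and it uses made-up toy values.

```python
import pandas as pd

# Toy transactions: column names follow the notebook, values are invented
toy = pd.DataFrame({
    "year": [2014, 2014, 2014, 2015, 2015],
    "campaign_name": ["Christmas", "Christmas", "Easter Sale", "Christmas", "Christmas"],
    "customer_key": ["C1", "C2", "C3", "C1", "C4"],
    "revenue": [100.0, 60.0, 50.0, 90.0, 70.0],
})

# Sum revenue and count rows per (year, campaign).
# Assumption: one row per order, so the row count is the number of orders.
aov_sketch = (
    toy.groupby(["year", "campaign_name"], as_index=False)
       .agg(revenue=("revenue", "sum"), customer_key=("customer_key", "count"))
)

# AOV = total revenue / number of orders
aov_sketch["aov"] = aov_sketch["revenue"] / aov_sketch["customer_key"]
```

If orders should instead be counted per distinct customer, `"count"` could be swapped for `"nunique"`.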


##### General Trend of Overall Campaign AOV vs Non-Campaign AOV

In [17]:
# Run the analysis once and reuse both results instead of recomputing it
aov_results = aov_analysis(df2)
df_campaign, df_non_campaign = aov_results[0], aov_results[1]
df_comparison = aov_yearly(df_campaign, df_non_campaign)
aov_yearly_plot(df_comparison)

##### Comparing Specific Campaign AOV vs Non-Campaign AOV Trend

In [18]:
aov_campaign_plot(campaign_grouped, non_campaign_grouped)

##### AOV per Campaign (2014 - 2020)

In [19]:
df_campaign_by_month = aov_analysis(df2)[4]
aov_campaign_bar_graph(df_campaign_by_month)

From 2014 to 2020, the Christmas campaign consistently achieved the highest AOV among all campaigns, which indicates that it is highly effective at driving larger purchases. Customers appear more willing to spend during the holiday season, possibly because of holiday gifting. The Mid-Year Sale comes next, with the second-highest AOV from 2016 to 2020. Among campaigns that have run only once, the Markdown Mega Sale performed best, with an AOV of 62.3. This suggests that the Markdown Mega Sale is worth running again in future years, since it achieved a relatively high AOV in its only run in 2020.

##### Sales Growth Rate

In [20]:
# Input the chosen year for analysis
chosen_year = 2019


metrics = campaign_eff_metrics(df, chosen_year)
sales_growth_rate_plot(metrics[0], chosen_year)

For 2019, the Mid-Year Sale generated the highest gross sales. However, revenue, the total amount of money the business brings in, is a better indicator of how well the business is doing. We therefore analyse revenue and transaction count in greater detail to evaluate campaign effectiveness more accurately.
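
The internals of `campaign_eff_metrics` and `sales_growth_rate_plot` are not shown here. As a rough illustration of what a year-over-year sales growth rate per campaign can look like, the sketch below compares each campaign's gross sales in the chosen year with the previous year. It assumes that gross sales correspond to the `total_price` column, and it uses invented toy values.

```python
import pandas as pd

# Toy data: column names follow the notebook, values are invented
toy_sales = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019],
    "campaign_name": ["Christmas", "Mid-Year Sale", "Christmas", "Mid-Year Sale"],
    "total_price": [200.0, 100.0, 220.0, 150.0],
})
toy_year = 2019

# One row per campaign, one column per year, holding summed gross sales
sales_by_year = toy_sales.pivot_table(
    index="campaign_name", columns="year", values="total_price", aggfunc="sum"
)

# Growth rate (%) = (current year - previous year) / previous year * 100
growth_pct = (
    (sales_by_year[toy_year] - sales_by_year[toy_year - 1])
    / sales_by_year[toy_year - 1]
    * 100
)
```

A campaign that did not run in the previous year would have no baseline and would show up as `NaN` under this definition.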

##### Revenue and Transaction Count

In [21]:
revenue_transaction_plot(metrics[1], metrics[2], chosen_year)

##### Average Order Size (AOS) and Average Order Value (AOV)

In [22]:
aos_aov_plot(metrics[3], metrics[4], chosen_year)

##### AOV by Product Category for Each Campaign

In [23]:
# Plot AOV by category for each campaign in the chosen year
plot_aov_by_category(metrics[5], chosen_year)