# Analysis of Digital Marketing Data and KPIs

In this project, we analyze a digital marketing dataset and answer questions about the effectiveness of the different types of campaigns.
The dataset comes from an India-based company, so monetary values are given in Indian Rupees (INR).

The dataset contains data on different digital marketing campaigns and their performance. The columns are as follows:
- id : Unique identifier for each entry (index column)
- c_date : Date of the campaign
- campaign_name : Name of the campaign 
- category: Category of the campaign (e.g., social, search, influencer, media)
- campaign_id: Unique identifier for each campaign
- impressions: Number of impressions generated by the campaign
- mark_spent: Amount spent on the campaign
- clicks: Number of clicks received
- leads: Number of leads generated
- orders: Number of orders made
- revenue: Revenue generated from the campaign

Using this data, we want to calculate marketing metrics that will help us evaluate the success of the campaigns:
- Return on Marketing Investment (ROMI): Return on every Rupee spent, as a percentage
- Click Through Rate (CTR): Percentage of impressions that result in a click
- Cost per Click (CPC): Cost to attract 1 click
- Cost per Lead (CPL): Cost to attract 1 lead
- Customer Acquisition Cost (CAC): Cost to attract 1 order
- Average Order Value (AOV): Average revenue from 1 order
- Conversion Rate 1 (CONV1): Clicks to leads
- Conversion Rate 2 (CONV2): Leads to orders
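
For reference, the formulas behind these KPIs can be written out as follows. They mirror the calculations done in the "Adding KPI Metrics" section of this notebook; "spend" stands for the marketing spend column, and the two conversion rates are kept as ratios rather than percentages.

$$
\begin{aligned}
\text{ROMI} &= \frac{\text{revenue} - \text{spend}}{\text{spend}} \times 100 &
\text{CTR} &= \frac{\text{clicks}}{\text{impressions}} \times 100 \\
\text{CPC} &= \frac{\text{spend}}{\text{clicks}} &
\text{CPL} &= \frac{\text{spend}}{\text{leads}} \\
\text{CAC} &= \frac{\text{spend}}{\text{orders}} &
\text{AOV} &= \frac{\text{revenue}}{\text{orders}} \\
\text{CONV1} &= \frac{\text{leads}}{\text{clicks}} &
\text{CONV2} &= \frac{\text{orders}}{\text{leads}}
\end{aligned}
$$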

We will clean our data and add our KPIs to the dataset, with the intention of using the new dataset in Power BI to create a dashboard to explore our data.

In [96]:
# import dependencies 
import pandas as pd
import numpy as np

In [97]:
# import data
df = pd.read_csv('Marketing.csv')
df1 = pd.read_csv('Marketing.csv')


## Explore Initial Dataset

In [98]:
df.shape

(308, 11)

In [99]:
df.head()

| | id | c_date | campaign_name | category | campaign_id | impressions | mark_spent | clicks | leads | orders | revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021-02-01 | facebook_tier1 | social | 349043 | 148263 | 7307.37 | 1210 | 13 | 1 | 4981.0 |
| 1 | 2 | 2021-02-01 | facebOOK_tier2 | social | 348934 | 220688 | 16300.2 | 1640 | 48 | 3 | 14962.0 |
| 2 | 3 | 2021-02-01 | google_hot | search | 89459845 | 22850 | 5221.6 | 457 | 9 | 1 | 7981.0 |
| 3 | 4 | 2021-02-01 | google_wide | search | 127823 | 147038 | 6037.0 | 1196 | 24 | 1 | 2114.0 |
| 4 | 5 | 2021-02-01 | youtube_blogger | influencer | 10934 | 225800 | 29962.2 | 2258 | 49 | 10 | 84490.0 |


In [100]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 308 entries, 0 to 307
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             308 non-null    int64  
 1   c_date         308 non-null    object 
 2   campaign_name  308 non-null    object 
 3   category       308 non-null    object 
 4   campaign_id    308 non-null    int64  
 5   impressions    308 non-null    int64  
 6   mark_spent     308 non-null    float64
 7   clicks         308 non-null    int64  
 8   leads          308 non-null    int64  
 9   orders         308 non-null    int64  
 10  revenue        308 non-null    float64
dtypes: float64(2), int64(6), object(3)
memory usage: 26.6+ KB


## Data Preprocessing
Let's take a look at the data and see if we need to perform some cleaning. One thing I want to do is rename the "mark_spent" column to "marketing_spent" and convert the values in "mark_spent" and "revenue" to US Dollars, which makes it easier for me to conceptualize the amount of money being spent.

In [101]:
# function to convert INR to USD
def convert(amount):
    rupee = 73.9339  # Average 2021 exchange rate: Rupees per 1 US Dollar
    dollar = round(amount / rupee, 2)
    return dollar
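
As a quick sanity check of `convert`, and as a sketch of a vectorized alternative, the same conversion can be applied to a whole column at once. The constant name `INR_PER_USD` is introduced here for illustration only; the rate is the same assumed 2021 average as above, and the sample values are the `mark_spent` figures of the first three rows.

```python
import pandas as pd

INR_PER_USD = 73.9339  # assumed 2021 average rate, same value as in convert()

def convert(amount):
    """Convert an INR amount to USD, rounded to cents."""
    return round(amount / INR_PER_USD, 2)

# mark_spent (in INR) of the first three rows of the dataset
sample = pd.Series([7307.37, 16300.2, 5221.6])

row_wise = sample.apply(convert)              # element by element, as in the notebook
vectorized = (sample / INR_PER_USD).round(2)  # whole column in one operation

print(row_wise.tolist())
print(vectorized.tolist())
```

Both approaches should give the same figures for these values; the vectorized form mainly pays off on larger frames.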

In [102]:
# Drop Duplicates
df = df.drop_duplicates()

In [103]:
# Check for missing values
df.isnull().values.any()

False
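
The two checks above return a cleaned frame and a single `True`/`False`. If more detail is wanted, a sketch like the following counts duplicated rows and missing values per column. The `demo` frame is made up for illustration and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Small made-up frame with one duplicated row and one missing value
demo = pd.DataFrame({
    "campaign_name": ["google_hot", "google_hot", "banner_partner"],
    "clicks": [457, 457, np.nan],
})

n_duplicates = int(demo.duplicated().sum())  # rows identical to an earlier row
missing_per_column = demo.isnull().sum()     # missing values, column by column

print(n_duplicates)
print(missing_per_column)
```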

In [104]:
# Check for structural errors by inspecting the unique values of each column
df['id'].unique()

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103, 104,
       105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
       118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
       131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
       144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
       157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
       170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 18

In [105]:
df['c_date'].unique()

array(['2021-02-01', '2021-02-02', '2021-02-03', '2021-02-04',
       '2021-02-05', '2021-02-06', '2021-02-07', '2021-02-08',
       '2021-02-09', '2021-02-10', '2021-02-11', '2021-02-12',
       '2021-02-13', '2021-02-14', '2021-02-15', '2021-02-16',
       '2021-02-17', '2021-02-18', '2021-02-19', '2021-02-20',
       '2021-02-21', '2021-02-22', '2021-02-23', '2021-02-24',
       '2021-02-25', '2021-02-26', '2021-02-27', '2021-02-28'],
      dtype=object)
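
The `info()` output earlier lists `c_date` as `object`, meaning the dates are stored as strings. This notebook leaves them as they are. If date-based operations were needed, one optional step (an assumption, not something the original analysis does) would be to parse them with `pd.to_datetime`, sketched here on three sample values:

```python
import pandas as pd

# Three of the c_date values shown above
dates = pd.Series(["2021-02-01", "2021-02-02", "2021-02-28"])

# Parse the ISO-formatted strings into proper datetime values
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

print(parsed.dtype)
print(parsed.dt.day_name().tolist())
```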

In [106]:
df['campaign_name'].unique()

array(['facebook_tier1', 'facebOOK_tier2', 'google_hot', 'google_wide',
       'youtube_blogger', 'instagram_tier1', 'instagram_tier2',
       'facebook_retargeting', 'facebook_lal', 'instagram_blogger',
       'banner_partner'], dtype=object)

In [107]:
# Change 'facebOOK_tier2' to lowercase
df['campaign_name'] = df['campaign_name'].str.lower()
df['campaign_name'].unique()

array(['facebook_tier1', 'facebook_tier2', 'google_hot', 'google_wide',
       'youtube_blogger', 'instagram_tier1', 'instagram_tier2',
       'facebook_retargeting', 'facebook_lal', 'instagram_blogger',
       'banner_partner'], dtype=object)

In [108]:
df['category'].unique()

array(['social', 'search', 'influencer', 'media'], dtype=object)

In [109]:
df['campaign_id'].unique()

array([  349043,   348934, 89459845,   127823,    10934,  9034945,
         983498,  4387490,   544756,   374754,    39889], dtype=int64)
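
There are 11 campaign names and 11 campaign ids, which suggests a one-to-one mapping. A minimal sketch for verifying that, assuming names are compared in lowercase, is shown below. The `demo` frame reuses a few name/id pairs from the outputs above and is for illustration only.

```python
import pandas as pd

demo = pd.DataFrame({
    "campaign_name": ["facebook_tier1", "facebOOK_tier2", "facebook_tier2", "google_hot"],
    "campaign_id": [349043, 348934, 348934, 89459845],
})

# Normalize case first, otherwise 'facebOOK_tier2' counts as a separate campaign
demo["campaign_name"] = demo["campaign_name"].str.lower()

ids_per_name = demo.groupby("campaign_name")["campaign_id"].nunique()
names_per_id = demo.groupby("campaign_id")["campaign_name"].nunique()

# One-to-one means every name has exactly one id and every id exactly one name
is_one_to_one = bool((ids_per_name == 1).all() and (names_per_id == 1).all())
print(is_one_to_one)
```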

In [110]:
df['impressions']

0       148263
1       220688
2        22850
3       147038
4       225800
        ...   
303     775780
304       1933
305      25840
306      94058
307    8490000
Name: impressions, Length: 308, dtype: int64

In [111]:
df['clicks']

0      1210
1      1640
2       457
3      1196
4      2258
       ... 
303    1024
304      58
305     248
306     594
307     849
Name: clicks, Length: 308, dtype: int64

In [112]:
df['leads']

0      13
1      48
2       9
3      24
4      49
       ..
303     4
304     0
305     5
306    12
307    18
Name: leads, Length: 308, dtype: int64

In [113]:
df['orders']

0       1
1       3
2       1
3       1
4      10
       ..
303     0
304     0
305     1
306     1
307     2
Name: orders, Length: 308, dtype: int64

In [114]:
# Convert mark_spent to USD
df['mark_spent'] = df['mark_spent'].apply(convert)
df['mark_spent']

0       98.84
1      220.47
2       70.63
3       81.65
4      405.26
        ...  
303     10.29
304      3.04
305     92.58
306     65.54
307     92.28
Name: mark_spent, Length: 308, dtype: float64

In [115]:
# Convert revenue to USD
df['revenue'] = df['revenue'].apply(convert)
df['revenue']

0        67.37
1       202.37
2       107.95
3        28.59
4      1142.78
        ...   
303       0.00
304       0.00
305      20.17
306      67.74
307      95.08
Name: revenue, Length: 308, dtype: float64

In [116]:
df = df.rename(columns={"mark_spent" : "marketing_spent"})
df.head()

| | id | c_date | campaign_name | category | campaign_id | impressions | marketing_spent | clicks | leads | orders | revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021-02-01 | facebook_tier1 | social | 349043 | 148263 | 98.84 | 1210 | 13 | 1 | 67.37 |
| 1 | 2 | 2021-02-01 | facebook_tier2 | social | 348934 | 220688 | 220.47 | 1640 | 48 | 3 | 202.37 |
| 2 | 3 | 2021-02-01 | google_hot | search | 89459845 | 22850 | 70.63 | 457 | 9 | 1 | 107.95 |
| 3 | 4 | 2021-02-01 | google_wide | search | 127823 | 147038 | 81.65 | 1196 | 24 | 1 | 28.59 |
| 4 | 5 | 2021-02-01 | youtube_blogger | influencer | 10934 | 225800 | 405.26 | 2258 | 49 | 10 | 1142.78 |


In [117]:
df

| | id | c_date | campaign_name | category | campaign_id | impressions | marketing_spent | clicks | leads | orders | revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021-02-01 | facebook_tier1 | social | 349043 | 148263 | 98.84 | 1210 | 13 | 1 | 67.37 |
| 1 | 2 | 2021-02-01 | facebook_tier2 | social | 348934 | 220688 | 220.47 | 1640 | 48 | 3 | 202.37 |
| 2 | 3 | 2021-02-01 | google_hot | search | 89459845 | 22850 | 70.63 | 457 | 9 | 1 | 107.95 |
| 3 | 4 | 2021-02-01 | google_wide | search | 127823 | 147038 | 81.65 | 1196 | 24 | 1 | 28.59 |
| 4 | 5 | 2021-02-01 | youtube_blogger | influencer | 10934 | 225800 | 405.26 | 2258 | 49 | 10 | 1142.78 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 303 | 304 | 2021-02-28 | instagram_tier2 | social | 983498 | 775780 | 10.29 | 1024 | 4 | 0 | 0.00 |
| 304 | 305 | 2021-02-28 | facebook_retargeting | social | 4387490 | 1933 | 3.04 | 58 | 0 | 0 | 0.00 |
| 305 | 306 | 2021-02-28 | facebook_lal | social | 544756 | 25840 | 92.58 | 248 | 5 | 1 | 20.17 |
| 306 | 307 | 2021-02-28 | instagram_blogger | influencer | 374754 | 94058 | 65.54 | 594 | 12 | 1 | 67.74 |


## Adding KPI Metrics
We want to add our KPI metrics to the data frame for easier analysis. Since spend and revenue are now in US Dollars, the cost metrics are in US Dollars as well.
- Return on Marketing Investment (ROMI): Return on every Dollar spent, as a percentage
- Click Through Rate (CTR): Percentage of impressions that result in a click
- Cost per Click (CPC): Cost to attract 1 click
- Cost per Lead (CPL): Cost to attract 1 lead
- Customer Acquisition Cost (CAC): Cost to attract 1 order
- Average Order Value (AOV): Average revenue from 1 order
- Conversion Rate 1 (CONV1): Clicks to leads
- Conversion Rate 2 (CONV2): Leads to orders

In [118]:
# Profitability and cost KPIs
df['ROMI'] = round((df['revenue'] - df['marketing_spent']) / df['marketing_spent'] * 100, 2)
df['CTR'] = round((df['clicks'] / df['impressions']) * 100, 2)
df['CPC'] = round(df['marketing_spent'] / df['clicks'], 2)
df['CPL'] = round(df['marketing_spent'] / df['leads'], 2)
df['CAC'] = round(df['marketing_spent'] / df['orders'], 2)
df['AOV'] = round(df['revenue'] / df['orders'], 2)
# Conversion rates, kept as ratios rather than percentages
df['CONV1'] = round(df['leads'] / df['clicks'], 3)
df['CONV2'] = round(df['orders'] / df['leads'], 3)
df.head()

| | id | c_date | campaign_name | category | campaign_id | impressions | marketing_spent | clicks | leads | orders | revenue | ROMI | CTR | CPC | CPL | CAC | AOV | CONV1 | CONV2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2021-02-01 | facebook_tier1 | social | 349043 | 148263 | 98.84 | 1210 | 13 | 1 | 67.37 | -31.84 | 0.82 | 0.08 | 7.6 | 98.84 | 67.37 | 0.011 | 0.077 |
| 1 | 2 | 2021-02-01 | facebook_tier2 | social | 348934 | 220688 | 220.47 | 1640 | 48 | 3 | 202.37 | -8.21 | 0.74 | 0.13 | 4.59 | 73.49 | 67.46 | 0.029 | 0.062 |
| 2 | 3 | 2021-02-01 | google_hot | search | 89459845 | 22850 | 70.63 | 457 | 9 | 1 | 107.95 | 52.84 | 2.0 | 0.15 | 7.85 | 70.63 | 107.95 | 0.02 | 0.111 |
| 3 | 4 | 2021-02-01 | google_wide | search | 127823 | 147038 | 81.65 | 1196 | 24 | 1 | 28.59 | -64.98 | 0.81 | 0.07 | 3.4 | 81.65 | 28.59 | 0.02 | 0.042 |
| 4 | 5 | 2021-02-01 | youtube_blogger | influencer | 10934 | 225800 | 405.26 | 2258 | 49 | 10 | 1142.78 | 181.99 | 1.0 | 0.18 | 8.27 | 40.53 | 114.28 | 0.022 | 0.204 |
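
As a worked check of the formulas, the KPIs of the first row (`facebook_tier1` on 2021-02-01) can be recomputed from its raw columns alone. The sketch below rebuilds that single row by hand; the input values are copied from the table above.

```python
import pandas as pd

# First row of the cleaned data (spend and revenue in USD)
row = pd.DataFrame({
    "impressions": [148263], "marketing_spent": [98.84], "clicks": [1210],
    "leads": [13], "orders": [1], "revenue": [67.37],
})

spend = row["marketing_spent"]
kpis = pd.DataFrame({
    "ROMI": ((row["revenue"] - spend) / spend * 100).round(2),
    "CTR": (row["clicks"] / row["impressions"] * 100).round(2),
    "CPC": (spend / row["clicks"]).round(2),
    "CPL": (spend / row["leads"]).round(2),
    "CAC": (spend / row["orders"]).round(2),
    "AOV": (row["revenue"] / row["orders"]).round(2),
    "CONV1": (row["leads"] / row["clicks"]).round(3),
    "CONV2": (row["orders"] / row["leads"]).round(3),
})

print(kpis.iloc[0].to_dict())
```

The negative ROMI reflects that this campaign day earned less revenue (67.37) than it cost (98.84).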


In [119]:
# If a campaign generated no clicks, leads, or orders, some KPIs divide by zero
# and come out as inf or NaN. We fill these with zero.
df = df.replace([np.nan, np.inf], 0)
# Export cleaned data as csv
df.to_csv('cleaned_marketing.csv')
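
Two details of this last step may be worth a closer look. First, `replace([np.nan, np.inf], 0)` does not cover `-np.inf`; that value should not occur here because spend and counts are non-negative, but a check with `np.isfinite` covers all three cases. Second, `to_csv` writes the row index as an extra unnamed column unless `index=False` is passed. The sketch below illustrates both on a made-up `demo` frame. Filling with zero follows the notebook's choice, with the caveat that a CAC of 0 then means "no orders" rather than "free".

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "marketing_spent": [98.84, 3.04, 0.0],
    "orders": [1, 0, 0],
})

# Plain division gives inf for 3.04 / 0 and NaN for 0 / 0
cac = demo["marketing_spent"] / demo["orders"]

# Keep finite values, replace NaN, inf and -inf with 0
demo["CAC"] = cac.where(np.isfinite(cac), 0).round(2)
print(demo["CAC"].tolist())

# index=False keeps the row index out of the exported file
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
print(buffer.getvalue().splitlines()[0])
```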