# Ad Effectiveness: Analysis of Ad Interactivity and Location 

GitHub link: https://github.com/hjeffreywang/Ad-Effectiveness-Report

-----


# Work below

## **Scenario** 
We are given data from ad platforms such as Google and Facebook. How can these metrics be turned into actionable insights for the other teams? The analysis is laid out in presentation form to make it easy to follow.

## **Goal** 
ABCJewelry wishes to increase sales while also reducing ad costs. Our job is to find actionable insights to help with those decisions.


## **Key Tasks**  
1.   Data Acquisition and Exploration
2.   Feature Processing
3.   Data Analysis
4.   Data Engineering
5.   Visualization Development

## **Deliverables** 
1. How effective are our ads? (appearances, clicks, and clicks-per-appearance ratio)
2. Which ads are best? (campaign, site, platform)
3. How long before ads become ineffective? (number of appearances vs. length of time)
4. Where is our audience? (most campaigns, most appearances, most clicks)




---




## **Feature definitions**

*   **campaign_item_id** : unique id of each advertising campaign
*   **no_of_days** : number of days the campaign has been running
*   **time** : timestamp at which the data was captured
*   **ext_service_id** : id of each advertising platform used
*   **ext_service_name** : name of each advertising platform used
*   **creative_id** : id of the creative image used for the ad
*   **creative_height** : height of the creative image for the ad, in pixels
*   **creative_width** : width of the creative image for the ad, in pixels
*   **search_tags** : search tags used for displaying ads, i.e. a word or set of words a person enters when searching on Google or one of its Search Network sites
*   **template_id** : template used in the creative image
*   **landing_page** : landing page URL on which users clicked or browsed
*   **advertiser_id** : id of the advertiser
*   **advertiser_name** : name of the advertiser's location (city, country, state)
*   **network_id** : id of each agency
*   **advertiser_currency** : currency of the country in which the advertiser operates
*   **channel_id** : id of each channel used for placing ads
*   **channel_name** : name of the channel (display, search, social, mobile video)
*   **max_bid_cpm** : maximum bid value for optimizing CPM
*   **campaign_budget_usd** : overall budget of the campaign, i.e. the amount of money the campaign can spend
*   **impressions** : the number of times an advertisement is displayed on a website or social media platform
*   **clicks** : the number of times a user clicks an advertisement, leading them to the advertiser's website or landing page
*   **currency_code** : the currency code of the advertiser
*   **exchange_rate** : the relative price of one currency expressed in terms of another
*   **media_cost_usd** : the amount of money the campaign spent on that particular day
*   **position_in_content** : position where the ad was placed on the website page
*   **unique_reach** : the number of unique users who saw the post or page
*   **total_reach** : the number of people who saw any content from the page or about the page
*   **cmi_currency_code** : campaign currency code
*   **time_zone** : time zone in which the campaign is running
*   **weekday_cat** : weekday / weekend category
*   **keywords** : a word or set of words that Google Ads advertisers can add to a given ad group so that the ads target the right audience














### **Import Required Libraries**

In [None]:
# Import 
import numpy as np
import pandas as pd
import altair as alt
import pandas_profiling as pp


import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns

import warnings
warnings.filterwarnings('ignore')  # suppress warning messages in the notebook output


In [None]:
import plotly.graph_objs as go
import plotly.offline as pyo
import plotly.express as px



In [None]:
# display all columns of the dataframe
pd.options.display.max_columns = None

# print NumPy floats in fixed-point notation instead of scientific (exponential) notation
np.set_printoptions(suppress=True)

In [None]:
# set the plot size using 'rcParams'
# once the plot size is set using 'rcParams', it sets the size of all the forthcoming plots in the file
# pass width and height in inches to 'figure.figsize' 
plt.rcParams['figure.figsize'] = [15,8]

### **Load and Explore the Dataset**

In [None]:
# load
df=pd.read_csv("dataset.csv",low_memory=False)

# preview the first 5 rows
df.head(5)

In [None]:
# number of rows and columns
df.shape 

In [None]:
df.columns

In [None]:
df.info() 

### Cleaning Null Values 

In [None]:
# count the null values in each column, sorted from most to fewest
Total = df.isnull().sum().sort_values(ascending = False)          

# percentage of null values in each column
Percent = (df.isnull().sum()*100/df.isnull().count()).sort_values(ascending = False)   
missing_data = pd.concat([Total, Percent], axis = 1, keys = ['Total', 'Percentage of Missing Values'])    

# add the column containing data type of each variable
missing_data['Type'] = df[missing_data.index].dtypes
missing_data

In [None]:
# creative width
df['creative_width'] = df['creative_width'].fillna(0) 

# creative height
df['creative_height'] = df['creative_height'].fillna(0)

# template id
df['template_id'] = df['template_id'].fillna(-1)

# approved_budget
df['approved_budget'] = df['approved_budget'].fillna(0)

### **Drop unnecessary columns**

* Prune features that consist entirely of null values or that would actively harm the analysis.

In [None]:
df.drop(columns=['position_in_content','unique_reach','total_reach','max_bid_cpm'],inplace=True)

* **no_of_days** : campaigns run for at least a month; `no_of_days == 0` means the campaign has run for one day only.


In [None]:
df.describe()

* **ext_service_name** : most ads were Facebook Ads, since Facebook is the social channel most used by the target audience.
* **landing_page** : the boho jewelry page has the most-clicked ads.

In [None]:
# summary of categorical variables
df.describe(include=object)

# Note: If we pass 'include=object' to the .describe(), it will return descriptive statistics for categorical variables only

### **Creating a metric to measure Clicks per appearance**

In [None]:
df['ctr']=(df['clicks']/df['impressions'])*100
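If any row has zero impressions, the ratio above is undefined and pandas returns `NaN` or `inf`, which can distort later sorting and averaging. Whether the dataset contains such rows is not stated here, so the following is only a minimal sketch on made-up rows (the `toy` frame and `safe_impressions` name are illustrative assumptions) showing one way to guard the division:

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (values are illustrative only)
toy = pd.DataFrame({"impressions": [1000, 0, 250], "clicks": [12, 0, 5]})

# Replace zero impressions with NaN so the division yields NaN instead of inf
safe_impressions = toy["impressions"].replace(0, np.nan)

# Clicks per 100 appearances, undefined (NaN) where there were no impressions
toy["ctr"] = toy["clicks"] / safe_impressions * 100

print(toy)
```

Rows left as `NaN` are skipped by default in pandas aggregations such as `mean()`, which is usually the safer behaviour for a ratio metric.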

In [None]:
df.sort_values('ctr', ascending=False)[1000:1050]

In [None]:
df.sort_values('clicks', ascending=False)[1000:1050]

In [None]:
df['ext_service_name'].value_counts()

# **Visualization Implementations**
**The reasoning behind each chart is given in its description**


## **Histogram chart**
A histogram is used to illustrate the distribution of a dataset and displays which values are most frequent.

**Reasons**
1. To estimate how likely a continuous variable is to take a value in a given range.
2. To see whether the distribution is symmetric or skewed left or right.
3. To reveal outliers or gaps in the data.



---



The benchmark CTR (click-through rate) is 0.76% for Style & Fashion tags (Google), and a CTR of 2.71% is needed to be in the top 10% of the competition. The CTR distribution for ABCJewelry lies between 0.76% and 2.71%.
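One way to check a claim like this is to bucket CTR values against the two benchmark figures and count the share of rows in each band. The sketch below uses invented CTR values rather than the real dataset, and the band labels are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative CTR values in percent (not taken from the dataset)
ctr = pd.Series([0.05, 0.50, 0.76, 1.40, 2.71, 3.20], name="ctr")

# Bin edges from the benchmarks quoted above: 0.76% and 2.71%
bins = [-np.inf, 0.76, 2.71, np.inf]
labels = ["below benchmark", "benchmark to top 10%", "top 10%"]

# right=False makes each bin include its lower edge,
# so a CTR of exactly 0.76 counts as meeting the benchmark
bands = pd.cut(ctr, bins=bins, labels=labels, right=False)

# Share of rows in each band, in percent
share = bands.value_counts(normalize=True).reindex(labels) * 100
print(share.round(1))
```

Applied to the real `ctr` column, the same three bands would show how much of the distribution actually sits between the benchmarks.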

In [None]:
# Clicks per 100 Appearances Frequency Distribution
# set the xlabel and the fontsize
plt.xlabel("ctr", fontsize=15)

# set the ylabel and the fontsize
plt.ylabel("Frequency", fontsize=15)

# set the title of the plot
plt.title("Clicks per 100 Appearances Distribution", fontsize=15)

# plot the histogram for the target variable
plt.hist(df.loc[(df["ctr"]>=0.1) & (df["ctr"]<=0.76)]['ctr'])
plt.show()

In [None]:
# Filter the data and create a histogram
hist_data = df.loc[(df["ctr"]>0.1) & (df["ctr"]<=3.71)]['ctr']
fig = px.histogram(hist_data, nbins=20)

# Set the layout properties
fig.update_layout(
    title="Clicks per 100 Appearances Distribution",
    xaxis_title="ctr",
    yaxis_title="Frequency",
    font=dict(size=15),
    showlegend=False,
    width=750,   # set width to 750 pixels
    height=500,  # set height to 500 pixels
)

fig.show()

## **SCATTER PLOT**

#### Purpose:


1. Identify easily visible patterns and relationships
2. Detect outliers
3. Visualize trends over time


---


## **Conclusions**

1. As campaign length increases, impressions and clicks decrease.
2. Longer-running campaigns have constant, low impressions and clicks.
3. The graphs below suggest that most short campaigns were either newly created or paused because of poor performance.
4. Both graphs show outliers: sudden spikes in impressions and clicks that may be due to events such as festivals or social media popularity. Further analysis could identify the times of day, days of the week, seasons, festivals, and national or public holidays on which performance usually goes up.
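The first conclusion can also be expressed as a number: the slope of a straight-line fit of impressions against campaign age. The sketch below fits that line with `np.polyfit` on synthetic data; the values are invented so that the trend is obvious, and it is not a claim about the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration: impressions fall as the campaign ages
toy = pd.DataFrame({
    "no_of_days": [0, 10, 20, 30, 40],
    "impressions": [500, 450, 400, 350, 300],
})

# Fit impressions = slope * no_of_days + intercept by ordinary least squares
slope, intercept = np.polyfit(toy["no_of_days"], toy["impressions"], deg=1)

# A negative slope means impressions drop as no_of_days grows
print(f"slope: {slope:.2f} impressions per day, intercept: {intercept:.1f}")
```

On real data the slope is sensitive to the outlier spikes mentioned above, so it is worth comparing the fit with and without them.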




## Prettier Graph

In [None]:
import plotly.express as px

fig = px.scatter(df, x="no_of_days", y="impressions", trendline="ols",  # the "ols" trendline requires statsmodels
    width=750,   # set width to 750 pixels
    height=500,  # set height to 500 pixels
)
fig.show()

In [None]:

fig = px.scatter(df, x="no_of_days", y="clicks", trendline="ols",  # the "ols" trendline requires statsmodels
    width=750,   # set width to 750 pixels
    height=500,  # set height to 500 pixels
)
fig.show()

## **`Conclusions`**

1. Mobile campaigns get more clicks.
2. The performance of search campaigns needs to be improved.
3. Social campaigns reach larger audiences but do not convert that reach into clicks.
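A compact way to back up statements like these is to compare each channel's share of all impressions with its share of all clicks. The sketch below does this on invented numbers; the channel names and the derived column names (`impression_share`, `click_share`, `click_minus_impression`) are assumptions for illustration only.

```python
import pandas as pd

# Invented rows for illustration (not the real dataset)
toy = pd.DataFrame({
    "channel_name": ["Mobile", "Mobile", "Search", "Social", "Social"],
    "impressions": [1000, 1000, 2000, 3000, 3000],
    "clicks": [30, 30, 10, 15, 15],
})

# Total impressions and clicks per channel
totals = toy.groupby("channel_name")[["impressions", "clicks"]].sum()

# Percentage share of all impressions and of all clicks per channel
shares = totals / totals.sum() * 100
shares.columns = ["impression_share", "click_share"]

# Positive: the channel earns more clicks than its exposure alone would suggest
shares["click_minus_impression"] = shares["click_share"] - shares["impression_share"]
print(shares.round(1))
```

A channel with a large impression share but a negative difference is reaching people without getting them to click.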

In [None]:
# scatter plot : impressions vs clicks ( hue : channel_name )
sns.lmplot(x = "impressions", y = "clicks", data = df, fit_reg=False, hue='channel_name')


In [None]:
fig = px.scatter(df, x="impressions", y="clicks", color="channel_name",
                 hover_name="channel_name",
                 labels={"impressions": "Impressions", "clicks": "Clicks"},
                 title="Impressions vs Clicks by Channel Name")

fig.update_layout(
    font=dict(size=14),
    legend=dict(title=None),
    plot_bgcolor="white",
    margin=dict(l=80, r=20, t=60, b=80),
    width=750,   # set width to 750 pixels
    height=500,  # set height to 500 pixels
)


fig.show()

## **`Conclusions`**
 
1. The top four countries where campaigns are running are India, Oman, Qatar, and the UAE, so the company operates mainly in South Asia and the Middle East.
2. We can further compare each country's performance metrics with its allotted budget to gauge relative performance and make more informed decisions.
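As a rough sketch of that follow-up, clicks can be set against money spent per country. The example below uses `media_cost_usd` (daily spend) as a stand-in for budget, which is an assumption: `campaign_budget_usd` is a campaign-level figure and would need de-duplicating before it could be summed. The rows and the `clicks_per_usd` column are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration (not the real dataset)
toy = pd.DataFrame({
    "advertiser_name": ["India", "India", "Oman", "Qatar"],
    "clicks": [120, 80, 50, 30],
    "media_cost_usd": [40.0, 60.0, 20.0, 30.0],
})

# Total clicks and total spend per country
by_country = toy.groupby("advertiser_name")[["clicks", "media_cost_usd"]].sum()

# Clicks obtained per US dollar spent
by_country["clicks_per_usd"] = by_country["clicks"] / by_country["media_cost_usd"]

print(by_country.sort_values("clicks_per_usd", ascending=False))
```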

## **Making a more Readable Visualization**

Advertiser names are recorded as cities and states, so we group them by country (for example, all Indian states and cities become India).

In [None]:
#  calculate data
# no trailing commas here: they would wrap each result in a one-element tuple
labels = df['advertiser_name'].value_counts().index
values = df['advertiser_name'].value_counts()

In [None]:
df_abridged=df

In [None]:
df.head()

In [None]:
India = ['Andhra Pradesh', 'Karnataka', 'Pan India','North', 'Gujarat', 'Orissa', 'Tamil Nadu', 'Maharashtra', 'West Bengal', 'Madhya Pradesh', 'Coimbatore', 'Bangalore', 'Chennai', 'Punjab', 'Delhi', 'Haryana', 'UP','Pali', 'Vadodara']

Oman = ['Muscat','Sohar']

Qatar = ['Doha']

UAE = ['Dubai', 'Abu Dhabi']

Bahrain = ['Manama']

Kuwait = ['Kuwait City', 'Al Ahmadi']

KSA = ['Jeddah']

Malaysia = ['Kuala Lumpur']

Singapore = ['Singapore']

USA = ['New York']

Thailand = ['Bangkok']

Egypt = ['Cairo', 'Luxor',  'Almaza Bay']

Bangladesh = ['Chattogram', 'Chandpur']

Ethiopia = ['Addis Ababa']


In [None]:
cities_by_country = {'India': India,
'Oman': Oman,
'Qatar': Qatar,
'UAE': UAE,
'Bahrain': Bahrain,
'Kuwait': Kuwait,
'KSA': KSA,
'Malaysia': Malaysia,
'Singapore': Singapore,
'USA': USA,
'Thailand': Thailand,
'Egypt': Egypt,
'Bangladesh': Bangladesh,
'Ethiopia': Ethiopia}

In [None]:
for country,cities in cities_by_country.items():
    df.loc[df['advertiser_name'].isin(cities), "advertiser_name"] = country
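An alternative to looping over countries is to invert the dictionary into a city-to-country lookup and apply it in one pass with `Series.map`. This is a sketch on a small made-up subset of the mapping; `country_of`, `grouped`, and `unmapped` are illustrative names, and "Lagos" is a hypothetical value that the mapping does not cover.

```python
import pandas as pd

# Small subset of the mapping, for illustration only
cities_by_country = {"India": ["Delhi", "Chennai"], "UAE": ["Dubai", "Abu Dhabi"]}

# Invert to a flat city -> country lookup
country_of = {
    city: country
    for country, cities in cities_by_country.items()
    for city in cities
}

names = pd.Series(["Delhi", "Dubai", "Lagos", "Chennai"], name="advertiser_name")

# map() gives NaN for names missing from the lookup; fillna keeps the original value
grouped = names.map(country_of).fillna(names)

# Names the lookup does not cover, worth reviewing by hand
unmapped = sorted(set(names) - set(country_of))

print(grouped.tolist())
print(unmapped)
```

Listing the unmapped names makes it easy to spot places that were left out of the country lists.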
    

In [None]:
label = df['advertiser_name'].value_counts().index
name = df['advertiser_name'].value_counts()

In [None]:

# Create data for the Pie Chart
data = [go.Pie(labels=name.index,
               values=name,
               hole=0.4,
               textposition='inside',
               textinfo='label+percent',
               hoverinfo='label+percent+value')]

# Set layout for the Pie Chart
layout = go.Layout(title='Ad Campaigns running across the globe (Percentage)',
                   showlegend=False,
                   legend=dict(orientation="h"),
                   width=1000,
                   height=500,)

# Create figure object
fig = go.Figure(data=data, layout=layout)

# Display the figure
pyo.iplot(fig)

In [None]:
data_2=df.groupby('advertiser_name')['clicks'].sum().sort_values()

In [None]:

# Create data for the Pie Chart
data = [go.Pie(labels=data_2.index,
               values=data_2,
               hole=0.4,
               textposition='inside',
               textinfo='label+percent',
               hoverinfo='label+percent+value')]

# Set layout for the Pie Chart
layout = go.Layout(title='Clicks across the Globe (%)',
                   showlegend=False,
                    width=1000,
                   height=500,)

# Create figure object
fig = go.Figure(data=data, layout=layout)

# Display the figure
pyo.iplot(fig)

In [None]:
data_3=df.groupby('advertiser_name')['impressions'].sum().sort_values()

In [None]:

# Create data for the Pie Chart
data = [go.Pie(labels=data_3.index,
               values=data_3,
               hole=0.4,
               textposition='inside',
               textinfo='label+percent',
               hoverinfo='label+percent+value')]

# Set layout for the Pie Chart
layout = go.Layout(title='Ad Appearances across the Globe (%)',
                   showlegend=False,
                    width=1000,
                   height=500,)

# Create figure object
fig = go.Figure(data=data, layout=layout)

# Display the figure
pyo.iplot(fig)

In [None]:
data_4=df.groupby('advertiser_name')['ctr'].mean().sort_values()

In [None]:
data_4
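Note that averaging row-level `ctr` values weights every row equally, so a row with very few impressions counts as much as one with thousands. A pooled CTR (total clicks over total impressions) is a common alternative. The sketch below contrasts the two on invented numbers; `mean_of_rows` and `pooled` are illustrative names.

```python
import pandas as pd

# Invented rows for illustration (not the real dataset)
toy = pd.DataFrame({
    "advertiser_name": ["India", "India", "Oman"],
    "impressions": [100, 10000, 2000],
    "clicks": [5, 100, 40],
})
toy["ctr"] = toy["clicks"] / toy["impressions"] * 100

# Unweighted: average of the row-level CTRs, every row counts equally
mean_of_rows = toy.groupby("advertiser_name")["ctr"].mean()

# Pooled: total clicks over total impressions, so large rows weigh more
sums = toy.groupby("advertiser_name")[["clicks", "impressions"]].sum()
pooled = sums["clicks"] / sums["impressions"] * 100

comparison = pd.DataFrame({"mean_of_rows": mean_of_rows, "pooled": pooled})
print(comparison.round(2))
```

When the two columns differ a lot for a country, a few low-volume rows are probably driving the unweighted mean.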