# Analyzing and cleaning
1.) ad_id: a unique ID for each ad.

2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.

3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.

4.) age: age of the person to whom the ad is shown.

5.) gender: gender of the person to whom the ad is shown.

6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).

7.) Impressions: the number of times the ad was shown.

8.) Clicks: the number of clicks on that ad.

9.) Spent: the amount paid by XYZ company to Facebook to show that ad.

10.) Total_Conversion: the total number of people who enquired about the product after seeing the ad.

11.) Approved_Conversion: the total number of people who bought the product after seeing the ad.

Questions:

1.) How can the social ad campaigns be optimized for the highest possible conversion rate? (Attain the best reach-to-conversion and click-to-conversion ratios.)

2.) Finding the best target demographics with the appropriate click-through rates.

3.) Understanding the ideal turnaround/decision-making time per age group to convert and retarget future social campaigns.

4.) Comparing individual campaign performance so the best creative/campaign can be run again with adjusted audiences.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
import plotly.express as px

In [None]:
df = pd.read_csv('../input/clicks-conversion-tracking/KAG_conversion_data.csv').set_index('ad_id')

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
# age has 4 distinct groups, so 4 clusters could be made
df.age.unique()

In [None]:
df.gender.unique()

In [None]:
df.Total_Conversion.unique()

In [None]:
df.interest.unique()

# 1.) How to optimize the social ad campaigns for the highest conversion rate possible. (Attain best Reach to Conversion ratios/Click to Conversion ratios)
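
The two ratios named in this question can be computed directly from the columns described at the top. A minimal sketch, assuming the column names `Impressions`, `Clicks` and `Total_Conversion`; the toy rows, the helper `add_conversion_ratios`, and the new column names `conv_per_1k_impr` and `conv_per_click` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

def add_conversion_ratios(frame):
    """Return a copy with two hypothetical ratio columns added."""
    out = frame.copy()
    # Conversions per 1,000 impressions (reach-to-conversion).
    out['conv_per_1k_impr'] = (
        out['Total_Conversion'] / out['Impressions'].replace(0, np.nan) * 1000
    )
    # Conversions per click; rows with zero clicks become NaN instead of inf.
    out['conv_per_click'] = out['Total_Conversion'] / out['Clicks'].replace(0, np.nan)
    return out

# Toy rows (made-up numbers, not taken from the dataset).
toy = pd.DataFrame({
    'Impressions': [1000, 4000, 500],
    'Clicks': [10, 0, 5],
    'Total_Conversion': [2, 1, 1],
})
ratios = add_conversion_ratios(toy)
print(ratios)
```

Replacing zeros with `NaN` before dividing keeps ads with no clicks from producing infinite ratios that would distort later averages.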

In [None]:
df.head()

In [None]:
px.bar(df,x='age',y='Spent',color='gender')

In [None]:
# Total conversions do not increase just because more is paid to Facebook
px.bar(df,x='gender',y='Spent',color='age')

In [None]:
# Most of the people who enquired about the product are in the 30-34 age group
px.bar(df,x='age',y='Total_Conversion')

In [None]:
# Most of the clicks on the ad come from the 45-49 age group
px.bar(df,x='age',y='Clicks',color='gender')

In [None]:
px.scatter(x=df['interest'],y=df['Total_Conversion'],color=df['age'])

In [None]:
px.scatter(x=df['Impressions'],y=df['Total_Conversion'],color=df['gender'])

In [None]:
df2 = df.groupby('fb_campaign_id')[['interest','Impressions','Spent','Total_Conversion']].mean()

In [None]:
df2.head()

In [None]:
#feature scaling df2
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

In [None]:
sc_features = sc.fit_transform(df2.values)

In [None]:
# Keep the original column names and the fb_campaign_id index
df3 = pd.DataFrame(sc_features, columns=df2.columns, index=df2.index)

In [None]:
df3.head()

In [None]:
sc_features

In [None]:
#Using KMeans Clustering
from sklearn.cluster import KMeans
import seaborn as sns

In [None]:
# Fix random_state so the cluster labels are reproducible between runs
model = KMeans(n_clusters=2, n_init=10, random_state=42)
df3['cluster'] = model.fit_predict(sc_features)

In [None]:
sns.pairplot(df3,hue='cluster')
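
The notebook does not say why two clusters were chosen. One common check is the silhouette score, where a higher value means tighter, better separated clusters. The sketch below runs on synthetic blobs rather than the campaign data, and the helper `best_k_by_silhouette` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_values, seed=0):
    """Fit KMeans for each k and return (best_k, {k: silhouette score})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

# Two well-separated synthetic blobs stand in for the scaled features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(30, 2)),
    rng.normal(5.0, 0.3, size=(30, 2)),
])
best_k, scores = best_k_by_silhouette(X, range(2, 6))
print(best_k, scores)
```

On the notebook's data the equivalent call would be `best_k_by_silhouette(sc_features, range(2, 8))`; the upper bound of 8 is an arbitrary choice.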

In [None]:
# Getting the campaigns that fall in cluster 1
data = df3[df3.cluster == 1]

In [None]:
# These campaigns are the candidates to optimize for a higher conversion rate
data
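
Standardized values are hard to interpret when deciding what to change in a campaign. A sketch of profiling clusters in the original units instead, assuming cluster labels are attached to unscaled per-campaign means; the toy frame, its labels, and the column `spent_per_conversion` are made up for illustration.

```python
import pandas as pd

# Toy per-campaign means with made-up cluster labels (illustrative only).
campaigns = pd.DataFrame({
    'Impressions': [1000.0, 1200.0, 90000.0, 110000.0],
    'Spent': [1.5, 2.5, 150.0, 170.0],
    'Total_Conversion': [1.0, 1.0, 4.0, 6.0],
    'cluster': [0, 0, 1, 1],
})

# Average of each feature per cluster, in the original units.
profile = campaigns.groupby('cluster').mean()
# Ratio of the cluster means: average spend divided by average conversions.
profile['spent_per_conversion'] = profile['Spent'] / profile['Total_Conversion']
print(profile)
```

In the notebook the same profile could be built with `df2.assign(cluster=model.labels_).groupby('cluster').mean()`, since `df2` still holds the unscaled per-campaign means at this point.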

# Thus the features of the campaigns in `data` can guide changes aimed at a higher conversion rate
# ---------------------------------------------------------------------------------------------------------------

# 2.)Finding the perfect target demographics with the appropriate clickthrough rates.

Click-through rate (CTR) = the percentage of ad impressions that lead to a click on the ad.
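
In terms of this dataset's columns, the rate for a single ad can be written as follows. This restates the definition; the numbers in the example are made up.

$$
\text{CTR} = \frac{\text{Clicks}}{\text{Impressions}} \times 100\%
$$

For example, 5 clicks on 2,000 impressions gives $\frac{5}{2000} \times 100\% = 0.25\%$.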

In [None]:
df.describe()
#target demographics can be age, person's interest, gender

In [None]:
gen = pd.get_dummies(df.gender)

In [None]:
#calculating click-through rate
df['click_through'] = (df['Clicks'] / df['Impressions'])*100

In [None]:
df2 = pd.concat([df,gen],axis=1)

In [None]:
def age_mean(x):
    # Midpoint of an age bracket such as '30-34'
    low, high = x.split('-')
    return (int(low) + int(high)) / 2

In [None]:
df2['age_mean'] = df2.age.map(age_mean)

In [None]:
df2.head()

In [None]:
plt.figure(figsize=(12,8))
sns.boxplot(x=df2.age,y=df2.click_through,hue=df2.gender)

In [None]:
# Females aged 45-49 have higher click-through rates.
plt.figure(figsize=(12,8))
sns.boxplot(x=df2.gender,y=df2.click_through,hue=df2.age)

In [None]:
plt.figure(figsize=(20,8))
sns.set(style='whitegrid')
sns.stripplot(data=df,x='interest',y='click_through')

In [None]:
plt.figure(figsize=(20,8))
sns.set(style='whitegrid')
sns.stripplot(data=df,x='gender',y='click_through')
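
The plots above use per-ad rates, where an ad with few impressions counts as much as one with many. A sketch of an impression-weighted rate per demographic group, assuming the `age`, `gender`, `Clicks` and `Impressions` columns; the toy rows and the helper name `group_ctr` are made up.

```python
import pandas as pd

def group_ctr(frame, keys):
    """Impression-weighted CTR (%) per group: total clicks / total impressions."""
    totals = frame.groupby(keys)[['Clicks', 'Impressions']].sum()
    totals['ctr_pct'] = totals['Clicks'] / totals['Impressions'] * 100
    # Highest click-through rate first.
    return totals.sort_values('ctr_pct', ascending=False)

# Toy rows (made-up numbers, not taken from the dataset).
toy = pd.DataFrame({
    'age': ['30-34', '30-34', '45-49', '45-49'],
    'gender': ['M', 'F', 'M', 'F'],
    'Clicks': [2, 3, 4, 10],
    'Impressions': [10000, 10000, 10000, 20000],
})
by_group = group_ctr(toy, ['age', 'gender'])
print(by_group)
```

On the notebook's data the call would be `group_ctr(df, ['age', 'gender'])`; passing `['interest']` instead would rank interest codes the same way.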

# Hence the target group with the higher click-through rates is females in the 45-49 age group.
# ---------------------------------------------------------------------------------------------------------------

# 3.) Understanding the ideal turnaround/decision-making time per age group to convert and retarget future social campaigns

In [None]:
df.head()

In [None]:
sns.catplot(data=df, kind="bar",x="age", y="Total_Conversion", hue="gender",palette="dark", alpha=.6, height=6)

In [None]:
sns.catplot(data=df, kind="bar",x="age", y="Approved_Conversion", hue="gender",palette="dark", alpha=.6, height=6)

In [None]:
sns.catplot(data=df, kind="bar",x="age", y="click_through",hue="gender",palette="dark", alpha=.6, height=6)
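
The bar charts above show mean counts per ad, which does not directly give the conversion ratios per age group. A sketch of a click, enquiry and purchase funnel per age group, assuming the columns `Impressions`, `Clicks`, `Total_Conversion` and `Approved_Conversion`; the toy rows, the helper `funnel_by_age`, and the ratio column names are made up.

```python
import pandas as pd

def funnel_by_age(frame):
    """Totals per age group plus three funnel ratios."""
    cols = ['Impressions', 'Clicks', 'Total_Conversion', 'Approved_Conversion']
    totals = frame.groupby('age')[cols].sum()
    # Share of impressions that were clicked, in percent.
    totals['ctr_pct'] = totals['Clicks'] / totals['Impressions'] * 100
    # Enquiries per click (inf if a group has enquiries but no clicks).
    totals['enquiries_per_click'] = totals['Total_Conversion'] / totals['Clicks']
    # Share of enquiries that ended in a purchase.
    totals['approval_rate'] = totals['Approved_Conversion'] / totals['Total_Conversion']
    return totals

# Toy rows (made-up numbers, not taken from the dataset).
toy = pd.DataFrame({
    'age': ['30-34', '30-34', '45-49', '45-49'],
    'Impressions': [10000, 10000, 10000, 10000],
    'Clicks': [2, 2, 10, 10],
    'Total_Conversion': [4, 4, 2, 3],
    'Approved_Conversion': [2, 2, 1, 0],
})
funnel = funnel_by_age(toy)
print(funnel[['ctr_pct', 'enquiries_per_click', 'approval_rate']])
```

Note that the dataset has no timestamps, so these ratios describe how far each age group moves through the funnel, not how long the decision takes.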

# Results:
### 1.) Age group 30-34: This group has a lower click-through rate but a higher rate of enquiring about and buying the product. This suggests they are willing to take a risk: those who click on the ad usually go on to buy the product.
### 2.) Age group 45-49: This group clicks on the ad more often but enquires about or buys the product less frequently.
# Thus the company can target the 30-34 age group
# ---------------------------------------------------------------------------------------------------------------

# 4.)Comparing the individual campaign performance so the best creative/campaign can be run again with adjusted audiences.

In [None]:
df.head()

In [None]:
df.xyz_campaign_id.value_counts()

In [None]:
company_1 = df[df.xyz_campaign_id == 1178]
company_2 = df[df.xyz_campaign_id == 936]
company_3 = df[df.xyz_campaign_id == 916]

In [None]:
# Using click-through rates to measure each campaign's performance
px.scatter(x=company_1.Spent,y=company_1.click_through,color=company_1.gender)

In [None]:
px.scatter(x=company_2.Spent,y=company_2.click_through,color=company_2.gender)

In [None]:
px.scatter(x=company_3.Spent,y=company_3.click_through,color=company_3.gender)
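
Three separate scatter plots make a side-by-side comparison difficult. A sketch of one summary row per campaign, assuming the columns `xyz_campaign_id`, `Impressions`, `Clicks`, `Spent` and `Approved_Conversion`; the helper `campaign_summary`, the derived column names, and the toy numbers are made up.

```python
import pandas as pd

def campaign_summary(frame):
    """One row per xyz_campaign_id with totals and three derived metrics."""
    cols = ['Impressions', 'Clicks', 'Spent', 'Approved_Conversion']
    s = frame.groupby('xyz_campaign_id')[cols].sum()
    # Impression-weighted click-through rate, in percent.
    s['ctr_pct'] = s['Clicks'] / s['Impressions'] * 100
    # Average amount paid per click and per purchase.
    s['cost_per_click'] = s['Spent'] / s['Clicks']
    s['cost_per_approved'] = s['Spent'] / s['Approved_Conversion']
    return s

# Toy rows (made-up numbers; only the campaign IDs come from the notebook).
toy = pd.DataFrame({
    'xyz_campaign_id': [916, 916, 1178, 1178],
    'Impressions': [1000, 1000, 100000, 100000],
    'Clicks': [1, 1, 20, 20],
    'Spent': [1.0, 2.0, 30.0, 30.0],
    'Approved_Conversion': [1, 0, 2, 2],
})
summary = campaign_summary(toy)
print(summary)
```

Looking at cost per purchase alongside click-through rate guards against picking a campaign that attracts clicks but converts expensively.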

# The campaigns in `company_1` (xyz_campaign_id 1178) show the better click-through rates and can be run again!