In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df=pd.read_csv("/kaggle/input/clicks-conversion-tracking/KAG_conversion_data.csv")

In [None]:
df.head()



The documentation describes the columns in the data as follows:

1.) ad_id: unique ID for each ad.

2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.

3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.

4.) age: age of the person to whom the ad is shown.

5.) gender: gender of the person to whom the ad is shown.

6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).

7.) Impressions: the number of times the ad was shown.

8.) Clicks: number of clicks on that ad.

9.) Spent: Amount paid by company XYZ to Facebook to show that ad.

10.) Total conversion: Total number of people who enquired about the product after seeing the ad.

11.) Approved conversion: Total number of people who bought the product after seeing the ad.

In [None]:
import seaborn as sns
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df.info()

Of the 11 columns, eight are integers (int64), one (Spent) is a float (float64), and the remaining two (age and gender) are object (string) columns.
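The column types can also be tallied directly instead of being read off `df.info()`. A minimal sketch, using a hypothetical one-row frame `toy` that copies the dataset's column names but holds made-up values (on the real data, use `df` instead):

```python
import pandas as pd

# Hypothetical one-row frame: real column names, made-up values
toy = pd.DataFrame({
    "ad_id": [1],
    "xyz_campaign_id": [1000],
    "fb_campaign_id": [100001],
    "age": ["30-34"],
    "gender": ["M"],
    "interest": [15],
    "Impressions": [5000],
    "Clicks": [2],
    "Spent": [1.5],
    "Total_Conversion": [1],
    "Approved_Conversion": [0],
})

# Tally columns by dtype name (e.g. int64, float64, object)
dtype_counts = toy.dtypes.astype(str).value_counts()
print(dtype_counts)
```

String columns may be reported as `object` or, in newer pandas versions, as a dedicated string dtype.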

In [None]:
df.shape

1143 -> rows

11 -> columns(features)

In [None]:
df.describe()



Let's break this down (each statistic is computed per numeric column, and every row is one ad):

count -> number of non-null values (here, the number of ads)

mean -> mean of the column

std -> standard deviation of the column

min -> minimum value in the column

25% (first quartile) -> for Clicks this is 1, so at least a quarter of the ads received one click or fewer

50% (median) -> at least half of the ads received 8 clicks or fewer

75% (third quartile) -> at least three quarters of the ads received about 38 clicks or fewer

max -> maximum value in the column
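The quartile rows are easy to misread as "at least" counts. A minimal sketch on a hypothetical Series of eight made-up click counts shows what the 25%, 50% and 75% rows of `describe()` mean: the share of values at or below each quartile is at least 25%, 50% and 75% respectively (pandas interpolates linearly between data points by default).

```python
import pandas as pd

# Hypothetical click counts for eight ads (made up, not the real data)
clicks = pd.Series([0, 1, 1, 3, 8, 12, 40, 95])

# Same values describe() reports in its 25%, 50% and 75% rows
q1, median, q3 = clicks.quantile([0.25, 0.5, 0.75])

# Fraction of ads at or below each quartile
share_q1 = (clicks <= q1).mean()
share_median = (clicks <= median).mean()
share_q3 = (clicks <= q3).mean()
print(q1, share_q1)          # 1.0 0.375
print(median, share_median)  # 5.5 0.5
print(q3, share_q3)          # 19.0 0.75
```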

***Let's briefly consider some important notes:***

1. For Clicks,
    * Maximum - 421
    * Minimum - 0
2. Amount spent by the company to show Facebook ads,
    * Maximum - 639
    * Minimum - 0
3. At most 60 people enquired about the product after seeing a single ad.
4. At most 21 people bought the product after seeing a single ad.
5. The average Total_Conversion is almost 3 enquiries per ad.
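Conversion can be measured at several stages of the funnel. As an assumption (the notebook does not define these), here are three common ratios built from the dataset's columns: click-through rate (Clicks / Impressions), enquiries per click (Total_Conversion / Clicks) and purchases per enquiry (Approved_Conversion / Total_Conversion). The sketch uses a hypothetical three-ad frame `ads` with made-up values:

```python
import pandas as pd

# Hypothetical ads: real column names, made-up values
ads = pd.DataFrame({
    "Impressions": [10000, 20000, 50000],
    "Clicks": [10, 30, 60],
    "Total_Conversion": [2, 3, 5],
    "Approved_Conversion": [1, 1, 2],
})

# Pool the counts first so ads with zero clicks do not cause division by zero
totals = ads.sum()
ctr = totals["Clicks"] / totals["Impressions"]                              # clicks per impression
enquiry_rate = totals["Total_Conversion"] / totals["Clicks"]                # enquiries per click
purchase_rate = totals["Approved_Conversion"] / totals["Total_Conversion"]  # purchases per enquiry

print(f"CTR {ctr:.3%}, enquiries/click {enquiry_rate:.1%}, purchases/enquiry {purchase_rate:.1%}")
```

Pooling the totals gives every click equal weight; averaging per-ad ratios instead would give every ad equal weight and fail on ads with zero clicks or enquiries.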



In [None]:
print("We have customers from age groups as follows:")
print(df['age'].unique())

# Clicks vs Gender and Age

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, col="gender", hue="age")
g.map(plt.scatter, "Impressions", "Clicks", alpha=.4)
g.add_legend();

The oldest age group (45-49) accounts for the most clicks. Notably, women contribute considerably more clicks than men.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, col="age", hue="gender")
g.map(plt.scatter, "Impressions", "Clicks", alpha=.4)
g.add_legend();

For both men and women, the 45-49 and 30-34 age groups click on the most ads, while the remaining age groups stay below 300 clicks per ad.
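The scatter plots compare groups visually; a table of click totals makes the same comparison exact. A minimal sketch with `pivot_table`, using a hypothetical frame `demo` with the dataset's age and gender labels but made-up values (on the real data, pass `df`):

```python
import pandas as pd

# Hypothetical rows using the dataset's age and gender labels; values are made up
demo = pd.DataFrame({
    "age": ["30-34", "30-34", "45-49", "45-49", "45-49", "35-39"],
    "gender": ["M", "F", "F", "F", "M", "M"],
    "Clicks": [10, 15, 120, 80, 40, 5],
})

# Total clicks per age group (rows) and gender (columns); missing combinations become 0
clicks_table = demo.pivot_table(index="age", columns="gender", values="Clicks",
                                aggfunc="sum", fill_value=0)
print(clicks_table)
```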

# Who actually enquired about the product

In [None]:
# Let's see how many people actually enquired about the product
g = sns.FacetGrid(df, col="gender", hue="age")
g.map(plt.scatter, "Clicks", "Total_Conversion", alpha=.4)
g.add_legend();

Even though women tend to click more, men, especially in the 30-34 age group, tend to enquire more about the products after seeing an ad.

In [None]:
g = sns.FacetGrid(df, col="age", hue="gender")
g.map(plt.scatter, "Clicks", "Total_Conversion", alpha=.4)
g.add_legend();

The highest number of people who enquired about the product after seeing the ad comes from the 30-34 age group, for both men and women.

# Who actually bought the product

In [None]:
g = sns.FacetGrid(df, col="gender", hue="age")
g.map(plt.scatter, "Total_Conversion", "Approved_Conversion", alpha=.4)
g.add_legend();

Here, we compare Total_Conversion with Approved_Conversion to see how many people go from enquiring about the product after seeing an ad (Total_Conversion) to actually buying it (Approved_Conversion).

It turns out that men buy more products than women.
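"Men buy more" can mean more purchases in total or a larger share of enquiries turning into purchases. A minimal sketch that computes both per gender, on a hypothetical frame `demo` with made-up values:

```python
import pandas as pd

# Hypothetical rows; values are made up
demo = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F"],
    "Total_Conversion": [4, 6, 3, 5, 2],
    "Approved_Conversion": [2, 3, 1, 1, 0],
})

# Total enquiries and purchases per gender
by_gender = demo.groupby("gender")[["Total_Conversion", "Approved_Conversion"]].sum()
# Share of enquiries that turned into purchases
by_gender["purchase_per_enquiry"] = by_gender["Approved_Conversion"] / by_gender["Total_Conversion"]
print(by_gender)
```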

In [None]:
g = sns.FacetGrid(df, col="age", hue="gender")
g.map(plt.scatter, "Total_Conversion", "Approved_Conversion", alpha=.4)
g.add_legend();

For both men and women, the 30-34 age group shows the most purchases after enquiring about the product.

# What have we analysed so far?

*With respect to Gender*
1. Women click on the ads more than men
2. Men tend to enquire about the product more than women
3. More men than women go on to buy the product after enquiring

*With respect to Age*
1. The 35-39 age group showed the least participation in clicking ads, enquiring about, or buying the products
2. The 45-49 age group showed the highest click activity
3. The 30-34 age group enquired about and bought the product more than any other age group
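The age-group findings above can be checked with one grouped table per metric. A minimal sketch on a hypothetical frame `demo` (made-up values) that reports both totals and per-ad averages, since age groups with more ads will naturally collect larger totals:

```python
import pandas as pd

# Hypothetical rows with the dataset's age labels; values are made up
demo = pd.DataFrame({
    "age": ["30-34", "30-34", "35-39", "40-44", "45-49", "45-49"],
    "Clicks": [20, 30, 5, 15, 60, 70],
    "Total_Conversion": [6, 4, 1, 2, 3, 2],
    "Approved_Conversion": [3, 2, 0, 1, 1, 0],
})
metrics = ["Clicks", "Total_Conversion", "Approved_Conversion"]

# Totals per age group; these grow with the number of ads in each group
totals = demo.groupby("age")[metrics].sum()
# Per-ad averages, which adjust for groups of different size
per_ad = demo.groupby("age")[metrics].mean()

print(totals.idxmax())   # group with the highest total for each metric
print(per_ad.idxmax())   # group with the highest per-ad average for each metric
```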

In [None]:
#now let's dive deeper
plt.figure(figsize=(8,6))
sns.scatterplot(x = 'Impressions' ,y='Clicks', hue='age', data=df)

The plot shows a roughly linear relationship: as the number of times an ad is shown increases, the number of clicks increases too.
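The "linear" reading can be quantified with a Pearson correlation and a least-squares line. A minimal sketch using `Series.corr` and `np.polyfit` on a hypothetical frame `demo` with made-up, roughly proportional values (on the real data, use `df`):

```python
import numpy as np
import pandas as pd

# Hypothetical impressions/clicks pairs (made up, roughly proportional)
demo = pd.DataFrame({
    "Impressions": [1000, 5000, 20000, 80000, 300000],
    "Clicks": [0, 1, 4, 15, 58],
})

# Pearson correlation: values close to 1 indicate a strong linear relationship
r = demo["Impressions"].corr(demo["Clicks"])

# Least-squares line: Clicks ≈ slope * Impressions + intercept
slope, intercept = np.polyfit(demo["Impressions"], demo["Clicks"], deg=1)
print(f"r = {r:.3f}, slope = {slope:.6f}, intercept = {intercept:.2f}")
```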

In [None]:
df['interest'].unique()

There are 40 interest categories in total.
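`unique()` lists the codes; `nunique()` returns their count directly, and `value_counts()` shows how many ads target each code. A minimal sketch on a hypothetical frame `demo` with made-up codes (on the real data, `df['interest'].nunique()` should return the 40 counted above):

```python
import pandas as pd

# Hypothetical interest codes (made up)
demo = pd.DataFrame({"interest": [15, 16, 10, 15, 28, 16, 15]})

n_categories = demo["interest"].nunique()               # number of distinct interest codes
most_common = demo["interest"].value_counts().head(3)   # codes with the most ads
print(n_categories)
print(most_common)
```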

# Sales vs Interests

In [None]:
g = sns.FacetGrid(df, col="age", hue="gender")
g.map(plt.scatter, "interest", "Approved_Conversion", alpha=.4)
g.add_legend();

Sales are higher for interest codes between 0 and 26 (for both genders), especially in the 30-34 age group, where men bought the most products.
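To put a number on "higher sales for interest codes 0-26", sales can be summed per interest code and the share from that range computed. A minimal sketch on a hypothetical frame `demo` with made-up values; the cut-off of 26 is taken from the text above:

```python
import pandas as pd

# Hypothetical rows; values are made up
demo = pd.DataFrame({
    "interest": [10, 16, 16, 27, 29, 64, 101],
    "Approved_Conversion": [3, 2, 4, 5, 1, 0, 2],
})

# Total sales per interest code, largest first
sales_by_interest = (demo.groupby("interest")["Approved_Conversion"]
                         .sum()
                         .sort_values(ascending=False))
print(sales_by_interest)

# Share of all sales coming from interest codes up to 26
share_low = sales_by_interest[sales_by_interest.index <= 26].sum() / sales_by_interest.sum()
print(f"{share_low:.0%}")
```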

# Money spent on ads

In [None]:
plt.figure(figsize=(15,8))
sns.swarmplot(x = 'interest' ,y='Spent', data=df, alpha = .6)

The highest amounts the company spent to display ads fall in the following interest categories:

10, 15, 16, 27, 28, 29 and 63
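The swarm plot shows individual ads; the categories with the largest spend can also be listed programmatically. A minimal sketch on a hypothetical frame `demo` with made-up values, giving both the largest single-ad spend and the total spend per interest code:

```python
import pandas as pd

# Hypothetical rows; values are made up
demo = pd.DataFrame({
    "interest": [10, 10, 15, 16, 27, 63, 100],
    "Spent": [120.5, 310.0, 45.2, 600.1, 80.0, 520.0, 12.3],
})

# Largest single-ad spend within each interest category, top three
max_spent = demo.groupby("interest")["Spent"].max().nlargest(3)
# Total spend per interest category
total_spent = demo.groupby("interest")["Spent"].sum()
print(max_spent)
print(total_spent)
```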

# Campaign vs Gender

In [None]:
# FacetGrid creates its own figure, so size each panel with height instead of plt.figure
g = sns.FacetGrid(df, col="gender", hue="age", height=4)
g.map(plt.scatter, "fb_campaign_id", "Clicks", alpha=.4)
g.add_legend();

In [None]:
# FacetGrid creates its own figure, so size each panel with height instead of plt.figure
g = sns.FacetGrid(df, col="gender", hue="age", height=4)
g.map(plt.scatter, "xyz_campaign_id", "Clicks", alpha=.4)
g.add_legend();

Whether the ads are grouped by Facebook campaign ID or by XYZ campaign ID, women account for more clicks than men.

# Who did better?


In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'fb_campaign_id' ,y='Approved_Conversion', hue='gender', data=df)

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'xyz_campaign_id' ,y='Approved_Conversion', hue='gender', data=df)

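The two graphs above show the number of sales (Approved_Conversion) per campaign, first by Facebook campaign ID and then by XYZ campaign ID. Every row carries both IDs, so these are the same ads grouped two ways: the plots show which campaigns produced sales rather than comparing Facebook against other companies.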
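Since every row carries both an xyz_campaign_id and an fb_campaign_id, it is worth checking how the two IDs relate before comparing plots built on them. A minimal sketch on a hypothetical frame `demo` with made-up IDs: it tests whether each Facebook campaign ID sits inside exactly one XYZ campaign, and counts the Facebook IDs under each XYZ campaign:

```python
import pandas as pd

# Hypothetical rows with made-up campaign IDs
demo = pd.DataFrame({
    "xyz_campaign_id": [1000, 1000, 1000, 2000, 2000, 2000],
    "fb_campaign_id": [501, 501, 502, 601, 602, 603],
})

# Number of distinct XYZ campaigns each Facebook campaign ID appears in
xyz_per_fb = demo.groupby("fb_campaign_id")["xyz_campaign_id"].nunique()
nested = bool((xyz_per_fb == 1).all())   # True if every Facebook ID belongs to one XYZ campaign

# Number of distinct Facebook campaign IDs under each XYZ campaign
fb_per_xyz = demo.groupby("xyz_campaign_id")["fb_campaign_id"].nunique()
print(nested)
print(fb_per_xyz)
```

If `nested` is True on the real data, the two IDs are two levels of the same hierarchy, and any sum over one grouping equals the corresponding sum over the other.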

# Who got more clicks?

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'fb_campaign_id' ,y='Clicks', hue='gender', data=df)

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x='fb_campaign_id', y='Clicks', hue='age', data=df)

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'xyz_campaign_id' ,y='Clicks', hue='gender', data=df)

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x='xyz_campaign_id', y='Clicks', hue='age', data=df)

Whether the ads are grouped by Facebook or by XYZ campaign ID, the most clicks come from women and from the 45-49 age group.

# How often were the ads shown?

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'fb_campaign_id' ,y='Impressions', data=df)

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'xyz_campaign_id' ,y='Impressions', data=df)

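Because every row carries both IDs, these two plots show the same ads grouped by Facebook campaign ID and by XYZ campaign ID, so neither set of ads was shown more often. Impressions do help explain clicks and sales, though: ads that were shown more often collected more clicks, as the Impressions vs Clicks plot showed earlier.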

# Who spent more on ads?

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'xyz_campaign_id' ,y='Spent', data=df)

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'fb_campaign_id' ,y='Spent', data=df)

XYZ pays Facebook to show each ad, and spending tends to rise with the number of times an ad is shown. Since both plots show the same ads, grouped by XYZ campaign ID and by Facebook campaign ID, the highest spend on a single ad is the same in both, about 639.
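The scatter plots show individual ads; to compare campaigns by their totals, the ads can be aggregated per campaign with named aggregation. A minimal sketch on a hypothetical frame `demo` with made-up IDs and values (on the real data, group `df` by `xyz_campaign_id`):

```python
import pandas as pd

# Hypothetical rows; IDs and values are made up
demo = pd.DataFrame({
    "xyz_campaign_id": [1000, 1000, 2000, 2000, 2000],
    "Impressions": [5000, 8000, 120000, 300000, 90000],
    "Clicks": [1, 2, 20, 55, 15],
    "Spent": [1.5, 2.9, 30.0, 95.4, 21.0],
})

# One row per campaign: number of ads, totals, and the largest single-ad spend
per_campaign = demo.groupby("xyz_campaign_id").agg(
    ads=("Spent", "count"),
    impressions=("Impressions", "sum"),
    clicks=("Clicks", "sum"),
    spent=("Spent", "sum"),
    max_spent=("Spent", "max"),
)
print(per_campaign)
```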

# Which campaign performed the best?

In [None]:
plt.figure(figsize=(8,4))
sns.scatterplot(x = 'fb_campaign_id' ,y='Approved_Conversion', data=df)

A Facebook campaign ID between 140k and 150k performed best, selling the product to 21 people, the highest number in the dataset.

In [None]:
df['xyz_campaign_id'].unique()

XYZ ran only three campaigns (xyz_campaign_id has three unique values).

In [None]:
plt.figure(figsize=(8,4))
# orient="h" treats the numeric xyz_campaign_id as categories on the y-axis
sns.swarmplot(x='Approved_Conversion', y='xyz_campaign_id', data=df, orient='h')

XYZ campaign 1178 performed best, with one of its ads selling the product to as many as 21 people.
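Reading the top performer off a plot is approximate; `idxmax` and `nlargest` return the exact rows along with both campaign IDs. A minimal sketch on a hypothetical frame `demo` with made-up IDs and sales:

```python
import pandas as pd

# Hypothetical rows; IDs and sales are made up
demo = pd.DataFrame({
    "ad_id": [1, 2, 3, 4],
    "xyz_campaign_id": [1000, 1000, 2000, 2000],
    "fb_campaign_id": [501, 502, 601, 602],
    "Approved_Conversion": [3, 1, 9, 4],
})

# The single ad with the most sales, with both of its campaign IDs
best = demo.loc[demo["Approved_Conversion"].idxmax()]
print(best)

# Top three ads by sales
top3 = demo.nlargest(3, "Approved_Conversion")
print(top3)
```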


# Basic Statistics


*     At least a quarter of the ads received one click or fewer (25th percentile of Clicks = 1)
*     At least half of the ads received 8 clicks or fewer (median)
*     At least three quarters of the ads received about 38 clicks or fewer (75th percentile)
*     The maximum number of clicks on a single ad is 421
*     The maximum amount spent on a single ad is 639
*     At most 60 people enquired about the product after seeing a single ad
*     At most 21 people bought the product after seeing a single ad


# Noteworthy Summary



* The more often an ad is shown (impressions), the more likely it is to be clicked
* Most sales came from ads targeting interest codes between 0 and 26
* The highest ad spend (up to 639 on a single ad) falls in the interest categories 10, 15, 16, 27, 28, 29 and 63
* Whether the ads are grouped by Facebook or by XYZ campaign ID, women account for more clicks than men
* With respect to gender
    * Women click on the ads more than men
    * Men tend to enquire about the product more than women
    * More men than women go on to buy the product after enquiring
* With respect to age
    * The 35-39 age group showed the least participation in clicking ads, enquiring about, or buying the products
    * The 45-49 age group showed the highest click activity
    * The 30-34 age group enquired about and bought the product more than any other age group


# Insightful Questions



*Who did better in sales?*

Every row carries both an XYZ campaign ID and a Facebook campaign ID, so the sales plots show the same ads grouped two ways rather than Facebook versus other companies. What they reveal is which campaigns produced the sales.

*Who got more clicks?*

Whether the ads are grouped by Facebook or by XYZ campaign ID, the most clicks come from women and from the 45-49 age group.

*Whose ads were shown more?*

Both impression plots show the same ads, so neither side was shown more. Impressions do help explain clicks and sales: ads that were shown more often collected more clicks.

*Who spent more on ads?*

XYZ pays Facebook to show each ad, and spending tends to rise with the number of times an ad is shown. The highest spend on a single ad is about 639 in either view.

*Which campaign performed the best?*

By Facebook campaign ID, the best performer lies between 140k and 150k, selling the product to 21 people, the highest number in the dataset. By XYZ campaign ID, campaign 1178 reaches the same maximum of 21 sales on a single ad.
