# A/B Testing of Two Marketing Strategies

We analyze two marketing strategies to determine which one converts website traffic into sales more efficiently and effectively. The result tells us which strategy to use for future marketing.
For this project I use a dataset downloaded from https://statso.io/a-b-testing-case-study/.
So let's start.

In [1]:
## Import Python libraries

import pandas as pd
import datetime
import matplotlib.pyplot as plot
import seaborn as sns
import numpy as np
from datetime import date, timedelta
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"


### Import Data

In [2]:
control_data = pd.read_csv("control_group.csv", sep = ";")
test_data = pd.read_csv("test_group.csv", sep = ";")

In [3]:
# Show Control Data head

print(control_data.head())

      Campaign Name       Date  Spend [USD]  # of Impressions     Reach  \
0  Control Campaign  1.08.2019         2280           82702.0   56930.0   
1  Control Campaign  2.08.2019         1757          121040.0  102513.0   
2  Control Campaign  3.08.2019         2343          131711.0  110862.0   
3  Control Campaign  4.08.2019         1940           72878.0   61235.0   
4  Control Campaign  5.08.2019         1835               NaN       NaN   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0               7016.0         2290.0             2159.0            1819.0   
1               8110.0         2033.0             1841.0            1219.0   
2               6508.0         1737.0             1549.0            1134.0   
3               3065.0         1042.0              982.0            1183.0   
4                  NaN            NaN                NaN               NaN   

   # of Purchase  
0          618.0  
1          511.0  
2          372.0  
3   

In [4]:
# Show test_data Head

print(test_data.head())

   Campaign Name       Date  Spend [USD]  # of Impressions  Reach  \
0  Test Campaign  1.08.2019         3008             39550  35820   
1  Test Campaign  2.08.2019         2542            100719  91236   
2  Test Campaign  3.08.2019         2365             70263  45198   
3  Test Campaign  4.08.2019         2710             78451  25937   
4  Test Campaign  5.08.2019         2297            114295  95138   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0                 3038           1946               1069               894   
1                 4657           2359               1548               879   
2                 7885           2572               2367              1268   
3                 4216           2216               1437               566   
4                 5863           2106                858               956   

   # of Purchase  
0            255  
1            677  
2            578  
3            340  
4            768  


### Data Preparation

In [5]:
# Check whether each dataset has null values

print(control_data.isnull().sum())

Campaign Name          0
Date                   0
Spend [USD]            0
# of Impressions       1
Reach                  1
# of Website Clicks    1
# of Searches          1
# of View Content      1
# of Add to Cart       1
# of Purchase          1
dtype: int64


In [6]:
print(test_data.isnull().sum())

Campaign Name          0
Date                   0
Spend [USD]            0
# of Impressions       0
Reach                  0
# of Website Clicks    0
# of Searches          0
# of View Content      0
# of Add to Cart       0
# of Purchase          0
dtype: int64


From the output above, control_data has 7 missing values (one in each metric column), while test_data has none.
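To see where the gaps are, you can list the rows that contain any null value. Below is a minimal sketch using a small frame built from two rows of the control `head()` output above, trimmed to four columns; the helper name `rows_with_missing` is hypothetical, and on the real data you would call it on `control_data` before filling.

```python
import numpy as np
import pandas as pd


def rows_with_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that contain at least one null value."""
    return df[df.isnull().any(axis=1)]


# Two rows copied from the control head() output, trimmed to four columns
example = pd.DataFrame({
    "Date": ["4.08.2019", "5.08.2019"],
    "Spend [USD]": [1940, 1835],
    "# of Impressions": [72878.0, np.nan],
    "Reach": [61235.0, np.nan],
})

print(rows_with_missing(example))  # lists only the 5.08.2019 row
```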

In [7]:
# Fill the missing values with each column's mean.
# Assigning the result back (instead of calling fillna(..., inplace=True)
# on a single column) avoids the chained-assignment warning that recent
# pandas versions raise for that pattern.
metric_columns = ["# of Impressions", "Reach", "# of Website Clicks",
                  "# of Searches", "# of View Content",
                  "# of Add to Cart", "# of Purchase"]
for column in metric_columns:
    control_data[column] = control_data[column].fillna(control_data[column].mean())

In [9]:
# create new dataset by merging control_data and test_data

merge_data = control_data.merge(test_data,
                               how="outer").sort_values(["Date"])

merge_data = merge_data.reset_index(drop=True)
print(merge_data.head())

      Campaign Name        Date  Spend [USD]  # of Impressions    Reach  \
0  Control Campaign   1.08.2019         2280           82702.0  56930.0   
1     Test Campaign   1.08.2019         3008           39550.0  35820.0   
2     Test Campaign  10.08.2019         2790           95054.0  79632.0   
3  Control Campaign  10.08.2019         2149          117624.0  91257.0   
4     Test Campaign  11.08.2019         2420           83633.0  71286.0   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0               7016.0         2290.0             2159.0            1819.0   
1               3038.0         1946.0             1069.0             894.0   
2               8125.0         2312.0             1804.0             424.0   
3               2277.0         2475.0             1984.0            1629.0   
4               3750.0         2893.0             2617.0            1075.0   

   # of Purchase  
0          618.0  
1          255.0  
2          275.0  
3   
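Because the two tables share every column and never share a `Campaign Name` value, the outer merge above simply stacks them. `Date` is still text, though, so `sort_values` orders it lexicographically ("1.08.2019", "10.08.2019", "11.08.2019", ...), as the output shows. The sketch below is a chronological alternative that uses `pd.concat` for the stacking and assumes the day.month.year format seen in the data. Small frames with rows copied from the outputs above stand in for the real tables, and the helper name `stack_campaigns` is hypothetical.

```python
import pandas as pd


def stack_campaigns(control: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Stack both campaign tables and sort them by calendar date."""
    combined = pd.concat([control, test], ignore_index=True)
    # Dates look like "1.08.2019", i.e. day.month.year (assumed format)
    combined["Date"] = pd.to_datetime(combined["Date"], format="%d.%m.%Y")
    # Sort by date, then campaign name, so each day's pair stays together
    return combined.sort_values(["Date", "Campaign Name"]).reset_index(drop=True)


# Rows copied from the head() and merge outputs above
control = pd.DataFrame({"Campaign Name": ["Control Campaign"] * 2,
                        "Date": ["10.08.2019", "1.08.2019"],
                        "Spend [USD]": [2149, 2280]})
test = pd.DataFrame({"Campaign Name": ["Test Campaign"] * 2,
                     "Date": ["10.08.2019", "1.08.2019"],
                     "Spend [USD]": [2790, 3008]})
print(stack_campaigns(control, test))
```

Sorting on both `Date` and `Campaign Name` keeps each day's Control and Test rows adjacent and in a fixed order.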



In [11]:
## Check that both campaigns have the same number of rows

print(merge_data["Campaign Name"].value_counts())

Control Campaign    30
Test Campaign       30
Name: Campaign Name, dtype: int64


Both campaigns have 30 rows, so the two groups are the same size.

### Finding the Best Marketing Strategy with A/B Testing

In [14]:
figure = px.scatter(data_frame = merge_data, 
                    x="# of Impressions",
                    y="Spend [USD]", 
                    size="Spend [USD]", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

From the plot above, the Control Campaign gets more impressions for the amount spent. For example, the Test Campaign spent 3,008 USD and got about 39.55k impressions, while the Control Campaign spent 2,939 USD and got about 105.7k impressions.
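"Impressions for the amount spent" can be summarised as cost per thousand impressions (CPM): total spend divided by total impressions, times 1,000. A lower value means cheaper reach. The minimal sketch below runs on the first four days of each campaign only, copied from the `head()` outputs, so the numbers are illustrative rather than the full-period result. The helper name is hypothetical.

```python
import pandas as pd


def cost_per_thousand_impressions(df: pd.DataFrame) -> pd.Series:
    """Total spend / total impressions * 1000, per campaign."""
    totals = df.groupby("Campaign Name")[["Spend [USD]", "# of Impressions"]].sum()
    return totals["Spend [USD]"] / totals["# of Impressions"] * 1000


# First four days of each campaign, copied from the head() outputs above
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
    "Spend [USD]": [2280, 1757, 2343, 1940, 3008, 2542, 2365, 2710],
    "# of Impressions": [82702, 121040, 131711, 72878,
                         39550, 100719, 70263, 78451],
})
print(cost_per_thousand_impressions(sample).round(2))
```

On these four days the Control Campaign comes out at roughly 20 USD per thousand impressions against roughly 37 USD for the Test Campaign, which matches the pattern in the scatter plot.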

In [15]:
# Number of searches performed on the website from both campaigns

label = ["Total Searches from Control Campaign", 
         "Total Searches from Test Campaign"]
counts = [sum(control_data["# of Searches"]), 
          sum(test_data["# of Searches"])]
colors = ['#9BBFE0', '#E8A09A']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Searches')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The Test Campaign generated more searches on the website than the Control Campaign.

In [16]:
# Number of website clicks from control campaign and test campaign

label = ["Website Clicks from Control Campaign", 
         "Website Clicks from Test Campaign"]
counts = [sum(control_data["# of Website Clicks"]), 
          sum(test_data["# of Website Clicks"])]
colors = ['#9BBFE0', '#E8A09A']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Website Clicks')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The Test Campaign also generated more website clicks in total than the Control Campaign.
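Each pie chart compares one column summed per campaign. With `groupby`, the same totals and each campaign's percentage share can be read side by side in a single table. This is a sketch only: the helper name `campaign_totals` is hypothetical, and the sample rows are the first three days of each campaign, copied from the `head()` outputs above.

```python
import pandas as pd


def campaign_totals(df: pd.DataFrame, metrics: list) -> pd.DataFrame:
    """Sum each metric per campaign and add each campaign's share in percent."""
    totals = df.groupby("Campaign Name")[metrics].sum()
    # Divide each column by its grand total, which is what a pie chart shows
    shares = totals / totals.sum() * 100
    return totals.join(shares.add_suffix(" (%)"))


# First three days of each campaign, copied from the head() outputs above
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 3 + ["Test Campaign"] * 3,
    "# of Searches": [2290, 2033, 1737, 1946, 2359, 2572],
    "# of Website Clicks": [7016, 8110, 6508, 3038, 4657, 7885],
})
print(campaign_totals(sample, ["# of Searches", "# of Website Clicks"]).round(1))
```

On the full data, passing `merge_data` and every metric of interest would give one row per campaign in place of several separate charts.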

In [17]:
# Total content views from both campaigns

label = ["Content Viewed from Control Campaign", 
         "Content Viewed from Test Campaign"]
counts = [sum(control_data["# of View Content"]), 
          sum(test_data["# of View Content"])]
colors = ['#9BBFE0','#E8A09A']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Content Viewed')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The Control Campaign has more total content views than the Test Campaign.

In [18]:
# Total products added to cart from both campaigns

label = ["Products Added to Cart from Control Campaign", 
         "Products Added to Cart from Test Campaign"]
counts = [sum(control_data["# of Add to Cart"]), 
          sum(test_data["# of Add to Cart"])]
colors = ['#9BBFE0','#E8A09A']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Added to Cart')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

Even though the Control Campaign has fewer website clicks, it has more products added to the cart in total than the Test Campaign.

In [20]:
# Total amount spent

label = ["Amount Spent in Control Campaign",
         "Amount Spent in Test Campaign"]
counts = [sum(control_data["Spend [USD]"]), 
          sum(test_data["Spend [USD]"])]
colors = ['#9BBFE0','#E8A09A']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Amount Spent')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

The Test Campaign spent more in total than the Control Campaign. Yet the Control Campaign produced more content views and more products added to the cart, which means it used its budget more efficiently than the Test Campaign.
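The efficiency argument is easier to see as cost per action: spend divided by the number of content views, cart additions or purchases. The minimal sketch below uses the first three days of each campaign, copied from the `head()` outputs above. It shows how the calculation works but does not give the full 30-day result. The helper name is hypothetical.

```python
import pandas as pd

COLUMNS = ["Spend [USD]", "# of View Content", "# of Add to Cart", "# of Purchase"]


def cost_per_action(df: pd.DataFrame) -> pd.DataFrame:
    """Spend divided by each downstream count, per campaign (lower is better)."""
    totals = df.groupby("Campaign Name")[COLUMNS].sum()
    return pd.DataFrame({
        "USD per content view": totals["Spend [USD]"] / totals["# of View Content"],
        "USD per add to cart": totals["Spend [USD]"] / totals["# of Add to Cart"],
        "USD per purchase": totals["Spend [USD]"] / totals["# of Purchase"],
    })


# First three days of each campaign, copied from the head() outputs above
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 3 + ["Test Campaign"] * 3,
    "Spend [USD]": [2280, 1757, 2343, 3008, 2542, 2365],
    "# of View Content": [2159, 1841, 1549, 1069, 1548, 2367],
    "# of Add to Cart": [1819, 1219, 1134, 894, 879, 1268],
    "# of Purchase": [618, 511, 372, 255, 677, 578],
})
print(cost_per_action(sample).round(2))
```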

In [21]:
# Number of Purchase

label = ["Purchases Made by Control Campaign", 
         "Purchases Made by Test Campaign"]
counts = [sum(control_data["# of Purchase"]), 
          sum(test_data["# of Purchase"])]
colors = ['#9BBFE0','#E8A09A']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Control Vs Test: Purchases')
fig.update_traces(hoverinfo='label+percent', textinfo='value', 
                  textfont_size=30,
                  marker=dict(colors=colors, 
                              line=dict(color='black', width=3)))
fig.show()

Total purchases from the two campaigns are almost identical; they differ by only about 0.5%.
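Whether a gap this small could be noise can be checked with a significance test on the daily purchase counts. The original analysis compares totals only, so the technique here is a swapped-in addition. It is a two-sided permutation test on the difference in mean daily purchases, written with the standard library only. The function name and its defaults are hypothetical. On the real data you would pass `control_data["# of Purchase"].tolist()` and `test_data["# of Purchase"].tolist()`. Keep in mind that the mean-imputed control day would be treated as if it had been observed.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (swapped-in technique)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into groups of the original sizes
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Purchase counts visible in the head() outputs above; far too few days
# for a meaningful result, so this only demonstrates the call
control_daily = [618, 511, 372]
test_daily = [255, 677, 578, 340, 768]
print(permutation_p_value(control_daily, test_daily))
```

A p-value well below 0.05 would suggest the gap in daily purchases is unlikely to be chance alone. A large one means the data cannot tell the campaigns apart on purchases.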

In [23]:
# relationship between number of website clicks and content view

figure = px.scatter(data_frame = merge_data, 
                    x="# of View Content",
                    y="# of Website Clicks", 
                    size="# of Website Clicks", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

The Test Campaign has more website clicks, but the Control Campaign gets more engagement (content views) out of its website clicks.
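The `trendline="ols"` lines are ordinary least-squares fits, one per campaign, and their slopes can be extracted numerically. The sketch below uses `numpy.polyfit` for this. It regresses content views on website clicks, which swaps the axes relative to the plot, so the slope reads as extra content views per extra click. The helper name is hypothetical. The sample holds only the first three days of each campaign from the `head()` outputs, which is enough to show the call but far too few points for a reliable slope.

```python
import numpy as np
import pandas as pd


def ols_slopes(df: pd.DataFrame, x: str, y: str) -> pd.Series:
    """Least-squares slope of y on x, fitted separately for each campaign."""
    slopes = {
        # polyfit with deg=1 returns [slope, intercept]
        name: np.polyfit(group[x], group[y], deg=1)[0]
        for name, group in df.groupby("Campaign Name")
    }
    return pd.Series(slopes, name=f"slope of {y} on {x}")


# First three days of each campaign, copied from the head() outputs above
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 3 + ["Test Campaign"] * 3,
    "# of Website Clicks": [7016, 8110, 6508, 3038, 4657, 7885],
    "# of View Content": [2159, 1841, 1549, 1069, 1548, 2367],
})
print(ols_slopes(sample, x="# of Website Clicks", y="# of View Content"))
```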

In [24]:
# Relationship between content view and total products add to cart

figure = px.scatter(data_frame = merge_data, 
                    x="# of Add to Cart",
                    y="# of View Content", 
                    size="# of Add to Cart", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

Here the Control Campaign is clearly superior, turning content views into more products added to the cart.

In [25]:
# Relationship between number of products added to cart and number of sales

figure = px.scatter(data_frame = merge_data, 
                    x="# of Purchase",
                    y="# of Add to Cart", 
                    size="# of Purchase", 
                    color= "Campaign Name", 
                    trendline="ols")
figure.show()

The Control Campaign has more sales and more products added to the cart, but the Test Campaign has a higher conversion rate from cart to purchase.
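The three scatter plots above compare neighbouring stages of the sales funnel. The same comparison can be made with step-to-step rates computed from each campaign's totals: clicks per impression, content views per click, add-to-carts per content view, and purchases per add-to-cart. The sketch below uses the first three days of each campaign, copied from the `head()` outputs. With so few days, the numbers only illustrate the calculation, not the conclusion. `FUNNEL` and `funnel_rates` are hypothetical names.

```python
import pandas as pd

# Funnel stages in order; each rate is (later stage) / (earlier stage)
FUNNEL = ["# of Impressions", "# of Website Clicks", "# of View Content",
          "# of Add to Cart", "# of Purchase"]


def funnel_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Step-to-step conversion rates per campaign, computed from totals."""
    totals = df.groupby("Campaign Name")[FUNNEL].sum()
    rates = {}
    for earlier, later in zip(FUNNEL, FUNNEL[1:]):
        rates[f"{later} / {earlier}"] = totals[later] / totals[earlier]
    return pd.DataFrame(rates)


# First three days of each campaign, copied from the head() outputs above
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 3 + ["Test Campaign"] * 3,
    "# of Impressions": [82702, 121040, 131711, 39550, 100719, 70263],
    "# of Website Clicks": [7016, 8110, 6508, 3038, 4657, 7885],
    "# of View Content": [2159, 1841, 1549, 1069, 1548, 2367],
    "# of Add to Cart": [1819, 1219, 1134, 894, 879, 1268],
    "# of Purchase": [618, 511, 372, 255, 677, 578],
})
print(funnel_rates(sample).round(3))
```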

## Conclusion

From the A/B test above, we found that the Control Campaign produced more sales and more engagement from visitors. More products were viewed in the Control Campaign, which led to more products in the cart and more sales. However, the Test Campaign converted a higher share of products in the cart into purchases: relative to the products viewed and added to the cart, it produced more sales, while the Control Campaign produced more sales overall. So the Test Campaign is better suited to marketing a specific product to a specific audience, and the Control Campaign to marketing multiple products to a wider audience.