# A/B TESTING USING PYTHON

In [29]:
# IMPORT NECESSARY LIBRARIES
import pandas as pd
import datetime
from datetime import date, timedelta
import plotly.graph_objects as go
import plotly.express as exp
import plotly.io as io
io.templates.default = "plotly_white"

In [30]:
#READ DATA SET
control_data = pd.read_csv('/content/control_group.csv',sep = ';')
test_data = pd.read_csv('/content/test_group.csv',sep = ';')

In [31]:
control_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [32]:
test_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


## **DATA PREPARATION**

The dataset has some problems in its column names, so let's rename the columns before moving forward.

In [33]:
control_data.columns = ["Campaign Name", "Date", "Amount Spent",
                        "Number of Impressions", "Reach", "Website Clicks",
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]

test_data.columns = ["Campaign Name", "Date", "Amount Spent",
                        "Number of Impressions", "Reach", "Website Clicks",
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]


Now let's see how the columns look.

In [34]:
control_data.head()

Unnamed: 0,Campaign Name,Date,Amount Spent,Number of Impressions,Reach,Website Clicks,Searches Received,Content Viewed,Added to Cart,Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [35]:
test_data.head()

Unnamed: 0,Campaign Name,Date,Amount Spent,Number of Impressions,Reach,Website Clicks,Searches Received,Content Viewed,Added to Cart,Purchases
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


Now let's check whether the datasets have null values.

In [36]:
control_data.isnull().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    1
Reach                    1
Website Clicks           1
Searches Received        1
Content Viewed           1
Added to Cart            1
Purchases                1
dtype: int64

In [37]:
test_data.isnull().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64

In [38]:
control_data["Number of Impressions"].fillna(value=control_data["Number of Impressions"].mean(),
                                             inplace=True)
control_data["Reach"].fillna(value=control_data["Reach"].mean(),
                                             inplace=True)
control_data["Website Clicks"].fillna(value=control_data["Website Clicks"].mean(),
                                             inplace=True)
control_data["Searches Received"].fillna(value=control_data["Searches Received"].mean(),
                                             inplace=True)
control_data["Content Viewed"].fillna(value=control_data["Content Viewed"].mean(),
                                             inplace=True)
control_data["Added to Cart"].fillna(value=control_data["Added to Cart"].mean(),
                                             inplace=True)
control_data["Purchases"].fillna(value=control_data["Purchases"].mean(),
                                             inplace=True)

Now let's check whether any null values remain in the control data.

In [39]:
control_data.isnull().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64

Now let's merge the two datasets to create a new one.

In [40]:
new_data = control_data.merge(test_data,how = 'outer').sort_values(['Date'])
new_data = new_data.reset_index(drop = True)
new_data.head()


You are merging on int and float columns where the float values are not equal to their int representation.



Unnamed: 0,Campaign Name,Date,Amount Spent,Number of Impressions,Reach,Website Clicks,Searches Received,Content Viewed,Added to Cart,Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Test Campaign,1.08.2019,3008,39550.0,35820.0,3038.0,1946.0,1069.0,894.0,255.0
2,Test Campaign,10.08.2019,2790,95054.0,79632.0,8125.0,2312.0,1804.0,424.0,275.0
3,Control Campaign,10.08.2019,2149,117624.0,91257.0,2277.0,2475.0,1984.0,1629.0,734.0
4,Test Campaign,11.08.2019,2420,83633.0,71286.0,3750.0,2893.0,2617.0,1075.0,668.0
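
Two details in the output above are worth a closer look. The `Date` column is still text, so the rows are sorted as strings and `10.08.2019` lands right after `1.08.2019`. The outer merge without an `on` argument also matches on every shared column, which is what triggers the int/float warning. The sketch below is one possible alternative, not the method used in this notebook: it stacks the frames with `pd.concat` and parses the dates before sorting. The frames `control` and `test` are small illustrative stand-ins built from rows shown above.

```python
import pandas as pd

# Small stand-ins for control_data and test_data, using rows from the output above.
control = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 3,
    "Date": ["1.08.2019", "2.08.2019", "10.08.2019"],
    "Purchases": [618.0, 511.0, 734.0],
})
test = pd.DataFrame({
    "Campaign Name": ["Test Campaign"] * 3,
    "Date": ["1.08.2019", "2.08.2019", "10.08.2019"],
    "Purchases": [255, 677, 275],
})

# Stack the two frames row-wise; no join keys are involved, so no dtype warning.
stacked = pd.concat([control, test], ignore_index=True)

# Sorting the raw strings is lexicographic: "10.08.2019" comes before "2.08.2019".
string_order = stacked.sort_values("Date")["Date"].unique().tolist()

# Parse day.month.year explicitly, then sort chronologically.
stacked["Date"] = pd.to_datetime(stacked["Date"], format="%d.%m.%Y")
stacked = stacked.sort_values(["Date", "Campaign Name"]).reset_index(drop=True)
day_order = stacked["Date"].dt.day.unique().tolist()

print(string_order)
print(day_order)
```

With parsed dates, any later time-based plot or resampling would also follow calendar order.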


Before moving forward, let's check whether the dataset has an equal number of samples for each campaign.

In [41]:
new_data['Campaign Name'].value_counts()

Control Campaign    30
Test Campaign       30
Name: Campaign Name, dtype: int64

Each campaign has 30 samples, so let's start with A/B testing.

### **A/B TESTING TO FIND THE BEST MARKETING STRATEGY**

To get started with A/B testing, let's first analyze the relationship between the number of impressions and the amount spent in both campaigns.

In [42]:
fg = exp.scatter(data_frame = new_data,x = 'Number of Impressions', y = 'Amount Spent', size = 'Amount Spent', color = 'Campaign Name', trendline = 'ols')
fg.show()

The control campaign generated more impressions relative to the amount spent.
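
A scatter plot shows the trend, but a single ratio makes the comparison easier to state. The sketch below computes impressions per dollar for each campaign. It uses only the first four days of each campaign as they appear in the `head()` outputs above, so the numbers are illustrative and are not the result for the full 30 days; `sample` and `impressions_per_usd` are names introduced here.

```python
import pandas as pd

# First four days of each campaign, copied from the head() outputs above.
sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
    "Amount Spent": [2280, 1757, 2343, 1940, 3008, 2542, 2365, 2710],
    "Number of Impressions": [82702, 121040, 131711, 72878,
                              39550, 100719, 70263, 78451],
})

# Total spend and total impressions per campaign.
totals = sample.groupby("Campaign Name")[["Amount Spent", "Number of Impressions"]].sum()

# Impressions bought per dollar spent.
impressions_per_usd = totals["Number of Impressions"] / totals["Amount Spent"]
print(impressions_per_usd.round(2))
```

Running the same two lines on `new_data` instead of `sample` would give the ratio for the whole dataset.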

In [46]:
label = ['Total searches from control campaign','Total searches from Test Campaign']
count = [sum(control_data['Searches Received']),sum(test_data['Searches Received'])]
colors = ['Red','Blue']
fig = go.Figure(data=[go.Pie(labels=label,values=count)])
fig.update_layout(title_text='Control vs Test: Searches')
fig.update_traces(hoverinfo='label+percent',textinfo = 'value',textfont_size = 30,marker = dict(colors=colors,line = dict(color = 'black',width = 2)))

The test campaign resulted in more searches on the website.

Now let's look at the number of website clicks from both campaigns.

In [47]:
label = ['Number of website clicks from control campaign','Number of website clicks from Test Campaign']
count = [sum(control_data['Website Clicks']),sum(test_data['Website Clicks'])]
colors = ['Red','Blue']
fig = go.Figure(data=[go.Pie(labels=label,values=count)])
fig.update_layout(title_text='Control vs Test: Website clicks')
fig.update_traces(hoverinfo='label+percent',textinfo = 'value',textfont_size = 30,marker = dict(colors=colors,line = dict(color = 'black',width = 2)))

The test campaign resulted in more website clicks.

Now let’s have a look at the amount of content viewed after reaching the website from both campaigns

In [48]:
label = ['Content Viewed from control campaign','Content Viewed from Test Campaign']
count = [sum(control_data['Content Viewed']),sum(test_data['Content Viewed'])]
colors = ['Red','Blue']
fig = go.Figure(data=[go.Pie(labels=label,values=count)])
fig.update_layout(title_text='Control vs Test: Content Viewed')
fig.update_traces(hoverinfo='label+percent',textinfo = 'value',textfont_size = 30,marker = dict(colors=colors,line = dict(color = 'black',width = 2)))

Visitors from the control campaign viewed more content than those from the test campaign. The difference is small, but because the control campaign had fewer website clicks, its engagement on the website is higher than the test campaign's.


Now let's look at the number of products added to the cart in both campaigns.

In [49]:
label = ['Added to Cart from control campaign','Added to Cart from Test Campaign']
count = [sum(control_data['Added to Cart']),sum(test_data['Added to Cart'])]
colors = ['Red','Blue']
fig = go.Figure(data=[go.Pie(labels=label,values=count)])
fig.update_layout(title_text='Control vs Test: Added to Cart')
fig.update_traces(hoverinfo='label+percent',textinfo = 'value',textfont_size = 30,marker = dict(colors=colors,line = dict(color = 'black',width = 2)))

Despite fewer website clicks, more products were added to the cart from the control campaign. Now let's have a look at the purchases from both campaigns.

In [50]:
label = ['Purchases from control campaign', 'Purchases from Test Campaign']
count = [sum(control_data['Purchases']),sum(test_data['Purchases'])]
colors = ['Red','Blue']
fig = go.Figure(data=[go.Pie(labels=label,values=count)])
fig.update_layout(title_text='Control vs Test: Purchases')
fig.update_traces(hoverinfo='label+percent',textinfo = 'value',textfont_size = 30,marker = dict(colors=colors,line = dict(color = 'black',width = 2)))

There’s only a slight difference in the purchases made from both ad campaigns.

Now let’s have a look at the amount spent on both campaigns



In [52]:
label = ['Amount Spent on control campaign', 'Amount Spent on Test Campaign']
count = [sum(control_data['Amount Spent']),sum(test_data['Amount Spent'])]
colors = ['Red','Blue']
fig = go.Figure(data=[go.Pie(labels=label,values=count)])
fig.update_layout(title_text='Control vs Test: Amount Spent')
fig.update_traces(hoverinfo='label+percent',textinfo = 'value',textfont_size = 30,marker = dict(colors=colors,line = dict(color = 'black',width = 2)))

Because the control campaign produced more sales for less money spent on marketing, the control campaign wins here!
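
One way to put "more sales for less money" into a single number is the cost per purchase. The helper `cost_per_purchase` below is a hypothetical function introduced for illustration. It is applied to the first four days of each campaign from the `head()` outputs above, so the values are only a sample of the full data.

```python
import pandas as pd

def cost_per_purchase(frame: pd.DataFrame) -> float:
    """Total amount spent divided by total purchases for one campaign."""
    return frame["Amount Spent"].sum() / frame["Purchases"].sum()

# First four days of each campaign, copied from the head() outputs above.
control_sample = pd.DataFrame({
    "Amount Spent": [2280, 1757, 2343, 1940],
    "Purchases": [618, 511, 372, 340],
})
test_sample = pd.DataFrame({
    "Amount Spent": [3008, 2542, 2365, 2710],
    "Purchases": [255, 677, 578, 340],
})

control_cpp = cost_per_purchase(control_sample)
test_cpp = cost_per_purchase(test_sample)
print(f"Control: {control_cpp:.2f} USD per purchase")
print(f"Test:    {test_cpp:.2f} USD per purchase")
```

Passing the full `control_data` and `test_data` frames to the same function would give the figure for all 30 days. A lower value means each sale cost less in ad spend.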

Now let's analyze some metrics to find out which ad campaign converts more. Let's start with the relationship between the number of website clicks and the content viewed in both campaigns.

In [54]:
fg = exp.scatter(data_frame = new_data,x = 'Content Viewed', y = 'Website Clicks', size = 'Website Clicks', color = 'Campaign Name', trendline = 'ols')
fg.show()

The website clicks are higher in the test campaign, but the engagement from website clicks is higher in the control campaign. So the control campaign comes first here.

Now let's analyze the relationship between the amount of content viewed and the number of products added to the cart in both campaigns.

In [57]:
fg = exp.scatter(data_frame = new_data,x = 'Added to Cart', y = 'Content Viewed', size = 'Added to Cart', color = 'Campaign Name', trendline = 'ols')
fg.show()

Again, the control campaign comes first. Now let's have a look at the relationship between the number of products added to the cart and the number of sales in both campaigns.

In [58]:
fg = exp.scatter(data_frame = new_data,x = 'Purchases', y = 'Added to Cart', size = 'Purchases', color = 'Campaign Name', trendline = 'ols')
fg.show()

The conversion rate of the test campaign is higher.
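
The scatter plots compare one funnel step with the next. The same comparison can be written as step-to-step rates. The sketch below is an illustration on the first four days of each campaign from the `head()` outputs above; the frame `funnel` and the rate column names are introduced here, and four days are too few to stand in for the full dataset.

```python
import pandas as pd

# First four days of each campaign, copied from the head() outputs above.
funnel = pd.DataFrame({
    "Campaign Name": ["Control Campaign"] * 4 + ["Test Campaign"] * 4,
    "Website Clicks": [7016, 8110, 6508, 3065, 3038, 4657, 7885, 4216],
    "Content Viewed": [2159, 1841, 1549, 982, 1069, 1548, 2367, 1437],
    "Added to Cart": [1819, 1219, 1134, 1183, 894, 879, 1268, 566],
    "Purchases": [618, 511, 372, 340, 255, 677, 578, 340],
})

steps = ["Website Clicks", "Content Viewed", "Added to Cart", "Purchases"]
totals = funnel.groupby("Campaign Name")[steps].sum()

# Each rate divides one funnel step by the step before it.
rates = pd.DataFrame({
    "view_per_click": totals["Content Viewed"] / totals["Website Clicks"],
    "cart_per_view": totals["Added to Cart"] / totals["Content Viewed"],
    "purchase_per_cart": totals["Purchases"] / totals["Added to Cart"],
})
print(rates.round(3))
```

The `purchase_per_cart` column corresponds to the last scatter plot: it is the share of cart additions that end in a purchase.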

## **CONCLUSION:**
According to the A/B tests above, the control campaign resulted in more sales and more engagement from visitors. Visitors from the control campaign viewed more products, which led to more products in the cart and more sales. The test campaign, however, has a higher conversion rate from cart to purchase: it produced more sales relative to the products that were viewed and added to the cart. The control campaign has a positive impact on overall sales. So, the test campaign can be used to market a specific product to a specific audience, and the control campaign can be used to market multiple products to a wider audience.