# A/B Test Analysis with Python

A/B testing, also known as split testing, is a method used in marketing, product development, and website optimization to compare two versions of something to determine which one performs better. It is a controlled experiment where you expose two groups (Group A and Group B) to different variations of a product, webpage, or other elements to see which one yields better results. The primary goal of A/B testing is to make data-driven decisions and optimize your strategies.

**Goal of A/B testing**

A/B testing is commonly used for optimizing websites, marketing campaigns, product features, and user experiences to increase conversion rates, engagement, and overall performance.

In [1]:
#importing Necessary Libraries 
import pandas as pd
import datetime
from datetime import date, timedelta
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio

We have two datasets: the normal dataset (control group) and the campaign dataset (test group).

In [3]:
#lets read control group 
normal_data = pd.read_csv("control_group.csv", sep = ";")
normal_data.head(5)

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [6]:
#lets read the test campaign data set 
test_data = pd.read_csv("test_group.csv", sep = ";")
test_data.head(5)

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


Both datasets have awkward column names, so let's rename them.

In [40]:
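#let's first change the control data column names (same names as used for the test data)
normal_data.columns = ["Campaign Name", "Date", "Amount Spent", 
                        "Number of Impressions", "Reach", "Website Clicks", 
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]
normal_data.info()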

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Campaign Name          30 non-null     object 
 1   Date                   30 non-null     object 
 2   Amount Spent           30 non-null     int32  
 3   Number of Impressions  30 non-null     float64
 4   Reach                  30 non-null     float64
 5   Website Clicks         30 non-null     float64
 6   Searches Received      30 non-null     float64
 7   Content Viewed         30 non-null     float64
 8   Added to Cart          30 non-null     float64
 9   Purchases              30 non-null     float64
dtypes: float64(7), int32(1), object(2)
memory usage: 2.4+ KB


In [33]:
#lets first change test data column names 
test_data.columns = ["Campaign Name", "Date", "Amount Spent", 
                        "Number of Impressions", "Reach", "Website Clicks", 
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]
test_data.head(3)

Unnamed: 0,Campaign Name,Date,Amount Spent,Number of Impressions,Reach,Website Clicks,Searches Received,Content Viewed,Added to Cart,Purchases
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578


The column names are changed.

Now let's check whether the datasets have NULL values.

In [34]:
#checking NULL values in Test data
test_data.isnull().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64

Luckily, there are no NULL values in the test (campaign) data.

In [35]:
#Checking NULL values in Normal data
normal_data.isnull().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64

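But the Normal dataset does have NULL values: in the preview above, the row for 5.08.2019 is empty in every column after the spend. The zero counts shown here appear to come from re-running the cell after the fill step below.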
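To see which rows are affected, rather than only per-column counts, a boolean mask can help. The sketch below rebuilds a tiny stand-in frame (`sample`, a hypothetical name) from two rows of the control preview; in the notebook the same mask would be applied to `normal_data`.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the control-group preview (rows 3 and 4, three metric columns).
sample = pd.DataFrame({
    "Date": ["4.08.2019", "5.08.2019"],
    "Amount Spent": [1940, 1835],
    "Reach": [61235.0, np.nan],
    "Purchases": [340.0, np.nan],
})

# Keep only the rows that contain at least one missing value.
rows_with_nulls = sample[sample.isnull().any(axis=1)]
print(rows_with_nulls)
```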

**Let's fill the NULL values with the mean of their respective columns**

In [36]:
#filling NULL values with the MEAN of each column (every column uses its own mean)
#assigning back instead of inplace=True, which recent pandas versions may not apply to a column selection
for col in ["Number of Impressions", "Reach", "Website Clicks", "Searches Received",
            "Content Viewed", "Added to Cart", "Purchases"]:
    normal_data[col] = normal_data[col].fillna(normal_data[col].mean())

In [37]:
#let's check whether they are filled or not
normal_data.isnull().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64

They are filled. Let's go further.

To compare the two datasets, let's merge them.


In [44]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning) # suppress some user warnings

ab_test = normal_data.merge(test_data, how = "outer").sort_values(["Date"])
ab_test = ab_test.reset_index(drop=True)
ab_test.head()

Unnamed: 0,Campaign Name,Date,Amount Spent,Number of Impressions,Reach,Website Clicks,Searches Received,Content Viewed,Added to Cart,Purchases
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Test Campaign,1.08.2019,3008,39550.0,35820.0,3038.0,1946.0,1069.0,894.0,255.0
2,Test Campaign,10.08.2019,2790,95054.0,79632.0,8125.0,2312.0,1804.0,424.0,275.0
3,Control Campaign,10.08.2019,2149,117624.0,91257.0,2277.0,2475.0,1984.0,1629.0,734.0
4,Test Campaign,11.08.2019,2420,83633.0,71286.0,3750.0,2893.0,2617.0,1075.0,668.0
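The merged frame above is sorted on `Date` as text, which is why `10.08.2019` follows directly after `1.08.2019`. If chronological order matters, one option is to parse the column first. This is a minimal sketch, assuming the day.month.year format seen in the previews; `sample` is a hypothetical stand-in for `ab_test`.

```python
import pandas as pd

sample = pd.DataFrame({
    "Campaign Name": ["Test Campaign", "Control Campaign", "Test Campaign"],
    "Date": ["10.08.2019", "2.08.2019", "1.08.2019"],
})

# Parse day.month.year strings into real timestamps.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d.%m.%Y")

# Sorting now follows the calendar instead of the alphabet.
sample = sample.sort_values("Date").reset_index(drop=True)
print(sample)
```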


In [113]:
ab_test['Campaign Name'] = ab_test['Campaign Name'].replace('Control Campaign', 'Normal Campaign')

ab_test["Campaign Name"].value_counts()

Campaign Name
Normal Campaign    30
Test Campaign      30
Name: count, dtype: int64

Here we have an equal number of samples for each campaign. That's because we filled the missing values with the column means in the steps above instead of dropping the incomplete rows.

**Now everything is fine in the data. Let's start A/B testing**

# A/B Testing

**1. Let's start by comparing the number of searches received from both campaigns**

In [114]:
data = {'Category': ['Normal Campaign', 'Test Campaign'],
        'Values': [sum(normal_data["Searches Received"]), sum(test_data["Searches Received"])]}

df = pd.DataFrame(data)

# Create a pie chart using Plotly Express
fig = px.pie(df, names='Category', values='Values', title='Normal vs Test Campaign : Searches Received')
fig.update_traces(marker=dict(line=dict(color='black', width=3)))
fig.update_traces(marker=dict(colors=["lightgreen", "lightblue"]),textfont_size=30)
# Show the chart
fig.show()

**Conclusion : The Test campaign resulted in more searches.**
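A larger total does not by itself show that the gap is more than day-to-day noise. One simple check is a permutation test on the daily values. The sketch below uses only the standard library and the first four daily search counts from each preview above, which is far too few days for a real conclusion. The function name `permutation_p_value` is made up for this illustration, and for the full 30-day columns one would sample random shuffles instead of enumerating every split.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    # Enumerate every way of assigning len(a) of the pooled values to group A.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count splits whose mean gap is at least as large as the observed one.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total

# First four days of "Searches Received" from the two previews.
control_searches = [2290, 2033, 1737, 1042]
test_searches = [1946, 2359, 2572, 2216]

p_value = permutation_p_value(control_searches, test_searches)
print(f"p-value: {p_value:.3f}")
```

A small p-value would suggest the difference is unlikely to arise from random assignment of days to campaigns alone; a large one means the totals by themselves are weak evidence.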
    

**2. Comparing Website Clicks from both Campaigns**

In [115]:
data = {'Category': ["Normal Campaign", "Test Campaign"],
       'values' : [sum(normal_data["Website Clicks"]), sum(test_data["Website Clicks"])]}
df = pd.DataFrame(data)

#pie chart 
fig=px.pie(df, names = "Category", values = "values", title = "Normal vs Test Campaign : Website Clicks")
fig.update_traces(marker=dict(line=dict(color = "black", width = 3)))
fig.update_traces(marker=dict(colors=["lightgreen", "lightblue"]),textfont_size=30)
fig.show()

**Conclusion : The Test campaign led to more website clicks.**

**3. Comparing Content Views in both Campaigns**

In [116]:
data = {'Category': ["Normal Campaign", "Test Campaign"],
       'values' : [sum(normal_data["Content Viewed"]), sum(test_data["Content Viewed"])]}
df = pd.DataFrame(data)

#pie chart 
fig=px.pie(df, names = "Category", values = "values", title = "Normal vs Test Campaign : Content Viewed")
fig.update_traces(marker=dict(line=dict(color = "black", width = 3)))
fig.update_traces(marker=dict(colors=["lightgreen", "lightblue"]),textfont_size=30)
fig.show()

**Conclusion :**

**a) The Normal campaign's audience viewed slightly more content than the test campaign.**

**b) The difference is minimal, but because the Normal campaign also had fewer website clicks, it shows more content views per click, i.e. higher engagement on the website.**

**c) The test campaign, in comparison, had lower content views and less engagement on the website.**
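The engagement claim above is about content views relative to clicks, so it can be checked per day rather than from totals alone. The sketch below is an illustration on the first four preview rows of each campaign (the name `preview` is hypothetical); four days need not agree with the 30-day result, and in the notebook the same two lines would run on `ab_test`.

```python
import pandas as pd

preview = pd.DataFrame({
    "Campaign Name": ["Normal Campaign"] * 4 + ["Test Campaign"] * 4,
    "Website Clicks": [7016, 8110, 6508, 3065, 3038, 4657, 7885, 4216],
    "Content Viewed": [2159, 1841, 1549, 982, 1069, 1548, 2367, 1437],
})

# Daily engagement: content views generated per website click.
daily = preview["Content Viewed"] / preview["Website Clicks"]

# Average daily engagement per campaign.
mean_views_per_click = daily.groupby(preview["Campaign Name"]).mean()
print(mean_views_per_click.round(3))
```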

**4. Comparing number of products added to cart from both campaigns**

In [117]:
data = {'Category': ["Normal Campaign", "Test Campaign"],
       'values' : [sum(normal_data["Added to Cart"]), sum(test_data["Added to Cart"])]}
df = pd.DataFrame(data)

#pie chart 
fig=px.pie(df, names = "Category", values = "values", title = "Normal vs Test Campaign : Cart Adds")
fig.update_traces(marker=dict(line=dict(color = "black", width = 3)))
fig.update_traces(marker=dict(colors=["lightgreen", "lightblue"]),textfont_size=30)
fig.show()

**Conclusion : In spite of having a lower number of website clicks, the Normal campaign managed to accumulate more products in the cart.**

**5. Comparing amount spent on both campaigns.**

In [118]:
data = {'Category': ["Normal Campaign", "Test Campaign"],
       'values' : [sum(normal_data["Amount Spent"]), sum(test_data["Amount Spent"])]}
df = pd.DataFrame(data)

#pie chart 
fig=px.pie(df, names = "Category", values = "values", title = "Normal vs Test Campaign : Amount spent")
fig.update_traces(marker=dict(line=dict(color = "black", width = 3)))
fig.update_traces(marker=dict(colors=["lightgreen", "lightblue"]),textfont_size=30)
fig.show()

**Conclusion :**

**The test campaign has a higher expenditure compared to the Normal campaign.**

**However, upon closer examination, we observe that the Normal campaign has generated more content views and added more products to the cart, making it the more efficient of the two campaigns.**
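Efficiency here means spend relative to outcomes, which can be made explicit as a cost per outcome. This is a rough sketch on the first four preview rows of each campaign, not the full 30 days; `preview` and `cost_per_outcome` are names introduced only for this example.

```python
import pandas as pd

preview = pd.DataFrame({
    "Campaign Name": ["Normal Campaign"] * 4 + ["Test Campaign"] * 4,
    "Amount Spent": [2280, 1757, 2343, 1940, 3008, 2542, 2365, 2710],
    "Content Viewed": [2159, 1841, 1549, 982, 1069, 1548, 2367, 1437],
    "Added to Cart": [1819, 1219, 1134, 1183, 894, 879, 1268, 566],
})

totals = preview.groupby("Campaign Name").sum(numeric_only=True)

# USD spent per content view and per cart addition; lower is more efficient.
outcomes = ["Content Viewed", "Added to Cart"]
cost_per_outcome = totals[outcomes].rdiv(totals["Amount Spent"], axis=0)
print(cost_per_outcome.round(2))
```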

**6. Comparing purchases made from both campaigns**

In [119]:
data = {'Category': ["Normal Campaign", "Test Campaign"],
       'values' : [sum(normal_data["Purchases"]), sum(test_data["Purchases"])]}
df = pd.DataFrame(data)

#pie chart 
fig=px.pie(df, names = "Category", values = "values", title = "Normal vs Test Campaign : Purchases")
fig.update_traces(marker=dict(line=dict(color = "black", width = 3)))
fig.update_traces(marker=dict(colors=["lightgreen", "lightblue"]),textfont_size=30)
fig.show()

**Conclusion :**

**With only a 1% difference in purchases, the Normal campaign wins because it achieved more sales with less marketing spending.**

**7. Analyzing Conversion Rates: Website Clicks vs. Content Views**

In [120]:
figure = px.scatter(data_frame = ab_test, 
                    x="Content Viewed",
                    y="Website Clicks", 
                    size="Website Clicks", 
                    color= "Campaign Name", 
                    trendline="ols",
                   title="Website Clicks vs. Content Views")
figure.show()

**Conclusion :**

**While the test campaign records more website clicks, it's the Normal campaign that exhibits higher engagement stemming from these clicks. As a result, the Normal campaign emerges as the winner in this regard.**
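The `trendline="ols"` option draws an ordinary least-squares line per campaign (Plotly relies on statsmodels for it). To read the fitted slopes as numbers instead of from the chart, a similar straight-line fit can be sketched with NumPy. The frame below holds only the first four preview rows of each campaign, so the slopes are illustrative; `preview` and `fits` are hypothetical names.

```python
import numpy as np
import pandas as pd

preview = pd.DataFrame({
    "Campaign Name": ["Normal Campaign"] * 4 + ["Test Campaign"] * 4,
    "Content Viewed": [2159, 1841, 1549, 982, 1069, 1548, 2367, 1437],
    "Website Clicks": [7016, 8110, 6508, 3065, 3038, 4657, 7885, 4216],
})

fits = {}
for name, group in preview.groupby("Campaign Name"):
    # Least-squares line: Website Clicks ~ slope * Content Viewed + intercept.
    slope, intercept = np.polyfit(group["Content Viewed"], group["Website Clicks"], deg=1)
    fits[name] = (slope, intercept)
    print(f"{name}: slope={slope:.2f}, intercept={intercept:.0f}")
```

A flatter slope means fewer additional clicks are associated with each additional content view, which is one way to put a number on the visual comparison.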

**8. Content Viewed vs. Cart Additions: Both Campaigns**

In [121]:
figure = px.scatter(data_frame = ab_test, 
                    x="Added to Cart",
                    y="Content Viewed", 
                    size="Website Clicks", 
                    color= "Campaign Name", 
                    trendline="ols",
                   title="Content Viewed vs. Cart Additions")
figure.show()

**Conclusions**

**Once more, the Normal campaign emerges as the winner.**

**9. Cart Additions vs. Sales : Both Campaigns**

In [124]:
figure = px.scatter(data_frame = ab_test, 
                    x="Purchases",
                    y="Added to Cart", 
                    size="Website Clicks", 
                    color= "Campaign Name", 
                    trendline="ols",
                   title="Cart Additions vs. Sales")
figure.show()

**Conclusions :**

**Despite the Normal campaign generating higher sales and more cart additions, the Test campaign boasts a higher conversion rate from cart additions to purchases.**
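The conversion rate mentioned here is the share of one funnel step that reaches the next. A compact way to compute every step at once is to divide the totals by the same totals shifted one column to the right. The sketch assumes a four-step funnel (clicks, views, cart, purchases) and again uses only the first four preview rows of each campaign; `preview` and `step_rates` are illustrative names.

```python
import pandas as pd

funnel_steps = ["Website Clicks", "Content Viewed", "Added to Cart", "Purchases"]

preview = pd.DataFrame({
    "Campaign Name": ["Normal Campaign"] * 4 + ["Test Campaign"] * 4,
    "Website Clicks": [7016, 8110, 6508, 3065, 3038, 4657, 7885, 4216],
    "Content Viewed": [2159, 1841, 1549, 982, 1069, 1548, 2367, 1437],
    "Added to Cart": [1819, 1219, 1134, 1183, 894, 879, 1268, 566],
    "Purchases": [618, 511, 372, 340, 255, 677, 578, 340],
})

totals = preview.groupby("Campaign Name")[funnel_steps].sum()

# Divide each step by the step before it; the first step has no predecessor.
step_rates = (totals / totals.shift(axis=1)).iloc[:, 1:]
print(step_rates.round(3))
```

The `Purchases` column of `step_rates` is the cart-to-purchase conversion rate for each campaign.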

**--------------------------------------------------------------------------------------------------------------------------------------------------------**

# Conclusion

**1) The Normal Campaign achieved higher sales and visitor engagement in the A/B tests.
More products were viewed and added to the cart in the Normal Campaign, leading to increased sales.**

**2) However, the Test Campaign boasts a higher conversion rate for products added to the cart.**

**3) The Test Campaign's sales are higher relative to products viewed and cart additions.**

**4) Overall, the Normal Campaign outperforms the Test Campaign in terms of total sales.**

**5) The Test Campaign is well-suited for targeted marketing of specific products to a particular audience.**

**6) The Normal Campaign is ideal for promoting a wider range of products to a broader audience.**


# **------------------------------------------------------------ A project by Rathan Guntuka**