## Data Processing

In [155]:
# first we import the libraries

import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

In [156]:
# Load the data

control_data = pd.read_csv('control_group.csv', sep=";")
test_data = pd.read_csv('test_group.csv', sep=";")

# Display basic information about the datasets

print("Control Data Info:")
print(control_data.info())

print("\nTest Data Info:")
print(test_data.info())


Control Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Campaign Name        30 non-null     object 
 1   Date                 30 non-null     object 
 2   Spend [USD]          30 non-null     int64  
 3   # of Impressions     29 non-null     float64
 4   Reach                29 non-null     float64
 5   # of Website Clicks  29 non-null     float64
 6   # of Searches        29 non-null     float64
 7   # of View Content    29 non-null     float64
 8   # of Add to Cart     29 non-null     float64
 9   # of Purchase        29 non-null     float64
dtypes: float64(7), int64(1), object(2)
memory usage: 2.5+ KB
None

Test Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  

In [157]:
# Display the first few rows for both campaign data
control_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [158]:
test_data.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


In [159]:
# Check for nulls
control_data.isna().sum()

Campaign Name          0
Date                   0
Spend [USD]            0
# of Impressions       1
Reach                  1
# of Website Clicks    1
# of Searches          1
# of View Content      1
# of Add to Cart       1
# of Purchase          1
dtype: int64

In [160]:
test_data.isna().sum()

Campaign Name          0
Date                   0
Spend [USD]            0
# of Impressions       0
Reach                  0
# of Website Clicks    0
# of Searches          0
# of View Content      0
# of Add to Cart       0
# of Purchase          0
dtype: int64

In [161]:
# Rename the columns for more readability
control_data.columns = ["Campaign Name", "Date", "Amount Spent", 
                        "Number of Impressions", "Reach", "Website Clicks", 
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]


In [162]:
test_data.columns = ["Campaign Name", "Date", "Amount Spent", 
                        "Number of Impressions", "Reach", "Website Clicks", 
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]

In [163]:
# Fill missing values in each numeric column with that column's mean.
# Assigning the result back avoids the chained-assignment warnings that
# inplace=True on a single column raises in recent pandas versions.
numeric_cols = ["Number of Impressions", "Reach", "Website Clicks",
                "Searches Received", "Content Viewed", "Added to Cart",
                "Purchases"]
control_data[numeric_cols] = control_data[numeric_cols].fillna(control_data[numeric_cols].mean())

In [164]:
control_data.isna().sum()

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64

In [165]:
# Add a new column to indicate the group
control_data['Group'] = 'Control'
test_data['Group'] = 'Test'

In [166]:
# Concatenate the datasets
df = pd.concat([control_data, test_data], ignore_index=True)
df.head()

Unnamed: 0,Campaign Name,Date,Amount Spent,Number of Impressions,Reach,Website Clicks,Searches Received,Content Viewed,Added to Cart,Purchases,Group
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0,Control
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0,Control
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0,Control
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0,Control
4,Control Campaign,5.08.2019,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103,Control


In [167]:

# find the sum of all records and group into a pivot-table like structure
result_df = df.groupby("Group").agg({
    "Purchases": "sum",
    "Number of Impressions": "sum",
    "Website Clicks": "sum",
    "Content Viewed": "sum",
    "Added to Cart": "sum"
}).reset_index()
result_df

Unnamed: 0,Group,Purchases,Number of Impressions,Website Clicks,Content Viewed,Added to Cart
0,Control,15683.793103,3286793.0,159623.793103,58313.793103,39000.0
1,Test,15637.0,2237544.0,180970.0,55740.0,26446.0


## Analysis

Is there a relationship between the amount spent and the number of impressions?

In [168]:
import plotly.express as px


figure = px.scatter(data_frame = df, 
                    x="Number of Impressions",
                    y="Amount Spent", 
                    color= "Campaign Name",
                    trendline= 'ols')
figure.show()

What are the disparities between both campaigns?

In [169]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Define colors
green_color = "#2ecc71"
purple_color = "#9b59b6"


specs = [[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}], 
         [{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]]

fig = make_subplots(rows=2, cols=3, specs=specs, subplot_titles=['Number of Impressions', 'Website Clicks', "Content Viewed",
                                                                 "Added to Cart", "Amount Spent", "Reach"])

# Data values
labels = ["Control Campaign", "Test Campaign"]
values = [sum(control_data["Number of Impressions"]), sum(test_data["Number of Impressions"])]
values1 = [sum(control_data["Website Clicks"]), sum(test_data["Website Clicks"])]
values2 = [sum(control_data["Content Viewed"]), sum(test_data["Content Viewed"])]
values3 = [sum(control_data["Added to Cart"]), sum(test_data["Added to Cart"])]
values4 = [sum(control_data["Amount Spent"]), sum(test_data["Amount Spent"])]
values5 = [sum(control_data["Reach"]), sum(test_data["Reach"])]

# Add pie charts
fig.add_trace(go.Pie(labels=labels, values=values, marker=dict(colors=[green_color, purple_color])),
              1, 1)
fig.add_trace(go.Pie(labels=labels, values=values1, marker=dict(colors=[green_color, purple_color])),
              1, 2)
fig.add_trace(go.Pie(labels=labels, values=values2, marker=dict(colors=[green_color, purple_color])),
              1, 3)
fig.add_trace(go.Pie(labels=labels, values=values3, marker=dict(colors=[green_color, purple_color])),
              2, 1)
fig.add_trace(go.Pie(labels=labels, values=values4, marker=dict(colors=[green_color, purple_color])),
              2, 2)
fig.add_trace(go.Pie(labels=labels, values=values5, marker=dict(colors=[green_color, purple_color])),
              2, 3)

# Update layout
fig.update_layout(showlegend=True, width=700, height=400, title_x=0.5,
                  margin=dict(l=20, r=20, t=40, b=40))
fig.show()


The test campaign drew more searches and more website clicks (180,970 versus about 159,624). The control campaign's audience, however, viewed more content (about 58,314 versus 55,740), so control visitors engaged more with the site despite clicking through less often.

Control visitors also added more products to the cart (39,000 versus 26,446), and the test campaign spent more money. Total purchases are almost identical: about 15,684 for control versus 15,637 for test, a difference of roughly 0.3%. On these totals, the control campaign produced slightly more sales, more content views, and more cart additions for less spend, so it looks more cost-effective than the test campaign.
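
The size of each gap can be made concrete as a relative difference. The sketch below hard-codes the group totals from the summary table above; the helper name `relative_diff_pct` is hypothetical, and the control totals include one mean-imputed day.

```python
# Group totals copied from the summary table (control includes one imputed day)
control_totals = {
    "Purchases": 15683.793103,
    "Number of Impressions": 3286793.0,
    "Website Clicks": 159623.793103,
    "Content Viewed": 58313.793103,
    "Added to Cart": 39000.0,
}
test_totals = {
    "Purchases": 15637.0,
    "Number of Impressions": 2237544.0,
    "Website Clicks": 180970.0,
    "Content Viewed": 55740.0,
    "Added to Cart": 26446.0,
}


def relative_diff_pct(control, test):
    """Percentage by which the control total exceeds the test total."""
    return (control - test) / test * 100


diffs = {
    metric: relative_diff_pct(control_totals[metric], test_totals[metric])
    for metric in control_totals
}

for metric, pct in diffs.items():
    print(f"{metric}: {pct:+.2f}%")
```

A positive value means the control campaign is ahead on that metric; a negative value means the test campaign is ahead.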


How do website clicks, content views, cart additions, and purchases relate to one another in each campaign?

In [170]:
# create scatter plot to show the relationship between variables

# Keys must match the values in the "Group" column exactly ('Test', 'Control')
color_map = {'Test': 'green', 'Control': 'purple'}
figure = px.scatter(data_frame = df, 
                    x="Content Viewed",
                    y="Website Clicks", 
                    color= "Group", 
                    trendline="ols",
                    title="Content Viewed vs Website Clicks",
                    color_discrete_map=color_map)
figure.show()

figure = px.scatter(data_frame = df, 
                    x="Added to Cart",
                    y="Content Viewed", 
                    color= "Group", 
                    trendline="ols",
                    title="Added to Cart vs Content Viewed",
                    color_discrete_map=color_map)
figure.show()


figure = px.scatter(data_frame=df, 
                    x="Purchases",
                    y="Added to Cart", 
                    color="Group", 
                    trendline="ols",
                    title="Purchases vs Added to Cart",
                    color_discrete_map=color_map)

figure.show()

## Hypothesis

H0: M1 = M2. The null hypothesis (H0) is that mean daily purchases are the same in both groups.

H1: M1 != M2. The alternative hypothesis (H1) is that mean daily purchases differ between the groups. This is a two-sided test, so it does not say which group performs better.

Perform a two-sample t-test on purchases and calculate the p-value.

In [171]:
control_group = df[df['Group'] == 'Control']['Purchases']
test_group = df[df['Group'] == 'Test']['Purchases']

# Perform a two-sided, two-sample t-test
t_stat, p_value = stats.ttest_ind(control_group, test_group)

print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

if p_value < 0.05: 
    print("\nThere is a significant difference between both Campaigns.")
    print("\nReject the null hypothesis.")
else:
    print("\nThere is no significant difference between both Campaigns.")
    print("Null Hypothesis is not rejected")

T-statistic: 0.03066909523750146
P-value: 0.9756387309702421

There is no significant difference between both Campaigns.
Null Hypothesis is not rejected
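
By default, `stats.ttest_ind` assumes equal variances in the two groups and uses a pooled-variance statistic. The following is a minimal sketch of that calculation in plain Python, meant only to show where the t value comes from. The helper name `pooled_t_statistic` and the toy samples are illustrative and are not part of the notebook.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance (hypothetical helper)."""
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.mean(a), statistics.mean(b)
    # statistics.variance uses the sample (n - 1) denominator
    var1, var2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err


# Toy samples, not the campaign data
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 3, 4, 5, 6]
t_value = pooled_t_statistic(sample_a, sample_b)
print(t_value)
```

A t value close to zero, like the 0.03 reported above, means the gap between the group means is tiny relative to the day-to-day variation in purchases.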


According to the A/B test results, there is no statistically significant difference in purchases between the two campaigns. With only 30 daily observations per group, collecting more data could give a clearer result. Other KPIs, such as conversion rate and click-through rate, are also worth taking into consideration.

Click-Through Rate (CTR) is the percentage of individuals who view a web page (impressions) and then click on a specific advertisement that appears on that page. It measures how successful an ad has been in capturing users' attention. The higher the click-through rate, the more successful the ad has been in generating interest.

Conversion Rate is the ratio of users who take a desired action (e.g., making a purchase) to the total number of users who clicked on the ad.

CPC (cost per click) is a metric that determines how much advertisers pay for the ads they place on websites or social media, based on the number of clicks the ad receives. CPC is important for marketers to consider because it measures the price paid per click in a brand's paid advertising campaigns.

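Return On Investment (ROI) provides an overview of the effectiveness of the advertising campaign by comparing the return with the amount spent. The dataset has no revenue column, so the calculation below uses the number of purchases as a stand-in for revenue. The resulting values are useful only for comparing the two groups, not as true profitability figures.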
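
As a worked example, the four definitions can be applied to the first control-group row (1.08.2019: 2,280 USD spent, 82,702 impressions, 7,016 website clicks, 618 purchases). The function name `compute_kpis` is illustrative, and the ROI line uses the purchase count in place of revenue, as the notebook's own formula does.

```python
def compute_kpis(spend, impressions, clicks, purchases):
    """Return CTR, conversion rate, CPC and ROI for one row (hypothetical helper)."""
    return {
        "CTR": clicks / impressions * 100,              # % of impressions that became clicks
        "Conversion Rate": purchases / clicks * 100,    # % of clicks that became purchases
        "CPC": spend / clicks,                          # USD paid per click
        # Assumption carried over from the notebook: purchases stand in for revenue
        "ROI": (purchases - spend) / spend * 100,
    }


# First row of the control campaign
first_row_kpis = compute_kpis(spend=2280, impressions=82702, clicks=7016, purchases=618)

for name, value in first_row_kpis.items():
    print(f"{name}: {value:.6f}")
```

These values should agree with the first row of the KPI table computed in the next cell.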

In [172]:
df['CTR'] = ((df['Website Clicks'] / df['Number of Impressions']) * 100)
df['Conversion Rate'] = (df['Purchases'] / df['Website Clicks']) * 100
df['CPC'] = df['Amount Spent'] / df['Website Clicks']
df['ROI'] = ((df['Purchases'] - df['Amount Spent']) / df['Amount Spent']) * 100

df[['CTR', 'Conversion Rate','CPC','ROI']].head()

Unnamed: 0,CTR,Conversion Rate,CPC,ROI
0,8.483471,8.808438,0.324971,-72.894737
1,6.700264,6.300863,0.216646,-70.916335
2,4.941121,5.716042,0.360018,-84.122919
3,4.205659,11.092985,0.632953,-82.474227
4,4.856521,9.825473,0.344873,-71.509913


In the first five rows shown, the CTR values range from approximately 4.21% to 8.48%. A higher CTR means a larger share of impressions turned into clicks.

Conversion Rate ranges from around 5.72% to 11.09% in the same rows. A higher Conversion Rate is generally desirable, as it indicates a higher percentage of users completing the desired action.

The CPC values range from approximately 0.22 to 0.63 USD. Lower CPC values suggest more cost-effective advertising, as each click costs less on average.

The ROI values are all negative. Because ROI here is computed from purchase counts rather than revenue, this mainly reflects that the number of purchases is much smaller than the number of dollars spent; it does not by itself show that the campaigns lost money.

In [173]:
# Average each KPI per group (mean of the daily values)
df_KPIs=df.groupby("Group").agg({"CTR":"mean","Conversion Rate":"mean","CPC":"mean","ROI":"mean"})
df_KPIs

Group,CTR,Conversion Rate,CPC,ROI
Control,5.087893,11.422146,0.489907,-76.619613
Test,10.24226,9.231182,0.468718,-79.342253
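
The group values above are means of daily ratios, which weight every day equally regardless of traffic. An alternative is to divide the group totals, which weights each day by its volume. The sketch below does this for CTR and Conversion Rate using the totals from the earlier summary table; the variable and function names are hypothetical, and the control totals include one mean-imputed day.

```python
# Group totals copied from the earlier summary table
totals = {
    "Control": {"impressions": 3286793.0, "clicks": 159623.793103, "purchases": 15683.793103},
    "Test": {"impressions": 2237544.0, "clicks": 180970.0, "purchases": 15637.0},
}


def pooled_rates(group_totals):
    """CTR and conversion rate computed from totals (ratio of sums)."""
    return {
        "CTR": group_totals["clicks"] / group_totals["impressions"] * 100,
        "Conversion Rate": group_totals["purchases"] / group_totals["clicks"] * 100,
    }


pooled = {group: pooled_rates(values) for group, values in totals.items()}

for group, rates in pooled.items():
    print(group, {name: round(value, 2) for name, value in rates.items()})
```

On these totals the pooled CTR is about 4.9% for control and 8.1% for test, both lower than the per-day means, which suggests a few low-traffic days with high ratios pull the daily averages up. The ranking of the two groups is the same either way.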


## Conclusion and Recommendation

In conclusion, the t-test on daily purchases shows no statistically significant difference between the two campaigns: the p-value of about 0.976 is far above the 0.05 threshold, so the null hypothesis is not rejected. This does not prove that the campaigns perform identically; it means the data provide no evidence of a difference in purchases.

Examining specific metrics, the test group has a much higher average Click-Through Rate (CTR), 10.24% versus 5.09%, signaling stronger user engagement with the ads. The control group has the higher average Conversion Rate, 11.42% versus 9.23%, so a click from the control campaign is somewhat more likely to end in a purchase.

The test campaign also has a slightly lower average Cost Per Click (CPC), about 0.469 versus 0.490 USD, so it buys clicks a little more cheaply. That advantage does not carry through to ROI as computed here: both groups are negative, and the test campaign's average (-79.34%) is slightly worse than the control campaign's (-76.62%).

On sales and engagement totals, the control campaign is ahead on content views and cart additions, although its lead in purchases is marginal (about 0.3%). The test campaign, however, converts a larger share of cart additions into purchases (roughly 59% versus 40%), which makes it the more efficient of the two at the final step of the funnel.
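
The cart-to-purchase comparison can be checked directly from the group totals in the summary table. This is a small sketch with hypothetical names; the control totals include one mean-imputed day.

```python
# Totals copied from the summary table: (added to cart, purchases)
funnel_totals = {
    "Control": (39000.0, 15683.793103),
    "Test": (26446.0, 15637.0),
}

# Share of cart additions that ended in a purchase, in percent
cart_to_purchase = {
    group: purchases / added_to_cart * 100
    for group, (added_to_cart, purchases) in funnel_totals.items()
}

for group, rate in cart_to_purchase.items():
    print(f"{group}: {rate:.1f}% of cart additions became purchases")
```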

In light of these findings, a strategic approach can be adopted. The test campaign appears better suited to marketing specific products to a targeted audience, given its higher CTR and its higher rate of turning cart additions into purchases. The control campaign appears better suited to marketing multiple products to a broader audience, given its larger volume of content views and cart additions. Because the difference in purchases is not statistically significant, these recommendations should be treated as tentative until more data is collected.