
The watch company is embarking on a study to enhance its sales by investigating a new watch screen design. They've initiated two types of campaigns: a control campaign, likely featuring the existing watch screen design (A), and a test campaign with the new design (B).

# **Key Features of the Dataset:**
- **Campaign Name**: Identifies each advertising campaign (control or test).
- **Date**: Records the date of each entry, one row per day.
- **Spend**: The amount invested in the campaign that day (in US dollars).
- **Impressions and Reach**: Impressions count how many times the ads were shown; Reach counts the unique users who saw them.
- **Website Clicks, Searches, View Content, Add to Cart, Purchase**: Capture user interactions during the campaigns, providing insight into engagement and conversion.

# **Hypothesis:**
Our project revolves around a clear hypothesis:

**Null Hypothesis (H0)**: The new watch screen design does not significantly improve marketing campaign performance, measured by user engagement and sales metrics, compared to the existing design.

**Alternative Hypothesis (H1):** The new watch screen design surpasses the existing design, leading to a noticeable enhancement in user engagement and sales metrics in marketing campaigns.
<br />
<br />
To ascertain the more effective watch screen design for enhancing sales and engagement, conducting an A/B test is necessary. The resulting analysis will guide the company in determining how to showcase its watches, thus shaping future marketing strategies.

In [1]:
from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1sagNYoHyI7His5AWhfv7YVUyglK4hDw_',
                                    dest_path='/AB new watch dataset/control_group')

gdd.download_file_from_google_drive(file_id='19dSjrumw9k4fDzezIyY3l0eeRNH63Auz',
                                    dest_path='/AB new watch dataset/test_group')

Downloading 1sagNYoHyI7His5AWhfv7YVUyglK4hDw_ into /AB new watch dataset/control_group... Done.
Downloading 19dSjrumw9k4fDzezIyY3l0eeRNH63Auz into /AB new watch dataset/test_group... Done.


# Load Control Campaign Data

In [3]:
import pandas as pd
control_df = pd.read_csv('/AB new watch dataset/control_group', sep=';')
control_df.head(5)

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


# Load Test Campaign Data

In [5]:
test_df = pd.read_csv('/AB new watch dataset/test_group', sep=';')
test_df.head(5)

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


# Simplify Column Names for Control Campaign Data

In [7]:
control_df.columns = ['Campaign Name', 'Date', 'Amount Spent', 'Impressions', 'Reach', 'Number of Clicks', 'Number of Searches',
                    'Number of views', 'Number Added to cart', 'Purchase Number']
control_df.head(5)

Unnamed: 0,Campaign Name,Date,Amount Spent,Impressions,Reach,Number of Clicks,Number of Searches,Number of views,Number Added to cart,Purchase Number
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


# Simplify Column Names for Test Campaign Data

In [8]:
test_df.columns = ['Campaign Name', 'Date', 'Amount Spent', 'Impressions', 'Reach', 'Number of Clicks', 'Number of Searches',
                    'Number of views', 'Number Added to cart', 'Purchase Number']
test_df.head(5)

Unnamed: 0,Campaign Name,Date,Amount Spent,Impressions,Reach,Number of Clicks,Number of Searches,Number of views,Number Added to cart,Purchase Number
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768


# Check for Missing Values in Control Campaign Data

In [10]:
null_sum_controldf = control_df.isna().sum()
null_sum_controldf

Campaign Name           0
Date                    0
Amount Spent            0
Impressions             1
Reach                   1
Number of Clicks        1
Number of Searches      1
Number of views         1
Number Added to cart    1
Purchase Number         1
dtype: int64

# Handle Missing Values in Control Campaign Data
We fill the missing values with the mean of each column. All of them sit in a single row (5.08.2019), where only the spend was recorded.

In [11]:
# Columns affected by the single incomplete day (5.08.2019).
metric_columns = ['Impressions', 'Reach', 'Number of Clicks', 'Number of Searches',
                  'Number of views', 'Number Added to cart', 'Purchase Number']

# Assign the filled column back rather than calling fillna(inplace=True)
# on a column selection, which may not update the DataFrame in newer pandas.
for column in metric_columns:
    control_df[column] = control_df[column].fillna(control_df[column].mean())
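
One side effect of mean imputation is worth keeping in mind when reading the summary statistics later on: filling a gap with the column mean leaves the mean unchanged but slightly shrinks the standard deviation. The sketch below illustrates this on a small made-up series (the values are hypothetical and are not taken from the campaign data).

```python
import pandas as pd

# Hypothetical toy series with one missing entry (not campaign data).
clicks = pd.Series([10.0, 20.0, None, 30.0])

mean_before = clicks.mean()   # NaN is skipped by default
std_before = clicks.std()     # sample standard deviation (ddof=1)

# Fill the gap with the column mean, assigning the result to a new series.
filled = clicks.fillna(mean_before)

mean_after = filled.mean()
std_after = filled.std()

print(f"mean: {mean_before:.2f} -> {mean_after:.2f}")
print(f"std:  {std_before:.2f} -> {std_after:.2f}")
```

With only one imputed row out of 30, the effect on the control group is small, but it would grow if more rows were missing.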


# Check for Missing Values in Test Campaign Data

In [12]:
null_sum_testdf = test_df.isnull().sum()
null_sum_testdf.head(5)

Campaign Name    0
Date             0
Amount Spent     0
Impressions      0
Reach            0
dtype: int64

# Understand Control Campaign Data Numbers
We want to find out important details like average spending, ad reach, clicks, and more.

In [13]:
control_describe = control_df.describe()
control_describe

Unnamed: 0,Amount Spent,Impressions,Reach,Number of Clicks,Number of Searches,Number of views,Number Added to cart,Purchase Number
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2288.433333,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103
std,367.334451,21311.695472,21452.627592,1726.803732,851.025795,764.021907,400.371207,181.810508
min,1757.0,71274.0,42859.0,2277.0,1001.0,848.0,442.0,222.0
25%,1945.5,95191.25,75300.25,4122.25,1629.25,1249.0,942.5,375.5
50%,2299.5,112368.0,91418.0,5272.396552,2340.0,1979.5,1319.5,506.0
75%,2532.0,121259.0,101958.75,6609.5,2655.0,2360.5,1638.0,663.25
max,3083.0,145248.0,127852.0,8137.0,4891.0,4219.0,1913.0,800.0


# Understand Test Campaign Data Numbers

In [14]:
test_describe = test_df.describe()
test_describe

Unnamed: 0,Amount Spent,Impressions,Reach,Number of Clicks,Number of Searches,Number of views,Number Added to cart,Purchase Number
count,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
mean,2563.066667,74584.8,53491.566667,6032.333333,2418.966667,1858.0,881.533333,521.233333
std,348.687681,32121.377422,28795.775752,1708.567263,388.742312,597.654669,347.584248,211.047745
min,1968.0,22521.0,10598.0,3038.0,1854.0,858.0,278.0,238.0
25%,2324.5,47541.25,31516.25,4407.0,2043.0,1320.0,582.5,298.0
50%,2584.0,68853.5,44219.5,6242.5,2395.5,1881.0,974.0,500.0
75%,2836.25,99500.0,78778.75,7604.75,2801.25,2412.0,1148.5,701.0
max,3112.0,133771.0,109834.0,8264.0,2978.0,2801.0,1391.0,890.0


# Understand Purchase Number Distribution

We run a Shapiro-Wilk normality test on Purchase Number for both groups and collect the results in a table.

In [16]:
from scipy.stats import shapiro

# Run the Shapiro-Wilk test on each group's purchase numbers.
test_stat_control, p_value_control = shapiro(control_df['Purchase Number'])
test_stat_test, p_value_test = shapiro(test_df['Purchase Number'])

# Collect both results in one table.
shapiro_results = pd.DataFrame({
    'Group': ['Control', 'Test'],
    'Test Statistic': [test_stat_control, test_stat_test],
    'P-value': [p_value_control, p_value_test],
})

shapiro_results

Unnamed: 0,Group,Test Statistic,P-value
0,Control,0.943273,0.111445
1,Test,0.918189,0.024078


When the p-value is higher than 0.05 (a common significance level), we don't have enough evidence to say the data deviates from a normal distribution. When the p-value is lower than 0.05, the data deviates significantly from a normal distribution. Here the control group (p ≈ 0.111) shows no significant departure from normality, while the test group (p ≈ 0.024) does, so a t-test on this metric should be read with some caution.

# Comparing Purchase Numbers with a T-Test

In [17]:
from scipy import stats

t_stat, p_value = stats.ttest_ind(control_df['Purchase Number'], test_df['Purchase Number'])
t_stat, p_value

(0.03066909523750146, 0.9756387309702421)
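
Since the Shapiro-Wilk check flagged the test group's purchase numbers as non-normal, a rank-based test can serve as a cross-check on the t-test. The sketch below wraps SciPy's `mannwhitneyu` and the Welch variant of `ttest_ind` in a hypothetical helper, `compare_groups`, and runs it on synthetic data; in the notebook one would pass `control_df['Purchase Number']` and `test_df['Purchase Number']` instead. This is an optional addition, not part of the original analysis.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b):
    """Return p-values from Welch's t-test and the Mann-Whitney U test.

    Hypothetical helper, for illustration only.
    """
    # Welch's t-test does not assume equal variances in the two groups.
    _, p_welch = stats.ttest_ind(a, b, equal_var=False)
    # Mann-Whitney U compares ranks, so it does not assume normality.
    _, p_mwu = stats.mannwhitneyu(a, b, alternative='two-sided')
    return {'welch_t': p_welch, 'mann_whitney_u': p_mwu}


# Synthetic stand-in data (NOT the campaign data): two samples of 30 values.
rng = np.random.default_rng(seed=0)
sample_a = rng.normal(loc=520, scale=180, size=30)
sample_b = rng.normal(loc=520, scale=210, size=30)

p_values = compare_groups(sample_a, sample_b)
print(p_values)
```

If both tests agree with the original t-test, the conclusion does not hinge on the normality assumption.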

# Compare Cost per Conversion in Control and Test Campaigns

We compute the 'Cost per Conversion' (amount spent divided by purchase number) for each day in both groups, then average it per group to see which campaign spends less per purchase.

In [18]:
control_df['Cost per Conversion'] = control_df['Amount Spent'] / control_df['Purchase Number']

test_df['Cost per Conversion'] = test_df['Amount Spent'] / test_df['Purchase Number']

average_cost_control = control_df['Cost per Conversion'].mean()
average_cost_test = test_df['Cost per Conversion'].mean()

average_cost_control,average_cost_test

(5.000927131911764, 5.899589404475941)
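
The figures above average the daily cost-per-conversion ratios. Another common convention is to divide total spend by total purchases, which weights each day by its volume. As a rough cross-check, the sketch below derives that pooled figure from the mean spend and mean purchase number reported by `describe()` earlier. Within each group the ratio of the two means equals total spend divided by total purchases, because both means are taken over the same rows. The variable names are illustrative, and the control means include the imputed row.

```python
# Means copied from the describe() outputs above.
control_mean_spend, control_mean_purchases = 2288.433333, 522.793103
test_mean_spend, test_mean_purchases = 2563.066667, 521.233333

# Ratio of means within a group = total spend / total purchases.
pooled_cost_control = control_mean_spend / control_mean_purchases
pooled_cost_test = test_mean_spend / test_mean_purchases

print(f"control: {pooled_cost_control:.2f} USD per purchase")
print(f"test:    {pooled_cost_test:.2f} USD per purchase")
```

Under this convention the control campaign still comes out cheaper per purchase, so the ordering does not depend on the averaging choice.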

# Merge Datasets for In-Depth Analysis

The t-test shows no significant difference in purchase numbers between the two campaigns (p ≈ 0.98), and the control group has the lower average cost per conversion (about 5.00 USD versus 5.90 USD). More observations might give clearer results. Other key performance indicators (KPIs) such as **Conversion Rate**, **Click-Through Rate (CTR)**, **Cost Per Click (CPC)**, and **Return On Investment (ROI)** could also give a better overall view.

Before computing them, we merge the two datasets so the new metrics can be added in one place.

In [19]:
merged = pd.concat([control_df, test_df], ignore_index=True)
merged.head(5)

Unnamed: 0,Campaign Name,Date,Amount Spent,Impressions,Reach,Number of Clicks,Number of Searches,Number of views,Number Added to cart,Purchase Number,Cost per Conversion
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0,3.68932
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0,3.438356
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0,6.298387
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0,5.705882
4,Control Campaign,5.08.2019,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103,3.509993


# Enhance Dataset with CTR, Conversion Rate, CPC, and ROI

We add four metrics: Click-Through Rate (CTR), Conversion Rate, Cost Per Click (CPC), and Return On Investment (ROI). They give a clearer picture of campaign performance.

**Click-Through Rate (CTR)**: The percentage of impressions that led to a website click (clicks divided by impressions). A higher CTR means the ad was more successful at generating interest.

**Conversion Rate**: The percentage of users who clicked on the ad and then took the desired action, here a purchase (purchases divided by clicks).

**CPC (Cost Per Click)**: The average amount spent per click (amount spent divided by clicks). It tells marketers what they pay for each visit from a paid campaign.

**Return On Investment (ROI)**: Net return relative to spend. The dataset has no revenue column, so the code below uses the purchase count in place of revenue, which amounts to assuming one dollar per purchase. The resulting values are therefore only a rough relative indicator, not a true ROI.

In [20]:
merged['CTR'] = (merged['Number of Clicks'] / merged['Impressions']) * 100
merged['Conversion Rate'] = (merged['Purchase Number'] / merged['Number of Clicks']) * 100
merged['CPC'] = merged['Amount Spent'] / merged['Number of Clicks']
merged['ROI'] = ((merged['Purchase Number'] - merged['Amount Spent']) / merged['Amount Spent']) * 100
merged.head(5)

Unnamed: 0,Campaign Name,Date,Amount Spent,Impressions,Reach,Number of Clicks,Number of Searches,Number of views,Number Added to cart,Purchase Number,Cost per Conversion,CTR,Conversion Rate,CPC,ROI
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0,3.68932,8.483471,8.808438,0.324971,-72.894737
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0,3.438356,6.700264,6.300863,0.216646,-70.916335
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0,6.298387,4.941121,5.716042,0.360018,-84.122919
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0,5.705882,4.205659,11.092985,0.632953,-82.474227
4,Control Campaign,5.08.2019,1835,109559.758621,88844.931034,5320.793103,2221.310345,1943.793103,1300.0,522.793103,3.509993,4.856521,9.825473,0.344873,-71.509913
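
As a sanity check on the formulas, the first control row (1.08.2019) can be recomputed by hand. The snippet below uses the values shown in that row and should reproduce its CTR, Conversion Rate, CPC, and ROI columns up to rounding.

```python
# Values from the first control row (1.08.2019) in the table above.
amount_spent = 2280
impressions = 82702
clicks = 7016
purchases = 618

ctr = clicks / impressions * 100              # percent of impressions clicked
conversion_rate = purchases / clicks * 100    # percent of clicks that purchased
cpc = amount_spent / clicks                   # USD per click
# Same formula as the notebook: the purchase COUNT stands in for revenue.
roi = (purchases - amount_spent) / amount_spent * 100

print(round(ctr, 6), round(conversion_rate, 6), round(cpc, 6), round(roi, 6))
```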


# Compare Metrics in Control vs. Test Campaigns

We run an independent two-sample t-test on CTR, Conversion Rate, and CPC to check whether the differences between the campaigns are statistically significant.

In [21]:
metrics_to_test = ['CTR', 'Conversion Rate', 'CPC']

results_list = []

for metric in metrics_to_test:
    t_stat, p_value = stats.ttest_ind(
        merged[merged['Campaign Name'] == 'Control Campaign'][metric],
        merged[merged['Campaign Name'] == 'Test Campaign'][metric]
    )

    metric_results = pd.DataFrame({
        'Metric': [metric],
        'T-Statistic': [t_stat],
        'P-Value': [p_value]
    })

    results_list.append(metric_results)

t_test_results = pd.concat(results_list, ignore_index=True)

t_test_results

Unnamed: 0,Metric,T-Statistic,P-Value
0,CTR,-3.99625,0.000184
1,Conversion Rate,1.488079,0.142147
2,CPC,0.410837,0.682706


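In statistical hypothesis testing, the p-value is a measure of the evidence against a null hypothesis. A small p-value (typically ≤ 0.05) suggests strong evidence against the null hypothesis, leading to its rejection. Conversely, a larger p-value suggests weak evidence against the null hypothesis. Here only CTR has a p-value below 0.05. Since the control group is the first argument to `ttest_ind`, the negative t-statistic means the control group's mean CTR is lower than the test group's.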
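
Because three metrics are tested at once, the chance of at least one false positive is higher than 5%. One simple, conservative guard is a Bonferroni correction, which compares each p-value against 0.05 divided by the number of tests. The sketch below applies it to the p-values reported in the table above; it is an optional addition rather than part of the original analysis.

```python
# P-values copied from the t-test results table above.
p_values = {'CTR': 0.000184, 'Conversion Rate': 0.142147, 'CPC': 0.682706}

alpha = 0.05
# Bonferroni: divide the significance level by the number of tests.
adjusted_alpha = alpha / len(p_values)

significant = {metric: p <= adjusted_alpha for metric, p in p_values.items()}

for metric, is_significant in significant.items():
    print(f"{metric}: p={p_values[metric]:.6f}, significant={is_significant}")
```

CTR remains significant under the stricter threshold of about 0.0167, while Conversion Rate and CPC do not.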

# Summary

Using purchase number as the metric, we found no significant difference between the two campaigns (p ≈ 0.98). On cost, the control group came out ahead, with a lower average cost per conversion (about 5.00 USD versus 5.90 USD), although this difference was not tested for significance. The additional metrics showed no statistically significant difference in conversion rate (p ≈ 0.14) or cost per click (p ≈ 0.68). Click-through rate was the exception: the t-statistic was negative with a p-value below 0.05, and because the control group was passed first to the test, this indicates a significantly higher CTR in the test group. Overall, we fail to reject the null hypothesis for the sales-related metrics: the new design attracted more clicks per impression, but it did not improve purchase number, conversion rate, or cost per click, and it cost more per conversion.