# A/B test

An A/B test is a controlled experiment that compares two versions of a variable or treatment, the "A" version (the control) and the "B" version (the variant), to determine which performs better on a predefined metric such as conversion rate, user engagement, or sales. Users or subjects are randomly divided into two groups, each group receives one version, and the outcomes are measured and compared. The purpose is to identify the more effective treatment from data, in order to optimize processes, improve performance, or guide strategic decisions. Random assignment balances known and unknown confounding factors across the groups on average, so a difference in outcomes can be attributed to the treatment rather than to pre-existing differences between the groups.
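
As a minimal sketch of the random assignment described above, the snippet below shuffles a list of user IDs and splits it in half. The function name `assign_groups`, the integer user IDs, and the fixed seed are illustrative assumptions; they are not part of the dataset used in this notebook.

```python
import numpy as np

def assign_groups(user_ids, seed=0):
    """Randomly split user IDs into two near-equal groups (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Shuffle so that group membership is independent of the original ordering
    shuffled = rng.permutation(user_ids)
    half = len(shuffled) // 2
    return {"control": shuffled[:half].tolist(), "test": shuffled[half:].tolist()}

# Ten made-up user IDs, for illustration only
groups = assign_groups(list(range(10)))
print(groups)
```

Fixing the seed makes the split reproducible; a production system would more likely hash a stable user identifier so that the same user always lands in the same group.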


The primary objective of this analysis is to compare the control and test groups on key metrics such as impressions, clicks, and conversions, and to determine whether the observed differences are statistically significant, which indicates how effective the marketing strategy applied to each group was. The analysis uses a well-known A/B test dataset from Kaggle that includes daily performance indicators for a control group and a test group. Using hypothesis tests that compare group means, followed by a Bayesian comparison of click rates, we aim to draw actionable conclusions for marketing decision-making.

In [11]:
import kagglehub
import os
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Download the Kaggle dataset
path = kagglehub.dataset_download("amirmotefaker/ab-testing-dataset")
print("Path to dataset files:", path)

# Print the list of files in the downloaded directory
files = os.listdir(path)
print("Csv files:", files)

path1 = os.path.join(path,'control_group.csv')
path2 = os.path.join(path,'test_group.csv')

control_group = pd.read_csv(path1, sep=';')
test_group = pd.read_csv(path2, sep=';')


Path to dataset files: /root/.cache/kagglehub/datasets/amirmotefaker/ab-testing-dataset/versions/1
Csv files: ['control_group.csv', 'test_group.csv']


## Data preprocessing

Data preprocessing is a crucial step in data analysis: without it, unclear column names, unparsed dates, and missing values can distort the results. First, I renamed each column, parsed the 'date' column as a datetime, and replaced missing values with the column mean.

In [12]:
control_group.head()

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,


In [13]:
# Rename the columns of both DataFrames to short snake_case names
columns = ['group', 'date', 'cost', 'impression_count', 'reach_count', 'click_count',
           'search_count', 'view_count', 'cart_add_count', 'purchase_count']
control_group.columns = columns
test_group.columns = columns

# Check
print(control_group.columns)
print(test_group.columns)

Index(['group', 'date', 'cost', 'impression_count', 'reach_count',
       'click_count', 'search_count', 'view_count', 'cart_add_count',
       'purchase_count'],
      dtype='object')
Index(['group', 'date', 'cost', 'impression_count', 'reach_count',
       'click_count', 'search_count', 'view_count', 'cart_add_count',
       'purchase_count'],
      dtype='object')


In [14]:
control_group['date'] = pd.to_datetime(control_group['date'], dayfirst=True)
test_group['date'] = pd.to_datetime(test_group['date'], dayfirst=True)

print('control:', control_group['date'].dtype)
print('test:', test_group['date'].dtype)

control: datetime64[ns]
test: datetime64[ns]


In [15]:
control_group.isna().sum()

Unnamed: 0,0
group,0
date,0
cost,0
impression_count,1
reach_count,1
click_count,1
search_count,1
view_count,1
cart_add_count,1
purchase_count,1


In [16]:
test_group.isna().sum()

Unnamed: 0,0
group,0
date,0
cost,0
impression_count,0
reach_count,0
click_count,0
search_count,0
view_count,0
cart_add_count,0
purchase_count,0


In [17]:
# Fill missing values in each count column with that column's mean
count_columns = ['impression_count', 'reach_count', 'click_count', 'search_count',
                 'view_count', 'cart_add_count', 'purchase_count']
control_group[count_columns] = control_group[count_columns].fillna(control_group[count_columns].mean())

In [18]:
control_group.isna().sum()

Unnamed: 0,0
group,0
date,0
cost,0
impression_count,0
reach_count,0
click_count,0
search_count,0
view_count,0
cart_add_count,0
purchase_count,0


## Exploratory data analysis

In this section, I summarize each variable with descriptive statistics, use box plots to show its distribution and outliers, and use bar charts to compare its mean and sum between the two groups.

### Visualize summary statistics

In [None]:
# Check the data types of each column in the DataFrame
print(control_group.dtypes)
print(test_group.dtypes)

In [None]:
# Since the 'group' column is of object type, drop it along with the 'date' column and then run describe
control_clean = control_group.drop(columns=['group', 'date'])
test_clean = test_group.drop(columns=['group', 'date'])

# Now, running describe() will display only numeric columns
control_summary = control_clean.describe()
test_summary = test_clean.describe()

In [None]:
control_summary

In [None]:
test_summary

In [None]:
import matplotlib.pyplot as plt

# Draw box plots for each variable in Control Group and Test Group
num_vars = len(control_clean.columns)  # Number of variables
fig, axes = plt.subplots(nrows=num_vars, ncols=1, figsize=(8, 3 * num_vars), sharex=True)

# Create box plots for each variable
for i, var in enumerate(control_clean.columns):
    # Merge the data of Control Group and Test Group to create the box plot
    data = [control_clean[var], test_clean[var]]

    axes[i].boxplot(data, labels=['Control Group', 'Test Group'], patch_artist=True)
    axes[i].set_title(f"Box Plot of {var} in Control and Test Groups")
    axes[i].set_ylabel(var)

plt.tight_layout()
plt.show()
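
The box plots above draw as outliers the points that fall beyond the whiskers, which by default in Matplotlib (`whis=1.5`) reach at most 1.5 times the interquartile range past the first and third quartiles. The sketch below applies the same rule to list those points explicitly. The helper name `iqr_outliers` and the sample values are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - whis*IQR, Q3 + whis*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]

# Made-up sample with one obvious outlier
sample = pd.Series([10, 12, 11, 13, 12, 11, 50])
print(iqr_outliers(sample))
```

Applied to a column such as `control_clean['click_count']`, this would list the same points that the corresponding box plot draws beyond its whiskers.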


In [None]:
# Calculate the mean for each variable, excluding 'date' and 'group'
control_means = control_clean.mean()
test_means = test_clean.mean()

# Combine the means into a DataFrame for easier iteration
comparison_means_df = pd.DataFrame({'Control Group Mean': control_means, 'Test Group Mean': test_means})

# Set up the subplots
num_vars = len(comparison_means_df)
fig, axes = plt.subplots(nrows=num_vars, ncols=1, figsize=(8, 3 * num_vars), sharex=True)

# Plot each variable in a separate subplot
for i, var in enumerate(comparison_means_df.index):
    comparison_means_df.loc[var].plot(kind='bar', ax=axes[i], color=['skyblue', 'orange'])
    axes[i].set_title(f"Mean of {var} in Control and Test Groups")
    axes[i].set_ylabel("Mean")
    axes[i].set_xticklabels(['Control Group', 'Test Group'], rotation=0)

plt.tight_layout()
plt.show()



In [None]:
# Calculate the sum for each variable, excluding 'date' and 'group'
control_sums = control_clean.sum()
test_sums = test_clean.sum()

# Combine the sums into a DataFrame for easier iteration
comparison_df = pd.DataFrame({'Control Group': control_sums, 'Test Group': test_sums})

# Set up the subplots
num_vars = len(comparison_df)
fig, axes = plt.subplots(nrows=num_vars, ncols=1, figsize=(8, 3 * num_vars), sharex=True)

# Plot each variable in a separate subplot
for i, var in enumerate(comparison_df.index):
    comparison_df.loc[var].plot(kind='bar', ax=axes[i], color=['blue', 'orange'])
    axes[i].set_title(f"Sum of {var} in Control and Test Groups")
    axes[i].set_ylabel("Sum")
    axes[i].set_xticklabels(['Control Group', 'Test Group'], rotation=0)

plt.tight_layout()
plt.show()

## Checking Normality for Each Variable

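In this section, I checked each variable in both groups for normality with the Shapiro-Wilk test and for equality of variances with Levene's test. Based on the results, a t-test is appropriate for the impression count and the reach count: the Shapiro-Wilk test does not reject normality for either variable at the 0.05 level. Levene's test rejects equal variances for the impression count, so that variable calls for Welch's t-test.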


In [None]:
from scipy import stats

# List of variables to check for normality and equal variance
variables = ['impression_count', 'reach_count', 'click_count', 'search_count',
             'view_count', 'cart_add_count', 'purchase_count', 'cost']

# Loop through each variable and perform the Shapiro-Wilk test for normality and Levene's test for equal variances
for var in variables:
    control_data = control_group[var]
    test_data = test_group[var]

    # Shapiro-Wilk test for normality
    control_stat, control_p = stats.shapiro(control_data)
    test_stat, test_p = stats.shapiro(test_data)

    print(f"{var} - Control Group Normality - W Statistic: {control_stat}, P-value: {control_p}")
    print(f"{var} - Test Group Normality - W Statistic: {test_stat}, P-value: {test_p}")

    # Interpretation for normality
    if control_p > 0.05 and test_p > 0.05:
        print(f"{var}: Both groups follow a normal distribution.")
    else:
        print(f"{var}: At least one group does not follow a normal distribution.")

    # Levene's test for equal variance
    var_stat, var_p = stats.levene(control_data, test_data)
    print(f"{var} - Equal Variance Test (Levene's Test) - Statistic: {var_stat}, P-value: {var_p}")

    # Interpretation for equal variance
    if var_p > 0.05:
        print(f"{var}: Variances are equal between groups.\n")
    else:
        print(f"{var}: Variances are not equal between groups.\n")

impression_count - Control Group Normality - W Statistic: 0.9437122006853682, P-value: 0.11452370456781957
impression_count - Test Group Normality - W Statistic: 0.9484545305279277, P-value: 0.1536675855316964
impression_count: Both groups follow a normal distribution.
impression_count - Equal Variance Test (Levene's Test) - Statistic: 7.754060233281115, P-value: 0.007225907712495474
impression_count: Variances are not equal between groups.

reach_count - Control Group Normality - W Statistic: 0.9740168274197124, P-value: 0.6537947092601243
reach_count - Test Group Normality - W Statistic: 0.9324318945860866, P-value: 0.05699605107347176
reach_count: Both groups follow a normal distribution.
reach_count - Equal Variance Test (Levene's Test) - Statistic: 2.514853842854583, P-value: 0.11821567704023767
reach_count: Variances are equal between groups.

click_count - Control Group Normality - W Statistic: 0.9632073621656805, P-value: 0.373095593451384
click_count - Test Group Normality - W

## A/B Testing

In [21]:
from scipy import stats
import numpy as np

# Select the impression counts for the control and test groups
control_group_impressions = control_group['impression_count']
test_group_impressions = test_group['impression_count']

# Check if the variances are equal using Levene's test
stat, p_val_levene = stats.levene(control_group_impressions, test_group_impressions)
print("Levene's Test p-value:", p_val_levene)

# Perform Welch's t-test or regular t-test depending on Levene's test result
if p_val_levene < 0.05:
    # If variances are not equal, use Welch's t-test
    print("Performing Welch's t-test (variances are unequal)")
    t_stat, p_value = stats.ttest_ind(control_group_impressions, test_group_impressions, equal_var=False)
else:
    # If variances are equal, use regular t-test
    print("Performing regular t-test (variances are equal)")
    t_stat, p_value = stats.ttest_ind(control_group_impressions, test_group_impressions, equal_var=True)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpretation of results
if p_value < 0.05:
    print("The difference between the two groups is statistically significant.")
else:
    print("The difference between the two groups is not statistically significant.")

# Print the means for comparison
mean_control = np.mean(control_group_impressions)
mean_test = np.mean(test_group_impressions)
print(f"Control Group Mean Impressions: {mean_control}")
print(f"Test Group Mean Impressions: {mean_test}")

Levene's Test p-value: 0.007225907712495474
Performing Welch's t-test (variances are unequal)
T-statistic: 4.969499291621205
P-value: 8.136360620352422e-06
The difference between the two groups is statistically significant.
Control Group Mean Impressions: 109559.75862068965
Test Group Mean Impressions: 74584.8
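
For reference, Welch's t-test, which `stats.ttest_ind(..., equal_var=False)` performs, does not pool the two variances. The statistic and its approximate degrees of freedom are given below. The symbols are introduced here for illustration and do not appear in the notebook: $\bar{x}_c, s_c^2, n_c$ are the control group's sample mean, sample variance, and size, and $\bar{x}_t, s_t^2, n_t$ are the test group's.

$$
t = \frac{\bar{x}_c - \bar{x}_t}{\sqrt{\dfrac{s_c^2}{n_c} + \dfrac{s_t^2}{n_t}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_c^2}{n_c} + \dfrac{s_t^2}{n_t}\right)^2}{\dfrac{\left(s_c^2/n_c\right)^2}{n_c - 1} + \dfrac{\left(s_t^2/n_t\right)^2}{n_t - 1}}
$$

The p-value is then taken from a Student's t distribution with $\nu$ degrees of freedom.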


In [23]:
from scipy import stats
import numpy as np

# Split data for the control and test groups
control_group_reach = control_group['reach_count']
test_group_reach = test_group['reach_count']

# Student's t-test (pooled variance): Levene's test did not reject equal variances for reach_count
t_stat, p_value = stats.ttest_ind(control_group_reach, test_group_reach, equal_var=True)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpretation of results
if p_value < 0.05:
    print("The difference between the two groups is statistically significant.")
else:
    print("The difference between the two groups is not statistically significant.")

# Print the means for comparison
mean_control = np.mean(control_group_reach)
mean_test = np.mean(test_group_reach)
print(f"Control Group Mean Reach: {mean_control}")
print(f"Test Group Mean Reach: {mean_test}")

T-statistic: 5.392566808869295
P-value: 1.3332074556983057e-06
The difference between the two groups is statistically significant.
Control Group Mean Reach: 88844.93103448277
Test Group Mean Reach: 53491.566666666666


For both Impression Count and Reach Count, the Shapiro-Wilk test does not reject normality in either group (all p-values are above 0.05). For Impression Count, Levene's test rejects equal variances (p ≈ 0.007), so Welch's t-test was applied. Its p-value (≈ 8.1e-06) is far below 0.05, which indicates a statistically significant difference, with the Control Group averaging more impressions than the Test Group (about 109,560 versus 74,585). For Reach Count, Levene's test does not reject equal variances (p ≈ 0.118), so the standard t-test was used. It also indicates a statistically significant difference (p ≈ 1.3e-06), with the Control Group reaching more users on average than the Test Group (about 88,845 versus 53,492).
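
The decision procedure used above (check normality, then check equal variances, then pick the test) can be sketched as one helper. The function name `compare_groups` is hypothetical, and the Mann-Whitney U fallback for non-normal data is an assumption added here; the notebook itself only runs t-tests. The usage example relies on synthetic data rather than the campaign dataset.

```python
import numpy as np
from scipy import stats

def compare_groups(control, test, alpha=0.05):
    """Choose and run a two-sample test (hypothetical helper, not from the notebook)."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)

    # Shapiro-Wilk: a p-value above alpha means normality is not rejected
    normal = (stats.shapiro(control).pvalue > alpha
              and stats.shapiro(test).pvalue > alpha)
    if not normal:
        # Assumed fallback: rank-based test that does not require normality
        result = stats.mannwhitneyu(control, test, alternative="two-sided")
        return "mann-whitney", result.statistic, result.pvalue

    # Levene: a p-value above alpha means equal variances are not rejected
    equal_var = stats.levene(control, test).pvalue > alpha
    result = stats.ttest_ind(control, test, equal_var=equal_var)
    return ("student-t" if equal_var else "welch-t"), result.statistic, result.pvalue

# Synthetic example data, for illustration only
rng = np.random.default_rng(42)
name, stat, p_value = compare_groups(rng.normal(100, 10, 30), rng.normal(90, 10, 30))
print(name, stat, p_value)
```

Calling the helper as `compare_groups(control_group[var], test_group[var])` for each variable would apply the same logic across all metrics. Running many tests this way raises the chance of a false positive, so the p-values should be read with that in mind.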


## Bayesian analysis

In [24]:
import numpy as np
import pandas as pd
import scipy.stats as stats

# Prepare control and test group data
control_df = control_group
test_df = test_group

# Calculate click rate for each group
control_df['click_rate'] = control_df['click_count'] / control_df['impression_count']
test_df['click_rate'] = test_df['click_count'] / test_df['impression_count']

# Prior distribution for click rate of control group (Beta(1, 1))
alpha_control = 1
beta_control = 1

# Prior distribution for click rate of test group (Beta(1, 1))
alpha_test = 1
beta_test = 1

# Click counts and impression counts for both groups
clicks_control = control_df['click_count'].values
impressions_control = control_df['impression_count'].values

clicks_test = test_df['click_count'].values
impressions_test = test_df['impression_count'].values

# Sum up clicks and impressions for posterior distribution calculation
total_clicks_control = np.sum(clicks_control)
total_impressions_control = np.sum(impressions_control)

total_clicks_test = np.sum(clicks_test)
total_impressions_test = np.sum(impressions_test)

# Conjugate update: a Beta(alpha, beta) prior combined with binomial data (clicks out of impressions)
# gives the posterior Beta(alpha + clicks, beta + impressions - clicks)
alpha_post_control = alpha_control + total_clicks_control
beta_post_control = beta_control + total_impressions_control - total_clicks_control

alpha_post_test = alpha_test + total_clicks_test
beta_post_test = beta_test + total_impressions_test - total_clicks_test

# Draw samples from the posterior distribution
control_samples = np.random.beta(alpha_post_control, beta_post_control, 10000)
test_samples = np.random.beta(alpha_post_test, beta_post_test, 10000)

# Calculate the difference in click rates between the two groups
delta_samples = test_samples - control_samples

# Calculate the mean difference and the 95% credibility interval
mean_diff = np.mean(delta_samples)
cred_interval = np.percentile(delta_samples, [2.5, 97.5])

print(f"Mean difference in click rates: {mean_diff}")
print(f"95% credibility interval for the difference: {cred_interval}")

# Determine which group has a higher click rate based on the mean difference
if mean_diff > 0:
    print("The Test group has a higher click rate than the Control group.")
else:
    print("The Control group has a higher click rate than the Test group.")

Mean difference in click rates: 0.032316821727123864
95% credibility interval for the difference: [0.03189745 0.0327364 ]
The Test group has a higher click rate than the Control group.


The analysis used Bayesian statistics to compare the click rates of the Control and Test groups. Each group's click rate was given a uniform Beta(1, 1) prior. Updating the priors with the total clicks and impressions gave a Beta posterior for each group, and 10,000 samples were drawn from each posterior. The posterior mean difference in click rate (Test minus Control) is about 0.032, that is, 3.2 percentage points, and the 95% credible interval of roughly [0.0319, 0.0327] lies entirely above zero. Under this model, the Test group therefore almost certainly has a higher click rate than the Control group. The interval is this narrow because clicks and impressions are pooled over all days, so every impression is treated as an independent trial.
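
A common companion to the credible interval is the posterior probability that the Test group's click rate exceeds the Control group's. The sketch below estimates it by Monte Carlo with the same Beta(1, 1) prior. The function name `prob_test_beats_control` and the click and impression counts are made-up assumptions for illustration, not values from the dataset.

```python
import numpy as np

def prob_test_beats_control(clicks_c, n_c, clicks_t, n_t, draws=100_000, seed=0):
    """Estimate P(rate_test > rate_control) under Beta(1, 1) priors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Beta posterior for each group: Beta(1 + clicks, 1 + impressions - clicks)
    control = rng.beta(1 + clicks_c, 1 + n_c - clicks_c, draws)
    test = rng.beta(1 + clicks_t, 1 + n_t - clicks_t, draws)
    # Fraction of paired draws in which the test rate is higher
    return float(np.mean(test > control))

# Made-up counts: 50 clicks in 1,000 impressions vs. 80 clicks in 1,000 impressions
p = prob_test_beats_control(clicks_c=50, n_c=1000, clicks_t=80, n_t=1000)
print(f"P(test > control) = {p:.3f}")
```

With the notebook's variables, the equivalent quantity would be `np.mean(delta_samples > 0)`.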