### Importing the libraries 

In [16]:
import pandas as pd
import numpy as np
from scipy.stats import ttest_ind
import math

### Functions to calculate the t-score and degrees of freedom

In [25]:
# t-score function 
def t_score(mu_1, var_1, n_1, mu_2, var_2, n_2): # mu = mean of the column, var = variance of the column, n =  number of samples
    numerator = mu_1 - mu_2
    denominator = math.sqrt( (var_1/n_1) + (var_2/n_2))
    return numerator/denominator

# degrees of freedom (Welch-Satterthwaite approximation for unequal variances)
def degree_of_freedom(var_1, n_1, var_2, n_2):
    numerator = (var_1/n_1 + var_2/n_2) * (var_1/n_1 + var_2/n_2)
    denominator_1 = (var_1/n_1) * (var_1/n_1) / (n_1 - 1)
    denominator_2 = (var_2/n_2) * (var_2/n_2) / (n_2 - 1)
    full_denominator = denominator_1 + denominator_2
    return numerator/full_denominator
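
As a quick sanity check of the two helpers above, here is a minimal sketch with made-up summary statistics (the numbers are illustrative, not taken from the dataset). When both groups have the same variance and the same size n, the degrees of freedom should reduce to 2(n − 1).

```python
import math

def t_score(mu_1, var_1, n_1, mu_2, var_2, n_2):
    # Welch's t statistic: difference of means divided by its standard error
    return (mu_1 - mu_2) / math.sqrt(var_1 / n_1 + var_2 / n_2)

def degree_of_freedom(var_1, n_1, var_2, n_2):
    # Welch-Satterthwaite approximation of the degrees of freedom
    a, b = var_1 / n_1, var_2 / n_2
    return (a + b) ** 2 / (a ** 2 / (n_1 - 1) + b ** 2 / (n_2 - 1))

# Illustrative values only: means 10 and 8, variance 4 and size 4 in both groups
t = t_score(10, 4, 4, 8, 4, 4)
df = degree_of_freedom(4, 4, 4, 4)
print(round(t, 4), round(df, 4))
```

With these inputs the standard error is √2, so t equals 2/√2, and the degrees of freedom come out at 2 × (4 − 1) = 6.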

### Downloading the dataset from Kaggle

In [3]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("amirmotefaker/ab-testing-dataset")

print("Path to dataset files:", path)


Downloading from https://www.kaggle.com/api/v1/datasets/download/amirmotefaker/ab-testing-dataset?dataset_version_number=1...


100%|██████████| 2.07k/2.07k [00:00<?, ?B/s]

Extracting files...
Path to dataset files: C:\Users\skoma\.cache\kagglehub\datasets\amirmotefaker\ab-testing-dataset\versions\1





In [4]:
path_to_this_folder = 'C:/Users/skoma/OneDrive/Dokumen/Data Analytics/Python/AB-testing'
control_group = pd.read_csv(path_to_this_folder + '/control_group.csv', delimiter = ';')
test_group = pd.read_csv(path_to_this_folder + '/test_group.csv', delimiter = ';')

In [5]:
test_group

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Test Campaign,1.08.2019,3008,39550,35820,3038,1946,1069,894,255
1,Test Campaign,2.08.2019,2542,100719,91236,4657,2359,1548,879,677
2,Test Campaign,3.08.2019,2365,70263,45198,7885,2572,2367,1268,578
3,Test Campaign,4.08.2019,2710,78451,25937,4216,2216,1437,566,340
4,Test Campaign,5.08.2019,2297,114295,95138,5863,2106,858,956,768
5,Test Campaign,6.08.2019,2458,42684,31489,7488,1854,1073,882,488
6,Test Campaign,7.08.2019,2838,53986,42148,4221,2733,2182,1301,890
7,Test Campaign,8.08.2019,2916,33669,20149,7184,2867,2194,1240,431
8,Test Campaign,9.08.2019,2652,45511,31598,8259,2899,2761,1200,845
9,Test Campaign,10.08.2019,2790,95054,79632,8125,2312,1804,424,275


In [6]:
control_group

Unnamed: 0,Campaign Name,Date,Spend [USD],# of Impressions,Reach,# of Website Clicks,# of Searches,# of View Content,# of Add to Cart,# of Purchase
0,Control Campaign,1.08.2019,2280,82702.0,56930.0,7016.0,2290.0,2159.0,1819.0,618.0
1,Control Campaign,2.08.2019,1757,121040.0,102513.0,8110.0,2033.0,1841.0,1219.0,511.0
2,Control Campaign,3.08.2019,2343,131711.0,110862.0,6508.0,1737.0,1549.0,1134.0,372.0
3,Control Campaign,4.08.2019,1940,72878.0,61235.0,3065.0,1042.0,982.0,1183.0,340.0
4,Control Campaign,5.08.2019,1835,,,,,,,
5,Control Campaign,6.08.2019,3083,109076.0,87998.0,4028.0,1709.0,1249.0,784.0,764.0
6,Control Campaign,7.08.2019,2544,142123.0,127852.0,2640.0,1388.0,1106.0,1166.0,499.0
7,Control Campaign,8.08.2019,1900,90939.0,65217.0,7260.0,3047.0,2746.0,930.0,462.0
8,Control Campaign,9.08.2019,2813,121332.0,94896.0,6198.0,2487.0,2179.0,645.0,501.0
9,Control Campaign,10.08.2019,2149,117624.0,91257.0,2277.0,2475.0,1984.0,1629.0,734.0


### <b><u> Brief procedures for A/B testing </u></b>

#### 1. Define hypothesis
- <b> Null Hypothesis (H₀) </b>: There is no difference in the number of website clicks between the Control and Test campaigns. <br>
- <b> Alternative Hypothesis (H₁) </b>: There is a difference in website clicks between the two campaigns.<br>

#### 2. Identify Variants
Separate the dataset into two groups:
- <b>Control Campaign </b> <br>

- <b> Test Campaign </b><br>



#### 3. Check Data Consistency
Ensure each group has a comparable number of records and similar distribution of impressions, spend, etc., to make a fair comparison.
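
A minimal sketch of such a consistency check, using only the `Spend [USD]` values from the ten rows displayed above (the full dataset has more rows, so these figures are partial). The helper name `summarize` is an assumption introduced for illustration.

```python
from statistics import mean, stdev

# Spend [USD] for the ten rows displayed above (1.08.2019 to 10.08.2019)
control_spend = [2280, 1757, 2343, 1940, 1835, 3083, 2544, 1900, 2813, 2149]
test_spend = [3008, 2542, 2365, 2710, 2297, 2458, 2838, 2916, 2652, 2790]

def summarize(values):
    # Record count, mean and sample standard deviation of one group
    return {"n": len(values), "mean": mean(values), "std": stdev(values)}

control_summary = summarize(control_spend)
test_summary = summarize(test_spend)
print(control_summary)
print(test_summary)
```

On the real DataFrames, `describe()` gives the same kind of summary for every column at once.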


#### 4. Statistical Testing
Conduct a hypothesis test (e.g., t-test or Mann-Whitney U test) to determine whether the difference in website clicks between the two campaigns is statistically significant.

- <b> Null hypothesis (H₀) </b>: No difference in website clicks between campaigns.
- <b> Alternative hypothesis (H₁) </b>: A difference exists.
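
For the rank-based alternative, the Mann-Whitney U statistic can be sketched in plain Python as below. This only computes U, not a p-value; `scipy.stats.mannwhitneyu` would provide both. The function name `mann_whitney_u` is hypothetical, and the inputs are the website-click counts from the ten rows displayed above, with the Control row that has missing values left out.

```python
def mann_whitney_u(sample_a, sample_b):
    # U for sample_a: number of (a, b) pairs with a > b, ties counted as 0.5
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Website clicks from the rows displayed above (Control row with NaN omitted)
control_clicks = [7016, 8110, 6508, 3065, 4028, 2640, 7260, 6198, 2277]
test_clicks = [3038, 4657, 7885, 4216, 5863, 7488, 4221, 7184, 8259, 8125]

u_control = mann_whitney_u(control_clicks, test_clicks)
u_test = mann_whitney_u(test_clicks, control_clicks)
print(u_control, u_test)
```

The two U values always sum to the number of pairs (here 9 × 10 = 90), which is a convenient check on the implementation.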

#### 5. Interpret Results

- If p-value < 0.05: reject H₀ → significant difference. <br>
- If p-value ≥ 0.05: fail to reject H₀ → no significant difference.
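
The decision rule can be written as a small helper. This is a sketch; the function name `interpret` is hypothetical, and the two example p-values are the ones reported for "# of Add to Cart" and "# of Website Clicks" in the t-test output further down.

```python
def interpret(p_value, alpha=0.05):
    # Decision rule for a hypothesis test at significance level alpha
    if p_value < alpha:
        return "reject H0 (significant difference)"
    return "fail to reject H0 (no significant difference)"

print(interpret(0.0001))  # "# of Add to Cart"
print(interpret(0.1205))  # "# of Website Clicks"
```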

#### 6. Make a Recommendation
Based on the analysis, recommend which campaign is more effective and whether the Test Campaign should replace or complement the Control.

In [7]:
# Summary statistics
print("Control Campaign Stats:")
print(control_group[['# of Impressions', '# of Purchase']].describe())

print("\nTest Campaign Stats:")
print(test_group[['# of Impressions', '# of Purchase']].describe())

Control Campaign Stats:
       # of Impressions  # of Purchase
count         29.000000      29.000000
mean      109559.758621     522.793103
std        21688.922908     185.028642
min        71274.000000     222.000000
25%        92029.000000     372.000000
50%       113430.000000     501.000000
75%       121332.000000     670.000000
max       145248.000000     800.000000

Test Campaign Stats:
       # of Impressions  # of Purchase
count         30.000000      30.000000
mean       74584.800000     521.233333
std        32121.377422     211.047745
min        22521.000000     238.000000
25%        47541.250000     298.000000
50%        68853.500000     500.000000
75%        99500.000000     701.000000
max       133771.000000     890.000000


In [8]:
# check whether either dataset has missing values
column_names = control_group.columns
for col in column_names:
    if sum(control_group[col].isnull()) > 0:
        print('control_group has', sum(control_group[col].isnull()) , 'missing vals for ', str(col))

    if sum(test_group[col].isnull()) > 0:
        print('test_group has', sum(test_group[col].isnull()), 'missing vals for ', str(col))

control_group has 1 missing vals for  # of Impressions
control_group has 1 missing vals for  Reach
control_group has 1 missing vals for  # of Website Clicks
control_group has 1 missing vals for  # of Searches
control_group has 1 missing vals for  # of View Content
control_group has 1 missing vals for  # of Add to Cart
control_group has 1 missing vals for  # of Purchase


In [9]:
# several columns have missing values in the fifth row (index 4) of the control group
control_group.iloc[4]

Campaign Name          Control Campaign
Date                          5.08.2019
Spend [USD]                        1835
# of Impressions                    NaN
Reach                               NaN
# of Website Clicks                 NaN
# of Searches                       NaN
# of View Content                   NaN
# of Add to Cart                    NaN
# of Purchase                       NaN
Name: 4, dtype: object

<i> Because the dataset is small, imputing the missing values could distort the analysis, so we drop the row with missing values instead. This leaves the Control group with 29 records and the Test group with 30.</i>

In [10]:
# drop the row with missing values (index 4) from control_group
control_group = control_group.dropna()

In [11]:
# A/B Testing - Independent (Welch) t-tests, one per metric
# control_group is passed first, so a positive t-statistic means the control mean is higher
metrics = ['# of Impressions', 'Reach', '# of Website Clicks',
           '# of Searches', '# of View Content', '# of Add to Cart']
for metric in metrics:
    print(f"\nT-test for {metric}:")
    t_stat, p_val = ttest_ind(control_group[metric], test_group[metric], equal_var=False)
    print(f"T-statistic: {t_stat:.4f}, P-value: {p_val:.4f}")



T-test for # of Impressions:
T-statistic: 4.9161, P-value: 0.0000

T-test for Reach:
T-statistic: 5.3251, P-value: 0.0000

T-test for # of Website Clicks:
T-statistic: -1.5761, P-value: 0.1205

T-test for # of Searches:
T-statistic: -1.1244, P-value: 0.2678

T-test for # of View Content:
T-statistic: 0.4740, P-value: 0.6374

T-test for # of Add to Cart:
T-statistic: 4.2375, P-value: 0.0001


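<b> Interpretation: </b>
There is a statistically significant difference in the number of impressions between the Control and Test campaigns (t = 4.92, p < 0.001).
Because the Control group is passed to the test first, the positive t-statistic means the Control campaign had more impressions on average (about 109,560 versus 74,585).
The two campaigns therefore reached substantially different audience sizes, possibly because of different budgets, targeting, or ad design.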

### Manual calculations to achieve the same goals 
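
For reference, the `t_score` and `degree_of_freedom` helpers defined earlier correspond to Welch's unequal-variance t statistic and the Welch-Satterthwaite degrees of freedom. The notation is introduced here for illustration: $\bar{x}_i$, $s_i^2$ and $n_i$ are the sample mean, sample variance and size of group $i$.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                 {\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```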

In [26]:
# Impressions for control group
mu_1 = sum(control_group['# of Impressions']) / len(control_group['# of Impressions'])
st_1 = control_group['# of Impressions'].std()
var_1 = st_1 * st_1
n_1 = 29 # since we dropped one row

# Impressions for test group
mu_2 = sum(test_group['# of Impressions']) / len(test_group['# of Impressions'])
st_2 = test_group['# of Impressions'].std()
var_2 = st_2 * st_2
n_2 = 30

t_score_for_impression = t_score(mu_1, var_1, n_1, mu_2, var_2, n_2) # mu = mean of the column, var = variance of the column, n =  number of samples
print(t_score_for_impression)

4.916124133015365


In [27]:
# degree of freedom
degree_of_freedom_for_impression = degree_of_freedom(var_1, n_1, var_2, n_2)
degree_of_freedom_for_impression


np.float64(51.04566838503724)

#### Calculation for "# of Impressions" above
At a significance level of **α = 0.05** (two-tailed):  
- For *degrees of freedom = 40*, the critical t-value is *2.021*  
- For *degrees of freedom = 60*, the critical t-value is *2.000*  

Since our degrees of freedom are approximately *51*, we can estimate the critical value to be around *2.008*.  
Our **calculated t-score is 4.92**, which is **greater than** the critical t-value.

This means the observed difference is **statistically significant**.  
Therefore, we **reject the null hypothesis (H₀)**.



In [28]:
# Reach for control group
mu_1 = sum(control_group['Reach']) / len(control_group['Reach'])
st_1 = control_group['Reach'].std()
var_1 = st_1 * st_1
n_1 = 29 # since we dropped one row

# Reach for test group
mu_2 = sum(test_group['Reach']) / len(test_group['Reach'])
st_2 = test_group['Reach'].std()
var_2 = st_2 * st_2
n_2 = 30

t_score_for_reach = t_score(mu_1, var_1, n_1, mu_2, var_2, n_2) # mu = mean of the column, var = variance of the column, n = number of samples
print(t_score_for_reach)

5.325114270075589


In [29]:
# degree of freedom
degree_of_freedom_for_reach = degree_of_freedom(var_1, n_1, var_2, n_2)
degree_of_freedom_for_reach

np.float64(53.97636638642217)

#### Calculation for "Reach" above
At a significance level of **α = 0.05** (two-tailed):  
- For *degrees of freedom = 40*, the critical t-value is *2.021*  
- For *degrees of freedom = 60*, the critical t-value is *2.000*  

Since our degrees of freedom are approximately *54*, we can estimate the critical value to be around *2.005*.  
Our **calculated t-score is 5.33**, which is **greater than** the critical t-value.

This means the observed difference is **statistically significant**.  
Therefore, we **reject the null hypothesis (H₀)**.
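
The table lookup used in these calculations can be sketched as a linear interpolation between two t-table rows. The bracketing values 2.021 (40 degrees of freedom) and 2.000 (60 degrees of freedom) are the standard two-tailed entries at α = 0.05; the helper names `interpolate_critical_t` and `is_significant` are hypothetical.

```python
def interpolate_critical_t(df, df_low=40, t_low=2.021, df_high=60, t_high=2.000):
    # Linear interpolation between two rows of a two-tailed t-table (alpha = 0.05)
    fraction = (df - df_low) / (df_high - df_low)
    return t_low + fraction * (t_high - t_low)

def is_significant(t_stat, critical_t):
    # Two-tailed test: compare the absolute value of t with the critical value
    return abs(t_stat) > critical_t

# Degrees of freedom and t-score computed for "Reach" above
critical = interpolate_critical_t(53.976)
print(round(critical, 3), is_significant(5.325, critical))
```

Linear interpolation gives a slightly larger critical value than an exact quantile function would, which does not change the conclusion here. Comparing `abs(t_stat)` rather than the raw t-score matters for metrics where the t-score is negative.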

In [None]:
# Website clicks for control group
mu_1 = sum(control_group['# of Website Clicks']) / len(control_group['# of Website Clicks'])
st_1 = control_group['# of Website Clicks'].std()
var_1 = st_1 * st_1
n_1 = 29 # since we dropped one row

# Website clicks for test group
mu_2 = sum(test_group['# of Website Clicks']) / len(test_group['# of Website Clicks'])
st_2 = test_group['# of Website Clicks'].std()
var_2 = st_2 * st_2
n_2 = 30

t_score_for_clicks = t_score(mu_1, var_1, n_1, mu_2, var_2, n_2)
print(t_score_for_clicks)

-1.5761438703924608


In [31]:
# degree of freedom
degree_of_freedom_for_clicks = degree_of_freedom(var_1, n_1, var_2, n_2)
degree_of_freedom_for_clicks

np.float64(56.777245205959275)

#### Calculation for "# of Website Clicks" above
At a significance level of **α = 0.05** (two-tailed):  
- For *degrees of freedom = 40*, the critical t-value is *2.021*  
- For *degrees of freedom = 60*, the critical t-value is *2.000*  

Since our degrees of freedom are approximately *56.78*, we can estimate the critical value to be around *2.003*.  
Our **calculated t-score is -1.576**, whose absolute value (1.576) is **less than** the critical t-value.

This means the observed difference is **not statistically significant**.  
Therefore, we **fail to reject the null hypothesis (H₀)**.

In [32]:
# Searches for control group
mu_1 = sum(control_group['# of Searches']) / len(control_group['# of Searches'])
st_1 = control_group['# of Searches'].std()
var_1 = st_1 * st_1
n_1 = 29 # since we dropped one row

# Searches for test group
mu_2 = sum(test_group['# of Searches']) / len(test_group['# of Searches'])
st_2 = test_group['# of Searches'].std()
var_2 = st_2 * st_2
n_2 = 30

t_score_for_searches = t_score(mu_1, var_1, n_1, mu_2, var_2, n_2)
print(t_score_for_searches)

-1.124368541836539


In [33]:
# degree of freedom
degree_of_freedom_for_searches = degree_of_freedom(var_1, n_1, var_2, n_2)
degree_of_freedom_for_searches

np.float64(38.55600287603084)

#### Calculation for "# of Searches" above
At a significance level of **α = 0.05** (two-tailed):  
- For *degrees of freedom = 30*, the critical t-value is *2.042*  
- For *degrees of freedom = 40*, the critical t-value is *2.021*  

Since our degrees of freedom are approximately *38.56*, we can estimate the critical value to be around *2.023*.  
Our **calculated t-score is -1.12**, whose absolute value (1.12) is **less than** the critical t-value.

This means the observed difference is **not statistically significant**.  
Therefore, we **fail to reject the null hypothesis (H₀)**.

In [34]:
# View Content for control group
mu_1 = sum(control_group['# of View Content']) / len(control_group['# of View Content'])
st_1 = control_group['# of View Content'].std()
var_1 = st_1 * st_1
n_1 = 29 # since we dropped one row

# View Content for test group
mu_2 = sum(test_group['# of View Content']) / len(test_group['# of View Content'])
st_2 = test_group['# of View Content'].std()
var_2 = st_2 * st_2
n_2 = 30

t_score_for_view_content = t_score(mu_1, var_1, n_1, mu_2, var_2, n_2)
print(t_score_for_view_content)

0.47404676948319135


In [35]:
# degree of freedom
degree_of_freedom_for_view_content = degree_of_freedom(var_1, n_1, var_2, n_2)
degree_of_freedom_for_view_content

np.float64(52.56220099495453)

#### Calculation for "# of View Content" above
At a significance level of **α = 0.05** (two-tailed):  
- For *degrees of freedom = 40*, the critical t-value is *2.021*
- For *degrees of freedom = 60*, the critical t-value is *2.000*    

Since our degrees of freedom are approximately *52.56*, we can estimate the critical value to be around *2.006*.  
Our **calculated t-score is 0.474**, which is **less than** the critical t-value.

This means the observed difference is **not statistically significant**.  
Therefore, we **fail to reject the null hypothesis (H₀)**.

In [36]:
# Add to Cart for control group
mu_1 = sum(control_group['# of Add to Cart']) / len(control_group['# of Add to Cart'])
st_1 = control_group['# of Add to Cart'].std()
var_1 = st_1 * st_1
n_1 = 29 # since we dropped one row

# Add to Cart for test group
mu_2 = sum(test_group['# of Add to Cart']) / len(test_group['# of Add to Cart'])
st_2 = test_group['# of Add to Cart'].std()
var_2 = st_2 * st_2
n_2 = 30

t_score_for_add_to_cart = t_score(mu_1, var_1, n_1, mu_2, var_2, n_2)
print(t_score_for_add_to_cart)

4.237529720775083


In [37]:
# degree of freedom
degree_of_freedom_for_add_to_cart = degree_of_freedom(var_1, n_1, var_2, n_2)
degree_of_freedom_for_add_to_cart

np.float64(54.980305679654634)

#### Calculation for "# of Add to Cart" above
At a significance level of **α = 0.05** (two-tailed):  
- For *degrees of freedom = 40*, the critical t-value is *2.021*
- For *degrees of freedom = 60*, the critical t-value is *2.000*    

Since our degrees of freedom are approximately *54.98*, we can estimate the critical value to be around *2.004*.  
Our **calculated t-score is 4.237**, which is **greater than** the critical t-value.

This means the observed difference is **statistically significant**.  
Therefore, we **reject the null hypothesis (H₀)**.

### <u>Recommendations </u>

<b>Statistically Significant Differences (p < 0.05):</b><br>
"# of Impressions": T = 4.9161 → The Control campaign had significantly more impressions than the Test campaign.

"Reach": T = 5.3251 → The Control campaign reached significantly more unique users.

"# of Add to Cart": T = 4.2375 → The Control campaign recorded significantly more add-to-cart actions.


<b>Not Statistically Significant (p ≥ 0.05):</b><br>
"# of Website Clicks": T = -1.5761 → The Test campaign's mean is higher, but the difference is not statistically significant.

"# of Searches": T = -1.1244 → No significant difference in search behavior.

"# of View Content": T = 0.4740 → Content views were statistically similar.


<i> Overall, the t-statistics are computed as Control minus Test, so the positive values for impressions, reach, and add-to-cart mean that the Control campaign, not the Test campaign, achieved higher visibility, reached a broader audience, and generated more add-to-cart actions.
The Test campaign's website clicks and searches were somewhat higher on average (negative t-statistics), but these differences are not statistically significant. Based on these metrics alone, there is no statistical evidence that the Test campaign should replace the Control.</i>
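
The direction of each difference can be read from the sign of the t-statistic. Below is a minimal sketch using the values printed by the t-tests above, where `control_group` was passed as the first sample; the helper name `higher_group` is hypothetical.

```python
# (t-statistic, p-value) pairs as printed by ttest_ind(control, test) above
results = {
    "# of Impressions": (4.9161, 0.0000),
    "Reach": (5.3251, 0.0000),
    "# of Website Clicks": (-1.5761, 0.1205),
    "# of Searches": (-1.1244, 0.2678),
    "# of View Content": (0.4740, 0.6374),
    "# of Add to Cart": (4.2375, 0.0001),
}

def higher_group(t_stat):
    # Control is the first sample, so t > 0 means the control mean is larger
    return "Control" if t_stat > 0 else "Test"

for metric, (t_stat, p_value) in results.items():
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{metric}: higher mean in {higher_group(t_stat)} ({verdict})")
```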