In [None]:
!pip install statsmodels


# A/B Testing Exploration

This notebook contains a brief exploratory analysis of a marketing A/B test.
The goal is to compare control and treatment groups and understand conversion performance
before building dashboards and deeper statistical models.


In [2]:
import pandas as pd
import os

file_path = os.path.join("..", "data", "sample_model_ready_marketing_ab_testing_dataset.csv")
df = pd.read_csv(file_path)
df.head()

,Unnamed: 0,user_id,variant,converted,total ads,most ads day,most ads hour,timestamp,channel,device,clicked,revenue,Page,Time
0,2,1144181,B,1,21,Tuesday,18,10-03-2023 05:19,Social Media,Desktop,1,64.246527,Page A,1.18
1,7,1496843,B,0,17,Sunday,18,13-02-2023 21:55,Organic,Desktop,0,0.0,Page B,2.53
2,8,1448851,B,0,21,Tuesday,19,04-03-2023 10:36,Referral,Desktop,0,0.0,Page B,1.49
3,11,1637531,B,0,47,Wednesday,13,04-01-2023 13:25,Email,Desktop,1,0.0,Page B,2.23
4,12,1081965,B,1,61,Tuesday,20,09-01-2023 06:49,Email,Desktop,1,240.986829,Page A,1.18


In [3]:
# Remove unnecessary index column
df = df.drop(columns=["Unnamed: 0"])

# Check again
df.head()


,user_id,variant,converted,total ads,most ads day,most ads hour,timestamp,channel,device,clicked,revenue,Page,Time
0,1144181,B,1,21,Tuesday,18,10-03-2023 05:19,Social Media,Desktop,1,64.246527,Page A,1.18
1,1496843,B,0,17,Sunday,18,13-02-2023 21:55,Organic,Desktop,0,0.0,Page B,2.53
2,1448851,B,0,21,Tuesday,19,04-03-2023 10:36,Referral,Desktop,0,0.0,Page B,1.49
3,1637531,B,0,47,Wednesday,13,04-01-2023 13:25,Email,Desktop,1,0.0,Page B,2.23
4,1081965,B,1,61,Tuesday,20,09-01-2023 06:49,Email,Desktop,1,240.986829,Page A,1.18


In [4]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   user_id        9999 non-null   int64  
 1   variant        9999 non-null   object 
 2   converted      9999 non-null   int64  
 3   total ads      9999 non-null   int64  
 4   most ads day   9999 non-null   object 
 5   most ads hour  9999 non-null   int64  
 6   timestamp      9999 non-null   object 
 7   channel        9999 non-null   object 
 8   device         9999 non-null   object 
 9   clicked        9999 non-null   int64  
 10  revenue        9999 non-null   float64
 11  Page           9999 non-null   object 
 12  Time           9999 non-null   float64
dtypes: float64(2), int64(5), object(6)
memory usage: 1015.6+ KB


In [5]:
df["variant"].value_counts()


variant
B    9547
A     452
Name: count, dtype: int64

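The variants have very unequal sample sizes: Variant B has 9,547 users and Variant A only 452.
A two-sample z-test for proportions remains valid under this imbalance, because its standard
error uses each group's own sample size and both groups are large enough for the normal approximation.
The small Variant A group does, however, limit the test's power to detect small differences.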
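
A quick way to sanity-check the "large enough" condition is the common rule of thumb that each group should have at least 10 conversions and 10 non-conversions. The sketch below is illustrative only: the conversion counts (45 and 1010) are an assumption, back-calculated from the group sizes and the conversion rates reported later in this notebook rather than read from the data, and the threshold of 10 is a convention, not a hard requirement.

```python
# Group sizes taken from the value_counts() output.
n_a, n_b = 452, 9547

# ASSUMPTION: conversion counts back-calculated from the reported
# conversion rates (0.099558 and 0.105792), not read from the data.
conv_a, conv_b = 45, 1010

MIN_COUNT = 10  # rule-of-thumb threshold; some texts use 5


def normal_approx_ok(conversions: int, n: int, min_count: int = MIN_COUNT) -> bool:
    """Return True if both successes and failures reach the threshold."""
    failures = n - conversions
    return conversions >= min_count and failures >= min_count


checks = {
    "A": normal_approx_ok(conv_a, n_a),
    "B": normal_approx_ok(conv_b, n_b),
}
print(checks)
```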


In [6]:
conversion_summary = (
    df.groupby("variant")
    .agg(conversion_rate=("converted", "mean"))
    .reset_index()
)

conversion_summary


,variant,conversion_rate
0,A,0.099558
1,B,0.105792


### Initial Observation

The treatment group (Variant B) converts at 10.58%, slightly above the control group
(Variant A) at 9.96%, a difference of about 0.62 percentage points.
This suggests a potential positive uplift, which will be validated further
using statistical significance testing and dashboard analysis.
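
The size of the uplift can be expressed both in absolute terms (percentage points) and relative to the control rate. A minimal sketch, with the two rates copied by hand from the summary table above and variable names chosen here for illustration:

```python
# Conversion rates copied from the summary table.
rate_a = 0.099558  # Variant A (control)
rate_b = 0.105792  # Variant B (treatment)

abs_uplift = rate_b - rate_a      # difference in proportions
rel_uplift = abs_uplift / rate_a  # uplift relative to the control rate

print(f"Absolute uplift: {abs_uplift * 100:.2f} percentage points")
print(f"Relative uplift: {rel_uplift:.1%}")
```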


## Statistical Significance Test (Z-test)

To determine whether the observed difference in conversion rates
is statistically significant, a two-sample z-test for proportions
is performed.
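
For reference, the pooled two-proportion z statistic is usually written as follows. The symbols $x_A, x_B$ (conversions) and $n_A, n_B$ (users) are notation introduced here, not names used in the notebook:

$$
\hat{p}_A = \frac{x_A}{n_A}, \qquad
\hat{p}_B = \frac{x_B}{n_B}, \qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}
$$

$$
z = \frac{\hat{p}_A - \hat{p}_B}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}
$$

Under the null hypothesis of equal conversion rates, $z$ is approximately standard normal, and the two-sided p-value is $2\,\bigl(1 - \Phi(|z|)\bigr)$, where $\Phi$ is the standard normal CDF.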


In [10]:
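from statsmodels.stats.proportion import proportions_ztest

# Conversions ("sum") and users ("count") per variant.
# groupby sorts the index, so the order is A, B.
summary = df.groupby("variant")["converted"].agg(["sum", "count"])

successes = summary["sum"].values
trials = summary["count"].values

# Two-sided test of H0: equal conversion rates.
# With the order A, B, a negative z means A's rate is below B's.
z_stat, p_value = proportions_ztest(successes, trials)

z_stat, p_value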


(np.float64(-0.42161541085729465), np.float64(0.6733057558769835))

### Statistical Interpretation

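The z-test gives z ≈ -0.42 and a p-value of 0.67, well above the
commonly used significance level of 0.05.

The null hypothesis of equal conversion rates therefore cannot be rejected:
the observed difference between Variant A and Variant B is not statistically
significant and is consistent with random variation. This does not show that
the variants perform identically, only that this sample provides no evidence
of a difference.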
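
A confidence interval for the difference in conversion rates complements the p-value by showing the range of effects the data are compatible with. The sketch below uses a simple Wald (unpooled) interval and only the standard library. The conversion counts are an assumption, back-calculated from the reported rates and group sizes, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Group sizes from the notebook output.
n_a, n_b = 452, 9547
# ASSUMPTION: counts back-calculated from the reported conversion rates.
conv_a, conv_b = 45, 1010

p_a = conv_a / n_a
p_b = conv_b / n_b
diff = p_b - p_a  # treatment minus control

# Unpooled (Wald) standard error of the difference in proportions.
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# Critical value for a two-sided 95% interval.
z_crit = NormalDist().inv_cdf(0.975)

ci_low = diff - z_crit * se
ci_high = diff + z_crit * se
print(f"Difference (B - A): {diff:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
```

Because the interval spans zero, it agrees with the z-test. Its width also shows how little the small Variant A group constrains the estimate.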


## Business Recommendation

Although Variant B shows a slightly higher conversion rate, the difference
is not statistically significant based on the z-test results.

It is recommended to run the experiment for longer, allocate more traffic
to Variant A so the groups are better balanced, or analyze additional metrics
such as revenue or user segments before making a final rollout decision.
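
To give a rough sense of how much more data a longer run would need, the sketch below applies the standard normal-approximation sample-size formula for comparing two proportions with equal group sizes. It is an illustration under stated assumptions, not a full power analysis: the function name `required_n_per_group` is hypothetical, the defaults (5% significance, 80% power) are conventional choices, and plugging in the observed rates as the effect to detect is done only for illustration. A real plan should use the smallest uplift the business cares about.

```python
import math
from statistics import NormalDist


def required_n_per_group(p1: float, p2: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per group (equal allocation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Rates copied from the conversion summary table (illustrative target effect).
n_needed = required_n_per_group(0.099558, 0.105792)
print(f"Approximate users needed per group: {n_needed}")
```

With the rates observed here, the estimate comes out in the tens of thousands of users per group, far more than the 452 users currently in Variant A.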
