**Author**: Chertova Nataliya

**Date**: 05.06.2025

---

# A/B Testing Results Analysis

Imagine you've been approached by representatives of the online store BitMotion Kit, which sells gamified products for those leading a healthy lifestyle. It has its own target audience, and even some bestsellers have emerged: an expander with a counter and reminder, and an under-desk exercise bike with Bluetooth.

In the future, the company wants to expand its product range. But first, one problem needs to be solved. The online store's interface is too complex for users — customer feedback indicates this.

To attract new customers and increase sales, the store owners developed a new version of the website and tested it on a portion of users. The idea is that this solution will demonstrably increase the number of users who make a purchase.

Your task is to evaluate the results of the A/B test. You have at your disposal:
* user action data and their distribution into groups,
* technical specification.

Evaluate the correctness of the test execution and analyze its results.

## 1. Describe the research goals.

The goal of the project is to evaluate the A/B test data provided by the BitMotion Kit online store.
To conduct the evaluation, it is necessary to:
* assess the sufficiency and adequacy of the data,
* clean the data of duplicates,
* identify users participating in the test,
* check compliance with the technical specification requirements,
* check the uniform distribution of users across test groups,
* check for the absence of overlaps with a competing test,
* determine the analysis horizon,
* assess the sufficiency of the sample to obtain statistically significant results,
* calculate for each group the number of visitors who made a purchase, the total number of visitors, and the conversion rate,
* evaluate the change in conversion using an appropriate statistical test, considering all stages of hypothesis testing,
* describe the conclusions from the conducted A/B test results evaluation.

---

## 2. Load and Assess Data Integrity

In [86]:
import numpy as np
import pandas as pd
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportions_ztest


In [67]:
participants = pd.read_csv('https://code.s3.yandex.net/datasets/ab_test_participants.csv')
events = pd.read_csv('https://code.s3.yandex.net/datasets/ab_test_events.zip',
                     parse_dates=['event_dt'], low_memory=False)

In [68]:
participants.info()

participants.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14525 entries, 0 to 14524
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  14525 non-null  object
 1   group    14525 non-null  object
 2   ab_test  14525 non-null  object
 3   device   14525 non-null  object
dtypes: object(4)
memory usage: 454.0+ KB


| | user_id | group | ab_test | device |
|---|---|---|---|---|
| 0 | 0002CE61FF2C4011 | B | interface_eu_test | Mac |
| 1 | 001064FEAAB631A1 | B | recommender_system_test | Android |
| 2 | 001064FEAAB631A1 | A | interface_eu_test | Android |
| 3 | 0010A1C096941592 | A | recommender_system_test | Android |
| 4 | 001E72F50D1C48FA | A | interface_eu_test | Mac |
| 5 | 002412F1EB3F6E38 | B | interface_eu_test | Mac |
| 6 | 002540BE89C930FB | B | interface_eu_test | Android |
| 7 | 0031F1B5E9FBF708 | A | interface_eu_test | Android |
| 8 | 003346BB64227D0C | B | interface_eu_test | Android |
| 9 | 00341D8401F0F665 | A | recommender_system_test | iPhone |


The participants table is complete: all 14,525 rows have non-null values in every column. Let's check for duplicates:

In [69]:
duplicates_mask = participants.duplicated()
duplicated_rows = participants[duplicates_mask]
print(duplicated_rows)

Empty DataFrame
Columns: [user_id, group, ab_test, device]
Index: []


No fully duplicated rows were found.

In [70]:
events.info()

events.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 787286 entries, 0 to 787285
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   user_id     787286 non-null  object        
 1   event_dt    787286 non-null  datetime64[ns]
 2   event_name  787286 non-null  object        
 3   details     249022 non-null  object        
dtypes: datetime64[ns](1), object(3)
memory usage: 24.0+ MB


| | user_id | event_dt | event_name | details |
|---|---|---|---|---|
| 0 | GLOBAL | 2020-12-01 00:00:00 | End of Black Friday Ads Campaign | ZONE_CODE15 |
| 1 | CCBE9E7E99F94A08 | 2020-12-01 00:00:11 | registration | 0.0 |
| 2 | GLOBAL | 2020-12-01 00:00:25 | product_page | NaN |
| 3 | CCBE9E7E99F94A08 | 2020-12-01 00:00:33 | login | NaN |
| 4 | CCBE9E7E99F94A08 | 2020-12-01 00:00:52 | product_page | NaN |
| 5 | AA346F4D22148024 | 2020-12-01 00:01:46 | registration | -2.0 |
| 6 | 7EF01D0E72AF449D | 2020-12-01 00:02:06 | registration | -5.0 |
| 7 | 9A6276AD14B14252 | 2020-12-01 00:02:20 | registration | -2.0 |
| 8 | 9B186A3B1A995D36 | 2020-12-01 00:02:37 | registration | -3.5 |
| 9 | 9A6276AD14B14252 | 2020-12-01 00:02:53 | login | NaN |


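The events table has no missing values in `user_id`, `event_dt`, or `event_name`. Only the `details` column has gaps (249,022 of 787,286 rows are filled), which is acceptable: events such as `login` and `product_page` carry no details. Let's check for duplicates.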

In [71]:
# Flag the second and subsequent occurrences of fully identical rows
duplicates_mask = events.duplicated()
duplicated_rows = events[duplicates_mask]

# Count the duplicates
num_duplicates = duplicates_mask.sum()

# Output the duplicate rows themselves
print("\nDuplicate rows themselves:")
print(duplicated_rows)

# Calculate the total number of rows
total_rows = len(events)

# Calculate the percentage of duplicates
percentage_duplicates = (num_duplicates / total_rows) * 100
print(f"\nTotal number of rows: {total_rows}")
print(f"Number of duplicates (second and subsequent occurrences): {num_duplicates}")
print(f"Percentage of duplicates: {percentage_duplicates:.2f}%")


Duplicate rows themselves:
                 user_id            event_dt    event_name details
50      A39D63750BBE9B34 2020-12-01 00:08:24         login     NaN
132     631020621D23464A 2020-12-01 00:25:03         login     NaN
278     AD6541E75198ABEF 2020-12-01 00:48:46         login     NaN
390     ADBBC43BED1249C8 2020-12-01 01:06:02  product_cart     NaN
446     928AD890A8E7BDE7 2020-12-01 01:11:10         login     NaN
...                  ...                 ...           ...     ...
787141            GLOBAL 2020-12-31 23:21:18      purchase    4.49
787200            GLOBAL 2020-12-31 23:36:55      purchase    4.49
787209            GLOBAL 2020-12-31 23:39:41  product_cart     NaN
787217  F9C2F1ECC9624248 2020-12-31 23:42:17  product_page     NaN
787263            GLOBAL 2020-12-31 23:53:42      purchase    4.49

[36318 rows x 4 columns]

Total number of rows: 787286
Number of duplicates (second and subsequent occurrences): 36318
Percentage of duplicates: 4.61%


Complete duplicates account for 4.61% of the rows (36,318 of 787,286). Let's remove them from the analysis.


In [72]:
events_no_duplicates = events.drop_duplicates()

## 3. Evaluate the Correctness of the Test Using the `ab_test_participants` Table

### 3.1 Identify Participants and Verify Test Conditions:

- **Check compliance with the technical specification.**
- **Verify uniform distribution of users across test groups.**
- **Ensure no overlap with competing tests** (no users participating in two tests simultaneously).

We are testing the hypothesis that simplifying the interface will lead to at least a three percentage point increase in the conversion rate of registered users to buyers within seven days of system registration.

In [73]:
participants.head()

| | user_id | group | ab_test | device |
|---|---|---|---|---|
| 0 | 0002CE61FF2C4011 | B | interface_eu_test | Mac |
| 1 | 001064FEAAB631A1 | B | recommender_system_test | Android |
| 2 | 001064FEAAB631A1 | A | interface_eu_test | Android |
| 3 | 0010A1C096941592 | A | recommender_system_test | Android |
| 4 | 001E72F50D1C48FA | A | interface_eu_test | Mac |


In [74]:
# Check that no user is assigned to both groups of the test of interest
A = participants[(participants['ab_test'] == 'interface_eu_test') & (participants['group'] == 'A')]['user_id']

B = participants[(participants['ab_test'] == 'interface_eu_test') & (participants['group'] == 'B')]['user_id']

intersection = list(set(A) & set(B))
print(f"Intersection size: {len(intersection)}")
print(f"Users in intersection: {intersection}")

Intersection size: 0
Users in intersection: []


There is no user overlap between the groups; the groups are independent.

In [75]:
# assess the absence of overlaps with a competing test
eu_test = participants[participants['ab_test'] == 'interface_eu_test']['user_id']

system_test = participants[participants['ab_test'] == 'recommender_system_test']['user_id']

intersection = list(set(eu_test) & set(system_test))
print(f"Intersection size: {len(intersection)}")

Intersection size: 887


Users who participated in two tests simultaneously are present in the data. Let's evaluate the size of the groups in our test and remove these overlapping users.

In [76]:
print(f"Size of group A of the test of interest (before removal): {len(A)}")
print(f"Size of group B of the test of interest (before removal): {len(B)}")

# Removing overlapping users
users_to_exclude_set = set(intersection) 
eu_test_cleaned_df = participants[
    (~participants['user_id'].isin(users_to_exclude_set)) & 
    (participants['ab_test'] == 'interface_eu_test') 
]

A_cleaned = eu_test_cleaned_df[eu_test_cleaned_df['group'] == 'A']['user_id']

B_cleaned = eu_test_cleaned_df[eu_test_cleaned_df['group'] == 'B']['user_id']

# Displaying group sizes after removing overlaps
print(f"Size of group A (after removing overlaps): {len(A_cleaned)}")
print(f"Size of group B (after removing overlaps): {len(B_cleaned)}")

Size of group A of the test of interest (before removal): 5383
Size of group B of the test of interest (before removal): 5467
Size of group A (after removing overlaps): 4952
Size of group B (after removing overlaps): 5011


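After removing the overlapping users, the groups remain nearly equal in size: 4,952 users in A and 5,011 in B (49.7% and 50.3% of the cleaned sample), so the split can be considered even.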
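One way to put a number on "evenly distributed" is a sample ratio mismatch check. The sketch below is illustrative only: it assumes the intended split was 50/50 (the technical specification is not reproduced here), and the helper name `srm_check` is hypothetical. It runs a chi-square goodness-of-fit test on the cleaned group sizes using only the standard library.

```python
import math


def srm_check(n_a, n_b):
    """Chi-square goodness-of-fit test against an assumed equal split (1 degree of freedom)."""
    expected = (n_a + n_b) / 2
    stat = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Group sizes after removing users who also took part in the competing test
stat, p_value = srm_check(4952, 5011)
print(f"chi2 = {stat:.3f}, p-value = {p_value:.3f}")
```

A large p-value here means the observed sizes are compatible with an equal split; a very small one would suggest a problem in the assignment mechanism.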

---

### 3.2 Analyze User Activity Data from the `ab_test_events` Table:

* Keep only events related to users participating in the test under study;


In [77]:
# Extract user_ids of all users participating in the test
participating_user_ids = eu_test_cleaned_df['user_id'].unique()

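# Filter ab_test_events, keeping only the relevant users.
# Note: this filters the raw `events` frame rather than `events_no_duplicates`;
# the per-user counts computed later rely on `nunique`, so repeated rows do not change them.
events_filtered = events[events['user_id'].isin(participating_user_ids)]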

events_filtered.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 73815 entries, 64672 to 780371
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   user_id     73815 non-null  object        
 1   event_dt    73815 non-null  datetime64[ns]
 2   event_name  73815 non-null  object        
 3   details     19450 non-null  object        
dtypes: datetime64[ns](1), object(3)
memory usage: 2.8+ MB


- **Determine the analysis horizon:** calculate the time (lifetime) of a user's event after registration and keep only those events that occurred within the first seven days from registration;


In [78]:
# Let's display all unique events for users
events_filtered['event_name'].unique()

array(['registration', 'login', 'product_page', 'purchase',
       'product_cart'], dtype=object)

In [79]:
# Determine the registration date for each user
registration_dates = events_filtered[events_filtered['event_name'] == 'registration'].groupby('user_id').agg(
    registration_dt=('event_dt', 'min')
).reset_index()

# Join the registration date to each user's event
events_with_registration_dt = pd.merge(
    events_filtered,
    registration_dates,
    on='user_id',
    how='left'
)

# Calculate lifetime in whole calendar days since the registration date
events_with_registration_dt['lifetime_days'] = (
    events_with_registration_dt['event_dt'].dt.normalize()
    - events_with_registration_dt['registration_dt'].dt.normalize()
).dt.days

# Keep events from the registration day (day 0) through calendar day 7 inclusive
events_within_7_days = events_with_registration_dt[events_with_registration_dt['lifetime_days'] <= 7].copy()

events_within_7_days.head()

| | user_id | event_dt | event_name | details | registration_dt | lifetime_days |
|---|---|---|---|---|---|---|
| 0 | 5F506CEBEDC05D30 | 2020-12-06 14:10:01 | registration | 0.0 | 2020-12-06 14:10:01 | 0 |
| 1 | 51278A006E918D97 | 2020-12-06 14:37:25 | registration | -3.8 | 2020-12-06 14:37:25 | 0 |
| 2 | A0C1E8EFAD874D8B | 2020-12-06 17:20:22 | registration | -3.32 | 2020-12-06 17:20:22 | 0 |
| 3 | 275A8D6254ACF530 | 2020-12-06 19:36:54 | registration | -0.48 | 2020-12-06 19:36:54 | 0 |
| 4 | 0B704EB2DC7FCA4B | 2020-12-06 19:42:20 | registration | 0.0 | 2020-12-06 19:42:20 | 0 |
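The lifetime above is a difference between calendar dates, so the filter `lifetime_days <= 7` is not the same as "less than 7 × 24 hours since registration". The toy sketch below uses made-up timestamps and hypothetical helper names to show where the two definitions disagree; which one the technical specification intends is an assumption the analyst has to confirm.

```python
from datetime import datetime, timedelta

# Made-up timestamps for illustration only
registration = datetime(2020, 12, 6, 23, 50)
toy_events = {
    "same evening": datetime(2020, 12, 6, 23, 55),
    "late on calendar day 7": datetime(2020, 12, 13, 23, 59),
    "calendar day 8": datetime(2020, 12, 14, 0, 5),
}


def calendar_day_lifetime(event_dt, registration_dt):
    """Difference between calendar dates, as in the notebook's lifetime_days column."""
    return (event_dt.date() - registration_dt.date()).days


def within_elapsed_window(event_dt, registration_dt, days=7):
    """Alternative rule: strictly less than `days` * 24 hours after registration."""
    return event_dt - registration_dt < timedelta(days=days)


results = {}
for label, event_dt in toy_events.items():
    lifetime = calendar_day_lifetime(event_dt, registration)
    results[label] = (lifetime, lifetime <= 7, within_elapsed_window(event_dt, registration))
    print(label, results[label])
```

Each tuple holds the calendar-day lifetime, whether the notebook's filter keeps the event, and whether the elapsed-time rule keeps it. The two rules differ only for events near the edge of the window.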


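* **Evaluate the sufficiency of the sample size** for obtaining statistically significant A/B test results. Given parameters:
    * Baseline conversion rate: 30%,
    * Minimum detectable effect: three percentage points (from the hypothesis being tested),
    * Test power: 80%,
    * Confidence level: 95% (significance level 0.05).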

In [81]:
# Given parameters
p1 = 0.30  # Baseline conversion
p2 = p1 + 0.03  # Target conversion (p1 + MDE, which is three percentage points according to the initial hypothesis)
alpha = 0.05  # Significance level (for 95% confidence)
power = 0.80  # Test power

# Calculate effect size (Cohen's h)
effect_size = 2 * (np.arcsin(np.sqrt(p2)) - np.arcsin(np.sqrt(p1)))

# Initialize NormalIndPower class
power_analysis = NormalIndPower()

# Calculate sample size
sample_size = power_analysis.solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    ratio=1,  # Equal sample distribution
)

print(f"\nRequired sample size for each group: {int(sample_size)}")
print(f"Total required sample size for the test: {int(sample_size * 2)}")
print(f"\nProvided size of group A: {len(A_cleaned)}")
print(f"Provided size of group B: {len(B_cleaned)}")
print(f"\nConclusion: the obtained sample is sufficient for conducting a statistical evaluation")


Required sample size for each group: 3761
Total required sample size for the test: 7523

Provided size of group A: 4952
Provided size of group B: 5011

Conclusion: the obtained sample is sufficient for conducting a statistical evaluation
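As a cross-check on the library result, the required sample size can be approximated by hand. The sketch below is a rough standard-library version, not the `statsmodels` implementation: it uses Cohen's h as the effect size and the usual normal-approximation formula n ≈ 2 · ((z₁₋α/₂ + z_power) / h)² for two equal groups, ignoring the negligible opposite tail of the two-sided test. The function name `required_n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided test of two proportions."""
    # Cohen's h: difference of arcsine-transformed proportions
    h = 2 * (math.asin(math.sqrt(p2)) - math.asin(math.sqrt(p1)))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return 2 * ((z_alpha + z_power) / h) ** 2


n = required_n_per_group(0.30, 0.33)
print(f"Approximate users needed per group: {math.ceil(n)}")
```

The value should land close to the `NormalIndPower` figure above. Rounding up with `math.ceil` is the conservative choice, since a fractional user cannot be recruited.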


* **Calculate** for each group the **number of visitors who made a purchase** and the **total number of visitors**.


In [82]:
# Merging both tables
merged_df = pd.merge(
    eu_test_cleaned_df,
    events_within_7_days,
    on='user_id',
    how='left'
)

merged_df.head()

| | user_id | group | ab_test | device | event_dt | event_name | details | registration_dt | lifetime_days |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0002CE61FF2C4011 | B | interface_eu_test | Mac | 2020-12-07 04:37:31 | registration | -2.38 | 2020-12-07 04:37:31 | 0 |
| 1 | 0002CE61FF2C4011 | B | interface_eu_test | Mac | 2020-12-07 04:37:49 | login | NaN | 2020-12-07 04:37:31 | 0 |
| 2 | 0002CE61FF2C4011 | B | interface_eu_test | Mac | 2020-12-07 04:37:57 | login | NaN | 2020-12-07 04:37:31 | 0 |
| 3 | 0002CE61FF2C4011 | B | interface_eu_test | Mac | 2020-12-07 04:38:54 | login | NaN | 2020-12-07 04:37:31 | 0 |
| 4 | 0002CE61FF2C4011 | B | interface_eu_test | Mac | 2020-12-08 22:15:35 | login | NaN | 2020-12-07 04:37:31 | 1 |


In [83]:
# Total number of visitors (unique user_ids) for each group
total_visitors_per_group = merged_df.groupby('group')['user_id'].nunique()
print("\n--- Total number of visitors in each group ---")
print(total_visitors_per_group)

# Number of visitors who made a purchase (unique user_ids with event_name == 'purchase')
purchases_df = merged_df[merged_df['event_name'] == 'purchase']

purchasers_per_group = purchases_df.groupby('group')['user_id'].nunique()
print("\n--- Number of visitors who made a purchase in each group ---")
print(purchasers_per_group)


--- Total number of visitors in each group ---
group
A    4952
B    5011
Name: user_id, dtype: int64

--- Number of visitors who made a purchase in each group ---
group
A    1411
B    1519
Name: user_id, dtype: int64


In [84]:
# Calculate and display conversion for each group
for group_name in total_visitors_per_group.index:
    total = total_visitors_per_group[group_name]
    purchasers = purchasers_per_group[group_name] 

    conversion_rate = (purchasers / total) * 100
    print(f"Group '{group_name}':")
    print(f"  Total visitors: {total}")
    print(f"  Purchasers: {purchasers}")
    print(f"  Conversion: {conversion_rate:.2f}%")

Group 'A':
  Total visitors: 4952
  Purchasers: 1411
  Conversion: 28.49%
Group 'B':
  Total visitors: 5011
  Purchasers: 1519
  Conversion: 30.31%


**Preliminary General Conclusion on User Activity Change in the Test Group Compared to the Control Group:**

Conversion in the test group is higher than in the control group: 30.31% versus 28.49%, an uplift of about 1.8 percentage points. It remains to check whether this change is statistically significant.
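A rough way to see how precisely this uplift is estimated is a confidence interval for the difference in proportions. The sketch below uses a simple Wald interval with an unpooled standard error; this is an illustrative choice rather than part of the original analysis, and the function name `diff_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(m_a, n_a, m_b, n_b, confidence=0.95):
    """Wald confidence interval for p_b - p_a (unpooled standard error)."""
    p_a, p_b = m_a / n_a, m_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, diff - z * se, diff + z * se


# Purchasers and visitors per group, taken from the output above
diff, lower, upper = diff_confidence_interval(1411, 4952, 1519, 5011)
print(f"Uplift: {diff * 100:.2f} pp, 95% CI: [{lower * 100:.2f}, {upper * 100:.2f}] pp")
```

Comparing the interval with the three percentage point target from the hypothesis shows how far the data can support or rule out the intended uplift.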

## 4. Evaluate A/B Testing Results:

* **Check the change in conversion using an appropriate statistical test, considering all stages of hypothesis testing.**

For calculating statistical significance, we'll use the **Z-test for proportions**.

We take the **null hypothesis** to be: user conversion in test group B does not differ from conversion in control group A.

In that case, the **alternative hypothesis** is: conversion in test group B is greater than conversion in control group A.
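Written out formally, with $p_A$ and $p_B$ as the true conversion rates, $m_A$, $m_B$ as the numbers of purchasers and $n_A$, $n_B$ as the group sizes, the hypotheses and the test statistic look roughly as follows. The pooled standard error shown here is the textbook form of the two-proportion Z-test and is assumed to match the library default.

$$
H_0: p_A = p_B, \qquad H_1: p_A < p_B
$$

$$
\hat{p} = \frac{m_A + m_B}{n_A + n_B}, \qquad
z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}
$$

Under $H_0$ the statistic $z$ is approximately standard normal, so the one-sided p-value is $\Phi(z)$: large negative values of $z$ are evidence that conversion in group B is higher.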


In [85]:
# Sample sizes A and B
n_a = total_visitors_per_group['A']
n_b = total_visitors_per_group['B']

# Number of successes (users with purchases) in groups A and B
m_a = purchasers_per_group['A']
m_b = purchasers_per_group['B']

p_a = m_a / n_a  # calculate proportion of successes for group A
p_b = m_b / n_b  # calculate proportion of successes for group B

print(f'n_a={n_a}, n_b={n_b}')
print(f'm_a={m_a}, m_b={m_b}')
print(f'p_a={p_a:.4f}, p_b={p_b:.4f}')

# Check assumption of sufficient data
# Condition: number of successes and failures in each group must be > 10
if (m_a > 10) and ((n_a - m_a) > 10) and (m_b > 10) and ((n_b - m_b) > 10):
    print('Assumption of sufficient data is met!')
else:
    print('Assumption of sufficient data is NOT met!')

n_a=4952, n_b=5011
m_a=1411, m_b=1519
p_a=0.2849, p_b=0.3031
Assumption of sufficient data is met!


In [87]:
alpha = 0.05  # significance level for the one-sided test

# With counts ordered [A, B], alternative='smaller' tests H1: p_a < p_b,
# which is the same statement as H1: conversion in B is greater than in A.
stat_ztest, p_value_ztest = proportions_ztest(
    [m_a, m_b],
    [n_a, n_b],
    alternative='smaller'
)

print(f"P-value: {p_value_ztest}")

if p_value_ztest > alpha:
    print(f'pvalue={p_value_ztest} > {alpha}')
    print('The null hypothesis is not rejected.')
    print('There is no statistically significant evidence that conversion in group B is greater than in group A.')
else:
    print(f'pvalue={p_value_ztest} < {alpha}')
    print('The null hypothesis is rejected.')
    print('Conversion in test group B is greater than conversion in control group A, and this difference is statistically significant.')

P-value: 0.023117327881967534
pvalue=0.023117327881967534 < 0.05
The null hypothesis is rejected.
Conversion in test group B is greater than conversion in control group A, and this difference is statistically significant.
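To make the library call less of a black box, the same one-sided test can be reproduced by hand. The sketch below is a standard-library approximation of what `proportions_ztest` computes with its default pooled variance; the function name `one_sided_ztest` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sided_ztest(m_a, n_a, m_b, n_b):
    """Two-proportion Z-test of H1: p_a < p_b using the pooled standard error."""
    p_a, p_b = m_a / n_a, m_b / n_b
    p_pool = (m_a + m_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Left-tail probability: small values support p_a < p_b
    p_value = NormalDist().cdf(z)
    return z, p_value


z, p_value = one_sided_ztest(1411, 4952, 1519, 5011)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

The p-value should agree with the library output above to several decimal places, which confirms that `alternative='smaller'` with counts ordered `[A, B]` tests the intended direction.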


## 5. Conclusion on A/B Test Results
Describe the conclusions from the A/B test evaluation. What can be said about the A/B test results? Was the expected effect on conversion achieved?

---

### A/B Test Evaluation Summary

The A/B test evaluation leads to the following conclusions:

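* **The hypothesis under investigation was only partly confirmed.** Conversion in test group B (30.31%) is higher than in control group A (28.49%), and the one-sided Z-test shows that this difference is statistically significant (p ≈ 0.023 < 0.05).
* **The expected effect was not reached in the observed data.** The observed uplift is about 1.8 percentage points, below the three percentage point increase stated in the hypothesis. Simplifying the interface improved the conversion of registered users into buyers within seven days of registration, but by less than targeted.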