In [5]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

In [6]:
df_AB_test = pd.read_csv('ab_test_data.csv', sep = ';')

In [7]:
df_AB_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 404770 entries, 0 to 404769
Data columns (total 3 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   user_id    404770 non-null  int64 
 1   revenue    404770 non-null  int64 
 2   testgroup  404770 non-null  object
dtypes: int64(2), object(1)
memory usage: 9.3+ MB


In [8]:
df_AB_test.head()

Unnamed: 0,user_id,revenue,testgroup
0,1,0,b
1,2,0,a
2,3,0,a
3,4,0,b
4,5,0,b


<h1> Utilization of Bootstrap in A/B Testing: Metric Analysis and Hypothesis Testing </h1>

This notebook analyzes the results of an A/B test in which two groups of users were shown different sets of promotional offers.

<h2> Calculating ARPU (Average Revenue Per User) </h2>

In [9]:
# Grouping by the test group, counting unique users, and summing up revenue.
ARPU_df = df_AB_test.groupby("testgroup", as_index = False)\
                                .agg({"user_id": pd.Series.nunique, "revenue": "sum"})\
                                .rename(columns = {"user_id": "users_amount"})

# ARPU = total revenue / number of unique users
ARPU_df["ARPU"] = ARPU_df.revenue / ARPU_df.users_amount
ARPU_df

Unnamed: 0,testgroup,users_amount,revenue,ARPU
0,a,202103,5136189,25.41372
1,b,202667,5421603,26.751287


In [10]:
ARPU_diff = abs(ARPU_df.ARPU[0] - ARPU_df.ARPU[1]).round(2)
ARPU_diff_percentage = (ARPU_diff/ARPU_df.ARPU[0]*100).round(2)
print(f'The difference between the means in the two test groups is {ARPU_diff}, which is {ARPU_diff_percentage}%')

The difference between the means in the two test groups is 1.34, which is 5.27%


<h2> Checking the statistical significance of the difference in ARPU </h2>

<h4> Null Hypothesis H0: The promotional offers in groups a and b did not lead to a difference in revenue per user. </h4>
<h4> Alternative Hypothesis H1: The promotional offers increased revenue per user in group b. </h4>

In [11]:
# Splitting df into group a and group b
a_group = df_AB_test.query("testgroup == 'a'")
b_group = df_AB_test.query("testgroup == 'b'")

In [12]:
# Checking for intersection between sets
group_a = set(a_group.user_id)
group_b = set(b_group.user_id)

group_a.intersection(group_b)

set()

<h6> Bootstrap ARPU </h6>

In [13]:
arpu_median_a = a_group['revenue'].median()
arpu_median_b = b_group['revenue'].median()
print(f'a group median revenue = {arpu_median_a}, b group median revenue = {arpu_median_b}')

a group median revenue = 0.0, b group median revenue = 0.0


Comparing medians is not informative for ARPU: fewer than 1% of users in each group made a purchase, so the median revenue is 0 in both group a and group b. In this case we therefore compare the means.
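To illustrate why the median collapses to 0 when most users pay nothing, here is a minimal sketch on synthetic data. The array below is a made-up stand-in, not the notebook's data, and the purchase amount is hypothetical.

```python
import numpy as np

# Synthetic stand-in for per-user revenue (NOT the notebook's data):
# 1,000 users, of whom only 10 (1%) paid anything.
revenue = np.zeros(1000)
revenue[:10] = 2500.0  # hypothetical purchase amount

# More than half of the users paid nothing, so the median is 0.
median_revenue = np.median(revenue)
# The few payers still move the mean: 10 * 2500 / 1000 = 25.
mean_revenue = revenue.mean()

print(median_revenue, mean_revenue)
```

The mean reacts to the small share of payers, while the median stays at 0 until at least half of the users pay.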

In [16]:
# Bootstrap for the difference in means (group b minus group a)
mean_diff = []
for i in range(10000):
    sample_data_a_arpu = a_group['revenue'].sample(frac=1, replace=True)
    sample_mean_a = sample_data_a_arpu.mean()
    
    sample_data_b_arpu = b_group['revenue'].sample(frac=1, replace=True)
    sample_mean_b = sample_data_b_arpu.mean()
    
    sample_mean_diff = sample_mean_b - sample_mean_a
    mean_diff.append(sample_mean_diff)
    
left_arpu = pd.Series(mean_diff).quantile(0.025)
right_arpu = pd.Series(mean_diff).quantile(0.975)
print(f'Interval [{left_arpu} {right_arpu}]')

Interval [-2.9456185993691983 5.337996674210463]
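The resampling loop above can also be wrapped in a reusable function. The following is a sketch only: the name `bootstrap_diff_ci`, its parameters, and the synthetic demo arrays are assumptions introduced for illustration and are not part of the original notebook. It uses NumPy's seeded generator so that the interval is reproducible between runs.

```python
import numpy as np

def bootstrap_diff_ci(a, b, statistic=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(b) - statistic(a) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group independently, with replacement, at its original size.
        resample_a = rng.choice(a, size=a.size, replace=True)
        resample_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = statistic(resample_b) - statistic(resample_a)
    # The alpha/2 and 1 - alpha/2 quantiles bound the percentile interval.
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Synthetic demo data (not the notebook's data): group b is shifted up by 5.
rng = np.random.default_rng(42)
demo_a = rng.normal(loc=100.0, scale=10.0, size=500)
demo_b = rng.normal(loc=105.0, scale=10.0, size=500)

ci_low, ci_high = bootstrap_diff_ci(demo_a, demo_b)
print(f"95% CI for the difference in means: [{ci_low:.2f}, {ci_high:.2f}]")
```

On the notebook's data, a call such as `bootstrap_diff_ci(a_group['revenue'], b_group['revenue'], n_boot=10000)` would correspond to the loop above, and passing `statistic=np.median` would switch the comparison to medians.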


<h6> CONCLUSION ARPU: </h6>

In the case of ARPU (which includes users who did not make purchases), we cannot claim a statistically significant difference in revenue between group a and group b, since the 95% bootstrap interval for the difference in means includes 0. In other words, we fail to reject the null hypothesis H0.

<h2> Calculation of ARPPU (Average Revenue Per Paying User) </h2>

In the case of ARPPU (Average Revenue Per Paying User), we exclude users who did not make any purchases and focus solely on paying users. This involves filtering out users with zero revenue, grouping by the test group, and calculating the number of unique users and the sum of their revenue in each group.

In [17]:
ARPPU_df = df_AB_test.query("revenue != 0")\
                    .groupby("testgroup", as_index = False)\
                    .agg({"user_id": pd.Series.nunique, "revenue": "sum"})\
                    .rename(columns = {"user_id": "paying_users_amount"})

# Calculation of ARPPU
ARPPU_df["ARPPU"] = (ARPPU_df.revenue / ARPPU_df.paying_users_amount)
ARPPU_df

Unnamed: 0,testgroup,paying_users_amount,revenue,ARPPU
0,a,1928,5136189,2663.998444
1,b,1805,5421603,3003.658172


In [18]:
ARPPU_diff = abs(ARPPU_df.ARPPU[0] - ARPPU_df.ARPPU[1]).round(2)
ARPPU_diff_percentage = (ARPPU_diff / ARPPU_df.ARPPU[0] * 100).round(2)
print(f'The difference between the averages in the two test groups is {ARPPU_diff}, which is {ARPPU_diff_percentage}%')

The difference between the averages in the two test groups is 339.66, which is 12.75%


<h2> Checking the statistical significance of the difference in ARPPU </h2>

<h4> Null Hypothesis H0: Revenue among paying users does not differ between group a and group b. </h4>
<h4> Alternative Hypothesis H1: Paying users in group b bring in more revenue than paying users in group a. </h4>

In [19]:
# Splitting df into group a and group b
ARPPU_a = df_AB_test.query("revenue != 0 & testgroup == 'a'")
ARPPU_b = df_AB_test.query("revenue != 0 & testgroup == 'b'")

<h6> Bootstrap ARPPU </h6>

In [20]:
arppu_median_a = ARPPU_a['revenue'].median()
arppu_median_b = ARPPU_b['revenue'].median()
print(f'a group median revenue = {arppu_median_a}, b group median revenue = {arppu_median_b}')

a group median revenue = 311.0, b group median revenue = 3022.0


In this case, the revenue column contains no zero values, so the medians are informative and can be compared.

In [21]:
ARPPU_median_diff = abs(arppu_median_b - arppu_median_a)
print(f'The difference between the medians in the two test groups is {ARPPU_median_diff}')

The difference between the medians in the two test groups is 2711.0


In [22]:
# Bootstrap for medians 
median_diff = []
for i in range(10000):
    sample_data_a = ARPPU_a['revenue'].sample(frac=1, replace=True)
    sample_median_a = sample_data_a.median()
    
    sample_data_b = ARPPU_b['revenue'].sample(frac=1, replace=True)
    sample_median_b = sample_data_b.median()
    
    sample_median_diff = sample_median_b - sample_median_a
    median_diff.append(sample_median_diff)
    
left = pd.Series(median_diff).quantile(0.025)
right = pd.Series(median_diff).quantile(0.975)
print(f'Interval [{left} {right}]')

Interval [2655.0 2760.0]


<h6> CONCLUSION ARPPU: </h6>

In the case of ARPPU (considering only paying users), we can claim a statistically significant difference in revenue between group a and group b, since the 95% bootstrap interval for the difference in medians does not include 0. In other words, the null hypothesis H0 is rejected in favor of the alternative hypothesis H1.

<h2> Calculation of Conversion Rate </h2>

In [23]:
# Calculation of the total number of users in the groups
all_users = df_AB_test.groupby("testgroup", as_index = False)["user_id"].nunique()\
                        .rename(columns = {"user_id": "users_amount"})

In [24]:
# Calculation of the number of users who made a purchase
paying_users = df_AB_test.query("revenue!=0").groupby("testgroup", as_index = False)["user_id"].nunique()\
                                            .rename(columns = {"user_id": "paying_users_amount"})

In [25]:
# Merging dataframes and creating a cross-table.
users_df = all_users.merge(paying_users, on = 'testgroup')

In [26]:
users_df

Unnamed: 0,testgroup,users_amount,paying_users_amount
0,a,202103,1928
1,b,202667,1805


users_df must stay free of the CR column so that it can serve as the cross-table for the Chi2 test. The CR column is therefore added to a copy of users_df named conversion.

In [27]:
conversion = users_df.copy()
conversion["CR_%"] = conversion.paying_users_amount / conversion.users_amount * 100

In [28]:
conversion

Unnamed: 0,testgroup,users_amount,paying_users_amount,CR_%
0,a,202103,1928,0.953969
1,b,202667,1805,0.890624


<h2> Checking the statistical significance of the difference in Conversion Rate </h2>

<h4> Null Hypothesis H0: The share of users making a purchase is the same in group a and group b. </h4>
<h4> Alternative Hypothesis H1: The share of users making a purchase is lower in group b. </h4>
Significance level: α = 0.05

<h6> Chi-square Test </h6>

In [29]:
users_df = users_df.set_index('testgroup')

In [30]:
Chi2 = chi2_contingency(users_df)

p_chi2 = Chi2[1]
print(f'P = {p_chi2}')

P = 0.03824373651044168
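One caveat: `chi2_contingency` treats every cell of the table as a count in a mutually exclusive category, whereas `users_amount` in `users_df` already includes the paying users. A minimal sketch of the same test on a paying / non-paying table, with the counts copied from the `users_df` output above, might look as follows. The variable names are illustrative, and the resulting p-value may differ slightly from the one printed above.

```python
from scipy.stats import chi2_contingency

# Counts taken from the users_df output above.
users = {"a": 202103, "b": 202667}
paying = {"a": 1928, "b": 1805}

# Rows: test groups a and b; columns: [paying, non-paying] users.
contingency = [[paying[g], users[g] - paying[g]] for g in ("a", "b")]

# For a 2x2 table, SciPy applies Yates' continuity correction by default.
chi2_stat, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```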


<h6> CONCLUSION CR: </h6>

For the Conversion Rate, we can conclude that there is a statistically significant difference in the share of users making purchases between group a and group b, as the p-value (about 0.038) is below the significance level of 0.05. This gives grounds to reject the null hypothesis H0, which assumes no difference in conversion, in favor of the alternative hypothesis that users in group b make purchases less often.
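The alternative hypothesis for conversion is directional (lower in group b), while the chi-square test is two-sided. As a cross-check, a one-sided two-proportion z-test can be sketched from the same counts. This is a different technique from the one used in the notebook, shown here only as an illustration under the usual large-sample normal approximation; the variable names are assumptions.

```python
from math import sqrt
from scipy.stats import norm

# Counts taken from the users_df output above.
n_a, n_b = 202103, 202667   # all users per group
x_a, x_b = 1928, 1805       # paying users per group

p_a, p_b = x_a / n_a, x_b / n_b

# Pooled conversion rate under H0 (equal conversion in both groups).
p_pool = (x_a + x_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

# Negative z means group b converts less often than group a.
z = (p_b - p_a) / se

# One-sided p-value for H1: conversion in b is lower than in a.
p_one_sided = norm.cdf(z)
print(f"z = {z:.3f}, one-sided p = {p_one_sided:.4f}")
```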

<h2> OVERALL CONCLUSION </h2>

To determine which set of offers can be considered the best, decisions should align with the objectives of the A/B test.

1) If the goal of the test was to increase conversion and ARPU, group b falls short: the difference in ARPU is not statistically significant, and conversion is significantly lower than in group a. For these goals, the promotional offers in group b did not perform well.

2) If the goal of the test was to increase the average revenue per paying user (ARPPU), even at the cost of fewer paying users, the promotional offers in group b achieved this objective.