# A/B Testing
https://robbiegeoghegan.medium.com/implementing-a-b-tests-in-python-514e9eb5b3a1

A/B testing is a general methodology for evaluating product changes and new features online. You split users into two groups at random: one group sees the changed product (the experiment, or treatment, group) and the other sees the original product or set of features (the control group). You then compare a chosen metric, such as conversion rate, between the two groups to determine which version of the product performs better.
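
As a rough illustration of the split described above, the sketch below assigns users to groups by hashing their ID. The function name `assign_group`, the experiment label, and the 50/50 split are assumptions made for this example; the dataset used later in this notebook already comes with a `group` column, so this is not how that data was produced.

```python
import hashlib

def assign_group(user_id: int, experiment: str = "new_landing_page") -> str:
    """Deterministically assign a user to 'control' or 'treatment' (50/50 split)."""
    # Hash the experiment name together with the user ID so that the same
    # user always lands in the same group for a given experiment.
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < 50 else "control"

# Assign 10,000 hypothetical user IDs and check the split is roughly even
groups = [assign_group(uid) for uid in range(10_000)]
treatment_share = groups.count("treatment") / len(groups)
print(f"Share of users in treatment: {treatment_share:.3f}")
```

Hashing instead of drawing a random number on each visit keeps a returning user in the same group, which helps avoid the repeated-user problem handled later in the notebook.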

A/B Testing is important because it:

    Removes the need for guessing or relying on intuition
    Provides accurate answers quickly
    Allows for rapid iteration on ideas
    Establishes causal relationships — not just correlations

z-tests are a statistical way of testing a hypothesis when either:

    We know the population variance, or
    We do not know the population variance, but our sample size is large (n ≥ 30)

If we have a sample size of less than 30 and do not know the population variance, then we must use a t-test.

t-tests are a statistical way of testing a hypothesis when both of the following hold:

    We do not know the population variance
    Our sample size is small, n < 30
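
The rule of thumb above can be written as a tiny helper. The function name `choose_test` is made up for this sketch, and the cutoff of 30 is simply the threshold quoted in the text, not a hard statistical law.

```python
def choose_test(sample_size: int, variance_known: bool) -> str:
    """Pick a test using the rule of thumb from the text (n >= 30 counts as 'large')."""
    if variance_known or sample_size >= 30:
        return "z-test"
    # Small sample and unknown population variance
    return "t-test"

print(choose_test(2056, variance_known=False))
print(choose_test(12, variance_known=False))
print(choose_test(12, variance_known=True))
```

With thousands of users per group, as in the dataset below, the rule points to a z-test.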


In [96]:
from IPython.display import display, Image, SVG, Math, YouTubeVideo
Image(url ='https://www.stefanjaspers.com/wp-content/uploads/2020/07/hypothesistest2.jpg', width=500, height=500)

In [97]:
# Packages imports
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import math

%matplotlib inline

# Some plot styling preferences
# ('seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in matplotlib 3.6)
plt.style.use('seaborn-v0_8-whitegrid')

In [98]:
url="https://raw.githubusercontent.com/ozlerhakan/ab-test/master/ab_data.csv"

In [99]:
# load data into a dataframe
df_ab= pd.read_csv(url,  index_col= None, na_values='?')
df_ab.head()

   user_id                   timestamp      group landing_page  converted
0   851104  2017-01-21 22:11:48.556739    control     old_page          0
1   804228  2017-01-12 08:01:45.159739    control     old_page          0
2   661590  2017-01-11 16:55:06.154213  treatment     new_page          0
3   853541  2017-01-08 18:28:03.143765  treatment     new_page          0
4   864975  2017-01-21 01:52:26.210827    control     old_page          1


In [100]:
df_ab.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294478 non-null  int64 
 1   timestamp     294478 non-null  object
 2   group         294478 non-null  object
 3   landing_page  294478 non-null  object
 4   converted     294478 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


In [101]:
#Check how many duplicated users exist

session_counts = df_ab['user_id'].value_counts(ascending=False)
multi_users = session_counts[session_counts > 1].count()


print("Total records = " ,df_ab["user_id"].count())
print("Number of unique users = ",df_ab["user_id"].nunique())

print(f'There are {multi_users} users that appear multiple times in the dataset')

Total records =  294478
Number of unique users =  290584
There are 3894 users that appear multiple times in the dataset


In [102]:
#drop duplicated users
users_to_drop = session_counts[session_counts > 1].index

df = df_ab[~df_ab['user_id'].isin(users_to_drop)]
print(f'The updated dataset now has {df.shape[0]} entries')

The updated dataset now has 286690 entries
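
An equivalent way to drop every user who appears more than once is `duplicated(keep=False)`. The sketch below runs on a small made-up frame (the name `toy` and its values are hypothetical) rather than on the real dataset.

```python
import pandas as pd

# Toy data (hypothetical): user 2 appears twice
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "group": ["control", "treatment", "control", "treatment"],
})

# keep=False flags every row of a repeated user_id, not just the later copies
is_repeated = toy["user_id"].duplicated(keep=False)
toy_clean = toy[~is_repeated]

print(toy_clean["user_id"].tolist())
```

Dropping all rows for a repeated user, rather than keeping the first one, matches the cell above: a user who may have seen both pages cannot be attributed cleanly to either group.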


In [103]:
# Make sure every control user saw the old page and every treatment user saw the new page
# (use the de-duplicated df for all three columns, not the raw df_ab)

pd.crosstab(index=[df['converted'], df['group']], columns=df['landing_page'])


landing_page          new_page  old_page
converted group
0         control            0    126073
          treatment     126372         0
1         control            0     17220
          treatment      17025         0
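
The crosstab shows zeros wherever a group would have seen the wrong page. The same check can be expressed as a single count, sketched here on made-up rows (the frame `toy` and its deliberate mismatch are hypothetical).

```python
import pandas as pd

# Toy data (hypothetical): the third row is a control user who saw the new page
toy = pd.DataFrame({
    "group": ["control", "treatment", "control", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "new_page"],
})

# Page each group is supposed to see
expected_page = toy["group"].map({"control": "old_page", "treatment": "new_page"})

# Count rows where the page shown differs from the expected one
n_mismatched = int((toy["landing_page"] != expected_page).sum())
print(f"Mismatched rows: {n_mismatched}")
```

On clean data, such as the de-duplicated frame above, this count would be zero.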


In [104]:
# Count conversions and total users in each group,
# then report each group's conversion rate
mask = (df["group"] == "control")
conversions_control = df["converted"][mask].sum()
total_users_control = df["converted"][mask].count()

mask = (df["group"] == "treatment")
conversions_treatment = df["converted"][mask].sum()
total_users_treatment = df["converted"][mask].count()

#count number of users who converted in each group
print("Number of control users who converted on old page: ", conversions_control)
print("Percentage of control users who converted: ", round((conversions_control / total_users_control) * 100, 2), "%")
print("Number of treatment users who converted on new page: ", conversions_treatment)
print("Percentage of treatment users who converted: ", round((conversions_treatment/ total_users_treatment) * 100, 2), "%")

Number of control users who converted on old page:  17220
Percentage of control users who converted:  12.02 %
Number of treatment users who converted on new page:  17025
Percentage of treatment users who converted:  11.87 %
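
The per-group counts and rates can also be computed in one step with `groupby`. This is a sketch on made-up data (the frame `toy` is hypothetical); the column names mirror the real dataset.

```python
import pandas as pd

# Toy data (hypothetical): four users per group
toy = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [0, 1, 0, 0, 1, 1, 0, 0],
})

# Because converted is 0/1, sum gives conversions and mean gives the conversion rate
summary = toy.groupby("group")["converted"].agg(
    conversions="sum", users="count", rate="mean"
)
print(summary)
```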


https://medium.com/@RenatoFillinich/ab-testing-with-python-e5964dd66143

Let’s imagine you work on the product team at a medium-sized e-commerce business. The UX designer worked really hard on a new version of the product page, with the hope that it will lead to a higher conversion rate. The product manager (PM) told you that the current conversion rate is about 13% on average throughout the year, and that the team would be happy with an increase of 2 percentage points, meaning that the new design will be considered a success if it raises the conversion rate to 15%.

In [105]:
from IPython.display import display, Image, SVG, Math, YouTubeVideo
Image(url ='https://www.stefanjaspers.com/wp-content/uploads/2020/07/hypothesistest.jpg', width=500, height=500)

The null hypothesis (H0): The new page converts at the same rate as the old page.
p - pₒ = 0   (no difference between the two proportions)

The alternative hypothesis (H1): The new page converts at a different rate than the old page, whether better or worse.
p - pₒ ≠ 0   (a statistically detectable difference between the two proportions)

where p and pₒ stand for the conversion rate of the new and old design, respectively. Because the alternative is two-sided, we use a two-tailed test. We’ll also set a confidence level of 95%, which corresponds to a significance level of α = 0.05:

In [106]:
# Check what sample size is required
baseline_rate = conversions_control / total_users_control  # about 12%, slightly below the PM's 13%
target_rate = 0.15  # 15%
alpha = 0.05  # significance level (user defined), i.e. a 95% confidence level
power = 0.8  # statistical power (user defined): chance of detecting the effect if it exists

effect_size = sms.proportion_effectsize(baseline_rate, target_rate)
sample_size = sms.NormalIndPower().solve_power(
    effect_size=effect_size,
    power=power,
    alpha=alpha,
    ratio=1)
print("Required sample size: ", round(sample_size), " per group")

Required sample size:  2056  per group
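
To see where a number like this comes from, the sketch below applies the usual closed-form normal approximation for two equal-sized groups, n ≈ 2(z₁₋α/₂ + z_power)² / h², where h is Cohen's effect size for two proportions. The function name `required_sample_size` and its parameter names are choices made for this example; it uses only the standard library and ignores the negligible far tail of the two-sided test, so it should land close to, but is not guaranteed to match exactly, what statsmodels reports.

```python
from math import asin, sqrt
from statistics import NormalDist

def required_sample_size(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    # Cohen's h: difference of arcsine-transformed proportions
    h = 2 * asin(sqrt(p_target)) - 2 * asin(sqrt(p_baseline))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return 2 * (z_alpha + z_power) ** 2 / h ** 2

# Baseline taken from the control counts printed earlier: 17220 conversions out of 143293 users
n = required_sample_size(17220 / 143293, 0.15)
print(f"Approximate sample size per group: {n:.1f}")
```

The formula makes the trade-offs visible: halving the effect size h roughly quadruples the required sample, while a stricter alpha or a higher power increases it more gently.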


In [107]:
control_sample = df[df['group'] == 'control'].sample(n=round(sample_size), random_state=22)
treatment_sample = df[df['group'] == 'treatment'].sample(n=round(sample_size), random_state=22)

ab_test = pd.concat([control_sample, treatment_sample], axis=0)
ab_test.reset_index(drop=True, inplace=True)

In [108]:
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

control_results = ab_test[ab_test['group'] == 'control']['converted']
treatment_results = ab_test[ab_test['group'] == 'treatment']['converted']

n_con = control_results.count()
n_treat = treatment_results.count()
successes = [control_results.sum(), treatment_results.sum()]
nobs = [n_con, n_treat]

z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)

if pval < 0.05:
    print("we reject the null hypothesis")
else:
    print("we fail to reject the null hypothesis")

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]')

we fail to reject the null hypothesis
z statistic: 0.53
p-value: 0.597
ci 95% for control group: [0.108, 0.136]
ci 95% for treatment group: [0.103, 0.131]
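
For reference, the z statistic and p-value of a two-sided, pooled two-proportion z-test can be computed by hand as sketched below. The function name `two_proportion_ztest` is made up for this example, and the conversion counts passed to it are hypothetical (they are not the counts from the sample drawn above), so the printed values will not reproduce the output above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference of two proportions, using the pooled rate."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one conversion rate, estimated from all users
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: probability of a |z| at least this large under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 250 and 240 conversions out of 2056 users per group
z, p = two_proportion_ztest(250, 2056, 240, 2056)
print(f"z statistic: {z:.2f}, p-value: {p:.3f}")
```

A p-value above 0.05 here would lead to the same kind of conclusion as in the cell above: the observed difference is small relative to its sampling noise.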


#### Conclusion: with a p-value of 0.597, well above α = 0.05, the new design did not perform significantly differently from (let alone better than) the old one