# A/B test analysis

A company selling a single product through its online application has conducted an A/B pricing test for that product. The traffic for the test was split 50%/50% between variants A and B. The price points are 4.99 (control) and 5.99 (test variant). Conduct an analysis of this experiment and provide a recommendation for choosing either of the two variants.

## Statistical assumptions

<b> H0 - Null Hypothesis: </b> the conversion rates of the control version and the tested version are equal. <br/>
<b> H1 - Alternative Hypothesis: </b> the conversion rates of the control version and the tested version are different. <br/>
Intuitively, there is a risk that the conversion will be lower after the price increase.

<b> Statistical Significance: </b> 95% <br/>
<b> Significance Level (alpha): </b> 0.05 (5%) -> there is a 5% chance of a Type I Error (= rejecting the Null Hypothesis although it is true). <br/> <br/>
    
<b> Baseline Conversion Rate: </b> 25% <br/>
<b> Sample Size per Variation: </b> 50,000 <br/>
So - <b> Minimum Detectable Effect (MDE): </b> 3-4% (relative to the baseline) <br/>
<b> Statistical Power (1 - beta): </b> 0.8 (80%) -> there is a 20% chance (beta) of a Type II Error (= not rejecting the Null Hypothesis although the Alternative Hypothesis is true). <br/> <br/>

The MDE above was estimated with the following calculators: <br/>
https://www.optimizely.com/sample-size-calculator/?conversion=25&effect=4&significance=95 <br/>
https://www.abtasty.com/sample-size-calculator/ <br/> <br/>
I assume that <b> the duration </b> of the test was adequately determined and that there were no special effects from <b> seasonality or holiday periods </b> (Black Friday, Cyber Monday etc.) or from <b> large marketing promotions and campaigns </b> on the website.

<img src="https://www.invespcro.com/blog/images/blog-images/f9.jpg" alt="statistics">
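The 3-4% MDE can be sanity-checked without an online calculator. The following is a minimal sketch, assuming the usual normal-approximation formula for a two-sided two-proportion test with equal group sizes. The function name `minimum_detectable_effect` is hypothetical, and only the standard library is used so the snippet runs on its own.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(baseline, n_per_variant, alpha=0.05, power=0.8):
    """Approximate relative MDE for a two-sided two-proportion z-test.

    Assumption: equal group sizes and the baseline variance in both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    std_error = sqrt(2 * baseline * (1 - baseline) / n_per_variant)
    absolute_mde = (z_alpha + z_power) * std_error
    return absolute_mde / baseline                 # relative to the baseline


mde = minimum_detectable_effect(baseline=0.25, n_per_variant=50_000)
print(f"Approximate relative MDE: {mde:.2%}")
```

With the inputs listed above this approximation gives a relative MDE of roughly 3%, at the lower end of the 3-4% range. Online calculators may use different methods and can therefore report somewhat larger values.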

## Load Python libraries

In [3]:
import pandas as pd
import numpy as np
from scipy.stats import norm
from math import sqrt
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st

## Load input file

In [48]:
df = pd.read_csv("pricing_test_1.csv") 

## Basic quality checks and preprocessing

In [49]:
# Leave only columns that are needed (remove index column)
df = df[['userId', 'variant', 'price', 'converted']]

In [50]:
# Look at the first 5 rows in the dataset
df.head()

Unnamed: 0,userId,variant,price,converted
0,876112.0,A,4.99,1
1,876114.0,A,5.99,0
2,876116.0,A,5.99,0
3,876117.0,A,4.99,1
4,876118.0,A,4.99,0


In [51]:
print(f"The dataset contains {df.shape[0]} rows and {df.shape[1]} columns")

The dataset contains 100000 rows and 4 columns


In [52]:
# Look at the basic info about the dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   userId     100000 non-null  float64
 1   variant    100000 non-null  object 
 2   price      100000 non-null  float64
 3   converted  100000 non-null  int64  
dtypes: float64(2), int64(1), object(1)
memory usage: 3.1+ MB


<b> Observations: </b> <br/>
-> Non-Null Count in each column = Number of entries -> no missing values <br/>
-> Dtype of userId -> should be integer, not float <br/>
-> Dtype of other columns -> correct

In [53]:
# Convert the userId column from float64 to int64

df['userId'] = df['userId'].astype('int64')
print(f"Dtype of userId column after conversion: {df['userId'].dtypes}")

Dtype of userId column after conversion: int64


In [54]:
# Check if there are any duplicates in the dataset

print(f"Number of duplicated entries within the dataset: {df.duplicated().sum()}")
#df.drop_duplicates(inplace=True)

Number of duplicated entries within the dataset: 0


## Variants cleaning

There is an inconsistency in the data here: the variant label does not map consistently to a price (for example, some rows labelled A carry the price 5.99, and some rows labelled B carry 4.99). <br/><br/>
<b> Assumptions: </b> <br/>
-> variant A should always have price = 4.99, <br/>
-> variant B should always have price = 5.99. 

In [55]:
df.tail()

Unnamed: 0,userId,variant,price,converted
99995,976104,B,5.99,1
99996,976106,B,4.99,0
99997,976108,B,5.99,0
99998,976109,B,4.99,1
99999,976111,B,5.99,1
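The mismatch between the variant label and the price can be quantified by counting each (variant, price) pair. The following is a minimal sketch, using only the ten rows shown by `df.head()` and `df.tail()` in this notebook; the `expected_price` mapping encodes the assumptions stated above.

```python
from collections import Counter

# The ten (variant, price) pairs shown by df.head() and df.tail()
rows = [
    ("A", 4.99), ("A", 5.99), ("A", 5.99), ("A", 4.99), ("A", 4.99),
    ("B", 5.99), ("B", 4.99), ("B", 5.99), ("B", 4.99), ("B", 5.99),
]

# Assumed mapping: variant A -> 4.99, variant B -> 5.99
expected_price = {"A": 4.99, "B": 5.99}

pair_counts = Counter(rows)
mismatches = sum(
    count
    for (variant, price), count in pair_counts.items()
    if expected_price[variant] != price
)

for (variant, price), count in sorted(pair_counts.items()):
    print(f"variant={variant} price={price}: {count} rows")
print(f"Rows where label and price disagree: {mismatches} of {len(rows)}")
```

On the full DataFrame the same table can be obtained with `pd.crosstab(df['variant'], df['price'])`.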


In [10]:
# Create a new column with clear assignment of variants

df['variant_cleaned'] = np.where(df['price'] == 4.99, 'A', 'B')

In [11]:
# Create separate DataFrames for each variant

df_a = df.loc[df['variant_cleaned'] == 'A']
df_b = df.loc[df['variant_cleaned'] == 'B']

In [15]:
# Re-check if df_a DataFrame contains only 4.99 prices

df_a['price'].value_counts()

4.99    49926
Name: price, dtype: int64

In [16]:
# Re-check if df_b DataFrame contains only 5.99 prices

df_b['price'].value_counts()

5.99    50074
Name: price, dtype: int64

In [58]:
variant_a_perc = round(df_a.shape[0] / df.shape[0] * 100, 2)
print(f"Variant A - percentage of the whole sample: {variant_a_perc}")

Variant A - percentage of the whole sample: 49.93


In [59]:
variant_b_perc = round(df_b.shape[0] / df.shape[0] * 100, 2)
print(f"Variant B - percentage of the whole sample: {variant_b_perc}")

Variant B - percentage of the whole sample: 50.07
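Whether a 49.93% / 50.07% split is compatible with the intended 50/50 allocation can be checked with a sample ratio mismatch (SRM) test. The following is a minimal sketch, assuming a chi-square goodness-of-fit test with one degree of freedom, whose p-value can be obtained from the standard normal distribution. This check is not part of the original analysis, and `srm_p_value` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """p-value of a chi-square goodness-of-fit test with 1 degree of freedom."""
    total = n_a + n_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    # With 1 degree of freedom, chi2 is the square of a standard normal variable
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))


p_srm = srm_p_value(49926, 50074)
print(f"SRM p-value: {p_srm:.3f}")
```

A p-value well above 0.05 gives no indication that the observed split deviates from the planned 50/50 allocation.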


## Conversions

In [81]:
# Conversions in the control group (price = 4.99)

df_a['converted'].value_counts()

0    37201
1    12725
Name: converted, dtype: int64

In [82]:
# Conversions in the test group (price = 5.99)

df_b['converted'].value_counts()

0    37253
1    12821
Name: converted, dtype: int64

In [79]:
variant_a_conversion = round(df_a.loc[df_a['converted'] == 1].shape[0] / df_a['converted'].shape[0] * 100, 4)

variant_b_conversion = round(df_b.loc[df_b['converted'] == 1].shape[0] / df_b['converted'].shape[0] * 100, 4)

print(f"Variant A - conversion [%]: {variant_a_conversion}")
print(f"Variant B - conversion [%]: {variant_b_conversion}")

Variant A - conversion [%]: 25.4877
Variant B - conversion [%]: 25.6041


In [84]:
# Relative change of variant B measured against the control (variant A) baseline
relative_conversion_change = round((variant_b_conversion - variant_a_conversion) / variant_a_conversion * 100, 4)

print(f"Relative change in conversion [%]: {relative_conversion_change}")

Relative change in conversion [%]: 0.4567


<b> MDE = 3-4% </b> <br/>
MDE is essentially the sensitivity of a test. In other words, it is the smallest relative change in conversion rate that the test can reliably detect. <br/> <br/>
For example, with a baseline conversion rate of 25% and an MDE of 4%, the test can detect changes that move the conversion rate outside the absolute range of 24% to 26% (a 4% relative effect is a 1 percentage point absolute change in conversion rate in this example). <br/> <br/>

In the results -> Relative change in conversion: 0.4567% < 3-4% (MDE), so the observed difference is far smaller than the smallest effect the test was sized to detect. <br/>
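The conversion from a relative MDE to an absolute detection range, as in the 25% / 4% example above, can be sketched as follows. The helper name `detectable_range` is hypothetical.

```python
def detectable_range(baseline, relative_mde):
    """Return the (lower, upper) absolute conversion rates implied by a relative MDE."""
    delta = baseline * relative_mde  # absolute change, e.g. 0.25 * 0.04 = 0.01
    return baseline - delta, baseline + delta


lower, upper = detectable_range(baseline=0.25, relative_mde=0.04)
print(f"Changes are detectable outside {lower:.2%} to {upper:.2%}")
```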


In [119]:
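# Sample sizes (rows per price after variant cleaning)
n_control = 49926
n_variant = 50074

# Conversion rates (taken from the Conversions section)
crv_control = 0.254877
crv_variant = 0.256041

# Variance of a single Bernoulli trial: p * (1 - p)
var_control = crv_control * (1 - crv_control)
var_variant = crv_variant * (1 - crv_variant)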

conversions_control = crv_control * n_control
conversions_variant = crv_variant * n_variant

print('N - Control: {:0.0f} , Variant: {:0.0f}'.format(n_control, n_variant))
print('CRV - Control: {:0.4f} , Variant: {:0.4f}'.format(crv_control, crv_variant))
print('Conversions -  Control: {:0.0f} , Variant: {:0.0f}'.format(conversions_control, conversions_variant))
print('Var -  Control: {:0.4f} , Variant: {:0.4f}'.format(var_control, var_variant))

# Create combined random variable S
mean_control = crv_control
mean_variant = crv_variant
S_mean = mean_variant - mean_control
S_var = (var_control/n_control) + (var_variant/n_variant)

print('------------')
Z_score = S_mean / np.sqrt(S_var)
print('Z-score: {:0.4f}'.format(Z_score))
print('------------')
# p-values computed from the Z-score above (survival function = 1 - CDF)
p_value_1_tail = st.norm.sf(abs(Z_score))
p_value_2_tail = p_value_1_tail * 2

print('p-value - two-tailed test: {:0.4f}'.format(p_value_2_tail))

N - Control: 49926 , Variant: 50074
CRV - Control: 0.2549 , Variant: 0.2560
Conversions -  Control: 12725 , Variant: 12821
Var -  Control: 0.1899 , Variant: 0.1905
------------
Z-score: 0.4220
------------
p-value - two-tailed test: 0.6730


<b> p-value - one-sided test: ≈ 0.3365 </b> <br/>
<b> p-value - two-sided test: ≈ 0.6730 </b> <br/>
<b> p-value > 0.05 </b> <br/>
No statistically significant difference was found. <br/>
The observed data are consistent with the Null Hypothesis. Note that the p-value is not the probability that the Null Hypothesis is true. <br/>

In this case, we have no grounds to reject the null hypothesis, so at the 5% significance level we cannot claim that version B changes the conversion. It should be emphasized that failing to reject the null hypothesis is not the same as accepting it.<br/>
Since the test was sized to detect relative changes of 3-4%, I would assume that the price increase (from 4.99 to 5.99) does not have a substantial negative impact on the conversion.
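A confidence interval for the difference in conversion rates shows how large an effect is still compatible with the data. The following is a minimal sketch, assuming a Wald (normal-approximation) interval with unpooled variances, computed from the conversion counts reported above. The helper name `diff_confidence_interval` is hypothetical, and this interval is not part of the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for the difference in conversion rates (B - A)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    std_error = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z_crit * std_error, diff + z_crit * std_error


ci_low, ci_high = diff_confidence_interval(12725, 49926, 12821, 50074)
print(f"95% CI for the absolute difference: [{ci_low:.4f}, {ci_high:.4f}]")
```

The interval covers zero, which agrees with the non-significant z-test. Its lower bound corresponds to a drop of roughly 0.4 percentage points in the conversion rate.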

By the way, if the priority is to keep revenue at a similar level, the higher price lets us afford a decrease in conversion of up to about 16.7% (1 - 4.99/5.99) before revenue falls below the control level.

In [96]:
# Break-even relative drop in conversions: 1 - old_price / new_price

relative_price_change = round(1 - (4.99/5.99), 4)
relative_price_change

0.1669

In [104]:
# Number of converted users - assumption for testing

users_converted = 20000

In [102]:
# Control (base) revenue based on old price 4.99

base_revenue = users_converted * 4.99
base_revenue

99800.0

In [105]:
# Target revenue based on decreased conversion and new price 5.99

target_revenue = (users_converted - (users_converted * relative_price_change)) * 5.99
target_revenue

99805.38
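The break-even reasoning above can be complemented by comparing revenue per visitor with the conversion counts observed in the test. The following is a minimal sketch, assuming that each converted user makes exactly one purchase at the listed price. The helper name `revenue_per_visitor` is hypothetical, and this comparison is not part of the original analysis.

```python
def revenue_per_visitor(conversions, visitors, price):
    """Average revenue generated per visitor exposed to a given price."""
    return conversions * price / visitors


# Counts taken from the Conversions section of this notebook
rpv_a = revenue_per_visitor(12725, 49926, 4.99)
rpv_b = revenue_per_visitor(12821, 50074, 5.99)
uplift = rpv_b / rpv_a - 1

print(f"Revenue per visitor - A: {rpv_a:.4f}, B: {rpv_b:.4f}")
print(f"Relative uplift of B over A: {uplift:.1%}")
```

Because the conversion rates are nearly identical, the uplift in revenue per visitor is driven almost entirely by the price difference.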

Cross-check of the result with an external significance calculator: <br/>
https://www.analytics-toolkit.com/tools/stats/statistical-significance.php?testType=0&marginRange=0&margin=0&variants=1&metricType=2&baselineFile=&MAX_FILE_SIZE=31457280&ssc0=49926&conv0=12725&cr0=25.487721828305894&ssm0=&mean0=&sd0=&ssc1=50074&conv1=12821&cr1=25.604105923233618&ssm1=&mean1=&sd1=&ssc2=&conv2=&cr2=&ssm2=&mean2=&sd2=&ssc3=&conv3=&cr3=&ssm3=&mean3=&sd3=&ssc4=&conv4=&cr4=&ssm4=&mean4=&sd4=&ssc5=&conv5=&cr5=&ssm5=&mean5=&sd5=&ssc6=&conv6=&cr6=&ssm6=&mean6=&sd6=&confidence=95

<b> Sources: </b> <br/>
https://www.invespcro.com/blog/how-to-analyze-a-b-test-results/ <br/>
https://www.invespcro.com/blog/calculating-sample-size-for-an-ab-test/ <br/>
https://www.conversion.pl/blog/testy-ab/ <br/>
https://analyticsmayhem.com/digital-analytics/statistical-significance-ab-testing/