# **AB Testing with Python**
![abtest](https://www.ideasoft.com.tr/wp-content/uploads/2018/01/facebook-ab-testi-1-1024x536.jpg)

Image Source [http://www.ideasoft.com.tr/facebook-ab-testi/](http://www.ideasoft.com.tr/facebook-ab-testi/)

A gaming company has decided to change the interface of a mobile game that is already on the market.
It wants to measure how the change affects the user experience, so it analyzes the conversion rates of the new interface with an A/B test.

One of their customers has decided to test this new feature and wants to run an A/B test to see whether the new interface converts more than the existing one.

In this A/B test, the client randomly splits its audience into two equally sized groups: the control group and the test group. The control group sees the current interface; the test group sees the new, modified interface.
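The random split described above can be sketched as follows. This is only an illustration: the user IDs, the audience size of 1000 and the fixed seed are assumptions, not part of the original dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed only so the sketch is reproducible
user_ids = np.arange(1000)             # hypothetical user identifiers

# Shuffle once, then cut the shuffled audience in half.
shuffled = rng.permutation(user_ids)
half = len(shuffled) // 2
control_ids = shuffled[:half]          # will see the current interface
test_ids = shuffled[half:]             # will see the new interface

print(len(control_ids), len(test_ids))
```

Shuffling before splitting means every user has the same chance of landing in either group, which is what makes the two groups comparable.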

**Data Understanding**

Impression – Number of ad views

Click – Number of clicks on the displayed ad

Purchase – Number of products purchased after the ad was clicked

Earning – Revenue earned from the purchased products




In [1]:
# Loading the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, pearsonr, spearmanr, kendalltau, \
    f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [2]:
!pip install openpyxl

Collecting openpyxl
  Downloading openpyxl-3.0.7-py2.py3-none-any.whl (243 kB)
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.7


In [3]:
# Reading the data
Control_df = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name="Control Group", usecols="A:D")
Test_df = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name="Test Group", usecols="A:D")

In [4]:
print(Control_df.head())
print(Test_df.head())

    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752


In [5]:
print(Control_df.describe().T)
print(Test_df.describe().T)

              count         mean         std         min         25%  \
Impression 40.00000 101711.44907 20302.15786 45475.94296 85726.69035   
Click      40.00000   5100.65737  1329.98550  2189.75316  4124.30413   
Purchase   40.00000    550.89406   134.10820   267.02894   470.09553   
Earning    40.00000   1908.56830   302.91778  1253.98952  1685.84720   

                   50%          75%          max  
Impression 99790.70108 115212.81654 147539.33633  
Click       5001.22060   5923.80360   7959.12507  
Purchase     531.20631    637.95709    801.79502  
Earning     1975.16052   2119.80278   2497.29522  
              count         mean         std         min          25%  \
Impression 40.00000 120512.41176 18807.44871 79033.83492 112691.97077   
Click      40.00000   3967.54976   923.09507  1836.62986   3376.81902   
Purchase   40.00000    582.10610   161.15251   311.62952    444.62683   
Earning    40.00000   2514.89073   282.73085  1939.61124   2280.53743   

                  

In [6]:
# Checking for missing values (a notebook cell displays only the last expression, here Test_df)
Control_df.isnull().sum()
Test_df.isnull().sum()

Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

In [7]:
# Focus on the purchase averages
control = pd.DataFrame(Control_df["Purchase"])
test = pd.DataFrame(Test_df["Purchase"])

In [8]:
# Purchases in the control and test group are combined.
ct = pd.concat([control, test], axis=1)
ct.columns = ["Purchase_c", "Purchase_t"]
ct.head()

   Purchase_c  Purchase_t
0   665.21125   702.16035
1   315.08489   834.05429
2   458.08374   422.93426
3   487.09077   429.03353
4   441.03405   749.86044


The control group averages about 550 purchases and the test group about 582, so the test group has more purchases on average. However, we need to check whether this difference is statistically significant.

In [9]:
print(ct["Purchase_c"].mean())

print(ct["Purchase_t"].mean())

550.8940587702316
582.1060966484677


* Group A: Existing method:   Control Group
* Group B: New method:   Test Group

Hypothesis Testing

Is there a statistically significant difference between control and test group product purchase averages?

The two sample groups are independent (not paired), so an independent two-sample t-test can be used for the hypothesis test.
A t-test is a statistical method used to determine whether there is a significant difference between the means of two groups, based on sample data.

The t-test commonly assumes that the data are normally distributed and that the two groups have equal variances:

* Normality of data distribution
* Equality of variances

**Normality of data distribution**

The Shapiro-Wilk test is applied to check the first assumption.

H0 : The population from which the sample comes is normally distributed.

H1 : The population from which the sample comes is not normally distributed.

In [10]:
test_stat, pvalue = shapiro(ct["Purchase_c"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [11]:
test_stat, pvalue = shapiro(ct["Purchase_t"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


p-value > 0.05 for both groups → Fail to Reject H0
According to the Shapiro-Wilk test, normality cannot be rejected for either group, so the normality assumption is treated as satisfied.

**Equality of variances**

Levene's test is applied to check the second assumption.

H0: The variances are homogeneous.

H1: The variances are not homogeneous.

In [12]:
test_stat, pvalue = levene(ct["Purchase_c"], ct["Purchase_t"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


p-value > 0.05 → Fail to Reject H0 

The hypothesis that "variances are homogeneous" cannot be rejected. So variances can be considered homogeneous.

Since neither the normality assumption nor the equal-variance assumption is rejected, we can use the Independent Two-Sample T-Test to test the hypothesis.
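The decision logic used so far (check normality, then variances, then pick a test) can be sketched as one helper. `compare_means` is a hypothetical function, not part of the original notebook. Falling back to the Mann-Whitney U test or to Welch's t-test when an assumption fails is a common convention, assumed here rather than taken from the text.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_means(a, b, alpha=0.05):
    """Pick a two-sample test from the assumption checks (hypothetical helper)."""
    normal = shapiro(a)[1] > alpha and shapiro(b)[1] > alpha
    if not normal:
        # Normality rejected for at least one group: fall back to a rank-based test.
        stat, pvalue = mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", stat, pvalue
    equal_var = levene(a, b)[1] > alpha
    # equal_var=False switches ttest_ind to Welch's t-test.
    stat, pvalue = ttest_ind(a, b, equal_var=equal_var)
    name = "Student t-test" if equal_var else "Welch t-test"
    return name, stat, pvalue


# Strongly right-skewed made-up data, so the normality check fails.
skewed_a = np.array([2.0 ** i for i in range(30)])
skewed_b = 1.5 * skewed_a
name, stat, pvalue = compare_means(skewed_a, skewed_b)
print(name, round(pvalue, 4))
```

On the notebook's data the call would be `compare_means(ct["Purchase_c"], ct["Purchase_t"])`, which should take the pooled t-test branch, given the Shapiro and Levene p-values reported above.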

**Independent Two-Sample T-Test**

H0: The product purchasing averages of the control group and the test group are equal to each other. (μ1=μ2)

H1: The product purchasing averages of the control group and the test group are not equal. (μ1 ≠ μ2)

In [13]:
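# Pooled-variance (Student) t-test; the result is unpacked into test_stat so the printed statistic is the t-statistic
test_stat, pvalue = ttest_ind(ct["Purchase_c"], ct["Purchase_t"], equal_var=True)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493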


p-value > 0.05 → Fail to Reject H0

Result

The hypothesis that the control group and the test group have equal average purchases cannot be rejected. The difference seen in the sample means can be explained by chance.

There is no statistically significant difference between control and test group product purchase averages.
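As a cross-check, the same pooled t-test can be reproduced from summary statistics alone. The sketch below uses `scipy.stats.ttest_ind_from_stats` with the Purchase means and standard deviations copied (rounded to five decimals) from the `describe()` output above. It should give a p-value very close to the 0.3493 reported, with a negative statistic because the control mean is the lower one.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the describe() output above (Purchase row).
t_stat, p_value = ttest_ind_from_stats(
    mean1=550.89406, std1=134.10820, nobs1=40,   # control group
    mean2=582.10610, std2=161.15251, nobs2=40,   # test group
    equal_var=True,                              # pooled-variance (Student) t-test
)
print('Test Stat = %.4f, p-value = %.4f' % (t_stat, p_value))
```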

**Analyzing Other Metrics by Hypothesis Testing**

CTR (click-through rate) and conversion rate are proportions: each is obtained by dividing one count by another. When such rates are compared between two independent groups, an independent two-sample proportion test is used.

**CTR**

Click-through rate (CTR) is a metric, shown as a percentage, that measures how many people clicked your ad to visit a website or landing page.

To calculate the click-through rate on a paid ad, divide the total number of clicks on the ad by the total number of impressions (i.e. the total number of times the ad was viewed).


In [14]:
ctr_control = Control_df["Click"].sum() / Control_df["Impression"].sum()
ctr_test = Test_df["Click"].sum() / Test_df["Impression"].sum()

In [15]:
print(ctr_control)
print(ctr_test)

0.05014831092596444
0.032922333085396625


Looking at the click-through rates, the control group's value is higher than the test group's, so the current interface looks better. We should test whether this difference is statistically significant.

Test to Be Applied: Independent 2 Sample Proportion Tests 

The sample size assumption is met (n ≥ 30).


H0: There is no statistically significant difference between control and test group CTR rates.

H1: There is a statistically significant difference between control and test group CTR rates.

In [16]:
clicks = np.array([Control_df["Click"].sum(), Test_df["Click"].sum()])
impressions = np.array([Control_df["Impression"].sum(), Test_df["Impression"].sum()])

test_stat, pvalue = proportions_ztest(count=clicks, nobs=impressions, alternative="two-sided")
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 129.3305, p-value = 0.0000


The p-value is smaller than 0.05, so we reject H0.
When the two interfaces are compared, there is a statistically significant difference between their CTR values.
This difference is in favor of the control group.
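To make the proportion test less of a black box, here is a minimal hand-written version of the pooled two-sample z-test. `two_proportion_ztest` is a hypothetical helper. The totals are approximations rebuilt as 40 rows times the rounded column means from `describe()`, so the statistic should land close to, but not exactly on, the value printed above.

```python
import math


def two_proportion_ztest(count1, nobs1, count2, nobs2):
    """Pooled two-sample z-test for proportions; returns (z, two-sided p-value)."""
    p1, p2 = count1 / nobs1, count2 / nobs2
    pooled = (count1 + count2) / (nobs1 + nobs2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Approximate totals: 40 rows times the rounded column means from describe().
clicks_c, impressions_c = 40 * 5100.65737, 40 * 101711.44907
clicks_t, impressions_t = 40 * 3967.54976, 40 * 120512.41176

z, p_value = two_proportion_ztest(clicks_c, impressions_c, clicks_t, impressions_t)
print('Test Stat = %.4f, p-value = %.4f' % (z, p_value))
```

The very large statistic comes from the millions of impressions in the denominators: with that many observations, even a small difference in rates is estimated very precisely.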

**CR (Conversion rate)**

Conversion rate is a metric, shown as a percentage, that displays how many website or app visitors complete an action out of the total number of visitors.
To calculate the conversion rate, divide the number of completed goals by the total number of visitors to your website or landing page.

Similarly, if I want to calculate how many website visitors convert into paying customers, the conversion rate formula will look like this:

* Number of Actions / Total Traffic to Site = Conversion Rate

In [17]:
cr_control = Control_df["Purchase"].sum() / Control_df["Impression"].sum()
cr_test = Test_df["Purchase"].sum() / Test_df["Impression"].sum()

In [18]:
print(cr_control)
print(cr_test)

0.00541624432470298
0.004830258461839089


At first glance, the conversion rate favors the control group: a larger share of viewers made a purchase under the current interface.

But does this make a statistically significant difference? Or was it just a coincidence?

Test to Be Applied: Independent 2 Sample Proportion Tests

The sample size assumption is met (n ≥ 30).

H0: There is no statistically significant difference between control and test group CR values.

H1: There is a statistically significant difference between control and test group CR values.

In [19]:
purchase = np.array([Control_df["Purchase"].sum(), Test_df["Purchase"].sum()])
impression = np.array([Control_df["Impression"].sum(), Test_df["Impression"].sum()])

test_stat, pvalue = proportions_ztest(count=purchase, nobs=impression, alternative="two-sided")
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 12.2212, p-value = 0.0000


Result: p-value < 0.05 → Reject H0

We reject H0, meaning that there is a statistically significant difference between the two groups' conversion rates, and the difference is in favor of the existing method.
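A p-value says that the rates differ, not by how much. One way to see the size of the gap is a confidence interval for the difference in conversion rates. The sketch below uses a simple Wald interval, which is an assumption: the original analysis does not compute one. `diff_proportion_ci` is a hypothetical helper, and the impression totals are approximated as 40 times the rounded column means.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(p1, n1, p2, n2, confidence=0.95):
    """Wald confidence interval for p1 - p2 (unpooled standard error)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Conversion rates printed above; impression totals approximated as 40 * column mean.
low, high = diff_proportion_ci(
    0.00541624432470298, 40 * 101711.44907,    # control group
    0.004830258461839089, 40 * 120512.41176,   # test group
)
print('95%% CI for CR difference: (%.6f, %.6f)' % (low, high))
```

An interval that lies entirely above zero agrees with the test result: the control group's conversion rate is the higher one.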

# **Conclusion**

There was no significant difference between the control group and the test group product purchase averages.

Comparing the CTR of the control and test groups, the difference is statistically significant and favors the current interface.

Comparing the conversion rates of the control and test groups, the difference is statistically significant and also favors the current interface.


**Customer Recommendations**

As a result, the new interface does not bring more conversions than the existing interface. I recommend that the client continue with the existing interface.

In addition, the test can be repeated with more data to be obtained in the future.
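For a rough idea of how much more data a repeat test would need to detect the purchase difference, a normal-approximation sample size formula can be used. Everything here is an assumption beyond the original analysis: the 5% significance level is reused, 80% power is a conventional choice, the observed difference is treated as the true effect, and `sample_size_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Figures taken from the describe() output above (Purchase row of each group).
delta = 582.10610 - 550.89406                                  # observed difference in means
pooled_sd = math.sqrt((134.10820 ** 2 + 161.15251 ** 2) / 2)   # equal group sizes

n_per_group = sample_size_per_group(delta, pooled_sd)
print(n_per_group)
```

Under these assumptions the answer is a few hundred observations per group, well above the 40 rows per group available here. That would explain why the purchase difference did not reach significance.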

You can find more resources on the subject on [Veri Bilimi Okulu Pages](http://www.veribilimiokulu.com/istatistiksel-a-b-testleri-nasil-yapilir/)