# A/B TESTS in Python

Prerequisites 
1. Python 3
2. Statistics

![AB Test](./ABTestPython.png)

## What is an A/B Test?

A/B testing is a decision-support methodology that lets you measure the impact of a potential change to a product using a sample of your data.

### Characteristics of an A/B Test 
1. It is based on a randomized set of users. Randomness prevents bias between the groups.

2. It is a controlled, deliberate experiment with a control group and a treatment group (there can be more than two groups). If a company has added features to an app, the control group is exposed to the original app without changes, while the treatment group is shown the updated app.

3. An A/B test has a defined hypothesis and a way to measure success.
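
Random assignment can be sketched as follows. This is a minimal illustration, not the method used to build the dataset in this project: the function name `assign_group`, the experiment label, and the 50/50 split are assumptions made for the example. Hashing the user id keeps the assignment stable, so a returning user always lands in the same group.

```python
import hashlib

def assign_group(user_id, experiment="landing_page_revamp"):
    """Deterministically map a user to 'control' or 'treatment' (50/50 split).

    The experiment label is a hypothetical salt so that different
    experiments split the same users differently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in the range 0..99
    return "treatment" if bucket < 50 else "control"

# Assign 10,000 made-up user ids and check the split is roughly even
groups = [assign_group(uid) for uid in range(10_000)]
treatment_share = groups.count("treatment") / len(groups)
print(f"Share of users in treatment: {treatment_share:.3f}")
```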

## Problem statement
MwangiKaranja Co. is a mid-sized e-commerce company that sells the best footwear brands in the world: Nike, Adidas, Puma, Converse, New Balance, Reebok, Vans, Under Armour, etc. The marketing team feels that the landing page needs a revamp to increase the conversion rate of customers visiting the website. The UI/UX team is tasked with giving the landing page an uplift. The page will have high-quality 4K images of both pretty models and shoes. The website is also expected to be highly responsive to clicks, with smooth animated transitions. The baseline conversion rate is 13%. The new design will be considered a success if it raises the conversion rate to 15%.

![AB Test2](./ABTestfig2.jpg)

### Steps to conduct a successful A/B Test

1. Define a clear and measurable goal. Here: improve the conversion rate from 13% to 15%.
2. Choose a hypothesis and a relevant metric. The hypothesis is that revamping the landing page will lead to a higher conversion rate. Conversion rate is a relevant metric: it can be measured and it directly contributes to company revenue.
3. Select your variables. There will be two groups: a control group exposed to the old landing page and a treatment group exposed to the new, revamped landing page.
4. Determine the sample size. The sample size determines the statistical power of your test. Calculate a sample size large enough to detect the expected effect with the desired power.
5. Randomize your sample. Ensure there is no bias in how people are assigned to the two groups; assignment should be completely random.
6. Determine how long you will run the test.
7. Analyze the results of the A/B test. Determine whether the results are significant and calculate confidence intervals for them.
8. Draw conclusions and take action.
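
For step 6, a rough run time can be estimated from the required sample size and the site's traffic. The sketch below is only a back-of-the-envelope illustration: `days_to_run` is a hypothetical helper, and the traffic and sample-size figures are invented, not taken from MwangiKaranja Co.

```python
from math import ceil

def days_to_run(sample_per_group, daily_visitors, n_groups=2, traffic_share=1.0):
    """Estimate how many full days are needed to fill every group.

    traffic_share is the fraction of daily visitors enrolled in the test.
    """
    eligible_per_day = daily_visitors * traffic_share
    return ceil(sample_per_group * n_groups / eligible_per_day)

# Hypothetical numbers: 5,000 users per group, 1,500 visitors per day
print(days_to_run(sample_per_group=5000, daily_visitors=1500))
```

In practice the test is often run for whole weeks so that weekday and weekend behaviour are both represented.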

### Statistical terms

1. **Hypothesis**: This is an assumption or statement about a probability distribution, which may be either true or false. The hypothesis in this case is that revamping the landing page will have an effect on the conversion rate. There are two types of hypotheses.

    * __Null hypothesis__ (H0) - Formulated to state that there is no significant difference between two variables. It is the default assumption in a hypothesis test, and the test determines whether it should be rejected in favor of the alternative hypothesis.
    * __Alternative hypothesis__ - States that there is a difference between the two variables.

2. **Sample** - A fraction of the total population. Instead of experimenting on all the customers who visit the landing page, a certain number is chosen.

3. **Statistical significance** - The likelihood that an observed difference or relationship between two variables is not due to chance. A statistically significant result is unlikely to have occurred by random chance and is likely to reflect a real effect.

4. **Error** - Errors made when interpreting the results of a test. There are two kinds: Type I and Type II.
    * _Type I error_: (false positive) occurs when a null hypothesis is rejected when it is actually true. We conclude there is a significant difference between two groups while in fact there isn't. The probability of making a Type I error is denoted by alpha (&alpha;) and is typically set between 0.01 and 0.05 to limit the risk of such an error.

    * _Type II error_: (false negative) occurs when a null hypothesis is not rejected when it is actually false. We fail to detect a significant difference between two groups when in fact there is one. The probability of making a Type II error is denoted by beta (&beta;).

5. **Confidence level** - Expressed as a percentage, e.g. 95% or 99%, and equal to (1-&alpha;). The level chosen varies from industry to industry; medicine, for instance, uses high levels. Here it means that, whatever conversion rate we observe for the new design, we want to be 95% confident that it is statistically different from the conversion rate of the old design before we reject the null hypothesis.

6. **Alpha (&alpha;)** - The probability of rejecting a null hypothesis when it is actually true. Typically a small value such as 0.05 or 0.01, indicating the maximum acceptable probability of making a Type I error.

7. **Beta (&beta;)** - The probability of failing to reject a null hypothesis when it is actually false. Beta represents the risk of failing to detect a true effect or relationship. Depending on the context, the consequences of a Type II error can be as severe as those of a Type I error. In medicine, for example, a Type II error could mean failing to detect a treatment that is actually effective, to the detriment of patients. Usually, &beta; is 10-20% for digital experiments.

8. **Power** - The probability of rejecting the null hypothesis when it is false, equal to (1-&beta;). It is the ability of a statistical test to detect a true effect or difference between groups and so avoid a Type II error. Power is expressed as a probability ranging from 0 to 1. A higher power indicates a greater ability to detect a true effect, while a lower power indicates a higher risk of a Type II error. A power of 80% (0.80) is generally considered desirable.

9. **Effect size** - A measure of the magnitude of the difference between two groups or variables. It provides information about the practical significance of the results: is the difference big enough to matter in real life, rather than merely being detectable? Here the effect size is a standardized measure of the difference between the two proportions, which is useful for comparing the effect of different interventions or treatments.

10. **p-value** - The probability of obtaining the observed results, or more extreme ones, by chance alone, assuming the null hypothesis is true. It is compared against a predetermined level of significance, typically 0.05. If the p-value is less than the level of significance, the observed results are considered statistically significant: researchers reject the null hypothesis and conclude that there is evidence of a difference between the groups. If the p-value is greater than the level of significance, the observed results are not statistically significant: researchers fail to reject the null hypothesis and conclude that there is not enough evidence of a difference between the groups being studied.
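
To make the effect size concrete, the sketch below computes Cohen's h, the arcsine-based effect size commonly used for two proportions, for this project's baseline and target rates. It uses only the standard library; the helper name `cohens_h` is an assumption for illustration, and the sign simply reflects the order of the arguments.

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Baseline 13% vs target 15% conversion rate
h = cohens_h(0.13, 0.15)
print(f"Cohen's h for 13% vs 15%: {h:.4f}")
```

By Cohen's conventional thresholds, values around 0.2 count as a small effect, so a two-percentage-point lift is a very small effect that needs a large sample to detect.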


|Decision| Null hypothesis is true| Null hypothesis is false|
|---------------|------------------------|-------------------------|
|We do not reject the null hypothesis |<span style="color:green">Correct </span>True Negative. Probability = (1-&alpha;) | <span style=" color:red">Type II Error False Negative. Probability = &beta;</span> |
|We reject the null hypothesis | <span style="color:red">Type I Error False Positive. Probability = &alpha;</span> | <span style="color:green">Correct </span>True Positive. Probability = (1-&beta;) |
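
The meaning of &alpha; in the table can be checked with a small simulation. The sketch below runs many "A/A" experiments in which both groups share the same true conversion rate, so every rejection is a false positive. All numbers (true rate, group size, number of experiments, seed) are assumptions chosen for the illustration, and `two_sided_p_value` is a hypothetical helper implementing a pooled two-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, nothing to reject
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(22)
true_rate, n_per_group, n_experiments, alpha = 0.13, 500, 1000, 0.05

false_positives = 0
for _ in range(n_experiments):
    # Both groups are drawn from the SAME true rate, so H0 is true
    conv_a = sum(rng.random() < true_rate for _ in range(n_per_group))
    conv_b = sum(rng.random() < true_rate for _ in range(n_per_group))
    if two_sided_p_value(conv_a, n_per_group, conv_b, n_per_group) < alpha:
        false_positives += 1

type_i_rate = false_positives / n_experiments
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The observed rate should hover around &alpha;, which is what "probability = &alpha;" in the bottom-left cell expresses.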

### Project procedure

1. Design the hypothesis
2. Get Dataset. Prepare and Inspect
3. Run and visualize the test
4. Test the hypothesis
5. Draw conclusions 

#### 1. Design the hypothesis
In this project:

* alpha &alpha; = 0.05

* confidence level (1-&alpha;) = 0.95 or 95%

* power (1-&beta;) = 0.8 or 80%

* baseline value = 0.13 or 13% conversion rate

* target value = 0.15 or 15% conversion rate

* sample size - (unknown) To be calculated



In [None]:
# package imports
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from math import ceil

#### 2. Get the dataset

Description:

The data has 294478 rows. There are two groups, control and treatment: the control group is shown the old page while the treatment group is shown the new page. There are two landing pages, old page and new page. Conversion is recorded as 1 if a customer made a purchase after visiting the website, and 0 otherwise.

In [None]:
df = pd.read_csv('ab_data.csv')

In [None]:
# view attributes
print(df.shape, '\n')
df.info()  # info() prints its summary directly and returns None
print()
df.head()

In [None]:
# uniqueness of the variables 
print(f"Unique groups {df['group'].unique()}", '\n')
print(f"Unique landing pages {df['landing_page'].unique()}", '\n')
print(f"Converted? No/Yes {df['converted'].unique()}", '\n')


#### 3. Run and visualize the A/B test

In [None]:
# calculate effect size based on  expected rates
effect_size = sms.proportion_effectsize(0.13,0.15)
print(f"The Effect size is {effect_size}")

sample_number = sms.NormalIndPower().solve_power(
    effect_size,
    power = 0.8,
    alpha = 0.05,
    ratio = 1
)

sample_number = ceil(sample_number)
print(f"The Sample size for the A/B test is {sample_number}")
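
For intuition about what the solver above is doing, the same sample size can be approximated in closed form. This is a sketch under stated assumptions (two-sided test, equal group sizes, normal approximation, the far tail of the two-sided test ignored), not a description of the library's internals; `sample_size_per_group` is a hypothetical helper built on the standard library only.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate per-group sample size for comparing two proportions."""
    # Cohen's h effect size for the two proportions
    h = 2 * asin(sqrt(p_baseline)) - 2 * asin(sqrt(p_target))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    # Factor 2 accounts for two independent groups of equal size
    return ceil(2 * ((z_alpha + z_beta) / h) ** 2)

n = sample_size_per_group(0.13, 0.15)
print(f"Approximate sample size per group: {n}")
```

The formula makes the trade-offs visible: halving the effect size roughly quadruples the required sample, while raising the power or lowering &alpha; increases it more gently.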

View a contingency table to check that each group was shown the intended page: control users should see the old page and treatment users the new page. The table shows the joint frequency distribution of two categorical variables, landing page and group; each cell holds the number of observations that fall into the corresponding combination of categories.

In [None]:
# Check that the control group sees the old page and the treatment group sees the new page
pd.crosstab(df['group'], df['landing_page'])

While preparing the data, drop users that appear more than once by keeping only user ids that occur exactly once.

In [None]:
# Ensure no user appears more than once before sampling
session_counts = df['user_id'].value_counts(ascending=False)
multi_users = session_counts[session_counts > 1].count()
print(f'There are {multi_users} users that appear multiple times in the dataset')

# user ids that occur more than once
users_to_drop = session_counts[session_counts > 1].index

df = df[~df['user_id'].isin(users_to_drop)]
print(f'The updated dataset now has {df.shape[0]} entries')


#### Sampling
The data is now ready for sampling. We should have 4720 entries in each group. This means that the data we will conduct the A/B test on will have 9440 entries. 

In [None]:
control_sample = (df
                  .query('group == "control"')
                  .sample(n=sample_number, random_state=22)
                  )

treatment_sample = (df
                    .query('group == "treatment"')
                    .sample(n=sample_number, random_state=22)
                    )

# concat the two samples.
ab_test = pd.concat([control_sample, treatment_sample], axis=0)
ab_test.reset_index(drop=True, inplace=True)
ab_test

In [None]:
# confirm the groups have equal numbers. 
ab_test['group'].value_counts()

#### Run the test

In [None]:
conversion_rates = ab_test.groupby('group')['converted']

std_p = lambda x: np.std(x, ddof=0)    # std. deviation of the proportion (ddof = delta degrees of freedom)
se_p = lambda x: stats.sem(x, ddof=0)  # std. error of the proportion (std / sqrt(n))

# the string 'mean' avoids the deprecation warning newer pandas versions raise for np.mean
conversion_rates = conversion_rates.agg(['mean', std_p, se_p])
conversion_rates.columns = ['conversion_rates', 'std_deviation', 'std_error']

conversion_rates
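
For a 0/1 outcome such as `converted`, the three columns above follow directly from the conversion rate p and the group size n: the standard deviation is sqrt(p(1-p)) and the standard error is that value divided by sqrt(n). The sketch below checks this on a tiny made-up list of outcomes; `proportion_summary` is a hypothetical helper, not part of the notebook.

```python
from math import sqrt

def proportion_summary(outcomes):
    """Return (rate, std deviation, std error) for a list of 0/1 outcomes."""
    n = len(outcomes)
    p = sum(outcomes) / n
    std = sqrt(p * (1 - p))  # population std. deviation of a 0/1 variable
    se = std / sqrt(n)       # std. error of the estimated proportion
    return p, std, se

# Ten made-up visits, two of which converted
p, std, se = proportion_summary([1, 0, 0, 0, 0, 0, 0, 1, 0, 0])
print(f"rate={p:.3f}, std={std:.3f}, se={se:.3f}")
```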

#### Visualize results

In [None]:
# Some plot styling preferences
sns.set_style('whitegrid')  # the 'seaborn-whitegrid' matplotlib style name was removed in newer matplotlib releases
font = {'family' : 'DejaVu Sans',
        'weight' : 'bold',
        'size'   : 14}

mpl.rc('font', **font)


plt.figure(figsize=(6,4))
sns.barplot(x=ab_test['group'], y=ab_test['converted'], errorbar=None)  # seaborn >= 0.12; older versions use ci=None
plt.ylim(0,0.17)
plt.title('Conversion rate by group', pad=20)
plt.xlabel('Group', labelpad=15)
plt.ylabel('Converted (proportion)', labelpad=15);

Our two designs performed very similarly, with the new design performing slightly better: approx. 12.3% conversion rate for the old design vs. 12.6% for the new design.

#### 4. Test the Hypothesis

Note that both rates are below the baseline conversion rate of 13%. The treatment group's rate is higher than the control's, but is the difference statistically significant?

Because the sample is large (well above 30 observations per group), we can use the normal approximation and calculate the p-value with a z-test.

In [None]:
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

control_results = ab_test[ab_test['group'] == 'control']['converted']
treatment_results = ab_test[ab_test['group'] == 'treatment']['converted']

n_con = control_results.count() #4720
n_treat = treatment_results.count() #4720

successes = [control_results.sum(), treatment_results.sum()]
nobs = [n_con, n_treat]

z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes,nobs=nobs, alpha=0.05)


print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]')
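
The arithmetic behind the test above can be sketched with the standard library. This is an illustrative re-implementation under assumptions (pooled two-sided z-test, normal-approximation confidence interval), not the library's code; the helper names and the conversion counts are hypothetical and are not the results of this experiment.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(successes, nobs):
    """Pooled two-sided z-test for the difference between two proportions."""
    (x1, x2), (n1, n2) = successes, nobs
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def normal_interval(x, n, alpha=0.05):
    """Normal-approximation confidence interval for one proportion."""
    p = x / n
    margin = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical counts: 120/1000 conversions (control) vs 150/1000 (treatment)
z, p_value = two_proportion_ztest([120, 150], [1000, 1000])
low, high = normal_interval(150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% CI for treatment: [{low:.3f}, {high:.3f}]")
```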

#### 5. Draw conclusions

Our p-value is 0.732, which is higher than &alpha; = 0.05. This means we cannot reject the null hypothesis: the new design did not perform significantly differently from the old design, let alone better. The small difference between the observed conversion rates is consistent with random chance.

Looking at the 95% confidence interval of the treatment group, 11.6% to 13.5%, we note that:

* The baseline value of 13% is included.
* The target value of 15% is not included.

This means the new design conversion rate is similar to the baseline and not close to the 15% target. The new design is not an improvement and we need to go back and come up with new ideas!
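
The decision logic of this section can be written down as a small check. The sketch below is only a summary of the reasoning above; `summarize_test` is a hypothetical helper, and the inputs are the p-value, confidence interval, baseline, and target reported in this section.

```python
def summarize_test(p_value, ci, baseline, target, alpha=0.05):
    """Summarize the outcome of the test as a few yes/no answers."""
    lower, upper = ci
    return {
        "reject_null": p_value < alpha,
        "ci_contains_baseline": lower <= baseline <= upper,
        "ci_contains_target": lower <= target <= upper,
    }

result = summarize_test(p_value=0.732, ci=(0.116, 0.135), baseline=0.13, target=0.15)
for question, answer in result.items():
    print(f"{question}: {answer}")
```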