![AB Testing](https://blog.thomasnet.com/hs-fs/hubfs/AB%20Testing.png?width=900&name=AB%20Testing.png)

# What Is A/B Testing?

A/B testing, also known as split testing, is a method of comparing two variants (A and B) to determine which one performs better. The two variants differ in a single element of design or content. They are shown to users at random, and user behavior is monitored and compared against a predefined goal, such as conversion rate, click-through rate, or engagement rate.

A/B testing is commonly used in website optimization, digital marketing, and product development to determine which variant is more effective at achieving a specific goal. It helps businesses make data-driven decisions by providing insight into what resonates better with their audience and what changes can be made to improve the user experience and increase conversions.
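The random assignment described above can be sketched with a small simulation. This is only an illustration: the function name `run_ab_simulation` and the conversion probabilities `rate_a` and `rate_b` are hypothetical values chosen for the example, not data from a real experiment.

```python
import random


def run_ab_simulation(n_users=10_000, rate_a=0.10, rate_b=0.12, seed=42):
    """Randomly assign users to variant A or B and count conversions.

    rate_a and rate_b are hypothetical true conversion probabilities.
    Returns {variant: (users, conversions, conversion_rate)}.
    """
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    conversions = {"A": 0, "B": 0}
    for _ in range(n_users):
        variant = rng.choice(["A", "B"])  # random assignment to a variant
        counts[variant] += 1
        true_rate = rate_a if variant == "A" else rate_b
        if rng.random() < true_rate:  # simulate whether this user converts
            conversions[variant] += 1
    return {v: (counts[v], conversions[v], conversions[v] / counts[v]) for v in counts}


results = run_ab_simulation()
for variant, (n, conv, rate) in results.items():
    print(f"Variant {variant}: users={n}, conversions={conv}, rate={rate:.4f}")
```

The observed rates will differ between the groups even when the true rates are equal, which is why a hypothesis test is needed before declaring a winner.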

# Hypothesis Testing

Hypothesis tests are statistical methods that use sample data to test a hypothesis (claim) about a population. They help us judge, on the basis of statistical evidence, whether the data are consistent with that claim.

Generally, two hypotheses are formulated: the null hypothesis (H0) and the alternative hypothesis (H1 or HA). The null hypothesis represents the existing condition or the accepted assumptions, while the alternative hypothesis claims a specific change or effect.

Hypothesis tests can be categorized into the following types:

- **One-Sample T Test:** Used to test if the mean of a population is significantly different from a specific value.

- **Independent Two-Sample T Test:** Used to test if there is a significant difference between the means of two independent groups.

- **Paired Two-Sample T Test:** Used to test if there is a significant difference between the means of two related or paired groups.

- **Analysis of Variance (ANOVA):** Used to test if there is a significant difference among the means of three or more groups.

- **Chi-Square Test:** Used to test if there is a dependency or association between two categorical variables.

- **Regression Analysis:** Used to analyze the relationship between dependent and independent variables.

These are just some popular types of hypothesis tests, and there are many more methods available in statistics. Hypothesis tests are used in various fields, ranging from scientific research and marketing strategies to medical studies and industrial quality control processes. These tests enable us to make objective decisions based on data and evaluate statistical significance.
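As a rough orientation, each test type listed above has a counterpart in `scipy.stats`. The sketch below runs them on synthetic data; the group names, sizes, and distribution parameters are assumptions made up for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data: three hypothetical groups of 50 observations each
rng = np.random.default_rng(0)
group_a = rng.normal(loc=20, scale=5, size=50)
group_b = rng.normal(loc=22, scale=5, size=50)
group_c = rng.normal(loc=21, scale=5, size=50)

# One-sample t-test: is the mean of group_a different from 20?
one_sample = stats.ttest_1samp(group_a, popmean=20)

# Independent two-sample t-test: do group_a and group_b have different means?
independent = stats.ttest_ind(group_a, group_b)

# Paired t-test: treats the i-th values of both arrays as a matched pair
paired = stats.ttest_rel(group_a, group_b)

# One-way ANOVA: do three or more groups share the same mean?
anova = stats.f_oneway(group_a, group_b, group_c)

# Chi-square test of independence on a 2x2 contingency table of counts
contingency = np.array([[30, 20], [25, 25]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(contingency)

# Simple linear regression between two numeric variables
regression = stats.linregress(group_a, group_b)

print(f"one-sample t-test p-value:  {one_sample.pvalue:.4f}")
print(f"independent t-test p-value: {independent.pvalue:.4f}")
print(f"paired t-test p-value:      {paired.pvalue:.4f}")
print(f"ANOVA p-value:              {anova.pvalue:.4f}")
print(f"chi-square p-value:         {chi2_p:.4f}")
print(f"regression slope p-value:   {regression.pvalue:.4f}")
```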

- Hypothesis tests are statistical methods used to test a belief or proposition.

- Within the scope of hypothesis testing, there are group comparisons.

- The main objective in group comparisons is to determine whether observed differences are due to chance or if there is a genuine difference.

**For example:**
- Did the average daily usage time of users increase after a user interface change in a mobile application?

- The result we obtain from this analysis, based on the sample we have, will help us determine if the observed outcome occurred by chance or if there is indeed a significant difference.

- We will strive to understand this through statistical calculations and analysis. We will provide evidence to support our findings.

# A/B Testing (Independent Two-Sample T Test) (Comparing Two Group Means)

- It is used to compare the means of two groups.
- A/B tests on a numeric metric are commonly analyzed with the independent two-sample t-test, a statistical method for comparing the means of two independent groups.

- In A/B testing, we aim to determine if there is a significant difference between the average values of a particular metric or outcome variable in two distinct groups. This method is commonly employed in various fields such as marketing, product development, and user experience research.

- By dividing participants or subjects into two groups, we expose one group to a certain treatment or condition (group A) and the other group to a different treatment or condition (group B). We then measure the desired outcome or metric of interest for both groups. The independent two-sample t-test allows us to assess whether the observed difference in means between the two groups is statistically significant or if it could have occurred by chance.

- Through A/B testing, we can make data-driven decisions by determining which treatment or condition leads to better outcomes, whether it's an increase in conversion rates, engagement levels, or any other relevant measure.

- Overall, A/B testing provides valuable insights into the effectiveness of different approaches, helping us optimize strategies and make informed choices based on statistical evidence.
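As a sketch of what the independent two-sample t-test computes, assuming equal variances in both groups, the test statistic can be written as follows. The symbols $\bar{x}_i$, $s_i^2$, and $n_i$ (sample mean, sample variance, and size of group $i$) are notation introduced here for illustration.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
$$

When the test's assumptions hold and H0 is true, $t$ follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom, and the p-value is the probability of a value at least as extreme as the one observed.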

# Road Map

1. Set up Hypotheses
2. Assumption Check
  - 2.1. Assumption of Normality
  - 2.2. Variance Homogeneity
3. Implementation of the Hypothesis Test
  - 3.1. If the assumptions are met, independent two-sample t-test (parametric test)
  - 3.2. If the assumptions are not met, Mann-Whitney U test (`mannwhitneyu`, non-parametric test)
4. Interpret results according to p-value
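The road map can be condensed into one helper function. This is a minimal sketch, not a definitive implementation: the name `compare_two_groups`, the default `alpha=0.05`, and the synthetic exponential data are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_two_groups(group_a, group_b, alpha=0.05):
    """Follow the road map: check assumptions, then run the matching test."""
    # Step 2.1: normality of each group (Shapiro-Wilk test)
    normal_a = shapiro(group_a)[1] >= alpha
    normal_b = shapiro(group_b)[1] >= alpha

    if normal_a and normal_b:
        # Step 2.2: homogeneity of variance (Levene's test)
        equal_var = bool(levene(group_a, group_b)[1] >= alpha)
        # Step 3.1: parametric test; equal_var=False gives Welch's t-test
        stat, p_value = ttest_ind(group_a, group_b, equal_var=equal_var)
        test_name = "t-test" if equal_var else "Welch t-test"
    else:
        # Step 3.2: non-parametric test when normality is rejected
        stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
        test_name = "Mann-Whitney U"

    # Step 4: interpret according to the p-value
    return {
        "test": test_name,
        "statistic": float(stat),
        "p_value": float(p_value),
        "reject_h0": bool(p_value < alpha),
    }


# Usage on synthetic, strongly skewed (non-normal) data
rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=20.0, size=200)
skewed_b = rng.exponential(scale=22.0, size=200)
result = compare_two_groups(skewed_a, skewed_b)
print(result)
```

Because the synthetic groups are heavily skewed, the normality check fails and the function falls through to the non-parametric branch.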

In [1]:
# import Required Libraries

import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

In [2]:
# Adjusting Row Column Settings

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

# Exercise 1: Is There a Statistically Significant Difference Between the Mean Total Bills of Smokers and Non-Smokers?

In [3]:
# Loading the Data Set

df = sns.load_dataset("tips")

In [4]:
# Preliminary examination of the data set

def check_df(dataframe, head=5):
    print('##################### Shape #####################')
    print(dataframe.shape)
    print('##################### Types #####################')
    print(dataframe.dtypes)
    print('##################### Head #####################')
    print(dataframe.head(head))
    print('##################### Tail #####################')
    print(dataframe.tail(head))
    print('##################### NA #####################')
    print(dataframe.isnull().sum())
    print('##################### Quantiles #####################')
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

check_df(df)

##################### Shape #####################
(244, 7)
##################### Types #####################
total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object
##################### Head #####################
   total_bill     tip     sex smoker  day    time  size
0    16.99000 1.01000  Female     No  Sun  Dinner     2
1    10.34000 1.66000    Male     No  Sun  Dinner     3
2    21.01000 3.50000    Male     No  Sun  Dinner     3
3    23.68000 3.31000    Male     No  Sun  Dinner     2
4    24.59000 3.61000  Female     No  Sun  Dinner     4
##################### Tail #####################
     total_bill     tip     sex smoker   day    time  size
239    29.03000 5.92000    Male     No   Sat  Dinner     3
240    27.18000 2.00000  Female    Yes   Sat  Dinner     2
241    22.67000 2.00000    Male    Yes   Sat  Dinner     2
242    17.82000 1.75000    Male     No   Sat  

In [5]:
df.groupby("smoker").agg({"total_bill": "mean"})

| smoker | total_bill |
|--------|------------|
| Yes    | 20.75634   |
| No     | 19.18828   |


In [6]:
# There seems to be a numerical difference between the two group means. But is this difference due to chance, or is it statistically significant?

# 1- Establish Hypothesis

In [7]:
# H0: M1 = M2 => There is no difference between the mean total bills of the two groups.
# H1: M1 != M2 => There is a difference.

# 2- Assumption Control

In [8]:
# Normality Assumption
# Homogeneity of Variance

# Normality Assumption

In [9]:
# H0: The assumption of normal distribution is satisfied.
# H1: The assumption of normal distribution is not satisfied.

- **The Shapiro-Wilk test** (`shapiro`) tests whether a sample comes from a normally distributed population.

In [10]:
test_stat, pvalue = shapiro(df.loc[df["smoker"] == "Yes", "total_bill"])

In [11]:
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9367, p-value = 0.0002


In [12]:
# H0 is REJECTED if p-value < 0.05.
# H0 CANNOT BE REJECTED if p-value >= 0.05.

- Since the p-value is less than 0.05, hypothesis H0 is rejected.
- The normality assumption is not satisfied.

In [13]:
test_stat, pvalue = shapiro(df.loc[df["smoker"] == "No", "total_bill"])

In [14]:
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9045, p-value = 0.0000


- Since the p-value is less than 0.05, hypothesis H0 is rejected.
- The normality assumption is not satisfied.
- We cannot use a parametric test because the assumption of normal distribution is not satisfied. We need to use a non-parametric test.

# If the assumption of normal distribution is satisfied (if H0 cannot be rejected)

# Homogeneity of Variance

In [15]:
# H0: Variances are Homogeneous
# H1: Variances are Not Homogeneous

- Levene's test (`levene`) is applied to examine the assumption of homogeneity of variance.

In [16]:
test_stat, pvalue = levene(df.loc[df["smoker"] == "Yes", "total_bill"],
                           df.loc[df["smoker"] == "No", "total_bill"])

In [17]:
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 4.0537, p-value = 0.0452


In [18]:
# H0 is REJECTED if p-value < 0.05.
# H0 CANNOT BE REJECTED if p-value >= 0.05.

- Since the p-value is less than 0.05, H0 is rejected.
- Variances are not homogeneous.

# 3 & 4- Implementing the Hypothesis Test and Interpreting the Results

In [19]:
# 1. Independent two-sample t-test (parametric test) if the assumptions are satisfied.
# 2. Mann-Whitney U test (non-parametric test) if the assumptions are not satisfied.

# Independent two-sample t-test (parametric test) if the assumptions are satisfied.

- If the normality assumption is met, the t-test (`ttest_ind`) can be used.

- If both the normality assumption and the homogeneity of variance assumption are met, the t-test is run with `equal_var=True`.

- If the normality assumption is met but the homogeneity of variance assumption is not, the t-test can still be used with `equal_var=False`, which runs Welch's t-test.
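To illustrate the `equal_var=False` case, the sketch below runs Welch's t-test on synthetic data and recomputes the statistic by hand from the group means and variances. The group sizes and distribution parameters are assumptions chosen for the example, not values from the tips data set.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic groups with deliberately different variances
rng = np.random.default_rng(1)
group_a = rng.normal(loc=20.0, scale=10.0, size=90)
group_b = rng.normal(loc=19.0, scale=8.0, size=150)

# Welch's t-test: does not assume equal variances
welch = ttest_ind(group_a, group_b, equal_var=False)

# Manual Welch statistic: difference in means over the unpooled standard error
var_a = group_a.var(ddof=1)
var_b = group_b.var(ddof=1)
standard_error = np.sqrt(var_a / len(group_a) + var_b / len(group_b))
manual_t = (group_a.mean() - group_b.mean()) / standard_error

print(f"scipy statistic  = {welch.statistic:.4f}, p-value = {welch.pvalue:.4f}")
print(f"manual statistic = {manual_t:.4f}")
```

The two statistics agree because Welch's test divides the difference in means by a standard error built from each group's own variance rather than a pooled one.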

In [20]:
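# Note: equal_var=True assumes homogeneous variances. Levene's test above rejected
# that assumption, so equal_var=False (Welch's t-test) would be the consistent choice.
test_stat, pvalue = ttest_ind(df.loc[df["smoker"] == "Yes", "total_bill"],
                              df.loc[df["smoker"] == "No", "total_bill"],
                              equal_var=True)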

In [21]:
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 1.3384, p-value = 0.1820


In [22]:
# H0 is REJECTED if p-value < 0.05.
# H0 CANNOT BE REJECTED if p-value >= 0.05.

- Since the p-value is greater than 0.05, H0 cannot be rejected.
- There is no statistically significant difference between the mean total bills of smokers and non-smokers.

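# Mann-Whitney U test (non-parametric test) if the assumptions are not satisfied.

- The Mann-Whitney U test (`mannwhitneyu`) is a non-parametric, rank-based test for comparing two independent groups. It is often described as a comparison of medians rather than means.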
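To show what the U statistic measures, here is a minimal pure-Python sketch that counts, over all pairs, how often a value from one sample exceeds a value from the other. The helper name `mann_whitney_u` and the toy samples are hypothetical; in practice `scipy.stats.mannwhitneyu` computes the statistic and its p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by direct pair counting; each tie counts as 0.5."""
    u_a = 0.0
    for x in sample_a:
        for y in sample_b:
            if x > y:
                u_a += 1.0  # value from sample_a beats value from sample_b
            elif x == y:
                u_a += 0.5  # a tie is split evenly between both samples
    # The two statistics always sum to the number of pairs
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


u_a, u_b = mann_whitney_u([1, 3, 5], [2, 4, 6])
print(u_a, u_b)  # 3.0 6.0
```

If neither group tends to have larger values, both statistics are close to half the number of pairs. The test asks whether the observed imbalance is larger than chance would explain.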

In [23]:
test_stat, pvalue = mannwhitneyu(df.loc[df["smoker"] == "Yes", "total_bill"],
                                 df.loc[df["smoker"] == "No", "total_bill"])

In [24]:
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 7531.5000, p-value = 0.3413


In [25]:
# H0 is REJECTED if p-value < 0.05.
# H0 CANNOT BE REJECTED if p-value >= 0.05.

- Since the p-value is greater than 0.05, H0 cannot be rejected.
- There is no statistically significant difference between the total bills of smokers and non-smokers.

# We either REJECT or FAIL TO REJECT hypothesis H0.

# We never conclude that we ACCEPT hypothesis H1. That interpretation is WRONG.
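The decision rule used throughout this notebook can be captured in a small helper so that the wording stays consistent. The function name `interpret_p_value` is hypothetical; the p-values in the usage loop are the ones printed in the cells above.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Phrase the outcome of a test in terms of H0 only."""
    if p_value < alpha:
        return f"p = {p_value:.4f} < {alpha}: reject H0"
    return f"p = {p_value:.4f} >= {alpha}: fail to reject H0"


# p-values reported earlier in this notebook
reported = [
    ("Shapiro-Wilk (smokers)", 0.0002),
    ("Levene", 0.0452),
    ("t-test", 0.1820),
    ("Mann-Whitney U", 0.3413),
]
for test_name, p in reported:
    print(f"{test_name}: {interpret_p_value(p)}")
```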