# About
* Objective
    * A concise guide to A/B testing and how to run one

# 1. Intro

## What is A/B testing?
* A/B testing is a basic randomized controlled experiment. It compares two versions of a variable to find out which performs better in a controlled environment.
* e.g. you want to increase traffic to your website
    * You can try changes at random and see how they affect the traffic (very broad, takes a long time)
    * You can apply scientific and statistical methods -> A/B testing
    
### Why is it called A/B testing?
* In the scenario above, we divide the experiment into two parts
    * A is our control, so it remains the same
    * B is where we make significant changes
* This turns it into a hypothesis testing problem: we make decisions about population parameters based on sample statistics
    * Population: All the users going to your website
    * Sample: The customers that participated in the test
 
--- 
## Hypothesis Testing Primer [ref 2]
* Hypothesis testing underlies A/B testing so let's do a quick recap of the major concepts

### Example: NSFG data
* NSFG study (National Survey of Family Growth)
    * "information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men's and women's health.
    * The survey results are used ... to plan health services and health education programs, and to do statistical studies of families, fertility, and health."
    * We will use data collected by this survey to investigate <b>whether first-born babies tend to come late, and other questions</b>
* What we're trying to work out is whether an effect, i.e. the observed difference between first babies and others, is a <b>real difference</b> or might appear in the sample <b>by chance</b>
    * A few ways we can formulate this question
        * Fisher null hypothesis testing
        * Neyman-Pearson decision theory
        * Bayesian inference
        
### Classical Hypothesis testing
* The goal of classical hypothesis testing is to answer the question, <b>Given a sample and an apparent effect, what is the probability of seeing such an effect by chance?</b>
* Here's how we answer that question
    1. Quantify the size of apparent effect by choosing a <b> test statistic </b>
        * In the NSFG data, the apparent effect is a difference in pregnancy length between first babies and others, so a natural test statistic is the difference in means between the two groups
    2. Define the <b>null hypothesis </b>
        * null hypothesis: a model of the system based on the assumption that the apparent effect is not real
        * In the NSFG data, the null hypothesis is that there is no difference between first babies and others i.e. that pregnancy lengths for both groups have the same distribution
    3. Compute a p-value
        * p-value: the probability of seeing the apparent effect if the null hypothesis is true.
        * We would compute the <b>actual difference in means</b>, then compute the probability of seeing a difference as big, or bigger, under the null hypothesis.
    4. Interpret the result
        * If the p-value is low, the effect is said to be statistically significant, which means that it is unlikely to have occurred by chance.
        * In that case we infer that the effect is more likely to appear in the larger population.
* In plainspeak
    * To test a hypothesis like "This effect is real," we assume, temporarily, that it is not. That's the null hypothesis. 
    * Based on that assumption, we compute the probability of the apparent effect. That's the p-value. 
    * If the p-value is low, we conclude that the null hypothesis is unlikely to be true.
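
The four steps above can be sketched as a permutation test. This is a minimal illustration, not the exact code from [ref 2]: the function name `permutation_p_value`, its parameters, and the sample values are assumptions made for this example. The test statistic is the absolute difference in means, the null hypothesis is modelled by shuffling the pooled values, and the p-value is the fraction of shuffles that produce a difference at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, iterations=1000, seed=0):
    """Estimate a two-sided p-value for a difference in means by permutation."""
    rng = random.Random(seed)

    # Step 1: test statistic = absolute difference in group means.
    observed = abs(mean(group_a) - mean(group_b))

    # Step 2: under the null hypothesis both groups share one distribution,
    # so group labels are arbitrary and the pooled values can be shuffled.
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    # Step 3: count how often chance alone gives a difference this large.
    at_least_as_big = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        simulated = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if simulated >= observed:
            at_least_as_big += 1
    return at_least_as_big / iterations


# Step 4: interpret. Hypothetical pregnancy lengths in weeks (illustrative only).
first_babies = [39, 40, 41, 38, 40, 39, 41, 40]
others = [39, 39, 40, 38, 39, 40, 38, 39]
print(permutation_p_value(first_babies, others))
```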
---
## Hypothesis Testing and A/B testing
1. Null hypothesis
    * The null hypothesis is the one that states that sample observations result purely from chance. 
    * From an A/B test perspective, the null hypothesis states that there is no difference between the control and variant groups. 
    * It states the default position to be tested or the situation as it is now, i.e. the status quo. 
    * Here our H0 is <i>"there is no difference in conversion rate between customers receiving newsletter A and customers receiving newsletter B".</i>

2. Alternative hypothesis
    * The alternative hypothesis challenges the null hypothesis and is basically a hypothesis that the researcher believes to be true. 
    * The alternative hypothesis is what you might hope your A/B test will support.
    * For example, the Ha is "the conversion rate of customers who receive newsletter B is higher than that of customers who receive newsletter A".
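
In symbols, the two hypotheses above can be written as a one-sided test. The notation is an assumption introduced here (it does not come from the references): $p_A$ and $p_B$ stand for the population conversion rates of customers receiving newsletters A and B.

$$
H_0: p_B = p_A \qquad \text{vs.} \qquad H_a: p_B > p_A
$$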
    
----

# 2. Create Control Group and Test Group
* Once we have our null and alternative hypotheses, the next step is to decide which customers will participate in the test. Here we have two groups: the Control group and the Test (variant) group.

* The Control Group is the one that will receive newsletter A and the Test Group is the one that will receive newsletter B.
* For this experiment, we randomly select 1000 customers, 500 each for our Control group and Test group.
* <b>Random sampling:</b>
    * Randomly selecting the sample from the population
    * It is a technique where each member of the population has an equal chance of being chosen.
    * Random sampling is important because it guards against sampling bias
* We want to avoid bias because the results of the A/B test should be representative of the entire population, not just of the sample itself
* <b>Sample size</b>
    * It is imperative that we determine the minimum sample size for our A/B test before conducting it
    * This is so we can avoid undercoverage bias -> the bias from sampling too few observations
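
Both ideas above can be sketched in a few lines of standard-library Python. Everything named here is an assumption for illustration: `assign_groups` draws customers without replacement and splits them into two equal groups, and `min_sample_size_per_group` uses the common normal-approximation formula for comparing two proportions on a per-customer basis, with a 5% significance level and 80% power as assumed defaults. The customer IDs and the 16% / 19% rates passed in are hypothetical inputs.

```python
import math
import random
from statistics import NormalDist


def assign_groups(customer_ids, group_size, seed=42):
    """Randomly pick 2 * group_size customers and split them into two groups."""
    rng = random.Random(seed)
    # sample() draws without replacement, so every customer is equally
    # likely to be chosen and nobody lands in both groups.
    chosen = rng.sample(list(customer_ids), 2 * group_size)
    return chosen[:group_size], chosen[group_size:]


def min_sample_size_per_group(p_control, p_test, alpha=0.05, power=0.8):
    """Approximate per-group sample size for detecting p_control vs p_test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    effect = p_control - p_test
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical population of 10,000 customer IDs, 500 per group.
control_group, test_group = assign_groups(range(10_000), group_size=500)
print(len(control_group), len(test_group))

# Hypothetical baseline and target conversion rates.
print(min_sample_size_per_group(0.16, 0.19))
```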
----
# 3. Conduct the A/B test and collect the data
* If the performance metric we want to improve is conversion rate, we can calculate daily conversion rates for both the test (treatment) and the control groups
    * The conversion rate in a group on a given day represents a single data point, so the sample size becomes the number of days
    * So we will be testing the difference between the mean daily conversion rates of the two groups across the testing period.
* example data
    * run experiment for 1 month
    * mean conversion rate for control group: 16%
    * mean conversion rate for test group: 19%
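
A minimal sketch of how the daily data points could be computed, assuming the raw data arrives as one `(visitors, conversions)` pair per day and group. The function name and the counts below are hypothetical and only illustrate the calculation; they are not the data behind the figures above.

```python
from statistics import mean


def daily_conversion_rates(daily_counts):
    """Turn (visitors, conversions) pairs, one per day, into daily rates."""
    # Days without visitors are skipped to avoid dividing by zero.
    return [
        conversions / visitors
        for visitors, conversions in daily_counts
        if visitors > 0
    ]


# Hypothetical counts for three days in each group (illustrative only).
control_days = [(120, 18), (100, 16), (110, 19)]
test_days = [(115, 22), (105, 20), (100, 19)]

control_rates = daily_conversion_rates(control_days)
test_rates = daily_conversion_rates(test_days)

# Each group's sample is its list of daily rates; the test compares the means.
print(f"control mean: {mean(control_rates):.3f}")
print(f"test mean: {mean(test_rates):.3f}")
```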
----
# 4. Statistical significance of the Test
* To draw a conclusion from these results, i.e. that the test group performs better than the control group and we can reject our null hypothesis, we need to establish <b>statistical significance</b>

## Types of Errors that may occur in hypothesis testing

1. Type I error: 
    * We reject the null hypothesis when it is true. That is, we accept variant B when it is not performing better than A
2. Type II error: 
    * We fail to reject the null hypothesis when it is false. That is, we conclude variant B is not better when it actually performs better than A

* <b>To keep the risk of these errors under control we must calculate the statistical significance of our test</b>
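
To make the Type I error concrete, here is a small simulation sketch. It runs many "A/A" experiments in which both groups share the same true conversion rate, so every rejection of the null hypothesis is a false positive. Note that it uses a pooled two-proportion z-test rather than a t-test, purely to stay within the standard library; the function names, the 16% rate, the group size of 500 and the 5% threshold are assumptions for illustration. The observed false-positive rate should land near the chosen threshold.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(true_rate=0.16, n=500, alpha=0.05, trials=1000, seed=1):
    """Share of A/A experiments in which the null is (wrongly) rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups are drawn with the same true rate: the null is true.
        conv_a = sum(rng.random() < true_rate for _ in range(n))
        conv_b = sum(rng.random() < true_rate for _ in range(n))
        if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
            rejections += 1
    return rejections / trials


print(false_positive_rate())
```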

### What does 'Statistically Significant' entail?
* An experiment is considered to be statistically significant when we have enough evidence to conclude that the result we see in the sample also exists in the population.

### In our case
* We need to determine whether the difference between the control version and the test version is real or merely due to error/chance. 
* To assess statistical significance, we can use a <b>two-sample t-test</b>

## Two-sample t-test
* What is the two-sample t-test?
    * The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not.
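
A minimal sketch of the test, applied to two lists of daily conversion rates. It implements Welch's variant, which does not assume equal variances in the two groups; that choice, the function names, and the sample values are assumptions for this example. The p-value is obtained by numerically integrating the t density so the code runs on the standard library alone. In practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` is the more usual route.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of |T| < |t_stat|
    return max(0.0, 1.0 - central_mass)


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom, two-sided p-value).

    Requires at least two values per sample and some variation in the data.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)
    se_squared = var_a / n_a + var_b / n_b
    t_stat = (mean(sample_b) - mean(sample_a)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t_stat, df, two_sided_p_value(t_stat, df)


# Hypothetical daily conversion rates (illustrative only).
control = [0.15, 0.17, 0.16, 0.14, 0.18, 0.16, 0.15]
test = [0.19, 0.18, 0.20, 0.17, 0.21, 0.19, 0.18]
t_stat, df, p_value = welch_t_test(control, test)
print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p_value:.4f}")
```

If the p-value falls below the chosen significance level (commonly 0.05), we reject the null hypothesis that the two population means are equal.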

# References
1. https://www.analyticsvidhya.com/blog/2020/10/ab-testing-data-science/
2. Allen B. Downey, Think Stats, 2nd edition, Chapter 9: Hypothesis Testing