# A/B Testing  

This notebook contains my notes on A/B testing from Udacity's free online course, as well as from books such as Lean Analytics, The Lean Product Playbook and Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (TOE).

## Definition:

In simple terms, **A/B tests** or **Online Controlled Experiments** are a mechanism for testing new product features, backend changes, UI changes or other strategic decisions: the change is implemented for only a subset of users, and its impact on chosen metrics is analyzed before it is officially rolled out.

A/B Tests are best used when introducing enhancements or improvements to products with an existing user base (an available population to conduct tests on). Introduction of new products could benefit more from qualitative methods like consumer interviews and surveys. 

## Some considerations before using A/B Testing on your product:

- Do you have an **experimental unit** to test on? - like users, cookies, etc.
- Do you have enough units to test on? - TOE recommends thousands of experimental units for the experiment to be useful.
- Do you have a key metric (or metrics), aka the **Overall Evaluation Criterion** (OEC)? - a goal that is agreed upon and can be practically evaluated.
- Are the changes you want to test easy to make? - you need to weigh the cost of making software changes against the benefits potentially realised from the test.
- For UX testing - do you have repeat users whose experience (in terms of satisfaction, engagement, etc.) can be improved and evaluated? - testing the impact of changes on engagement may not be as effective for one-off purchase websites like home rentals as it is for, say, a social network or e-commerce site.

## Components of an A/B Test:

TOE defines the components of an A/B test as follows:

- **_Parameter_**: Aka experimental variable or factor. This is what determines the difference between the A and B groups of your test. For a simple UI change to the colour of the checkout button, the parameter is the colour. The original colour is usually assigned to group A, known as the **Control Group**, and the new colour(s) are assigned to the other groups, known as the **Treatment(s)**. Testing several values of a single parameter in one experiment is often called an A/B/n test, while testing several parameters together is known as **Multivariate testing**.


- **_Variant_**: This is simply another name for the group assignment - Treatment vs Control or A/B/C... It is a function of the parameter or experimental variable. The distinction between parameter and variant is more apparent when there are 2 or more experimental variables interacting. For example, if we are testing both the colour and the shape of a button, with m colours and n shapes, we have 2 parameters (colour and shape) and m times n variants, one for each combination of colour and shape.


- **_Randomization Unit_**: Also often called the _unit of diversion_. It is the unit to which a group (Treatment or Control) is randomly assigned. The most common unit, especially for user-visible changes, is the user id. However, for changes that are not visible to users (like backend changes), cookie-based or page-view-based diversion may also be applied.
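
The parameter/variant distinction and random assignment can be sketched in R. This is a minimal illustration rather than TOE's implementation: the colour and shape values, the `user_id` column, the number of users and the seed are all made up for the example.

```r
# Hypothetical parameters: 3 colours (m = 3) and 2 shapes (n = 2)
colours <- c("green", "blue", "orange")
shapes <- c("round", "square")

# Each row is one variant: m * n = 6 combinations of colour and shape
variants <- expand.grid(colour = colours, shape = shapes, stringsAsFactors = FALSE)
variants$variant <- LETTERS[seq_len(nrow(variants))]

# Randomization unit: user id. Each (made-up) user gets one variant at random
set.seed(42)  # fixed seed only so the example is reproducible
users <- data.frame(user_id = 1:1000)
users$variant <- sample(variants$variant, size = nrow(users), replace = TRUE)

# Check that the split across variants is roughly even
table(users$variant)
```

In a real system the assignment is usually made deterministic (for example by hashing the user id), so that a returning user always sees the same variant; the random draw here only illustrates the idea.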



## Metrics

This warrants a whole section to itself because choosing the right metric can change the whole inference and impact of your experiment. 

- Choosing the right metrics 
- Normalizing metrics - size may vary across variants 
- Some common metrics for different business types 
- Variability


## Statistics for Online Controlled Experiments 

In this section of the notebook, I will go through some of the common statistical methods involved in planning and analysing A/B tests, along with some snippets of code used to obtain results.

- Sizing and Power Analysis 
- Common distributions - Binomial and Normal Distribution - (common metrics and their distributions, a table)
- Hypothesis testing - Means vs Proportions 
- Variability 
- Empirical vs Analytical estimates and confidence intervals 
- A/A Tests 
- Sensitivity vs Robustness 
- Analyzing Ratios 
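
As a small illustration of hypothesis testing on proportions, the sketch below compares conversion rates between control and treatment using base R's `prop.test`. The counts are invented for the example and do not come from any real experiment.

```r
# Hypothetical results: conversions and users per group (made-up numbers)
conversions <- c(control = 400, treatment = 460)
n_users <- c(control = 5000, treatment = 5000)

# Two-sided test of equal proportions (continuity correction is on by default)
result <- prop.test(x = conversions, n = n_users)

result$estimate   # observed conversion rate in each group
result$p.value    # p-value for H0: both groups share the same rate
result$conf.int   # 95% confidence interval for the difference in rates
```

For a metric that is a mean rather than a proportion (such as revenue per user), `t.test` plays the same role.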

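```r
# Sample size calculation for comparing two proportions

# Inputs (all expressed in percent)
power <- 80        # desired statistical power
conflevel <- 95    # confidence level, i.e. 100 * (1 - alpha)
baseline <- 8      # baseline conversion rate
effectsize <- 2    # minimum detectable effect (percentage points, or % of baseline if relative)
relative <- FALSE  # TRUE if effectsize is relative to the baseline

# Returns the required sample size per variant
sample_size <- function(power, conflevel, baseline, effectsize, relative) {
    if (isTRUE(relative)) {
        # convert a relative effect into absolute percentage points
        effectsize <- (effectsize / 100) * baseline
    }
    p1 <- baseline / 100
    # rate under treatment; this sizes for a decrease (use p1 + effectsize / 100 for an increase)
    p2 <- p1 - effectsize / 100
    p <- (p1 + p2) / 2   # average of the two rates
    q <- 1 - p
    q1 <- 1 - p1
    q2 <- 1 - p2
    alpha <- 1 - conflevel / 100
    n <- (sqrt(2 * p * q) * qnorm(1 - alpha / 2) +
          sqrt(p1 * q1 + p2 * q2) * qnorm(power / 100))^2 / (effectsize / 100)^2
    return(round(n))
}

sample_size(power, conflevel, baseline, effectsize, relative)
```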

More information about the formula used here can be found in this [article](https://towardsdatascience.com/required-sample-size-for-a-b-testing-6f6608dd330a) published by Towards Data Science. There are also various online sample size calculators that provide close estimates. 

- Optimizely 
- One more calculator link
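
For reference, the per-variant sample size computed by the `sample_size` function above can be written out as follows. The symbols are my own shorthand for the variables in the code, not notation taken from the article:

$$
n = \frac{\left( z_{1-\alpha/2} \sqrt{2\,\bar{p}\,\bar{q}} + z_{1-\beta} \sqrt{p_1 q_1 + p_2 q_2} \right)^2}{(p_1 - p_2)^2}
$$

Here $p_1$ is the baseline rate, $p_2$ is the rate under treatment, $\bar{p} = (p_1 + p_2)/2$, $\bar{q} = 1 - \bar{p}$, $q_i = 1 - p_i$, $\alpha = 1 - \text{confidence level}$, $1 - \beta$ is the power, and $z_x$ is the $x$ quantile of the standard normal distribution (`qnorm(x)` in R).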

## Designing an Experiment 

- Unit of diversion - considerations 
- Size - considerations 
- Population vs Cohort vs Target Population
- Duration vs Exposure - considerations to limit exposure 

## Ethics of A/B Testing and Online Experimentation

## Common Pitfalls 