# About
* objectives: Get a clear understanding of the concepts behind A/B testing and apply them on practice data

# What is A/B testing?
* It is part of user experience research methodology
* It involves a randomized experiment in which two or more versions of a variable (web page, page element, etc.) are shown to different segments of website visitors at the same time to determine which version has the greatest impact on business metrics

# Why A/B testing?
* Essentially, A/B testing takes the guesswork out of website optimization and enables experience optimizers to make data-backed decisions.
* In A/B testing, A refers to the ‘control’, or the original testing variable, whereas B refers to the ‘variation’, or a new version of the original testing variable.
* The version that moves your business metric(s) in the positive direction is known as the ‘winner.’ Implementing the changes of this winning variation on your tested page(s) / element(s) can help optimize your website and increase business ROI.
* Conversion metrics differ across websites and use cases, e.g. product sales vs. lead generation
* A/B testing is one of the components of the overarching process of Conversion Rate Optimisation
    * this can be used to gather qualitative and quantitative user insights

# Critical Steps Summary
1. Determining Minimum Detectable Effect
2. Calculating Sample Size
3. Analyzing Results for Statistical Significance
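
Step 3 can be sketched with a pooled two-proportion z-test. This is a minimal illustration using only the standard library, not necessarily the test applied later in these notes; the helper name `two_proportion_ztest` and the counts passed to it are hypothetical and made up purely for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Pooled, two-sided z-test for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups are equal
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts (not real experiment data): conversions out of 1,000 users per group
z_stat, p_value = two_proportion_ztest(700, 1000, 760, 1000)
print(f'z = {z_stat:.2f}, p-value = {p_value:.4f}')
```

If the p-value falls below the chosen significance level (commonly 0.05), the difference between control and variation is treated as statistically significant.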

---
# Example of task/scenario
* You are part of a video game development company, NightFort games
* Recently, you have noticed a lot of users quitting after one particular level
* The product team has posed a hypothesis <b> that the level is too hard, and making it easier will allow for less user frustration and hence more players continuing to play the game </b>
* From this we can extract a few things
    * Independent variable (that which we change) : Difficulty of the level
    * Dependent variable (that which we monitor) : number/proportion of users continuing to play
    * Desired objective: an increase in the number of players continuing to play the game

## Sample Size Calculation
### Desired business effect
>Our video game company has made some tweaks to the level to make it easier, effectively changing the difficulty from hard to medium. Before blindly rolling an update out to all the users, the product team wants to test the changes on a small sample to see if they make a positive impact.
* The desired metric is the <b> percentage of players who continue playing after reaching our level in question </b>.
* Currently, <b>70% of users continue to play</b> and the <b>remaining 30% stop playing</b>.
* The product team has decided increasing this to <b>75%</b> would warrant deploying the changes and making an update to the game.

### Minimum Detectable Effect
* 70% -> 75% : this target change lets us calculate the minimum detectable effect
    * one of the inputs when calculating sample size
* as we are comparing two proportions, we will use the `proportion_effectsize` function in statsmodels

In [2]:
import statsmodels.api as sm

In [3]:
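init_prop = 0.7   # current share of players who continue past the level
mde_prop = 0.75   # share the product team wants to be able to detect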
effect_size = sm.stats.proportion_effectsize(
    init_prop, 
    mde_prop
)
print(f'For a change from {init_prop:.2f} to {mde_prop:.2f} - the effect size is {effect_size:.2f}.')

For a change from 0.70 to 0.75 - the effect size is -0.11.
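
As a cross-check, the value above can be reproduced by hand. The sketch below assumes `proportion_effectsize` computes Cohen's h, i.e. `2 * (arcsin(sqrt(prop1)) - arcsin(sqrt(prop2)))`, as described in the statsmodels documentation; the helper name `cohens_h` is our own. The result is negative only because the smaller proportion is passed first; the magnitude is what matters for the two-sided power calculation.

```python
from math import asin, sqrt


def cohens_h(prop1, prop2):
    """Cohen's h: difference of the arcsine-transformed proportions."""
    return 2 * asin(sqrt(prop1)) - 2 * asin(sqrt(prop2))


h = cohens_h(0.70, 0.75)
print(f'{h:.4f}')  # -0.1121

# Swapping the arguments flips the sign but keeps the magnitude
h_swapped = cohens_h(0.75, 0.70)
```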


### Sample Size
* Now that we have ascertained the minimum detectable effect, we can calculate the sample size
* To do this for a proportion problem, `zt_ind_solve_power` from the statsmodels library is used

<u> `zt_ind_solve_power` </u> 
* `statsmodels.stats.power.zt_ind_solve_power(effect_size=None, nobs1=None, alpha=None, power=None, ratio=1.0, alternative='two-sided')`
* solve for any one parameter of the power of a two sample z-test
* for z-test the keywords are:
    * `effect_size, nobs1, alpha, power, ratio`
* exactly one needs to be None, all others need numeric values

    <u> Parameters </u>
    * effect_size: float
        * standardized effect size, difference between the two means divided by the standard deviation. 
        * If ratio=0, then this is the standardized mean in the one sample test.
    * nobs1 : int or float
        * number of observations of sample 1. 
        * The number of observations of sample 2 is ratio times the size of sample 1, i.e. `nobs2 = nobs1 * ratio`. ratio can be set to zero in order to get the power for a one sample test.
    * alpha: float in interval (0,1)
        * significance level, e.g. 0.05, is the probability of a type I error, that is, wrongly rejecting the Null Hypothesis when it is true.
    * power: float in interval (0,1)
        * power of the test, e.g. 0.8, is one minus the probability of a type II error. 
        * Power is the probability that the test correctly rejects the Null Hypothesis if the Alternative Hypothesis is true.
    * ratio: float
        * ratio of the number of observations in sample 2 relative to sample 1 (see the description of nobs1). The default for ratio is 1; to solve for ratio given the other arguments, it has to be explicitly set to None.

    <u> Returns </u>
    * value: float
        * the value of the parameter that was set to None in the call


In [5]:
from math import ceil

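# nobs1=None tells the solver to return the sample size for group 1;
# with the default ratio=1.0, group 2 needs the same number of users
sample_size = sm.stats.zt_ind_solve_power(
    effect_size=effect_size, 
    nobs1=None, 
    alpha=0.05, 
    power=0.8
)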
print(f'{ceil(sample_size)} sample size required given power analysis and input parameters.')

1250 sample size required given power analysis and input parameters.
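
The figure above is the size of each group, so the experiment needs roughly twice that many players in total. As a sanity check, the sketch below applies the textbook normal-approximation formula for two equally sized groups, `n = 2 * ((z_alpha + z_power) / h) ** 2`. This is an assumption about what the solver effectively does, not a description of its internals: statsmodels solves the power equation numerically, so the two results should agree closely rather than exactly.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

alpha, power = 0.05, 0.8

# Cohen's h for the 70% -> 75% change
h = 2 * asin(sqrt(0.70)) - 2 * asin(sqrt(0.75))

# Critical values of the standard normal distribution
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
z_power = NormalDist().inv_cdf(power)

# Closed-form sample size per group for equal group sizes
n_per_group = 2 * ((z_alpha + z_power) / h) ** 2
print(ceil(n_per_group))      # 1250
print(2 * ceil(n_per_group))  # 2500 players across both groups
```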


## Experiment Data Overview