### A/B Testing: Frequentist and Bayesian Approaches

**OBJECTIVES**

- Implement A/B testing with frequentist hypothesis tests
- Implement A/B testing using a Bayesian perspective
- Implement bandit solutions to A/B testing

### The Frequentist Approach


![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e0/Marey_-_birds.jpg/440px-Marey_-_birds.jpg)


To start, we use an example derived from Meredith and Kruschke's [BEST](https://cran.r-project.org/web/packages/BEST/vignettes/BEST.pdf) documentation. The two arrays below hold reaction times for two groups: group 1 (`treatment`) receives a treatment and group 2 (`control`) is the placebo group.

In [None]:
import numpy as np
import scipy.stats as stats
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import pymc3 as pm
import arviz as az

In [None]:
treatment = np.array([5.77, 5.33, 4.59, 4.33, 3.66, 4.48])
control = np.array([3.88, 3.55, 3.29, 2.59, 2.33, 3.59])

**HYPOTHESIS TEST REVIEW**

- State the null and alternative hypotheses
- State a significance level
- Run the test

In [None]:
#null and alternative hypothesis?


In [None]:
#significance level?


In [None]:
#test

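A minimal sketch of the three review steps, assuming a two-sided test at a significance level of 0.05. It uses Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not assume equal group variances; choosing Welch over the classic Student's t-test is our assumption, not something the source specifies.

```python
import numpy as np
import scipy.stats as stats

treatment = np.array([5.77, 5.33, 4.59, 4.33, 3.66, 4.48])
control = np.array([3.88, 3.55, 3.29, 2.59, 2.33, 3.59])

# H0: the treatment and control population means are equal
# H1: the population means differ (two-sided alternative)
alpha = 0.05  # assumed significance level

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the group means differ at the chosen level.")
else:
    print("Fail to reject H0.")
```

With only six observations per group, the normality assumption behind the t-test is hard to check, which is part of the motivation for the Bayesian treatment later in the notebook.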

### Multiple Groups


![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Rinnhofer-foto-muschel.jpg/440px-Rinnhofer-foto-muschel.jpg)


The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.

- The samples are independent.

- Each sample is from a normally distributed population.

- The population standard deviations of the groups are all equal. This property is known as homoscedasticity.

-------

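The example below has data on a shell measurement in the mussel *Mytilus trossulus* (the length of the anterior adductor muscle scar, standardized by dividing by shell length) from five locations around the world.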

In [None]:
tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735,
             0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835,
           0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764,
           0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

In [None]:
#use f_oneway

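One way to fill in this cell, as a sketch: run rough checks of the normality and equal-variance assumptions listed above, then the one-way ANOVA with `scipy.stats.f_oneway`. Shapiro-Wilk (`stats.shapiro`) and Levene (`stats.levene`) are our choice of checks, not the source's; with six to ten observations per group they have little power, so treat them as indicative only. Independence cannot be checked from the numbers alone and has to come from how the data were collected.

```python
import scipy.stats as stats

tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735,
             0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835,
           0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764,
           0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

names = ["tillamook", "newport", "petersburg", "magadan", "tvarminne"]
groups = [tillamook, newport, petersburg, magadan, tvarminne]

# Normality check per group (H0: sample comes from a normal population)
for name, g in zip(names, groups):
    _, p_norm = stats.shapiro(g)
    print(f"{name:>10}: Shapiro-Wilk p = {p_norm:.3f}")

# Equal-variance check across groups (H0: all population variances are equal)
_, p_levene = stats.levene(*groups)
print(f"Levene p = {p_levene:.3f}")

# One-way ANOVA (H0: all five population means are equal)
f_stat, p_anova = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_anova:.5f}")
```

A small ANOVA p-value only says that at least one location differs; it does not say which one.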

### Bayesian Approach

In [None]:
treatment = [5.77, 5.33, 4.59, 4.33, 3.66, 4.48]
control = [3.88, 3.55, 3.29, 2.59, 2.33, 3.59]

In [None]:
import pymc3 as pm

In [None]:
#define the model: Normal priors on the means, HalfNormal priors on the standard deviations


In [None]:
#define the difference


In [None]:
#sample


In [None]:
#plot the means


In [None]:
#posterior plot


In [None]:
#forest plot


In [None]:
#summary


In [None]:
#plot the difference in means posterior


### Adjusting our Priors

Now, let's revisit this with the `StudentT` distribution. Here we need to pay attention to an additional parameter, the degrees of freedom $\nu$, and to the fact that pymc3's `StudentT` can be parameterized by the precision $\lambda = 1/\sigma^2$ through its `lam` argument.

We can place an `Exponential` prior on $\nu$ (shared by both groups) and compute each group's $\lambda$ directly from its $\sigma$.

Specify the model:

- $\mu_1, \mu_2 \rightarrow$ `Normal`
- $\sigma_1, \sigma_2 \rightarrow$ `Uniform`
- $\nu \rightarrow$ `Exponential`

In [None]:
treatment = [5.77, 5.33, 4.59, 4.33, 3.66, 4.48]
control = [3.88, 3.55, 3.29, 2.59, 2.33, 3.59]

#define the model 




In [None]:
#define likelihoods and lambdas


In [None]:
#determine the difference in means



In [None]:
#sample


In [None]:
#plot the means


In [None]:
#plot the difference


In [None]:
#forest plot of means


### Problem

The data below give the number of clicks received by two variations of an advertisement. Here we use `Beta` priors and `Binomial` likelihoods to describe the difference in performance between the two ads:

```
n = 500 #number of views for each ad
ad1 = 210 #ad1 clicks
ad2 = 264 #ad2 clicks
```

In [None]:
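n = 500    #number of views for each ad
ad1 = 210  #ad1 clicks
ad2 = 264  #ad2 clicks

with pm.Model() as adtest:
    #priors for ad1 and ad2: flat Beta(1, 1) on each click-through rate
    p1 = pm.Beta('p1', alpha=1, beta=1)
    p2 = pm.Beta('p2', alpha=1, beta=1)

    #likelihoods for ad1 and ad2 -- n is the number of views, p is the click-through rate
    pm.Binomial('ad1_clicks', n=n, p=p1, observed=ad1)
    pm.Binomial('ad2_clicks', n=n, p=p2, observed=ad2)

    #difference in click-through rates
    diff = pm.Deterministic('diff', p2 - p1)

    #sample
    ad_trace = pm.sample(2000, tune=1000, return_inferencedata=True)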

In [None]:
#Are they different?  Why?

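Because the Beta prior is conjugate to the Binomial likelihood, this posterior can also be written down exactly: a Beta(1, 1) prior with $k$ clicks out of $n$ views gives a Beta$(1 + k, 1 + n - k)$ posterior. The sketch below, using only NumPy, draws from the two exact posteriors to estimate the probability that ad 2 has the higher click-through rate and a 95% credible interval for the difference; the number of draws and the seed are arbitrary choices.

```python
import numpy as np

n = 500     # views per ad
ad1 = 210   # clicks on ad 1
ad2 = 264   # clicks on ad 2

# Beta(1, 1) prior + Binomial likelihood -> Beta(1 + clicks, 1 + views - clicks)
post1 = (1 + ad1, 1 + n - ad1)
post2 = (1 + ad2, 1 + n - ad2)

rng = np.random.default_rng(0)
draws = 100_000
p1 = rng.beta(*post1, size=draws)
p2 = rng.beta(*post2, size=draws)
diff = p2 - p1

# Posterior probability that ad 2 outperforms ad 1, and a central 95% interval
prob_ad2_better = (diff > 0).mean()
low, high = np.percentile(diff, [2.5, 97.5])
print(f"P(ad2 > ad1) = {prob_ad2_better:.4f}")
print(f"95% credible interval for p2 - p1: [{low:.3f}, {high:.3f}]")
```

If the credible interval for `p2 - p1` lies entirely above zero, the data favor ad 2 under these priors; the MCMC posterior from the pymc3 cell should closely match these exact results.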

### Many Groups

Beyond the two-group example, suppose we want to compare many groups. For demonstration, we use the `penguins` dataset bundled with seaborn and learn a posterior distribution for each species' mean flipper length to see whether the species differ.

In [None]:
#load the penguins
penguins = sns.load_dataset('penguins').dropna()

In [None]:
penguins.head()

In [None]:
#create numeric category tag
penguins['enc'] = penguins['species'].factorize()[0]

In [None]:
#define the model


In [None]:
#plot trace


In [None]:
#look at the summary


In [None]:
#forest plot of means


### Bandit Problems

In probability theory, the multi-armed bandit problem is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. -- [Source](https://en.wikipedia.org/wiki/Multi-armed_bandit)

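Our algorithm:

1. For each bandit $b$, draw a sample $X_b$ from its current prior (the Beta posterior from all pulls so far)
2. Select the bandit with the largest sample, $B = \arg\max_b X_b$
3. Pull bandit $B$, observe the outcome, and update its prior
4. Repeat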

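Note that with a flat Beta$(\alpha = 1, \beta = 1)$ prior and a Bernoulli (single-trial Binomial) likelihood, a pull with outcome $r \in \{0, 1\}$ updates the posterior to Beta$(\alpha = 1 + r, \beta = 1 + 1 - r)$. More generally, each pull updates Beta$(\alpha, \beta)$ to Beta$(\alpha + r, \beta + 1 - r)$.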
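The loop above is commonly known as Thompson sampling. Below is a minimal NumPy sketch, assuming Bernoulli (click / no-click) rewards; the function `thompson_sampling` and the `true_rates` values are hypothetical and are used only to simulate pulls, since the algorithm itself never sees them.

```python
import numpy as np


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling against simulated bandits.

    true_rates: hypothetical success probabilities, used only to simulate rewards.
    Returns the final Beta(alpha, beta) parameters for each bandit.
    """
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    alpha = np.ones(k)  # flat Beta(1, 1) prior for every bandit
    beta = np.ones(k)
    for _ in range(n_rounds):
        # 1. Sample X_b from the current Beta prior of every bandit
        samples = rng.beta(alpha, beta)
        # 2. Select the bandit B with the largest sample
        b = int(np.argmax(samples))
        # 3. Pull bandit B (simulated Bernoulli reward) and update its prior
        reward = int(rng.random() < true_rates[b])
        alpha[b] += reward
        beta[b] += 1 - reward
        # 4. Repeat (next loop iteration)
    return alpha, beta


true_rates = [0.05, 0.10, 0.20]  # hypothetical conversion rates
alpha, beta = thompson_sampling(true_rates, n_rounds=5000)
pulls = alpha + beta - 2         # subtract the two pseudo-counts from the flat prior
print("pulls per bandit:", pulls)
print("posterior mean rates:", alpha / (alpha + beta))
```

Unlike a fixed 50/50 A/B split, the allocation shifts toward the better-performing bandit as evidence accumulates, so fewer pulls are spent on the weaker options.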

### More Examples

- Solve a regression problem with PyMC3
- Solve a classification problem with PyMC3
- Solve an A/B testing problem with PyMC3

In [None]:
treatment = np.array([101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106,
        109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104,
        96,103,124,101,101,100,101,101,104,100,101])
control = np.array([99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100,
           104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100,
           101,100,99,101,100,102,99,100,99])