# A/B Testing 

## An introduction to statistical hypothesis testing for test/control scenarios
___
Mike Smith   
Nielsen, 07/03/2019
___

In this notebook, we'll review the basics of the statistics behind A/B tests, a bit of the coding behind applying those statistical properties in python, and lastly put those practices to use with some data.

### Contents:
#### 1) Statistical Background
#### 2) Building some tools
#### 3) Demo: Testing Effectivenenss of a Marketing Campaign
#### 4) Exercises
____

## Statistical Background

The basis behind the A/B test is a practice in statistics where a person wants to compare a mean of some metric across two groups for similarity.  More precicely, we want to test the hypothesis that they're the same.  Contradictory results of the test would indicate that there is an underlying difference between the two groups.

#### T-Tests
Let's assume we have two groups, A and B (seems appropriate).  Each group is made up of people and we record their heights as $h$.  Then, the mean of the heights of people in group A might be called $\bar{h}_A$, and for group B, $\bar{h}_B$.   
   
If we then want to test the assertion that people's heights in group A are statistically similar to group B, we would want to test the hypothesis ($H_0$) that:  
$$H_0: \bar{h}_A = \bar{h}_B$$   
versus   
$$H_1: \bar{h}_A \neq \bar{h}_B$$   

To do this, we need to gather some details about each group:  The means ($\bar{h}_A$, $\bar{h}_B$) and the number of people in each group ($n_A$, $n_B$).  With these, we can calculate the sample standard deviations of the groups: ($S_A$, $S_B$) and use them to calculate our t-statistics.  The formula is   
$$t = \frac{\bar{h}_A - \bar{h}_B}{\sqrt{\frac{S_A}{(n_A)^2} + \frac{S_B}{(n_B)^2}}}$$   

Lastly, we need the confidence interval ($\alpha$) and degrees of freedom ($df$) for our test.  This is standardly:  
$$df = n_A + n_B - 2$$   
and   
$$\alpha = .05$$

This would mean there's a 95% confidence that the conclusion of the test is valid.

With all these facts, we can use the good-ol' t-table you probably recall from the back of your favorite intro-to-stats class to find the critical t-value from the distribution with which to compare our derived t-value.  If our derived t-value is greater than the critical one from the distribution, we conclude that our null hypothesis is false - there is a statisticaly significant difference between the two groups' average heights.  If it's less than or equal to, then we accept the null hypothesis that the group means are equal.

## Coding

Now, let's put this theory to use.  First, we need some tools.  Let's create some functions that will take care of some of the algebra defined above. We're going to assume we're working with a pandas dataframe called 'df' (one of the most pouplar data-handling libraries in python for doing data science).  This dataframe object is essentially a table (rows and columns) with built in functionality.  Each column is considered a 'series' and we can perform operations on one or more series together.

For the sake of demonstration, let's continue with the height example from above.

First up - averages ($\bar{h}$).  To get the average of the column 'height' of all the people who have A in their 'group' column, we can simply say   

```
(1) mean_height_a = df[df.group=='A'].height.mean()
```

Second is counts ($n$).  To get the total number of people in group A, we could do something like this   
```
(2) n_a = df[df.group=='A'].height.count()
```

Third, the standard deviation ($S$):  
```
(3) std_height_a = df[df.group=='A'].height.std()
```

Fourth, the t-value ($t$): For this one, we're going to assume you've used the above code to generate $\bar{h}_A$,  $\bar{h}_B$, $S_A$, $S_B$, $n_A$, and $n_B$.  We're then going to build a function to give you back the t-score for a two-sided test:
```
(4) def t_value(mean1, mean2, n1, n2, sd1, sd2):
        from math import sqrt
        t = (mean1 - mean2) / sqrt((sd1/n1^2) + (sd2/n2^2))
        return t
```

And lastly, we'll need a utility for testing our t-value against the t-distribution.  We extend our knowledge from above a little bit here.  Instead of setting the alpha to a fixed point and comparing the t-value to the t-distribution at the given alpha, we will simply pass the computed t-value and the degrees of freedom to a function which returns us the probability that our computed value is less than or equal to the 'critical t-value'.  Thus, our function will return us a value that should be less than or equal to our set $\alpha%$ for us to assert there is a difference in the means.
```
(5) def test_t(t_value, alpha, df):
        from scipy import stats
        p = 1 - stats.t.cdf(t,df=df)
        return 2*p
```

## Demo 

Now let's put this to work on some example data.  In the below dataframe we have person-level data corresponding to spending at a certain restaurant (Mike's Sushi) and across all 'casual dining' restaurants.  In addition, we have the amount of times a person was exposed to a campaign Mike's Sushi ran and placed across various web pages.  Below are the columns we have available:

    - exposure_count - this is the number of times a person was exposed to the advertising campaign
    - MikesSushi_spend - the number of dollars spent at Mikes Sushi while the campaign was running
    - MikesSushi_trans - the number of transactions at Mikes Sushi while the campaign was running
    - Casual_Dining_spend - the number of dollars spent across all Casual Dining restaurants while the campaign was running
    - MikesSushi_trans - the number of transactions across all Casual Dining restaurants while the campaign was running

Let's actually read this data into a pandas DataFrame:

In [29]:
import pandas as pd

In [11]:
dataset = pd.read_csv('/home/notebook/ab_demo_data.csv')

In [12]:
dataset.head()

Unnamed: 0,exposures,MikesSushi_trans,MikesSushi_spend,CasualDining_spend,CasualDining_trans
0,6,6.0,6.0,183.980047,8.0
1,0,9.0,9.0,294.415975,11.0
2,0,5.0,5.0,248.352395,9.0
3,5,6.0,6.0,111.927492,7.0
4,0,5.0,5.0,139.15086,6.0


Now, let's run through a sample scenario where we want to test if being exposed to the advertising campaign had an effect on average spending at Mike's Sushi.   For this excersize, our two groups would be    
Group A - where number of exposures > 0 (Exposed)   
Group B - where number of exposures = 0 (Unexposed)

Let's first just take a look at the means:



In [13]:
meanA = dataset[dataset.exposures > 0].MikesSushi_spend.mean()
meanB = dataset[dataset.exposures == 0].MikesSushi_spend.mean()
print('Exposed Average Spending = {0}'.format(meanA))
print('Unexposed Average Spending = {0}'.format(meanB))

Exposed Average Spending = 5.296992481203008
Unexposed Average Spending = 5.282051282051282


It doesn't seem that being exposed to the campaign had much effect on spending at MikesSushi.  To set up our A/B test though, we need to formulate our hypothesis to test this theory. 

Our hypotheses would then be:
$$H_0: \bar{spend}_A = \bar{spend}_B$$   
versus   
$$H_1: \bar{spend}_A \neq \bar{spend}_B$$  

So let's start accumulating our required facts (we already have our averages from the above cell)...

In [14]:
#Let's get the counts
n_A = dataset[dataset.exposures > 0].MikesSushi_spend.count()
n_B = dataset[dataset.exposures == 0].MikesSushi_spend.count()

In [15]:
#And then the Standard Deviations
stdA = dataset[dataset.exposures > 0].MikesSushi_spend.std()
stdB = dataset[dataset.exposures == 0].MikesSushi_spend.std()

In [16]:
#Lastly, our Degrees of freedom:
d_f = n_A + n_B - 2

Now we can use our two functions we wrote above (t_value and test_t) to come up with our t and p-values!

In [26]:
def t_value(mean1, mean2, n1, n2, sd1, sd2):
        from math import sqrt
        t = (mean1 - mean2) / sqrt((sd1/n1**2) + (sd2/n2**2))
        return t
def test_t(t_value, df):
        from scipy import stats
        p = 1 - stats.t.cdf(t,df=df)
        return 2*p

In [28]:
t = t_value(meanA, meanB, n_A, n_B, stdA, stdB)
p = test_t(t, d_f)
print('T-Value is: {0}'.format(t))
print('P-Value is: {0}'.format(p))

T-Value is: 1.4124266486717738
P-Value is: 0.1584489741290278


We see that the P-Value given back is ABOVE our confidence threashold ($\alpha = 0.05$) and thus, we ACCEPT our Null hypothesis that the two group means are equal and thus we conclude that there wasn't significant change in spending due to the being exposed to the advertising campaign.  

### Exercises:

Try out some variations on the above exercise and see what you can come up with.

1) How about transactions?  Maybe people came to the restaurant more times but had cupons so they ended up spending the same amount of dollars.  Test if there was significant change in the transactions at Mike's Sushi

2) How about the rest of the category?  Did the campaign get people interested in eating out more generally?  Maybe they saw the ad but went to their own favorite restaurant instead.  Test if the Casual Dinng category saw significant change in spend for exposed vs unexposed people.