<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Objectives" data-toc-modified-id="Objectives-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Objectives</a></span></li><li><span><a href="#Example-Together" data-toc-modified-id="Example-Together-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Example Together</a></span><ul class="toc-item"><li><span><a href="#Question" data-toc-modified-id="Question-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Question</a></span></li><li><span><a href="#Loading-the-Data" data-toc-modified-id="Loading-the-Data-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Loading the Data</a></span></li><li><span><a href="#Some-Exploration-to-Better-Understand-our-Data" data-toc-modified-id="Some-Exploration-to-Better-Understand-our-Data-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Some Exploration to Better Understand our Data</a></span></li><li><span><a href="#Experimental-Setup" data-toc-modified-id="Experimental-Setup-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Experimental Setup</a></span><ul class="toc-item"><li><span><a href="#The-Hypotheses" data-toc-modified-id="The-Hypotheses-2.4.1"><span class="toc-item-num">2.4.1&nbsp;&nbsp;</span>The Hypotheses</a></span></li><li><span><a href="#Setting-a-Threshold" data-toc-modified-id="Setting-a-Threshold-2.4.2"><span class="toc-item-num">2.4.2&nbsp;&nbsp;</span>Setting a Threshold</a></span></li></ul></li><li><span><a href="#Preparing-Fisher's-Test" data-toc-modified-id="Preparing-Fisher's-Test-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Preparing Fisher's Test</a></span><ul class="toc-item"><li><span><a href="#Calculation" data-toc-modified-id="Calculation-2.5.1"><span class="toc-item-num">2.5.1&nbsp;&nbsp;</span>Calculation</a></span></li></ul></li></ul></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Exercise</a></span></li></ul></div>

In [12]:
import numpy as np
import pandas as pd
from scipy.special import comb
from scipy import stats
import seaborn as sns

# Objectives

- Conduct an A/B Test in Python

# Example Together

Now let's try a binomial A/B Test (where the variable of interest is binary). We can use [Fisher's exact test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test).

## Question

We have data on whether customers completed sales transactions, broken down by the type of ad banner each customer was shown.

The question we want to answer is whether there was any difference in sales "conversions" between desktop customers who saw the sneakers banner and desktop customers who saw the accessories banner in the month of May 2019.

## Loading the Data

First, let's download the data from [Kaggle](https://www.kaggle.com/podsyp/how-to-do-product-analytics) via the release page of this repo: https://github.com/flatiron-school/ds-ab_testing/releases

The code below will load it into our DataFrame:

In [20]:
# This downloads the data from the web, so it may take a moment (the file is relatively small)
df = pd.read_csv('https://github.com/flatiron-school/ds-ab_testing/releases/download/v1.2/products_small.csv')

> Let's take a look while we're at it

In [21]:
df.head()

Unnamed: 0.1,Unnamed: 0,order_id,user_id,page_id,product,site_version,time,title,target
0,4122928,3e6c5e89fdddcaee0eed210ec2c9cadf,90d58d967eb72656e86059ec6f208092,2fdc16a09e0016555dd4da4a3fe84414,accessories,desktop,2019-03-06 08:42:47,banner_show,0
1,564306,feed6203517d3abf6aab13761633174b,08703dab1f004eabba25aacb7f0e5484,6b0a902b9b73d5a158d0119d6feb38ac,sneakers,mobile,2019-04-19 18:30:45,banner_show,0
2,1872289,e33d5d7941edc281646aa37763729771,bdf1d25697e21419901c94fabdafad15,9ddb7315c4357929931b48f2b3d11c62,company,mobile,2019-01-20 17:20:10,banner_show,0
3,3616779,7c4caa8d508fa7c3bbc25f35cdd9168a,8d2f23a732c9527d95678088a3bac122,1f86cd0bea31d54a5b511b42fd19401a,sneakers,mobile,2019-02-20 09:38:32,banner_show,0
4,5871482,12874b29bde8bbd43fb2b95735caf9e6,5a22604f8f31ae98ee1211ece3a02004,b533f7e2003418c63fd71471264c559a,sneakers,mobile,2019-04-24 09:19:02,banner_show,0


In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 9 columns):
 #   Column        Non-Null Count    Dtype 
---  ------        --------------    ----- 
 0   Unnamed: 0    1000000 non-null  int64 
 1   order_id      1000000 non-null  object
 2   user_id       1000000 non-null  object
 3   page_id       1000000 non-null  object
 4   product       1000000 non-null  object
 5   site_version  1000000 non-null  object
 6   time          1000000 non-null  object
 7   title         1000000 non-null  object
 8   target        1000000 non-null  int64 
dtypes: int64(2), object(7)
memory usage: 68.7+ MB


## Some Exploration to Better Understand our Data

Let's look at the different banner types:

In [None]:
df['product'].value_counts()

In [None]:
df.groupby('product')['target'].value_counts()

Let's look at the range of time-stamps on these data:

In [None]:
df['time'].min()

In [None]:
df['time'].max()

Let's check the counts of the different site_version values:

In [None]:
df['site_version'].value_counts()

In [None]:
df['title'].value_counts()

In [None]:
df.groupby('title').agg({'target': 'mean'})

## Experimental Setup

We need to filter by site_version, time, and product:

In [None]:
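# Keep only May 2019: the ISO-formatted time strings compare correctly as text
df_AB = df[(df['site_version'] == 'desktop') &
           (df['time'] >= '2019-05-01') &
           (df['time'] < '2019-06-01') &
           ((df['product'] == 'accessories') | (df['product'] == 'sneakers'))].reset_index(drop = True)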

In [None]:
df_AB.tail()
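
The filtering step can also be written with parsed timestamps and an explicit end date for May. This is a minimal sketch on a hypothetical toy DataFrame (`toy` and its rows are made up for illustration); only the column names are taken from the data above.

```python
import pandas as pd

# Hypothetical rows that mimic the three columns used by the filter.
toy = pd.DataFrame({
    'site_version': ['desktop', 'desktop', 'mobile', 'desktop', 'desktop'],
    'time': ['2019-04-30 23:59:59', '2019-05-01 00:00:00',
             '2019-05-10 12:00:00', '2019-05-31 23:59:59',
             '2019-05-15 08:30:00'],
    'product': ['sneakers', 'accessories', 'sneakers', 'sneakers', 'company'],
})

# Parse the strings once so comparisons are made on real timestamps.
toy['time'] = pd.to_datetime(toy['time'])

mask = (
    (toy['site_version'] == 'desktop')
    & (toy['time'] >= pd.Timestamp('2019-05-01'))
    & (toy['time'] < pd.Timestamp('2019-06-01'))   # upper bound keeps only May
    & toy['product'].isin(['accessories', 'sneakers'])
)
toy_AB = toy[mask].reset_index(drop=True)
print(toy_AB)
```

Only the two desktop rows from May with an accessories or sneakers banner survive the filter.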

### The Hypotheses

$H_0$: Customers who saw the sneakers banner were no more or less likely to buy than customers who saw the accessories banner.

$H_1$: Customers who saw the sneakers banner were more or less likely to buy than customers who saw the accessories banner.

### Setting a Threshold

We'll set a false-positive rate of $\alpha = 0.05$.

## Preparing Fisher's Test

Fisher's Test is an exact calculation of a $p$-value that requires four quantities: the respective numbers of 1's and 0's for each class.

In [None]:
df_A = df_AB[df_AB['product'] == 'accessories']
df_B = df_AB[df_AB['product'] == 'sneakers']

We calculate the values in a 2x2 table: the numbers of people who did or did not submit orders, for both the accessories banner and the sneakers banner.

In [None]:
accessories_orders = df_A['target'].sum()
sneakers_orders = df_B['target'].sum()

accessories_orders, sneakers_orders

To get the numbers of people who didn't submit orders, we get the total number of people who were shown banners and then subtract the numbers of people who did make orders.

In [None]:
accessories_total = (df_A['title'] == 'banner_show').sum()
sneakers_total = (df_B['title'] == 'banner_show').sum()

accessories_no_orders = accessories_total - accessories_orders
sneakers_no_orders = sneakers_total - sneakers_orders

accessories_no_orders, sneakers_no_orders
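
As a cross-check on the layout of the 2x2 table, the same four cells can be read off a cross-tabulation. The sketch below uses a hypothetical toy DataFrame and assumes one row per customer shown a banner. That is a simplification: the counts above take the totals from the `banner_show` rows instead.

```python
import pandas as pd

# Hypothetical data: one row per exposed customer, target = 1 if they ordered.
toy_AB = pd.DataFrame({
    'product': ['accessories'] * 4 + ['sneakers'] * 4,
    'target':  [1, 0, 0, 0, 1, 1, 0, 0],
})

# Rows are the outcome, columns are the banner.
table = pd.crosstab(toy_AB['target'], toy_AB['product'])

# Put orders (target == 1) in the first row so the cells read a, b / c, d.
table = table.loc[[1, 0], ['accessories', 'sneakers']]
print(table)
```

Read row by row, the cells correspond to accessories orders, sneakers orders, accessories non-orders, and sneakers non-orders.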

### Calculation


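For a 2x2 table with cells $a$, $b$, $c$, $d$ and total $n = a + b + c + d$, Fisher's Test gives the probability of observing exactly this table under the null hypothesis (with the row and column totals held fixed) as:

$\Large p = \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{a!b!c!d!n!}$

The $p$-value of the test is the sum of these probabilities over all tables at least as extreme as the one observed, so it is at least as large as this single-table probability.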

In [None]:
a = accessories_orders
b = sneakers_orders
c = accessories_no_orders
d = sneakers_no_orders

In [None]:
ab_choose_a = comb(a+b, a, exact=True)

In [None]:
cd_choose_c = comb(c+d, c, exact=True)

In [None]:
n_choose_ac = comb(a+b+c+d, a+c, exact=True)

In [None]:
p = ab_choose_a * cd_choose_c / n_choose_ac
p

Comparing with `stats.fisher_exact()`, which returns the odds ratio and the two-sided $p$-value:

In [None]:
stats.fisher_exact(np.array([[a, b], [c, d]]))
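
The relationship between the single-table probability and the two-sided $p$-value can be checked on a small table. This is a sketch under stated assumptions: the counts `8, 2, 1, 5` are hypothetical, `table_prob` is an illustrative helper, and the rule used (sum the probabilities of all tables no more likely than the observed one) is the usual two-sided convention, with a small tolerance added for floating-point ties.

```python
import numpy as np
from scipy import stats
from scipy.special import comb

# Hypothetical small 2x2 table: [[a, b], [c, d]]
a, b, c, d = 8, 2, 1, 5
row1, row2, col1 = a + b, c + d, a + c
n = row1 + row2

def table_prob(k):
    """Probability of the table whose top-left cell is k, with margins fixed."""
    return (comb(row1, k, exact=True) * comb(row2, col1 - k, exact=True)
            / comb(n, col1, exact=True))

# Probability of exactly the observed table (the formula above).
p_observed = table_prob(a)

# All values the top-left cell can take without changing the margins.
k_values = range(max(0, col1 - row2), min(row1, col1) + 1)
probs = np.array([table_prob(k) for k in k_values])

# Two-sided p-value: add up every table no more likely than the observed one.
p_two_sided = probs[probs <= p_observed * (1 + 1e-7)].sum()

_, p_scipy = stats.fisher_exact([[a, b], [c, d]])
print(p_observed, p_two_sided, p_scipy)
```

On this toy table the summed value agrees with `stats.fisher_exact()`, while the single-table probability is smaller.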

This extremely low $p$-value is far below our threshold of $\alpha = 0.05$, so we reject the null hypothesis: the data suggest that these two groups are genuinely performing differently. In particular, the desktop customers who saw the sneakers banner in May 2019 bought at a higher rate than the desktop customers who saw the accessories banner in May 2019.
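
The test itself is two-sided, so the direction of the difference has to be read from the conversion rates. A minimal sketch with hypothetical counts (the numbers below are made up and are not the results from this dataset):

```python
# Hypothetical counts laid out as in the 2x2 table: [[a, b], [c, d]]
a, b = 30, 60     # orders: accessories, sneakers
c, d = 970, 940   # no orders: accessories, sneakers

# Conversion rate = orders / customers shown the banner
rate_accessories = a / (a + c)
rate_sneakers = b / (b + d)

# Sample odds ratio; a value below 1 means the first column
# (accessories) has lower odds of ordering than the second (sneakers).
odds_ratio = (a * d) / (b * c)

print(rate_accessories, rate_sneakers, odds_ratio)
```

With the notebook's own `a`, `b`, `c`, `d` in place of the hypothetical counts, the same three lines show which banner converted better.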

# Exercise

Same question as before, but this time for April 2019 instead of May! Use a threshold of 0.05.