# A/B Testing Analysis

In [None]:
import numpy as np
import pandas as pd

from scipy import stats
import seaborn as sns

# Objectives

- Conduct an A/B test in Python
- Interpret the results of the A/B tests for a stakeholder

# Example Together

## Question

We have data about whether customers completed sales transactions, broken down by the type of ad banner each customer was shown.

The question we want to answer is whether there was any difference in sales "conversions" between desktop customers who saw the sneakers banner and desktop customers who saw the accessories banner in the month of May 2019.

## Considerations

What would we need to consider when designing our experiment?

Might include:

- Who is it that we're including in our test?
- How big an effect would make the difference "worth" acting on?
    - This affects the sample size we need
    - This gives context for a statistically significant result
- Other biases or "gotchas"
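
To make the link between effect size and sample size concrete, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. The baseline conversion rate (5%), the lifts, and the 80% power target are made-up assumptions for illustration, not values from our data, and `sample_size_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate customers needed per banner for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    effect = p_target - p_base
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical 5% baseline: smaller lifts need far more customers
for lift in (0.005, 0.01, 0.02):
    n = sample_size_per_group(0.05, 0.05 + lift)
    print(f'lift of {100 * lift:.1f} points -> about {n} customers per group')
```

Halving the effect we care about roughly quadruples the required sample, which is why it pays to decide on a meaningful effect size before looking at the data.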

## Loading the Data

First let's get the data, which originally comes from [Kaggle](https://www.kaggle.com/podsyp/how-to-do-product-analytics), via the release page of this repo: https://github.com/flatiron-school/ds-ab_testing/releases

The code below will load it into our DataFrame:

In [None]:
# This downloads the data from the web, so it can take
# some time (though the file is relatively small)

df = pd.read_csv('https://github.com/flatiron-school/ds-ab_testing/releases/download/v1.2/products_small.csv')

> Let's take a look while we're at it

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe(include=['object'])

## Some Exploration to Better Understand our Data

Let's look at the different banner types:

In [None]:
df['product'].value_counts()

In [None]:
df.groupby('product')['target'].value_counts()

In [None]:
df.groupby('product').get_group('accessories')

In [None]:
df.groupby(['product', 'target']).count()

In [None]:
df.groupby(['product', 'target']).agg('count')

Let's look at the range of time-stamps on these data:

In [None]:
df['time'].min()

In [None]:
df['time'].max()

Let's check the counts of the different site_version values:

In [None]:
df['site_version'].value_counts()

And now let's check the `title` values:

In [None]:
df['title'].value_counts()

In [None]:
len(df.loc[df['title'] == 'order'])

In [None]:
sum(df['target'])

In [None]:
df.groupby('title').agg({'target': 'mean'})

## Experimental Setup

We need to filter by site_version, time, and product:

In [None]:
# Time


In [None]:
# All


In [None]:
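# `&` binds tighter than `|`, so the OR over products needs its own parentheses;
# without them, every sneakers row (any platform, any date) slips through
df_AB = df.loc[(df['site_version'] == 'desktop') & (df['time'] >= '2019-05-01') &
               ((df['product'] == 'accessories') | (df['product'] == 'sneakers'))]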

In [None]:
df_AB.tail()

In [None]:
df_AB['product'].value_counts()

In [None]:
df_AB['site_version'].value_counts()
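
When a filter mixes `&` and `|`, Python evaluates `&` first, so an unparenthesized OR can quietly let in rows we meant to exclude. A small sketch on a made-up four-row DataFrame (the `toy` data is purely illustrative) shows the difference, along with `isin` as an alternative that avoids the OR altogether:

```python
import pandas as pd

# Made-up rows: one desktop and one mobile customer per banner
toy = pd.DataFrame({
    'site_version': ['desktop', 'mobile', 'desktop', 'mobile'],
    'product': ['accessories', 'accessories', 'sneakers', 'sneakers'],
})

is_desktop = toy['site_version'] == 'desktop'
is_accessories = toy['product'] == 'accessories'
is_sneakers = toy['product'] == 'sneakers'

# Parsed as (is_desktop & is_accessories) | is_sneakers
unparenthesized = toy.loc[is_desktop & is_accessories | is_sneakers]

# The OR is grouped first, then combined with the desktop condition
parenthesized = toy.loc[is_desktop & (is_accessories | is_sneakers)]

# Same result without any OR
with_isin = toy.loc[is_desktop & toy['product'].isin(['accessories', 'sneakers'])]

print(unparenthesized['site_version'].value_counts())
print(parenthesized['site_version'].value_counts())
```

Checking `value_counts()` on each filtered column, as we do above for `df_AB`, is a quick way to catch this kind of leak.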

### What Test Would Make Sense?

Since we're comparing the frequency of conversions of customers who saw the "sneakers" banner against those who saw the "accessories" banner, we can use a $\chi^2$ test.

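Note there are other hypothesis tests we could use (a two-proportion $z$-test, for example), but the $\chi^2$ test fits our situation: both variables are categorical (banner type and order/no order), and each row is treated as an independent observation.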
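
For comparison, one of those other tests is a two-proportion $z$-test. Here is a minimal sketch with made-up counts (50 orders out of 1000 versus 80 out of 1000; these are not our data) and a hypothetical helper `two_proportion_ztest`. For a 2×2 table, the square of this $z$ statistic equals the $\chi^2$ statistic computed without continuity correction, so the two approaches lead to the same two-sided conclusion.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(orders_a, total_a, orders_b, total_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    rate_a = orders_a / total_a
    rate_b = orders_b / total_b
    pooled = (orders_a + orders_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts, for illustration only
z_stat, p_val = two_proportion_ztest(50, 1000, 80, 1000)
print(f'z = {z_stat:.3f}, p = {p_val:.4f}')
```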

### The Hypotheses

$H_0$: Customers who saw the sneakers banner were no more or less likely to buy than customers who saw the accessories banner.

$H_1$: Customers who saw the sneakers banner were more or less likely to buy than customers who saw the accessories banner.

### Setting a Threshold

We'll set a false-positive rate of $\alpha = 0.05$.

## $\chi^2$ Test

### Set Up the Data

We need our contingency table: the numbers of people who did or did not submit orders, both for the accessories banner and the sneakers banner. 
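
As an aside, `pd.crosstab` can build this kind of table in one step. The sketch below uses eight made-up rows with the same column names as our data (`product`, `target`); the values are invented for illustration.

```python
import pandas as pd

# Made-up rows: target is 1 for an order, 0 otherwise
toy = pd.DataFrame({
    'product': ['accessories'] * 4 + ['sneakers'] * 4,
    'target': [1, 0, 0, 0, 1, 1, 0, 0],
})

# Rows are banners, columns are target values (0 and 1)
table = pd.crosstab(toy['product'], toy['target'])

# Put orders first so the layout is [orders, no orders] per banner
table = table[[1, 0]]
print(table)
```

Building the table by hand, as we do next, takes more steps but makes each number easy to check.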

In [None]:
counts = df_AB.groupby(['product', 'target']).count()['title']
counts

In [None]:
df_A = df_AB.loc[df_AB['product'] == 'accessories']
df_B = df_AB.loc[df_AB['product'] == 'sneakers']

In [None]:
accessories_orders = sum(df_A['target'])
sneakers_orders = sum(df_B['target'])

accessories_orders, sneakers_orders

In [None]:
# Same number, read from the grouped counts by label rather than by position
accessories_orders = counts.loc[('accessories', 1)]
accessories_orders

To get the numbers of people who didn't submit orders, we get the total number of people who were shown banners and then subtract the numbers of people who did make orders.

In [None]:
accessories_total = len(df_A)
sneakers_total = len(df_B)

accessories_no_orders = accessories_total - accessories_orders
sneakers_no_orders = sneakers_total - sneakers_orders

accessories_no_orders, sneakers_no_orders

In [None]:
contingency_table = np.array([[accessories_orders, accessories_no_orders],
                              [sneakers_orders, sneakers_no_orders]])

contingency_table

In [None]:
contin_list = [[accessories_orders, accessories_no_orders],
               [sneakers_orders, sneakers_no_orders]]

contin_list

In [None]:
pd.DataFrame([[accessories_orders, accessories_no_orders],
              [sneakers_orders, sneakers_no_orders]])

### Calculation


In [None]:
stats.chi2_contingency(contingency_table)

In [None]:
stats.chi2_contingency(pd.DataFrame([[accessories_orders, accessories_no_orders], 
                   [sneakers_orders, sneakers_no_orders]]))

In [None]:
stats.chi2_contingency(contin_list)

This extremely low $p$-value, far below our threshold of $\alpha = 0.05$, lets us reject $H_0$: the two groups converted at different rates. In particular, the desktop customers who saw the sneakers banner in May 2019 bought at a higher rate than the desktop customers who saw the accessories banner in May 2019, as the conversion rates below confirm.
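
To see where such a $p$-value comes from, here is a by-hand sketch of the $\chi^2$ calculation for a 2×2 table. The counts are made up (not our data) and `chi2_2x2` is a hypothetical helper. Note that `stats.chi2_contingency` applies Yates' continuity correction by default for 2×2 tables, so the `yates` flag mirrors that behavior.

```python
from math import erfc, sqrt


def chi2_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    # Expected counts if banner and ordering were independent
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    chi2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            chi2 += diff ** 2 / exp
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, expected


# Made-up counts: [orders, no orders] for each banner
toy_table = [[50, 950], [80, 920]]
chi2_corrected, p_corrected, expected_counts = chi2_2x2(toy_table)
chi2_plain, p_plain, _ = chi2_2x2(toy_table, yates=False)
print(expected_counts)
print(f'with correction:    chi2 = {chi2_corrected:.3f}, p = {p_corrected:.4f}')
print(f'without correction: chi2 = {chi2_plain:.3f}, p = {p_plain:.4f}')
```

The statistic grows as the observed counts move away from the counts we would expect if the banner made no difference.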

In [None]:
(sneakers_orders / sneakers_total) * 100


In [None]:
(accessories_orders / accessories_total) * 100

## Interpretation

In [None]:
contingency_table

In [None]:
contingency_table[:, 0] / contingency_table.sum(axis=1)

In [None]:
contingency_table.sum(axis=1)

In [None]:
# Find the difference in conversion rate
accessory_CR, sneaker_CR = contingency_table[:, 0] / contingency_table.sum(axis=1)

In [None]:
print(f'Conversion Rate for accessory banner:\n\t{100*accessory_CR:.3f}%')
print(f'Conversion Rate for sneaker banner:\n\t{100*sneaker_CR:.3f}%')
print('')
print(f'Absolute difference of CR: {100*(sneaker_CR-accessory_CR):.3f}%')

In [None]:
accessories_total, sneakers_total

In [None]:
accessories_total - sneakers_total

In [None]:
496 / accessories_total * 100

In [None]:
799 / sneakers_total * 100

So we can say:
- There was a statistically significant difference at the $\alpha = 0.05$ significance level
- The difference was about $2$ percentage points in favor of the sneaker banner!
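
A stakeholder will often want to know how precise that difference is. One common way to express this is a confidence interval for the difference in conversion rates. The sketch below uses the simple Wald (normal-approximation) interval with made-up counts (not our data); `diff_confidence_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(orders_a, total_a, orders_b, total_b, confidence=0.95):
    """Wald interval for (rate_b - rate_a), using unpooled standard errors."""
    rate_a = orders_a / total_a
    rate_b = orders_b / total_b
    se = sqrt(rate_a * (1 - rate_a) / total_a + rate_b * (1 - rate_b) / total_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z * se, diff + z * se


# Made-up counts, for illustration only
low, high = diff_confidence_interval(50, 1000, 80, 1000)
print(f'95% CI for the difference: {100 * low:.2f} to {100 * high:.2f} percentage points')
```

An interval that excludes zero agrees with a significant test result, and its width shows how much uncertainty remains about the size of the effect.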

# Exercise

> The company is impressed with what you found and is now wondering if there is a difference in their other banner ads!

With your group, look at the same month (May 2019) but compare different platforms ('mobile' vs 'desktop') and/or different banner types ('accessories', 'sneakers', 'clothes', 'sports_nutrition'). Just don't repeat the same test we did above 😉

Make sure you record what considerations you have for the experiment, what hypothesis test you performed ($H_0$ and $H_1$ too), and your overall conclusion/interpretation for the _business stakeholders_. Is there a follow-up you'd suggest? 