# A/B Testing

## AKA Applied Hypothesis Testing!

If you went through all the stats up to this point and thought "oh man when am I ever going to use this stuff" - I get it. But one of the most common ways that Hypothesis Testing techniques are used in the real world is through A/B Testing!

One of the most common places you see A/B Testing out in the world is in marketing: companies will run A/B tests on elements of their website, their emails, their calls to action, etc. While you see A/B testing in other places, marketing is going to be my example lens for today's session.

### A/B Testing in Marketing

Hubspot is a marketing software company, and I'm going to use some of their resources to set up why all this matters. You can access the A/B Testing Kit they put out for marketing optimization at this link: https://drive.google.com/drive/folders/1Wk3J2nA5gguN1Y_41cACxQ9mcJls9TmI

Hubspot's definition of split testing, aka A/B testing:

> Split testing, commonly referred to as A/B testing, is a method of testing through which marketing variables (such as copy, images, layout, etc.) are compared to each other to identify the one that brings a better conversion rate. In this context, the element that is being tested is called the “control” and the element that is argued to give a better result is called the “treatment.”

#### Hubspot's 10 Guidelines for Effective A/B Testing: 

1. Only conduct one test (on one asset) at a time
2. Test one variable at a time
3. Test minor changes, too
4. You can A/B test the entire element
5. Measure as far down funnel as possible
6. Set up control & treatment
7. Decide what you want to test
8. Split your sample group randomly 
9. Test at the same time
10. Decide on necessary significance before testing
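
Guideline 8 (split your sample group randomly) can be sketched in a few lines. This is only an illustration, not how Hubspot or the dataset's author did it: the function name `split_randomly`, the seed, and the user IDs are all made up for the example.

```python
import random

def split_randomly(user_ids, seed=42):
    """Shuffle the IDs with a fixed seed, then cut the list in half."""
    rng = random.Random(seed)      # local generator so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# Hypothetical user IDs 0..999
control_ids, treatment_ids = split_randomly(range(1000))
print(f"Control: {len(control_ids)} users, Treatment: {len(treatment_ids)} users")
```

Because assignment depends only on the shuffle, nothing about a user (signup date, device, etc.) can influence which group they land in.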

### What will the data look like?

Data source: https://www.kaggle.com/zhangluyuan/ab-testing

Unfortunately, this data has no real metadata associated with it, but the author did say the data comes from an e-commerce website.

Full credit to Robbie Geoghegan, now a Data Scientist at Facebook, for giving me the idea and sharing work they did on this dataset: https://medium.com/@robbiegeoghegan/implementing-a-b-tests-in-python-514e9eb5b3a1 

Another blog I referenced: https://medium.com/@RenatoFillinich/ab-testing-with-python-e5964dd66143

Before we go any further, and typically before we run a test like this, we need to decide our significance level. Beyond that, let's assume that the group who ran this test did it properly (ran the tests in parallel, split users randomly, etc.).

Significance Level: $\alpha = .05$

In [None]:
# Imports
import pandas as pd
import numpy as np

from scipy import stats

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest

In [None]:
# Grab our data - want the column 'timestamp' to be a datetime object
df = None
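
If parsing a column as datetimes on load is unfamiliar, here is a minimal sketch on a tiny in-memory CSV. The rows are made up, and the `user_id` column is an assumption for illustration; only `timestamp`, `group` and `converted` are used elsewhere in this notebook.

```python
import io
import pandas as pd

# Made-up rows standing in for the real file
sample_csv = io.StringIO(
    "user_id,timestamp,group,converted\n"
    "1,2017-01-02 13:42:05,control,0\n"
    "2,2017-01-12 08:01:45,treatment,1\n"
    "3,2017-01-24 21:30:10,control,0\n"
)

# parse_dates converts the named column to datetime64 while reading
sample_df = pd.read_csv(sample_csv, parse_dates=["timestamp"])

print(sample_df["timestamp"].dtype)
print(f"Timeframe: {sample_df['timestamp'].min()} - {sample_df['timestamp'].max()}")
```

With a real datetime column, `.min()` and `.max()` give the test's timeframe directly.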

In [None]:
# Check our timeframe


#### There's an issue...

In [None]:
# Let's see...

#### One more thing to check...

In [None]:
# Check it out...

#### Now, let's explore:

In [None]:
# Split out our two groups
control_group = None
treat_group = None

In [None]:
# Check the number of samples, timeframe and conv % for each group
for sub_df in [control_group, treat_group]:
    name = list(sub_df['group'])[0].title()
    print(f"Number of Samples in our {name} Group: {len(sub_df):,}")
    print(f"Timeframe: {sub_df['timestamp'].min()} - {sub_df['timestamp'].max()}")
    print(f"Number of Conversions in our {name} Group: {sub_df['converted'].sum():,}")
    print(f"Conversion % in our {name} Group: {sub_df['converted'].mean() * 100:.3f}%")
    print("*"*20)

Our friend at Facebook, whose [blog](https://medium.com/@robbiegeoghegan/implementing-a-b-tests-in-python-514e9eb5b3a1) and [code](https://github.com/RobbieGeoghegan/AB_Testing/blob/master/AB_Testing.ipynb) inspired this notebook, uses two things you can determine in advance to calculate effect size:

> Baseline rate — an estimate of the metric being analyzed before making any changes
>
> Practical significance level — the minimum change to the baseline rate that is useful to the business, for example an increase in the conversion rate of 0.001% may not be worth the effort required to make the change whereas a 2% change will be

In other words, you can decide in advance the minimum change you want to see between your two groups and use that to calculate effect size. That is different from calculating effect size after the study has been conducted, which isn't ideal.

To do this with statsmodels, since we're testing a proportion, we use `proportion_effectsize`: https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.proportion_effectsize.html
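
For intuition, the effect size for two proportions is Cohen's h, the difference between the arcsine-transformed proportions. Here is a hand-rolled sketch using only the standard library; the helper name `cohens_h` and the 10% baseline are made up for illustration, and the result should be close to what `proportion_effectsize` returns for the same inputs.

```python
import math

def cohens_h(prop1: float, prop2: float) -> float:
    """Cohen's h: 2*arcsin(sqrt(prop1)) - 2*arcsin(sqrt(prop2))."""
    return 2 * math.asin(math.sqrt(prop1)) - 2 * math.asin(math.sqrt(prop2))

hypothetical_baseline = 0.10   # made-up baseline conversion rate
hypothetical_lift = 0.01       # smallest absolute change we'd care about

h = cohens_h(hypothetical_baseline, hypothetical_baseline + hypothetical_lift)
print(f"Cohen's h: {h:.4f}")
```

The sign is negative here only because the baseline is passed first and is the smaller proportion; for a two-sided test, the magnitude is what matters.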

In [None]:
# Let's grab some useful variables for both groups


In [None]:
# baseline is what we expect given what we have
# here, we'll capture that with our percentage of conversions 
baseline_rate = None
practical_significance = 0.01 # user defined - want at least a 1 percentage point difference here

effect_size = proportion_effectsize(baseline_rate, baseline_rate + practical_significance)

In [None]:
# determine our minimum sample size per group
alpha = 0.05 # user defined significance level (1 - alpha = 95% confidence)
power = 0.8 # user defined (1 - beta)

min_sample_size = NormalIndPower().solve_power(effect_size = effect_size, 
                                               power = power, 
                                               alpha = alpha)

print(f"Required minimum sample size: {min_sample_size:,.0f} per group")
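
To see roughly where a number like this comes from, here is a closed-form approximation for two equal-sized groups and a two-sided test, using only the standard library. It is a sketch, not the statsmodels implementation: the function name `approx_min_sample_size` and the effect size value are made up, and the formula ignores the negligible far tail, so expect it to land close to, not exactly on, the `solve_power` result.

```python
import math
from statistics import NormalDist

def approx_min_sample_size(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group n for a two-sided z-test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n_per_group = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n_per_group)                  # round up to whole users

hypothetical_effect_size = -0.0326  # made-up value, roughly a 10% -> 11% conversion change
n_needed = approx_min_sample_size(hypothetical_effect_size)
print(f"Approximate minimum sample size: {n_needed:,} per group")
```

Note how the effect size sits in the denominator and gets squared: halving the change you want to detect roughly quadruples the users you need.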

In [None]:
# Now let's test!
# Using a proportion test (not dealing with means but proportions)


So?

- 
