# The Design and Application of A/B Testing

## A/B Test

It's an experiment where you...
- Test two or more variants against each other
- to evaluate which one performs "best"
- in the context of a randomized experiment

**Control and treatment groups**

Testing two or more ideas against each other:
- **Control:** The current state of your product
- **Treatment:** The variant that you want to test
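
Random assignment is what keeps the two groups comparable. As a minimal sketch, one common way to split users is to hash each user id into a bucket. The function name `assign_group`, the experiment label, and the 50/50 split are illustrative assumptions, not part of the original notes.

```python
import hashlib


def assign_group(uid, experiment="paywall_test", treatment_share=0.5):
    """Deterministically map a user id to 'control' or 'treatment'."""
    # Hash the experiment label together with the uid so that different
    # experiments split the same users independently.
    key = f"{experiment}:{uid}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()

    # Turn the first 8 hex digits into a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8

    return "treatment" if bucket < treatment_share else "control"


# Hypothetical user ids, for illustration only
groups = [assign_group(uid) for uid in range(10_000)]
print(groups.count("treatment") / len(groups))
```

Because the assignment depends only on the id and the experiment label, a returning user always lands in the same group.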
    
## Initial A/B test design

**Response variable**
- The quantity used to measure the impact of your change
- Should either be a KPI or directly related to a KPI
- The easier to measure, the better
    
**Factors & variants**

- **Factors:** the type of variable you are changing
    - *The paywall color*
- **Variants:** Particular changes you are testing
    - *A red vs a blue paywall*

**Experimental unit of our test**
- The smallest unit you are measuring the change over
- Individual users make a convenient experimental unit

In [None]:
# Truncate each timestamp to its day
purchase_data['date'] = purchase_data['date'].dt.floor('d')

# Replace the NaN price values with 0
purchase_data['price'] = purchase_data['price'].fillna(0)

# Sum the revenue for each 'uid' & 'date' pair
purchase_data_agg = purchase_data.groupby(by=['uid', 'date'], as_index=False)
revenue_user_day = purchase_data_agg['price'].sum()

# Calculate the average revenue per user-day
avg_revenue_user_day = revenue_user_day['price'].mean()
print(avg_revenue_user_day)

## Preparing to run an A/B test

**Test Sensitivity**
- **First question:** what size of impact is meaningful to detect?
    - 1%?
    - 20%?
- Smaller changes = more difficult to detect
    - can be hidden by randomness
- **Sensitivity:** The minimum level of change we want to be able to detect in our test
    - Evaluate different sensitivity values
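
To make "evaluate different sensitivity values" concrete, here is a minimal sketch that sweeps a few candidate sensitivities and reports what each would mean in extra purchasers per day. The helper name `lift_for_sensitivity` and the baseline numbers are hypothetical, and the baseline purchaser count is assumed to equal views times conversion rate.

```python
def lift_for_sensitivity(conversion_rate, daily_paywall_views, sensitivity):
    """Return the lifted conversion rate and the extra purchasers per day."""
    lifted_rate = conversion_rate * (1 + sensitivity)
    # Assumption: baseline purchasers = views * baseline conversion rate
    baseline_purchasers = daily_paywall_views * conversion_rate
    lifted_purchasers = daily_paywall_views * lifted_rate
    return lifted_rate, lifted_purchasers - baseline_purchasers


# Hypothetical baseline values, for illustration only
baseline_rate = 0.03
baseline_views = 90_000

for sensitivity in (0.01, 0.1, 0.5):
    rate, extra = lift_for_sensitivity(baseline_rate, baseline_views, sensitivity)
    print(f"sensitivity={sensitivity:.0%}  rate={rate:.4f}  extra purchasers/day={extra:.0f}")
```

Seeing the lift in absolute purchasers helps judge whether a given sensitivity is worth the larger sample it requires.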



In [None]:
# Merge the datasets on 'uid' and truncate each timestamp to its day
purchase_data = demographics_data.merge(paywall_views,  how='inner', on=['uid'])
purchase_data.date = purchase_data.date.dt.floor('d')

# Group and aggregate our combined dataset 
daily_purchase_data = purchase_data.groupby(by=['date'], as_index=False)
daily_purchase_data = daily_purchase_data.agg({'purchase': ['sum', 'count']})

# Find the mean of each field and then multiply by 1000 to scale the result
daily_purchases = daily_purchase_data.purchase['sum'].mean()
daily_paywall_views = daily_purchase_data.purchase['count'].mean()
daily_purchases = daily_purchases * 1000
daily_paywall_views = daily_paywall_views * 1000

print(daily_purchases)
print(daily_paywall_views)

In [None]:
large_sensitivity = 0.5

# Find the conversion rate lift with the sensitivity above
large_conversion_rate = conversion_rate * (1 + large_sensitivity)

# Find how many more users per day that translates to
large_purchasers = daily_paywall_views * large_conversion_rate
purchaser_lift = large_purchasers - daily_purchases

print(large_conversion_rate)
print(large_purchasers)
print(purchaser_lift)

In [None]:
# Find the number of paywall views 
n = purchase_data.purchase.count()

# Calculate the quantity "v"
v = conversion_rate * (1 - conversion_rate) 

# Calculate the variance and standard error of the estimate
var = v / n 
se = var**0.5

print(var)
print(se)
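
The cell above follows the usual standard error of a sample proportion. Written out, with $\hat{p}$ standing for `conversion_rate` and $n$ for the number of paywall views (the symbols are introduced here for illustration):

$$
v = \hat{p}\,(1 - \hat{p}), \qquad \operatorname{Var}(\hat{p}) = \frac{v}{n}, \qquad \operatorname{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
$$

Because $n$ sits in the denominator, quadrupling the number of paywall views halves the standard error.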

## Calculating the sample size

**Power:** the probability of rejecting the null hypothesis when the alternative hypothesis is true
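
These notes call a `get_power` helper without defining it. As a rough sketch of what such a function might compute, the code below approximates the power of a two-sided z-test for two proportions using only the standard library. The name `approx_power`, the per-group reading of `n`, and the example rates are assumptions of this sketch, not the helper's confirmed implementation.

```python
from statistics import NormalDist


def approx_power(n, p1, p2, cl):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes n observations in each group (an assumption of this sketch).
    """
    norm = NormalDist()
    alpha = 1 - cl
    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = abs(p2 - p1)

    # Standard error under the null (pooled rate) and under the alternative
    pooled = (p1 + p2) / 2
    se_null = (2 * pooled * (1 - pooled) / n) ** 0.5
    se_alt = ((p1 * (1 - p1) + p2 * (1 - p2)) / n) ** 0.5

    # Probability that the test statistic lands in either rejection region
    upper = norm.cdf((diff - z_crit * se_null) / se_alt)
    lower = norm.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


# Hypothetical rates: 10% baseline vs 12% treatment
print(approx_power(n=1000, p1=0.10, p2=0.12, cl=0.95))
print(approx_power(n=4000, p1=0.10, p2=0.12, cl=0.95))
```

When `p1` equals `p2` the function returns `1 - cl`, which matches the definition: with no true difference, the only rejections are false positives.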

In [None]:
# Look at the impact of sample size increase on power
n_param_one = get_power(n=1000, p1=p1, p2=p2, cl=cl)
n_param_two = get_power(n=2000, p1=p1, p2=p2, cl=cl)

# Look at the impact of confidence level increase on power
alpha_param_one = get_power(n=n1, p1=p1, p2=p2, cl=0.8)
alpha_param_two = get_power(n=n1, p1=p1, p2=p2, cl=0.95)
    
# Compare the ratios
print(n_param_two / n_param_one)
print(alpha_param_one / alpha_param_two)

In [None]:
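def get_sample_size(power, p1, p2, cl, max_n=1000000):
    """Return the first n in 1, 101, 201, ... whose power reaches `power`."""
    n = 1
    while n <= max_n:
        # Power achieved at the current sample size
        tmp_power = get_power(n, p1, p2, cl)

        if tmp_power >= power:
            return n
        n += 100

    # No n up to max_n reached the requested power
    return "Increase Max N Value"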
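
The loop above searches a grid of candidate sizes (1, 101, 201, ...), so its answer is only accurate to within 100. As a rough cross-check, the standard normal-approximation formula gives a sample size directly. This is a different technique from the search in the notes, not a replacement for it; the function name, the per-group reading of n, and the example rates are assumptions of this sketch.

```python
from math import ceil
from statistics import NormalDist


def closed_form_sample_size(power, p1, p2, cl):
    """Approximate n per group for a two-sided two-proportion z-test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")

    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - (1 - cl) / 2)  # critical value for the test
    z_power = norm.inv_cdf(power)             # quantile for the target power

    # Per-observation standard deviations under the null and the alternative
    pooled = (p1 + p2) / 2
    sd_null = (2 * pooled * (1 - pooled)) ** 0.5
    sd_alt = (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5

    n = ((z_alpha * sd_null + z_power * sd_alt) / (p2 - p1)) ** 2
    return ceil(n)


# Hypothetical inputs: 10% baseline vs 12% treatment
print(closed_form_sample_size(power=0.8, p1=0.10, p2=0.12, cl=0.95))
```

The formula ignores the far rejection tail, which is negligible unless the effect is tiny, so the two approaches should agree to within the grid step when the power function uses the same approximation.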