# Website A/B Testing - Lab

## Introduction

In this lab, you'll get another chance to practice conducting a full A/B test analysis, along with your data exploration and processing skills. The scenario you'll be investigating uses data collected from a music app homepage for Audacity.

## Objectives

You will be able to:
* Analyze the data from a website A/B test to draw relevant conclusions
* Explore and analyze web action data

## Exploratory Analysis

Start by loading in the dataset stored in the file 'homepage_actions.csv'. Then conduct an exploratory analysis to get familiar with the data.

> Hints:
> * Start investigating the id column:
>     * How many viewers also clicked?
>     * Are there any anomalies with the data; did anyone click who didn't view?
>     * Is there any overlap between the control and experiment groups?
>         * If so, how do you plan to account for this in your experimental design?

In [1]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('darkgrid')

In [2]:
# Load and preview data
df = pd.read_csv('homepage_actions.csv')
df.head()

|   | timestamp | id | group | action |
|---|---|---|---|---|
| 0 | 2016-09-24 17:42:27.839496 | 804196 | experiment | view |
| 1 | 2016-09-24 19:19:03.542569 | 434745 | experiment | view |
| 2 | 2016-09-24 19:36:00.944135 | 507599 | experiment | view |
| 3 | 2016-09-24 19:59:02.646620 | 671993 | control | view |
| 4 | 2016-09-24 20:26:14.466886 | 536734 | experiment | view |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8188 entries, 0 to 8187
Data columns (total 4 columns):
timestamp    8188 non-null object
id           8188 non-null int64
group        8188 non-null object
action       8188 non-null object
dtypes: int64(1), object(3)
memory usage: 256.0+ KB


In [4]:
df.action.value_counts()

view     6328
click    1860
Name: action, dtype: int64

So the 8,188 observations represent 6,328 views and 1,860 clicks. Some ids occur twice, others once. This makes me think that there were 6,328 participants total, all of whom viewed the site, and 1,860 of those also clicked.
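
The guess above (one view per participant, at most one click) can be checked directly rather than inferred from the counts. The following is a minimal sketch on made-up toy data; the `toy` frame and its values are assumptions for illustration, and the same calls would be applied to `df`.

```python
import pandas as pd

# Toy data standing in for homepage_actions.csv (values are made up)
toy = pd.DataFrame({
    'id':     [1, 1, 2, 3, 3],
    'action': ['view', 'click', 'view', 'view', 'click'],
})

# Number of distinct participants
n_participants = toy['id'].nunique()

# True if any id has the same action recorded more than once
has_repeats = bool(toy.duplicated(subset=['id', 'action']).any())

# Check that every participant has exactly one recorded view
views_per_id = toy[toy['action'] == 'view'].groupby('id').size()
all_viewed_once = bool((views_per_id == 1).all()) and len(views_per_id) == n_participants

print(n_participants, has_repeats, all_viewed_once)
```

If `has_repeats` were true on the real data, pivoting on `id` and `action` later in the lab would fail, so this check is worth running first.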

In [5]:
df.groupby('group')['action'].value_counts()

group       action
control     view      3332
            click      932
experiment  view      2996
            click      928
Name: action, dtype: int64

In [6]:
# How many viewers also clicked?
sum(df.groupby('id')['action'].count() > 1)

1860

In [7]:
# Did anyone click without viewing?
# Compare the sets of ids associated with each action to find out.
viewed_ids = set(df[df.action == 'view']['id'])
clicked_ids = set(df[df.action == 'click']['id'])

# Count ids that clicked but have no recorded view
# (sets compare values; `in` on a pandas Series would test index labels instead)
len(clicked_ids - viewed_ids)

0
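
One subtlety when comparing collections of ids in pandas: the `in` operator applied to a Series tests the index labels, not the values. The sketch below uses made-up ids (an assumption for illustration) to show the difference and a value-based alternative using sets.

```python
import pandas as pd

# Toy ids (made up for illustration)
viewed_ids = pd.Series([101, 102, 103])
clicked_ids = pd.Series([102, 104])   # 104 clicked without a recorded view

# `in` on a Series looks at the index labels (0, 1, ...), not the values
in_index = 102 in clicked_ids            # False: 102 is a value, not a label
in_values = 102 in clicked_ids.values    # True: membership among the values

# Ids that clicked but never viewed, via set difference on the values
clicked_without_view = set(clicked_ids.tolist()) - set(viewed_ids.tolist())
print(in_index, in_values, clicked_without_view)
```

A set difference also runs in roughly linear time, which matters more as the number of ids grows.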

In [8]:
# Do control and experimental groups overlap?
# Get unique ids for each
control_ids = df[df.group == 'control']['id'].unique()
exp_ids = df[df.group == 'experiment']['id'].unique()

# Check for ids duplicated across groups
len(np.intersect1d(control_ids, exp_ids))

0

There seem to be no ids that appear in both the control and experimental groups. If there were, I would remove all observations associated with those ids from the dataset.
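
Had there been any overlap, the removal described above could look roughly like this. This is a sketch on made-up toy data; the `toy` frame and the names `overlap_ids` and `cleaned` are assumptions, not part of the lab's dataset.

```python
import pandas as pd

# Toy data (made up): id 2 appears in both groups
toy = pd.DataFrame({
    'id':    [1, 2, 2, 3],
    'group': ['control', 'control', 'experiment', 'experiment'],
})

# Ids assigned to more than one group
groups_per_id = toy.groupby('id')['group'].nunique()
overlap_ids = groups_per_id[groups_per_id > 1].index

# Drop every observation belonging to an overlapping id
cleaned = toy[~toy['id'].isin(overlap_ids)]
print(cleaned['id'].tolist())
```

Dropping these users entirely, rather than keeping one of their group assignments, avoids guessing which version of the page actually influenced them.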

## Conduct a Statistical Test

Conduct a statistical test to determine whether the experimental homepage was more effective than that of the control group.

Question: Did more users click on the experimental version of the site or the control?

* H0: control click rate = experiment click rate
* Ha: control click rate ≠ experiment click rate

This will be a two-tailed test. I will use Welch's t-test and then verify the result with a z-test based on the binomial distribution.

Here are the base parameters for the analysis:

* alpha = 0.05
* power = 0.8
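
To give a feel for what these parameters imply, here is a rough sketch of a per-group sample size estimate for comparing two proportions, using the normal approximation and the 0.05 significance threshold applied later in the analysis. The function name `sample_size_two_proportions` and the 28% and 31% rates are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided test of p1 vs p2
    (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_power = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical rates: a 28% baseline and a 31% target
n_per_group = sample_size_two_proportions(0.28, 0.31)
print(n_per_group)
```

Smaller differences between the two rates drive the required sample size up quickly, since the difference enters the formula squared.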

In [9]:
# Pivot and split data into control and experiment samples
df['count'] = 1

control = df[df.group=='control'].pivot(index='id', columns='action', values='count')
control = control.fillna(value=0)

experiment = df[df.group=='experiment'].pivot(index='id', columns='action', values='count')
experiment = experiment.fillna(value=0)



print("Sample sizes:\tControl: {}\tExperiment: {}".format(len(control), len(experiment)))
print("Total Clicks:\tControl: {}\tExperiment: {}".format(control.click.sum(), experiment.click.sum()))
print("Average click rate:\tControl: {}\tExperiment: {}".format(control.click.mean(), experiment.click.mean()))
control.head()

Sample sizes:	Control: 3332	Experiment: 2996
Total Clicks:	Control: 932.0	Experiment: 928.0
Average click rate:	Control: 0.2797118847539016	Experiment: 0.3097463284379172


| id | click | view |
|---|---|---|
| 182994 | 1.0 | 1.0 |
| 183089 | 0.0 | 1.0 |
| 183248 | 1.0 | 1.0 |
| 183515 | 0.0 | 1.0 |
| 183524 | 0.0 | 1.0 |


In [10]:
# Run Welch's t-test
from welch_functions import *

p_value_welch_ttest(control.click, experiment.click)

0.004466402814337078

Welch's t-test yields a p-value of about 0.004, which is below our significance threshold of 0.05. The click-through rate for the experimental version of the site is significantly higher than that of the control.
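
Since `welch_functions` is a local helper whose implementation is not shown here, an independent cross-check with SciPy may be useful. The sketch below rebuilds the binary click indicators from the counts reported above and calls `scipy.stats.ttest_ind` with `equal_var=False`, which selects Welch's t-test. SciPy reports a two-sided p-value by default, so it need not equal the helper's figure if the helper is one-sided.

```python
import numpy as np
from scipy import stats

# Rebuild binary click indicators from the counts reported above
control_clicks = np.r_[np.ones(932), np.zeros(3332 - 932)]
experiment_clicks = np.r_[np.ones(928), np.zeros(2996 - 928)]

# equal_var=False selects Welch's t-test; the p-value is two-sided by default
result = stats.ttest_ind(control_clicks, experiment_clicks, equal_var=False)
t_stat, p_two_sided = result.statistic, result.pvalue

# Halving gives the one-sided p-value in the observed direction
print(t_stat, p_two_sided, p_two_sided / 2)
```

The t-statistic is negative here only because the control sample is passed first and has the lower click rate.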

## Verifying Results

One sensible formulation of the data to answer the hypothesis test above would be to create a binary variable representing each individual in the experiment and control group. This binary variable would represent whether or not that individual clicked on the homepage; 1 for they did and 0 if they did not. 

The variance for the number of successes in a sample of a binomial variable with n observations is given by:

$$n \cdot p \cdot (1-p)$$

Given this, perform 3 steps to verify the results of your statistical test:
1. Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group. 
2. Calculate the number of standard deviations that the actual number of clicks was from this estimate. 
3. Finally, calculate a p-value using the normal distribution based on this z-score.
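
The three steps can be summarized in formulas. The symbols are notation introduced here for convenience: $n$ is the size of the experiment group, $p_0$ is the control group's click-through rate, $k$ is the observed number of clicks in the experiment group, and $\Phi$ is the standard normal CDF.

$$
\mu = n\,p_0, \qquad
\sigma = \sqrt{n\,p_0\,(1-p_0)}, \qquad
z = \frac{k - \mu}{\sigma}, \qquad
p\text{-value} = 1 - \Phi(z)
$$

Note that $1 - \Phi(z)$ is a one-tailed p-value; a two-tailed version would be $2\,(1 - \Phi(|z|))$.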

### Step 1:
Calculate the expected number of clicks for the experiment group, if it had the same click-through rate as that of the control group. 

In [11]:
# Calculate actual clickrates
control_clicks = len(df[(df.group == 'control') & (df.action == 'click')])
print('Control group clicks:', control_clicks)
control_clickrate = control_clicks / len(control)
print('Control click rate:', control_clickrate)

exp_clicks_actual = len(df[(df.group == 'experiment') & (df.action == 'click')])
print('Exp. group clicks:', exp_clicks_actual)
exp_clickrate_actual = exp_clicks_actual / len(experiment)
print('Exp. group clickrate:', exp_clickrate_actual)

Control group clicks: 932
Control click rate: 0.2797118847539016
Exp. group clicks: 928
Exp. group clickrate: 0.3097463284379172


In [12]:
# Calculate expected clickrates
exp_clicks_pred = control_clickrate * len(experiment)
print('Expected clicks in exp. group:', exp_clicks_pred)
exp_clickrate_pred = exp_clicks_pred / len(experiment)
print('Expected clickrate in exp. group:', exp_clickrate_pred)

Expected clicks in exp. group: 838.0168067226891
Expected clickrate in exp. group: 0.2797118847539016


### Step 2:
Calculate the number of standard deviations that the actual number of clicks was from this estimate.

In [13]:
# Calculate z-score
exp_n = len(experiment)
control_p = control_clickrate
var = exp_n * control_p * (1-control_p)
std = np.sqrt(var)
print('St. dev. of expected clicks:', std)

exp_clicks_difference = exp_clicks_actual - exp_clicks_pred
z_score = exp_clicks_difference / std
print('z-score:', z_score)

St. dev. of expected clicks: 24.568547907005815
z-score: 3.6625360854823588


### Step 3: 
Finally, calculate a p-value using the normal distribution based on this z-score.

In [14]:
# Calculate p-value
from scipy import stats

p_value = 1 - stats.norm.cdf(z_score)
print('p-value:', p_value)

p-value: 0.00012486528006949715


### Analysis:

Does this result roughly match that of the previous statistical test?

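> Comment: Yes, roughly. The z-test returned an even smaller p-value (about 0.0001 versus 0.004), and both are well below 0.05. Note that `1 - cdf(z)` is a one-tailed p-value; doubling it for a two-tailed test gives about 0.00025, which is still significant. We can confidently reject the null hypothesis that the two versions have the same click rate.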
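
One likely reason the z-test p-value is smaller is that it treats the control click rate as a known constant, ignoring the sampling uncertainty in the control group. A pooled two-proportion z-test accounts for both groups. The sketch below is one way to set that up, using only the counts reported earlier in this lab and the standard library; it is a swapped-in technique for comparison, not part of the lab's prescribed steps.

```python
from math import sqrt
from statistics import NormalDist

# Counts reported earlier in this lab
control_clicks, control_n = 932, 3332
exp_clicks, exp_n = 928, 2996

p_control = control_clicks / control_n
p_exp = exp_clicks / exp_n

# Pooled click rate under the null hypothesis of equal rates
p_pool = (control_clicks + exp_clicks) / (control_n + exp_n)
se = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / exp_n))

z = (p_exp - p_control) / se
p_one_sided = 1 - NormalDist().cdf(z)
print('z:', z, 'one-sided p:', p_one_sided, 'two-sided p:', 2 * p_one_sided)
```

Because the standard error now includes the control group's variability, the z-score is smaller than in Step 2 and should land close to the Welch's t-test result.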

## Summary

In this lab, you got more practice designing and conducting A/B tests. This required additional work preprocessing the data and formulating the initial problem in a suitable manner. You also saw how to verify results, strengthening your knowledge of binomial variables and reviewing the central limit theorem, standard deviation, z-scores, and their accompanying p-values.