

In this case study, you’ll analyze A/B test results for Audacity. Here's the customer funnel for typical new users on their site:

View home page > Explore courses > View course overview page > Enroll in course > Complete course

Audacity loses users as they go down the stages of this funnel, with only a few making it to the end. To increase student engagement, Audacity is performing A/B tests to try out changes that will hopefully increase conversion rates from one stage to the next.

We’ll analyze test results for two changes they have in mind, and then make a recommendation on whether they should launch each change.

The first change Audacity wants to try is on their homepage. They hope that this new, more engaging design will increase the number of users that explore their courses, that is, move on to the second stage of the funnel.

The metric we will use is the click-through rate for the Explore Courses button on the home page. Click-through rate (CTR) is often defined as the number of clicks divided by the number of views. Since Audacity uses cookies, we can identify unique users and make sure we don't count the same one multiple times. For this experiment, we'll define our click-through rate as:

CTR: # clicks by unique users / # views by unique users
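
As a minimal sketch of this definition, the snippet below computes the CTR on a tiny made-up action log. The column names `id` and `action` mirror the ones used later in this notebook; the data and the helper name `unique_ctr` are illustrative assumptions, not part of the Audacity dataset.

```python
import pandas as pd

def unique_ctr(actions: pd.DataFrame) -> float:
    """Number of unique users who clicked divided by unique users who viewed."""
    clicks = actions.loc[actions["action"] == "click", "id"].nunique()
    views = actions.loc[actions["action"] == "view", "id"].nunique()
    return clicks / views

# Made-up log: user 1 views twice and clicks once, users 2 and 3 only view,
# user 4 views and clicks.
toy = pd.DataFrame({
    "id":     [1, 1, 1, 2, 3, 4, 4],
    "action": ["view", "view", "click", "view", "view", "view", "click"],
})

print(unique_ctr(toy))  # 2 unique clickers / 4 unique viewers = 0.5
```

Counting unique `id` values rather than rows is what keeps user 1's repeated view from inflating the denominator.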

Now that we have our metric, let's set up our null and alternative hypotheses:

$$H_0: CTR_{new} \leq CTR_{old}$$

$$H_1: CTR_{new} > CTR_{old}$$

Our alternative hypothesis is what we want to prove to be true, in this case, that the new homepage design has a higher click through rate than the old homepage design. And the null hypothesis is what we assume to be true before analyzing data, which is that the new homepage design has a click through rate that is less than or equal to that of the old homepage design. As you’ve seen before, we can rearrange our hypotheses to look like this:

$$H_0: CTR_{new} - CTR_{old} \leq 0$$

$$H_1: CTR_{new} - CTR_{old} > 0$$

In [None]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from matplotlib import pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline


df = pd.read_csv('../input/homepage_actions.csv')
df.info() # getting the basic info about our dataset

In [None]:
# total number of actions
df.action.count()


In [None]:
# number of unique users
df.id.nunique()

In [None]:
# size of the control and the experiment group
df.groupby('group')['id'].nunique()

In [None]:
# converting the timestamp column to datetime format
df['timestamp'] = pd.to_datetime(df['timestamp'])

In [None]:
# duration of experiment
df.timestamp.max() - df.timestamp.min()

In [None]:
# experiment group users
experiment_gr = df.query("group == 'experiment'")

In [None]:
# click-through rate for experiment group users (unique clickers / unique viewers)
experiment_ctr = experiment_gr.query("action == 'click'")['id'].nunique() / experiment_gr.query("action == 'view'")['id'].nunique()
experiment_ctr

In [None]:
# control group users
control_gr = df.query("group == 'control'")

In [None]:
# click-through rate for control group users (unique clickers / unique viewers)
control_ctr = control_gr.query("action == 'click'")['id'].nunique() / control_gr.query("action == 'view'")['id'].nunique()
control_ctr

In [None]:
# bootstrapping the sampling distribution of the difference in CTRs
# each bootstrap sample has as many rows as the original data, drawn with replacement
# loop variables use a b_ prefix so the observed CTRs computed above are not overwritten
diffs = []
for _ in tqdm(range(10000)):
    sample = df.sample(df.shape[0], replace=True)
    b_experiment = sample.query("group == 'experiment'")
    b_control = sample.query("group == 'control'")
    b_experiment_ctr = b_experiment.query("action == 'click'")['id'].nunique() / b_experiment.query("action == 'view'")['id'].nunique()
    b_control_ctr = b_control.query("action == 'click'")['id'].nunique() / b_control.query("action == 'view'")['id'].nunique()
    diffs.append(b_experiment_ctr - b_control_ctr)
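
The resampling idea can also be tried on a small synthetic log, which runs in a second or two and is reproducible. This is only a sketch under stated assumptions: the data are simulated (500 users per group, with click probabilities of 0.2 and 0.3 chosen arbitrarily), and the helper `ctr_diff` is a hypothetical name, not something from the original analysis.

```python
import numpy as np
import pandas as pd

def ctr_diff(log: pd.DataFrame) -> float:
    """Experiment CTR minus control CTR, counting each user once."""
    ctr = {}
    for group in ("control", "experiment"):
        rows = log[log["group"] == group]
        clicks = rows.loc[rows["action"] == "click", "id"].nunique()
        views = rows.loc[rows["action"] == "view", "id"].nunique()
        ctr[group] = clicks / views
    return ctr["experiment"] - ctr["control"]

# Synthetic log (NOT the Audacity data): every user views once; control users
# click with probability 0.2, experiment users with probability 0.3.
rng = np.random.default_rng(0)
records = []
for uid in range(1000):
    group = "control" if uid < 500 else "experiment"
    records.append((uid, group, "view"))
    if rng.random() < (0.2 if group == "control" else 0.3):
        records.append((uid, group, "click"))
log = pd.DataFrame(records, columns=["id", "group", "action"])

# Each bootstrap sample draws len(log) rows with replacement; a different
# integer seed per iteration keeps the run reproducible.
boot = [
    ctr_diff(log.sample(n=len(log), replace=True, random_state=i))
    for i in range(200)
]

print(ctr_diff(log), np.mean(boot), np.std(boot))
```

The standard deviation of `boot` estimates the standard error of the CTR difference, which is the quantity the null simulation relies on.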

In [None]:
# plotting the sampling distribution
plt.hist(diffs)

In [None]:
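# simulating the distribution under the null hypothesis:
# centered at 0, with the same spread as the bootstrapped sampling distribution
null_vals = np.random.normal(0, np.array(diffs).std(), 10000)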

In [None]:
# plotting the null distribution
# with the observed statistic (the mean of the bootstrapped differences) as a red line
plt.hist(null_vals)
plt.axvline(x=np.array(diffs).mean(), color='r')  # the observed statistic falls far in the right tail of the null distribution

In [None]:
# p-value: share of null values larger than the observed statistic
p_vals = (null_vals > np.array(diffs).mean()).mean()
p_vals  # with a p-value of less than 1%, we can reject the null
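
The last three cells can be folded into one small function. The sketch below is illustrative rather than a replacement for the analysis above: `one_sided_p_value` is a hypothetical helper, the bootstrap differences are synthetic (drawn from a normal distribution with an arbitrary mean of 0.03 and spread of 0.01), and a seeded generator is used so the result is reproducible.

```python
import numpy as np

def one_sided_p_value(boot_diffs, obs_diff, n_sim=10_000, seed=0):
    """P(null value > obs_diff), with the null centered at 0 and using the
    spread of the bootstrap differences as its standard deviation."""
    rng = np.random.default_rng(seed)
    boot_diffs = np.asarray(boot_diffs, dtype=float)
    null_vals = rng.normal(loc=0.0, scale=boot_diffs.std(), size=n_sim)
    return float((null_vals > obs_diff).mean())

# Synthetic bootstrap differences (assumed values, not the Audacity results).
rng = np.random.default_rng(42)
fake_diffs = rng.normal(loc=0.03, scale=0.01, size=10_000)

# An observed difference about three standard errors above zero gives a small p-value.
p_large_effect = one_sided_p_value(fake_diffs, obs_diff=0.03)

# An observed difference of exactly zero gives a p-value near 0.5.
p_no_effect = one_sided_p_value(fake_diffs, obs_diff=0.0)

print(p_large_effect, p_no_effect)
```

Because the alternative hypothesis is one-sided (the new CTR is larger), only the right tail of the null distribution counts toward the p-value.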

With a p-value below 1%, we reject the null hypothesis: the data provide statistically significant evidence that the new homepage has a higher click-through rate than the old one. On that basis, I recommend that Audacity launch the new homepage.