# Case Study: Homepage Experiment

Let's say that you're working for a fictional productivity software company that is looking for ways to increase the number of people who pay for their software. The way that the software is currently set up, users can download and use the software free of charge, for a 7-day trial. After the end of the trial, users are required to pay for a license to continue using the software.

One idea that the company wants to try is to change the layout of the homepage to emphasize more prominently and higher up on the page that there is a 7-day trial available for the company's software. The current fear is that some potential users are missing out on using the software because of a lack of awareness of the trial period. If more people download the software and use it in the trial period, the hope is that this entices more people to make a purchase after seeing what the software can do.

In [11]:
import numpy as np
import pandas as pd
import statsmodels.stats.proportion as prop_tests

In [2]:
df = pd.read_csv('homepage-experiment-data.csv')
df.head()

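| | Day | Control Cookies | Control Downloads | Control Licenses | Experiment Cookies | Experiment Downloads | Experiment Licenses |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1764 | 246 | 1 | 1850 | 339 | 3 |
| 1 | 2 | 1541 | 234 | 2 | 1590 | 281 | 2 |
| 2 | 3 | 1457 | 240 | 1 | 1515 | 274 | 1 |
| 3 | 4 | 1587 | 224 | 1 | 1541 | 284 | 2 |
| 4 | 5 | 1606 | 253 | 2 | 1643 | 292 | 3 |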


## Check the Invariant Metric

The invariant metric is the number of cookies (visitors) assigned to each group. With random assignment, the two counts should be roughly equal.

In [4]:
total_cookies = df[['Control Cookies', 'Experiment Cookies']].sum()
total_cookies

Control Cookies       46851
Experiment Cookies    47346
dtype: int64

In [7]:
# Use .iloc for positional access; integer keys on a labeled Series are deprecated
prop_ctl = total_cookies.iloc[0]/total_cookies.sum()
prop_exp = total_cookies.iloc[1]/total_cookies.sum()
prop_ctl, prop_exp

(0.4973725277875092, 0.5026274722124908)

The split is very close to 50/50 and does not look meaningfully unbalanced, but let's confirm that with a test.

In [None]:
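# Simulate outcomes under the null and compare them to the observed outcome
n_obs = total_cookies.sum()        # total cookies across both groups
n_control = total_cookies.iloc[0]  # cookies assigned to the control group
p = 0.5
n_trials = 200_000

samples = np.random.binomial(n_obs, p, n_trials)

print(np.logical_or(samples <= n_control, samples >= (n_obs - n_control)).mean())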

In [8]:
# Collect the relevant stats
n_obs = total_cookies.sum()
n_control = total_cookies.iloc[0]

# Simulate the null
p = 0.5
n_trials = 200_000
samples = np.random.binomial(n_obs, p, n_trials)

In [9]:
samples

array([47158, 47471, 47063, ..., 47233, 47299, 47334])

This gives us 200,000 simulated control-group sizes under the null hypothesis that each cookie is equally likely to land in either group.

We then compute the fraction of simulated values that are at least as extreme as the observed split: at or below `n_control` (the lower tail), or at or above `n_obs - n_control`, which is the experiment group's count (the upper tail). That fraction is the two-sided p-value.

In [10]:
np.logical_or(samples <= n_control, samples >= (n_obs - n_control)).mean()

0.106285

This p-value is higher than the standard 0.05, so we fail to reject the null hypothesis: **there is no statistical evidence of a difference in the number of people assigned to each group.**
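
As a cross-check on the simulation, the same two-sided p-value can be approximated analytically. The sketch below is not part of the original notebook: it assumes a normal approximation to the binomial under the null of a 50/50 split, hard-codes the cookie totals printed above, and uses a hypothetical helper name, `two_sided_split_pvalue`.

```python
import math


def two_sided_split_pvalue(n_group, n_total, p_null=0.5):
    """Two-sided p-value for one group's count under a binomial null.

    Uses the normal approximation without a continuity correction,
    which is an assumption that is reasonable only for large n_total.
    """
    expected = n_total * p_null
    sd = math.sqrt(n_total * p_null * (1 - p_null))
    z = (n_group - expected) / sd
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


# Cookie totals copied from the output above
n_control = 46851
n_experiment = 47346

z, p_value = two_sided_split_pvalue(n_control, n_control + n_experiment)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

The analytic p-value should land close to the simulated one (about 0.106), with small differences expected from simulation noise and the approximation.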

## Check the Evaluation Metrics

Because we are running two tests on the evaluation metrics, we apply a Bonferroni correction and compare each p-value to $\alpha = 0.05/2 = 0.025$.

### Download Rate

In [15]:
total_downloads = df[['Control Downloads', 'Experiment Downloads']].sum()
total_downloads

Control Downloads       7554
Experiment Downloads    8548
dtype: int64

In [16]:
# One-sided test: is the experiment download rate larger than the control rate?
prop_tests.proportions_ztest([total_downloads.iloc[1], total_downloads.iloc[0]],
                             [n_obs - n_control, n_control], alternative='larger')

(7.870833726066236, 1.7614279636728079e-15)

The p-value is far below 0.025, so the download rate in the experiment group is significantly higher than in the control group.
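
To make the test less of a black box, here is a minimal sketch of the pooled two-proportion z-test computed by hand. It is an illustration rather than the notebook's own code: the function name `pooled_two_proportion_ztest` is hypothetical, the totals are copied from the outputs above, and the pooled-variance form is used on the assumption that it matches what `proportions_ztest` does by default for two samples.

```python
import math


def pooled_two_proportion_ztest(x_exp, n_exp, x_ctl, n_ctl):
    """One-sided z-test of H1: experiment rate > control rate."""
    p_exp = x_exp / n_exp
    p_ctl = x_ctl / n_ctl
    # Under the null both groups share one rate, estimated by pooling
    p_pool = (x_exp + x_ctl) / (n_exp + n_ctl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_exp + 1 / n_ctl))
    z = (p_exp - p_ctl) / se
    # Upper-tail probability of a standard normal: P(Z >= z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Download and cookie totals copied from the outputs above
z, p_value = pooled_two_proportion_ztest(8548, 47346, 7554, 46851)
print(z, p_value)
```

The statistic and p-value should agree with the `proportions_ztest` output above to several significant digits.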

In [18]:
conversion_control = total_downloads.iloc[0]/total_cookies.iloc[0]
conversion_experiment = total_downloads.iloc[1]/total_cookies.iloc[1]
conversion_experiment - conversion_control

0.01930868281829759

The change in the homepage layout increased the download conversion rate by about 1.9 percentage points. It was decided beforehand that an increase of 1.5 percentage points was enough to consider the change practically significant, so the **Download Rate conversion metric suggests that the page changes were a success.**
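
A point estimate alone does not show how much uncertainty surrounds the practical-significance comparison. The sketch below, which is an addition and not part of the original analysis, builds an approximate confidence interval for the difference in download rates. The 95% level, the unpooled standard error, and the helper name `diff_confidence_interval` are all assumptions made for illustration.

```python
import math


def diff_confidence_interval(x_exp, n_exp, x_ctl, n_ctl, z_crit=1.96):
    """Approximate (Wald) confidence interval for p_exp - p_ctl."""
    p_exp = x_exp / n_exp
    p_ctl = x_ctl / n_ctl
    diff = p_exp - p_ctl
    # Unpooled standard error: each group keeps its own variance estimate
    se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_ctl * (1 - p_ctl) / n_ctl)
    return diff, diff - z_crit * se, diff + z_crit * se


practical_threshold = 0.015  # 1.5 percentage points, from the text above

# Download and cookie totals copied from the outputs above
diff, lower, upper = diff_confidence_interval(8548, 47346, 7554, 46851)
print(f"difference = {diff:.4f}, interval = ({lower:.4f}, {upper:.4f})")
print("Whole interval above threshold:", lower > practical_threshold)
```

On these totals the lower bound comes out slightly under 0.015, so the point estimate clears the practical-significance threshold while the interval does not entirely rule out a smaller effect.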

### Purchase Rate

One tricky point is that there is a seven or eight day delay between when most people download the software and when they make a purchase. Because the results are aggregated by day, there is no direct way to follow an individual cookie through to a license purchase, so the best we can do is make a justified argument for how to handle the data. For the license purchase rate, we use only the cookies observed through day 21 as the denominator, and treat them as responsible for all of the license purchases observed over the full experiment.

In [20]:
df_21days = df[df.Day <= 21]
df_21days.tail()

| | Day | Control Cookies | Control Downloads | Control Licenses | Experiment Cookies | Experiment Downloads | Experiment Licenses |
|---|---|---|---|---|---|---|---|
| 16 | 17 | 1573 | 252 | 32 | 1551 | 295 | 36 |
| 17 | 18 | 1603 | 260 | 33 | 1607 | 281 | 27 |
| 18 | 19 | 1596 | 263 | 29 | 1625 | 289 | 29 |
| 19 | 20 | 1817 | 320 | 35 | 1780 | 315 | 23 |
| 20 | 21 | 1602 | 271 | 38 | 1588 | 256 | 44 |


In [21]:
total_cookies_21days = df_21days[['Control Cookies', 'Experiment Cookies']].sum()
total_cookies_21days

Control Cookies       33758
Experiment Cookies    34338
dtype: int64

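Strictly speaking, the invariant metric should be re-checked on this 21-day subset rather than judged by eye. The imbalance here is proportionally larger than in the full data and is borderline significant at the 0.05 level under the same kind of test used for the invariant metric, so the purchase-rate comparison should be read with some caution.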
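
The balance check on the 21-day subset can be sketched as follows. This is not part of the original notebook: it hard-codes the totals printed above and assumes a normal approximation to the binomial under the null of a 50/50 split.

```python
import math

# 21-day cookie totals copied from the output above
n_control_21 = 33758
n_experiment_21 = 34338
n_total_21 = n_control_21 + n_experiment_21

p_null = 0.5
expected = n_total_21 * p_null
sd = math.sqrt(n_total_21 * p_null * (1 - p_null))

z_21 = (n_control_21 - expected) / sd
# Two-sided p-value for a standard normal: P(|Z| >= |z|)
p_value_21 = math.erfc(abs(z_21) / math.sqrt(2))

print(f"z = {z_21:.3f}, p-value = {p_value_21:.4f}")
```

On these counts the p-value comes out near 0.026, below 0.05 even though the two totals look close, which is why running the check seems preferable to a visual comparison.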

In [22]:
total_purchases = df[['Control Licenses', 'Experiment Licenses']].sum()
total_purchases

Control Licenses       710
Experiment Licenses    732
dtype: int64

In [23]:
prop_tests.proportions_ztest([total_purchases.iloc[1], total_purchases.iloc[0]],
                             [total_cookies_21days.iloc[1], total_cookies_21days.iloc[0]], alternative='larger')

(0.2586750111658684, 0.3979430008399871)

In [24]:
purchase_rate_control = total_purchases.iloc[0]/total_cookies_21days.iloc[0]
purchase_rate_experiment = total_purchases.iloc[1]/total_cookies_21days.iloc[1]
purchase_rate_experiment - purchase_rate_control

0.00028543916466129693

This result is not significant (the p-value of about 0.40 is well above 0.025), so we fail to reject the null: **there is no statistical evidence of a difference in the purchase rate between the two pages.** The observed rate is slightly higher for the new layout, but the gap is far too small to distinguish from chance.

This is plausible: the assumption that people weren't purchasing because they didn't know about the trial may simply be wrong. There may be other reasons that people do not purchase the product after downloading the trial.

## Conclusions
Our initial criteria were that if either test was significant, we would consider the experiment a success and proceed with the change. The download rate test met that bar, so we would consider this **test successful and implement the new homepage layout.**

That being said, we may want to explore the factors influencing purchase rates further, because these results suggest that we should not expect an improvement in revenue from this change alone.
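
The decision rule used in this case study (two tests, Bonferroni-corrected, proceed if either is significant) can be written out explicitly. The sketch below is illustrative only: the helper name `bonferroni_decision` is hypothetical, and the p-values are copied from the test outputs reported earlier.

```python
def bonferroni_decision(p_values, alpha=0.05):
    """Compare each p-value to alpha divided by the number of tests."""
    per_test_alpha = alpha / len(p_values)
    significant = {name: p < per_test_alpha for name, p in p_values.items()}
    # "Either test significant" is the success criterion stated above
    proceed = any(significant.values())
    return per_test_alpha, significant, proceed


# p-values copied from the z-test outputs reported earlier
p_values = {
    "download rate": 1.7614279636728079e-15,
    "purchase rate": 0.3979430008399871,
}

per_test_alpha, significant, proceed = bonferroni_decision(p_values)
print(per_test_alpha, significant, proceed)
```

Writing the rule down this way makes it easy to see that the overall decision rests entirely on the download rate test.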