# Marketing Analysis HW1
### 2020 NYU Stern - NYU Shanghai Fall Semester
#### By: Ruijie Cai, Chengzhuo Huang, Qinyan Song, Jiawen Wang

# Setup Experiment

We will run an A/B test of Booking.com's hypothesis that users value the area around the location of their rental room (hotel, private residence, etc.). The goal of the A/B test is to evaluate whether Alternative B improved the conversion rate (the share of site users who book the property) over Alternative A.

In [16]:
import pandas as pd
data=pd.read_csv('AB_test_data.csv')
data

| | purchase_TF | Variant | date | id |
|---|---|---|---|---|
| 0 | False | A | 2019-11-08 | 0x25b44a |
| 1 | False | B | 2020-08-27 | 0x46271e |
| 2 | False | A | 2020-06-11 | 0x80b8f1 |
| 3 | False | B | 2020-08-22 | 0x8d736d |
| 4 | False | A | 2020-08-05 | 0x96c9c8 |
| ... | ... | ... | ... | ... |
| 129995 | False | A | 2020-07-23 | 0x4089c2 |
| 129996 | False | A | 2020-06-24 | 0x6a5e3a |
| 129997 | False | A | 2019-10-12 | 0x95e302 |
| 129998 | False | A | 2020-03-18 | 0x7c4afa |


In [17]:
summary = data.pivot_table(values='purchase_TF', index='Variant', aggfunc=lambda x: len(x))
summary = summary.rename(columns = {"purchase_TF":"count"})
summary

| Variant | count |
|---|---|
| A | 125000 |
| B | 5000 |


# Q1: Conduct an A/B test to determine whether Alternative B improved conversion rates (site users book the property) over alternative A.

In [18]:
test_A=data[data['Variant']=='A']
test_B=data[data['Variant']=='B']

In [19]:
summary_A = test_A.pivot_table(values='Variant', index='purchase_TF', aggfunc=lambda x: len(x))
summary_A = summary_A.rename(columns = {"Variant":"count"})
summary_A

| purchase_TF | count |
|---|---|
| False | 106298 |
| True | 18702 |


In [20]:
summary_B = test_B.pivot_table(values='Variant', index='purchase_TF', aggfunc=lambda x: len(x))
summary_B = summary_B.rename(columns = {"Variant":"count"})
summary_B

| purchase_TF | count |
|---|---|
| False | 4117 |
| True | 883 |
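The two pivot tables above can also be produced in one step, together with the conversion rate. A minimal pandas sketch, assuming the column names of `AB_test_data.csv` (`Variant`, `purchase_TF`); to stay self-contained it rebuilds a stand-in DataFrame (the hypothetical `demo`) from the counts reported above instead of reading the CSV.

```python
import pandas as pd

# Stand-in for AB_test_data.csv, rebuilt from the reported counts:
# A: 18,702 purchases out of 125,000 visitors; B: 883 out of 5,000
counts = {"A": (18702, 125000), "B": (883, 5000)}
rows = []
for variant, (buys, total) in counts.items():
    rows += [(variant, True)] * buys + [(variant, False)] * (total - buys)
demo = pd.DataFrame(rows, columns=["Variant", "purchase_TF"])

# Visitors, purchases and conversion rate per variant in a single groupby
summary_all = demo.groupby("Variant")["purchase_TF"].agg(
    visitors="count", purchases="sum", conversion_rate="mean"
)
print(summary_all)
```

Because `purchase_TF` is boolean, its sum counts purchases and its mean is directly the conversion rate.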


In [21]:
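# Conversion rate = share of visitors who booked, per variant
cr_A = test_A['purchase_TF'].mean()  # 18702 / 125000
cr_B = test_B['purchase_TF'].mean()  # 883 / 5000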

In [22]:
cr_A

0.149616

In [23]:
cr_B

0.1766

In [24]:
cr_B - cr_A

0.026984000000000008

In [25]:
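import numpy

# One-sample z-test: treat A's conversion rate as the known baseline
# (n_A = 125,000 is large) and use the standard error of B's rate
# under H0 with n_B = 5,000
Z = (cr_B - cr_A)/numpy.sqrt(cr_A*(1-cr_A)/5000)
Z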

5.349273094732516

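Hypothesis testing:

$H_0: p_B \le p_A$

$H_a: p_B > p_A$

z_score = 5.349273094732516 > 1.645, the one-sided critical value at the 95% confidence level (it also exceeds the two-sided critical value of 1.96).

Therefore, we reject $H_0$.

Alternative B improved the conversion rate (site users book the property) over Alternative A.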
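The z score above treats A's conversion rate as a known baseline. A common alternative is the two-proportion z-test with a pooled standard error, which accounts for sampling noise in both groups. The sketch below uses only the standard library and the counts from the pivot tables; the function name `two_proportion_z` is our own. It also prints the one- and two-sided 95% critical values for comparison.

```python
import math
from statistics import NormalDist

def two_proportion_z(x_a, n_a, x_b, n_b):
    """One-sided two-proportion z-test of H0: p_B <= p_A vs Ha: p_B > p_A."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)  # common rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail probability; erfc keeps precision for large z
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z(18702, 125000, 883, 5000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.2e}")
print("one-sided 95% critical value:", round(NormalDist().inv_cdf(0.95), 3))
print("two-sided 95% critical value:", round(NormalDist().inv_cdf(0.975), 3))
```

With a pooled standard error the statistic comes out somewhat smaller than the one-sample value, but it is still far beyond either critical value, so the conclusion does not change.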

# Q2: Calculate the optimal sample size for a 95% confidence rate and test with 80% power. Conduct the test 10 times using samples of the optimal size. Report results.


Let's say the conversion rate with the existing design (Alternative A) is 14.9616% (the baseline conversion rate), and we require the new design to reach at least 16.9616% before we use it instead of the existing one.

The experiment then has to be configured so that it declares a winner when the conversion rate difference is at least 16.9616% - 14.9616% = 2 percentage points. This smallest difference worth detecting is the minimum detectable effect (MDE).

Expressed as a percentage of the baseline conversion rate, the MDE is:

$$\text{MDE} = \frac{\text{desired conversion rate lift}}{\text{baseline conversion rate}} \times 100\% = \frac{2}{14.9616} \times 100\% \approx 13.37\%$$

The sample-size calculation below uses the absolute lift actually observed in Q1, `mde = cr_B - cr_A` ≈ 0.0270 (2.70 percentage points), as the effect size.

In [26]:
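from scipy import stats
import numpy as np

norm = stats.norm()  # standard normal distribution

A = test_A['purchase_TF'].tolist()
B = test_B['purchase_TF'].tolist()
mean_A = np.mean(A)  # baseline conversion rate p_A
mean_B = np.mean(B)  # treatment conversion rate p_B

# Effect size: the absolute difference in conversion rates observed in Q1
mde = cr_B - cr_A
# Average of the two rates, used for the variance under H0
p = (mean_A + mean_B) / 2
# Per-group sample size for a two-sided test at alpha = 0.05 (z_0.975) with 80% power (z_0.8):
# n = (z_0.975 * sqrt(2p(1-p)) + z_0.8 * sqrt(p_A(1-p_A) + p_B(1-p_B)))^2 / mde^2
n = ((norm.ppf(0.975) * np.sqrt(2 * p * (1 - p))
      + norm.ppf(0.8) * np.sqrt(mean_A * (1 - mean_A) + mean_B * (1 - mean_B))) ** 2) / (mde ** 2)
print(n)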

2941.6814032458096
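The same formula can be wrapped in a reusable helper to see how the required sample size depends on the effect size. A sketch using only the standard library; `sample_size_per_group` is a hypothetical name, and like the cell above it assumes a two-sided test with equal group sizes.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.8):
    """Per-group n to distinguish p_a from p_b with a two-sided test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_a + p_b) / 2  # average rate, used for the variance under H0
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    return numerator / (p_b - p_a) ** 2

# Observed lift from Q1 (about 2.70 percentage points)
n_observed = sample_size_per_group(0.149616, 0.1766)
# Target lift of 2 percentage points from the MDE discussion
n_target = sample_size_per_group(0.149616, 0.169616)
print(math.ceil(n_observed), math.ceil(n_target))
```

Since n scales roughly with 1/MDE², shrinking the lift to be detected from about 2.7 to 2 percentage points noticeably increases the number of users needed per group.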


In [35]:
from random import sample
import math

def t_test(A, B, confidence):
    # Pooled two-sample t statistic for H_a: p_B > p_A.
    # Deviations are measured from the full-data means mean_A and mean_B
    # (not from the means of the lists passed in), so the numerator
    # mean_B - mean_A is identical in every run and only the pooled
    # standard deviation changes from sample to sample.
    s = 0
    for i in range(len(A)):
        s += (A[i]-mean_A)**2
    for j in range(len(B)):
        s += (B[j]-mean_B)**2
    s = np.sqrt(s/(len(A)+len(B)-2))  # pooled standard deviation
    t = (mean_B-mean_A)/(s*np.sqrt(1/len(A)+1/len(B)))
    # One-sided critical value from the normal distribution; with about
    # 128,000 degrees of freedom it is practically equal to the t quantile
    if t >= norm.ppf(confidence):
        print('The improvement of B is statistically significant.')
    else:
        print('The improvement of B is not statistically significant.')
    print('t score is: ', t)
    return t

for i in range(10):
    # Test a random sample of B of the optimal size against all of A
    B_sample = sample(B, int(n))
    t_test(A, B_sample, 0.95)

The improvement of B is statistically significant.
t score is:  4.049151010465374
The improvement of B is statistically significant.
t score is:  4.047387573530014
The improvement of B is statistically significant.
t score is:  4.048830214121312
The improvement of B is statistically significant.
t score is:  4.048990602762236
The improvement of B is statistically significant.
t score is:  4.049873081125648
The improvement of B is statistically significant.
t score is:  4.0482690039385085
The improvement of B is statistically significant.
t score is:  4.0478682823709
The improvement of B is statistically significant.
t score is:  4.049391657769989
The improvement of B is statistically significant.
t score is:  4.046186550295568
The improvement of B is statistically significant.
t score is:  4.046826962945632

All 10 runs found the improvement of B statistically significant at the 95% level, with t scores between 4.046 and 4.050.
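`t_test` above measures deviations from the full-data means `mean_A` and `mean_B`, so its numerator never changes between runs. Below is a sketch of a variant that recomputes both means from each sample and draws both groups at the optimal size. This equal-size design, the helper names `pooled_t` and `run_trials`, the fixed seed, and the stand-in arrays `A_demo`/`B_demo` (rebuilt from the reported purchase counts) are our assumptions, not the original procedure.

```python
import math
import numpy as np

def pooled_t(a, b):
    """Pooled two-sample t statistic for Ha: mean(b) > mean(a)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared deviations from each sample's own mean
    ss = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)
    s = np.sqrt(ss / (len(a) + len(b) - 2))  # pooled standard deviation
    return (b.mean() - a.mean()) / (s * np.sqrt(1 / len(a) + 1 / len(b)))

def run_trials(A, B, size, trials=10, crit=1.645, seed=0):
    """Draw `size` observations from each group without replacement, `trials` times."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(trials):
        a = rng.choice(A, size=size, replace=False)
        b = rng.choice(B, size=size, replace=False)
        t = pooled_t(a, b)
        results.append((t, t >= crit))  # crit: one-sided 95% normal quantile
    return results

# Stand-in data rebuilt from the reported counts (1 = booked)
A_demo = np.array([1] * 18702 + [0] * (125000 - 18702))
B_demo = np.array([1] * 883 + [0] * (5000 - 883))
for t, significant in run_trials(A_demo, B_demo, size=math.ceil(2941.6814032458096)):
    print(f"t = {t:.3f}, significant at 95% (one-sided): {significant}")
```

Because each run now uses fresh sample means, the t scores vary from run to run, and some runs may fall short of the critical value; that is expected when the sample size was chosen for 80% power.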


# Q3: Conduct a sequential test for the 10 samples. For any of the samples, were you able to stop the test prior to using the full sample? What was the average number of iterations required to stop the test?

In [56]:
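import random
import math

iteration = list()
size = math.ceil(n)  # full sample size: the optimal n from Q2, rounded up

# Stopping boundaries on the cumulative log-likelihood ratio, using the simplified
# forms ln(1/alpha) and ln(beta) with alpha = 0.05 and power = 0.8; Wald's SPRT
# boundaries are ln((1-beta)/alpha) and ln(beta/(1-alpha))
ln_A = np.log(1/0.05)
ln_B = np.log(1-0.8)
p_A = np.mean(A)
p_B = np.mean(B)

for k in range(10):
    # Pool the observations from both variants and shuffle them for this run
    total_sample = A+B
    random.shuffle(total_sample)
    accumulative_log_lamda = 0
    i = 0
    while (accumulative_log_lamda > ln_B) and (accumulative_log_lamda < ln_A):
        # Each increment is the log ratio of the likelihood under p_A (numerator)
        # to the likelihood under p_B (denominator); the conventional SPRT for
        # H1: p = p_B uses the reciprocal ratios log(p_B/p_A) and log((1-p_B)/(1-p_A))
        if total_sample[i]:
            accumulative_log_lamda += np.log(p_A/p_B)
        else:
            accumulative_log_lamda += np.log((1-p_A)/(1-p_B))
        i += 1

        # Stop if the full sample size is reached without crossing a boundary
        if i >= size:
            print('Cannot reject or accept H0')
            break
    if accumulative_log_lamda <= ln_B:
        print('Fail to reject H0 in {} trials'.format(i))
    elif accumulative_log_lamda >= ln_A:
        print('Reject H0 in {} trials'.format(i))
    iteration.append(i)
print(np.average(iteration))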

Fail to reject H0 in 75 trials
Reject H0 in 1217 trials
Reject H0 in 419 trials
Fail to reject H0 in 195 trials
Reject H0 in 984 trials
Reject H0 in 2796 trials
Reject H0 in 978 trials
Reject H0 in 978 trials
Reject H0 in 738 trials
Reject H0 in 708 trials
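908.8

All 10 sequential tests stopped before using the full sample of 2,942 observations (the longest run stopped after 2,796): 8 runs crossed the upper boundary (reject H0) and 2 crossed the lower boundary (fail to reject H0). The average number of iterations required to stop was 908.8.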
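For comparison, here is a minimal sketch of Wald's sequential probability ratio test in its conventional orientation: H0: p = p_A against H1: p = p_B. Each purchase adds log(p_B/p_A), each non-purchase adds log((1-p_B)/(1-p_A)), and the boundaries are ln((1-β)/α) and ln(β/(1-α)). The function name `sprt`, and feeding it only Variant B observations (to ask whether B converts at A's rate or at the higher observed rate), are our assumptions rather than the procedure used above.

```python
import math
import random

def sprt(observations, p0, p1, alpha=0.05, beta=0.2, max_n=None):
    """Wald SPRT for Bernoulli data, H0: p = p0 vs H1: p = p1 (p1 > p0).

    Returns (decision, number of observations used)."""
    upper = math.log((1 - beta) / alpha)  # crossing above: reject H0
    lower = math.log(beta / (1 - alpha))  # crossing below: accept H0
    llr = 0.0
    i = 0
    for i, purchased in enumerate(observations, start=1):
        if purchased:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", i
        if llr <= lower:
            return "accept H0", i
        if max_n is not None and i >= max_n:
            break
    return "no decision", i

# Stand-in for the Variant B observations, rebuilt from the reported counts
B_obs = [True] * 883 + [False] * (5000 - 883)
rng = random.Random(0)
steps = []
for _ in range(10):
    rng.shuffle(B_obs)
    decision, n_used = sprt(B_obs, p0=0.149616, p1=0.1766, max_n=2942)
    steps.append(n_used)
    print(decision, n_used)
print("average observations:", sum(steps) / len(steps))
```

Using Wald's boundaries rather than ln(1/α) and ln(β) keeps the error rates close to the nominal α and β, and the `max_n` cap mirrors the full-sample cutoff used above.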
