# Analyze A/B Test Results to Determine the Conversion Rate of a New Web Page

#### Table of Contents
- Introduction
- Part I - Probability
- Part II - A/B Test
- Part III - Regression
- Conclusion

### Introduction

For this project, we will be working to understand the results of an A/B test run by an e-commerce website. The goal is to work through this notebook to help the company understand if they should implement the new page, keep the old page, or perhaps run the experiment longer to make their decision.

### Part I - Probability

In [2]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
# Set the seed so that results are reproducible across runs
random.seed(42)
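
Note that `random.seed` only affects Python's built-in `random` module. If later parts of the notebook draw samples with NumPy, its generator would need its own seed. A minimal sketch, assuming the legacy global NumPy generator is used (the binomial draw and its parameters are purely illustrative):

```python
import random
import numpy as np

random.seed(42)      # seeds Python's built-in generator only
np.random.seed(42)   # seeds NumPy's legacy global generator

# Illustrative draw: five simulated conversions at a 12% rate
first = np.random.binomial(1, 0.12, size=5)

# Re-seeding reproduces exactly the same draw
np.random.seed(42)
second = np.random.binomial(1, 0.12, size=5)
print((first == second).all())
```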

In [3]:
df = pd.read_csv('../Data/ab_data.csv')
df.head(5)

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |


In [4]:
# The number of rows in the dataset
df.shape

(294478, 5)

In [5]:
# The number of unique users in the dataset
df.user_id.nunique()

290584

In [10]:
# Conversion rate per group, using only rows where group and landing_page agree
aligned = (((df['group'] == 'control') & (df['landing_page'] == 'old_page'))
           | ((df['group'] == 'treatment') & (df['landing_page'] == 'new_page')))
temp = df[aligned]
temp.groupby('group').converted.mean()

group
control      0.120386
treatment    0.118807
Name: converted, dtype: float64

In [9]:
# The number of rows where group and landing_page don't match
df_control = df.query('group == "control"')
df_treat = df.query('group == "treatment"')
df_control_wrong = df_control.query('landing_page == "new_page"')
df_treat_wrong = df_treat.query('landing_page == "old_page"')
df_control_wrong.shape[0] + df_treat_wrong.shape[0]

3893
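
The same mismatch count can be cross-checked with a single boolean mask or a cross-tabulation. The sketch below runs on a small hypothetical DataFrame (`toy` is not part of the dataset) that only mimics the `group` and `landing_page` columns:

```python
import pandas as pd

# Hypothetical toy data with the same two columns as ab_data.csv
toy = pd.DataFrame({
    "group": ["control", "control", "treatment", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "new_page"],
})

# A row is consistent when being in "treatment" coincides with seeing "new_page"
aligned = (toy["group"] == "treatment") == (toy["landing_page"] == "new_page")
n_mismatched = int((~aligned).sum())

# The cross-tabulation shows all four group/page combinations at once
counts = pd.crosstab(toy["group"], toy["landing_page"])
print(counts)
print("mismatched rows:", n_mismatched)
```

On the real data, the off-diagonal cells of such a cross-tabulation (control/new_page and treatment/old_page) should add up to the count reported above.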

In [11]:
# Do any of the rows have missing values?
df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

In [12]:
# Drop rows where group and landing_page do not match
df2 = df.drop(df_control_wrong.index)
df2.drop(df_treat_wrong.index, inplace=True)

# Double-check that all mismatched rows were removed; this should be 0
df2[(df2['group'] == 'treatment') != (df2['landing_page'] == 'new_page')].shape[0]

0

In [13]:
# How many unique user_ids are in df2?
df2.user_id.nunique()


290584

In [16]:
df2.shape[0]

290585

df2 has 290,585 rows but only 290,584 unique user_ids, so exactly one user_id appears twice. Identify that user_id and remove one of its rows.

In [17]:
# Find rows with a duplicated user_id and keep them in a separate DataFrame for reference
dup_df = df2[df2.user_id.duplicated(keep = False)]
dup_df

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 1899 | 773192 | 2017-01-09 05:37:58.781806 | treatment | new_page | 0 |
| 2893 | 773192 | 2017-01-14 02:55:59.590927 | treatment | new_page | 0 |


In [18]:
# Drop rows with a duplicated user_id in place, keeping the first occurrence
df2.drop_duplicates(subset = "user_id", inplace = True)

In [19]:
# Confirm the user_id still exists, but only once
df2.query('user_id == 773192')

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 1899 | 773192 | 2017-01-09 05:37:58.781806 | treatment | new_page | 0 |
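
The row that survives is the earlier one (index 1899) because `drop_duplicates` keeps the first occurrence by default. A small sketch on hypothetical data (`toy` and its values are made up for illustration) shows the behavior:

```python
import pandas as pd

# Hypothetical data: user_id 2 appears twice
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "timestamp": ["2017-01-09", "2017-01-09", "2017-01-14", "2017-01-10"],
})

# keep="first" is the default: the first row for each user_id is retained
deduped = toy.drop_duplicates(subset="user_id")

# Original index labels are preserved, so the dropped row leaves a gap
print(deduped)
```

Passing `keep="last"` instead would retain the later visit; which one is appropriate depends on how repeat visits should be treated.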


In [21]:
# Probability of converting, given that an individual is in the control group
control_conv = df2.query('group == "control"').converted.mean()
# Probability of converting, given that an individual is in the treatment group
treat_conv = df2.query('group == "treatment"').converted.mean()
# Probability of converting regardless of the page received
conv = df2.converted.mean()
# Print the results
print(f"Probability of converting conditional on an individual being in the control group: {control_conv}")
print(f"Probability of converting conditional on an individual being in the treatment group: {treat_conv}")
print(f"Probability of converting regardless of the page they receive: {conv}")


Probability of converting conditional on an individual being in the control group: 0.1203863045004612
Probability of converting conditional on an individual being in the treatment group: 0.11880806551510564
Probability of converting regardless of the page they receive: 0.11959708724499628


In [22]:
# What is the probability that an individual received the new page?
p_new_page = (df2['landing_page'] == 'new_page').mean()
p_new_page

0.5000619442226688

`Observations`

- The descriptive statistics show that the split between the control and treatment groups is almost exactly 50/50 (about 50.01% of users received the new page), so both groups are equally represented.

- The conversion rate is about 12.04% for the control group and about 11.88% for the treatment group, a difference of roughly 0.16 percentage points in favor of the old page.

- On this evidence the new page does not appear to improve conversion; if anything, the observed rate is slightly lower. Descriptive statistics alone cannot establish whether the difference is statistically significant; that question calls for a formal hypothesis test.
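
As a rough sanity check on these observations, the observed difference can be compared with its approximate standard error. This is only a sketch, not a substitute for a proper test: the rates are copied from the outputs above, the group sizes are recovered from the reported proportion (an assumption of this sketch), and the pooled standard error is the usual large-sample approximation.

```python
from math import sqrt

# Values copied from the cell outputs above
control_conv = 0.1203863045004612
treat_conv = 0.11880806551510564
conv = 0.11959708724499628
p_new_page = 0.5000619442226688
n_total = 290584  # unique users remaining after cleaning

# Group sizes recovered from the reported proportion (assumption of this sketch)
n_new = round(p_new_page * n_total)
n_old = n_total - n_new

# Observed difference in conversion rates (treatment minus control)
diff = treat_conv - control_conv

# Pooled standard error of the difference under "no difference"
se = sqrt(conv * (1 - conv) * (1 / n_new + 1 / n_old))
z = diff / se

print(f"difference: {diff:.5f}, standard error: {se:.5f}, z: {z:.2f}")
```

The resulting z-score is roughly -1.3, well inside the conventional ±1.96 band, which is consistent with the observation that the data show no clear effect of the new page.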