# Analyze A/B Test Results 


## Introduction

A/B tests are very commonly performed by data analysts and data scientists. For this project, you will be working to understand the results of an A/B test run by an e-commerce website.  Your goal is to work through this notebook to help the company understand if they should:
- Implement the new webpage, 
- Keep the old webpage, or 
- Perhaps run the experiment longer to make their decision.


<a id='probability'></a>
## Part I - Probability

In [1]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
random.seed(42)

The data has five columns, described below:

<center>

|Data columns|Purpose|Valid values|
| ------------- |:-------------| -----:|
|user_id|Unique ID|Int64 values|
|timestamp|Time stamp when the user visited the webpage|-|
|group|In the current A/B experiment, users are split into two groups. <br>`control` group users are expected to be served the `old_page`, and `treatment` group users the `new_page`. <br>However, **some inaccurate rows** are present in the initial data, for example a `control` group user matched with the `new_page`. |`['control', 'treatment']`|
|landing_page|It denotes whether the user visited the old or new webpage.|`['old_page', 'new_page']`|
|converted|It denotes whether the user decided to pay for the company's product. Here, `1` means yes, the user bought the product.|`[0, 1]`|
</center>

In [2]:
df= pd.read_csv('ab_data.csv')
df.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:48.556739,control,old_page,0
1,804228,2017-01-12 08:01:45.159739,control,old_page,0
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0
4,864975,2017-01-21 01:52:26.210827,control,old_page,1


In [3]:
df.shape[0]

294478

The number of unique users in the dataset.

In [4]:
tot = df.user_id.nunique()
tot

290584

The proportion of users who converted (total conversions divided by the number of unique users).

In [5]:
prop = df['converted'].sum()/tot
prop

0.12126269856564711

The number of times when the "group" is `treatment` but "landing_page" is not a `new_page`.

In [6]:
df.query('group == "treatment" and landing_page != "new_page"').shape[0]

1965

In [7]:
df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

 
In any given row, the **group** and **landing_page** columns should take one of the following two combinations:

|user_id| timestamp|group|landing_page|converted|
|---|---|---|---|---|
|XXXX|XXXX|`control`| `old_page`|X |
|XXXX|XXXX|`treatment`|`new_page`|X |


In other words, `control` group users should be matched with `old_page`, and `treatment` group users should be matched with `new_page`.

For rows where `treatment` is not paired with `new_page`, or `control` is not paired with `old_page`, we cannot be sure whether the user actually received the new or the old webpage, so these rows are left out of the analysis.
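
A cross-tabulation is a quick way to see how many rows fall into each group and page combination. The sketch below uses a small made-up frame (`toy` is hypothetical, not the project data) to show the idea; the same calls should carry over to `df`.

```python
import pandas as pd

# Hypothetical toy data: two of the five rows are deliberately mismatched.
toy = pd.DataFrame({
    "group": ["control", "control", "treatment", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "new_page"],
})

# Count every group / landing_page combination.
combo_counts = pd.crosstab(toy["group"], toy["landing_page"])
print(combo_counts)

# A row is consistent when "is treatment" and "is new_page" agree.
is_consistent = (toy["group"] == "treatment") == (toy["landing_page"] == "new_page")
n_mismatched = int((~is_consistent).sum())
print(n_mismatched)
```

The off-diagonal cells of the cross-tabulation are the rows whose assignment cannot be trusted.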


In [8]:
df2=pd.concat([df.query("group == 'treatment' and landing_page== 'new_page'"),df.query("group == 'control' and landing_page== 'old_page'")])
df2.sample(10)

Unnamed: 0,user_id,timestamp,group,landing_page,converted
228601,890507,2017-01-03 00:56:04.524521,treatment,new_page,0
275749,661417,2017-01-10 08:24:41.949080,treatment,new_page,0
258534,646602,2017-01-20 16:08:39.194439,treatment,new_page,0
189808,657138,2017-01-21 21:31:00.388287,control,old_page,0
278376,768857,2017-01-06 05:37:59.887519,treatment,new_page,0
195214,661341,2017-01-17 21:04:41.000288,treatment,new_page,0
269893,708120,2017-01-20 13:19:46.917577,treatment,new_page,0
153124,813076,2017-01-13 13:22:14.663242,treatment,new_page,0
293612,912988,2017-01-11 06:38:02.577295,treatment,new_page,0
116577,858510,2017-01-03 13:01:19.456626,treatment,new_page,0


In [9]:
# Sanity check: no remaining rows where group and landing_page disagree
df2[(df2['group'] == 'treatment') != (df2['landing_page'] == 'new_page')].shape[0]

0

Number of unique **user_id**s in `df2`:

In [10]:
df2.user_id.nunique()

290584

Rows in `df2` that share a duplicated **user_id**:

In [11]:
df2[df2.duplicated('user_id', keep = False)]

Unnamed: 0,user_id,timestamp,group,landing_page,converted
1899,773192,2017-01-09 05:37:58.781806,treatment,new_page,0
2893,773192,2017-01-14 02:55:59.590927,treatment,new_page,0


In [12]:
df2[df2.duplicated('user_id', keep = False)]['user_id'][1899]

773192

In [13]:
df2.drop_duplicates(subset='user_id', inplace= True)
# Check again if the row with a duplicate user_id is deleted or not
df2[df2.duplicated('user_id', keep = False)]

Unnamed: 0,user_id,timestamp,group,landing_page,converted


Probability of an individual converting regardless of the page they receive

In [14]:
df['converted'].mean()

0.11965919355605512

Probability that an individual in the `control` group converted.

In [15]:
df.query("group=='control'")['converted'].mean()

0.12039917935897611

Probability that an individual in the `treatment` group converted.

In [16]:
df.query("group=='treatment'")['converted'].mean()

0.11891957956489856

In [17]:
# Difference between the two groups' conversion rates (control minus treatment)
df.query("group=='control'")['converted'].mean()-df.query("group=='treatment'")['converted'].mean()

The probability that an individual received the new page

In [None]:
# Share of rows whose landing page is the new page
(df['landing_page'] == 'new_page').mean()

0.5

The conversion rate of the control group (12.04%) is slightly higher than that of the treatment group (11.89%), so the new page does not appear to lead to more conversions. However, the difference is only about 0.15 percentage points and could easily have arisen by chance, so these figures alone do not support a firm conclusion.

<a id='ab_test'></a>
## Part II - A/B Test

Since a timestamp is associated with each event, we could run a hypothesis test continuously for as long as we keep observing events.

However, then the hard questions would be: 
- Do you stop as soon as one page is considered significantly better than another or does it need to happen consistently for a certain amount of time?  
- How long do you run to render a decision that neither page is better than another?  

These questions are the difficult parts associated with A/B tests in general.  
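
To see why the stopping rule matters, the sketch below simulates A/A experiments, in which both pages share the same true conversion rate so that any "significant" result is a false positive. It compares testing once at the end with testing after every batch and stopping at the first significant result. The batch size, number of peeks, conversion rate, and the helper name `z_stat` are all illustrative assumptions, not part of the project.

```python
import numpy as np

rng = np.random.default_rng(0)

def z_stat(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic for rate_b - rate_a."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se

n_experiments, n_peeks, batch, rate = 2000, 20, 500, 0.12
z_crit = 1.645  # one-sided 5% critical value

# Cumulative conversions after each batch, for both arms of every experiment.
cum_a = rng.binomial(batch, rate, size=(n_experiments, n_peeks)).cumsum(axis=1)
cum_b = rng.binomial(batch, rate, size=(n_experiments, n_peeks)).cumsum(axis=1)
n_seen = batch * np.arange(1, n_peeks + 1)

z = z_stat(cum_a, n_seen, cum_b, n_seen)

fp_final = (z[:, -1] > z_crit).mean()         # test once, at the end
fp_peeking = (z > z_crit).any(axis=1).mean()  # stop at the first "significant" peek
print(fp_final, fp_peeking)
```

Testing once keeps the false positive rate near the nominal 5%, while stopping at the first significant peek inflates it well beyond that, which is why the stopping rule should be fixed before the experiment starts.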



We assume that the old page is better unless the new page proves to be definitely better at a Type I error rate of 5%. The null and alternative hypotheses are **$H_0$** and **$H_1$** respectively.  

$$
H_0: p_{old} \geq p_{new}\\
H_1: p_{old} < p_{new}
$$

### Null Hypothesis $H_0$ Testing
Under the null hypothesis $H_0$, we assume that $p_{new}$ and $p_{old}$ are equal. Furthermore, we assume that $p_{new}$ and $p_{old}$ both are equal to the **converted** success rate in the `df2` data regardless of the page. So, our assumption is: <br><br>
<center>
$p_{new}$ = $p_{old}$ = $p_{population}$
</center>


The **conversion rate** for $p_{new}$ under the null hypothesis.

In [None]:
p_new = df2.converted.mean()
p_new

0.11959708724499628

The **conversion rate** for $p_{old}$ under the null hypothesis.

In [None]:
p_old= df2.converted.mean()
p_old

0.11959708724499628

Number of individuals in the treatment group:

In [None]:
n_new = df2.query("landing_page=='new_page'").user_id.nunique()
n_new

145310

Number of individuals in the control group.

In [None]:
n_old = df2.query("landing_page=='old_page'").user_id.nunique()
n_old

145274

We simulate $n_{new}$ transactions with a conversion rate of $p_{new}$ under the null hypothesis.

In [None]:
# p lists the probabilities of [0, 1] in that order, so a conversion (1) gets p_new
new_page_converted = np.random.choice([0, 1], size=n_new, p=[1 - p_new, p_new])

We simulate $n_{old}$ transactions with a conversion rate of $p_{old}$ under the null hypothesis.

In [None]:
# p lists the probabilities of [0, 1] in that order, so a conversion (1) gets p_old
old_page_converted = np.random.choice([0, 1], size=n_old, p=[1 - p_old, p_old])

Observed difference in the "converted" probability between the `treatment` and `control` groups in `df2` $(p_{new} - p_{old})$. The simulated differences are compared against this value.

In [None]:
obs_diff = df2['converted'][df2['group'] == 'treatment'].mean() - df2['converted'][df2['group'] == 'control'].mean()
obs_diff

-0.0015782389853555567


**Sampling distribution** <br>
 Now, we re-create `new_page_converted` and `old_page_converted` and find the $(p{'}_{new}$ - $p{'}_{old})$ value 10,000 times using the same simulation process as above. 

In [None]:
new_page_converted= np.random.binomial(n_new, p_new, 10000)/n_new
old_page_converted= np.random.binomial(n_old, p_old, 10000)/n_old
p_diffs=new_page_converted-old_page_converted

In [None]:
plt.hist(p_diffs)
plt.axvline(obs_diff)

In [None]:
(p_diffs >obs_diff).mean()

0.90590000000000004

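This value is the p-value: the proportion of differences simulated under the null hypothesis that are greater than the observed difference. To reject the null hypothesis, it must be smaller than the Type I error rate (0.05); otherwise, we fail to reject the null hypothesis.

In our case, the p-value is about 0.906, which is far larger than the Type I error rate. Hence, we fail to reject the null hypothesis: there is no evidence that the new page converts better than the old page.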
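
The p-value computation can be reproduced in isolation from the figures printed earlier in this notebook (`n_new`, `n_old`, the pooled conversion rate, and `obs_diff`). The sketch below is self-contained; the seed and the use of `np.random.default_rng` are choices made here for reproducibility, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Values copied from the outputs above.
n_new, n_old = 145310, 145274
p_null = 0.11959708724499628
obs_diff = -0.0015782389853555567

# Each draw is the conversion rate of one simulated group under the null.
new_rates = rng.binomial(n_new, p_null, 10_000) / n_new
old_rates = rng.binomial(n_old, p_null, 10_000) / n_old
p_diffs = new_rates - old_rates

# Share of simulated differences that exceed the observed one.
p_value = (p_diffs > obs_diff).mean()
print(p_value)
```

With these inputs the simulated p-value should land close to the one printed above, up to simulation noise.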

In [None]:
import statsmodels.api as sm

# number of conversions with the old_page
convert_old = df2.query("landing_page=='old_page'").converted.sum()

# number of conversions with the new_page
convert_new = df2.query("landing_page=='new_page'").converted.sum()

# number of individuals who were shown the old_page
n_old = df2.query("landing_page=='old_page'").shape[0]

# number of individuals who received new_page
n_new = df2.query("landing_page=='new_page'").shape[0]


In [None]:
import statsmodels.api as sm
# One-sided test: with alternative='smaller', H1 is that the first proportion (old) is below the second (new)
z_score, p_value = sm.stats.proportions_ztest([convert_old,convert_new],[n_old,n_new],alternative='smaller')
print(z_score, p_value)

1.31092419842 0.905058312759


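The critical value for a one-sided test at the 5% level is $z_{0.05} = 1.645$. With `alternative='smaller'`, we would reject the null hypothesis only if the z-score were below $-1.645$. Here the z-score is $1.31$ (p-value $0.905$), so we fail to reject the null hypothesis, which agrees with the simulation result above.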
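
The z-score can also be checked by hand from figures printed earlier (`n_new`, `n_old`, the pooled conversion rate, and `obs_diff`). The sketch below assumes the pooled-variance form of the two-proportion z-test and uses only the standard library; the variable names are chosen here for illustration.

```python
import math

# Values copied from the outputs above.
n_new, n_old = 145310, 145274
p_pool = 0.11959708724499628                 # pooled conversion rate in df2
diff_old_minus_new = 0.0015782389853555567   # -obs_diff, i.e. p_old - p_new

# Standard error of the difference under the null (both groups share p_pool).
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
z = diff_old_minus_new / se

# Lower-tail probability P(Z < z), matching alternative='smaller'.
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(z, p_value)
```

Both numbers should agree closely with the z-score and p-value reported by `proportions_ztest`.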

<a id='regression'></a>
## Part III - A regression approach

Because `converted` is a binary outcome (0 or 1), we use logistic regression.

In [None]:
df2['intercept']=1
# Cast to int: recent pandas versions return boolean dummy columns
df2['ab_page'] = pd.get_dummies(df2['group'])['treatment'].astype(int)
df2.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted,intercept,ab_page
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0,1,1
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0,1,1
6,679687,2017-01-19 03:26:46.940749,treatment,new_page,1,1,1
8,817355,2017-01-04 17:58:08.979471,treatment,new_page,1,1,1
9,839785,2017-01-15 18:11:06.610965,treatment,new_page,1,1,1


In [None]:
df2.sample(10)

Unnamed: 0,user_id,timestamp,group,landing_page,converted,intercept,ab_page
223863,899112,2017-01-13 20:17:41.022021,control,old_page,0,1,0
122023,890393,2017-01-15 11:45:39.411310,treatment,new_page,0,1,1
86777,939387,2017-01-20 12:13:33.054663,treatment,new_page,0,1,1
281896,869449,2017-01-12 23:50:48.572153,treatment,new_page,1,1,1
22522,740736,2017-01-24 09:28:24.794640,control,old_page,0,1,0
147712,785875,2017-01-14 23:56:15.761462,control,old_page,0,1,0
157402,766359,2017-01-17 21:16:56.409984,treatment,new_page,0,1,1
246644,687008,2017-01-11 14:10:41.818209,control,old_page,0,1,0
92284,861888,2017-01-04 12:18:35.757955,control,old_page,1,1,0
263781,819028,2017-01-13 21:05:29.861100,treatment,new_page,0,1,1


In [None]:
logistic_regression = sm.Logit(df2['converted'],df2[['intercept','ab_page']])
res = logistic_regression.fit()

Optimization terminated successfully.
         Current function value: 0.366118
         Iterations 6


In [None]:
res.summary2()

0,1,2,3
Model:,Logit,No. Iterations:,6.0
Dependent Variable:,converted,Pseudo R-squared:,0.0
Date:,2022-02-14 10:57,AIC:,212780.3502
No. Observations:,290584,BIC:,212801.5095
Df Model:,1,Log-Likelihood:,-106390.0
Df Residuals:,290582,LL-Null:,-106390.0
Converged:,1.0000,Scale:,1.0

0,1,2,3,4,5,6
,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
intercept,-1.9888,0.0081,-246.6690,0.0000,-2.0046,-1.9730
ab_page,-0.0150,0.0114,-1.3109,0.1899,-0.0374,0.0074
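
The coefficients in the summary above are on the log-odds scale, which is hard to read directly. As a rough illustration, the sketch below converts them to an odds ratio and to predicted conversion probabilities, using only the two coefficients copied from the table; the variable names are chosen here for illustration.

```python
import math

# Coefficients copied from the summary above.
intercept, ab_page = -1.9888, -0.0150

# exp(coefficient) is the multiplicative change in the odds of converting.
odds_ratio = math.exp(ab_page)

# Inverse logit turns log-odds into probabilities.
p_control = 1 / (1 + math.exp(-intercept))
p_treatment = 1 / (1 + math.exp(-(intercept + ab_page)))
print(odds_ratio, p_control, p_treatment)
```

An odds ratio just below 1 means the estimated odds of converting are slightly lower on the new page, and the confidence interval for `ab_page` includes 0, so this difference is not statistically significant.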


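In Part III, the p-value for `ab_page` is 0.1899, whereas the p-value in Part II was about 0.905. The difference comes from the hypotheses: logistic regression tests $H_0: \beta_{ab\_page} = 0$ against the two-sided alternative $H_1: \beta_{ab\_page} \neq 0$, while Part II used a one-sided alternative ($p_{old} < p_{new}$). The two results are consistent: the opposite one-sided tail is $1 - 0.905 = 0.095$, and doubling it gives $2 \times 0.095 \approx 0.19$.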
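
The link between the two p-values can be checked numerically. For a symmetric test statistic, the two-sided p-value is twice the smaller of the two tail probabilities. The one-sided value below is copied from the z-test output in Part II.

```python
p_one_sided = 0.905058312759  # Part II z-test, alternative='smaller'

# Two-sided p-value: double the smaller tail.
p_two_sided = 2 * min(p_one_sided, 1 - p_one_sided)
print(round(p_two_sided, 4))  # 0.1899
```

This matches the `P>|z|` value reported for `ab_page` in the regression summary.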

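Many factors other than the landing page can influence the conversion rate. Adding them to the regression model is a good idea, because an influential factor that is left out can be missed entirely or can distort the estimated effect of the page. The drawbacks are that every added term makes the model harder to interpret, correlated predictors can make the coefficients unstable, and testing many terms increases the chance of a spurious significant result.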
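
When a categorical factor such as country is added, it enters the model as 0/1 dummy columns, with one category left out as the baseline so that the columns are not collinear with the intercept. A minimal sketch on made-up data follows; the `toy` frame and the `CA` label are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; "CA" stands in for whichever category is left as the baseline.
toy = pd.DataFrame({
    "country": ["US", "UK", "CA", "US"],
    "ab_page": [1, 0, 1, 0],
})

# One 0/1 column per country; exactly one of them is 1 in every row.
dummies = pd.get_dummies(toy["country"]).astype(int)

# Keep two of the three columns so the model stays identifiable with an intercept.
toy[["US", "UK"]] = dummies[["US", "UK"]]

# Interaction term: 1 only for US users who saw the new page.
toy["US_ab_page"] = toy["US"] * toy["ab_page"]
print(toy)
```

Each dummy coefficient is then read relative to the baseline category, and an interaction coefficient describes how the page effect differs for that country.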

In [None]:
# Read the countries.csv
countries=pd.read_csv('countries.csv')

In [None]:
# Join with the df2 dataframe
df_merged = countries.set_index('user_id').join(df2.set_index('user_id'), how='inner')
df_merged.head()

user_id,country,timestamp,group,landing_page,converted,intercept,ab_page
834778,UK,2017-01-14 23:08:43.304998,control,old_page,0,1,0
928468,US,2017-01-23 14:44:16.387854,treatment,new_page,0,1,1
822059,UK,2017-01-16 14:04:14.719771,treatment,new_page,1,1,1
711597,UK,2017-01-22 03:14:24.763511,control,old_page,0,1,0
710616,UK,2017-01-16 13:14:44.000513,treatment,new_page,0,1,1


In [None]:
# Create the necessary dummy variables (cast to int: recent pandas returns booleans)
df_merged[['US','UK']] = pd.get_dummies(df_merged['country'])[['US','UK']].astype(int)
df_merged['US_ab_page']=df_merged['US']*df_merged['ab_page']
df_merged['UK_ab_page']=df_merged['UK']*df_merged['ab_page']
df_merged.head()

user_id,country,timestamp,group,landing_page,converted,intercept,ab_page,US,UK
834778,UK,2017-01-14 23:08:43.304998,control,old_page,0,1,0,0,1
928468,US,2017-01-23 14:44:16.387854,treatment,new_page,0,1,1,1,0
822059,UK,2017-01-16 14:04:14.719771,treatment,new_page,1,1,1,0,1
711597,UK,2017-01-22 03:14:24.763511,control,old_page,0,1,0,0,1
710616,UK,2017-01-16 13:14:44.000513,treatment,new_page,0,1,1,0,1


In [None]:
# Fit the main-effects model and summarize the results
# (the interaction columns US_ab_page and UK_ab_page are not part of this fit)
model = sm.Logit(df_merged['converted'], df_merged[['intercept','ab_page','US','UK']])
res2=model.fit()
res2.summary2()

Optimization terminated successfully.
         Current function value: 0.366113
         Iterations 6


0,1,2,3
Model:,Logit,No. Iterations:,6.0
Dependent Variable:,converted,Pseudo R-squared:,0.0
Date:,2022-02-14 11:18,AIC:,212781.1253
No. Observations:,290584,BIC:,212823.4439
Df Model:,3,Log-Likelihood:,-106390.0
Df Residuals:,290580,LL-Null:,-106390.0
Converged:,1.0000,Scale:,1.0

0,1,2,3,4,5,6
,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
intercept,-2.0300,0.0266,-76.2488,0.0000,-2.0822,-1.9778
ab_page,-0.0149,0.0114,-1.3069,0.1912,-0.0374,0.0075
US,0.0408,0.0269,1.5161,0.1295,-0.0119,0.0934
UK,0.0506,0.0284,1.7835,0.0745,-0.0050,0.1063


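Apart from the intercept, all of the p-values (0.19 for `ab_page`, 0.13 for `US`, and 0.07 for `UK`) are still greater than the Type I error rate of 0.05. Hence, we still fail to reject the null hypothesis: neither the page nor the user's country shows a statistically significant effect on conversion.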