# A/B TESTING WEBSITE PAGE
N is a digital media company whose website distributes its videos and articles. The company wants to determine which of its two user interfaces does more to encourage users to explore the website. The two landing pages differ in colour and layout. The experiment changes the homepage to a more engaging design, with the aim of increasing the number of users who move on to the second stage of the funnel, which is exploring the website. The metric is the **click-through rate (CTR) for the Explore Website button on the home page**, defined as the number of unique visitors who click the button at least once divided by the number of unique visitors who view the page.
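The CTR definition above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes events are `(user_id, action)` pairs and that a visitor who clicked also viewed the page, so clickers count toward the denominator even without a separate `view` event. The function name `ctr` and the toy events are hypothetical.

```python
def ctr(events):
    """CTR = unique visitors who clicked at least once / unique visitors who viewed.

    Assumption: a click implies the page was viewed, so every user that
    appears in the events counts as a viewer.
    """
    viewers = {user for user, _ in events}
    clickers = {user for user, action in events if action == "click"}
    return len(clickers) / len(viewers) if viewers else 0.0


# Hypothetical toy events: user 1 clicks twice, user 2 only views,
# user 3 clicks without a separate view event.
events = [(1, "view"), (1, "click"), (1, "click"), (2, "view"), (3, "click")]
print(ctr(events))  # 2 clickers / 3 viewers
```

Using sets means repeated clicks by the same visitor are counted once, which is what "click at least once" requires.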
<br>

Based on the PICOT framework, the hypotheses are: <br>
- Null hypothesis: Visitors to the website (P, population) who receive Layout B (treatment group) will not have a higher conversion rate than those who receive Layout A (control group) at the end of a visit.<br>
- Alternative hypothesis: Visitors to the website (P, population) who receive Layout B (treatment group) will have a higher conversion rate than those who receive Layout A (control group) at the end of a visit.<br>

##### P - Population of visitors, I - Intervention in the form of Layout A/B, C - Comparison of the treatment group vs. the control group, O - Outcome: conversion rate (measured as CTR), T - Time: at the end of a visit.<br>

Translating the above, let the CTR of the old page be CTR_old and the CTR of the new page be CTR_new:<br>

    H0: CTR_old - CTR_new >= 0
    H1: CTR_old - CTR_new < 0

-----

# Steps

### 1. Import libraries

```python
import warnings
warnings.filterwarnings("ignore")  # hide library warnings in the notebook output

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

### 2. Load dataset

```python
data = pd.read_csv('../Dataset/ab_data.csv')
data.head()
```

|   | user_id | timestamp | group | landing_page | action |
|---|---------|-----------|-------|--------------|--------|
| 0 | 851104 | 11:48.6 | control | old_page | click |
| 1 | 804228 | 01:45.2 | control | old_page | click |
| 2 | 661590 | 55:06.2 | treatment | new_page | click |
| 3 | 853541 | 28:03.1 | treatment | new_page | click |
| 4 | 864975 | 52:26.2 | control | old_page | view |
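In the sample rows above, `group` and `landing_page` agree (control with `old_page`, treatment with `new_page`); any row where they disagree would contaminate both CTRs. A minimal stdlib sketch of that consistency check, assuming this pairing holds for the whole dataset; the rows and the names `EXPECTED_PAGE` and `count_mismatches` are hypothetical.

```python
# Assumed pairing of experiment arm and landing page, taken from the sample rows.
EXPECTED_PAGE = {"control": "old_page", "treatment": "new_page"}


def count_mismatches(rows):
    """Count rows whose landing_page differs from the page expected for their group."""
    return sum(1 for row in rows if EXPECTED_PAGE.get(row["group"]) != row["landing_page"])


# Hypothetical rows; the last one is misassigned.
rows = [
    {"user_id": 1, "group": "control", "landing_page": "old_page", "action": "click"},
    {"user_id": 2, "group": "treatment", "landing_page": "new_page", "action": "click"},
    {"user_id": 3, "group": "control", "landing_page": "new_page", "action": "view"},
]
print(count_mismatches(rows))  # 1
```

On a pandas DataFrame such as `data`, the same check could be written as `(data['group'].map(EXPECTED_PAGE) != data['landing_page']).sum()`.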


### 3. Data cleaning
The `timestamp` column is not needed to compute the CTR, so it is dropped.

```python
df = data.drop(columns='timestamp')
df.head()
```

|   | user_id | group | landing_page | action |
|---|---------|-------|--------------|--------|
| 0 | 851104 | control | old_page | click |
| 1 | 804228 | control | old_page | click |
| 2 | 661590 | treatment | new_page | click |
| 3 | 853541 | treatment | new_page | click |
| 4 | 864975 | control | old_page | view |
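Because the CTR counts unique users, it can be worth checking whether any `user_id` appears in more than one row before computing it. A minimal sketch with hypothetical IDs; the helper name `duplicated_users` is illustrative. On the notebook's DataFrame, `df['user_id'].duplicated().sum()` would give the number of repeated rows.

```python
from collections import Counter


def duplicated_users(user_ids):
    """Return the user IDs that occur more than once, in sorted order."""
    counts = Counter(user_ids)
    return sorted(uid for uid, n in counts.items() if n > 1)


# Hypothetical IDs: 2 and 5 appear twice.
ids = [1, 2, 3, 2, 4, 5, 5]
print(duplicated_users(ids))  # [2, 5]
```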


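### 4. Calculate the control and treatment CTR
The control CTR measures the old design and the treatment CTR measures the new design. Note that the cells below divide the number of unique users with a `click` row by the number of unique users with a `view` row. Because the results exceed 1, most users who clicked have no separate `view` row, so these values are click-to-view ratios rather than CTRs in the 0 to 1 range defined above; dividing by all unique users in each group would give the CTR as defined.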

```python
# Control group (old design)
control_df = df[df['group'] == 'control']
# Unique users with a 'click' row divided by unique users with a 'view' row
control_ctr = (control_df[control_df['action'] == 'click'].user_id.nunique()
               / control_df[control_df['action'] == 'view'].user_id.nunique())
print('Total CTR control = ', control_ctr)
```

```text
Total CTR control =  7.270673294170809
```


```python
# Treatment group (new design)
treatment_df = df[df['group'] == 'treatment']
# Unique users with a 'click' row divided by unique users with a 'view' row
treatment_ctr = (treatment_df[treatment_df['action'] == 'click'].user_id.nunique()
                 / treatment_df[treatment_df['action'] == 'view'].user_id.nunique())
print('Total CTR treatment = ', treatment_ctr)
```

```text
Total CTR treatment =  7.37554310541962
```
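To illustrate why the ratio computed in this step can exceed 1 while a CTR cannot, here is a toy sketch on hypothetical events with one recorded action per user, contrasting the two denominators. Both function names are illustrative, and the CTR version assumes a click implies a view.

```python
def click_to_view_ratio(events):
    """Formula used in this step: unique clickers / unique users that have a 'view' row."""
    clickers = {u for u, a in events if a == "click"}
    users_with_view_row = {u for u, a in events if a == "view"}
    return len(clickers) / len(users_with_view_row)


def ctr(events):
    """CTR as defined at the top: unique clickers / all unique visitors (click implies view)."""
    clickers = {u for u, a in events if a == "click"}
    visitors = {u for u, _ in events}
    return len(clickers) / len(visitors)


# Hypothetical data, one action per user: users 0-7 click, users 8-9 only view.
events = [(uid, "click") for uid in range(8)] + [(uid, "view") for uid in range(8, 10)]
print(click_to_view_ratio(events))  # 4.0
print(ctr(events))                  # 0.8
```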


### 5. Calculate the difference between the control and treatment CTR
This shows whether the observed difference points in the direction of the alternative hypothesis.

```python
# Observed difference: control (old page) minus treatment (new page)
obs_diff = control_ctr - treatment_ctr
print('The difference of control ctr and treatment ctr = ', obs_diff)
```

```text
The difference of control ctr and treatment ctr =  -0.10486981124881112
```


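The difference is negative (CTR_old - CTR_new < 0), so the new page scored slightly higher in this sample. Step 6 tests whether this difference is statistically significant.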
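For reference, the observed difference on the CTR scale (0 to 1) can be sketched from the click and row counts printed in step 6. This treats each row as one visitor, which assumes there are no duplicate users.

```python
# Counts printed in step 6 (clicks and rows per landing page).
click_old, click_new = 129_500, 129_741
page_old, page_new = 147_239, 147_239

ctr_old = click_old / page_old
ctr_new = click_new / page_new
obs_diff = ctr_old - ctr_new  # negative: the new page has the higher proportion

print(f"CTR old = {ctr_old:.4f}, CTR new = {ctr_new:.4f}, difference = {obs_diff:.5f}")
```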

### 6. Compute the p-value
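We compute the p-value with a two-proportion z-test (`proportions_ztest` from statsmodels), which compares the click proportions of the two pages. The p-value is the probability, assuming H0 is true, of observing a difference at least as large as ours in the direction of H1.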
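The pooled two-proportion z-statistic that `proportions_ztest` computes for two samples (with its default `value=0`) can be written as follows, where $x$ is the number of clicks and $n$ the number of rows for each page, with the new page as the first sample:

$$
\hat p_{new} = \frac{x_{new}}{n_{new}}, \qquad \hat p_{old} = \frac{x_{old}}{n_{old}}, \qquad \hat p = \frac{x_{new} + x_{old}}{n_{new} + n_{old}}
$$

$$
z = \frac{\hat p_{new} - \hat p_{old}}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_{new}} + \frac{1}{n_{old}}\right)}}, \qquad p\text{-value} = P(Z \ge z) = 1 - \Phi(z) \quad \text{for } H_1: CTR_{new} > CTR_{old}
$$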

```python
import statsmodels.api as sm
```

```python
# Count clicks and total rows for the old and new landing pages
# (each row is one recorded action, so these are row counts, not unique users)
click_old = (df.query('landing_page == "old_page"')['action'] == 'click').sum()
click_new = (df.query('landing_page == "new_page"')['action'] == 'click').sum()
page_old = (df['landing_page'] == 'old_page').sum()
page_new = (df['landing_page'] == 'new_page').sum()

click_old, click_new, page_old, page_new
```

```text
(129500, 129741, 147239, 147239)
```
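As a cross-check on the statsmodels call, the same pooled z-test can be sketched with only the standard library, using the counts printed above. The helper name `two_proportion_ztest` is hypothetical; the tail probabilities use the identity P(Z >= z) = erfc(z / sqrt(2)) / 2.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test of p1 vs p2.

    Returns (z, p_larger, p_smaller): p_larger tests H1: p1 > p2,
    p_smaller tests H1: p1 < p2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_larger = 0.5 * math.erfc(z / math.sqrt(2))    # upper tail, P(Z >= z)
    p_smaller = 0.5 * math.erfc(-z / math.sqrt(2))  # lower tail, P(Z <= z)
    return z, p_larger, p_smaller


# New page first, old page second, matching the order passed to proportions_ztest.
z, p_larger, p_smaller = two_proportion_ztest(129_741, 147_239, 129_500, 147_239)
print(f"z = {z:.4f}, p (new > old) = {p_larger:.4f}, p (new < old) = {p_smaller:.4f}")
```

`p_larger` corresponds to statsmodels' `alternative='larger'` (H1: CTR_new > CTR_old, as stated at the top) and `p_smaller` to `alternative='smaller'`; for the same z-score the two one-sided p-values always sum to 1.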

```python
# Two-proportion z-test. The new page is passed first, so alternative='larger'
# tests H1: CTR_new > CTR_old, matching the alternative hypothesis above.
z_score, p_value = sm.stats.proportions_ztest([click_new, click_old], [page_new, page_old], alternative='larger')
print(f'The p-value is {p_value:.4f}')
print(f'The z-score is {z_score:.4f}')
```

```text
The p-value is 0.0856
The z-score is 1.3683
```


### 7. Conclusion

If the p-value < 0.05, we reject the null hypothesis.<br>
If the p-value >= 0.05, we do not reject the null hypothesis.<br>

The z-score is 1.37 and the one-sided p-value is about 0.086, which is above 0.05, so **we do not reject the null hypothesis**. The data do not provide sufficient evidence, at the 5% significance level, that visitors shown Layout B have a higher CTR than those shown Layout A. Our claim that the new design yields a higher CTR is therefore **not supported by the experiment**.

------------