## Analyze A/B Test Results
### by Ibrahim Olayiwola

## Table of Contents

- [Introduction](#intro)
- [Part I - Probability](#probability)
- [Part II - A/B Test](#ab_test)
- [Part III - Regression](#regression)
- [Part IV - Conclusions](#conclusions)


<a id='intro'></a>
### Introduction
A/B tests are commonly used to test whether a changed version of a webpage leads to more user conversions.
For this project, I will be working to understand the results of an A/B test run by an e-commerce website. The goal is to work through this notebook to help the company decide whether to implement the new page, keep the old page, or run the experiment longer before making a decision.

The A/B test has already been carried out; what remains is for me to analyze it. The results are saved in a CSV file named `ab_data.csv`.


<a id='probability'></a>
### Part I - Probability

To get started, let's import our libraries.

In [1]:
import pandas as pd
import numpy as np
import random 
import matplotlib.pyplot as plt

# Set the seed so that results are reproducible when another analyst reruns this analysis.
random.seed(42)
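A note on seeding: `random.seed` only controls Python's built-in `random` module. If later cells draw samples with NumPy (an assumption, since no sampling code appears in this part), NumPy's generator would need its own seed. A minimal sketch of the difference, with illustrative variable names:

```python
import random
import numpy as np

# random.seed only affects Python's built-in random module,
# so it does not control what np.random produces.
random.seed(42)
first = np.random.rand(3)
random.seed(42)
second = np.random.rand(3)

# Seeding NumPy directly makes its draws repeatable.
np.random.seed(42)
third = np.random.rand(3)
np.random.seed(42)
fourth = np.random.rand(3)

print(np.array_equal(first, second))   # NumPy draws are not tied to random.seed
print(np.array_equal(third, fourth))   # identical draws after reseeding NumPy
```

Whether this matters here depends on which library the later sampling steps rely on.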

Now, the data will be read in and stored in a dataframe.

In [2]:
df = pd.read_csv('ab_data.csv')

# Checking the first 5 rows of the dataframe.
df.head()

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |


From the rows above, we can see that:
- There are 5 columns in the dataset.
- The users were divided into two groups: the control group, which was shown the old landing page, and the treatment group, which was shown the new landing page.
- There is also a column that shows if a user was converted or not based on the page they were shown.
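The observation about groups and landing pages comes from only five rows. One way to check it across a whole dataset is a cross-tabulation of `group` against `landing_page`. The sketch below uses a small hypothetical sample (not rows from `ab_data.csv`) with the same column names, and includes one deliberately mismatched row to show what a problem would look like:

```python
import pandas as pd

# Hypothetical sample with the same column names as ab_data.csv.
sample = pd.DataFrame({
    "group": ["control", "control", "treatment", "treatment", "treatment"],
    "landing_page": ["old_page", "old_page", "new_page", "new_page", "old_page"],
})

# Count rows for every combination of group and landing page.
counts = pd.crosstab(sample["group"], sample["landing_page"])
print(counts)

# Rows where the group and the page do not line up as expected
# (treatment without new_page, or control without old_page).
is_treatment = sample["group"] == "treatment"
is_new_page = sample["landing_page"] == "new_page"
mismatched = sample[is_treatment != is_new_page]
print(len(mismatched))
```

On the real data, the same two steps would be applied to `df` instead of `sample`.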

Next, I will explore the dataset to learn more about it.

In [3]:
# Number of rows in the dataset
df.shape[0]

294478

There are 294,478 rows in the dataset.

Next, I will check the number of unique users and the number of unique values in the other columns.

In [4]:
# Unique details
df.nunique()

user_id         290584
timestamp       294478
group                2
landing_page         2
converted            2
dtype: int64

From the output above, we can say that:
- Out of 294,478 rows, there are 290,584 unique user IDs, meaning 3,894 rows are repeat entries for users who already appear in the dataset.
- The users were divided into two main groups, the control group and the treatment group.
- There are two landing pages, the old page and the new page.
- Users were either converted or not.

Now, let's check the proportion of users who converted.

In [5]:
# Proportion of users converted
df.converted.mean()

0.11965919355605512

Approximately 12% of users were converted.
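Taking the mean works here because `converted` holds only 0s and 1s, so the mean equals the number of 1s divided by the number of rows. A short sketch with hypothetical values (not taken from `ab_data.csv`) showing that both calculations agree:

```python
import pandas as pd

# Hypothetical 0/1 outcomes: 2 conversions out of 8 rows.
sample = pd.DataFrame({"converted": [0, 0, 1, 0, 1, 0, 0, 0]})

# Mean of a 0/1 column.
rate_from_mean = sample["converted"].mean()

# Explicit count of conversions divided by the number of rows.
rate_from_counts = (sample["converted"] == 1).sum() / len(sample)

print(rate_from_mean, rate_from_counts)
```

Note that this is a proportion of rows; where a user appears more than once, each of their rows is counted.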