# Problem

The current landing page for our eCommerce website is not converting visitors into customers as effectively as we would like. We believe that a new landing page design could improve our conversion rate.

# Objective 

To determine whether a new landing page design can significantly improve the conversion rate compared to the current landing page design.

# Metric

Conversion rate: The percentage of visitors who make a purchase on our website.

# Hypothesis

Null Hypothesis (H0): There is no difference in conversion rate between the new landing page and the current landing page.

Alternative Hypothesis (H1): The new landing page has a higher conversion rate than the current landing page.
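
Stated in terms of the true conversion rates, the two hypotheses can be written compactly. The symbols $p_{\text{old}}$ and $p_{\text{new}}$ are notation introduced here for illustration and are not used elsewhere in this notebook:

```latex
H_0:\; p_{\text{new}} = p_{\text{old}}
\qquad\text{versus}\qquad
H_1:\; p_{\text{new}} > p_{\text{old}}
```

Because H1 is directional, a one-sided test matches it most directly; a two-sided test is the more conservative choice.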

## Data Preparation

### Data Collection

In [5]:
#Import Packages
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.power import NormalIndPower, TTestIndPower
import random

In [6]:
#Import Data
file_path = 'ab_data.csv'
df = pd.read_csv(file_path)
df.head()


Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:48.556739,control,old_page,0
1,804228,2017-01-12 08:01:45.159739,control,old_page,0
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0
4,864975,2017-01-21 01:52:26.210827,control,old_page,1


The dataset contains the following columns:

- `user_id`: Unique ID of each user.
- `timestamp`: Time at which the user visited the webpage.
- `group`: Experiment group: `control` for users assigned to the old page, `treatment` for users assigned to the new page.
- `landing_page`: Which page the user actually visited, `old_page` or `new_page`.
- `converted`: Whether the user decided to pay for the product: 1 if yes, 0 if no.

### Data Processing

In [7]:
df.shape

(294478, 5)

In [8]:
df.dtypes

user_id          int64
timestamp       object
group           object
landing_page    object
converted        int64
dtype: object

In [11]:
df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

There are no null values in the dataset.

In [111]:
# Count how many rows each user_id has
dup_user = df['user_id'].value_counts(ascending=False)
# Number of users that appear in more than one row
multi_users = dup_user[dup_user > 1].count()
print(f'There are {multi_users} users that appear multiple times in the dataset')

There are 3894 users that appear multiple times in the dataset


In [112]:
#Remove Duplicates of User_ID
df = df.drop_duplicates(subset=['user_id'], keep='first')
df.shape

(290584, 5)

In [113]:
# Find rows where group and landing_page disagree, then drop them
inconsistencies = df[((df['group'] == 'control') & (df['landing_page'] != 'old_page')) |
                     ((df['group'] == 'treatment') & (df['landing_page'] != 'new_page'))]
df = df.drop(inconsistencies.index)
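
The order of the two cleaning steps can matter: if a user's first row is a mismatched one, dropping duplicates first and mismatches second removes that user entirely. The sketch below shows this on a few hypothetical toy rows, in plain Python without pandas. The rows and helper names are made up for illustration, and we have not checked whether this case occurs in `ab_data.csv`.

```python
# Hypothetical toy rows: (user_id, group, landing_page)
rows = [
    (1, "control", "new_page"),    # mismatched first visit
    (1, "control", "old_page"),    # consistent second visit
    (2, "treatment", "new_page"),
    (3, "control", "old_page"),
]

# The page each group is supposed to see
EXPECTED_PAGE = {"control": "old_page", "treatment": "new_page"}


def is_consistent(row):
    """True when the row's landing page matches its experiment group."""
    _, group, page = row
    return EXPECTED_PAGE[group] == page


def keep_first_per_user(rows):
    """Keep only the first row seen for each user_id (like keep='first')."""
    seen, kept = set(), []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


# Order used in the cells above: drop duplicates, then drop mismatches
dedupe_then_filter = [r for r in keep_first_per_user(rows) if is_consistent(r)]
# Alternative order: drop mismatches, then drop duplicates
filter_then_dedupe = keep_first_per_user([r for r in rows if is_consistent(r)])

users_a = sorted(r[0] for r in dedupe_then_filter)
users_b = sorted(r[0] for r in filter_then_dedupe)
print(f"Dedupe then filter keeps users: {users_a}")
print(f"Filter then dedupe keeps users: {users_b}")
```

On these toy rows the second order retains user 1, while the first order loses that user.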

# Experiment

### Determine the Sample Size

In [15]:
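# Baseline: overall conversion rate in the cleaned dataset
baseline_conversion_rate = df['converted'].mean()
# Smallest absolute lift we want to detect (1 percentage point)
minimum_detectable_effect = 0.01
# Accepted false-positive rate (alpha) and desired statistical power
significance_level = 0.05
power = 0.8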

In [14]:
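# Standardize the absolute MDE by the baseline standard deviation sqrt(p * (1 - p))
effect_size = minimum_detectable_effect / np.sqrt(baseline_conversion_rate * (1 - baseline_conversion_rate))

# Solve for the number of observations needed in each group
power_analysis = NormalIndPower()
sample_size = power_analysis.solve_power(effect_size=effect_size,
                                         power=power,
                                         alpha=significance_level,
                                         ratio=1)
sample_size = int(sample_size)
print(f"Sample size: {sample_size}")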

Sample size: 16536
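
To make the power calculation less of a black box, here is a minimal sketch of the textbook normal-approximation formula for a two-sided, two-sample z-test with equal group sizes, using only the standard library. It is an approximation we add for illustration, not necessarily what `solve_power` does internally. The function name `sample_size_per_group` is hypothetical, and `0.12` is a rounded, assumed stand-in for `baseline_conversion_rate`.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline_rate, mde, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, two-sample z-test."""
    # Standardized effect size: absolute lift divided by the baseline std dev
    effect_size = mde / sqrt(baseline_rate * (1 - baseline_rate))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# 0.12 is an assumed, rounded baseline; the notebook uses df['converted'].mean()
n = sample_size_per_group(baseline_rate=0.12, mde=0.01)
print(f"Approximate sample size per group: {n}")
```

With a baseline near 12% this lands in the same range as the value reported above; the exact figure depends on the unrounded baseline rate.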


### Z-Test

In [34]:
# Draw a random sample of `sample_size` users from each group
control_group = df[df['group'] == 'control'].sample(n=sample_size, random_state=42)
treatment_group = df[df['group'] == 'treatment'].sample(n=sample_size, random_state=42)

In [36]:
#Calculate mean and std of each group
mean_control = control_group['converted'].mean()
mean_treatment = treatment_group['converted'].mean()
std_control = control_group['converted'].std()  
std_treatment = treatment_group['converted'].std()
n_control = len(control_group)
n_treatment = len(treatment_group)

#Calculate z-score
se = np.sqrt((std_control**2 / n_control) + (std_treatment**2 / n_treatment))
z_score = (mean_control - mean_treatment) / se

#Calculate p-value
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))  # two-sided test

print(f"Conversion Rate of Old Landing Page: {mean_control}")
print(f"Conversion Rate of New Landing Page: {mean_treatment}")
print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")

Conversion Rate of Old Landing Page: 0.12747943880019352
Conversion Rate of New Landing Page: 0.1217343976777939
Z-score: 1.5816997193630347
P-value: 0.11371813682186382


Since our p-value (0.11) is well above our significance level α = 0.05, we cannot reject the null hypothesis. This means the new landing page did not perform significantly differently from the old landing page in this sample. Next, we compare the conversion rates of the old and new landing pages across the full cleaned dataset.
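
The alternative hypothesis is directional (the new page converts better), while the p-value above is two-sided. As a hedged sketch, the same comparison can be run as a pooled two-proportion z-test with a one-sided p-value, using only the standard library. The conversion counts are recovered from the rates printed above (rate times `n`, rounded); the pooled standard error is a choice made for this sketch and differs slightly from the unpooled one used in the cell above.

```python
from math import sqrt
from statistics import NormalDist

n_control = n_treatment = 16536
# Conversion counts recovered from the printed sample rates (rate * n, rounded)
conv_control = round(0.12747943880019352 * n_control)
conv_treatment = round(0.1217343976777939 * n_treatment)

p_control = conv_control / n_control
p_treatment = conv_treatment / n_treatment

# Pooled conversion rate under H0 (both pages share one true rate)
p_pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treatment))

# Treatment minus control, so a positive z would favor the new page
z = (p_treatment - p_control) / se_pooled

p_one_sided = 1 - NormalDist().cdf(z)             # H1: new page converts better
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # comparable to the cell above

print(f"z (treatment - control): {z:.4f}")
print(f"One-sided p-value: {p_one_sided:.4f}")
print(f"Two-sided p-value: {p_two_sided:.4f}")
```

Here z is negative because the sampled treatment rate is below the control rate, so the one-sided p-value is large and the one-sided test also gives no support for H1.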

In [45]:
conversion_rate_control = df[df['group'] == 'control']['converted'].mean()
conversion_rate_treatment = df[df['group'] == 'treatment']['converted'].mean()
print(f"Conversion Rate of Old Landing Page: {conversion_rate_control}")
print(f"Conversion Rate of New Landing Page: {conversion_rate_treatment}")

Conversion Rate of Old Landing Page: 0.12039917935897611
Conversion Rate of New Landing Page: 0.11891957956489856
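
Point estimates alone do not show how much uncertainty surrounds the gap between the two pages. Below is a minimal sketch of a 95% Wald confidence interval for the difference in conversion rates. The group sizes of the full cleaned dataset are not printed in this notebook, so the example call reuses the 16536-per-group sample from the z-test, with counts recovered from its printed rates. The function name `diff_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald interval for (rate_b - rate_a), using unpooled variances."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se


n = 16536
# Counts recovered from the z-test sample rates (rate * n, rounded)
conv_control = round(0.12747943880019352 * n)
conv_treatment = round(0.1217343976777939 * n)

low, high = diff_confidence_interval(conv_control, n, conv_treatment, n)
print(f"95% CI for (new - old) conversion rate: [{low:.4f}, {high:.4f}]")
```

An interval that contains zero is consistent with the two-sided test not rejecting the null hypothesis.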


# Conclusion

Based on our A/B test results, we did not observe a statistically significant difference in conversion rates between the control and treatment groups (p = 0.11 at α = 0.05). In the full cleaned dataset, the conversion rate of the new landing page (11.89%) is in fact slightly lower than that of the old landing page (12.04%). We therefore have no evidence that the new landing page improves the eCommerce conversion rate.