# AB TEST STUDY ON E-COMMERCE DATA

This project analyzes the results of an A/B test run by an e-commerce website. The company has developed a new web page intended to increase the number of users who "convert", that is, users who decide to pay for the company's product. The goal is to help the company decide whether to implement the new page, keep the old page, or run the experiment longer before deciding.

**INSTALLING REQUIRED LIBRARIES**

In [1]:
!pip install statsmodels



**IMPORTING LIBRARIES**


In [2]:
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 12)
pd.set_option("display.width", 50)
pd.set_option("display.float_format", lambda x: '%.4f'%x)

**Reading and Examining the Dataset**

In [3]:
df = pd.read_csv("/kaggle/input/ecommerce-ab-testing-2022-dataset1/ecommerce_ab_testing_2022_dataset1/ab_data.csv")
df.head()

   user_id timestamp      group landing_page  converted
0   851104   11:48.6    control     old_page          0
1   804228   01:45.2    control     old_page          0
2   661590   55:06.2  treatment     new_page          0
3   853541   28:03.1  treatment     new_page          0
4   864975   52:26.2    control     old_page          1


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294480 entries, 0 to 294479
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294480 non-null  int64 
 1   timestamp     294480 non-null  object
 2   group         294480 non-null  object
 3   landing_page  294480 non-null  object
 4   converted     294480 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


In [5]:
df.nunique()

user_id         290585
timestamp        35993
group                2
landing_page         2
converted            2
dtype: int64

In [6]:
df.groupby(["group","landing_page"])["converted"].mean()

group      landing_page
control    new_page       0.1214
           old_page       0.1204
treatment  new_page       0.1188
           old_page       0.1272
Name: converted, dtype: float64

In [7]:
df.groupby("group")["landing_page"].value_counts()

group      landing_page
control    old_page        145274
           new_page          1928
treatment  new_page        145313
           old_page          1965
Name: count, dtype: int64

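Two group/page combinations look inconsistent: 1,928 control rows show new_page and 1,965 treatment rows show old_page.

The control group should only see the old page and the treatment group only the new page, so these rows are probably recording errors. They make up a small share of the data (3,893 of 294,480 rows), so I delete them.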

In [8]:
df = df.loc[~((df["group"] == 'control') & (df["landing_page"] == 'new_page')), :]
df = df.loc[~((df["group"] == 'treatment') & (df["landing_page"] == 'old_page')), :]
df.groupby("group")["landing_page"].value_counts()

group      landing_page
control    old_page        145274
treatment  new_page        145313
Name: count, dtype: int64
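The same filter can also be written with a mapping from each group to the page it is expected to see. The snippet below is a minimal sketch on a small toy DataFrame with hypothetical values; the name `expected_page` is introduced here for illustration and does not appear in the original notebook.

```python
import pandas as pd

# Toy data (hypothetical values) with the same columns as ab_data.csv
toy = pd.DataFrame({
    "group":        ["control", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page"],
    "converted":    [0, 1, 0, 1],
})

# Page each group is expected to see (assumption based on the experiment design)
expected_page = {"control": "old_page", "treatment": "new_page"}

# Keep only rows whose landing_page matches the expectation for their group
consistent = toy[toy["landing_page"] == toy["group"].map(expected_page)]

# Cross-tabulate to confirm that only the two valid combinations remain
print(pd.crosstab(consistent["group"], consistent["landing_page"]))
```

On the real data, replacing `toy` with `df` should leave the same rows as the two `.loc` filters in the cell above.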

**I drop duplicate user_id values and keep only the last record for each user, because I am interested in each user's final conversion outcome.**



In [9]:
df = df.drop_duplicates(subset = 'user_id', keep='last')

In [10]:
df.shape

(290585, 5)
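Before dropping duplicates it can help to look at the affected rows. The following is a minimal sketch on toy data with hypothetical values. Note that `keep='last'` keeps the last row in file order, so it only reflects the "final" visit under the assumption that the rows are stored chronologically.

```python
import pandas as pd

# Toy data (hypothetical values) with the same column names as the dataset
toy = pd.DataFrame({
    "user_id":   [1, 2, 2, 3],
    "group":     ["control", "treatment", "treatment", "control"],
    "converted": [0, 0, 1, 1],
})

# Show every row that belongs to a user_id appearing more than once
dupes = toy[toy.duplicated(subset="user_id", keep=False)]
print(dupes)

# Keep the last row per user, as in the cell above
deduped = toy.drop_duplicates(subset="user_id", keep="last")
print(deduped)
```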

## Preparation of Test Data

In [11]:
control_succ = df.loc[df["group"] == 'control', "converted"].sum()
treatment_succ = df.loc[df["group"] == 'treatment', "converted"].sum()
print("Control Success: ", control_succ, "\nTreatment Success: ", treatment_succ)
control_nobs = df.loc[df["group"] == 'control', "group"].shape[0]
treatment_nobs = df.loc[df["group"] == 'treatment', "group"].shape[0]
print("Number of Control Group: ", control_nobs, "\nNumber of Treatment Group: ", treatment_nobs)


Control Success:  17489 
Treatment Success:  17264
Number of Control Group:  145274 
Number of Treatment Group:  145311


In [12]:
control_rate = control_succ / control_nobs
treatment_rate = treatment_succ / treatment_nobs
print("Conversion rate of the control group:", control_rate)
print("Conversion rate of the treatment group:", treatment_rate)

Conversion rate of the control group: 0.1203863045004612
Conversion rate of the treatment group: 0.11880724790277405


## AB Testing (Two Sample Proportion Test)

H0: p1 = p2

There is no difference between the conversion rate of the old design (p1, control group) and the conversion rate of the new design (p2, treatment group).

H1: p1 != p2

There is a difference between the two conversion rates.

In [13]:
# Two-sided z-test for the equality of two proportions (control vs. treatment)
z_stat, pvalue = proportions_ztest(count=[control_succ, treatment_succ],
                                   nobs=[control_nobs, treatment_nobs])
print('Test Stats= %.4f , Pvalue= %.4f' % (z_stat, pvalue))

Test Stats= 1.3116 , Pvalue= 0.1897


Pvalue= 0.1897 > 0.05
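To see where the test statistic comes from, the pooled two-proportion z-test can be computed by hand with the standard library. This is a sketch that hard-codes the counts printed earlier in the notebook. It assumes the pooled-variance form of the test, which is what `proportions_ztest` uses by default, and it should agree with the statistic and p-value printed by the test cell up to rounding.

```python
from math import sqrt
from statistics import NormalDist

# Counts taken from the printed output of the data-preparation cell
control_succ, control_nobs = 17489, 145274
treatment_succ, treatment_nobs = 17264, 145311

p1 = control_succ / control_nobs      # conversion rate, control
p2 = treatment_succ / treatment_nobs  # conversion rate, treatment

# Pooled conversion rate under H0 (both groups share one rate)
p_pool = (control_succ + treatment_succ) / (control_nobs + treatment_nobs)

# Standard error of the difference in rates under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / control_nobs + 1 / treatment_nobs))

z = (p1 - p2) / se

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print("z = %.4f, p-value = %.4f" % (z, p_value))
```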

## RESULT
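Therefore, the H0 hypothesis cannot be rejected at the 5% significance level.

No statistically significant difference was found between the conversion rates of new_page and old_page. The observed rate for the new page (11.88%) is in fact slightly lower than for the old page (12.04%), so the data give no reason to replace the old page.

With roughly 145,000 users in each group, the sample is already large, so extending the experiment is unlikely to change this conclusion.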
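One way to check the point about sample size is a 95% confidence interval for the difference in conversion rates. The sketch below is not part of the original analysis. It uses a simple Wald interval with an unpooled standard error and hard-codes the counts printed earlier in the notebook.

```python
from math import sqrt
from statistics import NormalDist

# Counts taken from the printed output of the data-preparation cell
control_succ, control_nobs = 17489, 145274
treatment_succ, treatment_nobs = 17264, 145311

p1 = control_succ / control_nobs
p2 = treatment_succ / treatment_nobs
diff = p1 - p2  # control rate minus treatment rate

# Unpooled standard error of the difference (Wald interval)
se = sqrt(p1 * (1 - p1) / control_nobs + p2 * (1 - p2) / treatment_nobs)

# Critical value for a two-sided 95% interval
z_crit = NormalDist().inv_cdf(0.975)

ci_low, ci_high = diff - z_crit * se, diff + z_crit * se
print("difference = %.4f, 95%% CI = [%.4f, %.4f]" % (diff, ci_low, ci_high))
```

The interval contains zero and is well under one percentage point wide, which suggests that any true difference between the pages is small and that more data would mainly narrow an already tight interval.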

