# A/B Testing - Analysis of the Effectiveness of Two Landing Page Variants

## Import dependencies

In [31]:
import pandas as pd

## Load and visualize data

### Load data

In [32]:
data = pd.read_csv('../../data/ab_data_tourist.csv')
data.head()

,user_id,date,group,purchase,price
0,851104,2021-01-21,A,0,0
1,804228,2021-01-12,A,0,0
2,661590,2021-01-11,B,0,0
3,853541,2021-01-08,B,0,0
4,864975,2021-01-21,A,1,150000


### Exploratory data analysis

In [33]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column    Non-Null Count   Dtype 
---  ------    --------------   ----- 
 0   user_id   294478 non-null  int64 
 1   date      294478 non-null  object
 2   group     294478 non-null  object
 3   purchase  294478 non-null  int64 
 4   price     294478 non-null  int64 
dtypes: int64(3), object(2)
memory usage: 11.2+ MB


All columns have the correct data type except for the 'date' column, which is stored as a string (dtype `object`). We need to convert it to a datetime type to perform operations on dates.

In [34]:
data['date'] = pd.to_datetime(data['date'])
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column    Non-Null Count   Dtype         
---  ------    --------------   -----         
 0   user_id   294478 non-null  int64         
 1   date      294478 non-null  datetime64[ns]
 2   group     294478 non-null  object        
 3   purchase  294478 non-null  int64         
 4   price     294478 non-null  int64         
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 11.2+ MB
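
As a minimal sketch of what the conversion enables, the snippet below uses a small made-up frame (the `toy` name and its values are illustrative and do not come from the dataset). It assumes the dates follow the `YYYY-MM-DD` layout seen in `data.head()`.

```python
import pandas as pd

# Toy frame with hypothetical values; dates start out as plain strings.
toy = pd.DataFrame({'date': ['2021-01-02', '2021-01-24', '2021-01-11']})

# Convert the strings to datetime64 values (explicit format is an assumption).
toy['date'] = pd.to_datetime(toy['date'], format='%Y-%m-%d')

# After conversion, date arithmetic and the .dt accessor are available.
span_days = (toy['date'].max() - toy['date'].min()).days
weekdays = toy['date'].dt.day_name().tolist()

print(span_days)
print(weekdays)
```

With string dates, neither the subtraction nor the `.dt` accessor would work, which is why the conversion comes before any interval checks.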


Check the duration of the test interval for both groups.

In [35]:
group_a_start = data[data['group'] == 'A']['date'].dt.date.min()
group_a_end = data[data['group'] == 'A']['date'].dt.date.max()
group_b_start = data[data['group'] == 'B']['date'].dt.date.min()
group_b_end = data[data['group'] == 'B']['date'].dt.date.max()

print(f'Group A test interval: {group_a_start} - {group_a_end}')
print(f'Group B test interval: {group_b_start} - {group_b_end}')

Group A test interval: 2021-01-02 - 2021-01-24
Group B test interval: 2021-01-02 - 2021-01-24


Test intervals are identical for both test groups. No action is required to equalize the test intervals.
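
The same comparison can be sketched more compactly with a single `groupby`. The example below runs on a small made-up frame (the `toy`, `intervals`, and `same_interval` names and all values are illustrative, not taken from the notebook) and is only one possible way to phrase the check.

```python
import pandas as pd

# Toy data with hypothetical values, mirroring the 'group' and 'date' columns.
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'date': pd.to_datetime(
        ['2021-01-02', '2021-01-24', '2021-01-02', '2021-01-24']
    ),
})

# One row per group with the first and last observed date.
intervals = toy.groupby('group')['date'].agg(['min', 'max'])

# The intervals match if all groups share one start date and one end date.
same_interval = (
    intervals['min'].nunique() == 1 and intervals['max'].nunique() == 1
)

print(intervals)
print('Identical intervals:', same_interval)
```

This version scales to more than two groups without adding a pair of variables per group.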

Check for missing values.

In [36]:
data.isnull().sum()

user_id     0
date        0
group       0
purchase    0
price       0
dtype: int64

There are no missing values in the data.

Check whether any users ended up in both groups during the test.

In [38]:
# Count the rows recorded for each user. Note that count() counts rows, so any user
# with more than one record is flagged, not only users who appear in both groups.
user_group_count = data.groupby('user_id')['group'].count().reset_index()
users_in_both_groups = user_group_count[user_group_count['group'] != 1]
print('Number of users present in both groups:', users_in_both_groups.shape[0])

# Remove every row that belongs to a flagged user.
print('Total test data:', data.shape[0])
users_in_both_groups = users_in_both_groups['user_id'].to_list()
data = data[~data['user_id'].isin(users_in_both_groups)]
print('Total test data after deletion:', data.shape[0])

Number of users present in both groups: 0
Total test data: 286690
Total test data after deletion: 286690
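
Note that `groupby(...).count()` counts rows per user, so it also flags users who appear several times within a single group. If the goal is strictly to find users assigned to more than one group, a possible alternative is to count distinct groups with `nunique()`. The sketch below illustrates this on a small made-up frame; the `toy`, `groups_per_user`, `crossover_ids`, and `cleaned` names and all values are hypothetical.

```python
import pandas as pd

# Toy data with hypothetical values: user 2 appears in both groups,
# user 3 appears twice but only in group B.
toy = pd.DataFrame({
    'user_id': [1, 2, 2, 3, 3],
    'group':   ['A', 'A', 'B', 'B', 'B'],
})

# Number of distinct groups each user was assigned to.
groups_per_user = toy.groupby('user_id')['group'].nunique()

# Users seen in more than one group.
crossover_ids = groups_per_user[groups_per_user > 1].index.to_list()

# Drop every row belonging to those users.
cleaned = toy[~toy['user_id'].isin(crossover_ids)]

print('Users in both groups:', crossover_ids)
print('Rows after deletion:', cleaned.shape[0])
```

Here user 3 is kept because both of their records fall in group B, whereas a row count would have removed them. Whether repeated records within one group should also be dropped is a separate decision.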
