**We've received an analytical task from an international online store. Your predecessor failed to complete it: they launched an A/B test and then quit (to start a watermelon farm in Brazil). They left only the technical specifications and the test results.**

# **Technical description:**

- Test name: recommender_system_test
- Groups: A (control), B (new payment funnel)
- Launch date: 2020-12-07
- The date when they stopped taking on new users: 2020-12-21
- End date: 2021-01-01
- Audience: 15% of the new users from the EU region
- Purpose of the test: testing changes related to the introduction of an improved recommendation system
- Expected result: within 14 days of signing up, users will show better conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase). At each stage of the funnel product_page → product_card → purchase, there will be at least a 10% increase.
- Expected number of participants: 6000
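
The "at least a 10% increase" target can be read as a relative lift of group B over group A. The sketch below shows that reading; the function name and the sample conversions are hypothetical and are not taken from the test data.

```python
def relative_lift(conv_control: float, conv_test: float) -> float:
    """Relative change of the test conversion versus the control conversion."""
    if conv_control <= 0:
        raise ValueError("control conversion must be positive")
    return (conv_test - conv_control) / conv_control


# Hypothetical conversions, for illustration only.
lift = relative_lift(0.30, 0.36)   # about 0.2, i.e. a 20% relative lift
meets_target = lift >= 0.10        # compare against the 10% target
```

If the specification instead means an increase of 10 percentage points, the comparison would be `conv_test - conv_control >= 0.10`.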

We will download the data, see whether it was done correctly and analyse the results. 

# **Instructions:** <a id='instructions'></a>
### [Initial exploration of the data](#exploring)
### [Exploratory data analysis](#eda)
[- Conversion at different funnel stages](#convfunnel)<br>
[- Is the number of events per user distributed equally in the samples?](#distr)<br>
[- Are there users who enter both samples?](#both)<br>
[- How is the number of events distributed by days?](#dates)<br>
[- What details in the data do we have to take into account before starting the A/B test?](#details)<br>
### [Evaluate the A/B test results](#evaluate)
[- What can we say about the A/A test results?](#aa)<br>
[- Using the z-criterion to check the statistical difference between the proportions](#z)<br>
### [Overall Conclusions](#conc)

In [1]:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
#import plotly.express as px
from scipy import stats as st
import scipy.stats as stats
import math as mth

In [2]:
#!pip install q --upgrade plotly
import plotly
import plotly.express as px 
#print('Plotly version', plotly.__version__)

# **Initial exploration of the data**<a id='exploring'></a>

**All <mark>events</mark> of the new users within the period from <mark>December 7, 2020 to January 1, 2021</mark>:**

In [3]:
try:
    events = pd.read_csv('final_ab_events_us.csv.csv')
except FileNotFoundError:
    events = pd.read_csv('/datasets/final_ab_events_us.csv.csv')

events.head()

Unnamed: 0,user_id,event_dt,event_name,details
0,E1BDDCE0DAFA2679,2020-12-07 20:22:03,purchase,99.99
1,7B6452F081F49504,2020-12-07 09:22:53,purchase,9.99
2,9CD9F34546DF254C,2020-12-07 12:59:29,purchase,4.99
3,96F27A054B191457,2020-12-07 04:02:40,purchase,4.99
4,1FD7660FDF94CA1F,2020-12-07 10:15:09,purchase,4.99


In [4]:
events.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 423761 entries, 0 to 423760
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   user_id     423761 non-null  object 
 1   event_dt    423761 non-null  object 
 2   event_name  423761 non-null  object 
 3   details     60314 non-null   float64
dtypes: float64(1), object(3)
memory usage: 12.9+ MB


In [5]:
events.tail()

Unnamed: 0,user_id,event_dt,event_name,details
423756,245E85F65C358E08,2020-12-30 19:35:55,login,
423757,9385A108F5A0A7A7,2020-12-30 10:54:15,login,
423758,DB650B7559AC6EAC,2020-12-30 10:59:09,login,
423759,F80C9BDDEA02E53C,2020-12-30 09:53:39,login,
423760,7AEC61159B672CC5,2020-12-30 11:36:13,login,


In [6]:
events.describe(include='all')

Unnamed: 0,user_id,event_dt,event_name,details
count,423761,423761,423761,60314.0
unique,58703,257138,4,
top,A3917F81482141F2,2020-12-14 18:54:55,login,
freq,36,10,182465,
mean,,,,23.881219
std,,,,72.228884
min,,,,4.99
25%,,,,4.99
50%,,,,4.99
75%,,,,9.99


In [7]:
events.isnull().sum()

user_id            0
event_dt           0
event_name         0
details       363447
dtype: int64

In [8]:
events.shape

(423761, 4)

In [9]:
for i in events.columns:
    print(i, len(events[events[i]==0]))

user_id 0
event_dt 0
event_name 0
details 0


In [10]:
events.duplicated().sum()

0

In [11]:
percent_missing = events.isnull().sum() * 100 / len(events)
percent_missing

user_id        0.000000
event_dt       0.000000
event_name     0.000000
details       85.766977
dtype: float64

**Thus:** the table on events contains 423761 rows and 4 columns. There are 363447 missing values in 'details' column, no 0s and 58703 unique users. 
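
The sample rows suggest that `details` holds the purchase amount and is empty for other events. If that holds, the missing values are structural rather than a data quality problem. A minimal sketch of how this could be checked, using made-up rows that only mimic the columns of the events table:

```python
import pandas as pd

# Toy rows that mimic the columns of the events table (values are made up).
toy_events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "event_name": ["login", "purchase", "product_page", "purchase"],
    "details": [None, 4.99, None, 9.99],
})

# Share of rows with a missing 'details' value, per event type.
missing_share = (
    toy_events["details"].isna()
    .groupby(toy_events["event_name"])
    .mean()
)
```

On the real table the same expression with `events` in place of `toy_events` would show whether any purchase row lacks an amount.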

**All <mark>users</mark> who <mark>signed up</mark> in the online store from <mark>December 7 to 21, 2020</mark>:**

In [12]:
try:
    new_users = pd.read_csv('final_ab_new_users_upd.csv')
except FileNotFoundError:
    new_users = pd.read_csv('/datasets/final_ab_new_users_upd.csv')

new_users.head()

Unnamed: 0,user_id,first_date,region,device
0,D72A72121175D8BE,2020-12-07,EU,PC
1,F1C668619DFE6E65,2020-12-07,N.America,Android
2,2E1BF1D4C37EA01F,2020-12-07,EU,PC
3,50734A22C0C63768,2020-12-07,EU,iPhone
4,E1BDDCE0DAFA2679,2020-12-07,N.America,iPhone


In [13]:
new_users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58703 entries, 0 to 58702
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     58703 non-null  object
 1   first_date  58703 non-null  object
 2   region      58703 non-null  object
 3   device      58703 non-null  object
dtypes: object(4)
memory usage: 1.8+ MB


In [14]:
new_users.tail()

Unnamed: 0,user_id,first_date,region,device
58698,1DB53B933257165D,2020-12-20,EU,Android
58699,538643EB4527ED03,2020-12-20,EU,Mac
58700,7ADEE837D5D8CBBD,2020-12-20,EU,PC
58701,1C7D23927835213F,2020-12-20,EU,iPhone
58702,8F04273BB2860229,2020-12-20,EU,Android


In [15]:
new_users.describe(include='all')

Unnamed: 0,user_id,first_date,region,device
count,58703,58703,58703,58703
unique,58703,17,4,4
top,404C3593F5E1F39E,2020-12-21,EU,Android
freq,1,6077,43396,26159


In [16]:
new_users.isnull().sum()

user_id       0
first_date    0
region        0
device        0
dtype: int64

In [17]:
new_users.shape

(58703, 4)

In [18]:
for i in new_users.columns:
    print(i, len(new_users[new_users[i]==0]))

user_id 0
first_date 0
region 0
device 0


In [19]:
new_users.duplicated().sum()

0

**Thus:** the table on new_users contains 58703 rows and 4 columns, no missing values and no 0s.

In [20]:
try:
    participants = pd.read_csv('final_ab_participants_upd.csv')
except FileNotFoundError:
    participants = pd.read_csv('/datasets/final_ab_participants_upd.csv')

participants.head()

Unnamed: 0,user_id,group,ab_test
0,D1ABA3E2887B6A73,A,recommender_system_test
1,A7A3664BD6242119,A,recommender_system_test
2,DABC14FDDFADD29E,A,recommender_system_test
3,04988C5DF189632E,A,recommender_system_test
4,4FF2998A348C484F,A,recommender_system_test


In [21]:
participants.ab_test.unique()

array(['recommender_system_test', 'interface_eu_test'], dtype=object)

In [22]:
participants.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14525 entries, 0 to 14524
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  14525 non-null  object
 1   group    14525 non-null  object
 2   ab_test  14525 non-null  object
dtypes: object(3)
memory usage: 340.6+ KB


In [23]:
participants.tail()

Unnamed: 0,user_id,group,ab_test
14520,1D302F8688B91781,B,interface_eu_test
14521,3DE51B726983B657,A,interface_eu_test
14522,F501F79D332BE86C,A,interface_eu_test
14523,63FBE257B05F2245,A,interface_eu_test
14524,79F9ABFB029CF724,B,interface_eu_test


In [24]:
participants.describe(include='all')

Unnamed: 0,user_id,group,ab_test
count,14525,14525,14525
unique,13638,2,2
top,A3EC85E750F2AFC6,A,interface_eu_test
freq,2,8214,10850


In [25]:
participants.isnull().sum()

user_id    0
group      0
ab_test    0
dtype: int64

In [26]:
participants.shape

(14525, 3)

In [27]:
for i in participants.columns:
    print(i, len(participants[participants[i]==0]))

user_id 0
group 0
ab_test 0


In [28]:
participants.duplicated().sum()

0

**Thus:** the table on participants contains 14525 rows and 3 columns, with no missing values and no 0s. There are only 13638 unique users in the data, so some users appear more than once. This suggests that some users accidentally got into two groups or two tests, and we will need to investigate that. **There are also two tests and we will need to understand which one to choose.**
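
One way to separate the two cases (a user in two tests versus a user in two groups) is to count distinct values per user. The sketch below uses made-up rows with the same columns as the participants table; the names `n_tests` and `n_groups` are illustrative.

```python
import pandas as pd

# Toy rows with the same columns as the participants table (values are made up).
toy_participants = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3"],
    "group":   ["A",  "B",  "A",  "B",  "B"],
    "ab_test": ["recommender_system_test", "interface_eu_test",
                "recommender_system_test",
                "recommender_system_test", "interface_eu_test"],
})

# Number of distinct tests and distinct groups each user appears in.
per_user = toy_participants.groupby("user_id").agg(
    n_tests=("ab_test", "nunique"),
    n_groups=("group", "nunique"),
)

in_two_tests = per_user.index[per_user["n_tests"] > 1].tolist()
in_two_groups = per_user.index[per_user["n_groups"] > 1].tolist()
```

Here `u3` is in two tests but always in group B, so only the test count flags that user.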

**The calendar of <mark>marketing events</mark> for 2020:**

In [29]:
try:
    marketing = pd.read_csv('ab_project_marketing_events.csv')
except FileNotFoundError:
    marketing = pd.read_csv('/datasets/ab_project_marketing_events.csv')

marketing.head()

Unnamed: 0,name,regions,start_dt,finish_dt
0,Christmas&New Year Promo,"EU, N.America",2020-12-25,2021-01-03
1,St. Valentine's Day Giveaway,"EU, CIS, APAC, N.America",2020-02-14,2020-02-16
2,St. Patric's Day Promo,"EU, N.America",2020-03-17,2020-03-19
3,Easter Promo,"EU, CIS, APAC, N.America",2020-04-12,2020-04-19
4,4th of July Promo,N.America,2020-07-04,2020-07-11


In [30]:
marketing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       14 non-null     object
 1   regions    14 non-null     object
 2   start_dt   14 non-null     object
 3   finish_dt  14 non-null     object
dtypes: object(4)
memory usage: 576.0+ bytes


In [31]:
marketing.tail()

Unnamed: 0,name,regions,start_dt,finish_dt
9,Victory Day CIS (May 9th) Event,CIS,2020-05-09,2020-05-11
10,CIS New Year Gift Lottery,CIS,2020-12-30,2021-01-07
11,Dragon Boat Festival Giveaway,APAC,2020-06-25,2020-07-01
12,Single's Day Gift Promo,APAC,2020-11-11,2020-11-12
13,Chinese Moon Festival,APAC,2020-10-01,2020-10-07


In [32]:
marketing.describe(include='all')

Unnamed: 0,name,regions,start_dt,finish_dt
count,14,14,14,14
unique,14,6,14,14
top,St. Valentine's Day Giveaway,APAC,2020-11-11,2020-07-11
freq,1,4,1,1


In [33]:
marketing.isnull().sum()

name         0
regions      0
start_dt     0
finish_dt    0
dtype: int64

In [34]:
marketing.shape

(14, 4)

In [35]:
for i in marketing.columns:
    print(i, len(marketing[marketing[i]==0]))

name 0
regions 0
start_dt 0
finish_dt 0


In [36]:
marketing.duplicated().sum()

0

**Thus:** the table on marketing contains 14 rows and 4 columns, no missing values or 0s.

[**Back to contents**](#instructions)

# **Exploratory data analysis**<a id='eda'></a>

**Conversion at different funnel stages:**<a id='convfunnel'></a>

In [37]:
occurence = events.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False) / events.user_id.nunique()
occurence = occurence.reset_index()
occurence['number'] = occurence['user_id'] * events.user_id.nunique()
occurence

Unnamed: 0,event_name,user_id,number
0,login,0.999881,58696.0
1,product_page,0.663152,38929.0
2,purchase,0.333339,19568.0
3,product_cart,0.328501,19284.0


In [38]:
actions_share = px.bar(occurence, x='event_name', y='user_id', color='user_id', text = 'number',
                    labels={'user_id':'% of users who clicked', 'event_name':'event__name'}, height=400)
actions_share.update_layout(title='Conversion:',
                 xaxis_title = 'Actions',
                 yaxis_title = 'Share of users',
                 )

actions_share.show()

AttributeError: module 'plotly.express' has no attribute 'bar'

**Conclusion:** This is what conversion looks like for all stages in the initial data. Roughly, each stage retains about 33 percentage points fewer users than the previous one (login 99.99%, product_page 66%, purchase 33%). The exception is the last two stages: purchase (33.3%) and product_cart (32.9%) are almost equal, and slightly more users purchased than viewed the cart, so some purchases must have happened without a product_cart event.
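
The same picture can be expressed as step-to-step conversion. The sketch below reuses the unique-user counts from the table above; the stage order login → product_page → product_cart → purchase is an assumption, since the data does not force users through every step.

```python
# Unique-user counts taken from the conversion table above.
users_per_event = {
    "login": 58696,
    "product_page": 38929,
    "product_cart": 19284,
    "purchase": 19568,
}
total_users = 58703  # unique users in the events table

# Share of all users who performed each event at least once.
share_of_all = {event: n / total_users for event, n in users_per_event.items()}

# Conversion from each stage to the next, in the assumed stage order.
stages = ["login", "product_page", "product_cart", "purchase"]
step_conversion = {
    f"{prev} -> {cur}": users_per_event[cur] / users_per_event[prev]
    for prev, cur in zip(stages, stages[1:])
}
```

The last ratio comes out above 1, which is another way of seeing that some users buy without a recorded cart view.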

[**Back to contents**](#instructions)

**Is the number of events per user distributed equally in the samples?**<a id='distr'></a>

In [None]:
number_of_events_per_user = events.groupby(['user_id'])['event_dt'].count().reset_index()
number_of_events_per_user = number_of_events_per_user.rename(columns = {'event_dt':'number_of_events'})
number_of_events_per_user_ = number_of_events_per_user.merge(events, on='user_id', how='right')
number_of_events_per_user_.duplicated().sum()

In [None]:
number_of_users_per_event = events.groupby(['event_name'])['user_id'].count().reset_index()
number_of_users_per_event.sort_values(by='user_id', ascending = False)

In [None]:
fig_per_user = px.histogram(number_of_events_per_user_, x='number_of_events', color='event_name', title = 'Distribution of the number of events per user:')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_per_user.update_layout(
                 xaxis_title = 'Number of events',
                 yaxis_title = 'Number of users')
fig_per_user.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="right",
    x=0.97
))
fig_per_user.show()

**Conclusion:** First of all, we checked the number of events in relation to the total number of users who performed them. Across the entire data, the most common numbers of events per user are 6 and 12. Beyond 25 events per user, the frequency drops significantly. 'login' and 'product_page' are the most common actions performed by users.

In [None]:
number_of_events_per_user__ = number_of_events_per_user.merge(participants, on='user_id', how='right')
number_of_events_per_user__.duplicated().sum()

In [None]:
fig_per_user_group = px.histogram(number_of_events_per_user__, x='number_of_events', color='group', title = 'Distribution of the number of events per user:')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_per_user_group.update_layout(
                 xaxis_title = 'Number of events',
                 yaxis_title = 'Number of users')
fig_per_user_group.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="right",
    x=0.97
))
fig_per_user_group.show()

**Conclusion:** Then we checked the same distribution but divided bars into 'A' and 'B' groups. Now we can notice that groups A and B are more or less equally distributed throughout the initial data but we will need to check that. 

In [None]:
fig_per_user_test = px.histogram(number_of_events_per_user__, x='number_of_events', color='ab_test', title = 'Distribution of the number of events per user:')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_per_user_test.update_layout(
                 xaxis_title = 'Number of events',
                 yaxis_title = 'Number of users')
fig_per_user_test.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="right",
    x=0.97
))

fig_per_user_test.show()

**Conclusion:** If we look at the distribution of the number of events per user with bars split by the two tests, we can see that users who participate in interface_eu_test performed significantly more actions. 

In [None]:
new_users['first_date_'] = pd.to_datetime(new_users['first_date'], format='%Y-%m-%d')
new_users['first_date__'] = new_users['first_date_'].dt.date
print('The minimum date in new_users table is', new_users['first_date'].min())
print('The maximum date in new_users table is', new_users['first_date'].max())

In [None]:
fig_distr = px.histogram(new_users, x='first_date__', color = 'region', title = 'Distribution of the number of new users by dates:')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_distr.update_layout(
                 xaxis_title = 'Date',
                 yaxis_title = 'Number of new users')
fig_distr.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="left",
    x=0.99
))
fig_distr.show()

**Conclusion:** Looking at the distribution of the number of new users by date, we can see it is dominated by the EU region, which corresponds to the description of the task. It also leaves some room for thought as to which test we should actually use. 

[**Back to contents**](#instructions)

**Are there users who enter both samples?**<a id='both'></a>

- get a list of those users who belong to both tests
- remove them from the table

Since we have two test names in the data, by removing users who belong to two tests, we will probably also remove those who belong to two groups.

In [None]:
participants_w_two_tests = participants.groupby('user_id').agg({'ab_test':'nunique'}).reset_index().query('ab_test > 1')
participants_w_two_tests.head()

This is the number of users who belong to two tests and we will need to remove them.

In [None]:
participants_w_two_tests.shape

In [None]:
participants_w_two_tests.duplicated().sum()

In [None]:
participants.shape

In [None]:
two_tests_list = []
two_tests_list = participants_w_two_tests['user_id'].tolist()

participants_new = participants[~participants['user_id'].isin(two_tests_list)]
participants_new.shape

Now we will check if there are still users who belong to two groups:

In [None]:
participants_w_two_groups = participants_new.groupby('user_id').agg({'group':'nunique'}).reset_index().query('group > 1')
participants_w_two_groups.head()

**Conclusion:** First we identified the users who belong to two tests simultaneously, found 1602 such users and removed them. Then we checked whether any remaining users belonged to two groups (A and B) by mistake and did not find any. We can now be reasonably sure the data is clean and need to check which test suits the task.

**Which test do we need?**

In [None]:
participants_new_by_test = participants_new.groupby(['ab_test'])['user_id'].count()
participants_new_by_test

**Conclusion:** Now we see that not only the distribution of new users hinted at the interface_eu_test, but also that only interface_eu_test is consistent with 'at least 6000 users' rule. Recommender_system_test does not have enough users.

**For the test we need at least 6000 participants, so let us check the funnel just for the 'interface_eu_test'**

In [None]:
events.head()

Slicing the table so that we only have data for the interface_eu_test:

In [None]:
participants_new = participants_new.query('ab_test == "interface_eu_test"')
participants_new.head()

**Merging tables:**

In [None]:
events_participants = events.merge(participants_new, on ='user_id', how='right')
events_participants.head()

In [None]:
events_participants.shape

In [None]:
events_participants.duplicated().sum()

**Conclusion:** now we have the table we need with 73216 rows in it.

**Building a funnel:**

In [None]:
funnel_shift_part_ev = events_participants.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False).reset_index()
funnel_shift_part_ev

In [None]:
funnel_shift_part_ev['perc_ch'] = funnel_shift_part_ev['user_id'].pct_change()
funnel_shift_part_ev.head()

In [None]:
funnel_by_groups = []
for i in events_participants.group.unique():
    group = events_participants[events_participants.group == i].groupby(['event_name', 'group'])['user_id'].nunique().reset_index().sort_values(by='user_id', ascending=False)
    display(group)
    funnel_by_groups.append(group)

In [None]:
funnel_by_groups = pd.concat(funnel_by_groups)
funnel_by_groups

In [None]:
#!pip install plotly==4.4.1 
fig_funnel = px.funnel(funnel_by_groups, x='user_id', y='event_name', color='group')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))

fig_funnel.show()

**Conclusion:** We built a funnel for groups A and B for interface_eu_test to see whether two groups are more or less equal at each stage. They turned out to be more or less equal. We can use them for the test of proportions. Also the funnel stages correspond to those described in the task statement.

**How is the number of events distributed by days?**<a id='dates'></a>

In [None]:
events['event_dt'] = pd.to_datetime(events['event_dt'], format='%Y-%m-%d %H:%M:%S')
events['event_day'] = events['event_dt'].dt.date

In [None]:
fig_events = px.histogram(events, x='event_day', color = 'event_name', title = 'Distribution of the number of events by dates')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_events.update_layout(
                 xaxis_title = 'Date',
                 yaxis_title = 'Number of events')
fig_events.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="right",
    x=0.97
))

fig_events.show()

**Conclusion:** The histogram of the number of events by date shows a gap on the 25th of December: it seems no events were recorded on that day. The 30th of December also appears to have very few actions. The peak of activity is on the 21st of December. The 25th is Christmas, which could partly explain the gap.
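
Days with no events are easy to miss on a histogram. A minimal sketch of a direct check, using made-up timestamps rather than the real events table:

```python
import pandas as pd

# Toy timestamps (made up); the real check would use events['event_dt'].
toy = pd.DataFrame({"event_dt": pd.to_datetime([
    "2020-12-23 10:00:00", "2020-12-23 12:30:00",
    "2020-12-24 09:15:00", "2020-12-26 18:45:00",
])})

# Count events per calendar day.
daily = toy.groupby(toy["event_dt"].dt.normalize()).size()

# Reindex over the full date range so days without events show up as 0.
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily = daily.reindex(full_range, fill_value=0)

empty_days = daily[daily == 0].index.strftime("%Y-%m-%d").tolist()
```

In this toy data the only empty day is 2020-12-25.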

**What details in the data do we have to take into account before starting the A/B test?**<a id='details'></a>

- Are users correctly divided into segments?
- Are groups of the same size?
- Is one segment really better, or is this just a statistical fluctuation?
- Are groups big enough?
- Did we calculate the sample size using the calculator? Is it too long or too short?
- Did we analyze outliers and anomalies?

Requirements for the proportion test:
- the sampling method is simple random sampling
- each sample point can result in just two possible outcomes (success and failure)
- the sample includes at least 10 successes and 10 failures
- the population size is at least 20 times as big as the sample size
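
The success and failure counts from the list above can be checked mechanically. The helper below is a hypothetical illustration with made-up counts, not part of the original analysis.

```python
def meets_success_failure_rule(successes: int, trials: int, minimum: int = 10) -> bool:
    """Return True if a sample has at least `minimum` successes and failures."""
    failures = trials - successes
    return successes >= minimum and failures >= minimum


# Hypothetical counts, for illustration only.
ok = meets_success_failure_rule(successes=120, trials=400)
too_few = meets_success_failure_rule(successes=4, trials=400)
```

For the real test, `successes` would be the number of unique users in a group who reached an event and `trials` the number of unique users in that group.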

In [None]:
events_participants.head()

In [None]:
number_of_events_per_user_int = events_participants.groupby(['user_id'])['event_dt'].count().reset_index()
number_of_events_per_user_int = number_of_events_per_user_int.rename(columns = {'event_dt':'number_of_events'})
number_of_events_per_user_int = number_of_events_per_user_int.merge(events_participants, on='user_id', how='right')
number_of_events_per_user_int.head()

In [None]:
number_of_events_per_user_int.duplicated().sum()

In [None]:
fig_per_user_test_int = px.histogram(number_of_events_per_user_int, x='number_of_events', color='group', title = 'Distribution of the number of events per user for interface eu test:')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_per_user_test_int.update_layout(
                 xaxis_title = 'Number of events',
                 yaxis_title = 'Number of users')
fig_per_user_test_int.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="right",
    x=0.97
))

fig_per_user_test_int.show()

**Conclusion:** Before  analysing results of the A/B test we did a double check for the distribution of the number of the events and it appears relatively normal.

In [None]:
number_of_events_per_user_int = number_of_events_per_user_int.merge(new_users, on = 'user_id', how = 'right')
number_of_events_per_user_int.head()

In [None]:
number_of_events_per_user_int.duplicated().sum()

In [None]:
number_of_events_per_user_int.region.unique()

In [None]:
regions = number_of_events_per_user_int.groupby(['region'])['user_id'].count().reset_index()
regions.sort_values(by='user_id', ascending = False)

I do not know if it is ok to still have users who participate from other countries but this is what we ended up with after slicing.
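
One possible source of the non-EU rows is the direction of the join: `how='right'` keeps every row of the right-hand table, including users who never joined the test. Whether that explains the rows here is an assumption worth checking. The toy tables below are made up to show the difference.

```python
import pandas as pd

# Toy tables (made up): two test participants, three registered users.
toy_participants = pd.DataFrame({"user_id": ["u1", "u2"], "group": ["A", "B"]})
toy_new_users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "region": ["EU", "EU", "N.America"],
})

# A right join keeps u3, who is not a participant (group becomes NaN).
right_join = toy_participants.merge(toy_new_users, on="user_id", how="right")

# A left join keeps only the participants and adds their region.
left_join = toy_participants.merge(toy_new_users, on="user_id", how="left")

regions_right = sorted(right_join["region"].unique())
regions_left = sorted(left_join["region"].unique())
```

Rows with a missing `group` after a right join are a quick sign that non-participants were pulled in.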

In [None]:
fig_per_user_test_int_device = px.histogram(number_of_events_per_user_int, x='number_of_events', color='device', title = 'Distribution of the number of events per user for interface eu test:')#\
#.for_each_trace(lambda t: t.update(name=t.name.split("=")[1]))
fig_per_user_test_int_device.update_layout(
                 xaxis_title = 'Number of events',
                 yaxis_title = 'Number of users')
fig_per_user_test_int_device.update_layout(legend=dict(
    yanchor="top",
    y=0.99,
    xanchor="right",
    x=0.97
))

fig_per_user_test_int_device.show()

**Conclusion:** The distribution of the number of events in relation to the number of users shown by device shows that on this occasion users are split into segments correctly.

**Overall conclusion:** Thus, we made up our minds about which test suits the description of the task and which does not, and checked the data for duplicates, the distribution of users across regions, the kind of distribution we are working with and the number of events per user split by device. Let us move to the test part.

[**Back to contents**](#instructions)

# **Evaluate the A/B test results**<a id='evaluate'></a>

**What can we say about the A/A test results?**<a id='aa'></a>

**Can we trust the results? Why not?**

In [None]:
# Parse the full timestamp (date and time), matching the format shown in events.head()
events_participants['event_dt_'] = pd.to_datetime(events_participants['event_dt'], format='%Y-%m-%d %H:%M:%S')
events_participants['event_dt_'].min()

- First of all, it will be hard to trust the results of the test because the data includes users who performed actions before the 16th of December, which is the recommended day to start the experiment given the 14-day rule in the task description.
- Secondly, the data has a gap on the 25th of December and very few events on the 30th of December. We probably should not test anything during the winter holiday period unless there is a specific reason to do so.
- Thirdly, the resulting table contains users who are based outside the EU, and we are not sure whether that is a mistake or whether we need them there.
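
The 14-day rule can be applied per user by comparing each event time with that user's sign-up date. A minimal sketch with made-up rows; the column names follow the events and new_users tables, and everything else is illustrative.

```python
import pandas as pd

# Toy data (made up) with the same key columns as events and new_users.
toy_events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "event_dt": pd.to_datetime([
        "2020-12-08 10:00:00", "2020-12-25 09:00:00", "2020-12-20 12:00:00",
    ]),
})
toy_users = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "first_date": pd.to_datetime(["2020-12-07", "2020-12-20"]),
})

# Attach the sign-up date to every event and measure the user's lifetime.
merged = toy_events.merge(toy_users, on="user_id", how="left")
lifetime = merged["event_dt"] - merged["first_date"]

# Keep only events that happened within 14 days of signing up.
within_14_days = merged[lifetime < pd.Timedelta(days=14)]
```

The second event of `u1` happens 18 days after sign-up and is dropped.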

**Using the z-criterion to check the statistical difference between the proportions**<a id='z'></a>

Expected result: within 14 days of signing up,<br> **users will show better conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase). At each stage of the funnel product_page → product_card → purchase, there will be at least a 10% increase.**

Test three stages separately: 
- login → product_page
- product_page → product_card
- product_card → purchase

In [None]:
events_participants.user_id.nunique()

In [None]:
occurence_int = events_participants.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False) / events_participants.user_id.nunique()
occurence_int = occurence_int.reset_index()
occurence_int['number'] = occurence_int['user_id'] * events_participants.user_id.nunique()
occurence_int['number'] = occurence_int['number'].round(2)
occurence_int

In [None]:
occurence

In [None]:
actions_share_int = px.bar(occurence_int, x='event_name', y='user_id', color='user_id', text = 'number',
                    labels={'user_id':'% of users who clicked', 'event_name':'event__name'}, height=400)
actions_share_int.update_layout(title='Conversion for interface test only:',
                 xaxis_title = 'Actions',
                 yaxis_title = 'Share of users',
                               )

actions_share_int.show()

In [None]:
pivot = events_participants.pivot_table(index='event_name', columns='group', values='user_id', aggfunc=lambda x: x.nunique()).reset_index()
pivot = pivot.query('event_name != "login"')
pivot

- **Significance level**: 0.05
- **Null hypothesis**: There is no statistically significant difference between the groups in conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase).
- **Alternative hypothesis**: Users will show better conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase). At each stage of the funnel product_page → product_card → purchase, there will be at least a 10% increase.

In [None]:
def check_hypothesis(group1, group2, event, alpha=.05):
    success1 = pivot[pivot['event_name'] == event][group1].iloc[0]
    success2 = pivot[pivot['event_name'] == event][group2].iloc[0]
    
    trials1 = events_participants[events_participants.group == group1]['user_id'].nunique()
    trials2 = events_participants[events_participants.group == group2]['user_id'].nunique()
    
    
    p1 = success1 / trials1

    p2 = success2 / trials2 

    p_combined = (success1 + success2) / (trials1 + trials2)

    difference = p1 - p2

    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))

    distr = st.norm(0, 1)

    p_value = (1 - distr.cdf(abs(z_value))) * 2

    print('p_value:', p_value)

    if (p_value < alpha):
        print("Rejecting the null hypothesis for", event, "and groups", group1,'and', group2)
    else:
        print("Failed to reject the null hypothesis for:", event, "and groups", group1,'and', group2)

In [None]:
for i in pivot.event_name.unique():
    check_hypothesis('A', 'B', i, alpha=.05)
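
For reference, the same two-sided z-test for two proportions can be written as a function of four counts, independent of the notebook's tables. This is a sketch using only the standard library; the function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(success1, trials1, success2, trials2):
    """Two-sided z-test for the difference between two proportions."""
    p1 = success1 / trials1
    p2 = success2 / trials2
    # Pooled proportion under the null hypothesis of equal proportions.
    p_pooled = (success1 + success2) / (trials1 + trials2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / trials1 + 1 / trials2))
    z = (p1 - p2) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts, for illustration only.
z_same, p_same = two_proportion_z_test(50, 100, 50, 100)
z_diff, p_diff = two_proportion_z_test(300, 1000, 360, 1000)
```

Identical proportions give a z-value of 0 and a p-value of 1, while the second pair (30% versus 36% on 1000 users each) gives a p-value below 0.05.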

**Conclusions**: For the test of proportions, we first plotted a bar chart of conversion for the sliced data only. We then counted the unique users at each event, split by group. Then we formulated the null and alternative hypotheses: <br>
- **Null hypothesis**: There is no statistically significant difference between the groups in conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase).
- **Alternative hypothesis**: Users will show better conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase). At each stage of the funnel product_page → product_card → purchase, there will be at least a 10% increase.<br><br>
We ended up with these **results**:<br>
p_value: 0.10326893846203533
**Failed to reject the null hypothesis for: product_cart and groups A and B** - which means that there **is not** a statistically significant difference between the groups in the share of users who reached the 'product_cart' stage.<br>
p_value: 0.12247464713382916
**Failed to reject the null hypothesis for: product_page and groups A and B** - which means that there **is not** a statistically significant difference between the groups in the share of users who reached the 'product_page' stage.<br>
p_value: 0.021999992681769553
**Rejecting the null hypothesis for purchase and groups A and B** - which means that there **is** a statistically significant difference between the groups in the share of users who reached the 'purchase' stage.

Thus, the change of the interface only affects the 'purchase' stage.

[**Back to contents**](#instructions)

### Overall Conclusions <a id='conc'></a> 

We plotted what conversion looks like for all stages in the initial data. Roughly, each stage retains about 33 percentage points fewer users than the previous one (login 99.99%, product_page 66%, purchase 33%). The exception is the last two stages: purchase (33.3%) and product_cart (32.9%) are almost equal, and slightly more users purchased than viewed the cart, so some purchases must have happened without a product_cart event.<br>

First of all, we checked the number of events in relation to the total number of users who performed them. Across the entire data, the most common numbers of events per user are 6 and 12. Beyond 25 events per user, the frequency drops significantly. 'login' and 'product_page' are the most common actions performed by users.<br>

Then we checked the same distribution but divided bars into 'A' and 'B' groups. Now we can notice that groups A and B are more or less equally distributed throughout the initial data but we will need to check that. <br>

If we look at the distribution of the number of events per user with bars split by the two tests, we can see that users who participate in interface_eu_test performed significantly more actions. <br>

Looking at the distribution of the number of new users by date, we can see it is dominated by the EU region, which corresponds to the description of the task. It also leaves some room for thought as to which test we should actually use. <br>

First we identified the users who belong to two tests simultaneously, found 1602 such users and removed them. Then we checked whether any remaining users belonged to two groups (A and B) by mistake and did not find any. We can now be reasonably sure the data is clean and need to check which test suits the task.<br>

Now we see that not only the distribution of new users hinted at the interface_eu_test, but also that only interface_eu_test is consistent with 'at least 6000 users' rule. Recommender_system_test does not have enough users.<br>

After slicing the table so that we only have data for the interface_eu_test: now we have the table we need with 73216 rows in it. We built a funnel for groups A and B for interface_eu_test to see whether two groups are more or less equal at each stage. They turned out to be more or less equal. We can use them for the test of proportions. Also the funnel stages correspond to those described in the task statement.<br>

After building a histogram to see the distribution of the number of events by dates, we notice a gap on the 25th of December - it seems like no event in regards to data happened on that day. Also the 30th of December appear to have a very small number of actions performed. The peak of activity can be observed on the 21st of December. And 25th is obviously Christmas so that could explain it somewhat.<br>

Before analysing results of the A/B test we did a double check for the distribution of the number of the events and it appears relatively normal.<br>

The distribution of the number of events in relation to the number of users shown by device shows that on this occasion users are split into segments correctly.<br>

Thus, we made up our minds about which test suits the description of the task and which does not, and checked the data for duplicates, the distribution of users across regions, the kind of distribution we are working with and the number of events per user split by device. Let us move to the test part.<br>

Can we trust the results? Why not?
- First of all, it will be hard to trust the results of the test because the data includes users who performed actions before the 16th of December, which is the recommended day to start the experiment given the 14-day rule in the task description.
- Secondly, the data has a gap on the 25th of December and very few events on the 30th of December. We probably should not test anything during the winter holiday period unless there is a specific reason to do so.
- Thirdly, the resulting table contains users who are based outside the EU, and we are not sure whether that is a mistake or whether we need them there.<br>

For the test of proportions, we first plotted a bar chart of conversion for the sliced data only. We then counted the unique users at each event, split by group. Then we formulated the null and alternative hypotheses: <br>
- **Null hypothesis**: There is no statistically significant difference between the groups in conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase).
- **Alternative hypothesis**: Users will show better conversion into product page views (the product_page event), product card views (product_card) and purchases (purchase). At each stage of the funnel product_page → product_card → purchase, there will be at least a 10% increase.<br><br>
We ended up with these **results**:<br>
p_value: 0.10326893846203533
**Failed to reject the null hypothesis for: product_cart and groups A and B** - which means that there **is not** a statistically significant difference between the groups in the share of users who reached the 'product_cart' stage.<br>
p_value: 0.12247464713382916
**Failed to reject the null hypothesis for: product_page and groups A and B** - which means that there **is not** a statistically significant difference between the groups in the share of users who reached the 'product_page' stage.<br>
p_value: 0.021999992681769553
**Rejecting the null hypothesis for purchase and groups A and B** - which means that there **is** a statistically significant difference between the groups in the share of users who reached the 'purchase' stage.

Thus, the change of the interface only affects the 'purchase' stage.<br>

Overall, I would recommend running the test again later, outside the holiday period, so that the data covers every date.

Thank you for checking!