# Final Project - A/B testing

## Introduction

**Project Description**

Our task is to evaluate the results of an A/B test. We have a dataset containing user actions, a technical specification, and several auxiliary datasets. Our objectives are:

1. Assess the correctness of the test implementation.
2. Analyze the test results.

To evaluate the correctness of the test implementation, we will check the following:

1. Intersection of the test audience with a competing test.
2. Alignment of the test with marketing events and other issues regarding the temporal boundaries of the test.

**Technical Specification**

1. Test name: `recommender_system_test`.
2. Groups: A - control group, B - new payment funnel.
3. Launch date: December 7, 2020.
4. Date to stop collecting new users: December 21, 2020.
5. End date: January 4, 2021.
6. Audience: 15% of new users from the EU region.
7. Test objective: testing changes related to the implementation of an improved recommender system.
8. Expected number of test participants: 6000.
9. Expected effect: within 14 days from registration, users should show an improvement of at least 10% for each metric: conversion to product page views (event: product_page), views of the product cart (event: product_cart), and purchases (event: purchase).

**Data Description**

`ab_project_marketing_events.csv` - calendar of marketing events for the year 2020.

File structure:
1. `name` - name of the marketing event.
2. `regions` - regions where the advertising campaign will take place.
3. `start_dt` - start date of the campaign.
4. `finish_dt` - end date of the campaign.

`final_ab_new_users.csv` - users who registered from December 7 to December 21, 2020.

File structure:
1. `user_id` - user identifier.
2. `first_date` - registration date.
3. `region` - user's region.
4. `device` - device used for registration.

`final_ab_events.csv` - actions of new users from December 7, 2020, to January 4, 2021.

File structure:
1. `user_id` - user identifier.
2. `event_dt` - date and time of the event.
3. `event_name` - type of event.
4. `details` - additional data about the event. For purchases (event: purchase), this field contains the cost of the purchase in dollars.

`final_ab_participants.csv` - table of test participants.

File structure:
1. `user_id` - user identifier.
2. `ab_test` - test name.
3. `group` - user group.

**Work Plan**

1. Data exploration: type conversion, identification of missing values and duplicates.
2. Assess the correctness of the test implementation. Check the following:
   1. Data compliance with the technical specification. Correctness of all technical specification points.
   2. Test timing: it should not coincide with marketing or other activities.
   3. Test audience: ensure there are no overlaps with a competing test and no users participating in both test groups simultaneously. Check the uniform distribution across test groups and the correctness of their formation.
3. Perform exploratory data analysis:
   1. Is the number of events per user evenly distributed in the samples?
   2. How is the number of events distributed over the days in the samples?
   3. How does the conversion in the funnel change at different stages in the samples?
   4. What data peculiarities should be considered before starting A/B testing?
4. Evaluate the results of A/B testing:
   1. What can be said about the results of A/B testing?
   2. Check the statistical difference in proportions using the z-test.
5. Make a general conclusion about the correctness of the test implementation.


In [2]:
import pandas as pd
import numpy as np
import math as mth
import plotly.express as px
from scipy import stats as st
from IPython.display import display

# Save raw datasets in case we need them
try:
    raw_new_users = pd.read_csv('final_ab_new_users.csv')
    raw_test_users = pd.read_csv('final_ab_participants.csv')
    raw_events = pd.read_csv('final_ab_events.csv')
    raw_marketing = pd.read_csv('ab_project_marketing_events.csv')
except FileNotFoundError:  # fall back to the hosted dataset directory
    raw_new_users = pd.read_csv('/datasets/final_ab_new_users.csv')
    raw_test_users = pd.read_csv('/datasets/final_ab_participants.csv')
    raw_events = pd.read_csv('/datasets/final_ab_events.csv')
    raw_marketing = pd.read_csv('/datasets/ab_project_marketing_events.csv')

# Constants, which we will need later
FIG_WIDTH = 8
FIG_HEIGHT = 5


## Data preprocessing

To begin, let's examine the data we have received. We are interested in the following:

1. Column names.
2. Values in the columns that are not of the appropriate data type.
3. Missing values.
4. Additional columns and their respective data types.

Let's go through these one by one.

In [3]:
for dataset in [raw_new_users, raw_test_users, raw_events, raw_marketing]:
    print('-' * 50)
    dataset.info()


--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61733 entries, 0 to 61732
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     61733 non-null  object
 1   first_date  61733 non-null  object
 2   region      61733 non-null  object
 3   device      61733 non-null  object
dtypes: object(4)
memory usage: 1.9+ MB
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18268 entries, 0 to 18267
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  18268 non-null  object
 1   group    18268 non-null  object
 2   ab_test  18268 non-null  object
dtypes: object(3)
memory usage: 428.3+ KB
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440317 entries, 0 to 440316
Data columns (total 4 columns):
 #   Column   

The column names are already properly defined. We have encountered a considerable number of missing values in the `raw_events` dataset, which may be attributed to the data collection process. Lastly, we need to establish the appropriate data types for the date columns.
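A minimal sketch of the date conversion, using a two-row frame in place of the real events table (the user IDs are made up; the two timestamps are the first and last event times in the dataset). `pd.to_datetime` parses the strings, and `dt.normalize()` is one way to derive a day-level column.

```python
import pandas as pd

# Illustrative rows only; the real data comes from final_ab_events.csv
events = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'event_dt': ['2020-12-07 00:00:33', '2020-12-30 23:36:33'],
})

# Parse the strings into proper timestamps
events['event_dt'] = pd.to_datetime(events['event_dt'])

# Keep a day-level column for daily aggregations (time part set to midnight)
events['event_date'] = events['event_dt'].dt.normalize()

print(events.dtypes)
```

The same pattern would apply to `first_date`, `start_dt`, and `finish_dt`.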

Let's examine the contents of these tables in more detail:

In [4]:
(raw_new_users
 .fillna('NA')
 .pivot_table(index='region', columns='device', values='user_id', aggfunc='nunique', margins=True)
 .reset_index()
 .rename_axis('index', axis=1)
 .assign(sort_key=lambda df: df.region == 'All')
 .sort_values(['sort_key', 'All'], ascending=[True, False])
 .drop('sort_key', axis=1)
)


index,region,Android,Mac,PC,iPhone,All
2,EU,20629,4575,11693,9373,46270
3,N.America,4077,883,2327,1868,9155
1,CIS,1413,310,776,656,3155
0,APAC,1401,316,803,633,3153
4,All,27520,6084,15599,12530,61733


In [5]:
(raw_test_users
 .fillna('NA')
 .pivot_table(index='group', columns='ab_test', values='user_id', aggfunc='nunique', margins=True)
 .reset_index()
 .rename_axis('index', axis=1)
 .rename(columns={'user_id':'ucount'})
)


index,group,interface_eu_test,recommender_system_test,All
0,A,5831,3824,9173
1,B,5736,2877,8269
2,All,11567,6701,16666


In [6]:
(raw_events
 .fillna('NA')
 .pivot_table(index=['event_name','details'], values='user_id', aggfunc='count', margins=True)
 .reset_index()
 .rename_axis('index', axis=1)
 .rename(columns={'user_id':'count'})
)


index,event_name,details,count
0,login,,189552
1,product_cart,,62462
2,product_page,,125563
3,purchase,4.99,46362
4,purchase,9.99,9530
5,purchase,99.99,5631
6,purchase,499.99,1217
7,All,,440317


In [7]:
fig = px.timeline(
    raw_marketing.sort_values('start_dt'),
    x_start='start_dt', x_end='finish_dt',
    y='name'
)
fig.update_yaxes(autorange='reversed')
fig.show()

It seems that everything is in order, except for the `raw_marketing` table. The `regions` column contains values separated by commas, which may cause issues.

In the `raw_events` table, the `details` column appears to represent the purchase cost. For events that do not correspond to purchases, this column will be empty (as expected).

Finally, the `raw_test_users` table shows that users from groups A and B took part in two experiments, and some may have participated in both simultaneously. We will need to consider this in our subsequent analysis.
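One way to check both kinds of overlap is to count distinct tests per user and distinct groups per user within a test. The sketch below uses a small hypothetical participants table with the same columns as `final_ab_participants.csv`; the user IDs are made up.

```python
import pandas as pd

# Hypothetical participants table; only the column names match the real file
participants = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u4', 'u4'],
    'group':   ['A',  'B',  'A',  'B',  'A',  'B'],
    'ab_test': ['recommender_system_test', 'interface_eu_test',
                'recommender_system_test', 'recommender_system_test',
                'recommender_system_test', 'recommender_system_test'],
})

# Users enrolled in more than one test
tests_per_user = participants.groupby('user_id')['ab_test'].nunique()
multi_test_users = set(tests_per_user[tests_per_user > 1].index)

# Users assigned to more than one group within the same test
groups_per_user = participants.groupby(['ab_test', 'user_id'])['group'].nunique()
multi_group_users = set(
    groups_per_user[groups_per_user > 1].index.get_level_values('user_id')
)

print(multi_test_users, multi_group_users)
```

In this toy data `u1` is enrolled in two tests and `u4` sits in both groups of the same test, so each check flags exactly one user.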

Now, let's verify the main parameters according to the technical specification. We will begin by examining the dates in the datasets.

In [8]:
print(
    'User registrations from ', raw_new_users.first_date.min(), ' to ', raw_new_users.first_date.max(), '\n',
    '-' * 50, '\n',
    'User events from ', raw_events.event_dt.min(), ' to ', raw_events.event_dt.max(), '\n',
    '-' * 50, '\n',
    'Marketing events start dates from ', raw_marketing.start_dt.min(), ' to ', raw_marketing.start_dt.max(), '\n',
    'Marketing events end dates from ', raw_marketing.finish_dt.min(), ' to ', raw_marketing.finish_dt.max(),
    sep=''
)


User registrations from 2020-12-07 to 2020-12-23
--------------------------------------------------
User events from 2020-12-07 00:00:33 to 2020-12-30 23:36:33
--------------------------------------------------
Marketing events start dates from 2020-01-25 to 2020-12-30
Marketing events end dates from 2020-02-07 to 2021-01-07


The earliest registration date matches the specification: `2020-12-07`. The latest one does not: the dataset runs to `2020-12-23` instead of the required `2020-12-21`. We will need to drop users who registered after December 21.

The event data falls short. The first event occurs on `2020-12-07`, but the last event is on `2020-12-30` rather than the specified `2021-01-04`. The specification measures the effect within 14 days of registration. Users who registered late in the enrollment window (for example, on `2020-12-21`) do not have a full 14 days of recorded events, while users who registered on `2020-12-07` have more than 14 days. We will drop events with a lifetime of more than 14 days.
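The arithmetic behind this can be sketched as follows, assuming the observation window is counted in whole days from the registration date. With the last event on December 30, only users who registered on or before December 16 have a complete 14-day window.

```python
import pandas as pd

last_event_date = pd.Timestamp('2020-12-30')  # last event date seen in the data
lifetime_days = 14                            # horizon from the specification

# Latest registration date that still has a full observation window
last_full_cohort = last_event_date - pd.Timedelta(days=lifetime_days)

# Enrollment window from the specification
registration_dates = pd.date_range('2020-12-07', '2020-12-21', freq='D')
full_window = registration_dates[registration_dates <= last_full_cohort]
partial_window = registration_dates[registration_dates > last_full_cohort]

print(last_full_cohort.date(), len(full_window), len(partial_window))
```

Cohorts in `partial_window` are observed for fewer than 14 days, which can bias conversion downward for late registrants.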

Lastly, we had multiple marketing events, but only two fall within the relevant dates: the New Year Lottery in the CIS region and Christmas in Europe and North America. Since we are only interested in users from Europe, only Christmas could have influenced them.
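A sketch of how this overlap check could be automated, given that `regions` holds comma-separated values. The calendar rows below are placeholders (names and dates are not taken from the real file); only the column names and the test dates come from the description above.

```python
import pandas as pd

# Hypothetical calendar rows; names and dates are placeholders
marketing = pd.DataFrame({
    'name': ['Promo X', 'Promo Y', 'Promo Z'],
    'regions': ['EU, N.America', 'CIS', 'EU, CIS, APAC'],
    'start_dt': ['2020-12-25', '2020-12-30', '2020-03-01'],
    'finish_dt': ['2021-01-03', '2021-01-07', '2020-03-10'],
})
marketing['start_dt'] = pd.to_datetime(marketing['start_dt'])
marketing['finish_dt'] = pd.to_datetime(marketing['finish_dt'])

test_start = pd.Timestamp('2020-12-07')
test_end = pd.Timestamp('2021-01-04')

# One row per (event, region): split the comma-separated string and explode it
by_region = (
    marketing
    .assign(region=marketing['regions'].str.split(','))
    .explode('region')
    .reset_index(drop=True)
    .assign(region=lambda df: df['region'].str.strip())
)

# Two intervals overlap when each one starts before the other ends
overlaps = by_region[
    (by_region['start_dt'] <= test_end)
    & (by_region['finish_dt'] >= test_start)
    & (by_region['region'] == 'EU')
]

print(overlaps[['name', 'region', 'start_dt', 'finish_dt']])
```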

To facilitate further analysis, it is advisable to create a consolidated dataset that we will utilize.

In [9]:
# Users who participated in exactly one experiment
df_temp = (
    pd.pivot_table(data=raw_test_users, index='user_id', values='ab_test', aggfunc='nunique')
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'ab_test': 'test_count'})
    .loc[lambda df: df.test_count == 1]
)

# Keep all participants here; users enrolled in both tests are excluded later,
# when df_summary is filtered by df_temp.user_id
df_test_users = raw_test_users.copy()

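# Keep the rest of the datasets, parsing the date columns with pd.to_datetime
# (this avoids relying on unit-specific astype casts such as 'datetime64[D]')
df_new_users = raw_new_users.assign(first_date=lambda df: pd.to_datetime(df.first_date))
df_events = raw_events.assign(event_dt=lambda df: pd.to_datetime(df.event_dt))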

# Create a summary dataset
df_summary = (
    raw_events
    .copy()
    .merge(df_new_users, on='user_id', how='left')
    .merge(df_test_users, on='user_id', how='left')
    .assign(
        event_dt=lambda df: pd.to_datetime(df.event_dt),
        # Day-level date used for daily aggregations
        event_date=lambda df: df.event_dt.dt.normalize(),
        # Days elapsed between registration and the event
        lifetime=lambda df:
            (df.event_dt - pd.to_datetime(df.first_date)) / np.timedelta64(1, 'D')
    )
    .loc[lambda df:
         (df.first_date <= '2020-12-21')
         & (df.ab_test == 'recommender_system_test')
         & (df.lifetime <= 14)
         & (df.region == 'EU')
         & (df.user_id.isin(df_temp.user_id))
    ]
    .drop(['first_date', 'ab_test'], axis=1)
)


Let's examine the number of users in the participants table.

In [10]:
df_temp = (
    df_test_users[df_test_users.ab_test == 'recommender_system_test']
    .pivot_table(index='group', values='user_id', aggfunc='nunique', margins=True)
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'user_id':'user_id_total_ucount'})
    .assign(user_id_pct=lambda df: round(100 * df.user_id_total_ucount / df.user_id_total_ucount.iloc[-1], 1))
)

display(df_temp)


index,group,user_id_total_ucount,user_id_pct
0,A,3824,57.1
1,B,2877,42.9
2,All,6701,100.0


The participants table lists 6,701 unique users for `recommender_system_test`, which is above the expected 6,000. However, about 1,600 of them also took part in the competing test. Once they are excluded, roughly 5,100 users remain, which is below the target.

Furthermore, the groups are not well balanced: 57.1% of users are in group A and 42.9% in group B, a gap of 947 users.
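Whether such an imbalance could arise by chance can be checked with a chi-square goodness-of-fit test. The sketch below assumes a 50/50 split was intended, which the specification does not state explicitly; the counts are those from the table above.

```python
from scipy import stats as st

observed = [3824, 2877]             # group A, group B from the table above
expected = [sum(observed) / 2] * 2  # assumption: a 50/50 split was intended

chi2, p_value = st.chisquare(f_obs=observed, f_exp=expected)
print(f'chi2 = {chi2:.1f}, p-value = {p_value:.3g}')
```

A very small p-value here would indicate a sample ratio mismatch, meaning the split itself deserves investigation before any conversion results are trusted.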

In [11]:
(df_summary
 .pivot_table(index='group', values='user_id', aggfunc='nunique', margins=True)
 .reset_index()
 .rename_axis('index', axis=1)
 .rename(columns={'user_id':'user_id_active_ucount'})
 .merge(df_temp[['group', 'user_id_total_ucount']], on='group', how='left')
 .assign(user_active_pct=lambda df: round(100 * df.user_id_active_ucount / df.user_id_total_ucount, 2))
)


index,group,user_id_active_ucount,user_id_total_ucount,user_active_pct
0,A,1939,3824,50.71
1,B,655,2877,22.77
2,All,2594,6701,38.71


The situation with active users is not satisfactory, particularly in group B, where only about 23% of the individuals performed any actions during the specified period, compared with about 51% in group A.

Let's examine the distribution of users across regions.

In [12]:
df_temp = (
    pd.merge(left=df_test_users, right=df_new_users, on='user_id', how='left')
    .loc[lambda df: df.ab_test == 'recommender_system_test']
    .pivot_table(index='region', values='user_id', aggfunc='nunique', margins=True)
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'user_id': 'user_id_ucount'})
    .assign(
        user_id_pct=lambda df: round(100 * df.user_id_ucount / df.user_id_ucount.iloc[-1], 1),
        sort_key=lambda df: df.region == 'All'
    )
    .sort_values(['sort_key', 'user_id_ucount'], ascending=[True, False])
    .drop('sort_key', axis=1)
)

display(df_temp)


index,region,user_id_ucount,user_id_pct
2,EU,6351,94.8
3,N.America,223,3.3
0,APAC,72,1.1
1,CIS,55,0.8
4,All,6701,100.0


The majority of people in the test came from the EU region, as we expected. Let's examine the proportion they represent in relation to the total number of users.

In [13]:
print(
    'Total number of new users from EU: ',
    df_new_users[(df_new_users.region == 'EU') & (df_new_users.first_date <= '2020-12-21')].user_id.nunique(), '\n',
    'Number of new users from EU in the test: ', 
    df_temp[df_temp.region == 'EU'].user_id_ucount.sum(), '\n',
    'Percentage of new users from EU in the test: ',
    round(100 * df_temp[df_temp.region == 'EU'].user_id_ucount.sum()
    / df_new_users[(df_new_users.region == 'EU') & (df_new_users.first_date <= '2020-12-21')].user_id.nunique(), 2),
    sep=''
)


Total number of new users from EU: 42340
Number of new users from EU in the test: 6351
Percentage of new users from EU in the test: 15.0


According to the output above, participants of this test make up 15.0% of new EU users, which matches the specification. That figure, however, still includes users who also took part in the competing test. Once they are excluded, the share drops to roughly 10%, which is not what we would have preferred.

Determining the next steps for the test is challenging since half of our metrics do not meet the requirements of the technical specification.

After excluding users who took part in both tests, we have about 5.1K users instead of the expected 6K (slightly over half of them, 2,594, are active). Users are not evenly distributed between the groups. The event data does not cover the entire test period: events stop on December 30, so users who registered late do not have a full 14-day lifetime. Only about 10% of new EU users remain in the test, and roughly 5% of participants come from outside the EU.

Given the limited options, let's examine what occurred during the test period. However, we must exercise caution in interpreting the results of these experiments.

## Exploratory data analysis

Let's examine the main dependencies in the data and explore the following aspects: how users performed events and the progression through the funnel.

### Analysis of event counts

First, we will check the total number of events by groups, and then by user.

In [14]:
df_temp = (
    pd.pivot_table(
        data=df_summary,
        index=['event_date', 'group'],
        values='event_name',
        aggfunc='count'
    )
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'event_name': 'event_count'})
)

fig = px.line(
    df_temp,
    x='event_date',
    y='event_count',
    color='group',
    markers=True,
    title='Number of events per day depending on the test group',
    labels=dict(event_date='Date', event_count='Events, #'),
    width=FIG_WIDTH * 100,
    height=FIG_HEIGHT * 100,
    template='plotly_white'
)
fig.update_layout(
    legend=dict(orientation='h', yanchor='top', y=1.1, x=0), hovermode='x unified'
)
fig.show()


The number of events in group A significantly increases after December 12th. However, this could be attributed to the difference in the total number of users.

Let's examine this metric on a per-user basis.

In [15]:
df_temp = (
    pd.pivot_table(
        data=df_summary,
        index=['event_date', 'group'],
        values=['user_id', 'event_name'],
        aggfunc={'user_id': 'nunique', 'event_name': 'count'}
    )
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'event_name': 'event_count', 'user_id': 'user_id_ucount'})
    .assign(
        events_per_user=lambda df: df.event_count / df.user_id_ucount
    )
)

fig = px.line(
    df_temp,
    x='event_date',
    y='events_per_user',
    color='group',
    markers=True,
    title='Number of events per user per day depending on the test group',
    labels=dict(event_date='Date', events_per_user='Events per user, #'),
    width=FIG_WIDTH * 100,
    height=FIG_HEIGHT * 100,
    template='plotly_white'
)
fig.update_layout(
    legend=dict(orientation='h', yanchor='top', y=1.1, x=0), hovermode='x unified'
)
fig.show()


The difference still exists, but it is not as significant as in the previous graph. In group A, there were more events per user throughout the entire test period compared to group B.

Let's examine the overall user behavior.

In [16]:
df_temp = (
    pd.pivot_table(
        data=df_summary,
        index=['user_id', 'group'],
        values='event_name',
        aggfunc='count'
    )
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'event_name': 'event_count'})
)

fig = px.violin(
    df_temp,
    y='group',
    x='event_count',
    color='group',
    # spanmode='hard',
    title='Number of events per user depending on the test group',
    labels=dict(group='Test group', event_count='Events per user, #'),
    width=FIG_WIDTH * 100,
    height=FIG_HEIGHT * 100,
    template='plotly_white',
)
fig.update_traces(showlegend=False)
fig.show()

display(
    round(
        pd.pivot_table(data=df_temp, index='user_id', columns='group')
        .reset_index()
        .droplevel(level=0, axis=1)
        .describe()
        .T, 2
    )
)


group,count,mean,std,min,25%,50%,75%,max
A,1939.0,6.9,3.83,1.0,4.0,6.0,9.0,24.0
B,655.0,5.41,3.27,1.0,3.0,4.0,6.5,24.0


User behavior differs: on average, users in group A performed about 6.9 events, while users in group B performed about 5.4. The median is 6 events for group A and 4 events for group B. Whether these observations, along with the previous ones, are statistically significant will be examined in the subsequent sections.

### Conversion funnel analysis

Let's examine the sales funnels in general and for each group.

In [17]:
pvt_funnel = (
    pd.pivot_table(
        data=df_summary,
        index='event_name',
        columns='group',
        values='user_id',
        aggfunc='nunique',
        margins=True,
    )
    .reset_index()
    .rename_axis('index', axis=1)
    .rename(columns={'A': 'group_a', 'B': 'group_b', 'All': 'group_ab'})
    .assign(
        conversion_a_pct=lambda df: round(100 * df.group_a / df.group_a.iloc[0], 2),
        conversion_b_pct=lambda df: round(100 * df.group_b / df.group_b.iloc[0], 2),
        conversion_ab_pct=lambda df: round(100 * df.group_ab / df.group_ab.iloc[0], 2),
        sort_key=lambda df: df.event_name.map({'login': 1, 'product_page': 2, 'product_cart': 3, 'purchase': 4})
    )
    .loc[lambda df: df.event_name != 'All']
    .sort_values('sort_key')
    .drop('sort_key', axis=1)
)

display(pvt_funnel)


index,event_name,group_a,group_b,group_ab,conversion_a_pct,conversion_b_pct,conversion_ab_pct
0,login,1939,654,2593,100.0,100.0,100.0
2,product_page,1265,367,1632,65.24,56.12,62.94
1,product_cart,589,184,773,30.38,28.13,29.81
3,purchase,613,191,804,31.61,29.2,31.01


It appears that users can make a purchase without viewing the cart. Let's take a closer look at the funnels.
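Before plotting, the observation about purchases bypassing the cart can be checked directly at the user level. The sketch below uses a small hypothetical event log (made-up user IDs) with the same column names as `final_ab_events.csv`.

```python
import pandas as pd

# Hypothetical event log; only the column names match the real file
events = pd.DataFrame({
    'user_id':    ['u1', 'u1', 'u1', 'u2', 'u2', 'u3'],
    'event_name': ['login', 'product_cart', 'purchase', 'login', 'purchase', 'login'],
})

buyers = set(events.loc[events['event_name'] == 'purchase', 'user_id'])
cart_viewers = set(events.loc[events['event_name'] == 'product_cart', 'user_id'])

# Users who purchased without ever opening the cart
skipped_cart = buyers - cart_viewers
share_skipped = len(skipped_cart) / len(buyers)

print(skipped_cart, share_skipped)
```

If a noticeable share of buyers skips the cart, the funnel is not strictly sequential, and each step is better compared against `login` than against the preceding step.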

In [18]:
# pvt_funnel = pvt_funnel[pvt_funnel.event_name != 'product_cart']

fig = px.funnel(
    pvt_funnel.iloc[:, 0:4].melt(id_vars='event_name'),
    x='value',
    y='event_name',
    # facet_col='index',
    color='index',
    title='Funnel stages per group',
    labels=dict(event_name='Funnel stage', index='Legend'),
    width=FIG_WIDTH * 100,
    height=FIG_HEIGHT * 100,
    template='plotly_white'
)
fig.update_layout(
    legend=dict(orientation='h', yanchor='top', y=1.1, x=0)
)
fig.show()

The conversion rates at each step for the groups were different: group A turned out to be slightly more successful than group B.

As a final step, we will determine whether this difference is statistically significant. We have already noted the various data peculiarities that should be taken into account when drawing conclusions.

### A/B test results

The main task is to assess the differences in conversion rates at each step between the groups. We can utilize the Z-test for this purpose. To minimize the risk of errors, let's choose a significance level of `alpha = 0.01`.
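For reference, the pooled two-proportion Z statistic can be written as follows. The notation is an assumption introduced here: $x_i$ is the number of users in group $i$ who reached the target step, $n_i$ is the number of users at the starting step, and $\Phi$ is the standard normal CDF.

```latex
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}

z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
```

The p-value is two-sided because the hypotheses concern any difference between the groups, not an improvement in one direction only.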

In [19]:
display(pvt_funnel)


index,event_name,group_a,group_b,group_ab,conversion_a_pct,conversion_b_pct,conversion_ab_pct
0,login,1939,654,2593,100.0,100.0,100.0
2,product_page,1265,367,1632,65.24,56.12,62.94
1,product_cart,589,184,773,30.38,28.13,29.81
3,purchase,613,191,804,31.61,29.2,31.01


Let's write a function that will perform the Z-test for us.

In [20]:
def fun_get_p_value(start_count: list, end_count: list, alpha: float) -> float:
    """
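    Return the p-value of a two-sided Z-test for the difference between two proportions.

    Args:
        start_count (list): number of users at the starting step, one value per group.
        end_count (list): number of users who reached the target step, one value per group.
        alpha (float): significance level (not used in the calculation; kept for a uniform interface).

    Returns:
        float: p-value of the test.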
    """
    
    p1 = end_count[0] / start_count[0]
    p2 = end_count[1] / start_count[1]
    
    p_combined = (end_count[0] + end_count[1]) / (start_count[0] + start_count[1])
    difference = p1 - p2
    
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1 / start_count[0] + 1 / start_count[1]))
    
    distribution = st.norm(0, 1)
    p_value = (1 - distribution.cdf(abs(z_value))) * 2

    return p_value
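
As a quick sanity check, the same calculation can be reproduced in a compact standalone form and applied to the `login` to `product_cart` counts from the funnel table. The helper name `two_proportion_p_value` is hypothetical, and `norm.sf` is used as an equivalent of `1 - cdf`.

```python
import math
from scipy import stats as st

def two_proportion_p_value(successes, totals):
    """Two-sided p-value of the pooled two-proportion Z-test (illustrative helper)."""
    p1 = successes[0] / totals[0]
    p2 = successes[1] / totals[1]
    pooled = sum(successes) / sum(totals)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / totals[0] + 1 / totals[1]))
    z = (p1 - p2) / std_err
    # sf is the survival function, 1 - cdf; it is more accurate for large |z|
    return 2 * st.norm.sf(abs(z))

# login -> product_cart counts for groups A and B from the funnel table
p_value = two_proportion_p_value(successes=[589, 184], totals=[1939, 654])
print(round(p_value, 3))
```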


Now let's test the hypotheses:

> **Null Hypothesis (H0):** There are no statistically significant differences between the conversion rates of group A and group B.

> **Alternative Hypothesis (H1):** There are statistically significant differences between the conversion rates of group A and group B.

We will fail to reject the null hypothesis if `p_value > alpha`; otherwise we will reject it in favor of the alternative hypothesis. Let's write another function that will analyze our `pvt_funnel` table.

In [21]:
def fun_get_hypothesis_results(df_input: pd.DataFrame, group_1: object, group_2: object, alpha: float) -> pd.DataFrame:
    """
    This function creates a summary report after testing hypotheses related to conversion.

    Args:
        df_input (DataFrame): input dataframe
        group_1 (object): name of the column which contains total unique users per each step of the pipeline for the first group 
        group_2 (object): name of the column which contains total unique users per each step of the pipeline for the second group
        alpha (float): statistical significance.

    Returns:
        DataFrame: summary table of key results for each test: steps in scope, conversions, p_value, alpha and H0/H1 decision.
    """
    
    start_event = []
    end_event = []
    conversion_group_1 = []
    conversion_group_2 = []
    p_values = []
    hypothesis_to_accept = []
    
    for counter in range(len(df_input.event_name) - 1):       
        start_event.append(df_input.event_name.iloc[0])
        end_event.append(df_input.event_name.iloc[counter + 1])
        
        conversion_group_1.append(
            round(100 * df_input[group_1].iloc[counter + 1] / df_input[group_1].iloc[0], 1)
        )
        
        conversion_group_2.append(
            round(100 * df_input[group_2].iloc[counter + 1] / df_input[group_2].iloc[0], 1)
        )
        
        p_value = fun_get_p_value(
            [df_input[group_1].iloc[0], df_input[group_2].iloc[0]],
            [df_input[group_1].iloc[counter + 1], df_input[group_2].iloc[counter + 1]],
            alpha
        )
        
        p_values.append(round(p_value, 3))
        
        if p_value > alpha:
            hypothesis_to_accept.append('H0')
        else:
            hypothesis_to_accept.append('H1')

    df_output = pd.DataFrame({
        'start_event': start_event,
        'end_event': end_event,
        'conversion_' + group_1: conversion_group_1,
        'conversion_' + group_2: conversion_group_2,
        'p_value': p_values,
        'alpha': alpha,
        'hypothesis_to_accept': hypothesis_to_accept
    })
    
    return df_output


It is time to find out the answer: is there a difference between groups A and B?

In [22]:
display(fun_get_hypothesis_results(pvt_funnel, 'group_a', 'group_b', 0.01))


index,start_event,end_event,conversion_group_a,conversion_group_b,p_value,alpha,hypothesis_to_accept
0,login,product_page,65.2,56.1,0.0,0.01,H1
1,login,product_cart,30.4,28.1,0.278,0.01,H0
2,login,purchase,31.6,29.2,0.249,0.01,H0


Only the transition from `login` to `product_page` shows a statistically significant difference, and it favors the control group: conversion in group B is lower (56.1% vs. 65.2%). None of the steps shows the expected 10% improvement; group B converts worse than group A at every step.
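To compare the result with the 10% target, the relative change of group B against group A can be computed from the funnel counts. This sketch assumes the target refers to a relative lift in conversion from `login`; the specification does not say whether it is relative or absolute.

```python
# Unique users per funnel step from the funnel table: (group A, group B)
funnel = {
    'login':        (1939, 654),
    'product_page': (1265, 367),
    'product_cart': (589, 184),
    'purchase':     (613, 191),
}

base_a, base_b = funnel['login']
relative_lift = {}
for step, (users_a, users_b) in funnel.items():
    if step == 'login':
        continue
    conv_a = users_a / base_a
    conv_b = users_b / base_b
    # Relative change of B against A; +0.10 or more would meet the target
    relative_lift[step] = conv_b / conv_a - 1

for step, lift in relative_lift.items():
    print(f'{step}: {lift:+.1%}')
```

All three values come out negative, with the largest drop at `product_page`.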

## Conclusions

Let's summarise our findings:

The conducted A/B test evaluating the effectiveness of implementing an enhanced recommender system did not yield the desired results and demonstrated several shortcomings in the test execution. The test was marred by issues such as overlapping experiments, imbalanced user groups, insufficient event data coverage, and a lower-than-expected number of active users.

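The analysis of the test results revealed no improvement in conversion at any stage of the funnel. The only statistically significant difference, at the `product_page` step, is a decrease relative to the control group. This suggests that the changes related to the enhanced recommender system did not have the desired impact on user behavior and engagement, although the flaws in the test setup mean this result should be treated with caution.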

Given these findings, it is crucial to reassess the implementation of the recommender system and explore alternative strategies or improvements to achieve the desired outcomes. The test results underscore the importance of careful planning, accurate data collection, and robust experimental designs to obtain reliable and meaningful insights.

Moving forward, it is recommended to redesign the test, addressing the identified issues, and conduct a new experiment with proper controls and a larger sample size. Further analysis and evaluation of alternative approaches, such as refining recommendation algorithms and enhancing user experience, should be pursued to optimize user engagement and maximize the desired metrics.

This study serves as a valuable learning experience, highlighting the significance of rigorous experimentation practices and the need for data-driven decision-making. By incorporating these lessons, future tests can yield more accurate insights and inform effective strategies for improving user engagement and achieving desired business outcomes.