# Event-based analytics - funnels and A/A/B testing

We will investigate user behavior in an app created by a startup that sells food products.  

1. We will first study the sales funnel and find out how users reach the purchase stage. How many users actually make it to this stage? How many get stuck at previous stages? Which stages in particular?  

2. Then we will look at the results of an A/A/B test. The users are split into three groups: two control groups get the old fonts and one test group gets the new ones. We will find out which set of fonts produces better results. Creating two A groups has certain advantages. We can make it a principle that we will only be confident in the accuracy of our testing when the two control groups are similar. If there are significant differences between the A groups, this can help us uncover factors that may be distorting the results. Comparing the control groups will also tell us how much time and data we'll need when running further tests.  

We'll be using the same dataset for general analytics and for A/A/B analysis.

**Description of the data**  

Each log entry is a user action or an event.  
- EventName: event name  
- DeviceIDHash: unique user identifier  
- EventTimestamp: event time  
- ExpId: experiment number (246 and 247 are the control groups, 248 is the test group)

## Step 4. Studying the event funnel

### Events in the logs and their frequency of occurrence

In [42]:
events_freq = clean_logs['event_name'].value_counts().sort_values(ascending=False).reset_index()
events_freq

Unnamed: 0,index,event_name
0,MainScreenAppear,117328
1,OffersScreenAppear,46333
2,CartScreenAppear,42303
3,PaymentScreenSuccessful,33918
4,Tutorial,1005


The most frequent event is MainScreenAppear.

### Users who performed each of the actions

In [43]:
funnel = clean_logs.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False).reset_index()
funnel

Unnamed: 0,event_name,user_id
0,MainScreenAppear,7419
1,OffersScreenAppear,4593
2,CartScreenAppear,3734
3,PaymentScreenSuccessful,3539
4,Tutorial,840


- 7,419 users performed MainScreenAppear action  
- 4,593 users performed OffersScreenAppear action
- 3,734 users performed CartScreenAppear action
- 3,539 users performed PaymentScreenSuccessful action
- 840 users performed Tutorial action

### Proportion of users who performed an action at least once

In [44]:
funnel['proportion_to_total_users'] = funnel['user_id']/clean_logs['user_id'].nunique()
funnel

Unnamed: 0,event_name,user_id,proportion_to_total_users
0,MainScreenAppear,7419,0.984736
1,OffersScreenAppear,4593,0.609636
2,CartScreenAppear,3734,0.49562
3,PaymentScreenSuccessful,3539,0.469737
4,Tutorial,840,0.111495


It's interesting to see that not all users performed the most frequent action: only 98% did. Some may have stayed at the tutorial stage and never passed it. Business-wise, the product converts users well: 47% of users completed a payment successfully.

### Order the actions took place

Let's see whether all groups are present in all events to make sure that none of the events was available for only one or two groups.

In [45]:
events_per_group = clean_logs.pivot_table(index='event_name', columns='group', values='user_id', aggfunc='nunique').reset_index()
events_per_group

group,event_name,246,247,248
0,CartScreenAppear,1266,1238,1230
1,MainScreenAppear,2450,2476,2493
2,OffersScreenAppear,1542,1520,1531
3,PaymentScreenSuccessful,1200,1158,1181
4,Tutorial,278,283,279


Based on the logic and on the numbers, the stages of the funnel are:   

**Stage 1:** MainScreenAppear  
**Stage 2:** OffersScreenAppear  
**Stage 3:** CartScreenAppear  
**Stage 4:** PaymentScreenSuccessful

However, since the number of users drops drastically for the Tutorial event, it is probably optional, and the majority of users may have chosen to skip it. It's unclear what place it takes in the funnel, so we treat it as not being part of a single sequence. Per the project goal, we are interested in how many users reach the purchase stage, and Tutorial has no impact on that, so we can take it out of the funnel.
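The unique-user counts above do not by themselves prove that users performed the events in this order. One way to check the order is to compare each user's first timestamp per event. The sketch below illustrates the idea on a made-up toy log; the `toy_logs` frame, its rows, and the `event_ts` column name are assumptions for illustration and are not taken from `clean_logs`.

```python
import pandas as pd

# Hypothetical toy log: column names resemble clean_logs, the rows are invented.
toy_logs = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "event_name": [
        "MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful",
        "MainScreenAppear", "OffersScreenAppear",
        "Tutorial", "MainScreenAppear", "CartScreenAppear",
    ],
    "event_ts": pd.to_datetime([
        "2019-08-01 10:00", "2019-08-01 10:01", "2019-08-01 10:02", "2019-08-01 10:03",
        "2019-08-01 11:00", "2019-08-01 11:05",
        "2019-08-02 09:00", "2019-08-02 09:01", "2019-08-02 09:02",
    ]),
})

stages = ["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"]

# First time each user triggered each event (NaT if the user never did).
first_seen = toy_logs.groupby(["user_id", "event_name"])["event_ts"].min().unstack()

# A user counts for a stage only if every earlier stage was reached before it.
reached = first_seen[stages[0]].notna()
sequential_counts = {stages[0]: int(reached.sum())}
for prev_stage, curr_stage in zip(stages, stages[1:]):
    # Comparisons with NaT evaluate to False, so missing events drop the user.
    reached = reached & (first_seen[curr_stage] > first_seen[prev_stage])
    sequential_counts[curr_stage] = int(reached.sum())

print(sequential_counts)
```

In the toy data, user 3 opens the cart without ever seeing the offers screen, so that user is counted only at stage 1. Applied to the real logs, a check like this would show how many users actually follow the assumed sequence.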

### Event funnel and share of users in the funnel

In [46]:
funnel['conversion_stages']=funnel['user_id'].pct_change()
funnel

Unnamed: 0,event_name,user_id,proportion_to_total_users,conversion_stages
0,MainScreenAppear,7419,0.984736,
1,OffersScreenAppear,4593,0.609636,-0.380914
2,CartScreenAppear,3734,0.49562,-0.187024
3,PaymentScreenSuccessful,3539,0.469737,-0.052223
4,Tutorial,840,0.111495,-0.762645


Stage 1: MainScreenAppear - 98% of users performed MainScreenAppear  
Stage 2: OffersScreenAppear - 61% of users performed OffersScreenAppear  
Stage 3: CartScreenAppear - 50% of users performed CartScreenAppear  
Stage 4: PaymentScreenSuccessful - 47% of users performed PaymentScreenSuccessful
   
Only 11% performed Tutorial.  
Among the main events, the largest drop is in the transition from action 1 (MainScreenAppear) to action 2 (OffersScreenAppear): 38% of users are lost.
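Because `pct_change()` runs over every row, the Tutorial row gets a "conversion" value that has no funnel meaning. A possible alternative, sketched here with the counts copied from the table above, restricts the calculation to the four main stages. The names `funnel_counts` and `main_funnel` are illustrative only.

```python
import pandas as pd

# Unique-user counts copied from the funnel table above.
funnel_counts = pd.Series(
    {
        "MainScreenAppear": 7419,
        "OffersScreenAppear": 4593,
        "CartScreenAppear": 3734,
        "PaymentScreenSuccessful": 3539,
        "Tutorial": 840,
    },
    name="user_id",
)

main_stages = ["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"]

# Keep only the main stages, in funnel order.
main_funnel = funnel_counts.loc[main_stages].to_frame()

# Share of users retained from the previous stage and from the first stage.
main_funnel["conversion_from_previous"] = main_funnel["user_id"] / main_funnel["user_id"].shift(1)
main_funnel["conversion_from_first"] = main_funnel["user_id"] / main_funnel["user_id"].iloc[0]

print(main_funnel.round(3))
```

The `conversion_from_previous` column is simply one plus the `pct_change()` value, so a drop of 38% corresponds to a retention of about 62%.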

### Share of users who made the entire journey from their first event to payment

In [47]:
funnel_by_groups=[]
for i in clean_logs.group.unique():
    group=clean_logs[clean_logs.group==i].groupby(['event_name','group'])['user_id'].nunique().reset_index().sort_values(by='user_id',ascending=False)
    display(group)
    funnel_by_groups.append(group)

Unnamed: 0,event_name,group,user_id
1,MainScreenAppear,246,2450
2,OffersScreenAppear,246,1542
0,CartScreenAppear,246,1266
3,PaymentScreenSuccessful,246,1200
4,Tutorial,246,278


Unnamed: 0,event_name,group,user_id
1,MainScreenAppear,247,2476
2,OffersScreenAppear,247,1520
0,CartScreenAppear,247,1238
3,PaymentScreenSuccessful,247,1158
4,Tutorial,247,283


Unnamed: 0,event_name,group,user_id
1,MainScreenAppear,248,2493
2,OffersScreenAppear,248,1531
0,CartScreenAppear,248,1230
3,PaymentScreenSuccessful,248,1181
4,Tutorial,248,279


In [48]:
funnel_by_groups=pd.concat(funnel_by_groups)
funnel_by_groups

Unnamed: 0,event_name,group,user_id
1,MainScreenAppear,246,2450
2,OffersScreenAppear,246,1542
0,CartScreenAppear,246,1266
3,PaymentScreenSuccessful,246,1200
4,Tutorial,246,278
1,MainScreenAppear,247,2476
2,OffersScreenAppear,247,1520
0,CartScreenAppear,247,1238
3,PaymentScreenSuccessful,247,1158
4,Tutorial,247,283


In [49]:
colors = ["yellow", "yellow", "yellow", "yellow", "yellow", "purple", "purple", "purple", "purple", "purple", "red", "red", "red", "red", "red"]
fig = go.Figure(go.Funnel(y=funnel_by_groups['event_name'], x=funnel_by_groups['user_id'], 
                          textposition="inside", textinfo="value+percent initial", marker = dict(color=colors)))

fig.update_layout(
    title="Event funnel: share of users who made the entire journey from their first event to payment",
    legend_title="Groups",
    font=dict(
        size=12,
        color="RebeccaPurple"
    )
)
fig.show();

**Group 246:**  
62.9% of users who performed stage 1 performed stage 2, 82.1% of users who performed stage 2 performed stage 3, and 94.8% of users who performed stage 3 performed stage 4. Almost 49% of the users who performed stage 1 went through all main stages.

**Group 247:**  
61.4% of users who performed stage 1 performed stage 2, 81.4% of users who performed stage 2 performed stage 3, and 93.5% of users who performed stage 3 performed stage 4. About 47% of the users who performed stage 1 went through all main stages.

**Group 248:**   
61.4% of users who performed stage 1 performed stage 2, 80.3% of users who performed stage 2 performed stage 3, and 96.0% of users who performed stage 3 performed stage 4. About 47% of the users who performed stage 1 went through all main stages.

On average, 48% of users made the entire journey from their first event to payment.
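The stage-to-stage shares can be recomputed directly from the per-group unique-user counts shown above. The following is a minimal sketch; the frame name `stage_users` is an assumption, and the numbers are copied from the per-group tables, with Tutorial left out.

```python
import pandas as pd

# Unique users per main stage and group, copied from the per-group tables above.
stage_users = pd.DataFrame(
    {
        246: [2450, 1542, 1266, 1200],
        247: [2476, 1520, 1238, 1158],
        248: [2493, 1531, 1230, 1181],
    },
    index=["MainScreenAppear", "OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful"],
)

# Share of users retained from the previous stage (the first row has no previous stage).
step_conversion = (stage_users / stage_users.shift(1)).dropna()

# Share of stage 1 users who also reached the payment stage.
overall_conversion = stage_users.loc["PaymentScreenSuccessful"] / stage_users.loc["MainScreenAppear"]

print((step_conversion * 100).round(1))
print((overall_conversion * 100).round(1))
```

These ratios compare counts of unique users per event, so they approximate the journey rather than trace individual users through it.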

## Step 5. Studying the results of the experiment

### Number of users in each group

In [50]:
# Checking number of users per group
users_by_group = clean_logs.groupby(['group'])['user_id'].nunique().reset_index()
users_by_group

Unnamed: 0,group,user_id
0,246,2484
1,247,2513
2,248,2537


In [51]:
# Checking how many groups each user belongs to
clean_logs.groupby(['user_id'])['group'].nunique().reset_index()

Unnamed: 0,user_id,group
0,6888746892508752,1
1,6909561520679493,1
2,6922444491712477,1
3,7435777799948366,1
4,7702139951469979,1
...,...,...
7529,9217594193087726423,1
7530,9219463515465815368,1
7531,9220879493065341500,1
7532,9221926045299980007,1


In [52]:
#Checking whether there're users who belong to more than 1 group
clean_logs.groupby(['user_id'])['group'].nunique().reset_index().query('group>1')

Unnamed: 0,user_id,group


No users belonging to more than one group were found. We can conclude the following:  
In group 246 - 2,484 users.  
In group 247 - 2,513 users.  
In group 248 - 2,537 users.  
The groups appear to have been split properly.
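To put a number on how even the split is, one option is to look at how far each group size deviates from the mean group size. This is only a rough sketch: the group sizes are copied from the `users_by_group` table, and reading a deviation of about 1% as "even enough" is an assumption, not a formal test.

```python
# Unique users per group, copied from the users_by_group table above.
group_sizes = {246: 2484, 247: 2513, 248: 2537}

mean_size = sum(group_sizes.values()) / len(group_sizes)

# Relative deviation of each group from the mean group size.
relative_deviation = {
    group: (size - mean_size) / mean_size for group, size in group_sizes.items()
}
max_abs_deviation = max(abs(dev) for dev in relative_deviation.values())

for group, dev in relative_deviation.items():
    print(f"group {group}: {dev:+.2%} vs. mean size")
print(f"largest absolute deviation: {max_abs_deviation:.2%}")
```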

### Most popular event

We have already calculated the number of users who performed each event. Let's use the same table to find the most popular event in each group.

In [53]:
events_per_group

group,event_name,246,247,248
0,CartScreenAppear,1266,1238,1230
1,MainScreenAppear,2450,2476,2493
2,OffersScreenAppear,1542,1520,1531
3,PaymentScreenSuccessful,1200,1158,1181
4,Tutorial,278,283,279


In [54]:
share_246 = 2450/2484
print(share_246, 'of users in control group 246 performed MainScreenAppear.')

0.9863123993558777 of users in control group 246 performed MainScreenAppear.


In [55]:
share_247 = 2476/2513
print(share_247, 'of users in control group 247 performed MainScreenAppear.')

0.9852765618782332 of users in control group 247 performed MainScreenAppear.


- The most popular event is MainScreenAppear across all groups.
- From control group 246 - 2450 users performed this action.
- From control group 247 - 2476 users performed this action.
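Hard-coded denominators are easy to mix up between groups. A small sketch that divides each group's MainScreenAppear count by that same group's size avoids this. The series names are illustrative, and the numbers are copied from the `events_per_group` and `users_by_group` tables.

```python
import pandas as pd

# Users who performed MainScreenAppear, per group (from events_per_group).
main_screen_users = pd.Series({246: 2450, 247: 2476, 248: 2493})

# Total unique users per group (from users_by_group).
group_sizes = pd.Series({246: 2484, 247: 2513, 248: 2537})

# The division aligns on the group label, so each count meets its own group size.
main_screen_share = main_screen_users / group_sizes

for group, share in main_screen_share.items():
    print(f"{share:.4f} of users in group {group} performed MainScreenAppear.")
```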

### Statistical significance of the difference between samples

To check the statistical significance of the difference between samples 246 and 247, we will test the difference between the proportions of users in each group who performed each event.

Ho - null hypothesis: The proportions of users who performed the event are the same in both groups.

H1 - alternative hypothesis: The proportions of users who performed the event are NOT the same in both groups.

The difference between the proportions we observe in our samples will be our statistic, that is, a variable whose value is computed from sample data alone. It can be shown that, for large samples, it is approximately normally distributed:

$$Z \approx \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$$

Z is the standardized value of the criterion, which follows a standard normal distribution with mean 0 and standard deviation 1. This is what the right part of the formula, after the '~' sign, states: the expression is distributed as N(0,1). n₁ and n₂ are the sizes of the two samples being compared (the number of observations they contain). P₁ and P₂ are the proportions observed in the samples, and P is the proportion in the two samples combined. π₁ and π₂ are the actual proportions in the populations we're comparing. Under the null hypothesis π₁ = π₂, so the numerator reduces to P₁ - P₂.
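To make the statistic concrete, here is a sketch that evaluates the pooled two-proportion z-statistic for one case: CartScreenAppear in groups 246 and 247, using the counts from the tables above. It relies only on the standard library (`statistics.NormalDist`) rather than the notebook's own helpers, and the function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes1, trials1, successes2, trials2):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p1 = successes1 / trials1
    p2 = successes2 / trials2
    # Proportion in the two samples combined (P in the description above).
    p_pooled = (successes1 + successes2) / (trials1 + trials2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / trials1 + 1 / trials2))
    # Under Ho the population proportions are equal, so only P1 - P2 remains.
    z_value = (p1 - p2) / standard_error
    # Two-sided test: probability mass in both tails of N(0, 1).
    p_value = 2 * (1 - NormalDist().cdf(abs(z_value)))
    return z_value, p_value


# CartScreenAppear: 1266 of 2484 users in group 246, 1238 of 2513 users in group 247.
z_value, p_value = two_proportion_z_test(1266, 2484, 1238, 2513)
print(f"z = {z_value:.3f}, p-value = {p_value:.4f}")
```

A p-value well above 0.05 here means the observed gap between the two control groups is compatible with random variation.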


#### Control groups: proportions' comparison with the test group

We will first compare proportions of users who performed events between two control groups and then between each control group and test group separately.

In [56]:
# Assumption: mth and st are the aliases for math and scipy.stats imported earlier in the notebook.
import math as mth
from scipy import stats as st

def check_hypothesis(group1,group2,event,alpha=0.05):
    """
    Two-sided z-test for the difference between the proportions of users who performed `event` in two groups.
    The proportion is the ratio of the number of successes (users in the group who performed the event)
    to the number of trials (total users in the group).
    The function prints both proportions and the p-value, then compares the p-value with the significance level
    (alpha, 0.05 by default) and reports whether the Ho hypothesis is rejected.
    Relies on the events_per_group and clean_logs dataframes defined above.
    """
    # number of users in each group who performed the event
    success1=events_per_group[events_per_group.event_name==event][group1].iloc[0]
    success2=events_per_group[events_per_group.event_name==event][group2].iloc[0]
    
    # total number of unique users in each group
    trials1=clean_logs[clean_logs.group==group1]['user_id'].nunique()
    trials2=clean_logs[clean_logs.group==group2]['user_id'].nunique()
    
    # success proportion in the first group:
    p1 = success1/trials1

    # success proportion in the second group:
    p2 = success2/trials2

    # success proportion in the combined dataset:
    p_combined = (success1 + success2) / (trials1 + trials2)

    # the difference between the datasets' proportions
    difference = p1 - p2
    # calculating the statistic in standard deviations of the standard normal distribution
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))

    # setting up the standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0, 1)
    # two-sided p-value: probability of a statistic at least this extreme under Ho
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    
    print('The proportion of users who performed the action to total number of users is', p1)
    print('The proportion of users who performed the action to total number of users is', p2)
    print('p-value: ', p_value)

    if (p_value < alpha):
        print("Rejecting the null hypothesis for", event,"and groups", group1,group2, ": there is a significant difference between the proportions.")
    else:
        print("Failed to reject the null hypothesis for",event,"and groups", group1,'and', group2, ": there is no significant difference between the proportions.")
    

In [57]:
for i in events_per_group.event_name.unique():
    check_hypothesis(246,247,i,alpha=0.05)
    print('-----------------------------')

The proportion of users who performed the action to total number of users is 0.5096618357487923
The proportion of users who performed the action to total number of users is 0.4926382809391166
p-value:  0.22883372237997213
Failed to reject the null hypothesis for CartScreenAppear and groups 246 and 247 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.9863123993558777
The proportion of users who performed the action to total number of users is 0.9852765618782332
p-value:  0.7570597232046099
Failed to reject the null hypothesis for MainScreenAppear and groups 246 and 247 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.6207729468599034
The proportion of users who performed the action to total number of users is 0.6048547552725826
p-value:  0.2480954

In [58]:
for i in events_per_group.event_name.unique():
    check_hypothesis(246,248,i,alpha=0.05)
    print('-----------------------------')

The proportion of users who performed the action to total number of users is 0.5096618357487923
The proportion of users who performed the action to total number of users is 0.48482459597950334
p-value:  0.07842923237520116
Failed to reject the null hypothesis for CartScreenAppear and groups 246 and 248 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.9863123993558777
The proportion of users who performed the action to total number of users is 0.9826566811194324
p-value:  0.2949721933554552
Failed to reject the null hypothesis for MainScreenAppear and groups 246 and 248 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.6207729468599034
The proportion of users who performed the action to total number of users is 0.6034686637761135
p-value:  0.208362

In [59]:
for i in events_per_group.event_name.unique():
    check_hypothesis(247,248,i,alpha=0.05)
    print('-----------------------------')

The proportion of users who performed the action to total number of users is 0.4926382809391166
The proportion of users who performed the action to total number of users is 0.48482459597950334
p-value:  0.5786197879539783
Failed to reject the null hypothesis for CartScreenAppear and groups 247 and 248 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.9852765618782332
The proportion of users who performed the action to total number of users is 0.9826566811194324
p-value:  0.4587053616621515
Failed to reject the null hypothesis for MainScreenAppear and groups 247 and 248 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.6048547552725826
The proportion of users who performed the action to total number of users is 0.6034686637761135
p-value:  0.9197817

**Conclusion:**  

The results are clear and unambiguous: we failed to reject the Ho hypotheses. There's no significant difference between the proportions of users who performed any of the actions, neither between the control groups (as expected) nor between each of the control groups and the test group. The font change doesn't produce a statistically significant difference. 

#### Combined control groups: proportions' comparison with the test group

We will now compare proportions for the combined control group (groups 246 and 247 together) and the test group. For consistency, we will label the combined group as group 249 and run the calculation for each event.

In [60]:
# Creating column [249]: the combined number of users from control groups 246 and 247
# who performed each of the events.
events_per_group[249] = events_per_group[246] + events_per_group[247]
events_per_group

group,event_name,246,247,248,249
0,CartScreenAppear,1266,1238,1230,2504
1,MainScreenAppear,2450,2476,2493,4926
2,OffersScreenAppear,1542,1520,1531,3062
3,PaymentScreenSuccessful,1200,1158,1181,2358
4,Tutorial,278,283,279,561


In [61]:
def check_hypothesis2(group1,group2,group3,group4,event,alpha=0.05):
    """
    Two-sided z-test for the difference between the proportions of users who performed `event`
    in the combined control group and in the test group.
    The groups:
    group 1 - control group 246
    group 2 - control group 247
    group 3 - combined control group 249
    group 4 - test group 248
    The proportion is the ratio of the number of successes (users who performed the event) to the number of trials
    (total users in the group). 
    trials1 is the sum of the numbers of users in group 246 and group 247 in the clean_logs dataset.
    success1 is the number of users in group 249 (group 3), the combination of groups 246 and 247, who performed the event.
    The function then compares the p-value with the significance level (alpha, 0.05 by default) and reports whether
    the Ho hypothesis is rejected.
    """
    success1=events_per_group[events_per_group.event_name==event][group3].iloc[0]
    success2=events_per_group[events_per_group.event_name==event][group4].iloc[0]
    
    # the groups do not share users, so the combined size is the sum of the two control group sizes
    trials1=clean_logs[clean_logs.group==group1]['user_id'].nunique()+clean_logs[clean_logs.group==group2]['user_id'].nunique()
    trials2=clean_logs[clean_logs.group==group4]['user_id'].nunique()
    
    # success proportion in the first group:
    p1 = success1/trials1

    # success proportion in the second group:
    p2 = success2/trials2

    # success proportion in the combined dataset:
    p_combined = (success1 + success2) / (trials1 + trials2)

    # the difference between the datasets' proportions
    difference = p1 - p2
    # calculating the statistic in standard deviations of the standard normal distribution
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))

    # setting up the standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0, 1)
    # two-sided p-value: probability of a statistic at least this extreme under Ho
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    
    print('The proportion of users who performed the action to total number of users is', p1)
    print('The proportion of users who performed the action to total number of users is', p2)
    print('p-value: ', p_value)

    if (p_value < alpha):
        print("Rejecting the null hypothesis for", event,"and groups", group3, 'and', group4, ": there is a significant difference between the proportions.")
    else:
        print("Failed to reject the null hypothesis for",event,"and groups", group3,'and', group4, ": there is no significant difference between the proportions.")
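Since no user belongs to more than one group, pooling groups only requires summing their counts. As a possible design alternative, one helper that accepts a list of groups on each side could cover both the pairwise and the combined comparison without a separate 249 column. The sketch below is standalone and uses the standard library; the names `event_users`, `group_users`, and `pooled_p_value` are hypothetical, and the counts are copied from the tables above.

```python
from math import sqrt
from statistics import NormalDist

# Unique users per event and group, copied from the events_per_group table.
event_users = {
    "CartScreenAppear": {246: 1266, 247: 1238, 248: 1230},
    "MainScreenAppear": {246: 2450, 247: 2476, 248: 2493},
    "OffersScreenAppear": {246: 1542, 247: 1520, 248: 1531},
    "PaymentScreenSuccessful": {246: 1200, 247: 1158, 248: 1181},
    "Tutorial": {246: 278, 247: 283, 248: 279},
}
# Unique users per group, copied from the users_by_group table.
group_users = {246: 2484, 247: 2513, 248: 2537}


def pooled_p_value(event, groups_a, groups_b):
    """Two-sided z-test p-value for `event`, pooling each side over a list of groups."""
    # Summing is valid only because the groups do not share users.
    success_a = sum(event_users[event][g] for g in groups_a)
    success_b = sum(event_users[event][g] for g in groups_b)
    trials_a = sum(group_users[g] for g in groups_a)
    trials_b = sum(group_users[g] for g in groups_b)
    p_combined = (success_a + success_b) / (trials_a + trials_b)
    std_error = sqrt(p_combined * (1 - p_combined) * (1 / trials_a + 1 / trials_b))
    z_value = (success_a / trials_a - success_b / trials_b) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z_value)))


# Combined control groups (246 + 247) against the test group (248).
combined_vs_test = {event: pooled_p_value(event, [246, 247], [248]) for event in event_users}
for event, p_value in combined_vs_test.items():
    print(f"{event}: p-value = {p_value:.4f}")
```

Calling `pooled_p_value(event, [246], [247])` would give the A/A comparison with the same helper.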
    

In [62]:
for i in events_per_group.event_name.unique():
    check_hypothesis2(246,247,249,248,i,alpha=0.05)
    print('-----------------------------')

The proportion of users who performed the action to total number of users is 0.5011006603962378
The proportion of users who performed the action to total number of users is 0.48482459597950334
p-value:  0.18175875284404386
Failed to reject the null hypothesis for CartScreenAppear and groups 249 and 248 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.9857914748849309
The proportion of users who performed the action to total number of users is 0.9826566811194324
p-value:  0.29424526837179577
Failed to reject the null hypothesis for MainScreenAppear and groups 249 and 248 : there is no significant difference between the proportions.
-----------------------------
The proportion of users who performed the action to total number of users is 0.6127676605963578
The proportion of users who performed the action to total number of users is 0.6034686637761135
p-value:  0.43425

**Conclusion:**  

The results are clear and unambiguous: we failed to reject the Ho hypotheses. There's no significant difference between the combined control group and the test group in the proportions of users who performed any of the actions. The font change doesn't produce a statistically significant difference. 

### Bonferroni correction

•	What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.

The significance level was set to 0.05. We carried out 20 statistical hypothesis tests (5 events in each of 4 group comparisons). When we run several comparisons on the same data, the probability of making at least one type I error (rejecting a true null hypothesis) grows with every additional test. A stricter significance level counters this, at the price of a higher probability of type II errors (failing to reject a false null hypothesis). Since none of the tests above rejected Ho, only the type II error is relevant here: we might have failed to reject some false Ho hypotheses.  
The Bonferroni procedure (the Bonferroni correction) is the most common, and the roughest, correction of the required significance level. The significance level in each of m comparisons is m times smaller than the significance level required for a single comparison: the significance level ɑ is divided by the number of hypotheses.

In [63]:
a=0.05/20
a

0.0025

The significance level of each test should be alpha = 0.0025.

In order to reject the null hypothesis, the p-value has to be smaller than alpha. In all the tests above the p-value is well above alpha, and dividing the original significance level by the number of hypotheses only makes this gap larger. Therefore, there's no need to run the tests again with a different significance level: the results won't change. There's no statistically significant difference between the proportions of users who performed any of the actions.
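The reasoning can be checked with a few lines of arithmetic. The sketch below uses only the p-values that are fully visible in the outputs above (not all 20 tests), and the family-wise error figure assumes the tests are independent, which is a simplification here because they share the same users.

```python
alpha = 0.05
n_tests = 20

# Chance of at least one false positive across all tests without a correction
# (assumes independent tests, which is a simplification).
family_wise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni: each individual test is run at alpha / m.
bonferroni_alpha = alpha / n_tests

# p-values fully visible in the outputs above, keyed by (event, comparison).
visible_p_values = {
    ("CartScreenAppear", "246 vs 247"): 0.22883372237997213,
    ("MainScreenAppear", "246 vs 247"): 0.7570597232046099,
    ("CartScreenAppear", "246 vs 248"): 0.07842923237520116,
    ("MainScreenAppear", "246 vs 248"): 0.2949721933554552,
    ("CartScreenAppear", "247 vs 248"): 0.5786197879539783,
    ("MainScreenAppear", "247 vs 248"): 0.4587053616621515,
    ("CartScreenAppear", "249 vs 248"): 0.18175875284404386,
    ("MainScreenAppear", "249 vs 248"): 0.29424526837179577,
}

# A test rejects Ho only if its p-value is below the corrected level.
rejected = {key: p < bonferroni_alpha for key, p in visible_p_values.items()}
smallest_p = min(visible_p_values.values())

print(f"family-wise error without correction: {family_wise_error:.3f}")
print(f"corrected significance level: {bonferroni_alpha:.4f}")
print(f"smallest visible p-value: {smallest_p:.4f}")
print(f"rejections after correction: {sum(rejected.values())}")
```

The smallest visible p-value already exceeds the uncorrected 0.05, so it necessarily exceeds the corrected level as well.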

## General conclusions


**1. The Data:**   
- Duplicates were dropped.  
- EventTimestamp column was converted to datetime format.  
- The original columns were renamed.  
- No missing values were found.  
- Date column has been added.   

**2. Data description:**  
- There are 5 unique events in the log:  
    - MainScreenAppear  
    - OffersScreenAppear  
    - CartScreenAppear  
    - PaymentScreenSuccessful  
    - Tutorial   
- In total, there are 243,713 events performed by users.  
- There are 7,551 unique users in the log.   
- On average, each user performed 32 actions. 50% of users performed 20 actions or fewer. 25% of users performed over 37 actions.   
- Out of 7,551 users, 3,035 (about 40%) performed all main actions, including successful payment.   
  
**3. Data period:**   
  
  The original data covers the period between 25 July 2019 and 07 August 2019 - two weeks.
The number of events was suspiciously low for the period July 25-July 31. As we couldn’t be sure that we had equally complete data for the entire period, we have excluded those dates from the analysis, and only used the period of 1 August 2019 - 7 August 2019. The first week may represent the logs of previous experiment participants.
By cleaning the data we’ve lost around 1% of events in the log and 17 users.   

**4. Studying the event funnel**  
   4.1  The most frequent event is MainScreenAppear.  
   
   4.2  
   - 7,419 users performed MainScreenAppear action  
   - 4,593 users performed OffersScreenAppear action  
   - 3,734 users performed CartScreenAppear action  
   - 3,539 users performed PaymentScreenSuccessful action  
   - 840 users performed Tutorial action  
   
   4.3 Only 98% of users performed the most frequent action. Business-wise, the product converts users well, as 47% of users completed a payment successfully.  
   
   
   4.4 Based on the logic and on the numbers, the stages of the funnel are:  
   Stage 1: MainScreenAppear  
   Stage 2: OffersScreenAppear  
   Stage 3: CartScreenAppear  
   Stage 4: PaymentScreenSuccessful  
   
   However, since the number of users drops drastically for the Tutorial event, it seems to be optional, and the majority of users may have chosen to skip it. It's unclear what place it takes in the funnel. Per the project goal, we are interested in how many users reach the purchase stage, and Tutorial has no impact on that, so we calculated the funnel without this event. 
   
   
**4. Share of users in the funnel**  
- Stage 1: MainScreenAppear - 98% of users performed MainScreenAppear  
- Stage 2: OffersScreenAppear - 61% of users performed OffersScreenAppear  
- Stage 3: CartScreenAppear - 50% of users performed CartScreenAppear  
- Stage 4: PaymentScreenSuccessful - 47% of users performed PaymentScreenSuccessful  

Only 11% performed Tutorial.  
Among the main events, the largest drop is in the transition from action 1 (MainScreenAppear) to action 2 (OffersScreenAppear): 38% of users are lost.  


**5. Share of users who made the entire journey from their first event to payment**  

**Group 246:**  
62.9% of users who performed stage 1 performed stage 2, 82.1% of users who performed stage 2 performed stage 3, and 94.8% of users who performed stage 3 performed stage 4. Almost 49% of the users who performed stage 1 went through all main stages.  

**Group 247:**  
61.4% of users who performed stage 1 performed stage 2, 81.4% of users who performed stage 2 performed stage 3, and 93.5% of users who performed stage 3 performed stage 4. About 47% of the users who performed stage 1 went through all main stages.  

**Group 248:**  
61.4% of users who performed stage 1 performed stage 2, 80.3% of users who performed stage 2 performed stage 3, and 96.0% of users who performed stage 3 performed stage 4. About 47% of the users who performed stage 1 went through all main stages.  

On average, 48% of users made the entire journey from their first event to payment.  

**6. The results of the experiment**  
   6.1 Number of users in each group:  
  - No users belonging to more than one group were found.  
  - In group 246 - 2,484 users.   
  - In group 247 - 2,513 users.   
  - In group 248 - 2,537 users.    
  
  The groups appear to have been split properly.
    
   6.2 Most popular event  
  - The most popular event is MainScreenAppear across all groups.  
  - From control group 246 - 2450 users performed this action.  
  - From control group 247 - 2476 users performed this action.  
  
  
**7. Statistical significance of the difference between samples**  


To check the statistical significance of the difference between samples 246 and 247, we tested the difference between the proportions of users in each group who performed each event.  

**Ho - null hypothesis:** The proportions of users who performed the event are the same in both groups.  

**H1 - alternative hypothesis:** The proportions of users who performed the event are NOT the same in both groups.  

The test statistic is the standardized difference between the sample proportions:

$$Z \approx \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$$

Z follows a standard normal distribution with mean 0 and standard deviation 1, which is what the right part of the formula, after the '~' sign, states. n₁ and n₂ are the sizes of the two samples being compared (the number of observations they contain). P₁ and P₂ are the proportions observed in the samples, and P is the proportion in the two samples combined. π₁ and π₂ are the actual proportions in the populations we're comparing.  

   1. Control groups: proportions' comparison with the test group  
The results were clear and unambiguous: we failed to reject the Ho hypotheses. There's no significant difference between the proportions of users who performed any of the actions, neither between the control groups (as expected) nor between each of the control groups and the test group. The font change doesn't produce a statistically significant difference.  
   
   2. Combined control groups: proportions' comparison with the test group  
To compare proportions for the combined control group (groups 246 and 247 together) and the test group, we labeled the combined group as group 249 for consistency.
The results were clear and unambiguous: we failed to reject the Ho hypotheses. There's no significant difference between the combined control group and the test group in the proportions of users who performed any of the actions. The font change doesn't produce a statistically significant difference.  

**8. Error correction**  

  - The initial significance level was set to 0.05. We carried out 20 statistical hypothesis tests. When we run several comparisons on the same data, the probability of making at least one type I error (rejecting a true null hypothesis) grows with every additional test. A stricter significance level counters this, at the price of a higher probability of type II errors (failing to reject a false null hypothesis). Since none of the tests rejected Ho, only the type II error is relevant here: we might have failed to reject some false Ho hypotheses. The Bonferroni procedure (the Bonferroni correction) is the most common, and the roughest, correction of the required significance level. The significance level in each of m comparisons is m times smaller than the significance level required for a single comparison: the significance level ɑ is divided by the number of hypotheses.
  
  - The significance level of each test should be alpha = 0.0025.  
  
  - In order to reject the null hypothesis, the p-value has to be smaller than alpha. In all the tests above the p-value is well above alpha, and dividing the original significance level by the number of hypotheses only makes this gap larger. Therefore, there's no need to run the tests again with a different significance level: the results won't change. There's no statistically significant difference between the proportions of users who performed any of the actions.
