# Project description
---
**Context**

You work at a startup that sells food products. You need to investigate user behavior for the company's app.

First study the sales funnel. Find out how users reach the purchase stage. How many users actually make it to this stage? How many get stuck at previous stages? Which stages in particular?

Then look at the results of an A/A/B test. (Read on for more information about A/A/B testing.) The designers would like to change the fonts for the entire app, but the managers are afraid the users might find the new design intimidating. They decide to make a decision based on the results of an A/A/B test.

The users are split into three groups: two control groups get the old fonts and one test group gets the new ones. Find out which set of fonts produces better results.

Creating two A groups has certain advantages. We can make it a principle that we will only be confident in the accuracy of our testing when the two control groups are similar. If there are significant differences between the A groups, this can help us uncover factors that may be distorting the results. Comparing control groups also tells us how much time and data we'll need when running further tests.

You'll be using the same dataset for general analytics and for A/A/B analysis. In real projects, experiments are constantly being conducted. Analysts study the quality of an app using general data, without paying attention to whether users are participating in experiments.


# Description of the data
---

Each log entry is a user action or an event.
- `EventName` — event name
- `DeviceIDHash` — unique user identifier
- `EventTimestamp` — event time
- `ExpId` — experiment number: 246 and 247 are the control groups, 248 is the test group

# Table of Contents
---

**Step 1. Open the data file and read the general information**

- File path: /datasets/logs_exp_us.csv

**Step 2. Prepare the data for analysis** 
- Rename the columns in a way that's convenient for you
- Check for missing values and data types. Correct the data if needed
- Add a date and time column and a separate column for dates

**Step 3. Study and check the data**
- How many events are in the logs?
- How many users are in the logs?
- What's the average number of events per user?
- What period of time does the data cover? 
    - Maximum and the minimum date. 
    - Plot a histogram by date and time. 
        - Find the moment at which the data starts to be complete and ignore the earlier section. What period does the data actually represent?
- Did we lose many events and users when excluding the older data?
- Make sure we have users from all three experimental groups.

**Step 4. Study the event funnel**
- See what events are in the logs and their frequency of occurrence. Sort them by frequency.
- Find the number of users who performed each of these actions. Sort the events by the number of users. Calculate the proportion of users who performed the action at least once.
- In what order do you think the actions took place? Are all of them part of a single sequence? Events that are not part of the sequence don't need to be taken into account when calculating the funnel.
- Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)
- At what stage do you lose the most users?
- What share of users make the entire journey from their first event to payment?

**Step 5. Study the results of the experiment**
- How many users are there in each group?
- We have two control groups in the A/A test, where we check our mechanisms and calculations. See if there is a statistically significant difference between samples 246 and 247.
- Select the most popular event. In each of the control groups, find the number of users who performed this action. Find their share. Check whether the difference between the groups is statistically significant. Repeat the procedure for all other events (it will save time if you create a special function for this test). Can you confirm that the groups were split properly?
- Do the same thing for the group with altered fonts. Compare the results with those of each of the control groups for each event in isolation. Compare the results with the combined results for the control groups. What conclusions can you draw from the experiment?
- What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.

## **Step 1. Open the data file and read the general information**

In [None]:
! pip install -qq plotly

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import pandas as pd
import random
import math as mth
import matplotlib
import numpy as np
import datetime as dt
import seaborn as sns
from scipy import stats
from random import sample
import plotly.express as px
from functools import reduce
from scipy import stats as st
import matplotlib.pyplot as plt
from itertools import chain,cycle
from plotly import graph_objects as go
from plotly.subplots import make_subplots

In [None]:
try:
    exp_data = pd.read_csv('/Users/doronerlich/Desktop/Code ME/Integrated Project 2/logs_exp_us.csv', sep='\t')
except FileNotFoundError:  # fall back to the platform path when the local copy is missing
    exp_data = pd.read_csv('/datasets/logs_exp_us.csv', sep='\t')

In [None]:
exp_data.info(memory_usage='deep')

In [None]:
# datetime_is_numeric is not needed here (no datetime columns yet) and is not accepted by recent pandas versions
exp_data.describe(include='all').fillna('N/A').T

<span style="color:green">**Notes from the analyst:** </span> 

- The dataset has 244,126 rows and 4 columns
- The column names have formatting issues, and we will need to change some data types
- At first look, there seem to be no missing values


In [None]:
def exploration(parameter):
    print("Unique values of '{}' column:".format(parameter))
    print()
    print(exp_data[parameter].unique())
    print()
    print("Value counts of '{}' column:".format(parameter))
    print()
    print(exp_data[parameter].value_counts())
    print()
    print('------------------------------------')
    print()

In [None]:
exploration('EventName')
exploration('DeviceIDHash')
exploration('EventTimestamp')
exploration('ExpId')

In [None]:
exp_data.head()

<span style="color:green">**Notes from the analyst:** </span> 

- We have 5 unique types of events
- Some users are very active compared to others who only used the app once
- No timestamp appears more than 9 times
- We have 3 experiment groups with different sizes

<h3 style="color:orange">Conclusions:</h3> 

- There are 244,126 rows and 4 columns in the data frame
- Column names mix upper and lowercase, and some data types are incorrect
- At first look, there seem to be no missing values
- There are 176,654 different timestamps
- We have 3 experiment groups with different sizes:
    - 246, 247 and 248. They are not of equal size: 248 has the most events/actions, then 246 and then 247.



## **Step 2. Prepare the data for analysis**

In [None]:
#change column names
exp_data = exp_data.rename(columns={'EventName':'event_name', 
                                    'DeviceIDHash':'user_id', 'EventTimestamp':'timestamp', 'ExpId':'exp_id'})

In [None]:
#check if any user appears in more than 1 experiment group
exp_data.groupby(['user_id']).agg({'exp_id': 'nunique'}).sort_values(by='exp_id', ascending=False
                                                                    ).query('exp_id > 1').reset_index()

### Change data types & add datetime columns

In [None]:
exp_data["event_name"] = exp_data["event_name"].astype("category")


In [None]:
exp_data['logs'] = pd.to_datetime(exp_data['timestamp'], unit='s')

In [None]:
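# keep only the date part (midnight of each day); astype('datetime64[D]') is rejected by recent pandas versions
exp_data['log_date'] = exp_data['logs'].dt.floor('D')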


In [None]:
exp_data.head()

### Check for duplicates

In [None]:
#check for duplicates
exp_data.duplicated().sum()

In [None]:
#check sample of the duplicated ones
exp_data[exp_data.duplicated()].sort_values('user_id').head(10)

In [None]:
#drop duplicates
exp_data = exp_data.drop_duplicates().reset_index(drop=True)

In [None]:
exp_data.duplicated().sum()

In [None]:
exp_data.info(memory_usage='deep')

<h3 style="color:orange">Conclusions:</h3> 

- Renamed columns for a better naming format
- Checked whether any user appears in more than one experiment group -- there are none
- Changed the data type of `event_name` to category
- Added two new columns: 
    - `logs` with each event's date and time
    - `log_date` with each event's date
- Found 413 duplicates that we dropped from the dataset

We are now working with a dataset of 243,713 rows and 6 columns
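To make the timestamp conversion concrete, here is a minimal, dependency-free sketch of what converting Unix seconds to a datetime and a date amounts to. It uses the first event time reported later in this notebook (2019-07-25 04:43:36) and assumes the timestamps are in UTC, which is how `pd.to_datetime(..., unit='s')` interprets them; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# First event reported in this notebook, assumed to be in UTC.
first_event = datetime(2019, 7, 25, 4, 43, 36, tzinfo=timezone.utc)

# Seconds since the Unix epoch: the form stored in the `timestamp` column.
unix_seconds = int(first_event.timestamp())

# Converting back mirrors, for a single value, what unit='s' does for the whole column.
restored = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Date-only part, analogous to the `log_date` column.
log_date = restored.date()

print(unix_seconds, restored.isoformat(), log_date)
```

Keeping both the full datetime and the date is convenient: the former drives the hourly histogram, the latter makes day-level filtering simple.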

## **Step 3. Study and check the data**

- How many events are in the logs?
- How many users are in the logs?
- What's the average number of events per user?
- What period of time does the data cover? 
    - Maximum and the minimum date. 
    - Plot a histogram by date and time. 
        - Find the moment at which the data starts to be complete and ignore the earlier section. What period does the data actually represent?
- Did we lose many events and users when excluding the older data?
- Make sure we have users from all three experimental groups.

### Users/events info

In [None]:
#users/events info

user_logs = (exp_data.groupby('user_id')['logs'].count().reset_index()).groupby('logs')['user_id'].nunique().reset_index()
logs_per_user = exp_data.groupby('user_id')['logs'].count().reset_index()

fig = px.histogram(user_logs, x="logs", y='user_id', log_y=True, 
                   color_discrete_sequence=(px.colors.sequential.Rainbow), title='Distribution of Logs per User', nbins=100)
fig.update_layout(bargap=0.05,
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='# of Users',
        titlefont_size=16,
        tickfont_size=14), xaxis=dict(
        title="# of Logs",
        titlefont_size=16,
        tickfont_size=14))
fig.show()
print("""Number of logs in the DataFrame: {}.
Unique users in the DataFrame: {}.
The distribution is skewed to the right, so the mode is typically less than the median, 
which in turn is less than the mean.
The most frequent number of logs per user is {}, the average number of logs per user is {:.2f}, and the median 
is {}.
The first event in the DataFrame occurred at {}, and the latest occurred at 
{}.""".format(
    exp_data['user_id'].count(), exp_data['user_id'].nunique(), 
    # mode() returns a Series, so take its first value explicitly
    int(logs_per_user['logs'].mode().iloc[0]), logs_per_user['logs'].mean(), int(logs_per_user['logs'].median()),
    exp_data['logs'].min(), exp_data['logs'].max()))

### Distribution of events over time

In [None]:
#distribution events/users between first and last 

fig = px.histogram(exp_data, x="logs", title='Histogram of Events by Date and Time', 
                   nbins=200, log_y=True, color_discrete_sequence=(px.colors.sequential.Rainbow))

fig.update_layout(bargap=0.2, 
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='# of Events',
        titlefont_size=16,
        tickfont_size=14), xaxis=dict(
        title="Event Date",
        titlefont_size=16,
        tickfont_size=14))
fig.show()

<span style="color:green">**Notes from the analyst:** </span> 

- The first event took place early on July 25th, 2019
- The last event took place at 9:15pm on August 7th, 2019

- From the histogram we can see that the bulk of the events happened from August 1st, 2019 onward, where we see a general pattern and consistency. 

The time period that fully represents the data is from August 1st to August 7th, 2019
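The cutoff above was read off the histogram by eye. As a rough cross-check, one could also pick it programmatically. The sketch below is one possible heuristic, not the method used in this notebook: it returns the first day whose event count reaches a chosen fraction of the busiest day. The function name `first_complete_day`, the `threshold` value and the toy dates are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

def first_complete_day(event_dates, threshold=0.5):
    """Return the earliest day whose event count is at least
    `threshold` times the count of the busiest day (None if no events)."""
    daily_counts = Counter(event_dates)
    if not daily_counts:
        return None
    cutoff = threshold * max(daily_counts.values())
    for day in sorted(daily_counts):
        if daily_counts[day] >= cutoff:
            return day
    return None

# Toy data (invented): two sparse days followed by two busy days.
toy_dates = (
    [date(2019, 7, 30)] * 3
    + [date(2019, 7, 31)] * 5
    + [date(2019, 8, 1)] * 100
    + [date(2019, 8, 2)] * 110
)
print(first_complete_day(toy_dates))
```

On the real data the same idea could be applied to the values of `log_date`, but the threshold is a judgment call and the result should still be compared with the histogram.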


### Filtering the data with only relevant time period

In [None]:
filtered_exp = exp_data.query('log_date >= "2019-08-01"')

In [None]:
filtered_exp.sort_values(by='logs').head()

### Reviewing data removed when filtering 

In [None]:
labels = ["Filtered Data", "Older Events"]
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=labels, values=[filtered_exp['user_id'].nunique(), exp_data['user_id'].nunique()-filtered_exp['user_id'].nunique()], name="Users"),
              1, 1)
fig.add_trace(go.Pie(labels=labels, values=[filtered_exp['logs'].count(), exp_data['logs'].count()-filtered_exp['logs'].count()], name="Events"),
              1, 2)
fig.update_traces(hole=.3, hoverinfo="label+value+percent+name", marker=dict(colors=(px.colors.sequential.Rainbow)))

fig.update_layout(coloraxis=dict(colorscale='Oranges'),
    title_text="Filtered data | Percentage of Data Removed",
    annotations=[dict(text='Users', x=0.18, y=0.5, font_size=18, showarrow=False),
                 dict(text='Events', x=0.82, y=0.5, font_size=18, showarrow=False)])

fig.show()

print("""Filtered data: the number of users and events removed is {} and {}, respectively. 
In the charts we can see that the percentage of the data removed is less than 0.3% of users and less than 1.2% of events.""".format(
    exp_data['user_id'].nunique()-filtered_exp['user_id'].nunique(), exp_data['logs'].count()-filtered_exp['logs'].count()))




### Distribution of events per group in the filtered period

In [None]:
fig = px.histogram(filtered_exp, x="logs", color='exp_id',title='Histogram of Events by Date and Time for Filtered Data', 
                   nbins=200, color_discrete_sequence=px.colors.sequential.Pinkyl)
fig.update_layout(bargap=0.2, 
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='Total # of Events',
        titlefont_size=16,
        tickfont_size=14), xaxis=dict(
        title="Event Date",
        titlefont_size=16,
        tickfont_size=14), 
        legend=dict(title="Experiment Group"))
fig.show()


<h3 style="color:orange">Conclusions:</h3> 

After processing and analyzing the data in this first section we conclude:

- Number of events in the dataset: 243,713
- 7,551 unique users in the dataset
- The average number of events per user is 32.28
- The first event took place on July 25th, 2019 04:43:36
- The last event took place on August 7th, 2019 21:15:17
- The bulk of the events happened from August 1st, 2019 onward, where we see a general pattern and consistency.


- We sliced the data to keep the events from August 1st onward and called the result `filtered_exp`
- In this filtered dataset, we removed 17 users and 2,826 events, which corresponds to about 0.2% and 1.2% of the dataset respectively
- We now work with a dataset of 240,887 rows
- The 3 groups are represented in the filtered data


## **Step 4. Study the event funnel**

- Event frequency
- Unique users per event
- Order of events and event funnel
- Event funnel chart
- Event funnel chart per group


### Event frequency

In [None]:
fig = px.bar(filtered_exp.groupby(['event_name'])['logs'].count().reset_index().sort_values(by='logs'), 
             x='logs', y='event_name', color_discrete_sequence=(px.colors.sequential.Agsunset_r,0), orientation='h', text='logs')
fig.update_layout(title='Number of Events per Event Name in Filtered Data',
    xaxis_tickfont_size=14,
    xaxis=dict(
        title='# of Events',
        titlefont_size=16,
        tickfont_size=14), yaxis=dict(
        title="Event Name",
        titlefont_size=16,
        tickfont_size=14), width=900, height=400)
fig.show()


<span style="color:green">**Notes from the analyst:** </span> 

- From the graph above, we can see that the most frequent event is "MainScreenAppear", which makes sense as it is probably the first screen shown when opening the app. 

- The least frequent event is "Tutorial"; users may need to actively open it, which is less common 



### Unique users per event

In [None]:
prop = (filtered_exp.groupby('event_name')['user_id'].nunique()/filtered_exp.user_id.nunique()).round(2).reset_index().rename(columns={'user_id': 'proportion'})
prop_u = filtered_exp.groupby('event_name')['user_id'].nunique().reset_index()
prop = prop.merge(prop_u, on='event_name')
fig = px.bar(prop.sort_values(by='user_id'), 
             x='user_id', y='event_name', color_discrete_sequence=(px.colors.sequential.Agsunset_r, 5),
             hover_data=['proportion'], orientation='h', text='proportion')
fig.update_layout(title='Number of Unique Users per Event and Usage Proportion',
    xaxis_tickfont_size=14,
    xaxis=dict(
        title='# of Unique Users per Event',
        titlefont_size=16,
        tickfont_size=14), yaxis=dict(
        title="Event Name",
        titlefont_size=16,
        tickfont_size=14), width=900, height=400)
fig.show() 

<span style="color:green">**Notes from the analyst:** </span> 

- 98% of the users opened the main screen at least once. 
- 61% performed the action "OffersScreenAppear" 
- 47% of users got to "PaymentScreenSuccessful"
- Only 11% of users actually opened the Tutorial. We might need to review what's included there and whether the content is still relevant for the users. 



Based on the charts above, we can infer the order of the events:

1. Main Screen Appear -- opening the app
2. Offer Screen Appear -- the user might check if there are any offers or discounts available
3. Cart Screen Appear -- the user added products to the cart
4. Payment Screen Successful -- the user checked out and completed the purchase
5. Tutorial is probably a help section of the app and does not belong to the purchase sequence: users may open it only when they do not know how to perform a certain action. As said before, we need to review the content of this section and adjust it or delete it. 
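The counts used in this notebook are unique users per event, regardless of the order in which a user triggered the events. A stricter, time-ordered funnel can be sketched as follows, under stated assumptions: the function `sequential_funnel`, the use of each user's first occurrence of an event, and the toy events are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

FUNNEL_STEPS = (
    "MainScreenAppear",
    "OffersScreenAppear",
    "CartScreenAppear",
    "PaymentScreenSuccessful",
)

def sequential_funnel(events, steps=FUNNEL_STEPS):
    """Count users who reached each step in order.

    `events` is an iterable of (user_id, event_name, timestamp) tuples.
    A user counts for a step only if they reached every earlier step and
    their first occurrence of this step is not earlier than their first
    occurrence of the previous one.
    """
    first_seen = defaultdict(dict)
    for user, name, ts in events:
        if name in steps:
            previous = first_seen[user].get(name)
            if previous is None or ts < previous:
                first_seen[user][name] = ts

    counts = [0] * len(steps)
    for times in first_seen.values():
        last_ts = None
        for i, step in enumerate(steps):
            ts = times.get(step)
            if ts is None or (last_ts is not None and ts < last_ts):
                break  # the user left the sequence at this step
            counts[i] += 1
            last_ts = ts
    return dict(zip(steps, counts))

# Toy events (invented): user 3 sees the cart before the main screen,
# user 4 only opens the tutorial and the main screen.
toy_events = [
    (1, "MainScreenAppear", 10), (1, "OffersScreenAppear", 20),
    (1, "CartScreenAppear", 30), (1, "PaymentScreenSuccessful", 40),
    (2, "MainScreenAppear", 10), (2, "OffersScreenAppear", 15),
    (3, "CartScreenAppear", 5), (3, "MainScreenAppear", 10),
    (4, "Tutorial", 1), (4, "MainScreenAppear", 2),
]
funnel_counts = sequential_funnel(toy_events)
print(funnel_counts)
```

Using only the first occurrence of each event is a simplification: a user who returns to an earlier screen later on is still judged by their first visit.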

### Event funnel

In [None]:
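# Tutorial is not part of the purchase funnel; filter it by name rather than by row position
prop_f = prop[prop['event_name'] != 'Tutorial']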
fig = go.Figure(go.Funnel(
    y = prop_f.sort_values(by = 'user_id', ascending = False)['event_name'],
    x = prop_f.sort_values(by = 'user_id', ascending = False)['user_id'],
    textinfo = "value + percent previous + percent initial"))
fig.update_traces(marker=dict(color=(px.colors.sequential.Agsunset_r)))
fig.update_layout(
    title_text="Filtered Data | Event Funnel", width=1000, height=500,plot_bgcolor='white', paper_bgcolor='white')
fig.show()

In [None]:
# unique users per funnel event, from the most to the least common (Tutorial excluded)
funnel = (filtered_exp.query('event_name != "Tutorial"')
          .groupby('event_name', observed=True)['user_id'].nunique()
          .sort_values(ascending=False).reset_index())
# share of users relative to the previous stage (the first stage is 100% of itself)
funnel['percentage_from_previous'] = ((funnel['user_id'] / funnel['user_id'].shift(1)).fillna(1) * 100).round(2).astype(str) + '%'
# share of users relative to the first stage
funnel['percentage_from_initial'] = ((funnel['user_id'] / funnel['user_id'].iloc[0]) * 100).round(2).astype(str) + '%'

funnel

<span style="color:green">**Notes from the analyst:** </span> 

- The event funnel makes it easier to understand the path from opening the app to the purchase stage
- Only 50% of users actually get to their cart, which is very low retention in the flow. 
- But 81% of the users who view offers get to the cart 
- And 95% of them actually complete a purchase, which is good, so we might need to improve the earlier steps of the flow to retain more users between the offers screen and the cart screen


### Event funnel chart per group

In [None]:
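# unique users per event and experiment group
prop_e = filtered_exp.groupby(['event_name', 'exp_id'], observed=True)['user_id'].nunique().reset_index()
# share of all users in the filtered data (not used in the chart, kept for reference)
prop_e['proportion'] = (prop_e['user_id'] / filtered_exp['user_id'].nunique()).round(2)
# Tutorial is not part of the purchase funnel; filter it by name rather than by row position
prop_e = prop_e[prop_e['event_name'] != 'Tutorial']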

fig = go.Figure()

fig.add_trace(go.Funnel(
    name = '246',
    y = prop_e.query('exp_id == 246').sort_values(by = 'user_id', ascending = False)['event_name'],
    x = prop_e.query('exp_id == 246').sort_values(by = 'user_id', ascending = False)['user_id'],
    textinfo = "value + percent previous + percent initial"))

fig.add_trace(go.Funnel(
    name = '247',
    y = prop_e.query('exp_id == 247').sort_values(by = 'user_id', ascending = False)['event_name'],
    x = prop_e.query('exp_id == 247').sort_values(by = 'user_id', ascending = False)['user_id'],
    textinfo = "value + percent previous + percent initial"))

fig.add_trace(go.Funnel(
    name = '248',
    y = prop_e.query('exp_id == 248').sort_values(by = 'user_id', ascending = False)['event_name'],
    x = prop_e.query('exp_id == 248').sort_values(by = 'user_id', ascending = False)['user_id'],
    textinfo = "value + percent previous + percent initial"))

fig.update_traces(marker=dict(color=(px.colors.sequential.Agsunset_r)))
fig.update_layout(
    title_text="Filtered Data | Event Funnel per Exp. Group", width=1100, height=600, 
    legend=dict(title='Groups:', orientation="h", yanchor="bottom", y=1.02, xanchor="center", x=1), 
    plot_bgcolor='white', paper_bgcolor='white')

fig.show();


<span style="color:green">**Notes from the analyst:** </span> 

- If we look at the funnel per group we can see the 3 groups have very similar trends

<h3 style="color:orange">Conclusions:</h3> 

**In this section we conclude:**

- 98% of users opened the main screen at least once
- 61% performed the action "OffersScreenAppear"
- 47% of users got to "PaymentScreenSuccessful"
- Only 11% of users actually opened the Tutorial. We might need to review what's included there and whether the content is still relevant for the users.

Based on the charts above, we can infer the order of the events:

1. Main Screen Appear -- opening the app
2. Offer Screen Appear -- the user might check if there are any offers or discounts available
3. Cart Screen Appear -- the user added products to the cart
4. Payment Screen Successful -- the user checked out and completed the purchase

*Tutorial is probably a help section of the app and does not belong to the purchase sequence: users may open it only when they do not know how to perform a certain action. As said before, we need to review the content of this section and adjust it or delete it.*


The event funnel makes it easier to understand the path from opening the app to the purchase stage
- Only 50% of users actually get to their cart, which is very low retention in the flow.
- But 81% of the users who view offers get to the cart
- And 95% of them actually complete a purchase, which is good, so we might need to improve the earlier steps of the flow to retain more users between the offers screen and the cart screen

- The 3 groups have very similar trends



## **Step 5. Study the results of the experiment**

### Users per group and log distribution

In [None]:
fig = px.bar(filtered_exp.groupby(['exp_id'])['user_id'].nunique(), 
             y='user_id', color_discrete_sequence=(px.colors.sequential.Agsunset,3), text='user_id')
fig.update_layout(title='Number of Users per Experiment Group in Filtered Data',
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='Number of Users',
        titlefont_size=16,
        tickfont_size=14), xaxis=dict(
        title="Experiment Group",
        titlefont_size=16,
        tickfont_size=14, tickmode = 'array', tickvals= filtered_exp.exp_id, ticktext = filtered_exp.exp_id))
fig.show() 

<span style="color:green">**Notes from the analyst:** </span> 

- The groups are of roughly equal size:

    - **248** has the most users (2,537)
    - **247** with 2,513
    - **246** with 2,484 users.



In [None]:
fig = px.scatter(filtered_exp.groupby(['event_name', 'user_id', 'exp_id'])['logs'].count().reset_index(), y='logs', 
x='event_name', size='exp_id', color="user_id", symbol='event_name',color_continuous_scale=(px.colors.sequential.Agsunset))
fig.update_coloraxes(showscale=False)
fig.update_layout(title='Distribution of Logs per User per Event',
    xaxis_tickfont_size=14,
    yaxis=dict(
        title='Number of Logs',
        titlefont_size=16,
        tickfont_size=14), xaxis=dict(
        title="Event Name",
        titlefont_size=16,
        tickfont_size=14))
fig.show()

<span style="color:green">**Notes from the analyst:** </span> 

- We see some outliers on the cart and successful payment screens, meaning we have users who buy a lot 


In [None]:
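# unique users per event and group, plus one extra row ("user_id") with the total unique users per group
users_per_event = filtered_exp.pivot_table(columns="exp_id", index="event_name", values='user_id', aggfunc='nunique')
users_per_group = filtered_exp.pivot_table(columns="exp_id", values='user_id', aggfunc='nunique')
# DataFrame.append was removed in pandas 2.0, so the two tables are stacked with pd.concat
exp_piv = pd.concat([users_per_event, users_per_group]).rename_axis('event').reset_index()
# the groups do not share users, so the control totals can simply be added
exp_piv['control_groups'] = exp_piv[246] + exp_piv[247]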
exp_piv

<span style="color:green">**Notes from the analyst:** </span> 

- We created a table above to see the overall number of users per group and the number of users on each event


### A/A test & A/B test

- We have two control groups in the A/A test, where we check our mechanisms and calculations. See if there is a statistically significant difference between samples 246 and 247.

Groups **246** and **247** are the control groups and will be used for the A/A test. 

**A/A tests** are designed to help examine the reliability of the tool used to run the A/B tests. The idea is that both groups have exactly the same experience:
   - Two groups of users are shown absolutely identical high-traffic web pages
   - The user experience is the same for both groups 
   - The KPI (Key Performance Indicator) is therefore expected to be the same for both groups

**Null Hypothesis (H0): group 246's proportion is equal to that of group 247. Alternative Hypothesis (H1): group 246's proportion is NOT equal to that of group 247.**



After that, we will perform an **A/B test** comparing each control group (**246** & **247**) to the test group: **248**

* **Null Hypothesis (H0): the control group's proportion is equal to that of the test group. Alternative Hypothesis (H1): their proportions are NOT equal.**

Both the A/A test and the A/B tests will be run per event name. In the final A/B test, we will compare the combined control groups to the test group.
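For reference, the hypotheses above are checked with a two-sided z-test for the difference between two proportions. The notation below is introduced here for illustration and is not taken from the task description: $x_1, x_2$ are the numbers of users who performed the event in each group, $n_1, n_2$ are the group sizes, and $\Phi$ is the standard normal CDF.

$$\hat p_1 = \frac{x_1}{n_1}, \qquad \hat p_2 = \frac{x_2}{n_2}, \qquad \hat p = \frac{x_1 + x_2}{n_1 + n_2}$$

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,(1 - \hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)$$

H0 is rejected when the p-value is below the chosen significance level $\alpha$.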


<span style="color:purple">**Actions taken:** </span>
- Select the most popular event. In each of the control groups, find the number of users who performed this action. Find their share. Check whether the difference between the groups is statistically significant. Repeat the procedure for all other events (it will save time if you create a special function for this test). Can you confirm that the groups were split properly?
- Do the same thing for the group with altered fonts. Compare the results with those of each of the control groups for each event in isolation. Compare the results with the combined results for the control groups. What conclusions can you draw from the experiment?
- What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions

In [None]:
alpha = 0.05  # significance level used for every test below

def test_aab(a, b, e):
    """Two-sided z-test for the difference in the share of users with event `e` between groups `a` and `b`."""
    # total number of unique users in each group (the "user_id" row of exp_piv)
    users_a = exp_piv[exp_piv.event == "user_id"][a].iloc[0]
    users_b = exp_piv[exp_piv.event == "user_id"][b].iloc[0]

    # number of unique users in each group who performed event e
    event_a = exp_piv[exp_piv.event == e][a].iloc[0]
    event_b = exp_piv[exp_piv.event == e][b].iloc[0]

    # observed shares and the pooled share under H0
    p1 = event_a / users_a
    p2 = event_b / users_b
    p_combined = (event_a + event_b) / (users_a + users_b)

    difference = p1 - p2 

    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/users_a + 1/users_b))

    distr = st.norm(0, 1)  # standard normal distribution

    p_value = (1 - distr.cdf(abs(z_value))) * 2  # two-sided p-value

    print("""Share of users from group {} with {} event is {:.2%}.""".format(a, e, p1))
    print("""Share of users from group {} with {} event is {:.2%}.""".format(b, e, p2))
    print('p-value: {:.3f}'.format(p_value))

    if (p_value < alpha):
        print("""Conclusion: 
Reject the Null Hypothesis (H0) related to event {} using significance level (alpha) of {}.""".format(e, alpha))
    else:
        print("""Conclusion: 
We are unable to reject the Null Hypothesis (H0) in regards to event {} using significance level 
(alpha) of {}.""".format(e, alpha)) 
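As a cross-check of the function above, the same pooled two-proportion z-test can be written without pandas or scipy. This is a minimal sketch, not the notebook's implementation: the function name `two_proportion_ztest` and the counts in the usage lines are invented for illustration. It relies on the identity that the two-sided p-value equals `erfc(|z| / sqrt(2))` for a standard normal distribution.

```python
import math

def two_proportion_ztest(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for H0: both groups share one proportion.

    Raises ZeroDivisionError if the pooled proportion is exactly 0 or 1,
    because the standard error is then zero.
    """
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    p_pool = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: identical shares, then clearly different shares.
z_same, p_same = two_proportion_ztest(50, 100, 50, 100)
z_diff, p_diff = two_proportion_ztest(60, 100, 40, 100)
print(z_same, p_same)
print(z_diff, p_diff)
```

Returning the values instead of printing them makes it easier to collect all p-values in one table and apply a multiple-testing correction afterwards.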
        

#### A/A Test - Statistical Significance of the Difference between Control Groups


In [None]:
events = ["OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful", "Tutorial"]

for e in events:
    test_aab(246, 247, e)
    print()
    print('-----------')
    print()

<span style="color:orange">**Conclusions:**</span> 

- For every event tested, the p-value is higher than alpha (0.05), so we can't reject the null hypothesis that the two control groups have equal shares of users.



#### A/B Test - Statistical Significance of the Difference between Control Group 246 and Test Group 248

In [None]:
events = ["OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful", "Tutorial"]

for e in events:
    test_aab(246, 248, e)
    print()
    print('-----------')
    print()

<span style="color:orange">**Conclusions:**</span> 

- For every event tested, the p-value is higher than alpha (0.05), so we can't reject the null hypothesis that control group 246 and the test group have equal shares of users.



#### A/B Test - Statistical Significance of the Difference between Control Group 247 and Test Group 248

In [None]:
events = ["OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful", "Tutorial"]

for e in events:
    test_aab(247, 248, e)
    print()
    print('-----------')
    print()

<span style="color:orange">**Conclusions:**</span> 

- For every event tested, the p-value is higher than alpha (0.05), so we can't reject the null hypothesis that control group 247 and the test group have equal shares of users.



### A/B Test Comparing Both Control Groups and The Test Group

In [None]:
events = ["OffersScreenAppear", "CartScreenAppear", "PaymentScreenSuccessful", "Tutorial"]

for e in events:
    test_aab("control_groups", 248, e)
    print()
    print('-----------')
    print()

<span style="color:orange">**Conclusions:**</span> 

- For every event tested (all events other than "MainScreenAppear"), the p-value is higher than alpha (0.05), so we can't reject the null hypothesis that the combined control groups and the test group have equal shares of users. For every event, the combined control group had a higher share than the test group, even though the test group has the most users of the three groups



<h3 style="color:orange">Conclusions:</h3> 

In this section:

- The groups are of roughly equal size:
    - 248 has the most users (2,537)
    - 247 with 2,513
    - 246 with 2,484 users.
    
- We see some outliers on the cart and successful payment screens, meaning we have users who buy a lot 
- We created a table to see the overall number of users per group and the number of users on each event 


- We have 2 control groups: **246** & **247**
- Performed an A/A test and A/B tests
- The function applies a z-test to two independent groups and decides whether their proportions are equal. We applied it to each event (other than "MainScreenAppear"): first comparing the two control groups to each other in the A/A test, then running 2 A/B tests comparing each control group to the test group, and finally comparing the combined control groups to the test group.

**A/A and A/B tests Conclusions:**

- The combined control group had a higher share than the test group for each event. 
- We ran the z-test 16 times, and each time we were unable to reject the null hypothesis that there is no difference in behavior between the groups compared. 
- Running this many tests increases the risk of a Type I error (a false positive result of a statistical test), which would normally call for a lower alpha. Since no test rejected the null hypothesis at 0.05, lowering alpha would not change any of the results.
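To put numbers on the multiple-testing concern, the sketch below computes the family-wise error rate for 16 tests at alpha = 0.05 and the Bonferroni-adjusted alpha. The helper names are hypothetical, and the error-rate formula assumes independent tests, which is only an approximation here because the comparisons share groups.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at most alpha."""
    return alpha / n_tests

alpha = 0.05
n_tests = 16  # 4 events x 4 group comparisons

fwer = familywise_error_rate(alpha, n_tests)
adjusted_alpha = bonferroni_alpha(alpha, n_tests)

print(f"Family-wise error rate at alpha={alpha}: {fwer:.3f}")
print(f"Bonferroni-adjusted alpha: {adjusted_alpha}")
```

With these inputs the family-wise error rate is roughly 0.56 and the adjusted alpha is 0.003125. A correction like this only makes rejection harder, so tests that did not reject H0 at 0.05 will not reject it at the adjusted level either.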



## Conclusions


- The behavior of users with the new fonts suggested by the designers is similar to that of users in the control groups. The tests found no statistically significant difference, so we conclude that altering the fonts does not affect user behavior and does not stop users from using the app and completing a purchase

- Unrelated to the font change, we found that the flow from the offers screen to the cart and then to the purchase should be improved to retain more users between viewing offers and checking the cart, so we can increase our sales from the app. 