# Step 1. Open the data file and read the general information

## Project description
- Our mission is to investigate user behavior for the company's app. 
- The designers would like to change the fonts for the entire app, but the managers are afraid the users might find the new design intimidating. 
- We will conduct an A/A/B test.
- The managers will decide whether to change the fonts for the entire app based on the results of our experiment.

## Import

In [None]:
!pip install plotly-express

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats as st
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import math as mth

%matplotlib inline

## Load data

In [None]:
try:
    logs = pd.read_csv('logs_exp_us.csv', sep=r"\s+")
except FileNotFoundError:
    logs = pd.read_csv('/datasets/logs_exp_us.csv', sep=r"\s+")

# Step 2. Prepare the data for analysis


## Preview and info

- EventName — event name
- DeviceIDHash — unique user identifier
- EventTimestamp — event time
- ExpId — experiment number: 246 and 247 are the control groups, 248 is the test group

In [None]:
logs.info()

There are no missing values.
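The absence of missing values can also be checked explicitly instead of being read off `info()`. A minimal sketch, using a small hypothetical frame in place of `logs` (the toy rows are illustrative and not taken from the dataset); the duplicate count is an extra check this notebook does not otherwise perform:

```python
import pandas as pd

# Hypothetical stand-in for `logs`; the real frame comes from logs_exp_us.csv.
toy_logs = pd.DataFrame({
    'EventName': ['MainScreenAppear', 'MainScreenAppear', 'CartScreenAppear'],
    'DeviceIDHash': [1, 1, 2],
    'EventTimestamp': [1564617600, 1564617600, 1564617660],
    'ExpId': [246, 246, 248],
})

# Number of missing values per column (all zeros means nothing is missing).
missing_per_column = toy_logs.isna().sum()

# Number of fully duplicated rows (same event, user, timestamp and group).
duplicate_rows = int(toy_logs.duplicated().sum())

print(missing_per_column)
print('duplicated rows:', duplicate_rows)
```

Whether duplicated rows should be dropped is a separate decision; the sketch only shows how to count them.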

In [None]:
logs.head()

## Correct and prepare data 

Convert all column names to lowercase snake_case (words separated by underscores).

In [None]:
# convert all column names to lowercase
logs.columns = logs.columns.str.lower()
logs.columns

In [None]:
# separate the words with underscores
logs.rename(columns={'eventname': 'event_name',
                 'deviceidhash': 'device_id_hash',
                 'eventtimestamp': 'event_timestamp',
                 'expid': 'exp_id'},
               inplace=True)
logs.columns

Check the event_name column.

In [None]:
logs['event_name'].value_counts()

The event names look consistent; no corrections are needed.

Check the exp_id column.

In [None]:
logs['exp_id'].value_counts()

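Rename the values in the exp_id column to A1, A2 and B.

In [None]:
# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which is not guaranteed to modify `logs` itself.
logs['exp_id'] = logs['exp_id'].replace({
    246: 'A1',
    247: 'A2',
    248: 'B'
})
logs['exp_id'].value_counts()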

In [None]:
# Change experiment id to categorical 
logs['exp_id'] = logs['exp_id'].astype('category')

Add a date and time column and a separate column for dates

In [None]:
logs['time_and_date']=pd.to_datetime(logs['event_timestamp'], unit='s')
logs['date'] = logs['time_and_date'].dt.date
logs['time'] = logs['time_and_date'].dt.time
# logs.drop(['time_and_date'], axis=1, inplace=True)
logs.head()
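The conversion above relies on `unit='s'`, which tells pandas that the integers count seconds since the Unix epoch. A small sketch with two hypothetical timestamps shows the effect; flooring to the day is shown only as an alternative to `.dt.date` and is not what this notebook uses:

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds, shaped like the event_timestamp column.
event_timestamp = pd.Series([1564617600, 1564621200])

# unit='s' interprets the integers as seconds since 1970-01-01 00:00 UTC.
time_and_date = pd.to_datetime(event_timestamp, unit='s')

# Flooring keeps a datetime64 dtype, whereas .dt.date yields Python date objects.
day = time_and_date.dt.floor('D')

print(time_and_date.tolist())
print(day.dtype)
```

Keeping a datetime64 column can make later filtering and grouping by day slightly more convenient, but comparing `.dt.date` values with `dt.date(...)`, as done in this notebook, works as well.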

In [None]:
raw_logs = logs.copy(deep=True)

# Step 3. Study and check the data

## How many events are in the logs?

In [None]:
number_of_logs = logs.shape[0]
print(f'There are {number_of_logs} events in the logs')

## How many users are in the logs?


In [None]:
number_of_users = logs['device_id_hash'].nunique()
print(f'There are {number_of_users} users in the logs')

## What's the average number of events per user?


In [None]:
average_number_of_events = round(np.mean(logs.groupby('device_id_hash')['event_timestamp'].count()))
print('the average number of events per user is', average_number_of_events)
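A mean can be pulled upward by a few very active users, so it may be worth looking at the median next to it. The sketch below uses a hypothetical toy log (four users, one of them much more active) and is only meant to illustrate the difference; it says nothing about the real distribution in `logs`:

```python
import pandas as pd

# Hypothetical toy log: users 1-3 have a few events, user 4 has many.
toy_logs = pd.DataFrame({'device_id_hash': [1] * 2 + [2] * 3 + [3] * 4 + [4] * 31})

# Number of events per user.
events_per_user = toy_logs.groupby('device_id_hash').size()

mean_events = events_per_user.mean()      # sensitive to the heavy user
median_events = events_per_user.median()  # robust to the heavy user

print('mean:', mean_events, 'median:', median_events)
```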

## What period of time does the data cover? Find the maximum and the minimum date. Plot a histogram by date and time. Can you be sure that you have equally complete data for the entire period? Older events could end up in some users' logs for technical reasons, and this could skew the overall picture. Find the moment at which the data starts to be complete and ignore the earlier section. What period does the data actually represent?

In [None]:
fig = px.histogram(logs, x="time_and_date")
fig.show()

In [None]:
logs[(logs['date']<dt.date(2019,8,1))]

The period before August 1 spans roughly half of the time range in the logs but contains only about 1% of the events. The histogram shows this thin tail clearly. The data becomes complete at the beginning of August, so we assume the experiment effectively launched then and ignore the earlier events.
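The cut-off date can also be located numerically from daily event counts rather than by eye. The following is a sketch on hypothetical timestamps; the rule "a day is complete once it reaches half of the median daily volume" is an assumption made for illustration, not a rule used in this notebook:

```python
import pandas as pd

# Hypothetical event timestamps: a thin tail in late July, dense data from August 1.
timestamps = pd.to_datetime(
    ['2019-07-30 10:00', '2019-07-31 23:00']
    + ['2019-08-01 09:00'] * 50
    + ['2019-08-02 09:00'] * 55
    + ['2019-08-03 09:00'] * 48
)
toy_logs = pd.DataFrame({'time_and_date': timestamps})

# Count events per calendar day.
daily_counts = toy_logs.groupby(toy_logs['time_and_date'].dt.date).size()

# Assumed rule of thumb: a day is "complete" at half the median daily volume or more.
threshold = 0.5 * daily_counts.median()
first_complete_day = daily_counts[daily_counts >= threshold].index.min()

print(daily_counts)
print('first complete day:', first_complete_day)
```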

In [None]:
# leave only results from August
logs = logs[(logs['date']>=dt.date(2019,8,1))]
fig = px.histogram(logs, x="time_and_date")
fig.show()

Did you lose many events and users when excluding the older data?
Make sure you have users from all three experimental groups.

In [None]:
logs['exp_id'].value_counts()

All three groups are still present.

Compare the number of events in the raw data with the number left after filtering.

In [None]:
events_raw = raw_logs.shape[0]
events_new = logs.shape[0]
print('The number of events in the raw data is:', events_raw)
print('The number of events in the filtered data is:', events_new)
# round after scaling to percent to avoid floating-point noise in the output
print(f'Share of events lost by filtering: {round((1 - events_new / events_raw) * 100, 1)}%')

Compare the number of users in the raw data with the number left after filtering.

In [None]:
print('The number of users in the raw data is:', raw_logs['device_id_hash'].nunique())
print('The number of users in the filtered data is:', logs['device_id_hash'].nunique())
users_ratio = logs['device_id_hash'].nunique()/raw_logs['device_id_hash'].nunique()
# round after scaling to percent to avoid floating-point noise in the output
print(f'Share of users lost by filtering: {round((1 - users_ratio) * 100, 1)}%')

We will compare the events in the groups before and after the removal of logs from July. 

In [None]:
raw_A1_A2_B = raw_logs['exp_id'].value_counts().reset_index()
raw_A1_A2_B.columns = ['group', 'events_count_raw']
new_A1_A2_B = logs['exp_id'].value_counts().reset_index()
new_A1_A2_B.columns = ['group', 'events_count_new']

In [None]:
raw_A1_A2_B

In [None]:
new_A1_A2_B

In [None]:
A1_A2_B = raw_A1_A2_B.merge(new_A1_A2_B, on='group')

In [None]:
# share of each group's events removed by the filter, in percent
A1_A2_B['difference, %'] = (
    (1 - A1_A2_B['events_count_new'] / A1_A2_B['events_count_raw']) * 100).round(2)
A1_A2_B

In conclusion, removing the July logs barely changed the number of events in each group or the total numbers of events and users.

# Step 4. Study the event funnel

## See what events are in the logs and their frequency of occurrence. Sort them by frequency.

In [None]:
events_logs = logs.groupby(
    'event_name'
)['time_and_date'].count().reset_index().sort_values(by='time_and_date', ascending=False)
events_logs.columns = ['event_name', 'frequency']

In [None]:
events_logs

Let's also plot a histogram of each event over time.

In [None]:
x0 = logs[logs['event_name'] == 'Tutorial']['time_and_date']
x1 = logs[logs['event_name'] == 'MainScreenAppear']['time_and_date']
x2 = logs[logs['event_name'] == 'OffersScreenAppear']['time_and_date']
x3 = logs[logs['event_name'] == 'CartScreenAppear']['time_and_date']
x4 = logs[logs['event_name'] == 'PaymentScreenSuccessful']['time_and_date']



In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(x=x0, name='Tutorial'))
fig.add_trace(go.Histogram(x=x1, name='MainScreenAppear'))
fig.add_trace(go.Histogram(x=x2, name='OffersScreenAppear'))
fig.add_trace(go.Histogram(x=x3, name='CartScreenAppear'))
fig.add_trace(go.Histogram(x=x4, name='PaymentScreenSuccessful'))
# Overlay all five histograms
fig.update_layout(barmode='overlay')
# Reduce opacity so that overlapping histograms stay visible
fig.update_traces(opacity=0.75)
fig.show()

Throughout most of each day, "MainScreenAppear" is the most frequent event. "CartScreenAppear" becomes more frequent from noon into the afternoon.

## Find the number of users who performed each of these actions. Sort the events by the number of users. Calculate the proportion of users who performed the action at least once.

In [None]:
number_of_users = logs.groupby(
    'event_name'
)['device_id_hash'].nunique().reset_index().sort_values(by='device_id_hash', ascending=False)
number_of_users.columns = ['event_name', 'number_of_users']
# share of all unique users who performed the event at least once
# (dividing by the sum over events would count multi-event users several times)
total_users = logs['device_id_hash'].nunique()
number_of_users['proportion_of_users'] = round(number_of_users['number_of_users']/total_users*100,1)
number_of_users

Tutorial is the least common event: only a small share of users opened it.

## In what order do you think the actions took place. Are all of them part of a single sequence? You don't need to take them into account when calculating the funnel.

Let's recompute the total number of users in the logs, because it changed after we removed the July events.

In [None]:
number_of_users = logs['device_id_hash'].nunique()
print(f'There are {number_of_users} users in the logs')

From the share of users at each event we can assume the following sequence:
MainScreenAppear > OffersScreenAppear > CartScreenAppear > PaymentScreenSuccessful
Not all events belong to this sequence: Tutorial is not part of the funnel.
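The assumed order can be cross-checked against timing: for each user, measure how long after their first logged event each event type first occurs, then sort events by the typical delay. This is a sketch on a hypothetical mini-log with integer timestamps; it is not part of the original analysis:

```python
import pandas as pd

# Hypothetical mini-log: (user, event, Unix timestamp in seconds).
toy_logs = pd.DataFrame({
    'device_id_hash': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'event_name': ['MainScreenAppear', 'OffersScreenAppear', 'CartScreenAppear',
                   'PaymentScreenSuccessful', 'MainScreenAppear', 'OffersScreenAppear',
                   'CartScreenAppear', 'MainScreenAppear', 'OffersScreenAppear'],
    'event_timestamp': [0, 30, 60, 90, 10, 50, 80, 5, 45],
})

# First time each user triggered each event (NaN if the user never did).
first_seen = toy_logs.pivot_table(index='device_id_hash', columns='event_name',
                                  values='event_timestamp', aggfunc='min')

# Seconds from each user's very first event to the first occurrence of each event.
offsets = first_seen.sub(first_seen.min(axis=1), axis=0)

# Events sorted by median delay; earlier events should sit earlier in the funnel.
typical_order = offsets.median().sort_values().index.tolist()
print(typical_order)
```

On real data, an event such as Tutorial may show a short delay for the few users who open it, so timing complements the user-share argument rather than replacing it.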

## Use the event funnel to find the share of users that proceed from each stage to the next. (For instance, for the sequence of events A → B → C, calculate the ratio of users at stage B to the number of users at stage A and the ratio of users at stage C to the number at stage B.)

In [None]:
users = logs.pivot_table(
    index='device_id_hash',
    columns='event_name',
    values='time_and_date',
    aggfunc='min'
)

In [None]:
step_1 = ~users['MainScreenAppear'].isna()
step_2 = step_1 & (users['OffersScreenAppear'] > users['MainScreenAppear'])
step_3 = step_2 & (users['CartScreenAppear'] > users['OffersScreenAppear'])
step_4 = step_3 & (users['PaymentScreenSuccessful'] > users['CartScreenAppear'])

In [None]:
step_1

In [None]:
users

In [None]:
n_MainScreenAppear = users[step_1].shape[0]
n_OffersScreenAppear = users[step_2].shape[0]
n_CartScreenAppear = users[step_3].shape[0]
n_PaymentScreenSuccessful = users[step_4].shape[0]

In [None]:
fig = go.Figure(go.Funnel(
    y = ["Visitors", "Selected a product", "Started checkout", "Paid"],
    x = [n_MainScreenAppear, n_OffersScreenAppear, n_CartScreenAppear, n_PaymentScreenSuccessful],
    textposition = "inside",
    textinfo = "value+percent initial",
    # one color and one outline width per funnel stage
    opacity = 0.65, marker = {"color": ["deepskyblue", "lightsalmon", "tan", "teal"],
    "line": {"width": [4, 2, 2, 3], "color": ["wheat", "wheat", "wheat", "wheat"]}},
    connector = {"line": {"color": "royalblue", "dash": "dot", "width": 3}})
    )

fig.show()

## At what stage do you lose the most users?

Most users are lost between the first two stages, from visiting the main screen to selecting a product (the largest absolute drop). Relative to the previous stage, the largest share of users is lost between starting checkout and paying.
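The difference between the largest absolute loss and the lowest stage-to-stage conversion is easy to see in a small calculation. The counts below are hypothetical placeholders; in this notebook they would be `n_MainScreenAppear`, `n_OffersScreenAppear`, `n_CartScreenAppear` and `n_PaymentScreenSuccessful`:

```python
# Hypothetical stage counts, in funnel order.
stages = ['MainScreenAppear', 'OffersScreenAppear', 'CartScreenAppear', 'PaymentScreenSuccessful']
counts = [1000, 600, 300, 120]

conversions = []  # share of the previous stage that reaches this stage
losses = []       # number of users lost on the way to this stage
for prev_n, n in zip(counts, counts[1:]):
    conversions.append(n / prev_n)
    losses.append(prev_n - n)

for stage, conv, lost in zip(stages[1:], conversions, losses):
    print(f'{stage}: {conv:.0%} of the previous stage, {lost} users lost')

# The two "worst stage" definitions can point at different stages.
largest_absolute_loss = stages[1:][losses.index(max(losses))]
lowest_conversion = stages[1:][conversions.index(min(conversions))]
print(largest_absolute_loss, lowest_conversion)
```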

## What share of users make the entire journey from their first event to payment?

In [None]:
print(
    f'The share of users who make the entire journey from their first event to payment is {round(n_PaymentScreenSuccessful/n_MainScreenAppear*100)}%')

# Step 5. Study the results of the experiment

## How many users are there in each group?

In [None]:
users_in_groups = logs.groupby('exp_id')['device_id_hash'].nunique()
users_in_groups
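Besides group sizes, a proper split also requires that no user appears in more than one group. A sketch of that check on a hypothetical toy frame follows; on the real data it would have to run before the combined group A rows are appended to `logs`:

```python
import pandas as pd

# Hypothetical toy log in which user 3 appears in two groups.
toy_logs = pd.DataFrame({
    'device_id_hash': [1, 1, 2, 3, 3, 4],
    'exp_id': ['A1', 'A1', 'A2', 'B', 'A1', 'B'],
})

# Number of distinct groups each user was seen in.
groups_per_user = toy_logs.groupby('device_id_hash')['exp_id'].nunique()

# Users assigned to more than one group would contaminate the comparison.
leaked_users = groups_per_user[groups_per_user > 1].index.tolist()
print(leaked_users)
```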

Create a combined control group A from groups A1 and A2. It is used later when the test group is compared with the pooled controls.

In [None]:
# Copy the control-group rows and relabel them as the combined group "A"
logs_A1_A2 = logs[logs['exp_id'].isin(['A1', 'A2'])].copy()
logs_A1_A2['exp_id'] = 'A'
# Append the relabeled rows. Control events now appear twice in `logs`
# (once as A1/A2 and once as A), so always filter by exp_id from here on.
logs = pd.concat([logs.astype({'exp_id': 'object'}), logs_A1_A2], ignore_index=True)
users_in_groups = logs.groupby('exp_id')['device_id_hash'].nunique()
users_in_groups


## We have two control groups in the A/A test, where we check our mechanisms and calculations. See if there is a statistically significant difference between samples 246 and 247.

Samples 246 and 247 were renamed above to A1 and A2.

The next section tests whether there is a statistically significant difference between A1 and A2.

## Select the most popular event. In each of the control groups, find the number of users who performed this action. Find their share. Check whether the difference between the groups is statistically significant. Repeat the procedure for all other events (it will save time if you create a special function for this test). Can you confirm that the groups were split properly?

Create a class for groups to calculate the number of users who performed any action

In [None]:
class Group:
    """ Study group class """
    def __init__(self, name, source_df):
        self.name = name
        self.df = source_df[source_df['exp_id'] == self.name]
        self.users_pivot = self.build_users_pivot()
        self.n_MainScreenAppear = self.step_1_users_count()
        self.n_OffersScreenAppear = self.step_2_users_count()
        self.n_CartScreenAppear = self.step_3_users_count()
        self.n_PaymentScreenSuccessful = self.step_4_users_count()
        self.n_Tutorial = self.tutorial_users_count()
        self.step_count_list = [self.n_MainScreenAppear,
                                self.n_OffersScreenAppear,
                                self.n_CartScreenAppear,
                                self.n_PaymentScreenSuccessful] 
            
    def build_users_pivot(self):
        # earliest timestamp of each event for every user (NaT if it never happened)
        my_pivot = self.df.pivot_table(
            index='device_id_hash',
            columns='event_name',
            values='time_and_date',
            aggfunc='min'
        )
        return my_pivot
    
    def step_1_users_count(self):
        # users who saw the main screen at least once
        self.step_1 = ~self.users_pivot['MainScreenAppear'].isna()
        count = self.users_pivot[self.step_1].shape[0]
        return count
    
    def step_2_users_count(self):
        self.step_2 = self.step_1 & (
            self.users_pivot['OffersScreenAppear'] > self.users_pivot['MainScreenAppear']
        )
        count = self.users_pivot[self.step_2].shape[0]
        return count
    
    def step_3_users_count(self):
        self.step_3 = self.step_2 & (
            self.users_pivot['CartScreenAppear'] > self.users_pivot['OffersScreenAppear']
        )
        count = self.users_pivot[self.step_3].shape[0]
        return count
    
    def step_4_users_count(self):
        self.step_4 = self.step_3 & (
            self.users_pivot['PaymentScreenSuccessful'] > self.users_pivot['CartScreenAppear']
        )
        count = self.users_pivot[self.step_4].shape[0]
        return count
    
    def tutorial_users_count(self):
        self.tutorial = ~self.users_pivot['Tutorial'].isna()
        count = self.users_pivot[self.tutorial].shape[0]
        return count

Calculate groups lists of users in steps

In [None]:
group_A1 = Group('A1', logs)
group_A1_list = group_A1.step_count_list

In [None]:
group_A1_list

In [None]:
group_A2 = Group('A2', logs)
group_A2_list = group_A2.step_count_list

In [None]:
group_A2_list

Create a class to conduct proportion test between two groups

In [None]:
class StatisticTestForTwoGroups:
    """ Conduct test for two groups """
    def __init__(self, group_1_step_list, group_2_step_list, alpha):
        self.group_1_step_list = group_1_step_list
        self.group_2_step_list = group_2_step_list
        self.alpha = alpha
        self.pvalue_list = self.calculate_pvalue()
        if min(self.pvalue_list) < self.alpha:
            text = 'Rejecting the null hypothesis:\
 there is a significant difference between the proportions'
        else:
            text = "Failed to reject the null hypothesis:\
 there is no reason to consider the proportions different"
        self.text = text
        
    def calculate_pvalue(self):
        pvalue_list = []
        # steps 2-4 are each compared with step 1 (MainScreenAppear)
        for i in range(3):
            before = np.array([self.group_1_step_list[0], self.group_2_step_list[0]]) 
            after = np.array([self.group_1_step_list[i+1], self.group_2_step_list[i+1]])
            
            # selected proportion in the first group:
            p1 = after[0]/before[0]
            
            # selected proportion in the second group:
            p2 = after[1]/before[1]
            
            # selected proportion in the combined dataset:
            p_combined = (after[0] + after[1]) / (before[0] + before[1])
            
            # the difference between the datasets' proportions
            difference = p1 - p2 
            
            # calculating the statistic in standard deviations of the standard normal distribution
            z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/before[0] + 1/before[1]))
            
            # setting up the standard normal distribution (mean 0, standard deviation 1)
            distr = st.norm(0, 1)
            
            p_value = round((1 - distr.cdf(abs(z_value))) * 2,3)
            
            pvalue_list.append(p_value)
        
        return pvalue_list

Check whether the difference between the groups is statistically significant for all events.
The statistical test will be:

- H0: proportions in group A1 == proportions in group A2
- H1: proportions in group A1 != proportions in group A2
- Significance level: 0.05

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.05
- Else: We can't reject H0 using significance level 0.05
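The test behind these hypotheses is a two-sided z-test for two proportions with a pooled variance estimate. The standalone sketch below restates it for a single pair of counts using only the standard library; the function name and the example counts are hypothetical, and `math.erfc` is used in place of `scipy.stats.norm` to obtain the two-sided p-value:

```python
import math

def two_proportion_z_test(successes_1, trials_1, successes_2, trials_2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1 = successes_1 / trials_1
    p2 = successes_2 / trials_2
    # Proportion in the combined sample, used under H0 (equal proportions).
    p_pooled = (successes_1 + successes_2) / (trials_1 + trials_2)
    std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / trials_1 + 1 / trials_2))
    z = (p1 - p2) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Identical proportions give z = 0; a 50% vs 60% gap on 1000 users each is clear.
z_same, p_same = two_proportion_z_test(500, 1000, 500, 1000)
z_diff, p_diff = two_proportion_z_test(500, 1000, 600, 1000)
print(p_same, p_diff)
```

In the class above, `trials` corresponds to the number of users at the first funnel step and `successes` to the number of users at a later step.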

In [None]:
group_A1_A2_statistic = StatisticTestForTwoGroups(group_A1_list, group_A2_list, 0.05)

In [None]:
group_A1_A2_statistic.pvalue_list

In [None]:
group_A1_A2_statistic.text

We conclude that the split into groups was done correctly: no significant difference was found between A1 and A2. We can therefore proceed to the A/B test, where the only intended difference between the groups is the font change.

## Do the same thing for the group with altered fonts. Compare the results with those of each of the control groups for each event in isolation. Compare the results with the combined results for the control groups. What conclusions can you draw from the experiment?

## Group A1 and group B

Check whether the difference between the groups A1 and B is statistically significant for all events. The statistical test will be:

- H0: proportions in group A1 == proportions in group B
- H1: proportions in group A1 != proportions in group B
- Significance level: 0.05

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.05
- Else: We can't reject H0 using significance level 0.05

Calculate group lists of users in steps

In [None]:
group_B = Group('B', logs)
group_B_list = group_B.step_count_list
group_B_list

In [None]:
group_A1_B_statistic = StatisticTestForTwoGroups(group_A1_list, group_B_list, 0.05)
group_A1_B_statistic.pvalue_list

In [None]:
group_A1_B_statistic.text

## Group A2 and group B

Check whether the difference between the groups A2 and B is statistically significant for all events. The statistical test will be:

- H0: proportions in group A2 == proportions in group B
- H1: proportions in group A2 != proportions in group B
- Significance level: 0.05

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.05
- Else: We can't reject H0 using significance level 0.05

In [None]:
group_A2_B_statistic = StatisticTestForTwoGroups(group_A2_list, group_B_list, 0.05)
group_A2_B_statistic.pvalue_list

In [None]:
group_A2_B_statistic.text

## Group A (combined A1 and A2) and group B

Check whether the difference between the groups A and B is statistically significant for all events. The statistical test will be:

- H0: proportions in group A == proportions in group B
- H1: proportions in group A != proportions in group B
- Significance level: 0.05

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.05
- Else: We can't reject H0 using significance level 0.05

In [None]:
group_A = Group('A', logs)
group_A_list = group_A.step_count_list
group_A_list

In [None]:
group_A_B_statistic = StatisticTestForTwoGroups(group_A_list, group_B_list, 0.05)
group_A_B_statistic.pvalue_list

In [None]:
group_A_B_statistic.text

## What significance level have you set to test the statistical hypotheses mentioned above? Calculate how many statistical hypothesis tests you carried out. With a statistical significance level of 0.1, one in 10 results could be false. What should the significance level be? If you want to change it, run through the previous steps again and check your conclusions.

- In all tests we failed to reject the null hypothesis, so lowering alpha cannot change any conclusion: a stricter threshold only removes rejections (potential false positives), and there were none.
- For each pair of groups we compared only the minimum of the three p-values with alpha, because the decision is binary: a single p-value below alpha would be enough to reject the null hypothesis for that pair.
- We will nevertheless demonstrate the Bonferroni correction. We used alpha = 0.05 for four comparisons (A1 vs A2, A1 vs B, A2 vs B, and A vs B), each with 3 tests, so 4 x 3 = 12 tests overall. The corrected alpha is 0.05/12 ≈ 0.004.
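The Bonferroni arithmetic can be written out as a short calculation. It counts the A1 vs A2 check together with the three comparisons against B, which gives four comparisons of three tests each; the family-wise error estimate additionally assumes independent tests, which is a simplification:

```python
alpha = 0.05

# Four pairwise comparisons, each testing three funnel steps against the first step.
comparisons = ['A1 vs A2', 'A1 vs B', 'A2 vs B', 'A vs B']
steps_per_comparison = 3
n_tests = len(comparisons) * steps_per_comparison

# Bonferroni: divide the significance level by the number of tests.
bonferroni_alpha = alpha / n_tests

# Chance of at least one false positive without correction,
# under the simplifying assumption that the tests are independent.
fwer_uncorrected = 1 - (1 - alpha) ** n_tests

print(n_tests, round(bonferroni_alpha, 3), round(fwer_uncorrected, 2))
```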


## Group A1 and group B

Check whether the difference between the groups A1 and B is statistically significant for all events. The statistical test will be:

- H0: proportions in group A1 == proportions in group B

- H1: proportions in group A1 != proportions in group B

- Significance level: 0.004

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.004

- Else: We can't reject H0 using significance level 0.004

In [None]:
group_A1_B_statistic = StatisticTestForTwoGroups(group_A1_list, group_B_list, 0.004)
# print both results; a notebook cell displays only its last bare expression
print(group_A1_B_statistic.pvalue_list)
print(group_A1_B_statistic.text)

## Group A2 and group B

Check whether the difference between the groups A2 and B is statistically significant for all events. The statistical test will be:

- H0: proportions in group A2 == proportions in group B

- H1: proportions in group A2 != proportions in group B

- Significance level: 0.004

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.004

- Else: We can't reject H0 using significance level 0.004

In [None]:
group_A2_B_statistic = StatisticTestForTwoGroups(group_A2_list, group_B_list, 0.004)
# print both results; a notebook cell displays only its last bare expression
print(group_A2_B_statistic.pvalue_list)
print(group_A2_B_statistic.text)

## Group A and group B

Check whether the difference between the groups A and B is statistically significant for all events. The statistical test will be:

- H0: proportions in group A == proportions in group B

- H1: proportions in group A != proportions in group B

- Significance level: 0.004

- If the minimum of all p-values < alpha: We can reject H0 using significance level 0.004

- Else: We can't reject H0 using significance level 0.004

In [None]:
group_A_B_statistic = StatisticTestForTwoGroups(group_A_list, group_B_list, 0.004)
# print both results; a notebook cell displays only its last bare expression
print(group_A_B_statistic.pvalue_list)
print(group_A_B_statistic.text)

In all cases we failed to reject the null hypothesis.

# Conclusion
- As expected, lowering alpha with the Bonferroni correction did not change our results.
- The font change did not cause any statistically significant change in user behavior.
- The A/A test gives us more confidence that the split into groups was sound, so the font set is the only intended difference between the control and test groups.