# A/B Test Challenge



---

#### What is an A/B Test? 

An A/B test is a research and decision-support methodology that lets you measure the impact of a change in a product (e.g., a digital product). For this challenge you will analyse the data from an A/B test performed on a digital product in which a new set of sponsored ads was introduced.


#### Measure of success

Metrics are needed to measure the success of your product. They typically fall into the following categories: 

- __Engagement-based metrics:__ number of users, number of downloads, number of active users, user retention, etc.

- __Revenue and monetization metrics:__ ads and affiliate links, subscription-based, in-app purchases, etc.

- __Technical metrics:__ service level indicators (uptime of the app, downtime of the app, latency).



---

## Metrics understanding

In this part you must analyse the metrics involved in the test. We will focus on the following metrics:

- Activity level + Daily active users (DAU).

- Click-through rate (CTR)

### Activity level

In the following part you must perform every calculation you consider necessary in order to answer the following questions:

- How many activity levels can you find in the dataset? (An activity level of zero means no activity.)

- How many users are there at each activity level?

- How many activity levels are there per day, and how many records are there for each activity level?

At the end of this section you must provide your conclusions about the _activity level_ of the users.

__Dataset:__ `activity_pretest.csv`

In [99]:
# your-code
import pandas as pd
from statsmodels.stats.weightstats import ztest
from scipy import stats

import seaborn as sns
import matplotlib.pyplot as plt

In [8]:
a_pt = pd.read_csv('./data/activity_pretest.csv')

In [63]:
#how many activity levels in the dataset and how many users per activity_level

activity_levels = a_pt['activity_level'].unique()
activity_levels.sort()
users = a_pt['userid'].nunique()


print('Number of activity levels :',len(activity_levels),
     '\nNumber of users : ', users)
u_x_a = pd.DataFrame([(a, a_pt[a_pt['activity_level'] == a]['userid'].nunique()) for a in activity_levels],
                       columns=['activity_level', 'users'])

u_x_a.head(5)

Number of activity levels : 21 
Number of users :  60000


Unnamed: 0,activity_level,users
0,0,60000
1,1,33688
2,2,33761
3,3,33634
4,4,33502


In [44]:
#how many activity levels per day

days = a_pt['dt'].unique()
days.sort()

al_x_day = pd.DataFrame([(d, a_pt[a_pt['dt'] == d]['activity_level'].nunique()) for d in days],
                       columns=['day', 'activity_levels'])

al_x_day.head(5)

Unnamed: 0,day,activity_levels
0,2021-10-01,21
1,2021-10-02,21
2,2021-10-03,21
3,2021-10-04,21
4,2021-10-05,21


In [45]:
#how many records per activity level

r_x_al = pd.DataFrame([(a, a_pt[a_pt['activity_level'] == a]['activity_level'].count()) for a in activity_levels],
                       columns=['activity_level', 'records'])
r_x_al.head(5)

Unnamed: 0,activity_level,records
0,0,909125
1,1,48732
2,2,49074
3,3,48659
4,4,48556


In [68]:
#how many records per activity level per day (mean)

# count rows per (day, activity level), then average the daily counts for each level
a_pt.groupby(['dt', 'activity_level']).size().groupby(level=1).mean()

activity_level
0     29326.612903
1      1572.000000
2      1583.032258
3      1569.645161
4      1566.322581
5      1587.967742
6      1577.451613
7      1559.322581
8      1561.161290
9      1574.838710
10     1578.806452
11     1575.225806
12     1577.774194
13     1565.612903
14     1568.387097
15     1567.709677
16     1578.516129
17     1561.129032
18     1580.064516
19     1577.451613
20      790.967742
dtype: float64
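
The three questions above can also be answered with grouped aggregations instead of list comprehensions. The following is a minimal sketch on a hypothetical six-row frame (`toy` is invented for illustration; only the column names mirror `activity_pretest.csv`), so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical miniature of activity_pretest.csv (values invented for illustration).
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [0, 1, 2, 1, 1, 0],
})

# Distinct users and number of records for each activity level.
per_level = toy.groupby("activity_level").agg(
    users=("userid", "nunique"),
    records=("userid", "count"),
)

# Number of distinct activity levels observed on each day.
levels_per_day = toy.groupby("dt")["activity_level"].nunique()

print(per_level)
print(levels_per_day)
```

A single `groupby` avoids filtering the full frame once per activity level, which may matter on a dataset with close to two million rows.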

### Daily active users (DAU)

![ab_test](./img/user_activity_ab_testing.JPG)


Daily active users (DAU) is the number of users who are active on a given day (an activity level of zero means no activity). You must calculate this metric and provide your insights about it.

__Dataset:__ `activity_pretest.csv`

In [62]:
# your-code
active_df = a_pt[a_pt['activity_level'] > 0].reset_index(drop=True)
active_df.head(5)

Unnamed: 0,userid,dt,activity_level
0,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1
1,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1
2,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1
3,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,1
4,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1


In [116]:
#active users per day and their mean (row counts equal unique users only if each user has one row per day)
dau = active_df.groupby(['dt']).size().reset_index(name="DAU")
dau_1 = dau['DAU'].mean()

print('Average DAU:',dau_1)
dau.head(5)

Average DAU: 30673.387096774193


Unnamed: 0,dt,DAU
0,2021-10-01,30634
1,2021-10-02,30775
2,2021-10-03,30785
3,2021-10-04,30599
4,2021-10-05,30588
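
As a cross-check on the DAU definition, here is a small sketch that counts distinct user IDs rather than rows, so duplicate rows for the same user and day would not inflate the metric. The frame `toy` and its values are hypothetical; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sample: three users observed on two days.
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [0, 3, 5, 2, 0, 0],
})

# A user is active on a day when the activity level is above zero.
active = toy[toy["activity_level"] > 0]

# DAU: distinct active users per day, then the average across days.
dau = active.groupby("dt")["userid"].nunique().rename("DAU")
avg_dau = dau.mean()

print(dau)
print("Average DAU:", avg_dau)
```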


### Click-through rate (CTR)

![ab_test](./img/ad_click_through_rate_ab_testing.JPG)

Click-through rate (CTR) is the percentage of the ads shown to a user on a given day that the user clicked. You must analyse this metric (e.g., average CTR per day) and provide your insights about it.

__Dataset:__ `ctr_pretest.csv`

In [90]:
# your-code
ctr_pt = pd.read_csv('./data/ctr_pretest.csv')
ctr_pt.head(5)

Unnamed: 0,userid,dt,ctr
0,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,34.28
1,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,34.67
2,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,34.77
3,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,35.42
4,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,35.04


In [117]:
#CTR per day and total

ctr_x_day = ctr_pt.groupby('dt')['ctr'].mean().reset_index(name='ctr')
ctr_1 = ctr_x_day['ctr'].mean()

print('Average CTR:', ctr_1)
ctr_x_day.head(5)

Average CTR: 33.00024304382363


Unnamed: 0,dt,ctr
0,2021-10-01,32.993446
1,2021-10-02,32.991664
2,2021-10-03,32.995086
3,2021-10-04,32.992995
4,2021-10-05,33.004375
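
The `ctr` column already holds the percentage, so the notebook never computes it from raw counts. For reference, the definition above can be sketched as follows. The function name `click_through_rate`, its parameters, and the choice to return 0.0 when no ads were shown are assumptions made for illustration, not part of the challenge data.

```python
def click_through_rate(clicks: int, ads_shown: int) -> float:
    """Return the CTR as a percentage of ads shown.

    Returning 0.0 when no ads were shown is an assumption of this sketch;
    treating such days as missing values would be an equally valid choice.
    """
    if ads_shown == 0:
        return 0.0
    return 100.0 * clicks / ads_shown


# Hypothetical user-day: 33 clicks out of 100 ads shown.
ctr_example = click_through_rate(clicks=33, ads_shown=100)
print(ctr_example)
```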


---

## Pretest metrics 

In this section you will analyse the metrics using the dataset that includes the results for the test and control groups, but only for the pretest data (i.e., prior to November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups prior to the start of the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`

In [103]:
# your-code
act_all = pd.read_csv('./data/activity_all.csv')
ctr_all = pd.read_csv('./data/ctr_all.csv')

In [122]:
act_pre = act_all[act_all['dt'] < '2021-11-01']
ctr_pre = ctr_all[ctr_all['dt'] < '2021-11-01']

act_pre_active = act_pre[act_pre['activity_level'] > 0]
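
The string comparison above works because ISO-formatted dates (`YYYY-MM-DD`) sort lexicographically in chronological order. A slightly more defensive variant, sketched here on a hypothetical four-row frame, parses the column first and uses one shared boundary so the pretest and experiment periods cannot overlap or leave a gap. The name `EXPERIMENT_START` is introduced for illustration only.

```python
import pandas as pd

# Assumed boundary: the experiment starts on November 1st, 2021.
EXPERIMENT_START = pd.Timestamp("2021-11-01")

# Hypothetical rows around the boundary.
toy = pd.DataFrame({
    "dt": ["2021-10-30", "2021-10-31", "2021-11-01", "2021-11-02"],
    "groupid": [0, 1, 0, 1],
})
toy["dt"] = pd.to_datetime(toy["dt"])

# Complementary masks built from a single boundary value.
pretest = toy[toy["dt"] < EXPERIMENT_START]
experiment = toy[toy["dt"] >= EXPERIMENT_START]

print(len(pretest), len(experiment))
```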

In [120]:
print('Total users:', act_pre['userid'].nunique(),
      '\nGroup 0 users:', act_pre[act_pre['groupid'] == 0]['userid'].nunique(),
      '\nGroup 1 users:', act_pre[act_pre['groupid'] == 1]['userid'].nunique())

Total users: 60000 
Group 0 users: 29951 
Group 1 users: 30049


In [129]:
#daily active users per group

dau_g0_df = act_pre_active[act_pre_active['groupid'] == 0].groupby(['dt']).size().reset_index(name="DAU")
dau_g0 = dau_g0_df['DAU'].mean()

dau_g1_df = act_pre_active[act_pre_active['groupid'] == 1].groupby(['dt']).size().reset_index(name="DAU")
dau_g1 = dau_g1_df['DAU'].mean()

print('Average DAU group 0:', dau_g0,
      '\nAverage DAU group 1:', dau_g1)

Average DAU group 0: 15320.870967741936 
Average DAU group 1: 15352.516129032258


In [132]:
# Ztest between group 0 and 1 DAU
Z_score, p_value = ztest(dau_g0_df['DAU'], dau_g1_df['DAU'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: -1.4121065242323187 
p-value: 0.15791859802311015


In [139]:
ctr_g0_df = ctr_pre[ctr_pre['groupid'] == 0].groupby('dt')['ctr'].mean().reset_index(name='ctr')
ctr_g0 = ctr_g0_df['ctr'].mean()

ctr_g1_df = ctr_pre[ctr_pre['groupid'] == 1].groupby('dt')['ctr'].mean().reset_index(name='ctr')
ctr_g1 = ctr_g1_df['ctr'].mean()

print('Average CTR group 0:', ctr_g0,
      '\nAverage CTR group 1:', ctr_g1)

Average CTR group 0: 33.00093853567254 
Average CTR group 1: 32.999576637260105


In [141]:
# Ztest between group 0 and 1 CTR
Z_score, p_value = ztest(ctr_g0_df['ctr'], ctr_g1_df['ctr'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: 0.3813635623232328 
p-value: 0.7029334947610391


---

## Experiment metrics 

In this section you must perform the same analysis as in the previous section, but using the data generated during the experiment (i.e., from November 1st, 2021 onward). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups during the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`

In [142]:
# your-code
act_post = act_all[act_all['dt'] > '2021-10-31']
ctr_post = ctr_all[ctr_all['dt'] > '2021-10-31']

act_post_active = act_post[act_post['activity_level'] > 0]

In [145]:
print('Total users:', act_post['userid'].nunique(),
      '\nGroup 0 users:', act_post[act_post['groupid'] == 0]['userid'].nunique(),
      '\nGroup 1 users:', act_post[act_post['groupid'] == 1]['userid'].nunique())

Total users: 60000 
Group 0 users: 29951 
Group 1 users: 30049


In [146]:
#daily active users per group

dau_g0_df_post = act_post_active[act_post_active['groupid'] == 0].groupby(['dt']).size().reset_index(name="DAU")
dau_g0_post = dau_g0_df_post['DAU'].mean()

dau_g1_df_post = act_post_active[act_post_active['groupid'] == 1].groupby(['dt']).size().reset_index(name="DAU")
dau_g1_post = dau_g1_df_post['DAU'].mean()

print('Average DAU group 0:', dau_g0_post,
      '\nAverage DAU group 1:', dau_g1_post)

Average DAU group 0: 15782.0 
Average DAU group 1: 29302.433333333334


In [147]:
# Ztest between group 0 and 1 DAU
Z_score, p_value = ztest(dau_g0_df_post['DAU'], dau_g1_df_post['DAU'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: -198.89904948926164 
p-value: 0.0


In [148]:
ctr_g0_df_post = ctr_post[ctr_post['groupid'] == 0].groupby('dt')['ctr'].mean().reset_index(name='ctr')
ctr_g0_post = ctr_g0_df_post['ctr'].mean()

ctr_g1_df_post = ctr_post[ctr_post['groupid'] == 1].groupby('dt')['ctr'].mean().reset_index(name='ctr')
ctr_g1_post = ctr_g1_df_post['ctr'].mean()

print('Average CTR group 0:', ctr_g0_post,
      '\nAverage CTR group 1:', ctr_g1_post)

Average CTR group 0: 32.996949636224016 
Average CTR group 1: 37.996960401253006


In [149]:
# Ztest between group 0 and 1 CTR
Z_score, p_value = ztest(ctr_g0_df_post['ctr'], ctr_g1_df_post['ctr'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: -1833.515051435323 
p-value: 0.0


---

## Conclusions

Please provide your conclusions from the analyses and your recommendation on whether or not we should implement the changes in the digital product.

In [170]:
# your-conclusions

print(f'-Group 1 shows an increase of {round((dau_g1_post/dau_g1 - 1)*100, 2)}% in the average DAU',
      f'\n and an increase of {round((ctr_g1_post/ctr_g1 - 1)*100, 2)}% in the average CTR relative to the pretest period.',
      '\n-As the experiment shows statistically significant improvements in both KPIs, rolling out',
      '\nthe changes to all users is recommended.')

-Group 1 shows an increase of 90.86% in the average DAU 
 and an increase of 15.14% in the average CTR relative to the pretest period. 
-As the experiment shows statistically significant improvements in both KPIs, rolling out 
the changes to all users is recommended.
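
The percentages above compare group 1 with its own pretest period. A complementary view, sketched below, compares group 1 with group 0 during the experiment itself, which controls for anything that changed for all users in November. The averages are copied from the experiment-period outputs printed earlier in this notebook; the helper `relative_lift` is a hypothetical name introduced here.

```python
def relative_lift(treatment: float, control: float) -> float:
    """Percentage change of the treatment value relative to the control value."""
    return (treatment / control - 1.0) * 100.0


# Experiment-period averages copied from the printed outputs above.
dau_control, dau_treatment = 15782.0, 29302.433333333334
ctr_control, ctr_treatment = 32.996949636224016, 37.996960401253006

dau_lift = relative_lift(dau_treatment, dau_control)
ctr_lift = relative_lift(ctr_treatment, ctr_control)

print(f"DAU lift, group 1 vs group 0: {dau_lift:.2f}%")
print(f"CTR lift, group 1 vs group 0: {ctr_lift:.2f}%")
```

Because the two groups differ slightly in size (29951 and 30049 users), dividing each DAU by its group size before comparing would give a marginally fairer figure.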


---