# A/B Test Challenge



---

#### What is an A/B Test? 

An A/B test is a research methodology that supports decision-making by measuring the impact of a change in a product (e.g., a digital product). In this challenge you will analyse the data from an A/B test performed on a digital product in which a new set of sponsored ads is included.


#### Measure of success

Metrics are needed to measure the success of your product. They are typically split into the following categories:

- __Engagement-based metrics:__ number of users, number of downloads, number of active users, user retention, etc.

- __Revenue and monetization metrics:__ ads and affiliate links, subscription-based, in-app purchases, etc.

- __Technical metrics:__ service level indicators (uptime of the app, downtime of the app, latency).



---

## Metrics understanding

In this part you must analyse the metrics involved in the test. We will focus on the following metrics:

- Activity level + Daily active users (DAU).

- Click-through rate (CTR)

### Activity level

In this part you must perform every calculation you consider necessary to answer the following questions:

- How many activity levels can you find in the dataset? (An activity level of zero means no activity.)

- What is the number of users for each activity level?

- How many activity levels are there per day, and how many records are there for each activity level?

At the end of this section you must provide your conclusions about the _activity level_ of the users.

__Dataset:__ `activity_pretest.csv`

In [1]:
import numpy as np
import pandas as pd

from statsmodels.stats.weightstats import ztest  # two-sample z-test
from scipy import stats  # t-tests (e.g. stats.ttest_ind)

import seaborn as sns
import matplotlib.pyplot as plt



In [2]:
# your-code
act_pretest = pd.read_csv('data/activity_pretest.csv')
act_pretest.head()

| | userid | dt | activity_level |
|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 0 |


In [3]:
alevels = act_pretest['activity_level'].unique()
len(alevels)

21

In [4]:
# count() tallies rows (user-day records) per activity level, not distinct users
act_pretest.groupby(by='activity_level').count()

| activity_level | userid | dt |
|---|---|---|
| 0 | 909125 | 909125 |
| 1 | 48732 | 48732 |
| 2 | 49074 | 49074 |
| 3 | 48659 | 48659 |
| 4 | 48556 | 48556 |
| 5 | 49227 | 49227 |
| 6 | 48901 | 48901 |
| 7 | 48339 | 48339 |
| 8 | 48396 | 48396 |
| 9 | 48820 | 48820 |


In [5]:
act_pretest.groupby(by=['activity_level','dt']).count()

| activity_level | dt | userid |
|---|---|---|
| 0 | 2021-10-01 | 29366 |
| 0 | 2021-10-02 | 29225 |
| 0 | 2021-10-03 | 29215 |
| 0 | 2021-10-04 | 29401 |
| 0 | 2021-10-05 | 29412 |
| ... | ... | ... |
| 20 | 2021-10-27 | 810 |
| 20 | 2021-10-28 | 800 |
| 20 | 2021-10-29 | 784 |
| 20 | 2021-10-30 | 780 |
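
In the cells above, `count()` tallies rows, and each row appears to be one user on one day (60,000 rows per day), so the per-level figures are user-days rather than distinct users. A minimal sketch that reports both, plus the number of distinct activity levels seen each day, assuming the `userid`, `dt` and `activity_level` columns of `activity_pretest.csv`; the helper `summarise_activity_levels` and the tiny demo frame are hypothetical:

```python
import pandas as pd


def summarise_activity_levels(df: pd.DataFrame) -> dict:
    """Summarise a frame with userid, dt and activity_level columns (hypothetical helper)."""
    per_level = df.groupby("activity_level").agg(
        records=("userid", "count"),         # user-day rows at this level
        unique_users=("userid", "nunique"),  # distinct users who reached this level
    )
    # Number of distinct activity levels observed on each day
    levels_per_day = df.groupby("dt")["activity_level"].nunique()
    # Records per (day, level), pivoted so that each level becomes a column
    records_per_day_level = (
        df.groupby(["dt", "activity_level"]).size().unstack(fill_value=0)
    )
    return {
        "n_levels": df["activity_level"].nunique(),
        "per_level": per_level,
        "levels_per_day": levels_per_day,
        "records_per_day_level": records_per_day_level,
    }


# Tiny synthetic frame for illustration only (not the challenge data)
demo = pd.DataFrame({
    "userid": ["a", "b", "a", "b", "c"],
    "dt": ["2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02", "2021-10-02"],
    "activity_level": [0, 3, 3, 3, 0],
})
summary = summarise_activity_levels(demo)
print(summary["per_level"])
```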


### Daily active users (DAU)

![ab_test](./img/user_activity_ab_testinG.JPG)


Daily active users (DAU) is the number of users who are active on a given day (an activity level of zero means no activity). You must calculate this metric and provide your insights about it.

__Dataset:__ `activity_pretest.csv`
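
A minimal sketch of the DAU calculation, assuming one row per user per day and the column names of `activity_pretest.csv`; `daily_active_users` is a hypothetical helper. Counting distinct `userid` values guards against duplicate rows, and dividing by all users seen that day gives the active share:

```python
import pandas as pd


def daily_active_users(df: pd.DataFrame) -> pd.DataFrame:
    """DAU and active share per day (hypothetical helper).

    Assumes activity_level == 0 means no activity on that day.
    """
    total = df.groupby("dt")["userid"].nunique()
    active = df.loc[df["activity_level"] != 0].groupby("dt")["userid"].nunique()
    # Days with no active users are missing from `active`; treat them as 0
    out = pd.DataFrame({"dau": active, "users": total}).fillna(0)
    out["dau"] = out["dau"].astype(int)
    out["active_share"] = out["dau"] / out["users"]
    return out


# Synthetic example: 3 users observed on 2 days
demo = pd.DataFrame({
    "userid": ["a", "b", "c", "a", "b", "c"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [0, 2, 5, 1, 0, 0],
})
dau = daily_active_users(demo)
print(dau)
```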

In [6]:
# your-code

act_pretest = pd.read_csv('data/activity_pretest.csv')
act_pretest


| | userid | dt | activity_level |
|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 0 |
| ... | ... | ... | ... |
| 1859995 | 200d65e6-b1ce-4a47-8c2b-946db5c5a3a0 | 2021-10-31 | 20 |
| 1859996 | 535dafe4-de7c-4b56-acf6-aa94f21653bc | 2021-10-31 | 20 |
| 1859997 | 0428ca3c-e666-4ef4-8588-3a2af904a123 | 2021-10-31 | 20 |
| 1859998 | a8cd1579-44d4-48b3-b3d6-47ae5197dbc6 | 2021-10-31 | 20 |


In [7]:
act_pretest.groupby(by='dt')['userid'].count()

dt
2021-10-01    60000
2021-10-02    60000
2021-10-03    60000
2021-10-04    60000
2021-10-05    60000
2021-10-06    60000
2021-10-07    60000
2021-10-08    60000
2021-10-09    60000
2021-10-10    60000
2021-10-11    60000
2021-10-12    60000
2021-10-13    60000
2021-10-14    60000
2021-10-15    60000
2021-10-16    60000
2021-10-17    60000
2021-10-18    60000
2021-10-19    60000
2021-10-20    60000
2021-10-21    60000
2021-10-22    60000
2021-10-23    60000
2021-10-24    60000
2021-10-25    60000
2021-10-26    60000
2021-10-27    60000
2021-10-28    60000
2021-10-29    60000
2021-10-30    60000
2021-10-31    60000
Name: userid, dtype: int64

In [29]:
dau = act_pretest[act_pretest['activity_level'] != 0].groupby(by='dt')['userid'].count()
dau

dt
2021-10-01    30634
2021-10-02    30775
2021-10-03    30785
2021-10-04    30599
2021-10-05    30588
2021-10-06    30639
2021-10-07    30637
2021-10-08    30600
2021-10-09    30902
2021-10-10    30581
2021-10-11    30489
2021-10-12    30715
2021-10-13    30761
2021-10-14    30716
2021-10-15    30637
2021-10-16    30708
2021-10-17    30741
2021-10-18    30694
2021-10-19    30587
2021-10-20    30795
2021-10-21    30705
2021-10-22    30573
2021-10-23    30645
2021-10-24    30815
2021-10-25    30616
2021-10-26    30673
2021-10-27    30661
2021-10-28    30734
2021-10-29    30723
2021-10-30    30628
2021-10-31    30519
Name: userid, dtype: int64

1st insight
==

Out of 60,000 users per day, an average of about 30,673 (roughly 51%) are active, and the daily count is stable, ranging from 30,489 to 30,902.

---

### Click-through rate (CTR)

![ab_test](./img/ad_click_through_rate_ab_testing.JPG)

Click-through rate (CTR) is the percentage of the ads shown to a user on a given day that the user clicks. You must analyse this metric (e.g., the average CTR per day) and provide your insights about it.

__Dataset:__ `ctr_pretest.csv`
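
`ctr_pretest.csv` already stores a per-user daily CTR, apparently on a 0 to 100 scale. The sketch below spells out the assumed definition (clicks divided by impressions, times 100) and a per-day summary; both helpers are hypothetical and the counts and values in the example are illustrative:

```python
import pandas as pd


def ctr_percent(clicks: int, impressions: int) -> float:
    """CTR as a percentage: clicks / impressions * 100 (assumed definition)."""
    if impressions <= 0:
        raise ValueError("CTR is undefined when no ads were shown")
    return 100.0 * clicks / impressions


def daily_ctr_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, spread and record count of per-user CTR for each day (dt, ctr columns)."""
    return df.groupby("dt")["ctr"].agg(["mean", "median", "std", "count"])


print(ctr_percent(33, 100))  # 33.0

demo = pd.DataFrame({"dt": ["d1", "d1", "d2"], "ctr": [30.0, 36.0, 33.0]})
summary = daily_ctr_summary(demo)
print(summary)
```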

In [9]:
# your-code


ctr_pretest = pd.read_csv('data/ctr_pretest.csv')
ctr_pretest

| | userid | dt | ctr |
|---|---|---|---|
| 0 | 4b328144-df4b-47b1-a804-09834942dce0 | 2021-10-01 | 34.28 |
| 1 | 34ace777-5e9d-40b3-a859-4145d0c35c8d | 2021-10-01 | 34.67 |
| 2 | 8028cccf-19c3-4c0e-b5b2-e707e15d2d83 | 2021-10-01 | 34.77 |
| 3 | 652b3c9c-5e29-4bf0-9373-924687b1567e | 2021-10-01 | 35.42 |
| 4 | 45b57434-4666-4b57-9798-35489dc1092a | 2021-10-01 | 35.04 |
| ... | ... | ... | ... |
| 950870 | a09a3687-b71a-4a67-b1ef-9b05c9770c4c | 2021-10-31 | 32.33 |
| 950871 | c843a595-b94c-42e1-b2fe-ec096070681e | 2021-10-31 | 30.09 |
| 950872 | edcdf0c1-3d8f-47e8-b7dd-05505749eb69 | 2021-10-31 | 35.71 |
| 950873 | 76b7a9ae-98fa-4c77-869d-594a4ef7282d | 2021-10-31 | 34.76 |


In [47]:
ctr_pretest.groupby(by='dt').describe()

| dt | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 2021-10-01 | 30634.0 | 32.993446 | 1.736421 | 30.0 | 31.48 | 32.99 | 34.51 | 36.0 |
| 2021-10-02 | 30775.0 | 32.991664 | 1.735812 | 30.0 | 31.48 | 32.99 | 34.49 | 36.0 |
| 2021-10-03 | 30785.0 | 32.995086 | 1.73152 | 30.0 | 31.5 | 32.99 | 34.5 | 36.0 |
| 2021-10-04 | 30599.0 | 32.992995 | 1.734027 | 30.0 | 31.49 | 32.98 | 34.505 | 36.0 |
| 2021-10-05 | 30588.0 | 33.004375 | 1.733007 | 30.0 | 31.51 | 33.01 | 34.5 | 36.0 |
| 2021-10-06 | 30639.0 | 33.018564 | 1.723247 | 30.0 | 31.54 | 33.04 | 34.5 | 36.0 |
| 2021-10-07 | 30637.0 | 32.9885 | 1.733208 | 30.0 | 31.49 | 32.98 | 34.49 | 36.0 |
| 2021-10-08 | 30600.0 | 32.998654 | 1.728809 | 30.0 | 31.51 | 33.0 | 34.49 | 36.0 |
| 2021-10-09 | 30902.0 | 33.005082 | 1.734601 | 30.0 | 31.5 | 33.0 | 34.52 | 36.0 |
| 2021-10-10 | 30581.0 | 33.007134 | 1.728713 | 30.0 | 31.5 | 33.02 | 34.5 | 36.0 |


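1st insight
==

User behavior is similar from day to day: in the days shown, the mean CTR stays at about 33.0% with a standard deviation of about 1.73, and CTR values range from 30% to 36%. For these days the number of CTR records also matches that day's DAU, which suggests CTR is recorded only for active users.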

---

## Pretest metrics 

In this section you will analyse the metrics using the dataset that includes the results for the test and control groups, but only for the pretest data (i.e., prior to November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups before the start of the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
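
The cells in this section run only z-tests. A minimal sketch comparing a two-sample z-test (statsmodels `ztest`, which pools the variances by default) with Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`); the helper `compare_means` and the synthetic demo samples are assumptions, not part of the challenge data:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest


def compare_means(control, treatment) -> dict:
    """Two-sided z-test and Welch t-test for a difference in means (hypothetical helper)."""
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    # statsmodels default: usevar="pooled", compares against a difference of 0
    z_stat, z_p = ztest(control, treatment)
    # Welch's t-test: does not assume equal variances in the two groups
    t_stat, t_p = stats.ttest_ind(control, treatment, equal_var=False)
    return {
        "mean_control": control.mean(),
        "mean_treatment": treatment.mean(),
        "z_stat": z_stat, "z_p": z_p,
        "t_stat": t_stat, "t_p": t_p,
    }


# Synthetic samples drawn from the same distribution (illustration only)
rng = np.random.default_rng(0)
res = compare_means(rng.normal(33, 1.7, 5000), rng.normal(33, 1.7, 5000))
print(res)
```

The same call could be applied to the pretest columns, e.g. `compare_means(act_before.loc[act_before['groupid'] == 0, 'activity_level'], act_before.loc[act_before['groupid'] == 1, 'activity_level'])`. With hundreds of thousands of rows per group the t distribution is practically identical to the normal, so both tests are expected to give nearly the same p-values. Both also assume independent observations, which repeated daily rows of the same user only approximate.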

In [10]:
# your-code

act_all = pd.read_csv('data/activity_all.csv')
act_all.head()


| | userid | dt | groupid | activity_level |
|---|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 1 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 1 | 0 |


In [11]:
ctr_all = pd.read_csv('data/ctr_all.csv')
ctr_all.head()

| | userid | dt | groupid | ctr |
|---|---|---|---|---|
| 0 | 60389fa7-2d71-4cdf-831c-c2bb277ffa1e | 2021-11-13 | 0 | 31.81 |
| 1 | b59cb225-d160-4851-92d2-7cc8120a2f63 | 2021-11-13 | 0 | 30.46 |
| 2 | aa336050-934e-453f-a5b0-dd881fcd114e | 2021-11-13 | 0 | 34.25 |
| 3 | 8df767f4-a10f-4322-a722-676b7e02b372 | 2021-11-13 | 0 | 34.92 |
| 4 | a74762ed-4da0-42ab-91d2-40d7e808dfe9 | 2021-11-13 | 0 | 34.95 |


In [12]:
# dt holds ISO date strings (YYYY-MM-DD), so string comparison follows date order
act_before = act_all[act_all['dt'] < '2021-11-01']
act_before

| | userid | dt | groupid | activity_level |
|---|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 1 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 1 | 0 |
| ... | ... | ... | ... | ... |
| 3625439 | 200d65e6-b1ce-4a47-8c2b-946db5c5a3a0 | 2021-10-31 | 0 | 20 |
| 3625440 | 535dafe4-de7c-4b56-acf6-aa94f21653bc | 2021-10-31 | 1 | 20 |
| 3625441 | 0428ca3c-e666-4ef4-8588-3a2af904a123 | 2021-10-31 | 1 | 20 |
| 3625442 | a8cd1579-44d4-48b3-b3d6-47ae5197dbc6 | 2021-10-31 | 0 | 20 |


In [13]:
act_before.groupby(by='groupid')['activity_level'].describe()

| groupid | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 928481.0 | 5.245635 | 6.521184 | 0.0 | 0.0 | 1.0 | 10.0 | 20.0 |
| 1 | 931519.0 | 5.240952 | 6.520811 | 0.0 | 0.0 | 1.0 | 10.0 | 20.0 |


In [14]:
ctr_before = ctr_all[ctr_all['dt'] < '2021-11-01']
ctr_before

| | userid | dt | groupid | ctr |
|---|---|---|---|---|
| 808703 | 4b328144-df4b-47b1-a804-09834942dce0 | 2021-10-01 | 0 | 34.28 |
| 808704 | 34ace777-5e9d-40b3-a859-4145d0c35c8d | 2021-10-01 | 0 | 34.67 |
| 808705 | 8028cccf-19c3-4c0e-b5b2-e707e15d2d83 | 2021-10-01 | 0 | 34.77 |
| 808706 | 652b3c9c-5e29-4bf0-9373-924687b1567e | 2021-10-01 | 0 | 35.42 |
| 808707 | 45b57434-4666-4b57-9798-35489dc1092a | 2021-10-01 | 0 | 35.04 |
| ... | ... | ... | ... | ... |
| 1759573 | a09a3687-b71a-4a67-b1ef-9b05c9770c4c | 2021-10-31 | 1 | 32.33 |
| 1759574 | c843a595-b94c-42e1-b2fe-ec096070681e | 2021-10-31 | 1 | 30.09 |
| 1759575 | edcdf0c1-3d8f-47e8-b7dd-05505749eb69 | 2021-10-31 | 1 | 35.71 |
| 1759576 | 76b7a9ae-98fa-4c77-869d-594a4ef7282d | 2021-10-31 | 1 | 34.76 |


In [15]:
ctr_before.groupby(by=['groupid'])['ctr'].describe()

| groupid | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 474947.0 | 33.000913 | 1.7337 | 30.0 | 31.5 | 33.0 | 34.5 | 36.0 |
| 1 | 475928.0 | 32.999572 | 1.729657 | 30.0 | 31.5 | 33.0 | 34.5 | 36.0 |


In [39]:
act_before.groupby(by='groupid')['activity_level'].mean()

groupid
0    5.245635
1    5.240952
Name: activity_level, dtype: float64

In [40]:
Z_score, p_value = ztest(act_before[act_before['groupid'] == 0]['activity_level'], act_before[act_before['groupid'] == 1]['activity_level'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: 0.48969932620912937 
p-value: 0.6243466787190262


In [38]:
ctr_before.groupby(by='groupid')['ctr'].mean()

groupid
0    33.000913
1    32.999572
Name: ctr, dtype: float64

In [41]:
Z_score, p_value = ztest(ctr_before[ctr_before['groupid'] == 0]['ctr'], ctr_before[ctr_before['groupid'] == 1]['ctr'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: 0.3775817380268587 
p-value: 0.7057413330705573


In [31]:
dau_all = act_all[act_all['activity_level'] != 0].groupby(by='dt').count()
dau_all

| dt | userid | groupid | activity_level |
|---|---|---|---|
| 2021-10-01 | 30634 | 30634 | 30634 |
| 2021-10-02 | 30775 | 30775 | 30775 |
| 2021-10-03 | 30785 | 30785 | 30785 |
| 2021-10-04 | 30599 | 30599 | 30599 |
| 2021-10-05 | 30588 | 30588 | 30588 |
| ... | ... | ... | ... |
| 2021-11-26 | 44529 | 44529 | 44529 |
| 2021-11-27 | 44636 | 44636 | 44636 |
| 2021-11-28 | 44556 | 44556 | 44556 |
| 2021-11-29 | 44645 | 44645 | 44645 |
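
`dau_all` pools both groups, and the groups are not exactly the same size (928,481 vs 931,519 pretest rows). A minimal sketch of the daily share of active users per group, assuming the column names of `activity_all.csv`; `active_share_by_group` is a hypothetical helper:

```python
import pandas as pd


def active_share_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each group's users active on each day (rows: dt, columns: groupid)."""
    users = df.groupby(["dt", "groupid"])["userid"].nunique()
    active = (
        df.loc[df["activity_level"] != 0]
        .groupby(["dt", "groupid"])["userid"]
        .nunique()
    )
    # (day, group) pairs with no active user are missing from `active`; fill with 0
    return (active.reindex(users.index, fill_value=0) / users).unstack()


# Synthetic example: group 0 has 2 users, group 1 has 1 user
demo = pd.DataFrame({
    "userid": ["a", "b", "c", "a", "b", "c"],
    "dt": ["d1", "d1", "d1", "d2", "d2", "d2"],
    "groupid": [0, 0, 1, 0, 0, 1],
    "activity_level": [4, 0, 2, 0, 0, 1],
})
share = active_share_by_group(demo)
print(share)
```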


---

## Experiment metrics 

In this section you must perform the same analysis as in the previous section, but using the data generated during the experiment (i.e., from November 1st, 2021 onward). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups during the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
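
Besides the p-value, the size of the effect matters for the recommendation. A minimal sketch of the difference in means (treatment minus control), the relative lift and a Welch confidence interval; `diff_in_means_ci` is a hypothetical helper and the demo arrays are illustrative:

```python
import numpy as np
from scipy import stats


def diff_in_means_ci(control, treatment, alpha: float = 0.05) -> dict:
    """Difference in means, relative lift and Welch (1 - alpha) confidence interval."""
    c = np.asarray(control, dtype=float)
    t = np.asarray(treatment, dtype=float)
    diff = t.mean() - c.mean()
    v_c = c.var(ddof=1) / c.size  # squared standard error of the control mean
    v_t = t.var(ddof=1) / t.size  # squared standard error of the treatment mean
    se = np.sqrt(v_c + v_t)
    # Welch-Satterthwaite degrees of freedom
    dof = (v_c + v_t) ** 2 / (v_c ** 2 / (c.size - 1) + v_t ** 2 / (t.size - 1))
    q = stats.t.ppf(1 - alpha / 2, dof)
    return {
        "diff": diff,
        "lift": diff / c.mean(),
        "ci_low": diff - q * se,
        "ci_high": diff + q * se,
    }


res = diff_in_means_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(res)
```

With the group means reported in this section, the relative lift would be about (37.997 - 32.997) / 32.997 ≈ 15% for CTR and about (9.996 - 5.402) / 5.402 ≈ 85% for activity level.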

In [25]:
# your-code

act_after = act_all[act_all['dt'] >= '2021-11-01']
ctr_after = ctr_all[ctr_all['dt'] >= '2021-11-01']

### Activity level z-test

In [36]:
act_after.groupby(by='groupid')['activity_level'].mean()

groupid
0    5.402211
1    9.996304
Name: activity_level, dtype: float64

In [43]:
Z_score, p_value = ztest(act_after[act_after['groupid'] == 0]['activity_level'], act_after[act_after['groupid'] == 1]['activity_level'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: -498.4000989153473 
p-value: 0.0


### CTR z-test

In [37]:
ctr_after.groupby(by='groupid')['ctr'].mean()

groupid
0    32.996978
1    37.996959
Name: ctr, dtype: float64

In [44]:
Z_score, p_value = ztest(ctr_after[ctr_after['groupid'] == 0]['ctr'], ctr_after[ctr_after['groupid'] == 1]['ctr'])

print(f'Z_score: {Z_score}', f'\np-value: {p_value}')

Z_score: -1600.7913068017688 
p-value: 0.0


Experiment effects on DAU
===

In [32]:
dau_before = act_before[act_before['activity_level'] != 0].groupby(by='dt').count()
dau_before

| dt | userid | groupid | activity_level |
|---|---|---|---|
| 2021-10-01 | 30634 | 30634 | 30634 |
| 2021-10-02 | 30775 | 30775 | 30775 |
| 2021-10-03 | 30785 | 30785 | 30785 |
| 2021-10-04 | 30599 | 30599 | 30599 |
| 2021-10-05 | 30588 | 30588 | 30588 |
| 2021-10-06 | 30639 | 30639 | 30639 |
| 2021-10-07 | 30637 | 30637 | 30637 |
| 2021-10-08 | 30600 | 30600 | 30600 |
| 2021-10-09 | 30902 | 30902 | 30902 |
| 2021-10-10 | 30581 | 30581 | 30581 |


In [33]:
dau_after = act_after[act_after['activity_level'] != 0].groupby(by='dt').count()
dau_after

| dt | userid | groupid | activity_level |
|---|---|---|---|
| 2021-11-01 | 45307 | 45307 | 45307 |
| 2021-11-02 | 45313 | 45313 | 45313 |
| 2021-11-03 | 45355 | 45355 | 45355 |
| 2021-11-04 | 45307 | 45307 | 45307 |
| 2021-11-05 | 45381 | 45381 | 45381 |
| 2021-11-06 | 45297 | 45297 | 45297 |
| 2021-11-07 | 45388 | 45388 | 45388 |
| 2021-11-08 | 45382 | 45382 | 45382 |
| 2021-11-09 | 45239 | 45239 | 45239 |
| 2021-11-10 | 45330 | 45330 | 45330 |
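
`dau_before` and `dau_after` pool both groups, so the jump from about 30,600 to about 45,300 daily active users does not by itself show that the new ads caused it. A minimal sketch comparing mean daily DAU per group before and after November 1st, assuming `groupid` 1 is the test group and 0 the control; the helper `dau_change_by_group` is hypothetical, and the test-minus-control change it feeds is a simple difference-in-differences estimate:

```python
import pandas as pd


def dau_change_by_group(df: pd.DataFrame, start: str = "2021-11-01") -> pd.DataFrame:
    """Mean daily DAU per group before and after `start`, plus the change.

    Assumes dt holds ISO date strings, so string comparison follows date order.
    """
    active = df.loc[df["activity_level"] != 0]
    daily = active.groupby(["dt", "groupid"])["userid"].nunique().unstack(fill_value=0)
    after = daily.index >= start  # boolean array: True for experiment days
    summary = pd.DataFrame({
        "before": daily[~after].mean(),
        "after": daily[after].mean(),
    })
    summary["change"] = summary["after"] - summary["before"]
    return summary


# Synthetic example: 2 users per group, one day before and one day after the start
demo = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u4"] * 2,
    "dt": ["2021-10-31"] * 4 + ["2021-11-01"] * 4,
    "groupid": [0, 0, 1, 1] * 2,
    "activity_level": [3, 0, 5, 0, 2, 0, 1, 4],
})
summary = dau_change_by_group(demo)
# Assumption: groupid 1 = test, 0 = control
did = summary.loc[1, "change"] - summary.loc[0, "change"]
print(summary)
print(did)
```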


---

## Conclusions

Please provide your conclusions from the analyses and your recommendation on whether or not to implement the changes in the digital product.

In [45]:
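# your-conclusions

# About the experiment's effect on DAU:
#   Daily active users rose from about 30,700 per day in October to about 45,300 per day
#   in early November (about 44,600 in late November). These counts pool both groups,
#   so the rise should be split by groupid before attributing it to the new ads.

# About the experiment's effect on activity level and CTR:
#   Before the experiment the groups were balanced: mean activity level about 5.24 in both
#   (z-test p-value about 0.62) and mean CTR about 33.0 in both (p-value about 0.71).
#   During the experiment groupid 1 reached a mean activity level of about 10.0 versus
#   about 5.4 for groupid 0, and a mean CTR of about 38.0 versus about 33.0.
#   Both differences are statistically significant (z-test p-values reported as 0.0).

# Conclusion:
#   The experiment has been successful: groupid 1 shows significantly higher activity
#   level and CTR than groupid 0, which supports implementing the change.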
    

---