# A/B Test Challenge



---

#### What is an A/B Test? 

An A/B test is a research methodology that supports decision making by measuring the impact of a change in a product (e.g., a digital product). For this challenge you will analyse the data from an A/B test performed on a digital product in which a new set of sponsored ads was included.


#### Measure of success

Metrics are needed to measure the success of your product. They are typically split into the following categories:

- __Engagement-based metrics:__ number of users, number of downloads, number of active users, user retention, etc.

- __Revenue and monetization metrics:__ ads and affiliate links, subscription-based, in-app purchases, etc.

- __Technical metrics:__ service level indicators (uptime of the app, downtime of the app, latency).



---

## Metrics understanding

In this part you must analyse the metrics involved in the test. We will focus on the following metrics:

- Activity level + Daily active users (DAU).

- Click-through rate (CTR)

### Activity level

In the following part you must perform every calculation you consider necessary to answer the following questions:

- How many activity levels can you find in the dataset? (An activity level of zero means no activity.)

- How many users are there for each activity level?

- How many activity levels do you have per day, and how many records are there for each activity level?

At the end of this section you must provide your conclusions about the _activity level_ of the users.

__Dataset:__ `activity_pretest.csv`

In [60]:
import numpy as np
import pandas as pd
from statsmodels.stats.weightstats import ztest
from scipy import stats
from scipy.stats import t
import matplotlib.pyplot as plt
import warnings
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import ttest_ind
warnings.filterwarnings("ignore")

In [2]:
activity = pd.read_csv("./data/activity_pretest.csv")
activity.head(10)

Unnamed: 0,userid,dt,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,0
5,9b2f41cf-350d-4073-b9d4-3848d0c0b1b5,2021-10-01,0
6,82b1f3a8-57cc-4d2e-96c4-3664150f53e5,2021-10-01,0
7,9dcc4eed-c222-4323-b2f6-d91edaba5d0e,2021-10-01,0
8,c55c0d67-6b95-4d19-bf7d-4c33911da83f,2021-10-01,0
9,40992374-c58b-4004-a7b4-1aa737a4b636,2021-10-01,0


In [3]:
activity.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1860000 entries, 0 to 1859999
Data columns (total 3 columns):
 #   Column          Dtype 
---  ------          ----- 
 0   userid          object
 1   dt              object
 2   activity_level  int64 
dtypes: int64(1), object(2)
memory usage: 42.6+ MB


In [4]:
activity["activity_level"].unique()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

In [5]:
users_activity = activity["activity_level"].value_counts().reset_index()
users_activity

Unnamed: 0,activity_level,count
0,0,909125
1,5,49227
2,2,49074
3,18,48982
4,10,48943
5,16,48934
6,12,48911
7,6,48901
8,19,48901
9,11,48832


In [6]:
len(activity["userid"].unique())

60000

In [7]:
activity_per_day = activity.groupby(["dt"])[["activity_level"]].nunique().reset_index()
activity_per_day

Unnamed: 0,dt,activity_level
0,2021-10-01,21
1,2021-10-02,21
2,2021-10-03,21
3,2021-10-04,21
4,2021-10-05,21
5,2021-10-06,21
6,2021-10-07,21
7,2021-10-08,21
8,2021-10-09,21
9,2021-10-10,21


In [8]:
activity_per_level = activity.groupby(["activity_level"])[["userid"]].nunique().reset_index()
activity_per_level

Unnamed: 0,activity_level,userid
0,0,60000
1,1,33688
2,2,33761
3,3,33634
4,4,33502
5,5,33820
6,6,33789
7,7,33337
8,8,33365
9,9,33636


### Daily active users (DAU)

![ab_test](./img/user_activity_ab_testing.JPG)


Daily active users (DAU) refers to the number of users who are active on a given day (an activity level of zero means no activity). You must calculate this metric and provide your insights about it.

__Dataset:__ `activity_pretest.csv`

In [9]:
DAU = activity[activity["activity_level"]!= 0]
DAU = DAU.groupby(["dt"])[["userid"]].nunique().reset_index()
DAU.columns = ["timestamp", "dau"]
DAU

Unnamed: 0,timestamp,dau
0,2021-10-01,30634
1,2021-10-02,30775
2,2021-10-03,30785
3,2021-10-04,30599
4,2021-10-05,30588
5,2021-10-06,30639
6,2021-10-07,30637
7,2021-10-08,30600
8,2021-10-09,30902
9,2021-10-10,30581


### Click-through rate (CTR)

![ab_test](./img/ad_click_through_rate_ab_testing.JPG)

Click-through rate (CTR) refers to the percentage of clicks that a user performs out of the total number of ads shown to that user during a given day. You must analyse this metric (e.g., average CTR per day) and provide your insights about it.

__Dataset:__ `ctr_pretest.csv`
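As a quick illustration of the definition above, the sketch below computes a CTR percentage from raw counts. The function name `ctr_percent` and the click and impression counts are made up for the example; the dataset itself already stores CTR as a percentage per user and day.

```python
def ctr_percent(clicks: int, impressions: int) -> float:
    """Return clicks as a percentage of the ads shown (illustrative helper)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive to compute a CTR")
    return 100.0 * clicks / impressions


# Hypothetical user-day: 35 ads shown, 12 of them clicked.
example = ctr_percent(clicks=12, impressions=35)
print(round(example, 2))  # 34.29
```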

In [10]:
ctr = pd.read_csv("./data/ctr_pretest.csv")
ctr.head(10)

Unnamed: 0,userid,dt,ctr
0,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,34.28
1,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,34.67
2,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,34.77
3,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,35.42
4,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,35.04
5,83d875d5-bb3e-433c-ae0e-a851dca902b3,2021-10-01,34.52
6,33f95ebe-71ec-4a1a-8670-5db74ba32779,2021-10-01,35.59
7,79e7125f-fcc7-440d-8621-e090c78015b6,2021-10-01,34.87
8,9151badd-368c-46cc-a86c-d2d25034ce25,2021-10-01,33.2
9,b1917103-6199-42bd-bf72-c97ec31937a8,2021-10-01,33.13


In [11]:
ctr_per_day = ctr.groupby(["dt"])[["ctr"]].mean().reset_index()
ctr_per_day

Unnamed: 0,dt,ctr
0,2021-10-01,32.993446
1,2021-10-02,32.991664
2,2021-10-03,32.995086
3,2021-10-04,32.992995
4,2021-10-05,33.004375
5,2021-10-06,33.018564
6,2021-10-07,32.9885
7,2021-10-08,32.998654
8,2021-10-09,33.005082
9,2021-10-10,33.007134


In [12]:
ctr_mean = ctr["ctr"].mean()
ctr_mean

33.00024155646116

---

## Pretest metrics 

In this section you will analyse the metrics using the dataset that includes the results for the test and control groups, but only for the pretest data (i.e., prior to November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups prior to the start of the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
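The pretest and experiment slices needed in this section all follow the same pattern: filter by `groupid`, then by date. A minimal sketch of that pattern is shown below, assuming the columns `userid`, `dt`, `groupid` and `activity_level` and ISO-formatted date strings. The helper names `split_period` and `daily_active_users` and the toy rows are hypothetical, not part of the challenge files.

```python
import pandas as pd

EXPERIMENT_START = "2021-11-01"  # start date stated in the text


def split_period(df: pd.DataFrame, groupid: int, before: bool) -> pd.DataFrame:
    """Return the rows of one group, before or from the experiment start."""
    in_group = df["groupid"] == groupid
    # ISO date strings (YYYY-MM-DD) sort correctly when compared as text.
    in_period = df["dt"] < EXPERIMENT_START if before else df["dt"] >= EXPERIMENT_START
    return df[in_group & in_period]


def daily_active_users(df: pd.DataFrame) -> pd.DataFrame:
    """Count distinct users per day with a non-zero activity level."""
    active = df[df["activity_level"] != 0]
    dau = active.groupby("dt")["userid"].nunique().reset_index()
    dau.columns = ["timestamp", "dau"]
    return dau


# Toy rows, invented for the example.
toy = pd.DataFrame({
    "userid": ["a", "b", "a", "b", "c"],
    "dt": ["2021-10-31", "2021-10-31", "2021-11-01", "2021-11-01", "2021-11-01"],
    "groupid": [0, 0, 0, 0, 1],
    "activity_level": [3, 0, 1, 2, 5],
})

control_before = split_period(toy, groupid=0, before=True)
print(daily_active_users(control_before))
```

Calling the same two helpers with `groupid=1` or `before=False` gives the other slices, which avoids repeating the filtering code for each combination.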

In [13]:
"""Control group = 0. Experiment group = 1"""

'Control group = 0. Experiment group = 1'

In [14]:
activity_all = pd.read_csv("./data/activity_all.csv")
ctr_all = pd.read_csv("./data/ctr_all.csv")

In [23]:
# CTR of the control group before the experiment

gc_ctr_before = ctr_all[ctr_all["groupid"]==0]
gc_ctr_before = gc_ctr_before[gc_ctr_before["dt"] < "2021-11-01"]
gc_ctr_before

Unnamed: 0,userid,dt,groupid,ctr
808703,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,0,34.28
808704,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,0,34.67
808705,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,0,34.77
808706,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,0,35.42
808707,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,0,35.04
...,...,...,...,...
1744262,b2f30e48-d012-4687-a93f-8000ab04e565,2021-10-31,0,31.54
1744263,1184bf5f-7036-4a6e-afc1-b0c8f1dba2de,2021-10-31,0,31.92
1744264,8b20e638-d933-489f-8139-7d7ca93aa8e0,2021-10-31,0,32.05
1744265,2071887d-c673-4d39-81dc-5c384bcb8458,2021-10-31,0,30.25


In [26]:
# CTR of the control group after the experiment started

gc_ctr_after= ctr_all[ctr_all["groupid"]==0]
gc_ctr_after = gc_ctr_after[gc_ctr_after["dt"] >= "2021-11-01"]
gc_ctr_after

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,0,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,0,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,0,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,0,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,0,34.95
...,...,...,...,...
2274125,26c10c02-8ede-4beb-be14-32f8cca044ff,2021-11-12,0,33.28
2274126,ae235c4b-96a7-4f34-923e-08531a5f340a,2021-11-12,0,34.15
2274127,81daf7da-ba09-451f-b100-f15ed284977e,2021-11-12,0,35.79
2274128,38338581-e093-4202-8c9f-975004e221e3,2021-11-12,0,31.82


In [28]:
# Activity level of the control group before the experiment

gc_activity_before = activity_all[activity_all["groupid"]==0]
gc_activity_before = gc_activity_before[gc_activity_before["dt"]< "2021-11-01"]
gc_activity_before

Unnamed: 0,userid,dt,groupid,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0,0
6,82b1f3a8-57cc-4d2e-96c4-3664150f53e5,2021-10-01,0,0
7,9dcc4eed-c222-4323-b2f6-d91edaba5d0e,2021-10-01,0,0
...,...,...,...,...
3625427,2ffce3bd-f7c6-4752-9141-ad887eea6938,2021-10-31,0,20
3625429,1a0dc2cf-c05a-40ad-86b8-d24809295ee2,2021-10-31,0,20
3625430,59f581ac-ff18-40f7-8253-cd8e7612bded,2021-10-31,0,20
3625439,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,0,20


In [30]:
# Activity level of the control group after the experiment started

gc_activity_after = activity_all[activity_all["groupid"]==0]
gc_activity_after= gc_activity_after[gc_activity_after["dt"] >= "2021-11-01"]
gc_activity_after

Unnamed: 0,userid,dt,groupid,activity_level
909125,d2646662-269f-49de-aab1-8776afced9a3,2021-11-01,0,0
909126,d6d51bbc-4005-4b61-86ae-a3e239235341,2021-11-01,0,0
909127,65cc7f97-ca08-4ac8-9076-8abbbea5d95d,2021-11-01,0,0
909128,1d8901b1-5fea-4376-8370-843363886e18,2021-11-01,0,0
909129,2614b27d-1449-4d38-8f8d-524daf95361a,2021-11-01,0,0
...,...,...,...,...
3659988,b2a18b8c-00c7-4023-aa7e-e2b12d5bb5d3,2021-11-30,0,20
3659989,c9737b7f-eb1b-4733-9eaa-7d538d86fb3d,2021-11-30,0,20
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,0,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,0,20


In [38]:
# DAU of the control group before the experiment

gc_DAU_before = activity_all[activity_all["groupid"]==0]
gc_DAU_before = gc_DAU_before[gc_DAU_before["activity_level"]!= 0]
gc_DAU_before = gc_DAU_before[gc_DAU_before["dt"] < "2021-11-01"]
gc_DAU_before = gc_DAU_before.groupby(["dt"])[["userid"]].nunique().reset_index()
gc_DAU_before.columns = ["timestamp", "dau"]
gc_DAU_before

Unnamed: 0,timestamp,dau
0,2021-10-01,15337
1,2021-10-02,15354
2,2021-10-03,15423
3,2021-10-04,15211
4,2021-10-05,15126
5,2021-10-06,15335
6,2021-10-07,15346
7,2021-10-08,15357
8,2021-10-09,15371
9,2021-10-10,15277


In [39]:
# DAU of the control group after the experiment started

gc_DAU_after = activity_all[activity_all["groupid"]==0]
gc_DAU_after = gc_DAU_after[gc_DAU_after["activity_level"]!= 0]
gc_DAU_after = gc_DAU_after[gc_DAU_after["dt"] >= "2021-11-01"]
gc_DAU_after = gc_DAU_after.groupby(["dt"])[["userid"]].nunique().reset_index()
gc_DAU_after.columns = ["timestamp", "dau"]
gc_DAU_after

Unnamed: 0,timestamp,dau
0,2021-11-01,15989
1,2021-11-02,16024
2,2021-11-03,16049
3,2021-11-04,16040
4,2021-11-05,16045
5,2021-11-06,15991
6,2021-11-07,16133
7,2021-11-08,16119
8,2021-11-09,15953
9,2021-11-10,15990


In [34]:
# CTR of the experiment group before the experiment

ge_ctr_before = ctr_all[ctr_all["groupid"]==1]
ge_ctr_before= ge_ctr_before[ge_ctr_before["dt"] < "2021-11-01"]
ge_ctr_before

Unnamed: 0,userid,dt,groupid,ctr
824040,381e40b0-5529-4bc6-a3f6-6a687c7cde66,2021-10-01,1,31.27
824041,1797453f-f558-42f6-9a2f-55b95dd37e71,2021-10-01,1,32.18
824042,f8efefba-4782-4104-8fbf-7f4381dfb6d6,2021-10-01,1,31.20
824043,8a18c870-b2e2-4a47-9b30-0859f5854dcc,2021-10-01,1,31.19
824044,d472fbc3-d580-49f7-9ba4-ef002cc80606,2021-10-01,1,35.62
...,...,...,...,...
1759573,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,1,32.33
1759574,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,1,30.09
1759575,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,1,35.71
1759576,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,1,34.76


In [35]:
# CTR of the experiment group after the experiment started

ge_ctr_after = ctr_all[ctr_all["groupid"]==1]
ge_ctr_after= ge_ctr_after[ge_ctr_after["dt"] >= "2021-11-01"]
ge_ctr_after

Unnamed: 0,userid,dt,groupid,ctr
15973,cd5df711-42f7-4684-9ae8-f6a72383bb28,2021-11-13,1,40.39
15974,fe630199-265b-4542-a103-a74d66abeb22,2021-11-13,1,37.70
15975,4b519a79-b1a4-40b0-9369-be9e2a2699af,2021-11-13,1,35.47
15976,30a8c7b1-ed8a-4cf2-888e-b8e110ba88d9,2021-11-13,1,40.07
15977,88ab26e4-2e67-4397-a5ec-8c2a384372f5,2021-11-13,1,40.76
...,...,...,...,...
2303403,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,1,37.27
2303404,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,1,39.14
2303405,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,1,40.05
2303406,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,1,38.14


In [36]:
# Activity level of the experiment group before the experiment

ge_activity_before = activity_all[activity_all["groupid"]==1]
ge_activity_before = ge_activity_before[ge_activity_before["dt"]< "2021-11-01"]
ge_activity_before

Unnamed: 0,userid,dt,groupid,activity_level
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,1,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,1,0
5,9b2f41cf-350d-4073-b9d4-3848d0c0b1b5,2021-10-01,1,0
8,c55c0d67-6b95-4d19-bf7d-4c33911da83f,2021-10-01,1,0
11,de9807bb-a7ff-4334-812e-34bb15a8f573,2021-10-01,1,0
...,...,...,...,...
3625437,93179304-6690-4932-bb68-6db1a18c747a,2021-10-31,1,20
3625438,a2551ab2-abd6-46a1-9f05-e9d2318ddf35,2021-10-31,1,20
3625440,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,1,20
3625441,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,1,20


In [37]:
# Activity level of the experiment group after the experiment started

ge_activity_after = activity_all[activity_all["groupid"]==1]
ge_activity_after = ge_activity_after[ge_activity_after["dt"]>= "2021-11-01"]
ge_activity_after

Unnamed: 0,userid,dt,groupid,activity_level
909137,39e33daf-6964-46a1-8b99-036ba08de05f,2021-11-01,1,0
909150,e1cf870f-b7c8-46e7-83cd-31a86a31375c,2021-11-01,1,0
909156,b1306532-9772-4e87-a4f5-ee5fef48783c,2021-11-01,1,0
909164,038d0ef3-3f78-465a-9c2f-ff3fe11b932a,2021-11-01,1,0
909167,f6e31dd1-7842-4dad-9270-c3b4f4fc8a59,2021-11-01,1,0
...,...,...,...,...
3659992,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,1,20
3659993,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,1,20
3659994,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,1,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20


In [42]:
# DAU of the experiment group before the experiment

ge_DAU_before = activity_all[activity_all["groupid"]==1]
ge_DAU_before = ge_DAU_before[ge_DAU_before["activity_level"]!= 0]
ge_DAU_before = ge_DAU_before[ge_DAU_before["dt"] < "2021-11-01"]
ge_DAU_before = ge_DAU_before.groupby(["dt"])[["userid"]].nunique().reset_index()
ge_DAU_before.columns = ["timestamp", "dau"]
ge_DAU_before

Unnamed: 0,timestamp,dau
0,2021-10-01,15297
1,2021-10-02,15421
2,2021-10-03,15362
3,2021-10-04,15388
4,2021-10-05,15462
5,2021-10-06,15304
6,2021-10-07,15291
7,2021-10-08,15243
8,2021-10-09,15531
9,2021-10-10,15304


In [43]:
# DAU of the experiment group after the experiment started

ge_DAU_after = activity_all[activity_all["groupid"]==1]
ge_DAU_after = ge_DAU_after[ge_DAU_after["activity_level"]!= 0]
ge_DAU_after = ge_DAU_after[ge_DAU_after["dt"] >= "2021-11-01"]
ge_DAU_after = ge_DAU_after.groupby(["dt"])[["userid"]].nunique().reset_index()
ge_DAU_after.columns = ["timestamp", "dau"]
ge_DAU_after

Unnamed: 0,timestamp,dau
0,2021-11-01,29318
1,2021-11-02,29289
2,2021-11-03,29306
3,2021-11-04,29267
4,2021-11-05,29336
5,2021-11-06,29306
6,2021-11-07,29255
7,2021-11-08,29263
8,2021-11-09,29286
9,2021-11-10,29340


---

## Experiment metrics 

In this section you must perform the same analysis as in the previous section, but using the data generated during the experiment (i.e., from November 1st, 2021 onwards). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups during the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`

In [None]:
"""Done above"""

---

## Conclusions

Please provide your conclusions after the analyses and your recommendation whether we may or may not implement the changes in the digital product.

In [None]:

"""
H0: 
H1: 
alpha = 0.05
"""

w0 before 0.33
w0 after 0.33

w1 before = 0.33
w1 after = 0.38

Test 1: we do not reject (control group vs. experiment group, before the experiment)
Test 2: we do not reject (control group, before vs. after)
Test 3: we reject (experiment group, before vs. after)
Test 4: we reject (control group after vs. experiment group after)

Compute the means of all of the above

### Z-test

In [66]:
#1. CTR of the control group before vs. CTR of the experiment group before:

"""
h0 -> ctr0_before == ctr1_before
h1 -> ctr0_before != ctr1_before
alpha = 0.05
"""

# One-sample z-test: the control-group mean is treated as a fixed reference value.
hypothesis_mean = gc_ctr_before["ctr"].mean()
sample_mean = ge_ctr_before["ctr"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(ge_ctr_before["ctr"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is greater than alpha, so we cannot reject
the null hypothesis. The values are sufficiently similar."""

Hypothesis mean: 33.00091277553074 
Sample mean: 32.99957172093258 
Probability threshold: 0.05
Z_score: -0.5348810179309715 
p_value: 0.5927321350392496


'The p-value is greater than alpha, so we cannot reject\nthe null hypothesis. The values are sufficiently similar.'

In [51]:
#2. CTR of the control group before vs. after the start of the experiment

"""
h0 -> ctr0_before == ctr0_after
h1 -> ctr0_before != ctr0_after
alpha = 0.05
"""
# One-sample z-test: the pre-experiment mean is treated as a fixed reference value.
hypothesis_mean = gc_ctr_before["ctr"].mean()
sample_mean = gc_ctr_after["ctr"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(gc_ctr_after["ctr"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is greater than alpha, so we cannot reject
the null hypothesis. The values are sufficiently similar."""

Hypothesis mean: 33.00091277553074 
Sample mean: 32.996977569382835 
Probability threshold: 0.05
Z_score: -1.562285308592539 
p_value: 0.11822079141634732


'The p-value is greater than alpha, so we cannot reject\nthe null hypothesis. The values are sufficiently similar.'

In [53]:
#3. CTR of the experiment group before vs. after the start of the experiment

"""
h0 -> ctr1_before == ctr1_after
h1 -> ctr1_before != ctr1_after
alpha = 0.05
"""
# One-sample z-test: the pre-experiment mean is treated as a fixed reference value.
hypothesis_mean = ge_ctr_before["ctr"].mean()
sample_mean = ge_ctr_after["ctr"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(ge_ctr_after["ctr"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is less than alpha, so we can reject the null
hypothesis. The difference is significant."""

Hypothesis mean: 32.99957172093258 
Sample mean: 37.99695912626142 
Probability threshold: 0.05
Z_score: 2704.6702332058303 
p_value: 0.0


'The p-value is less than alpha, so we can reject the null\nhypothesis. The difference is significant.'

In [55]:
#4. CTR of the control group after vs. CTR of the experiment group after

"""
h0 -> ctr0_after == ctr1_after
h1 -> ctr0_after != ctr1_after
alpha = 0.05
"""
# One-sample z-test: the control-group mean is treated as a fixed reference value.
hypothesis_mean = gc_ctr_after["ctr"].mean()
sample_mean = ge_ctr_after["ctr"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(ge_ctr_after["ctr"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is less than alpha, so we can reject the null
hypothesis at the 5% significance level. The difference
is significant."""

Hypothesis mean: 32.996977569382835 
Sample mean: 37.99695912626142 
Probability threshold: 0.05
Z_score: 2706.0742317170393 
p_value: 0.0


'The p-value is less than alpha, so we can reject the null\nhypothesis at the 5% significance level. The difference\nis significant.'
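To express the size of the CTR change, the two means printed in the cell above can be turned into an absolute and a relative lift. This is plain arithmetic on those reported values; the variable names are chosen here for illustration.

```python
# Means copied from the output above (CTR is stored as a percentage).
control_after_mean = 32.996977569382835
experiment_after_mean = 37.99695912626142

absolute_lift = experiment_after_mean - control_after_mean  # percentage points
relative_lift = absolute_lift / control_after_mean          # fraction of the control mean

print(f"Absolute lift: {absolute_lift:.2f} percentage points")  # 5.00
print(f"Relative lift: {relative_lift:.1%}")                    # 15.2%
```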

In [67]:
#1. DAU of the control group before vs. DAU of the experiment group before

"""
h0 -> dau0_before == dau1_before
h1 -> dau0_before != dau1_before
alpha = 0.05
"""

# One-sample z-test: the control-group mean is treated as a fixed reference value.
hypothesis_mean = gc_DAU_before["dau"].mean()
sample_mean = ge_DAU_before["dau"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(ge_DAU_before["dau"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is less than alpha, so this test rejects the null
hypothesis at the 5% significance level, although the gap between
the means is small (about 32 users per day)."""

Hypothesis mean: 15320.870967741936 
Sample mean: 15352.516129032258 
Probability threshold: 0.05
Z_score: 2.0360483788020862 
p_value: 0.041745497531099254


'The p-value is less than alpha, so this test rejects the null\nhypothesis at the 5% significance level, although the gap between\nthe means is small (about 32 users per day).'

In [68]:
#2. DAU of the control group before vs. after the start of the experiment

"""
h0 -> dau0_before == dau0_after
h1 -> dau0_before != dau0_after
alpha = 0.05
"""
# One-sample z-test: the pre-experiment mean is treated as a fixed reference value.
hypothesis_mean = gc_DAU_before["dau"].mean()
sample_mean = gc_DAU_after["dau"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(gc_DAU_after["dau"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is less than alpha, so we can reject the null
hypothesis at the 5% significance level."""

Hypothesis mean: 15320.870967741936 
Sample mean: 15782.0 
Probability threshold: 0.05
Z_score: 6.806419825868515 
p_value: 1.0005744899952103e-11


'The p-value is less than alpha, so we can reject the null\nhypothesis at the 5% significance level.'

In [58]:
#3. DAU of the experiment group before vs. after the start of the experiment

"""
h0 -> dau1_before == dau1_after
h1 -> dau1_before != dau1_after
alpha = 0.05
"""
# One-sample z-test: the pre-experiment mean is treated as a fixed reference value.
hypothesis_mean = ge_DAU_before["dau"].mean()
sample_mean = ge_DAU_after["dau"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(ge_DAU_after["dau"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is less than alpha, so we can reject the null
hypothesis at the 5% significance level. The difference
is significant."""

Hypothesis mean: 15352.516129032258 
Sample mean: 29302.433333333334 
Probability threshold: 0.05
Z_score: 2511.9434560917125 
p_value: 0.0


In [59]:
#4. DAU of the control group after vs. DAU of the experiment group after

"""
h0 -> dau0_after == dau1_after
h1 -> dau0_after != dau1_after
alpha = 0.05
"""
# One-sample z-test: the control-group mean is treated as a fixed reference value.
hypothesis_mean = gc_DAU_after["dau"].mean()
sample_mean = ge_DAU_after["dau"].mean()
alpha = 0.05
print(f'Hypothesis mean: {hypothesis_mean}',
      f'\nSample mean: {sample_mean}',
      f'\nProbability threshold: {alpha}')
Z_score, p_value = ztest(ge_DAU_after["dau"], value=hypothesis_mean)
print(f'Z_score: {Z_score}', f'\np_value: {p_value}')

"""The p-value is less than alpha, so we can reject the null
hypothesis at the 5% significance level. The difference
is significant."""

Hypothesis mean: 15782.0 
Sample mean: 29302.433333333334 
Probability threshold: 0.05
Z_score: 2434.6068537754113 
p_value: 0.0
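The cells in this section compare one group against the mean of the other group taken as a fixed number. A two-sample z statistic also accounts for the sampling variability of the second group. The sketch below is one way to write it using only the standard library. It uses unpooled variances, the function name `two_sample_ztest` is hypothetical, and the samples are synthetic values drawn around the CTR levels seen above, not rows from the dataset.

```python
import math
import random
from statistics import mean, variance


def two_sample_ztest(x1, x2):
    """Two-sided, two-sample z-test with unpooled variances (illustrative helper)."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    z = (mean(x1) - mean(x2)) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


rng = random.Random(0)  # fixed seed so the synthetic samples are reproducible
control = [rng.gauss(33.0, 1.7) for _ in range(500)]    # synthetic, not real data
treatment = [rng.gauss(38.0, 1.7) for _ in range(500)]  # synthetic, not real data

alpha = 0.05
z, p_value = two_sample_ztest(treatment, control)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With statsmodels, passing both samples as `ztest(x1, x2)` also performs a two-sample test; its default pools the variances, so results can differ slightly from this sketch.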


### T-test

In [71]:
#1. CTR of the control group before vs. CTR of the experiment group before:

"""
h0 -> ctr0_before == ctr1_before
h1 -> ctr0_before != ctr1_before
alpha = 0.05
"""

s1 = gc_ctr_before
s2 = ge_ctr_before

# Welch's t-test (unequal variances), two-sided.
stat, p_value = ttest_ind(s1["ctr"], s2["ctr"], equal_var=False)
print(stat, p_value)

"""The p-value is greater than alpha, so we cannot
reject the null hypothesis. The groups are sufficiently similar."""

0.37758082852101504 0.7057420930977805


'The p-value is greater than alpha, so we cannot\nreject the null hypothesis. The groups are sufficiently similar.'

In [77]:
#2. CTR of the control group before vs. after the start of the experiment

"""
h0 -> ctr0_before == ctr0_after
h1 -> ctr0_before != ctr0_after
alpha = 0.05
"""
s1 = gc_ctr_before
s2 = gc_ctr_after

# Welch's t-test (unequal variances), two-sided.
stat, p_value = ttest_ind(s1["ctr"], s2["ctr"], equal_var=False)
print(stat, p_value)

"""The p-value is greater than alpha, so we cannot
reject the null hypothesis."""

1.1054092221847036 0.2689825899222552


'The p-value is greater than alpha, so we cannot\nreject the null hypothesis.'

In [79]:
#3. CTR of the experiment group before vs. after the start of the experiment

"""
h0 -> ctr1_after == ctr1_before
h1 -> ctr1_after > ctr1_before
alpha = 0.05
"""

s2 = ge_ctr_before
s1 = ge_ctr_after

# One-sided t-test: is the mean of s1 (after) greater than the mean of s2 (before)?
stat, p_value = ttest_ind(s1["ctr"], s2["ctr"], alternative="greater")
print(stat, p_value)

"""The p-value is less than alpha, so we can reject
the null hypothesis at the 5% significance level."""

1603.8146799084154 0.0


'The p-value is less than alpha, so we can reject\nthe null hypothesis at the 5% significance level.'

In [88]:
#4. CTR of the control group after vs. CTR of the experiment group after

"""
h0 -> ctr1_after == ctr0_after
h1 -> ctr1_after > ctr0_after
alpha = 0.05
"""

s2 = gc_ctr_after
s1 = ge_ctr_after

# One-sided t-test: is the mean of s1 (experiment) greater than the mean of s2 (control)?
stat, p_value = ttest_ind(s1["ctr"], s2["ctr"], alternative="greater")
print(stat, p_value)

"""The p-value is less than alpha, so we can reject
the null hypothesis at the 5% significance level."""

1600.7913068017688 0.0


'The p-value is less than alpha, so we can reject\nthe null hypothesis at the 5% significance level.'

In [81]:
#1. DAU of the control group before vs. DAU of the experiment group before

"""
h0 -> dau0_before == dau1_before
h1 -> dau0_before != dau1_before
alpha = 0.05
"""

s1 = gc_DAU_before
s2 = ge_DAU_before

# Welch's t-test (unequal variances), two-sided.
stat, p_value = ttest_ind(s1["dau"], s2["dau"], equal_var=False)
print(stat, p_value)


"""The p-value is greater than alpha, so we cannot reject
the null hypothesis. The values are sufficiently similar."""

-1.4121065242323185 0.16309165273757295


'The p-value is greater than alpha, so we cannot reject\nthe null hypothesis. The values are sufficiently similar.'

In [83]:
#2. DAU of the control group before vs. after the start of the experiment

"""
h0 -> dau0_before == dau0_after
h1 -> dau0_before != dau0_after
alpha = 0.05
"""
s1 = gc_DAU_before
s2 = gc_DAU_after

# Welch's t-test (unequal variances), two-sided.
stat, p_value = ttest_ind(s1["dau"], s2["dau"], equal_var=False)
print(stat, p_value)

"""The p-value is less than alpha, so we reject the null hypothesis:
the control group's DAU differs before and after the start of the experiment."""

-6.621030565164659 1.7441848378784162e-07


"The p-value is less than alpha, so we reject the null hypothesis:\nthe control group's DAU differs before and after the start of the experiment."

In [84]:
#3. DAU of the experiment group before vs. after the start of the experiment

"""
h0 -> dau1_after == dau1_before
h1 -> dau1_after > dau1_before
alpha = 0.05
"""

s2 = ge_DAU_before
s1 = ge_DAU_after

# One-sided t-test: is the mean of s1 (after) greater than the mean of s2 (before)?
stat, p_value = ttest_ind(s1["dau"], s2["dau"], alternative="greater")
print(stat, p_value)

"""The p-value is less than alpha, so we can reject the null
hypothesis at the 5% significance level. There is a very marked increase."""

834.2853199319381 3.936847607277901e-122


'The p-value is less than alpha, so we can reject the null\nhypothesis at the 5% significance level. There is a very marked increase.'

In [87]:
#4. DAU of the control group after vs. DAU of the experiment group after

"""
h0 -> dau1_after == dau0_after
h1 -> dau1_after > dau0_after
alpha = 0.05
"""

s2 = gc_DAU_after
s1 = ge_DAU_after

# One-sided t-test: is the mean of s1 (experiment) greater than the mean of s2 (control)?
stat, p_value = ttest_ind(s1["dau"], s2["dau"], alternative="greater")
print(stat, p_value)

"""The p-value is less than alpha, so we can reject the null
hypothesis at the 5% significance level. There is a very marked increase."""

198.89904948926164 3.295301792053622e-84


'The p-value is less than alpha, so we can reject the null\nhypothesis at the 5% significance level. There is a very marked increase.'
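Welch's t-test, which `ttest_ind(..., equal_var=False)` applies, divides the difference in means by a standard error built from each sample's own variance and adjusts the degrees of freedom accordingly. The sketch below computes both quantities for the first ten pretest DAU values displayed earlier for each group (not the full month), so its numbers are not expected to match the cell outputs. The function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x1, x2):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2  # squared standard errors
    t_stat = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, dof


# First ten pretest DAU values shown in the tables above (2021-10-01 to 2021-10-10).
control_dau = [15337, 15354, 15423, 15211, 15126, 15335, 15346, 15357, 15371, 15277]
experiment_dau = [15297, 15421, 15362, 15388, 15462, 15304, 15291, 15243, 15531, 15304]

t_stat, dof = welch_t(control_dau, experiment_dau)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

With only a few dozen daily values per group, the t distribution has noticeably heavier tails than the normal, which is one reason the t-test and z-test can disagree on DAU.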

### Conclusions

In [None]:
"""
The change appears to be appropriate and consistent. The control-group
and experiment-group samples are sufficiently similar at the start
of the experiment and sufficiently different at the end.
There is a sharp increase in CTR and DAU among the users of the
experiment group, especially in DAU, where the growth is
even more pronounced.
"""