In [35]:
from IPython.display import Image  # needed for the Image(...) call below
Image(url="https://tactilegames.com/wp-content/uploads/2018/05/cookie-cats.png", width=600)

In [36]:
# Import libraries.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import math
from scipy.stats import shapiro
from scipy.stats import lognorm
from scipy.stats import bootstrap
import statsmodels.api as sm
import statistics
from scipy.stats import kstest
from scipy.stats import norm  # used for the z-test p-value (norm.cdf)
%matplotlib inline

# Read data.
cats = pd.read_csv(r"C:\Users\jonat\data_projects\ab_testing\data\cookie_cats.csv")

cats.head(5)

Unnamed: 0,userid,version,sum_gamerounds,retention_1,retention_7
0,116,gate_30,3,False,False
1,337,gate_30,38,True,False
2,377,gate_40,165,True,False
3,483,gate_40,1,False,False
4,488,gate_40,179,True,True


Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It's a classic "connect three"-style puzzle game where the player must connect tiles of the same color to clear the board and win the level. As players progress through the levels, they occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to continue. In addition to driving in-app purchases, these gates give players an enforced break from the game, which will hopefully increase and prolong their enjoyment of it.

Tactile Games is interested in assessing how it can increase in-game purchases within Cookie Cats. Based on user research, Tactile Games will presumably have formed a basic hypothesis: changing the gate placement within the game will lead to higher retention rates and, therefore, more in-game purchases.

To verify this, however, they will want to follow a rigorous A/B testing procedure, as follows:
1. Define objectives.
2. Hypothesis.
   a) Null Hypothesis. Changing the gate placement from level 30 to level 40 will not increase retention rates 7 days later.
   b) Alternative Hypothesis. Changing the gate placement from level 30 to level 40 will increase retention rates 7 days later.
3. Metrics.
4. Assumptions.
   a) Independence. We will assume that observations in the gate_30 and gate_40 groups are independent of one another.
   b) Randomization. We will assume that users have been randomly assigned to the two groups.
   c) Sample size.
   d) Normality.
5. Statistical test.

### 1. Define objectives.
Does placing the gate at a later level in the game lead to higher retention rates?

### 2. Hypothesis
- Null Hypothesis. Changing the gate placement from level 30 to level 40 will not increase retention rates 7 days later.
- Alternative Hypothesis. Changing the gate placement from level 30 to level 40 will increase retention rates 7 days later.
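
In symbols, the one-sided hypotheses above could be written as follows, where $p_{\text{gate\_30}}$ and $p_{\text{gate\_40}}$ are notation introduced here for the true 7-day retention rate of each group:

$$ H_0: p_{\text{gate\_40}} \le p_{\text{gate\_30}} \qquad\qquad H_1: p_{\text{gate\_40}} > p_{\text{gate\_30}} $$

Because the alternative only looks for an increase, the test is upper-tailed: only large positive differences count as evidence against $H_0$.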

### 3. Metrics.
Retention rate (7-day). This is the percentage of users who came back to the game 7 days after installing it.
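
As a minimal sketch of how this metric can be computed, the example below uses a small made-up DataFrame that only borrows the `version` and `retention_7` column names from the dataset. The values are invented for illustration. It relies on the fact that the mean of a boolean column is the share of `True` values.

```python
import pandas as pd

# Toy data: same column names as the Cookie Cats file, but made-up values.
toy = pd.DataFrame({
    "version": ["gate_30"] * 4 + ["gate_40"] * 4,
    "retention_7": [True, False, False, False, True, True, False, False],
})

# Mean of a boolean column = fraction of True values = retention rate.
retention_by_version = toy.groupby("version")["retention_7"].mean()
print(retention_by_version)
```

The same `groupby(...).mean()` pattern should work on the full `cats` DataFrame, since `retention_7` is stored as `True`/`False`.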

### 4. Assumptions.

1. Independence. 
2. Randomization.
3. Sample size. 
4. Normality.

Independence, randomization, and sample size are assumptions that we take to have been ensured by Tactile Games in the design of this experiment.

Both the treatment and control groups are large.

In [13]:
cats['version'].value_counts()

gate_40    45489
gate_30    44700
Name: version, dtype: int64

Normality, however, is an assumption we can check.

There are multiple ways of checking for normality:
- Histograms.
- Quantile-quantile (Q-Q) plots.

However, because both sample groups are large, we can rely on the Central Limit Theorem, which states that the sampling distribution of the mean (here, the proportion of retained users) approximates a normal distribution when the sample size is large enough.

Given that our data fulfills the criteria of Independence, Randomization, Sample size, and Normality, we are able to pursue a parametric test.
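
To illustrate what the Central Limit Theorem argument means in practice, here is a small simulation sketch. Only the group size (44,700, the gate_30 count shown above) comes from this notebook. The retention rate of 0.2 is an assumed, illustrative value, not an estimate from the data. The sketch simulates many experiments of that size and compares the spread of the simulated retention rates with the normal-theory standard error.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 44700        # gate_30 group size from the value_counts output
p_assumed = 0.2  # ASSUMPTION: illustrative retention rate, not from the data

# Each draw is the retained fraction in one simulated experiment of size n.
sample_props = rng.binomial(n, p_assumed, size=10_000) / n

# Standard error predicted by the normal approximation.
theoretical_se = np.sqrt(p_assumed * (1 - p_assumed) / n)

# Under normality, about 95% of draws should fall within 1.96 standard errors.
within_95 = np.mean(np.abs(sample_props - p_assumed) <= 1.96 * theoretical_se)

print(f"simulated std:  {sample_props.std():.5f}")
print(f"theoretical SE: {theoretical_se:.5f}")
print(f"share within 1.96 SE: {within_95:.3f}")
```

If the normal approximation is reasonable, the simulated standard deviation should sit close to the theoretical standard error, and the share within 1.96 standard errors should be close to 0.95.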

### 5. Statistical test.

Since we are comparing the proportions of two sample groups (that is, the share of users in the gate_30 and gate_40 groups who return 7 days after installing the game), a z-test for proportions is appropriate.

Alternatively, bootstrapping could be used if we wanted a non-parametric method.
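
A rough sketch of the bootstrap alternative is shown below. It runs on synthetic 0/1 retention flags, because the group sizes (5,000 each) and rates (0.19 and 0.18) used to generate them are illustrative assumptions, not the real data. Each group is resampled with replacement, and the difference in retention rates is recorded for every resample.

```python
import numpy as np

rng = np.random.default_rng(42)

# ASSUMPTION: synthetic retention flags; sizes and rates are illustrative only.
gate_30 = rng.binomial(1, 0.19, size=5_000)
gate_40 = rng.binomial(1, 0.18, size=5_000)

n_boot = 2_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    # Resample each group with replacement, then record gate_40 minus gate_30.
    boot_30 = rng.choice(gate_30, size=gate_30.size, replace=True)
    boot_40 = rng.choice(gate_40, size=gate_40.size, replace=True)
    diffs[i] = boot_40.mean() - boot_30.mean()

# Percentile interval for the difference, and share of resamples favoring gate_40.
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
share_positive = (diffs > 0).mean()

print(f"95% percentile interval: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"share of resamples with gate_40 > gate_30: {share_positive:.3f}")
```

On the real data, the two arrays would be replaced with the `retention_7` column of each `version` group. An interval that lies entirely above zero would support the alternative hypothesis.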

### z-test for proportions.
To perform a z-test for proportions, we follow these steps:
1. Calculate the proportions.
2. Calculate the pooled proportion.
3. Perform the z-test.
4. Find the p-value and determine significance level.
5. Make a decision.
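
The five steps above can be collected into one small helper, sketched here under the assumption of a pooled, upper-tailed test. The function name `one_sided_prop_ztest` and the counts passed to it are hypothetical and chosen only for illustration. It works on raw counts, so no rounding enters the calculation.

```python
import math
from scipy.stats import norm

def one_sided_prop_ztest(retained_a, n_a, retained_b, n_b):
    """Pooled z-test of H1: rate_b > rate_a. Returns (z, p_value)."""
    # Step 1: group proportions.
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Step 2: pooled proportion under the null of equal rates.
    pooled = (retained_a + retained_b) / (n_a + n_b)
    # Step 3: z statistic using the pooled standard error.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Step 4: upper-tail p-value (equivalent to 1 - norm.cdf(z)).
    p_value = norm.sf(z)
    return z, p_value

# HYPOTHETICAL counts, not the Cookie Cats data.
z_demo, p_demo = one_sided_prop_ztest(190, 1000, 180, 1000)
alpha = 0.05
# Step 5: decision.
decision = "reject H0" if p_demo < alpha else "fail to reject H0"
print(f"z = {z_demo:.3f}, p-value = {p_demo:.3f}, decision: {decision}")
```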

1. Calculate the proportions.

In [25]:
# Calculate retention rate for gate_30.
# Note: the rate is rounded to 2 decimals, and this rounded value is what the z-test below uses.
total_users_gate_30 = cats[cats['version'] == 'gate_30'].shape[0]
retained_users_gate_30 = cats[(cats['version'] == 'gate_30') & (cats['retention_7'] == True)].shape[0]
retention_rate_gate_30 = round((retained_users_gate_30 / total_users_gate_30), 2)

# Calculate retention rate for gate_40 (rounded in the same way).
total_users_gate_40 = cats[cats['version'] == 'gate_40'].shape[0]
retained_users_gate_40 = cats[(cats['version'] == 'gate_40') & (cats['retention_7'] == True)].shape[0]
retention_rate_gate_40 = round((retained_users_gate_40 / total_users_gate_40), 2)

print(f"Retention rate for gate_30 at retention_7: {retention_rate_gate_30}")
print(f"Retention rate for gate_40 at retention_7: {retention_rate_gate_40}")

Retention rate for gate_30 at retention_7: 0.19
Retention rate for gate_40 at retention_7: 0.18


2. Calculate the pooled proportions and sample size.

In [26]:
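# Pooled proportion: overall 7-day retention rate across both groups (rounded to 2 decimals).
total_users = total_users_gate_30 + total_users_gate_40
total_retained_users = retained_users_gate_30 + retained_users_gate_40
pooled_proportion = round((total_retained_users / total_users), 2)
print(f"Pooled Proportion (overall retention rate): {pooled_proportion}")

# Look up group sizes by label rather than by position in the value_counts result.
n_gate_40 = cats['version'].value_counts().loc['gate_40']
n_gate_30 = cats['version'].value_counts().loc['gate_30']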

Pooled Proportion (overall retention rate): 0.19


3. Perform the z-test.

$$ Z = \frac{( \hat{p}_{\text{gate\_40}} - \hat{p}_{\text{gate\_30}} ) - 0}{\sqrt{ \hat{p}(1 - \hat{p}) \left( \frac{1}{n_{\text{gate\_30}}} + \frac{1}{n_{\text{gate\_40}}} \right) }} $$

In [27]:
standard_error = math.sqrt(pooled_proportion * (1 - pooled_proportion) * (1/n_gate_30 + 1/n_gate_40))  # pooled SE under the null
z = (retention_rate_gate_40 - retention_rate_gate_30 - 0) / standard_error

4. Find the p-value and determine significance level.

In [32]:
p_value = 1 - norm.cdf(z)  # upper-tail (one-sided) p-value, matching the alternative hypothesis
alpha = 0.05

5. Make a decision.

In [33]:
if p_value < alpha:
    print(f"Reject the null hypothesis at alpha = {alpha}. p-value = {p_value}")
else:
    print(f"Fail to reject the null hypothesis at alpha = {alpha}. p-value = {p_value}")

Fail to reject the null hypothesis at alpha = 0.05. p-value = 0.999935264255483


We fail to reject the null hypothesis: there is not enough evidence to support the claim that moving the gate from level 30 to level 40 results in a higher 7-day retention rate. In fact, the observed 7-day retention rate is slightly lower for gate_40 (0.18) than for gate_30 (0.19).
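
As a possible cross-check on the hand calculation, statsmodels provides `proportions_ztest`, which takes raw counts and therefore avoids rounding the proportions beforehand. The sketch below uses hypothetical counts, not the Cookie Cats data. On the real data one would pass `retained_users_gate_40`, `retained_users_gate_30` and the two group sizes instead, and the resulting statistic may differ somewhat from the hand calculation, which used proportions rounded to two decimals.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# HYPOTHETICAL retained-user counts and group sizes, not the Cookie Cats data.
# gate_40 is listed first so that alternative="larger" tests p_gate_40 > p_gate_30.
retained = np.array([180, 190])
group_sizes = np.array([1000, 1000])

z_stat, p_val = proportions_ztest(
    count=retained,
    nobs=group_sizes,
    value=0,               # null difference in proportions
    alternative="larger",  # upper-tailed test
)
print(f"z = {z_stat:.3f}, p-value = {p_val:.3f}")
```

For two samples, `proportions_ztest` uses the pooled proportion for the standard error by default, which matches the formula used in step 3.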