# Project Background

The data comes from Cookie Cats, a mobile puzzle game developed by Tactile Entertainment. In this game, the player connects tiles of the same color to clear the board and win the level. It also features singing cats. As players progress through the levels, they will **occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress**. In addition to driving in-app purchases, these gates give players an enforced break from playing, which hopefully improves their enjoyment of the game.

## Problem formulation

**Where should the gates be placed?** Initially, the first gate was placed at level 30. In this project, **we analyze an A/B test in which the first gate in Cookie Cats was moved from level 30 to level 40. In particular, we want to measure the impact of this change on player retention.**
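
One way to state this question formally is as a pair of hypotheses. The symbols $p_{30}$ and $p_{40}$ are introduced here for illustration and are not part of the original analysis; they stand for the true retention rates of players whose first gate is at level 30 and level 40, respectively. The same pair would be tested once for 1-day and once for 7-day retention.

$$
H_0: p_{30} = p_{40} \qquad \text{versus} \qquad H_1: p_{30} \neq p_{40}
$$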

# Basic Imports 

In [90]:
import pandas as pd
import numpy as np
from scipy import stats

## Read the data

In [57]:
df = pd.read_csv("./cookie_cats.csv")

In [81]:
df.head()

| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |
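
Before comparing retention, it can help to confirm that the two groups are of similar size and that no values are missing. The sketch below is only an illustration: `summarize_groups` is a hypothetical helper, not part of the original notebook, and it runs on a tiny made-up frame that mirrors the columns of `cookie_cats.csv`.

```python
import pandas as pd

def summarize_groups(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-version player count and mean retention (hypothetical helper)."""
    return frame.groupby("version").agg(
        players=("userid", "count"),
        retention_1=("retention_1", "mean"),
        retention_7=("retention_7", "mean"),
    )

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "userid": [1, 2, 3, 4],
    "version": ["gate_30", "gate_30", "gate_40", "gate_40"],
    "sum_gamerounds": [3, 38, 165, 1],
    "retention_1": [False, True, True, False],
    "retention_7": [False, False, True, False],
})

summary = summarize_groups(toy)
missing = int(toy.isna().sum().sum())  # total number of missing cells
print(summary)
print("missing values:", missing)
```

On the real data, the same call would be `summarize_groups(df)`. A large imbalance in `players` between the two versions would be a reason to double-check the random assignment before trusting any test result.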


In [78]:
# Take explicit copies so that later column assignments do not trigger SettingWithCopyWarning
gate_30 = df[df['version'] == 'gate_30'].copy()
gate_40 = df[df['version'] == 'gate_40'].copy()

In [82]:
# Convert the boolean retention flags to 0/1 integers
gate_30['retention_1'] = gate_30['retention_1'].astype(int)
gate_30['retention_7'] = gate_30['retention_7'].astype(int)

In [92]:
gate_40['retention_1'] = gate_40['retention_1'].astype(int)
gate_40['retention_7'] = gate_40['retention_7'].astype(int)


## Comparison of the two gates on 1-day retention

In [93]:
a = gate_30['retention_1']
b = gate_40['retention_1']

In [95]:
print("Gate 30, one day retention = ", np.round(a.mean(),4))
print("Gate 40, one day retention = ", np.round(b.mean(),4))

Gate 30, one day retention =  0.4482
Gate 40, one day retention =  0.4423


**Interpretation**: One-day retention is 44.82% for players with the gate at level 30 and slightly lower, at 44.23%, for players with the gate at level 40. The t-test below gives p ≈ 0.074, which is greater than alpha = 0.05. Therefore, the difference between the gates at level 30 and level 40 is not statistically significant at one day, and the result is inconclusive.

In [96]:
# Using built-in t-test:
t, p = stats.ttest_ind(a, b)
print("t:\t", t, "p:\t", p)

t:	 1.7840979256519656 p:	 0.07441111525563184
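
Because retention is a 0/1 outcome, a pooled two-proportion z-test is a common cross-check for the t-test. The sketch below is not part of the original analysis: `two_proportion_ztest` is a hypothetical helper, and the samples are synthetic, generated only to show that the two tests give nearly identical results at large sample sizes.

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(x, y):
    """Two-sided pooled z-test for equal proportions in two 0/1 samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.size, y.size
    # Pooled proportion under the null hypothesis of equal rates
    p_pool = (x.sum() + y.sum()) / (n_x + n_y)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_x + 1 / n_y))
    z = (x.mean() - y.mean()) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Synthetic 0/1 samples for illustration only; not the Cookie Cats data.
rng = np.random.default_rng(0)
x_demo = rng.binomial(1, 0.45, size=5000)
y_demo = rng.binomial(1, 0.44, size=5000)

z_demo, pz_demo = two_proportion_ztest(x_demo, y_demo)
t_demo, pt_demo = stats.ttest_ind(x_demo, y_demo)
print("z-test:", z_demo, pz_demo)
print("t-test:", t_demo, pt_demo)
```

On the real data, the call would be `two_proportion_ztest(a, b)` with the 0/1 retention columns defined above.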


## Comparison of the two gates on 7-day retention

In [98]:
a7 = gate_30['retention_7']
b7 = gate_40['retention_7']

In [102]:
print("Gate 30, seven days retention = ", np.round(a7.mean(),4))
print("Gate 40, seven days retention = ", np.round(b7.mean(),4))

Gate 30, seven days retention =  0.1902
Gate 40, seven days retention =  0.182


In [103]:
# Using built-in t-test:
t, p = stats.ttest_ind(a7, b7)
print("t:\t", t, "p:\t", p)

t:	 3.1644994996802778 p:	 0.0015540151201088365


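**Interpretation**: Seven-day retention is 19.02% with the gate at level 30 versus 18.20% with the gate at level 40. With p ≈ 0.0016, well below alpha = 0.05, the difference between the two gate placements is statistically significant for 7-day retention.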
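
A p-value says whether a difference is detectable, not how large it is. A confidence interval for the difference in retention rates is one way to show the effect size. The sketch below uses a simple Wald interval; `diff_proportion_ci` is a hypothetical helper that is not part of the original notebook, and the samples are synthetic.

```python
import numpy as np
from scipy import stats

def diff_proportion_ci(x, y, confidence=0.95):
    """Wald confidence interval for mean(x) - mean(y) on 0/1 samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p_x, p_y = x.mean(), y.mean()
    # Unpooled standard error of the difference in proportions
    se = np.sqrt(p_x * (1 - p_x) / x.size + p_y * (1 - p_y) / y.size)
    z_crit = stats.norm.ppf(0.5 + confidence / 2)
    diff = p_x - p_y
    return diff - z_crit * se, diff + z_crit * se

# Synthetic 0/1 samples for illustration only; not the Cookie Cats data.
rng = np.random.default_rng(1)
x_demo = rng.binomial(1, 0.19, size=5000)
y_demo = rng.binomial(1, 0.18, size=5000)

low, high = diff_proportion_ci(x_demo, y_demo)
print("95% CI for the difference in retention:", low, high)
```

On the real data, the call would be `diff_proportion_ci(a7, b7)`. An interval that excludes zero agrees with a significant two-sided test at the matching alpha, and its width shows how precisely the difference is estimated.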

**Take-away:** It would be better to keep the first gate at level 30, as it yields higher one-day as well as seven-day retention. The difference is statistically significant only for 7-day retention.