* `Cookie Cats` is a popular mobile puzzle game.
* As players progress through the levels of the game, they occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress. In addition to driving in-app purchases, these gates give players an enforced break from playing, which hopefully increases and prolongs their enjoyment of the game.
* But where should the gates be placed? Initially the first gate was placed at level 30. In this notebook we analyze an AB test in which the first gate in Cookie Cats was moved from level 30 to level 40, looking in particular at the impact on player retention. Before we get to that, a key step in any analysis is understanding the data, so let's load it and take a look!

### Import modules

In [9]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

### Import Data

In [4]:
df = pd.read_csv("cookie_cats.csv")

In [5]:
df.head()

Unnamed: 0,userid,version,sum_gamerounds,retention_1,retention_7
0,116,gate_30,3,False,False
1,337,gate_30,38,True,False
2,377,gate_40,165,True,False
3,483,gate_40,1,False,False
4,488,gate_40,179,True,True


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
userid            90189 non-null int64
version           90189 non-null object
sum_gamerounds    90189 non-null int64
retention_1       90189 non-null bool
retention_7       90189 non-null bool
dtypes: bool(2), int64(2), object(1)
memory usage: 2.2+ MB


* We have 90,189 players who installed the game while the AB test was running.
* About the features:
    - `userid` - a unique number that identifies each player.
    - `version` - whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).
    - `sum_gamerounds` - the number of game rounds played by the player during the first 14 days after install.
    - `retention_1` - did the player come back and play 1 day after installing?
    - `retention_7` - did the player come back and play 7 days after installing?
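
Before relying on these columns, it can help to confirm a few basics: no missing values, one row per player, and only the two expected version labels. The sketch below runs those checks on a small hand-built frame that copies the five rows shown by `df.head()`. The name `sample` is hypothetical; on the real data the same expressions would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for df: the five rows shown by df.head() above.
sample = pd.DataFrame({
    "userid": [116, 337, 377, 483, 488],
    "version": ["gate_30", "gate_30", "gate_40", "gate_40", "gate_40"],
    "sum_gamerounds": [3, 38, 165, 1, 179],
    "retention_1": [False, True, True, False, True],
    "retention_7": [False, False, False, False, True],
})

# Total count of missing values across all columns.
missing_total = int(sample.isna().sum().sum())

# True when every userid appears exactly once.
ids_unique = bool(sample["userid"].is_unique)

# The distinct labels found in the version column, sorted.
versions = sorted(sample["version"].unique())

print(missing_total, ids_unique, versions)
```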

* When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40. As a sanity check, let's see if there are roughly the same number of players in each AB group.

In [8]:
df.groupby('version')['userid'].count()

version
gate_30    44700
gate_40    45489
Name: userid, dtype: int64
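
To make "roughly the same" concrete, the counts can be turned into shares of all players. This is a minimal sketch that re-enters the two counts printed above rather than reading the CSV; the names `counts` and `shares` are illustrative.

```python
import pandas as pd

# Group sizes copied from the groupby output above.
counts = pd.Series({"gate_30": 44700, "gate_40": 45489}, name="userid")

# Fraction of all players assigned to each version.
shares = counts / counts.sum()

print(shares.round(4))
```

The shares come out at about 49.6% for gate_30 and 50.4% for gate_40. On the full frame, `df['version'].value_counts(normalize=True)` should give the same shares directly.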

* There are roughly the same number of players in each group.
* The main goal of this analysis is to determine how gate placement affects player retention.
* First, let's look at the distribution of the number of game rounds each player played (`sum_gamerounds`).

In [None]:
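# Assumed completion: histogram of the number of rounds played per player
plt.hist(df["sum_gamerounds"], bins=50)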
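
One way to examine this distribution numerically is to count how many players played each number of rounds. The sketch below is only an illustration: the `toy` frame and its values are made up, and `players_per_rounds` is a hypothetical name, not something defined in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for df with a few made-up round counts.
toy = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5, 6],
    "sum_gamerounds": [0, 1, 1, 3, 3, 3],
})

# Number of players for each distinct value of sum_gamerounds,
# indexed by round count in ascending order.
players_per_rounds = toy.groupby("sum_gamerounds")["userid"].count()

print(players_per_rounds)
```

Applied to `df`, the same grouping yields one row per distinct round count, and something like `players_per_rounds.head(100).plot()` could then draw the low end of the distribution, where most players are likely to sit.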