
# Background

Cookie Cats is a popular game on mobile phones where you match tiles of the same color. As players progress through the levels of the game, **they will occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress.**

The game developers want to know where the gate should be placed. It was originally at level 30, and the A/B test moves it to level 40 for a randomly chosen group of players. This analysis compares the two placements to see which one keeps more players coming back.


- A "gate" is a level at which players are stopped and must either wait for a set period of time or make an in-app purchase to keep playing. The idea is to encourage players to spend money in the game, or to give them a break so they don't burn out and keep enjoying the game over a longer period.

# About the Data

In [1]:
# Google Colab-specific: mount Google Drive and switch to the project folder
from google.colab import drive

drive.mount('/content/drive')

%cd drive/MyDrive/Projects

Mounted at /content/drive
/content/drive/MyDrive/Projects


In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('cookie_cats.csv')
df.head()

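|   | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|--------|---------|----------------|-------------|-------------|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |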


The dataset covers 90,189 players who played the game during the A/B test. Each row describes one player:

- **userid**: A unique identifier for each player.
- **version**: Whether the player was shown the gate at level 30 (`gate_30`) or at level 40 (`gate_40`).
- **sum_gamerounds**: The number of game rounds the player played in the first week after getting the game.
- **retention_1**: Whether the player came back and played one day after getting the game.
- **retention_7**: Whether the player came back and played seven days after getting the game.

When a player started the game, they were randomly assigned to either the `gate_30` or the `gate_40` version.

In [4]:
# Check for missing values
df.info()

# Every column has 90,189 non-null entries, so there are no missing values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   userid          90189 non-null  int64 
 1   version         90189 non-null  object
 2   sum_gamerounds  90189 non-null  int64 
 3   retention_1     90189 non-null  bool  
 4   retention_7     90189 non-null  bool  
dtypes: bool(2), int64(2), object(1)
memory usage: 2.2+ MB
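
The `df.info()` output above answers the missing-value question indirectly, through the non-null counts. A more explicit check could look like the sketch below. It is only an illustration: `sample` is a hypothetical stand-in built from the five rows shown by `df.head()`, and the duplicate-ID check is an extra sanity check that is not part of the original analysis. On the real data, the same two calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for df: the five rows shown by df.head() above,
# re-entered by hand so the example runs without the CSV file.
sample = pd.DataFrame(
    {
        "userid": [116, 337, 377, 483, 488],
        "version": ["gate_30", "gate_30", "gate_40", "gate_40", "gate_40"],
        "sum_gamerounds": [3, 38, 165, 1, 179],
        "retention_1": [False, True, True, False, True],
        "retention_7": [False, False, False, False, True],
    }
)

# Number of missing values per column (all zeros means nothing is missing).
missing_per_column = sample.isna().sum()

# Number of repeated player IDs (zero means each player appears only once).
duplicate_ids = int(sample["userid"].duplicated().sum())

print(missing_per_column)
print("duplicate userids:", duplicate_ids)
```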


# Analyzing

In [5]:
# Count the number of players in each group
df.groupby('version').count()

| version | userid | sum_gamerounds | retention_1 | retention_7 |
|---------|--------|----------------|-------------|-------------|
| gate_30 | 44700 | 44700 | 44700 | 44700 |
| gate_40 | 45489 | 45489 | 45489 | 45489 |


- 45,489 players encountered the gate at level 40.
- 44,700 players encountered the gate at level 30.

## Comparing 1-day vs 7-day Retention

In [10]:
# Calculate the 1-day and 7-day retention rates for each version
retention_1_gate_30 = df[df['version'] == 'gate_30']['retention_1'].mean()
retention_1_gate_40 = df[df['version'] == 'gate_40']['retention_1'].mean()

retention_7_gate_30 = df[df['version'] == 'gate_30']['retention_7'].mean()
retention_7_gate_40 = df[df['version'] == 'gate_40']['retention_7'].mean()

retention_rates = {
    "1-Day Retention (Gate 30)": retention_1_gate_30,
    "1-Day Retention (Gate 40)": retention_1_gate_40,
    "7-Day Retention (Gate 30)": retention_7_gate_30,
    "7-Day Retention (Gate 40)": retention_7_gate_40
}

retention_rates


{'1-Day Retention (Gate 30)': 0.4481879194630872,
 '1-Day Retention (Gate 40)': 0.44228274967574577,
 '7-Day Retention (Gate 30)': 0.19020134228187918,
 '7-Day Retention (Gate 40)': 0.18200004396667327}

- 1-Day Retention for Gate 30: Approximately 44.82%
- 1-Day Retention for Gate 40: Approximately 44.23%
- 7-Day Retention for Gate 30: Approximately 19.02%
- 7-Day Retention for Gate 40: Approximately 18.20%

At first glance, it seems that placing the gate at level 30 results in slightly higher retention rates for both 1-day and 7-day measurements.
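
To make the size of these gaps concrete, the sketch below recomputes the rates from per-group counts and expresses the differences in percentage points. The counts are an assumption in the sense that they do not appear in the notebook output: they are reconstructed as rate × group size, rounded to the nearest player. With the full DataFrame loaded, `df.groupby('version')[['retention_1', 'retention_7']].mean()` should give the same rates in one call.

```python
import pandas as pd

# Per-group counts reconstructed from the group sizes and retention rates
# reported above (rate * group size, rounded to the nearest player).
counts = pd.DataFrame(
    {
        "players": [44700, 45489],
        "retained_1": [20034, 20119],
        "retained_7": [8502, 8279],
    },
    index=["gate_30", "gate_40"],
)

# Retention rate = retained players / players in the group.
rates = counts[["retained_1", "retained_7"]].div(counts["players"], axis=0)

# Gap between the two versions, in percentage points (gate_30 minus gate_40).
diff_pp = (rates.loc["gate_30"] - rates.loc["gate_40"]) * 100

print(rates.round(4))
print(diff_pp.round(2))
```

Under these counts the gap is a little over half a percentage point for 1-day retention and a little under one percentage point for 7-day retention.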

To determine if these differences are statistically significant, we can conduct hypothesis tests. Specifically, we can use a chi-squared test for independence since the retention variables are categorical (True or False).

## Chi-squared Test

In [12]:
from scipy.stats import chi2_contingency

# Create contingency tables for the 1-day and 7-day retention rates
contingency_1_day = pd.crosstab(df['version'], df['retention_1'])
contingency_7_day = pd.crosstab(df['version'], df['retention_7'])

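# Perform chi-squared tests of independence
# (for 2x2 tables, chi2_contingency applies Yates' continuity correction by default)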
chi2_stat_1_day, p_val_1_day, _, _ = chi2_contingency(contingency_1_day)
chi2_stat_7_day, p_val_7_day, _, _ = chi2_contingency(contingency_7_day)

chi2_results = {
    "1-Day Retention": {"Chi2 Statistic": chi2_stat_1_day, "p-value": p_val_1_day},
    "7-Day Retention": {"Chi2 Statistic": chi2_stat_7_day, "p-value": p_val_7_day}
}

chi2_results


{'1-Day Retention': {'Chi2 Statistic': 3.1591007878782262,
  'p-value': 0.07550476210309086},
 '7-Day Retention': {'Chi2 Statistic': 9.959086799559167,
  'p-value': 0.0016005742679058301}}

- **1-Day Retention:** chi-squared statistic = 3.16, p-value = 0.0755. The p-value is greater than 0.05, so the difference in 1-day retention between the two groups is not statistically significant at the commonly used 5% significance level.

- **7-Day Retention:** chi-squared statistic = 9.96, p-value = 0.0016. The p-value is less than 0.05, so the difference in 7-day retention between the two groups is statistically significant at the 5% significance level.
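
To show what the test actually computes, here is a minimal sketch that rebuilds the 7-day statistic by hand and compares it with `chi2_contingency`. The cell counts are reconstructed from the group sizes and retention rates reported earlier (an assumption, since the contingency table itself is not printed), and the variable names are illustrative only. The manual version includes Yates' continuity correction because `chi2_contingency` applies it by default to 2x2 tables.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# 7-day contingency table, reconstructed from the reported group sizes and rates.
# Rows: gate_30, gate_40. Columns: not retained, retained.
observed = np.array(
    [[44700 - 8502, 8502],
     [45489 - 8279, 8279]],
    dtype=float,
)

# Expected counts if retention were independent of the gate version:
# (row total * column total) / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Yates' continuity correction: shrink each |observed - expected| by 0.5.
corrected = np.abs(observed - expected) - 0.5
manual_stat = float((corrected ** 2 / expected).sum())

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
manual_p = float(chi2.sf(manual_stat, df=1))

# Reference values from SciPy for comparison.
ref_stat, ref_p, dof, _ = chi2_contingency(observed)

print(round(manual_stat, 3), round(manual_p, 4))
print(round(ref_stat, 3), round(ref_p, 4))
```

Both lines should agree with each other and with the 7-day result above, which suggests the reported statistic is the continuity-corrected one.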

# Conclusion


- For 1-day retention, we don't have enough evidence to say that there's a significant difference in retention between the two groups.

- For 7-day retention, the difference is statistically significant: placing the gate at level 30 gives a higher retention rate than placing it at level 40 (about 19.02% versus 18.20%).

Given these results, the game developers might consider keeping the gate at level 30 to maximize player retention over the course of a week.
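
A significance test says whether a difference is likely to be real, not how large it is. As a rough complement, the sketch below puts a 95% confidence interval around the 7-day retention gap using a normal-approximation (Wald) interval for the difference of two proportions. This step is not part of the original analysis, and the counts are again reconstructed from the reported group sizes and rates.

```python
from math import sqrt
from statistics import NormalDist

# Group sizes and 7-day retained counts, reconstructed from the reported
# group sizes and retention rates (rate * group size, rounded).
n_30, n_40 = 44700, 45489
retained_30, retained_40 = 8502, 8279

p_30 = retained_30 / n_30
p_40 = retained_40 / n_40
diff = p_30 - p_40  # gate_30 minus gate_40

# Standard error of the difference between two independent proportions.
se = sqrt(p_30 * (1 - p_30) / n_30 + p_40 * (1 - p_40) / n_40)

# Two-sided 95% interval under the normal approximation.
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z * se, diff + z * se

print(f"difference: {diff * 100:.2f} pp")
print(f"95% CI: [{ci_low * 100:.2f}, {ci_high * 100:.2f}] pp")
```

Under these assumptions the interval lies entirely above zero, which is consistent with the chi-squared result for 7-day retention.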