![](https://i.ytimg.com/vi/iPxZIp0cbJE/maxresdefault.jpg)

# Introduction

This dataset is from a DataCamp project: https://www.datacamp.com/projects/184.

### Project Description from DataCamp

Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It's a classic "connect three" style puzzle game where the player must connect tiles of the same color in order to clear the board and win the level. It also features singing cats. We're not kidding!

As players progress through the game they will encounter gates that force them to wait some time before they can progress or make an in-app purchase. In this project, we will analyze the result of an A/B test where the first gate in Cookie Cats was moved from level 30 to level 40. In particular, we will analyze the impact on player retention and game rounds.

To complete this project, you should be comfortable working with pandas DataFrames and with using the pandas plot method. You should also have some understanding of hypothesis testing and bootstrap analysis.

## Dataset Story
* userid: A unique number that identifies each player.

* version: Whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).

* sum_gamerounds: The number of game rounds played by the player during the first 14 days after install.

* retention_1: Did the player come back and play 1 day after installing?

* retention_7: Did the player come back and play 7 days after installing?

# Importing necessary libraries and settings

In [1]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# !pip install statsmodels
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest
import scipy.stats as stats

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
pd.set_option('display.width', 500)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/mobile-games-ab-testing/cookie_cats.csv


# Reading & Understanding Dataset

In [2]:
df_ = pd.read_csv("/kaggle/input/mobile-games-ab-testing/cookie_cats.csv")
df = df_.copy()  # work on a copy so the raw data in df_ stays untouched
def check_df(dataframe):
    print("##################### First 10 Observations #####################")
    print(dataframe.head(10))
    print("##################### Column names #####################")
    dataframe.info()  # info() prints directly; use the argument, not the global df
    print("##################### Shape #####################")
    print(dataframe.shape)
    print("##################### Quantiles #####################")
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)
    print("##################### NA #####################")
    print(dataframe.isnull().sum())
    print("##################### Types #####################")
    print(dataframe.dtypes)
check_df(df)

##################### First 10 Observations #####################
   userid  version  sum_gamerounds  retention_1  retention_7
0     116  gate_30               3        False        False
1     337  gate_30              38         True        False
2     377  gate_40             165         True        False
3     483  gate_40               1        False        False
4     488  gate_40             179         True         True
5     540  gate_40             187         True         True
6    1066  gate_30               0        False        False
7    1444  gate_40               2        False        False
8    1574  gate_40             108         True         True
9    1587  gate_40             153         True        False
##################### Column names #####################
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   useri

In [3]:
df.groupby("version").agg({"sum_gamerounds":["count", "median", "mean", "std", "max"],
                          "retention_1":["count", "median", "mean", "std", "max"],
                          "retention_7":["count", "median", "mean", "std", "max"]})

| version | sum_gamerounds count | sum_gamerounds median | sum_gamerounds mean | sum_gamerounds std | sum_gamerounds max | retention_1 count | retention_1 median | retention_1 mean | retention_1 std | retention_1 max | retention_7 count | retention_7 median | retention_7 mean | retention_7 std | retention_7 max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gate_30 | 44700 | 17.0 | 52.45626 | 256.71642 | 49854 | 44700 | 0.0 | 0.44819 | 0.49731 | True | 44700 | 0.0 | 0.1902 | 0.39246 | True |
| gate_40 | 45489 | 16.0 | 51.29878 | 103.29442 | 2640 | 45489 | 0.0 | 0.44228 | 0.49666 | True | 45489 | 0.0 | 0.182 | 0.38585 | True |


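The gate_30 maximum of 49,854 game rounds is far above anything in gate_40 and inflates that group's standard deviation, so we inspect the distribution of sum_gamerounds and remove the outlier.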

In [4]:
# Summary Stats: sum_gamerounds
df.describe([0.01, 0.05, 0.10, 0.20, 0.80, 0.90, 0.95, 0.99, 0.999, 0.9999])[["sum_gamerounds"]].T

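| | count | mean | std | min | 1% | 5% | 10% | 20% | 50% | 80% | 90% | 95% | 99% | 99.9% | 99.99% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sum_gamerounds | 90189.0 | 51.87246 | 195.05086 | 0.0 | 0.0 | 1.0 | 1.0 | 3.0 | 16.0 | 67.0 | 134.0 | 221.0 | 493.0 | 1073.624 | 2012.9508 | 49854.0 |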


In [5]:
df.sum_gamerounds.value_counts().sort_index(ascending=False).head(20)

49854    1
2961     1
2640     1
2438     1
2294     1
2251     1
2156     1
2124     1
2063     1
2015     1
1906     1
1816     1
1714     1
1705     1
1697     1
1687     1
1667     1
1643     2
1573     1
1559     1
Name: sum_gamerounds, dtype: int64
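
The listing above shows one value (49854) far away from every other observation. As a minimal sketch of why that matters, the snippet below uses synthetic data (the `rounds` series is an invented stand-in, not the Cookie Cats data) to show how a single extreme value inflates the mean and standard deviation while leaving the median almost unchanged. It drops the maximum with the same `< max()` filter the notebook applies in the next cell.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for sum_gamerounds: 999 ordinary players plus one extreme value.
rounds = pd.Series(
    np.append(rng.poisson(lam=50, size=999), 49_854),
    name="sum_gamerounds",
)

# Keep every row strictly below the maximum (drops all rows tied at the max).
trimmed = rounds[rounds < rounds.max()]

stats_to_show = ["median", "mean", "std", "max"]
summary = pd.DataFrame({
    "with_outlier": rounds.agg(stats_to_show),
    "without_outlier": trimmed.agg(stats_to_show),
})
print(summary)
```

Note that this filter removes every row tied at the maximum. That is harmless here because the value counts show 49854 occurs only once.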

In [6]:
df = df[df.sum_gamerounds < df.sum_gamerounds.max()].copy()  # .copy() avoids SettingWithCopyWarning below

In [7]:
df["version"] = np.where(df["version"] == "gate_30", "A", "B")
df.head()


| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | A | 3 | False | False |
| 1 | 337 | A | 38 | True | False |
| 2 | 377 | B | 165 | True | False |
| 3 | 483 | B | 1 | False | False |
| 4 | 488 | B | 179 | True | True |


We relabel the versions as A (gate_30) and B (gate_40) for hypothesis testing.

# A/B Testing

### Assumptions:

- Check normality
- If Normal Distribution, check homogeneity

### Steps:
- Split & Define Control Group & Test Group
- Apply Shapiro Test for normality
- If parametric apply Levene Test for homogeneity of variances
- If parametric and variances are homogeneous, apply the independent two-sample T-Test
- If parametric but variances are not homogeneous, apply Welch's T-Test
- If Non-parametric apply Mann Whitney U Test directly
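
One caveat on the normality step: SciPy's documentation for `shapiro` notes that for samples larger than 5000 the test statistic is accurate but the p-value may not be, and each group here has roughly 45,000 players. The snippet below is a minimal sketch on synthetic right-skewed data (the `rounds` array is an invented stand-in, and the subsample size of 5000 is an assumption chosen to stay within that limit). It runs Shapiro-Wilk on a random subsample and reports skewness as a second check.

```python
import numpy as np
from scipy.stats import shapiro, skew

rng = np.random.default_rng(42)

# Synthetic stand-in for one group's sum_gamerounds: strongly right-skewed counts.
rounds = rng.exponential(scale=50, size=45_000).round()

# Shapiro-Wilk on a random subsample of 5000 observations (assumed size).
sample = rng.choice(rounds, size=5_000, replace=False)
w_stat, p_value = shapiro(sample)

# Skewness of the full group; values far from 0 indicate asymmetry.
group_skew = skew(rounds)

print(f"Shapiro W = {w_stat:.4f}, p-value = {p_value:.4g}, skewness = {group_skew:.2f}")
is_normal = p_value >= 0.05
print("Treat as normal:", is_normal)
```

For data as skewed as game-round counts, both checks point the same way, so the workflow would proceed to the non-parametric branch.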

In [8]:
df.groupby("version").agg({"sum_gamerounds":["count", "median", "mean", "std", "max"],})

| version | sum_gamerounds count | sum_gamerounds median | sum_gamerounds mean | sum_gamerounds std | sum_gamerounds max |
|---|---|---|---|---|---|
| A | 44699 | 17.0 | 51.34211 | 102.0576 | 2961 |
| B | 45489 | 16.0 | 51.29878 | 103.29442 | 2640 |


Version A: the player was in the control group, with the gate at level 30  
Version B: the player was in the test group, with the gate moved to level 40

#### H0: M1 = M2 (There is no statistically significant difference between Version A and Version B in the average number of game rounds played during the first 14 days after install.)

#### H1: M1 != M2 (There is a statistically significant difference between Version A and Version B in the average number of game rounds played during the first 14 days after install.)



In [9]:
def hypothesis_testing(dataframe, group, target):
    groupA = dataframe[dataframe[group] == "A"][target]
    groupB = dataframe[dataframe[group] == "B"][target]


    test_stat_A, pvalue_A = shapiro(groupA)
    print('GroupA: Test Stat = %.10f, p-value = %.10f' % (test_stat_A, pvalue_A))

    test_stat_B, pvalue_B = shapiro(groupB)
    print('GroupB: Test Stat = %.10f, p-value = %.10f' % (test_stat_B, pvalue_B))

    if pvalue_A >= 0.05 and pvalue_B >= 0.05:
        test_stat_lev, pvalue_lev = levene(groupA, groupB)
        print('Levene: Test Stat = %.10f, p-value = %.10f' % (test_stat_lev, pvalue_lev))
        if pvalue_lev >= 0.05:  # use the Levene p-value computed above
            test_stat_t, pvalue_t = ttest_ind(groupA, groupB, equal_var=True)
            print('Ttest: Test Stat = %.10f, p-value = %.10f' % (test_stat_t, pvalue_t))
            if pvalue_t >= 0.05:
                print(f"{pvalue_t} !< 0.05, We cannot reject H0")
            else:
                print(f"{pvalue_t} < 0.05, We reject H0")
        else:
            test_stat_t, pvalue_t = ttest_ind(groupA, groupB, equal_var=False)
            print('Ttest: Test Stat = %.10f, p-value = %.10f' % (test_stat_t, pvalue_t))
            if pvalue_t >= 0.05:
                print(f"{pvalue_t} !< 0.05, We cannot reject H0")
            else:
                print(f"{pvalue_t} < 0.05, We reject H0")
    else:
        test_stat_man, pvalue_man = mannwhitneyu(groupA, groupB)
        print('Mann: Test Stat = %.4f, p-value = %.4f' % (test_stat_man, pvalue_man))
        if pvalue_man >= 0.05:
            print(f"{pvalue_man} ≮  0.05, We fail to reject H0")
        else:
            print(f"{pvalue_man} < 0.05, We reject H0")

In [10]:
hypothesis_testing(df, "version", "sum_gamerounds")



GroupA: Test Stat = 0.4886442423, p-value = 0.0000000000
GroupB: Test Stat = 0.4825654030, p-value = 0.0000000000
Mann: Test Stat = 1024285761.5000, p-value = 0.0509
0.05089155279145376 ≮  0.05, We fail to reject H0
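
The p-value alone does not say how large the difference is. As a rough sketch, the Mann-Whitney statistic printed above can be turned into an effect size. This assumes the reported value is the U statistic for group A, which is what recent SciPy versions return for the first sample, and it uses the group sizes from the grouped summary (44699 and 45489). The variable names are illustrative only.

```python
# Values copied from the notebook output above.
u_stat = 1024285761.5      # Mann-Whitney U, assumed to belong to group A
n_a, n_b = 44699, 45489    # group sizes after removing the outlier

# Probability that a random A player has more rounds than a random B player
# (ties count as one half): U / (n_a * n_b).
prob_a_gt_b = u_stat / (n_a * n_b)

# Rank-biserial correlation: 0 means no difference, +/-1 complete separation.
rank_biserial = 2 * prob_a_gt_b - 1

print(f"P(A > B) = {prob_a_gt_b:.4f}, rank-biserial r = {rank_biserial:.4f}")
```

Under these assumptions the probability comes out only slightly above 0.5, so any difference between the versions in game rounds is small in practical terms as well.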


# Summary:
##### With p = 0.0509 from the Mann-Whitney U test, we fail to reject H0 at the 0.05 significance level: there is not enough evidence of a statistically significant difference in game rounds between Version A and Version B. This does not prove that moving the gate had no effect. It means that the observed difference could be due to random chance or variability in the data.
