INSTALLING AND IMPORTING REQUIRED LIBRARIES

In [None]:
# !pip install statsmodels

In [None]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)


## Reading and Examining the Dataset

In [None]:
df = pd.read_csv("/kaggle/input/mobile-games-ab-testing-cookie-cats/cookie_cats.csv")

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.nunique()

In [None]:
df.describe().T

In [None]:
df.groupby("version")["sum_gamerounds"].mean()

In [None]:
df.groupby("version").size()  # number of players in each group

In [None]:
df.groupby("version")["retention_1"].value_counts()

In [None]:
df.groupby("version")["retention_7"].value_counts()

## Retention_1 and Retention_7 Rates

In [None]:
retention_map = {False:0, True: 1}
df["retention_1"] = df["retention_1"].map(retention_map)
df["retention_7"] = df["retention_7"].map(retention_map)

In [None]:
df.groupby("version")["retention_1"].mean()

In [None]:
df.groupby("version")["retention_7"].mean()
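Because the retention columns are now 0/1, the group mean of each column is simply the share of retained players, i.e. the retention rate. A minimal sketch on a hypothetical four-row toy frame (`toy` is illustrative only, not the Cookie Cats data):

```python
import pandas as pd

# Hypothetical toy data (NOT from cookie_cats.csv): two players per group
toy = pd.DataFrame({
    "version": ["gate_30", "gate_30", "gate_40", "gate_40"],
    "retention_7": [True, False, False, False],
})

# Same mapping as in the notebook: False -> 0, True -> 1
toy["retention_7"] = toy["retention_7"].map({False: 0, True: 1})

# The mean of a 0/1 column is the proportion of 1s, i.e. the retention rate
rates = toy.groupby("version")["retention_7"].mean()
print(rates)
```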

# ESTABLISHING HYPOTHESES

H0: M1 = M2

There is no statistically significant difference in sum_gamerounds (the total number of game rounds played) between the gate_30 and gate_40 groups.

H1: M1 != M2

There is a statistically significant difference between the two groups.

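CHECKING ASSUMPTIONS
1. Assumption of Normality (Shapiro-Wilk Test)

H0: sum_gamerounds is normally distributed within the group.

H1: sum_gamerounds is not normally distributed within the group.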

In [None]:
ttest_stats, pvalue = shapiro(df.loc[df["version"] == 'gate_30', "sum_gamerounds"])
print('Test_stats= %.4f , Pvalue= %.4f' % (ttest_stats, pvalue))

In [None]:
ttest_stats, pvalue = shapiro(df.loc[df["version"] == 'gate_40', "sum_gamerounds"])
print('Test_stats= %.4f , Pvalue= %.4f' % (ttest_stats, pvalue))
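Each group holds tens of thousands of players, and SciPy warns that the Shapiro-Wilk p-value may not be accurate for N > 5000; with samples this large, even small departures from normality are flagged. The sketch below uses a hypothetical lognormal sample (an assumption chosen only to mimic a right-skewed count such as `sum_gamerounds`) to show the test rejecting normality for skewed data:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)

# Hypothetical right-skewed sample; size kept below 5000 to avoid SciPy's accuracy warning
skewed = rng.lognormal(mean=3.0, sigma=1.0, size=1_000)

stat, pvalue = shapiro(skewed)
print(f"W = {stat:.4f}, p-value = {pvalue:.4g}")

# Reject H0 (normality) at the 5% level when p-value < 0.05
normality_rejected = pvalue < 0.05
print("Normality rejected:", normality_rejected)
```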

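For both groups, p-value < 0.05, so H0 (normality) is rejected. Because the normality assumption is not met, we use the non-parametric Mann-Whitney U test instead of the independent two-sample t-test.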

2. Checking Homogeneity of Variance (Levene Test):

H0: σ1² = σ2²: Variances are homogeneous.

H1: σ1² != σ2²: Variances are not homogeneous.

Since the normality assumption is not met, this test is not strictly needed; it is shown here for information only.

In [None]:
ttest_stats, pvalue = levene(df.loc[df["version"] == 'gate_30', "sum_gamerounds"],
                              df.loc[df["version"] == 'gate_40', "sum_gamerounds"])
print('Test_stats= %.4f , Pvalue= %.4f' % (ttest_stats, pvalue))
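Levene's test is a one-way ANOVA on the absolute deviations of each observation from its group center; SciPy's `levene` uses the median as the center by default (the Brown-Forsythe variant). A small sketch on hypothetical normal samples `a` and `b` (not the game data) reproduces the statistic with `f_oneway`:

```python
import numpy as np
from scipy.stats import levene, f_oneway

rng = np.random.default_rng(0)

# Hypothetical samples standing in for the two groups
a = rng.normal(loc=50, scale=10, size=300)
b = rng.normal(loc=52, scale=10, size=300)

# SciPy default: center="median"
lev = levene(a, b, center="median")

# Same statistic by hand: one-way ANOVA on |x - group median|
anova = f_oneway(np.abs(a - np.median(a)), np.abs(b - np.median(b)))

print(f"levene:                W = {lev.statistic:.4f}, p = {lev.pvalue:.4f}")
print(f"ANOVA on |x - median|: F = {anova.statistic:.4f}, p = {anova.pvalue:.4f}")
```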

H0 could not be rejected (p-value > 0.05), so the variances can be considered homogeneous.

In summary: the normality assumption was rejected for both groups (p-value < 0.05), so we use the Mann-Whitney U test.

In [None]:
ttest_stats, pvalue = mannwhitneyu(df.loc[df["version"] == 'gate_30', "sum_gamerounds"],
                              df.loc[df["version"] == 'gate_40', "sum_gamerounds"])
print('Test_stats= %.4f , Pvalue= %.4f' % (ttest_stats, pvalue))
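The assumption checks above can be condensed into a single decision rule. The helper `compare_two_groups` below is hypothetical (not part of any library) and is only a sketch: it falls back to Mann-Whitney U when either Shapiro-Wilk test rejects normality, and otherwise uses Levene's test to choose between Student's and Welch's t-test. The lognormal inputs are assumed toy data.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_two_groups(x, y, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from the assumption checks."""
    normal = shapiro(x).pvalue >= alpha and shapiro(y).pvalue >= alpha
    if normal:
        # Parametric path: Student's t-test if variances are homogeneous, else Welch's
        equal_var = levene(x, y).pvalue >= alpha
        result = ttest_ind(x, y, equal_var=equal_var)
        name = "Student t-test" if equal_var else "Welch t-test"
    else:
        # Non-parametric path, as used in this notebook
        result = mannwhitneyu(x, y, alternative="two-sided")
        name = "Mann-Whitney U"
    return name, result.pvalue


# Usage on hypothetical skewed samples
rng = np.random.default_rng(7)
x = rng.lognormal(mean=3.0, sigma=1.0, size=500)
y = rng.lognormal(mean=3.1, sigma=1.0, size=500)
name, p = compare_two_groups(x, y)
print(name, round(p, 4))
```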

The p-value is greater than 0.05, if only slightly, so H0 cannot be rejected.

There is no statistically significant difference between the 2 groups.
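The Mann-Whitney U test compares ranks rather than raw values, which is why it does not require normality; strictly, it tests whether values from one group tend to be larger than values from the other, rather than comparing means. The sketch below uses hypothetical Poisson-distributed samples with many ties (an assumption loosely resembling integer round counts) and computes U for the first sample as U1 = R1 - n1(n1 + 1)/2, where R1 is that sample's rank sum in the pooled data. Recent SciPy versions report U for the first sample, so the two should match:

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

rng = np.random.default_rng(1)

# Hypothetical integer samples with many ties
x = rng.poisson(lam=20, size=200)
y = rng.poisson(lam=22, size=250)

# Rank the pooled data; tied values receive their average rank
ranks = rankdata(np.concatenate([x, y]))
r1 = ranks[: len(x)].sum()                # rank sum of the first sample
u1 = r1 - len(x) * (len(x) + 1) / 2       # U statistic for the first sample

res = mannwhitneyu(x, y, alternative="two-sided")
print(f"U by hand = {u1:.1f}, SciPy U = {res.statistic:.1f}, p = {res.pvalue:.4f}")
```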

However, since the p-value is very close to 0.05, continuing the experiment (collecting more data) might yield a more conclusive result.

Next, we test whether the 7-day retention rates differ between the groups.

In [None]:
df.groupby("version")["retention_7"].mean()

In [None]:
df.groupby("version")["retention_7"].value_counts()
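The success counts and group sizes needed for the proportion test can also be taken from a single `groupby` aggregation, which keeps both groups in the same (sorted) order. A minimal sketch on a hypothetical toy frame named `toy` (illustrative only, so it does not overwrite the notebook's `df`):

```python
import pandas as pd

# Hypothetical stand-in for df after the 0/1 mapping
toy = pd.DataFrame({
    "version": ["gate_30"] * 4 + ["gate_40"] * 5,
    "retention_7": [1, 0, 1, 0, 0, 1, 0, 0, 0],
})

# Successes = sum of the 0/1 column; nobs = number of players per group
summary = toy.groupby("version")["retention_7"].agg(successes="sum", nobs="count")
print(summary)

count = summary["successes"].to_numpy()  # ordered gate_30, gate_40
nobs = summary["nobs"].to_numpy()
```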

In [None]:
# Number of players retained after 7 days (retention_7 == 1) in each group
gate_30_retention_7_succ = df.loc[(df["version"] == 'gate_30') & (df["retention_7"] == 1), 'retention_7'].count()
gate_40_retention_7_succ = df.loc[(df["version"] == 'gate_40') & (df["retention_7"] == 1), 'retention_7'].count()
# Total number of players in each group
gate_30_nobs = df.loc[df["version"] == 'gate_30'].shape[0]
gate_40_nobs = df.loc[df["version"] == 'gate_40'].shape[0]


In [None]:
# Two-proportion z-test. H0: p1 = p2 (the 7-day retention rates are equal)
z_stat, pvalue = proportions_ztest(count=[gate_30_retention_7_succ, gate_40_retention_7_succ],
                                   nobs=[gate_30_nobs, gate_40_nobs])
print('Test_stats= %.4f , Pvalue= %.4f' % (z_stat, pvalue))

Pvalue < 0.05. H0 was rejected.
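With its default settings, `proportions_ztest` uses the pooled two-proportion z statistic, z = (p1 - p2) / sqrt(p(1 - p)(1/n1 + 1/n2)), where p = (x1 + x2) / (n1 + n2) is the pooled retention rate. The sketch below recomputes it by hand for hypothetical counts (assumed numbers, not the Cookie Cats results) and compares it with statsmodels:

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts, NOT the cookie_cats results
count = np.array([190, 160])   # retained players per group
nobs = np.array([1000, 1000])  # players per group

p1, p2 = count / nobs                     # per-group retention rates
p_pool = count.sum() / nobs.sum()         # pooled rate under H0: p1 = p2
se = np.sqrt(p_pool * (1 - p_pool) * (1 / nobs[0] + 1 / nobs[1]))
z_manual = (p1 - p2) / se
p_manual = 2 * norm.sf(abs(z_manual))     # two-sided p-value

z_sm, p_sm = proportions_ztest(count=count, nobs=nobs)
print(f"manual:      z = {z_manual:.4f}, p = {p_manual:.4f}")
print(f"statsmodels: z = {z_sm:.4f}, p = {p_sm:.4f}")
```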

As a result, although there was no statistically significant difference in the total number of game rounds, a statistically significant difference was found when 7-day retention was tested.

The 7-day retention rate of gate_30 was higher than that of gate_40.