# AB Testing Project
I found an A/B testing dataset on kaggle.com. In this notebook I use hypothesis testing to compare the two test groups and derive meaningful insights. 

**Purpose of this notebook:**
- Demonstrate my understanding of statistics in a business context. 
- Demonstrate my understanding of python in a data analysis context.

A dataset from kaggle.com has been used for this analysis. 

# Project 
## A/B Testing with Cookie Cats Dataset
### Problem Statement
Cookie Cats is an online mobile game. The dev team wants to see how changing a feature will affect their players. The feature being changed is the in-game level at which the player meets a gate. They want to understand the effect of a player encountering the gate later in the game (level 40) compared to earlier (level 30). The purpose of this analysis is to investigate whether gate placement has a statistically significant effect on player retention.

### Data
To make it clear what the data shows, here is a description of each field / column: 
- userid: A unique identifier for a given player. 
- version: The A/B group the player was assigned to, either 'gate_30' or 'gate_40'; the number refers to the level at which the gate appears. 
- sum_gamerounds: The total number of rounds of Cookie Cats completed by a player during the first 14 days after install. 
- retention_1: Boolean indicating whether the player came back to play 1 day after installing. 
- retention_7: Boolean indicating whether the player came back to play 7 days after installing. 

**What relationships can be explored in this data to answer the problem statement?**
1. Effect of 'version' on 'sum_gamerounds'. 
2. Effect of 'version' on 'retention_1'. 
3. Effect of 'version' on 'retention_7'. 

### Hypotheses 
Measuring the effect of version on sum_gamerounds: 
- H0: Gate placement at level 40 does not have an effect on the number of rounds completed by a user vs placement at level 30. 
- H1: Gate placement at level 40 does have an effect on the number of rounds completed by a user vs placement at level 30. 

Measuring the effect of version on retention_1: 
- H0: Gate placement at level 40 does not have an effect on the 1 day retention of a user vs level 30. 
- H1: Gate placement at level 40 does have an effect on the 1 day retention of a user vs level 30. 

Measuring the effect of version on retention_7: 
- H0: Gate placement at level 40 does not have an effect on the 7 day retention of a user vs level 30. 
- H1: Gate placement at level 40 does have an effect on the 7 day retention of a user vs level 30. 

### Analysis

In [29]:
#Import libs
import pandas as pd 
import matplotlib.pyplot as plt
import math

#import data 
df = pd.read_csv('cookie_cats.csv')

df.head()

Unnamed: 0,userid,version,sum_gamerounds,retention_1,retention_7
0,116,gate_30,3,False,False
1,337,gate_30,38,True,False
2,377,gate_40,165,True,False
3,483,gate_40,1,False,False
4,488,gate_40,179,True,True


In [7]:
df.shape

(90189, 5)

90,189 rows, 5 columns. 
Clean data, then split into A/B groups. 

In [25]:
#check for duplicated UIDs 
df.duplicated(subset=['userid']).any()

False

In [26]:
#Check for empty rows
df[df.isna().any(axis=1)]

Unnamed: 0,userid,version,sum_gamerounds,retention_1,retention_7


In [27]:
#Split the datasets
a30 = df[df['version'] == 'gate_30']
b40 = df[df['version'] == 'gate_40']

a30.shape

(44700, 5)

In [28]:
b40.shape

(45489, 5)

### Addressing hypothesis 1
Does the A/B group have an effect on the number of rounds completed? Concretely, is there a statistically significant difference between the number of rounds completed between groups?

**Approach**

Population statistics (mean, standard deviation) are unknown, therefore a t-test will be used to compare sample means. 
It is unknown whether A should be better than B or vice versa, therefore a two-tailed test will be used. 

A standard confidence level of 95% will be used. 

**Formula**

t statistic = (meanA - meanB) / root( (sdA^2 / n of A ) + (sdB^2 / n of B))

In [31]:
#mean of rounds played in each group
aMu = a30['sum_gamerounds'].mean()
bMu = b40['sum_gamerounds'].mean()
#Sample size
aN = a30.shape[0]
bN = b40.shape[0]
#Sample standard deviation
aSigma = a30['sum_gamerounds'].std()
bSigma = b40['sum_gamerounds'].std()
#test statistic (unequal variances)
tstat = (aMu - bMu) / math.sqrt( ((aSigma**2) / aN) + ((bSigma**2) / bN) )
tstat

0.8854374331270672

In [33]:
#degrees of freedom
degfreedom = min(aN, bN) - 1
degfreedom

44699

Looking up a critical value for a 2-tailed t test with df = 44699, alpha = 0.05 => 1.960
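
The table lookup can also be reproduced in code. The following is a minimal sketch, assuming that with 44,699 degrees of freedom the t distribution is close enough to the standard normal that `statistics.NormalDist` from the Python standard library can stand in for a t-table (an approximation, not an exact t quantile). The variable names `t_stat`, `alpha` and `critical` are illustrative and not part of the original notebook.

```python
from statistics import NormalDist

# Test statistic copied from the cell above
t_stat = 0.8854374331270672
alpha = 0.05

# Two-tailed critical value: the point that leaves alpha/2 in the upper tail.
# Assumption: normal approximation to the t distribution (df is very large).
critical = NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical, 3))      # 1.96
print(abs(t_stat) < critical)  # True: inside the acceptance region
```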

The test statistic of 0.885 is well below the critical value of 1.960, so it falls inside the acceptance region. At the 5% significance level there is insufficient evidence that gate placement affects the number of rounds completed. This is not the same as being 95% confident that there is no effect; it only means the data do not show one.

**We have failed to reject the null hypothesis.**

In [36]:
#This result is unsurprising, since the difference between the means  of the two groups is actually very small: 
aMu - bMu 

1.157488453953249

### Addressing hypothesis 2
Does the A/B group have a statistically significant effect on 1 day user retention? 

**Approach**

It is unknown whether A should be better than B or vice versa, therefore a two-tailed test will be used. 

A standard confidence level of 95% will be used. 

**Formula**

Z statistic = (proportionA - proportionB) / root( ((proportionA * (1 - proportionA)) / aN) + ((proportionB * (1 - proportionB)) / bN) )

In [46]:
#sample sizes aN and bN are reused from hypothesis 1
#Proportions 
propA = a30[ a30['retention_1'] == True].shape[0] / aN
propB = b40[ b40['retention_1'] == True].shape[0] / bN
#inverse proportions
propnotA = 1-propA
propnotB = 1-propB
#Z Statistic 
(propA - propB) / math.sqrt( ((propA*propnotA) / aN) + ((propB*propnotB) / bN) )

1.7840972718297632

Looking this value up in a Z-table gives a cumulative probability of approximately 0.9625.

The threshold for 95% confidence on a two-tailed test is 1 - (0.05/2) = 0.975.

0.9625 < 0.975 (equivalently, the two-tailed p-value is about 2 * (1 - 0.9625) = 0.075, which is greater than 0.05), so at the 5% significance level there is insufficient evidence to conclude that the observed difference between sample proportions is the result of something other than chance. 
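
The Z-table lookup can be checked in code. This is a small sketch using `statistics.NormalDist` from the standard library; the names `z`, `cumulative` and `p_value` are illustrative and do not appear in the original notebook.

```python
from statistics import NormalDist

# Z statistic copied from the cell above
z = 1.7840972718297632

# Cumulative probability, i.e. what the Z-table lookup returns
cumulative = NormalDist().cdf(abs(z))

# Two-tailed p-value: probability of a result at least this extreme
# in either direction if the null hypothesis were true
p_value = 2 * (1 - cumulative)

print(cumulative < 0.975)  # True: below the two-tailed threshold
print(p_value > 0.05)      # True: fail to reject at the 5% level
```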

**We have failed to reject the null hypothesis**

In [48]:
#The difference between proportions in this case was: 
propA - propB
#roughly 0.6 percentage points

0.005905169787341458

### Addressing hypothesis 3
The same approach used for hypothesis 2 can be applied to hypothesis 3.
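
Because the calculation repeats, it could also be wrapped in a small helper. The sketch below is one possible shape, not the notebook's own code: `two_proportion_z` is a hypothetical function name, it uses the same unpooled standard error as the formula for hypothesis 2, and the toy lists are made up purely to exercise it.

```python
import math
from statistics import NormalDist

def two_proportion_z(flags_a, flags_b):
    """Return (z, two-tailed p-value) for the difference in the share of
    True values between two groups, using the unpooled standard error."""
    n_a, n_b = len(flags_a), len(flags_b)
    p_a = sum(flags_a) / n_a
    p_b = sum(flags_b) / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up toy data (NOT from the Cookie Cats dataset): 60% vs 50% retained
toy_a = [True] * 60 + [False] * 40
toy_b = [True] * 50 + [False] * 50

z, p = two_proportion_z(toy_a, toy_b)
print(round(z, 3), round(p, 3))
```

In the notebook, a call such as `two_proportion_z(a30['retention_7'], b40['retention_7'])` should work the same way, assuming the retention columns are boolean.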

In [49]:
#sample sizes aN and bN are reused from hypothesis 1
#Proportions 
propA2 = a30[ a30['retention_7'] == True].shape[0] / aN
propB2 = b40[ b40['retention_7'] == True].shape[0] / bN
#inverse proportions
propnotA2 = 1-propA2
propnotB2 = 1-propB2
#Z Statistic 
(propA2 - propB2) / math.sqrt( ((propA2*propnotA2) / aN) + ((propB2*propnotB2) / bN) )

3.1640640401490985

Looking this value up in a Z-table gives a cumulative probability of approximately 0.9992.

The threshold for 95% confidence on a two-tailed test is 1 - (0.05/2) = 0.975.

0.9992 > 0.975, therefore there is a less than 5% chance of observing a difference at least this large if the null hypothesis were true. 

**The null hypothesis is rejected in favour of the alternative hypothesis. The placement of the gate does have a statistically significant effect on the 7 day retention of users.**  The observed effect favours group A (gate_30), which has the higher 7 day retention.