# Data analysis for King  
## Analyzing Super Math Saga Datasets

This A/B test aimed to assess whether increasing the difficulty of selected levels, and offering a second chance after a failure in exchange for a small amount of currency, improves the overall "quality" of the game as the company understands it: increasing revenue without noticeably affecting player engagement. (*this was inferred*)

To meet this objective, a modification to the game was proposed and had to be tested to understand its effects. The test offered two different game experiences, called A and B: group A was the control group, where the experience was kept as is, and group B was the experiment group, exposed to the new experience.  
We set the assignment process to randomly distribute players among the groups: 80% to group A (control) and 20% to group B (test). The experiment ran from 2017-05-04 to 2017-05-22.

The key metrics for this test were revenue and engagement, measured respectively as user purchases per day and game rounds ended per day. The goal was to increase the first while maintaining the second.

This report is structured as follows:  
1. Exploratory Data Analysis: data integrity and first-glance insights.  
2. Statistical Analysis.  
3. Insights.
4. Final conclusions and recommendations.

## 1. Exploratory Data Analysis

In [1]:
#Importing the libraries and reading the whole datasets to analyze
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics
pd.set_option('display.float_format', lambda x: '%.3f' % x)
import google.cloud.bigquery.magics

google.cloud.bigquery.magics.context.use_bqstorage_api = True
%load_ext google.cloud.bigquery

In [3]:
%%bigquery assignment
SELECT
  playerid,
  abtest_group,
  assignment_date,
  install_date,
  conversion_date
FROM
  `king-ds-recruit-candidate-305`.abtest.assignment ;

In [4]:
assignment

Unnamed: 0,playerid,abtest_group,assignment_date,install_date,conversion_date
0,4075399,A,2017-05-04,2016-02-10,
1,8656643,A,2017-05-04,2016-03-26,
2,19536870,A,2017-05-04,2016-07-14,2016-07-19
3,3207631,A,2017-05-04,2016-02-01,
4,31527808,A,2017-05-04,2016-11-10,
...,...,...,...,...,...
10331051,51066353,B,2017-05-22,2017-05-22,
10331052,51142855,B,2017-05-22,2017-05-22,
10331053,51100144,B,2017-05-22,2017-05-22,
10331054,51043351,B,2017-05-22,2017-05-22,


In [3]:
%%bigquery activity
SELECT
  activity.playerid,
  activity.activity_date,
  activity.purchases,
  activity.gameends
FROM
  `king-ds-recruit-candidate-305`.abtest.activity
;


In [None]:
activity

The datasets were extracted from two BigQuery tables. One was the assignment table, which contained useful information about the players, and the other was the activity table, which held the players' daily activity records. The activity table contains the observations for the A/B test.

Assignment table exploration:  
The assignment table contains players assigned to the A/B test and attributes related to each player.  
• playerid: Unique numeric identifier for each player  
• abtest_group: The group the player was assigned to (A or B)  
• assignment_date: The date when the player was assigned to the test  
• install_date: The date when the player installed the game  
• conversion_date: The date when the player made their first purchase  

No duplicated playerid values; 10.331.056 players in total were selected for the test.   
No nulls except in the conversion_date field, which was expected.  
Assignment date was stored as a string, not as a datetime type.  
All players were assigned to a test group.  
No assignment or conversion dates prior to the install date.  
2.859.23 players with conversions.
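
The checks listed above could be sketched roughly as follows. This is only an illustration using the Python standard library on three rows copied from the table output shown earlier; the helper names `parse` and `integrity_report` are hypothetical, and the real checks would run against the full table.

```python
from datetime import date

# Three rows copied from the assignment output above, used as a tiny sample.
rows = [
    {"playerid": 4075399, "abtest_group": "A", "assignment_date": "2017-05-04",
     "install_date": "2016-02-10", "conversion_date": None},
    {"playerid": 19536870, "abtest_group": "A", "assignment_date": "2017-05-04",
     "install_date": "2016-07-14", "conversion_date": "2016-07-19"},
    {"playerid": 51066353, "abtest_group": "B", "assignment_date": "2017-05-22",
     "install_date": "2017-05-22", "conversion_date": None},
]


def parse(value):
    """Convert an ISO date string to a date; missing values stay None."""
    return date.fromisoformat(value) if value else None


def integrity_report(rows):
    """Count violations of the integrity rules described in the text."""
    ids = [r["playerid"] for r in rows]
    report = {
        "duplicated_playerids": len(ids) - len(set(ids)),
        "missing_group": sum(r["abtest_group"] not in ("A", "B") for r in rows),
        "assignment_before_install": 0,
        "conversion_before_install": 0,
        "players_with_conversion": 0,
    }
    for r in rows:
        install = parse(r["install_date"])
        assigned = parse(r["assignment_date"])
        converted = parse(r["conversion_date"])
        if assigned < install:
            report["assignment_before_install"] += 1
        if converted is not None:
            report["players_with_conversion"] += 1
            if converted < install:
                report["conversion_before_install"] += 1
    return report


report = integrity_report(rows)
print(report)
```

Parsing the strings with `date.fromisoformat` also covers the observation that the dates arrive as strings rather than as a date type.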

Activity table exploration:  
The activity table contains the test observations:  
• playerid: Unique numeric identifier for each player  
• activity_date: The date of activity  
• purchases: Number of purchases made this day  
• gameends: Number of gamerounds played this day

There are 214.878.701 records of which none have null values and 90.625.439 are dated before the start of the test.

To proceed with deeper analysis I had to join the two tables to extract more complete information.  
So far the data was complete, reliable and clean. 



In [8]:
%%bigquery main_dataset
SELECT
  activity.playerid,
  activity.activity_date,
  activity.purchases,
  activity.gameends,
  assignment.abtest_group AS ABGroup,
  assignment.install_date,
  assignment.conversion_date
FROM
  `king-ds-recruit-candidate-305`.abtest.activity
LEFT JOIN
  `king-ds-recruit-candidate-305`.abtest.assignment
ON
  activity.playerid = assignment.playerid
;


In [None]:
main_dataset

Main_dataset is the product of left joining the Activity table with the Assignment table. This way I can have useful player information attached to the test observations.

This dataset contained 214878701 observations, of which 90625439 predate the start of the test.  
The actual A/B test observations numbered 124253262, split into the control group A (80%) with 99419615 observations and the test group B (20%) with 24833647 observations. These group observations were evenly distributed over 19 days (the A/B test length).
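
As a quick cross-check of the counts above, the arithmetic can be reproduced with a few lines of Python. The variable names are only illustrative; the numbers are the ones reported in this section.

```python
from datetime import date

# Observation counts reported in the text.
total_rows = 214878701
pre_test_rows = 90625439
group_a_rows = 99419615
group_b_rows = 24833647

# Observations recorded during the test window.
test_rows = total_rows - pre_test_rows

# Share of test observations belonging to each group.
share_a = group_a_rows / test_rows
share_b = group_b_rows / test_rows

# Test length in days, counting both the first and the last day.
test_days = (date(2017, 5, 22) - date(2017, 5, 4)).days + 1

print(f"test observations: {test_rows}")
print(f"group A share: {share_a:.1%}, group B share: {share_b:.1%}")
print(f"test length: {test_days} days")
```

The shares computed this way line up with the 80/20 assignment split described in the introduction.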

 

In [4]:
%%bigquery pre_test_activity
SELECT 
  playerid,
  activity_date,
  purchases,
  gameends,
  ABgroup,
  install_date,
  conversion_date
FROM `king-ds-recruit-candidate-305.abtest.main_dataset`
WHERE activity_date < '2017-05-04'
;


In [None]:
pre_test_activity

In [3]:
%%bigquery Agroup
SELECT 
  playerid,
  activity_date,
  purchases,
  gameends,
  ABgroup,
  install_date,
  conversion_date
FROM `king-ds-recruit-candidate-305.abtest.main_dataset`
WHERE activity_date >= '2017-05-04' AND  ABgroup = 'A'
;


In [None]:
Agroup

In [None]:
%%bigquery Bgroup
SELECT 
  playerid,
  activity_date,
  purchases,
  gameends,
  ABgroup,
  install_date,
  conversion_date
FROM `king-ds-recruit-candidate-305.abtest.main_dataset`
WHERE activity_date >= '2017-05-04' AND  ABgroup = 'B'

In [None]:
Bgroup

## 2. Statistical Analysis

### Baseline metrics:  
- Average purchases prior to the test
- Average gameends prior to the test
- Standard deviation of population purchases (since the sample is big enough, I'll use the sample standard deviation prior to the test)
- Standard deviation of population gameends (since the sample is big enough, I'll use the sample standard deviation prior to the test)

In [None]:
# Baseline metrics computed on activity recorded before the test started.
# Series.std() uses ddof=1 (sample standard deviation), like statistics.stdev, but is vectorized.
print('''Average purchases prior to the test: {}
Average gameends prior to the test: {}
Standard deviation of population purchases: {}
Standard deviation of population gameends: {}
'''.format(pre_test_activity.purchases.mean(), pre_test_activity.gameends.mean(), pre_test_activity.purchases.std(), pre_test_activity.gameends.std()))

### Test metrics:  
- Sample size Bgroup
- Average purchases Agroup
- Average purchases Bgroup
- Average gameends Agroup
- Average gameends Bgroup

In [None]:
# Test metrics for both groups; len() returns the number of observations (rows).
print('''Sample size Bgroup: {}
Average purchases Bgroup: {}
Average gameends Bgroup: {}
Standard deviation Bgroup purchases: {}
Standard deviation Bgroup gameends: {}
Average purchases Agroup: {}
Average gameends Agroup: {}
'''.format(len(Bgroup), Bgroup.purchases.mean(), Bgroup.gameends.mean(), Bgroup.purchases.std(), Bgroup.gameends.std(), Agroup.purchases.mean(), Agroup.gameends.mean()))

### Hypothesis testing

Before extracting insights from the A/B test results, I conducted a hypothesis test to understand whether the effects seen in the test results were statistically significant.  

I set a significance level of 5%, which caps the probability of a false positive at 5%. The test was one-tailed, meaning that the rejection region for the Null Hypothesis extends from the critical value to infinity.

Hypothesis for purchases:

The Null Hypothesis, which states that the effect shown in the A/B test results is mere chance, reads as follows:  
- The average purchase metric for the Super Math Saga game is 0.03061, so any increase in average purchases is a product of data variability and chance.   

The Alternative Hypothesis, which allows me to confidently extract insights from the A/B test results, reads as follows:  
- The introduction of a new feature in the Super Math Saga produced an increase in the average purchase metric, setting it above 0.03061.

$$ \text{Null Hypothesis} = H_0: \mu = 0.03061 $$  
$$ \text{Alternative Hypothesis} = H_1: \mu > 0.03061 $$

So the rejection region for the Null Hypothesis started at z = 1.64, which is the z-score in the standard normal table that leaves an area of 0.05 to the right.  
If the Test Statistic was greater than this critical value, thus falling in the rejection region, the Null Hypothesis would be rejected and the A/B test results would be statistically significant.
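
The critical value can also be obtained programmatically instead of from a printed table. A minimal sketch with the standard library's `statistics.NormalDist`, assuming the 5% significance level stated earlier:

```python
from statistics import NormalDist

alpha = 0.05  # significance level stated in the text

# Right-tailed critical value: leaves an area of alpha to its right.
z_upper = NormalDist().inv_cdf(1 - alpha)
# Left-tailed critical value: leaves an area of alpha to its left.
z_lower = NormalDist().inv_cdf(alpha)

print(round(z_upper, 3), round(z_lower, 3))  # 1.645 -1.645
```

The value 1.645 is what the report rounds to 1.64.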

$$\text{Test Statistic} = z = \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}}$$


$$\overline{X} = 0.03265, \quad \mu = 0.03061, \quad \sigma = 0.7715, \quad n = 24833647$$

In [37]:
z = (0.03265-0.03061)/(0.7715/(np.sqrt(24833647)))
print(z)

13.176937583689883


With the test statistic value falling inside the rejection region, the Null Hypothesis was rejected and the Alternative Hypothesis was accepted meaning that the tested effect was statistically significant.
- The introduction of a new feature in the Super Math Saga produced an increase in the average purchase metric setting it above 0.03061.

Hypothesis for gameends:  

The Null Hypothesis, which states that the effect shown in the A/B test results is mere chance, reads as follows:  
- The average gameends metric for the Super Math Saga game is 13.1803, and any decrease in average gameends is a product of data variability and chance.  

The Alternative Hypothesis, which allows me to confidently extract insights from the A/B test results, reads as follows:  
- The introduction of a new feature in the Super Math Saga produced a decrease in the average gameends metric, setting it below 13.1803.

$$ \text{Null Hypothesis} = H_0: \mu = 13.1803 $$  
$$ \text{Alternative Hypothesis} = H_1: \mu < 13.1803 $$

So the rejection region for the Null Hypothesis started at z = -1.64, which is the z-score in the standard normal table that leaves an area of 0.05 to the left.  
If the Test Statistic was less than this critical value, thus falling in the rejection region, the Null Hypothesis would be rejected and the A/B test results would be statistically significant.

$$\text{Test Statistic} = z = \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}}$$


$$\overline{X} = 12.9323, \quad \mu = 13.1803, \quad \sigma = 10.2361, \quad n = 24833647$$

In [38]:
(12.9323-13.1803)/(10.2361/(np.sqrt(24833647)))

-120.73617487984009

With the test statistic value falling inside the rejection region, the Null Hypothesis was rejected and the Alternative Hypothesis was accepted meaning that the tested effect was statistically significant.
- The introduction of a new feature in the Super Math Saga produced a decrease in the average gameends metric setting it below 13.1803.
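
Both tests follow the same recipe, so they can be wrapped in one small helper. The sketch below is illustrative only: the function name `one_sided_z_test` and its parameters are hypothetical, and the inputs are the values reported in this section.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean, mu0, sigma, n, alternative, alpha=0.05):
    """Return (z, critical value, reject H0?) for a one-sided z-test.

    alternative is "greater" for H1: mu > mu0 or "less" for H1: mu < mu0.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    if alternative == "greater":
        critical = NormalDist().inv_cdf(1 - alpha)
        reject = z > critical
    elif alternative == "less":
        critical = NormalDist().inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, critical, reject


n_b = 24833647  # group B observations during the test

# Purchases: right-tailed test against the pre-test baseline.
purchases = one_sided_z_test(0.03265, 0.03061, 0.7715, n_b, "greater")
# Game ends: left-tailed test against the pre-test baseline.
gameends = one_sided_z_test(12.9323, 13.1803, 10.2361, n_b, "less")

print(purchases)
print(gameends)
```

The returned z values match the ones computed in the cells above (about 13.18 and -120.74), and both tests reject the Null Hypothesis at the 5% level.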

To conclude, both effects produced by the modified game experience in group B appear to be statistically significant and worth analyzing further.

**Important disclaimer**   
The results of this hypothesis test are arguably not conclusive because the population data was not normally distributed; the purchase data in particular was extremely skewed. 
Additional transformation of the data or more specific tests should be carried out to produce more conclusive results.
For this exercise I assumed normality was acceptable because the sample sizes were large, and I considered a more complex analysis to fall outside the scope of this candidate test.
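
As an illustration of what a "more specific test" could look like, here is a minimal sketch of a one-sided permutation test, which makes no normality assumption. This technique was not used in this report; the function name, the sample sizes, and the purchase probabilities are all hypothetical, and the data is synthetic rather than taken from the game.

```python
import random


def permutation_test_greater(control, treatment, n_permutations=500, seed=0):
    """One-sided permutation test for mean(treatment) > mean(control).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    n_treat = len(treatment)
    n_ctrl = len(control)
    observed = sum(treatment) / n_treat - sum(control) / n_ctrl
    pooled = list(control) + list(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_treat]) / n_treat - sum(pooled[n_treat:]) / n_ctrl
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)


# Synthetic, heavily skewed daily purchases: mostly zeros, a few buyers.
data_rng = random.Random(42)


def synthetic_purchases(n, buy_prob):
    return [data_rng.randint(1, 5) if data_rng.random() < buy_prob else 0
            for _ in range(n)]


control = synthetic_purchases(800, 0.020)
treatment = synthetic_purchases(200, 0.025)

observed_diff, p_value = permutation_test_greater(control, treatment)
print(f"observed difference: {observed_diff:.4f}, p-value: {p_value:.3f}")
```

On the real data the same idea would need sampling or a vectorized implementation, since shuffling hundreds of millions of rows in pure Python is not practical.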

## 3. Insights

Now that I've concluded that the effects of the A/B test are statistically significant I can proceed to extract insights from them.  
For this I used the visualization software Tableau Desktop connected to the BigQuery server.

### Purchases metric 
The new game experience had a noticeable effect on the purchase metric. **B Group average purchase is above baseline average by 6.6%** (0.03265/0.03061= 1.066). Control group was slightly below baseline average by 0.5%.  

Interestingly, disaggregating the metric on a daily basis showed that the effect was stronger in the first half of the A/B test and lost strength throughout the second half, falling back to near the daily baseline average. The control group stayed near the baseline daily average purchase, as expected.

![](Dashboard1.png)

The effect could also be seen by comparing the total daily purchases within group B before and after the start of the A/B test, which showed a **13% increase** from daily totals around 37645 to daily totals around 42675. The control group showed no significant change, as expected.
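
The percentage above is a plain relative change. A tiny sketch of the arithmetic, using the approximate daily totals quoted in the text (the helper name `relative_change` is only illustrative):

```python
def relative_change(new, old):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Approximate group B daily purchase totals before and after the test start.
daily_total_uplift = relative_change(42675, 37645)
print(f"{daily_total_uplift:.1%}")  # 13.4%
```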

![](Dashboard1.3.png)

### On the side of game ends   
There was a noticeable effect on engagement after the introduction of the new game experience in group B. The **average game ends dropped 1.88%**, from the 13.1803 baseline metric to 12.9324 average game ends for group B.

Disaggregating on a daily basis showed that the daily average game ends dropped rapidly once the new experience was introduced to group B. The control group stayed near the baseline value.



![](Dashboard2.2.png)

### Segmented by player aging (install date)  
A possible segmentation of players could have been by seniority (aging) since the installation day. 
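
A segmentation like this could be derived from `install_date` along the following lines. This is a sketch under assumptions: only the "Fresh" (under one week) and "Mature" labels appear in this report; the other segment names and all remaining cutoffs are hypothetical placeholders.

```python
from datetime import date

# (upper bound in days since install, segment name).
# Only "Fresh" = less than a week comes from the report; the rest is hypothetical.
SEGMENTS = [
    (7, "Fresh"),
    (30, "Young"),     # hypothetical name and cutoff
    (180, "Mature"),   # hypothetical cutoff
]
OLDEST_SEGMENT = "Veteran"  # hypothetical name for everyone older


def seniority_segment(install_date, reference_date):
    """Assign a player to a seniority segment by days since install."""
    age_days = (reference_date - install_date).days
    if age_days < 0:
        raise ValueError("install_date is after reference_date")
    for upper_bound, name in SEGMENTS:
        if age_days < upper_bound:
            return name
    return OLDEST_SEGMENT


test_start = date(2017, 5, 4)
print(seniority_segment(date(2017, 5, 1), test_start))   # Fresh
print(seniority_segment(date(2016, 2, 10), test_start))  # Veteran
```

Using the test start as the reference date keeps each player in a single segment for the whole experiment; measuring age on each activity date would be an equally plausible choice.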

For purchases, the same pattern of a rapid increase at the start followed by a slow return to the baseline appeared in the three older groups. Nonetheless, looking at the overall average, the "Mature players" group was the most receptive to the new experience, with the highest average purchase. The control group stayed near the baseline except for the oldest segment, which was a little farther from the baseline but not enough to be considered significant.

![](Dashboard3.png)

The segmentation for engagement didn't show significant variations across the segments except for "Fresh" players. A possible explanation is that very new players (less than a week) had no point of comparison in the past, so they couldn't notice the negative effect of the sudden increase in difficulty. This assumption should be investigated in more depth.

![](Dashboard4.png)

## 4. Final conclusions and recommendations

At first sight, the new experience had a positive effect on the game, as the increase in revenue was greater than the decrease in engagement.  
Nonetheless, it is possible that the detected revenue effect could dissipate over time. For instance, the initial increase in purchases may be due to the players' delay in adjusting to the new increased difficulty. After some time, the players could learn to play better and stop purchasing help to continue playing.  
On the other hand, the negative effect on engagement seemed to be more persistent in the long run. If confirmed, adopting the group B experience would not be the right decision. To infer whether these effects are long-lasting, I would increase the length of the A/B test.  
Deeper investigation is needed to assess these intuitions.  
In any case, I recommend running the test for a greater number of days to better understand the duration of the effects produced by the treatment. I also recommend a 50/50 split between test and control groups to make total quantities easier to compare.

Perhaps giving players the option to watch a short ad instead of paying currency could lessen the negative impact on engagement. A large majority of casual players avoid investing real money in a game but are used to watching advertisements constantly, so this would not be an issue for them. Besides, a dollar is worth relatively more to a South American player than to a player in the USA or Australia, whereas watching an ad only costs them a few seconds, a currency that could be perceived as costless compared to real-life money.