## Digital Advertising A/B Testing Result Analysis
Jing Li

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats

In [2]:
gamefun = pd.read_excel("GameFun.xlsx")
gamefun.head()

,id,test,purchase,site,impressions,income,gender,gamer
0,1956,0,0,site1,0,100,1,0
1,45821,1,0,site1,20,70,1,0
2,59690,1,0,site1,22,100,1,0
3,18851,0,0,site1,13,90,1,0
4,60647,1,0,site1,12,60,1,0


In [3]:
gamefun.describe()

,id,test,purchase,impressions,income,gender,gamer
count,40048.0,40048.0,40048.0,40048.0,40048.0,40048.0,40048.0
mean,33965.738564,0.701433,0.064697,66.115137,55.006243,0.647473,0.601478
std,19485.640324,0.457635,0.245994,95.188408,13.718012,0.477763,0.4896
min,1.0,0.0,0.0,0.0,20.0,0.0,0.0
25%,17133.75,0.0,0.0,4.0,50.0,0.0,0.0
50%,33968.5,1.0,0.0,20.0,50.0,1.0,1.0
75%,50797.25,1.0,0.0,88.0,60.0,1.0,1.0
max,67747.0,1.0,1.0,403.0,140.0,1.0,1.0


### 1. Before evaluating the effect of an experiment, it is important to make sure that the experiment was executed correctly. Check whether the test and control groups are probabilistically equivalent on their observables.

In [4]:
table_1 = pd.pivot_table(gamefun, values=['income','gender','gamer'], columns=['test'], aggfunc=np.mean)
# Relative difference of test vs. control, as a fraction (multiply by 100 for a percentage)
table_1['% diff'] = ((table_1[1] - table_1[0]) / table_1[0])
table_1

test,0,1,% diff
gamer,0.601823,0.601331,-0.000817
gender,0.647905,0.647289,-0.00095
income,55.166012,54.938236,-0.004129


In [5]:
# T-test for income
t_stat, p_val = stats.ttest_ind(gamefun[gamefun['test']==0]['income'], gamefun[gamefun['test']==1]['income'], equal_var=True)
print("t score is: ",t_stat)
print("p value is: ",p_val)
if p_val < 0.05:
    print("There is statistically significant difference in average income between test and control group.")
else:
    print("There is no statistically significant difference in average income between test and control group.")

t score is:  1.520640253683462
p value is:  0.1283580345995143
There is no statistically significant difference in average income between test and control group.


In [6]:
# T-test for gender
t_stat, p_val = stats.ttest_ind(gamefun[gamefun['test']==0]['gender'], gamefun[gamefun['test']==1]['gender'], equal_var=True)
print("t score is: ",t_stat)
print("p value is: ",p_val)
if p_val < 0.05:
    print("There is statistically significant difference in average gender between test and control group.")
else:
    print("There is no statistically significant difference in average gender between test and control group.")

t score is:  0.11804408014871089
p value is:  0.906033323148871
There is no statistically significant difference in average gender between test and control group.


In [7]:
# T-test for gamer
t_stat, p_val = stats.ttest_ind(gamefun[gamefun['test']==0]['gamer'], gamefun[gamefun['test']==1]['gamer'], equal_var=True)
print("t score is: ",t_stat)
print("p value is: ",p_val)
if p_val < 0.05:
    print("There is statistically significant difference in average gamer between test and control group.")
else:
    print("There is no statistically significant difference in average gamer between test and control group.")

t score is:  0.09199349089131977
p value is:  0.9267036713286598
There is no statistically significant difference in average gamer between test and control group.


#### Summary

Take income as an example. The average income is 55.17k for the control group and 54.94k for the test group, a relative difference of only 0.41%, which suggests the two groups are very similar in income. To check this formally, I used a two-sample t-test of the null hypothesis that the two groups have equal average income. The p-value (0.128) is greater than the 0.05 significance level, so we cannot reject the null hypothesis: the data show no evidence of a difference in average income.

The same holds for gender (p = 0.906) and gamer status (p = 0.927).

On these three observables, then, customers appear to have had an equal chance of being assigned to the test or the control group, so the two samples can be treated as probabilistically equivalent. This supports attributing differences in purchase behavior between the groups to the promotion rather than to differences in who was assigned to each group.
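The three t-test cells above repeat the same steps, so they could be folded into one helper. The sketch below is one possible way to do that, not the method used in this notebook: `balance_table` is a hypothetical helper name, it uses Welch's t-test (`equal_var=False`) instead of the pooled version, and it adds a standardized mean difference as an effect-size check. The data frame `demo` is synthetic and only borrows the column names of `gamefun`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def balance_table(df, group_col, covariates):
    """Compare covariate means between control (0) and test (1)."""
    control = df[df[group_col] == 0]
    test = df[df[group_col] == 1]
    rows = []
    for col in covariates:
        c, t = control[col], test[col]
        # Welch's t-test does not assume equal variances in the two groups.
        t_stat, p_val = stats.ttest_ind(c, t, equal_var=False)
        # Standardized mean difference: gap in means over the averaged SD.
        pooled_sd = np.sqrt((c.var(ddof=1) + t.var(ddof=1)) / 2)
        smd = (t.mean() - c.mean()) / pooled_sd
        rows.append({"covariate": col, "control_mean": c.mean(),
                     "test_mean": t.mean(), "smd": smd,
                     "t_stat": t_stat, "p_value": p_val})
    return pd.DataFrame(rows).set_index("covariate")


# Synthetic stand-in for the real data (assumption: same column names).
rng = np.random.default_rng(0)
n = 5000
demo = pd.DataFrame({
    "test": rng.binomial(1, 0.7, n),
    "income": rng.normal(55, 14, n),
    "gender": rng.binomial(1, 0.65, n),
    "gamer": rng.binomial(1, 0.6, n),
})
balance = balance_table(demo, "test", ["income", "gender", "gamer"])
print(balance.round(4))
```

On the real data the call would be `balance_table(gamefun, "test", ["income", "gender", "gamer"])`. A common rule of thumb treats an absolute standardized mean difference below 0.1 as negligible imbalance.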

### 2. Evaluate the average purchase rates in the test and control for the following groups. For each comparison, report the average purchase rate for the test, average purchase rate for the control and the absolute difference between the test and control.

#### a. Comparison 1: All customers

In [8]:
table_2 = pd.pivot_table(gamefun, values=['purchase'], columns=['test'], aggfunc=np.mean)
table_2['diff'] = table_2[1] - table_2[0]
table_2

test,0,1,diff
purchase,0.036213,0.076822,0.040609
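The table reports the size of the difference but not whether it is statistically distinguishable from zero. One way to check is a two-proportion z-test, sketched below under the usual large-sample assumptions. The function name `two_proportion_ztest` and the counts in the example call are hypothetical and chosen only for illustration; they are not taken from the data.

```python
import math
from scipy import stats


def two_proportion_ztest(x_control, n_control, x_test, n_test):
    """Two-sided z-test for H0: both groups share one purchase rate."""
    p_c = x_control / n_control
    p_t = x_test / n_test
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (x_control + x_test) / (n_control + n_test)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_test))
    z = (p_t - p_c) / se
    p_value = 2 * stats.norm.sf(abs(z))
    return p_t - p_c, z, p_value


# Hypothetical counts, chosen only to illustrate the call.
diff, z, p_value = two_proportion_ztest(x_control=360, n_control=10_000,
                                        x_test=770, n_test=10_000)
print(f"diff={diff:.4f}, z={z:.2f}, p={p_value:.3g}")
```

With the real data, the inputs would be the number of purchasers and the number of users in each group, for example `gamefun.groupby("test")["purchase"].agg(["sum", "count"])`.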


#### b. Comparison 2: Male vs Female customers

In [9]:
table_3 = pd.pivot_table(gamefun, index=['gender'] ,values=['purchase'], columns=['test'], aggfunc=np.mean)['purchase']
table_3['diff'] = table_3[1] - table_3[0]
table_3

test,0,1,diff
gender,,,
0,0.034442,0.080945,0.046503
1,0.037176,0.074575,0.037399


#### c. Comparison 3: Gamers vs Non-Gamers Customers

In [10]:
table_4 = pd.pivot_table(gamefun, index=['gamer'] ,values=['purchase'], columns=['test'], aggfunc=np.mean)['purchase']
table_4['diff'] = table_4[1] - table_4[0]
table_4

test,0,1,diff
gamer,,,
0,0.037387,0.035092,-0.002295
1,0.035436,0.104487,0.069051


#### d. Comparison 4: Female Gamers vs Male Gamers

In [11]:
table_5 = pd.pivot_table(gamefun[gamefun['gamer']==1], index=['gender'] ,values=['purchase'], columns=['test'], aggfunc=np.mean)['purchase']
table_5['diff'] = table_5[1] - table_5[0]
table_5

test,0,1,diff
gender,,,
0,0.032041,0.110092,0.078051
1,0.037275,0.101404,0.064129
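Comparisons 2 to 4 all build the same kind of table, so a small helper can express them with one call each. This is a sketch rather than the notebook's own code: `purchase_rate_table` is a hypothetical name, and `demo` is synthetic data in which, by assumption, the promotion only lifts purchases among gamers.

```python
import numpy as np
import pandas as pd


def purchase_rate_table(df, by):
    """Purchase rate per test arm, split by the segment columns in `by`."""
    rates = df.pivot_table(index=by, columns="test",
                           values="purchase", aggfunc="mean")
    rates = rates.rename(columns={0: "control", 1: "test"})
    rates["diff"] = rates["test"] - rates["control"]
    return rates


# Synthetic stand-in for the real data (assumption: same column names).
rng = np.random.default_rng(1)
n = 20_000
demo = pd.DataFrame({
    "test": rng.binomial(1, 0.7, n),
    "gender": rng.binomial(1, 0.65, n),
    "gamer": rng.binomial(1, 0.6, n),
})
# Assumed effect: the promotion only lifts purchases among gamers.
p = 0.035 + 0.07 * demo["test"] * demo["gamer"]
demo["purchase"] = rng.binomial(1, p.to_numpy())

by_gamer = purchase_rate_table(demo, ["gamer"])
by_gamer_gender = purchase_rate_table(demo, ["gamer", "gender"])
print(by_gamer.round(4))
print(by_gamer_gender.round(4))
```

Splitting by `["gamer", "gender"]` shows all four gamer-by-gender cells at once, which makes it easier to see whether any single segment drives the overall result.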


### 3. Assess the expected revenue in the test vs. control for the following comparisons

#### a. Comparison 1: All customers

In [12]:
# Each test-group purchaser receives $25 in credits, which is a cost to the company,
# so the revenue from each test purchaser is 12.5.
# The revenue from each control purchaser is 37.5 since they receive no credits.
# The relative change is computed on revenue per user: dividing by 7 and 3
# approximates the roughly 70/30 split of users between test and control.
test_rev = gamefun[(gamefun['test']==1)]['purchase'].sum() * 12.5
control_rev = gamefun[(gamefun['test']==0)]['purchase'].sum() * 37.5
print("The expected total revenue of test group is",test_rev)
print("The expected total revenue of control group is",control_rev)
print("The relative percentage increase is", round((test_rev/7-control_rev/3)/(control_rev/3)*100,2) ,"%")

The expected total revenue of test group is 26975.0
The expected total revenue of control group is 16237.5
The relative percentage increase is -28.8 %
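The relative change printed above can be written as a small function, which makes the two ingredients explicit: the revenue per purchaser in each group and the weights used to put the totals on a per-user basis. The sketch below uses a hypothetical helper name, `relative_revenue_change`; the purchase counts are the ones implied by the printed totals (26975 / 12.5 and 16237.5 / 37.5).

```python
def relative_revenue_change(test_purchases, control_purchases,
                            test_weight, control_weight,
                            test_margin=12.5, control_margin=37.5):
    """Percent change in revenue per user, test relative to control.

    The weights can be the actual group sizes or, as in the cell above,
    the approximate 7:3 allocation ratio.
    """
    test_per_user = test_purchases * test_margin / test_weight
    control_per_user = control_purchases * control_margin / control_weight
    return (test_per_user - control_per_user) / control_per_user * 100


# Purchase counts implied by the printed totals: 26975 / 12.5 and 16237.5 / 37.5.
all_customers = relative_revenue_change(2158, 433, test_weight=7, control_weight=3)
print(round(all_customers, 2))  # -28.8
```

Passing the actual group sizes as weights, for example `(gamefun['test'] == 1).sum()` and `(gamefun['test'] == 0).sum()`, would remove the 7:3 approximation. This matters most for subgroups, where the test/control split can drift away from 70/30.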


#### b. Comparison 4: Female Gamers vs Male Gamers

In [13]:
# Female Gamers
test_rev_fg = gamefun[(gamefun['test']==1) & (gamefun['gender']==0) & (gamefun['gamer']==1)]['purchase'].sum() * 12.5
control_rev_fg = gamefun[(gamefun['test']==0) & (gamefun['gender']==0) & (gamefun['gamer']==1)]['purchase'].sum() * 37.5
print("The expected total revenue of female gamers in test group is ",test_rev_fg)
print("The expected total revenue of female gamers in control group is ",control_rev_fg)
print("The relative percentage increase is", round((test_rev_fg/7-control_rev_fg/3)/(control_rev_fg/3)*100,2) ,"%")

The expected total revenue of female gamers in test group is  8250.0
The expected total revenue of female gamers in control group is  3037.5
The relative percentage increase is 16.4 %


In [14]:
# Male Gamers
test_rev_mg = gamefun[(gamefun['test']==1) & (gamefun['gender']==1) & (gamefun['gamer']==1)]['purchase'].sum() * 12.5
control_rev_mg = gamefun[(gamefun['test']==0) & (gamefun['gender']==1) & (gamefun['gamer']==1)]['purchase'].sum() * 37.5
print("The expected total revenue of male gamers in test group is ",test_rev_mg)
print("The expected total revenue of male gamers in control group is ",control_rev_mg)
print("The relative percentage increase is", round((test_rev_mg/7-control_rev_mg/3)/(control_rev_mg/3)*100,2) ,"%")

The expected total revenue of male gamers in test group is  13812.5
The expected total revenue of male gamers in control group is  6525.0
The relative percentage increase is -9.28 %


### 4. Recommendations to the management team summarizing the expected financial outcome for GameFun

Yes, GameFun should run the promotion in the future, but it should target only female gamers.

According to the A/B test results, the promotion raised the purchase rate for all customers by about 4 percentage points (from 3.6% to 7.7%), more than doubling it. However, expected revenue per user fell by about 28.8%, because each promoted purchase earns only 12.5 instead of 37.5. Running the promotion for all customers is therefore not profitable.

From the Q2 analysis, the promotion's effect on the purchase rate is similar for male and female customers (3.7 vs. 4.7 percentage points). For gamers vs. non-gamers, however, the test-minus-control difference for non-gamers is slightly negative (-0.2 percentage points), so the promotion did not increase their purchases and may even have discouraged them. This is understandable: people who do not play games may be annoyed by repeated game ads.

I then looked at male and female gamers separately. The promotion's effect on the purchase rate is similar for the two groups (6.4 vs. 7.8 percentage points). In terms of expected revenue, however, the promotion produced a 16.4% increase for female gamers but a 9.28% decrease for male gamers. Consequently, GameFun should focus its promotional ads on female gamers.
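The revenue figures follow a simple break-even rule. Revenue per user is the purchase rate times the revenue per purchaser, so with 12.5 per test purchaser and 37.5 per control purchaser the promotion pays off only if it makes the purchase rate more than 37.5 / 12.5 = 3 times the control rate. The sketch below applies that rule to the purchase rates reported in the Q2 tables; the names `segments` and `lift_ratio` are illustrative.

```python
TEST_MARGIN = 12.5     # revenue per purchaser after the $25 credit
CONTROL_MARGIN = 37.5  # revenue per purchaser without the credit
BREAK_EVEN_RATIO = CONTROL_MARGIN / TEST_MARGIN  # 3.0

# (control rate, test rate) copied from the purchase-rate tables in Q2.
segments = {
    "all customers": (0.036213, 0.076822),
    "female gamers": (0.032041, 0.110092),
    "male gamers": (0.037275, 0.101404),
}


def lift_ratio(control_rate, test_rate):
    """How many times higher the test purchase rate is than the control rate."""
    return test_rate / control_rate


for name, (control_rate, test_rate) in segments.items():
    ratio = lift_ratio(control_rate, test_rate)
    verdict = "above" if ratio > BREAK_EVEN_RATIO else "below"
    print(f"{name}: lift ratio {ratio:.2f} is {verdict} "
          f"the break-even ratio of {BREAK_EVEN_RATIO:.1f}")
```

Only female gamers clear the threshold, which is consistent with the recommendation above. Because this check uses the purchase rates directly, it does not depend on the 7:3 group-size approximation.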

### 5. Further compare the two sites on keeping promotion ads for female gamers

#### a. Calculate the purchase rate in the two sites for female gamers in Test and Control group separately

In [15]:
table_6 = pd.pivot_table(gamefun[(gamefun['gamer']==1) & (gamefun['gender']==0)], index=['site'] ,values=['purchase'], columns=['test'], aggfunc=np.mean)['purchase']
table_6['diff'] = table_6[1] - table_6[0]
table_6

test,0,1,diff
site,,,
site1,0.049261,0.146761,0.0975
site2,0.028746,0.102856,0.07411


#### b. The expected revenue for the two sites

In [16]:
test_rev_1 = gamefun[(gamefun['test']==1) & (gamefun['site']=='site1') & (gamefun['gamer']==1) & (gamefun['gender']==0)]['purchase'].sum() * 12.5
control_rev_1 = gamefun[(gamefun['test']==0) & (gamefun['site']=='site1') & (gamefun['gamer']==1) & (gamefun['gender']==0)]['purchase'].sum() * 37.5
print("The expected revenue of site 1 female gamers in test group is ",test_rev_1)
print("The expected revenue of site 1 female gamers in control group is ",control_rev_1)
print("The relative percentage increase is", round((test_rev_1/7-control_rev_1/3)/(control_rev_1/3)*100,2) ,"%")

The expected revenue of site 1 female gamers in test group is  1812.5
The expected revenue of site 1 female gamers in control group is  750.0
The relative percentage increase is 3.57 %


In [17]:
test_rev_2 = gamefun[(gamefun['test']==1) & (gamefun['site']=='site2') & (gamefun['gamer']==1) & (gamefun['gender']==0)]['purchase'].sum() * 12.5
control_rev_2 = gamefun[(gamefun['test']==0) & (gamefun['site']=='site2') & (gamefun['gamer']==1) & (gamefun['gender']==0)]['purchase'].sum() * 37.5
print("The expected revenue of site 2 female gamers in test group is ",test_rev_2)
print("The expected revenue of site 2 female gamers in control group is ",control_rev_2)
print("The relative percentage increase is", round((test_rev_2/7-control_rev_2/3)/(control_rev_2/3)*100,2) ,"%")

The expected revenue of site 2 female gamers in test group is  6437.5
The expected revenue of site 2 female gamers in control group is  2287.5
The relative percentage increase is 20.61 %


#### c. Conclusion for comparing the two sites

For female gamers (the segment we have decided to focus on), the promotion raised the purchase rate by 9.7 percentage points on site 1 and 7.4 percentage points on site 2. It raised expected revenue by 3.57% on site 1 and 20.61% on site 2. The promotion is therefore more effective on site 2 in revenue terms, so we could focus more on that site. Note that the site 1 gain is small and rests on the approximate 70/30 group split used in the calculation (the division by 7 and 3), so it is best read as roughly break-even. Before making a final decision, we should also take the cost of the ads into account and evaluate the ROI for both sites.
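The ROI comparison mentioned above could start from something like the sketch below. It assumes ads are priced per thousand impressions (CPM), which the data set does not state, and every number in the example is a hypothetical placeholder rather than a result from the data. The helper names `ad_cost` and `roi` are illustrative.

```python
def ad_cost(impressions, cpm):
    """Cost of serving `impressions` ads at `cpm` dollars per thousand."""
    return impressions / 1000 * cpm


def roi(incremental_revenue, cost):
    """Return on investment: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (incremental_revenue - cost) / cost


# All numbers below are hypothetical placeholders, not results from the data.
example_cost = ad_cost(impressions=200_000, cpm=2.0)
example_roi = roi(incremental_revenue=1_000.0, cost=example_cost)
print(example_cost, example_roi)
```

For a real comparison, `impressions` could come from the `impressions` column summed per site, and `incremental_revenue` from the test revenue minus what the same users would have generated at the control purchase rate.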