The company in this exercise is a social network. It has decided to add a feature called Recommended Friends, which suggests people you may know.

A data scientist has built a model that suggests 5 people to each user. These potential friends are shown on the user's newsfeed. At first, the model is tested on a random subset of users only, to see how it performs compared to the newsfeed without the new feature.

The test has been running for some time and your boss asks you to check the results. You are asked to check, for each user, the number of pages visited during their first session since the test started. If this number increased, the test is a success.

# Index
* [Answer question 1](#Answer-question-1)
* [Answer question 2 and 3](#Answer-question-2-and-3)
    * [Browsers' impact](#Browsers'-impact)
    * [First time impact](#First-time-impact)
    * [Browser and 'First time' combined impact](#Browser-and-'First-time'-combined-impact)

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as ss

# Answer question 1
<span style='color:blue'>Is the test winning? That is, should 100% of the users see the Recommended Friends feature?</span>

In [2]:
tests = pd.read_csv("test_table.csv",index_col='user_id')
users = pd.read_csv("user_table.csv",index_col='user_id')

tests = tests.join(users)

tests['date'] = pd.to_datetime(tests.date)
tests['signup_date'] = pd.to_datetime(tests.signup_date)

In [3]:
tests.head()  # quick glance at the joined table

| user_id | date | browser | test | pages_visited | signup_date |
|---|---|---|---|---|---|
| 600597 | 2015-08-13 | IE | 0 | 2 | 2015-01-19 |
| 4410028 | 2015-08-26 | Chrome | 1 | 5 | 2015-05-11 |
| 6004777 | 2015-08-17 | Chrome | 0 | 8 | 2015-06-26 |
| 5990330 | 2015-08-27 | Safari | 0 | 8 | 2015-06-25 |
| 3622310 | 2015-08-07 | Firefox | 0 | 1 | 2015-04-17 |


In [6]:
def run_ttest(df):
    # Build the masks from df itself (not the global `tests`), so the
    # function also works on the sub-frames passed in by groupby().apply()
    vp_in_test = df.loc[df.test == 1, 'pages_visited']
    test_mean = vp_in_test.mean()
    
    vp_in_ctrl = df.loc[df.test == 0, 'pages_visited']
    ctrl_mean = vp_in_ctrl.mean()
    
    # equal_var=False runs Welch's t-test (no equal-variance assumption)
    result = ss.ttest_ind(vp_in_ctrl, vp_in_test, equal_var=False)
    conclusion = 'Significant' if result.pvalue < 0.05 else 'Not Significant'
    
    return pd.Series({'n_test': vp_in_test.shape[0],
                      'n_ctrl': vp_in_ctrl.shape[0],
                      'mean_test': test_mean,
                      'mean_ctrl': ctrl_mean,
                      'test-ctrl': test_mean - ctrl_mean,
                      'pvalue': result.pvalue,
                      'conclusion': conclusion})

In [7]:
run_ttest(tests)

conclusion    Not Significant
mean_ctrl             4.60839
mean_test             4.59969
n_ctrl                  49846
n_test                  50154
pvalue               0.577452
test-ctrl         -0.00870091
dtype: object

According to the hypothesis test above, there is **no significant improvement in the test group**: the mean difference is about -0.009 pages per user, with a p-value of 0.58.
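
For reference, `run_ttest` calls `ss.ttest_ind` with `equal_var=False`, which is Welch's t-test. The sketch below recomputes that statistic by hand and compares it with SciPy. The helper name `welch_t` and the Poisson-distributed toy samples are assumptions made for illustration only; they are not the exercise data.

```python
import numpy as np
import scipy.stats as ss

def welch_t(a, b):
    """Welch's t statistic, degrees of freedom and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # squared standard error of each sample mean
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * ss.t.sf(abs(t), dof)
    return t, dof, p

# toy samples (assumption): page counts drawn from the same distribution
rng = np.random.default_rng(0)
ctrl = rng.poisson(4.6, size=500)
test = rng.poisson(4.6, size=520)

t_manual, dof, p_manual = welch_t(ctrl, test)
reference = ss.ttest_ind(ctrl, test, equal_var=False)
print(t_manual, reference.statistic)
print(p_manual, reference.pvalue)
```

The manual and SciPy values should agree up to floating-point error, which confirms what the `pvalue` column reported by `run_ttest` measures.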

# Answer question 2 and 3
* <span style='color:blue'>Is the test performing similarly for all user segments or are there differences among different segments?</span>
* <span style='color:blue'>If you identified segments that responded differently to the test, can you guess the reason? Would this change your point 1 conclusions?</span>

In [8]:
tests['n_days_after_sign'] = (tests.date - tests.signup_date).dt.days
tests['first_time'] = (tests.n_days_after_sign == 0).astype(int)

## Browsers' impact

In [10]:
tests.groupby('browser').apply(run_ttest)

| browser | conclusion | mean_ctrl | mean_test | n_ctrl | n_test | pvalue | test-ctrl |
|---|---|---|---|---|---|---|---|
| Chrome | Significant | 4.613341 | 4.69068 | 21453 | 21974 | 0.0009434084 | 0.077339 |
| Firefox | Significant | 4.600164 | 4.714259 | 10972 | 10786 | 0.0005817199 | 0.114095 |
| IE | Significant | 4.598478 | 4.685985 | 10906 | 10974 | 0.007829509 | 0.087507 |
| Opera | Significant | 4.546438 | 0.0 | 1109 | 1018 | 2.253e-321 | -4.546438 |
| Safari | Not Significant | 4.63818 | 4.692336 | 5406 | 5402 | 0.2411738 | 0.054156 |


From the result above, we can see that with the Recommended Friends feature enabled:
* The number of pages visited increases significantly in Chrome, Firefox, and IE.
* The number of pages visited drops to zero in Opera; <span style='color:orange;font-size:1.2em'>there may be a bug in the Opera implementation that stops users from visiting further pages.</span>
* The number of pages visited shows no significant improvement in Safari; <span style='color:orange;font-size:1.2em'>perhaps the recommended friends are not shown in a noticeable position.</span>
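
To gauge how much the Opera anomaly drags down the overall result, the per-browser counts and means from the table above can be pooled with and without Opera. This is only a sketch: the dictionary `by_browser` and the helper `pooled_means` are names introduced here, and the table carries no variances, so no p-value can be recomputed this way. For a proper test, `run_ttest` would need to be rerun on the raw rows, for example on `tests[tests.browser != 'Opera']`.

```python
# (n_ctrl, mean_ctrl, n_test, mean_test), copied from the per-browser table
by_browser = {
    'Chrome':  (21453, 4.613341, 21974, 4.690680),
    'Firefox': (10972, 4.600164, 10786, 4.714259),
    'IE':      (10906, 4.598478, 10974, 4.685985),
    'Opera':   (1109,  4.546438, 1018,  0.0),
    'Safari':  (5406,  4.638180, 5402,  4.692336),
}

def pooled_means(rows):
    """Sample-size-weighted control and test means over the given rows."""
    rows = list(rows)
    n_ctrl = sum(r[0] for r in rows)
    n_test = sum(r[2] for r in rows)
    mean_ctrl = sum(r[0] * r[1] for r in rows) / n_ctrl
    mean_test = sum(r[2] * r[3] for r in rows) / n_test
    return mean_ctrl, mean_test

# all browsers: should reproduce the overall means from question 1
all_ctrl, all_test = pooled_means(by_browser.values())

# same pooling with the suspicious Opera rows left out
no_opera = [v for k, v in by_browser.items() if k != 'Opera']
no_opera_ctrl, no_opera_test = pooled_means(no_opera)

print('all browsers  test-ctrl:', all_test - all_ctrl)
print('without Opera test-ctrl:', no_opera_test - no_opera_ctrl)
```

With the table's values, the pooled difference moves from about -0.009 pages with Opera included to roughly +0.085 pages once Opera is excluded, which suggests the flat overall result is driven largely by the Opera rows.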

## First time impact

In [11]:
tests.groupby('first_time').apply(run_ttest)

| first_time | conclusion | mean_ctrl | mean_test | n_ctrl | n_test | pvalue | test-ctrl |
|---|---|---|---|---|---|---|---|
| 0 | Not Significant | 4.603284 | 4.622379 | 39890 | 40109 | 0.261837 | 0.019095 |
| 1 | Significant | 4.628867 | 4.509109 | 9956 | 10045 | 0.001742 | -0.119758 |


The result above shows:
* For old users, the new feature increases the number of pages visited, but the change is not significant.
* For first-time users (those whose session falls on their signup date), **the new feature significantly decreases the number of pages visited**.

This is a strange observation. Since I already suspect <span style='color:red;font-size:1.2em'>a bug in the Opera implementation (which reduces the number of pages visited to 0 when the new feature is on)</span>, I need to split the dataset further by browser.

## Browser and 'First time' combined impact

In [13]:
ttest_result = tests.groupby(['browser','first_time']).apply(run_ttest)

In [14]:
ttest_result

| browser | first_time | conclusion | mean_ctrl | mean_test | n_ctrl | n_test | pvalue | test-ctrl |
|---|---|---|---|---|---|---|---|---|
| Chrome | 0 | Significant | 4.607945 | 4.701512 | 17092 | 17525 | 0.0002290889 | 0.093567 |
| Chrome | 1 | Not Significant | 4.634488 | 4.648011 | 4361 | 4449 | 0.8149175 | 0.013523 |
| Firefox | 0 | Significant | 4.59059 | 4.757306 | 8842 | 8657 | 3.692901e-06 | 0.166716 |
| Firefox | 1 | Not Significant | 4.639906 | 4.53922 | 2130 | 2129 | 0.2210706 | -0.100686 |
| IE | 0 | Significant | 4.590576 | 4.721494 | 8744 | 8779 | 0.0002669847 | 0.130918 |
| IE | 1 | Not Significant | 4.630435 | 4.543964 | 2162 | 2195 | 0.2808421 | -0.086471 |
| Opera | 0 | Significant | 4.594564 | 0.0 | 883 | 833 | 7.204927e-255 | -4.594564 |
| Opera | 1 | Significant | 4.358407 | 0.0 | 226 | 185 | 1.222949e-68 | -4.358407 |
| Safari | 0 | Not Significant | 4.638254 | 4.720973 | 4329 | 4315 | 0.1000829 | 0.08272 |
| Safari | 1 | Not Significant | 4.637883 | 4.578657 | 1077 | 1087 | 0.6015241 | -0.059226 |
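
Running ten segment-level tests at a 0.05 threshold raises the chance of at least one false positive. One simple, conservative check is a Bonferroni correction, which divides the threshold by the number of tests. The sketch below applies it to the p-values from the table above. The correction is an addition here, not part of the original analysis, and the names `pvalues` and `survivors` are illustrative.

```python
# p-values copied from the browser x first_time table above
pvalues = {
    ('Chrome', 0): 0.0002290889,
    ('Chrome', 1): 0.8149175,
    ('Firefox', 0): 3.692901e-06,
    ('Firefox', 1): 0.2210706,
    ('IE', 0): 0.0002669847,
    ('IE', 1): 0.2808421,
    ('Opera', 0): 7.204927e-255,
    ('Opera', 1): 1.222949e-68,
    ('Safari', 0): 0.1000829,
    ('Safari', 1): 0.6015241,
}

alpha = 0.05
# Bonferroni: each test must clear alpha divided by the number of tests
threshold = alpha / len(pvalues)

survivors = sorted(seg for seg, p in pvalues.items() if p < threshold)
print('per-test threshold:', threshold)
print('still significant:', survivors)
```

The segments that clear the stricter threshold are the same ones marked Significant in the table, so the segment-level conclusions do not depend on the uncorrected 0.05 cut-off.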


In [16]:
# old users on each browser
ttest_result.xs(0,level=1)

| browser | conclusion | mean_ctrl | mean_test | n_ctrl | n_test | pvalue | test-ctrl |
|---|---|---|---|---|---|---|---|
| Chrome | Significant | 4.607945 | 4.701512 | 17092 | 17525 | 0.0002290889 | 0.093567 |
| Firefox | Significant | 4.59059 | 4.757306 | 8842 | 8657 | 3.692901e-06 | 0.166716 |
| IE | Significant | 4.590576 | 4.721494 | 8744 | 8779 | 0.0002669847 | 0.130918 |
| Opera | Significant | 4.594564 | 0.0 | 883 | 833 | 7.204927e-255 | -4.594564 |
| Safari | Not Significant | 4.638254 | 4.720973 | 4329 | 4315 | 0.1000829 | 0.08272 |


For old users, the conclusion is the same as the general ['browser impact'](#Browsers'-impact):
* The number of pages visited increases significantly in Chrome, Firefox, and IE.
* The number of pages visited drops to zero in Opera; <span style='color:orange;font-size:1.2em'>there may be a bug in the Opera implementation that stops users from visiting further pages.</span>
* The number of pages visited shows no significant improvement in Safari; <span style='color:orange;font-size:1.2em'>perhaps the recommended friends are not shown in a noticeable position.</span>

In [17]:
# first-time new users on each browser
ttest_result.xs(1,level=1)

| browser | conclusion | mean_ctrl | mean_test | n_ctrl | n_test | pvalue | test-ctrl |
|---|---|---|---|---|---|---|---|
| Chrome | Not Significant | 4.634488 | 4.648011 | 4361 | 4449 | 0.8149175 | 0.013523 |
| Firefox | Not Significant | 4.639906 | 4.53922 | 2130 | 2129 | 0.2210706 | -0.100686 |
| IE | Not Significant | 4.630435 | 4.543964 | 2162 | 2195 | 0.2808421 | -0.086471 |
| Opera | Significant | 4.358407 | 0.0 | 226 | 185 | 1.222949e-68 | -4.358407 |
| Safari | Not Significant | 4.637883 | 4.578657 | 1077 | 1087 | 0.6015241 | -0.059226 |


Apart from Opera, which may have a bug, none of the changes for new users is significant, and the number of pages visited even drops slightly in Firefox, IE, and Safari with the new feature.

This may be because <span style='color:orange'>the friend recommendation engine may be based on a user's previous social activity on the site. New users have no history for the engine to use, so their recommendations are close to random guesses and fail to draw their interest. The recommended friends still occupy space on the page, so they may even reduce the number of pages visited by new users slightly.</span>

From this observation, I suspect the recommendation engine suffers from <span style='color:red;font-size:1.2em;font-weight:bold'>the 'cold start'</span> problem.
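
One way to probe the cold-start suspicion further is to look at the lift by account age rather than only the binary `first_time` flag. If the engine really depends on past activity, the lift would be expected to grow with `n_days_after_sign`; that expectation is an assumption, not something the data above establishes. The sketch below uses a made-up toy frame, and the helper `lift_by_account_age`, its bin edges and its labels are all illustrative choices.

```python
import numpy as np
import pandas as pd

def lift_by_account_age(df,
                        bins=(-1, 0, 30, 90, np.inf),
                        labels=('first day', '1-30 days', '31-90 days', '90+ days')):
    """Mean pages_visited per group and test-ctrl difference, by account-age bucket."""
    age = pd.cut(df['n_days_after_sign'], bins=list(bins), labels=list(labels))
    means = (df.groupby([age, 'test'], observed=True)['pages_visited']
               .mean()
               .unstack('test'))
    # columns 0 and 1 are the control and test groups
    means['test-ctrl'] = means[1] - means[0]
    return means

# toy data (assumption), only to show the shape of the output
toy = pd.DataFrame({
    'n_days_after_sign': [0, 0, 0, 0, 10, 10, 200, 200],
    'test':              [0, 1, 0, 1, 0, 1, 0, 1],
    'pages_visited':     [5, 4, 5, 4, 4, 5, 4, 6],
})

result = lift_by_account_age(toy)
print(result)
```

On the real data the call would be `lift_by_account_age(tests[tests.browser != 'Opera'])`, leaving out Opera so that the suspected bug does not mask the account-age pattern.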