Company XYZ started a new referral program on Oct 31. Each user who refers a new user gets $10 in credit when the new user buys something.

The program has been running for almost a month and the Growth Product Manager wants to know if it has been successful. She is very excited because, since the referral program started, the company has seen a spike in the number of users, and she wants you to give her some data she can show to her boss.

* Can you estimate the impact the program had on the site?
* Based on the data, what would you suggest to do as a next step?
* The referral program wasn't really tested in a rigorous way. It simply started on a given day for all users and you are drawing conclusions by looking at the data before and after the test started. What kinds of risks does this approach present? Can you think of a better way to test the referral program and measure its impact?

In [42]:
import datetime
import pandas as pd
import scipy.stats as ss
import matplotlib.pyplot as plt
plt.style.use('ggplot')

# Index
* [Load the data](#Load-the-data)
* [Hypothesis test on all data](#Hypothesis-test-on-all-data)
* [Hypothesis test grouped by country](#Hypothesis-test-grouped-by-country)
    * [daily spent change in each country](#daily-spent-change-in-each-country)
    * [daily customers change in each country](#daily-customers-change-in-each-country)
    * [daily transactions change in each country](#daily-transactions-change-in-each-country)
    * [Country-based conclusion](#Country-based-conclusion)
* [Answer question 1](#Answer-question-1)
* [Answer question 2](#Answer-question-2)
* [Answer question 3](#Answer-question-3)


## Load the data

In [43]:
referral = pd.read_csv("referral.csv")
del referral['device_id']
referral['date'] = pd.to_datetime( referral.date )

In [44]:
referral.head()  # glance at the data

,user_id,date,country,money_spent,is_referral
0,2,2015-10-03,FR,65,0
1,3,2015-10-03,CA,54,0
2,6,2015-10-03,FR,35,0
3,7,2015-10-03,UK,73,0
4,7,2015-10-03,MX,35,0


In [45]:
dt_referral_starts = datetime.datetime(2015,10,31)

In [63]:
referral.date.describe()

count                   97341
unique                     56
top       2015-11-14 00:00:00
freq                     3303
first     2015-10-03 00:00:00
last      2015-11-27 00:00:00
Name: date, dtype: object

In [60]:
(pd.Series(referral.date.unique()) >= dt_referral_starts).value_counts()

True     28
False    28
dtype: int64

There are 28 days before the program and 28 days after it, so the User Referral program starts right in the middle of the observation window.

## Hypothesis test on all data

In [47]:
def count_spent(df):
    """Summarize one day's purchases."""
    d = {}
    d['n_purchase'] = df.shape[0]  # number of purchases on that day
    d['total_spent'] = df.money_spent.sum()  # total money spent on that day
    d['n_customer'] = df.user_id.nunique()  # number of distinct customers who bought on that day
    return pd.Series(d)
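
For reference, the same three daily figures can also be computed with pandas named aggregation. The sketch below is an alternative to `count_spent`, not the code used for the results in this notebook; the toy frame `toy` and the helper name `daily_summary` are made up for illustration.

```python
import pandas as pd

def daily_summary(df):
    """Per-day purchases, revenue and distinct customers (hypothetical helper)."""
    return df.groupby('date').agg(
        n_purchase=('user_id', 'count'),       # non-null rows per day = purchases
        total_spent=('money_spent', 'sum'),    # money spent per day
        n_customer=('user_id', 'nunique'),     # distinct buyers per day
    )

# toy data, invented for illustration only
toy = pd.DataFrame({
    'user_id':     [1, 1, 2, 3, 3],
    'date':        pd.to_datetime(['2015-10-03'] * 3 + ['2015-10-04'] * 2),
    'money_spent': [10, 20, 30, 40, 50],
})
summary = daily_summary(toy)
print(summary)
```
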

In [48]:
def daily_statistics(df):
    """
    Given a dataframe of purchases:
    1.  group by day and compute the number of purchases, total money spent,
        and number of customers on each day
    2.  split the daily data into two groups, before and after the program started
    3.  for each sales metric (total_spent, n_purchase, n_customer), compute the
        mean before and after the program, their difference, and a one-tailed p-value
    """
    grpby_day = df.groupby('date').apply(count_spent)

    grpby_day_before = grpby_day.loc[grpby_day.index < dt_referral_starts, :]
    grpby_day_after = grpby_day.loc[grpby_day.index >= dt_referral_starts, :]

    d = []
    colnames = ['total_spent','n_purchase','n_customer']
    for col in colnames:
        pre_data = grpby_day_before.loc[:,col]
        pre_mean = pre_data.mean()

        post_data = grpby_day_after.loc[:,col]
        post_mean = post_data.mean()

        result = ss.ttest_ind(pre_data, post_data, equal_var=False)
        # one-tailed p-value in the direction of the observed difference: half the two-sided p-value
        pvalue = result.pvalue / 2 

        d.append({'mean_pre':pre_mean,'mean_post':post_mean,'mean_diff':post_mean - pre_mean,
                  'pvalue':pvalue})

    # re-order the columns
    return pd.DataFrame(d,index = colnames).loc[:,['mean_pre','mean_post','mean_diff','pvalue']]

In [49]:
daily_statistics(referral)

,mean_pre,mean_post,mean_diff,pvalue
total_spent,71657.0,83714.392857,12057.392857,0.135194
n_purchase,1690.75,1785.714286,94.964286,0.348257
n_customer,1384.464286,1686.964286,302.5,0.059545


<a id='whole_result'></a>Although all three sales metrics (daily purchases, daily money spent, and daily customers) increased after the 'user referral' program launched, <span style='color:orange;font-size:1.5em;font-weight:bold'>none of those increases is significant</span> at the **0.05** significance level.
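
The one-tailed p-value used above can be illustrated on synthetic numbers. This is only a sketch: it assumes SciPy 1.6 or later, where `ttest_ind` accepts an `alternative` argument, and the arrays `pre` and `post` are randomly generated, not the referral data.

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)
pre = rng.normal(loc=100, scale=15, size=28)    # synthetic 'before' daily values
post = rng.normal(loc=105, scale=15, size=28)   # synthetic 'after' daily values

# Welch's t-test (unequal variances), two-sided
two_sided = ss.ttest_ind(pre, post, equal_var=False)

# one-sided alternative chosen from the sign of the observed difference
alternative = 'less' if pre.mean() < post.mean() else 'greater'
one_sided = ss.ttest_ind(pre, post, equal_var=False, alternative=alternative)

# halving the two-sided p-value gives the one-sided p-value in the observed direction
print(two_sided.pvalue / 2, one_sided.pvalue)
```

Because the direction is chosen after looking at the data, rejecting at 0.05 on this halved p-value is equivalent to rejecting a two-sided test at 0.10.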

## Hypothesis test grouped by country

In [50]:
referral.country.value_counts()

UK    15493
FR    15396
US    15280
IT    11446
DE    11093
ES     9831
CA     9440
MX     8133
CH     1229
Name: country, dtype: int64

In [51]:
daily_stat_bycountry = referral.groupby('country').apply(daily_statistics)

In [52]:
daily_stat_bycountry

country,,mean_pre,mean_post,mean_diff,pvalue
CA,total_spent,7468.428571,7880.428571,412.0,0.351704
CA,n_purchase,177.142857,160.0,-17.142857,0.233985
CA,n_customer,173.285714,159.178571,-14.107143,0.268256
CH,total_spent,1536.321429,1023.892857,-512.428571,0.006941
CH,n_purchase,26.821429,17.071429,-9.75,0.003072
CH,n_customer,26.714286,17.071429,-9.642857,0.003142
DE,total_spent,9856.75,8013.964286,-1842.785714,0.081459
DE,n_purchase,232.142857,164.035714,-68.107143,0.011798
DE,n_customer,224.964286,163.25,-61.714286,0.015665
ES,total_spent,6648.642857,8660.571429,2011.928571,0.037522


From the results above, we see that <span style='color:blue;font-weight:bold'>the 'User Referral' program has different effects in different countries</span>. The program boosts sales in some countries, but in others <span style='color:red;font-weight:bold'>it even decreases sales.</span>

### daily spent change in each country

In [53]:
daily_stat_bycountry.xs('total_spent',level=1).sort_values(by='pvalue')

country,mean_pre,mean_post,mean_diff,pvalue
CH,1536.321429,1023.892857,-512.428571,0.006941
MX,4975.464286,7033.214286,2057.75,0.00967
IT,7651.571429,10193.428571,2541.857143,0.02573
FR,10385.25,13635.0,3249.75,0.031843
ES,6648.642857,8660.571429,2011.928571,0.037522
UK,11213.535714,14196.428571,2982.892857,0.04849
DE,9856.75,8013.964286,-1842.785714,0.081459
US,11921.035714,13077.464286,1156.428571,0.248874
CA,7468.428571,7880.428571,412.0,0.351704


From the results above, if we relax the significance level to 0.1, then:
* <span style='color:orange;font-weight:bold'>daily spent in 'CH' and 'DE' decreased significantly.</span>
* <span style='color:orange;font-weight:bold'>daily spent in 'MX', 'IT', 'FR', 'ES' and 'UK' increased significantly.</span>
* <span style='color:orange;font-weight:bold'>'US' and 'CA' show some improvement in daily spent, but it is NOT significant.</span>
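
The grouping above can be reproduced mechanically from the table. The sketch below copies the `mean_diff` and `pvalue` columns for daily spent into a small frame; the helper name `classify` and its `alpha` parameter are illustrative and do not appear in the original analysis.

```python
import pandas as pd

# mean_diff and one-tailed p-value for daily spent, copied from the table above
spent = pd.DataFrame(
    {'mean_diff': [-512.428571, 2057.75, 2541.857143, 3249.75, 2011.928571,
                   2982.892857, -1842.785714, 1156.428571, 412.0],
     'pvalue':    [0.006941, 0.00967, 0.02573, 0.031843, 0.037522,
                   0.04849, 0.081459, 0.248874, 0.351704]},
    index=['CH', 'MX', 'IT', 'FR', 'ES', 'UK', 'DE', 'US', 'CA'])

def classify(row, alpha=0.1):
    """Label one country as increased, decreased or not significant."""
    if row['pvalue'] >= alpha:
        return 'not significant'
    return 'increased' if row['mean_diff'] > 0 else 'decreased'

labels = spent.apply(classify, axis=1)
for label in ['increased', 'decreased', 'not significant']:
    print(label, sorted(labels.index[labels == label]))
```

With `alpha=0.05`, DE moves to 'not significant' for this metric, since its p-value is 0.081459.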

### daily customers change in each country

In [61]:
daily_stat_bycountry.xs('n_customer',level=1).sort_values(by='pvalue')

country,mean_pre,mean_post,mean_diff,pvalue
CH,26.714286,17.071429,-9.642857,0.003142
DE,224.964286,163.25,-61.714286,0.015665
MX,124.392857,163.107143,38.714286,0.026203
FR,236.5,302.535714,66.035714,0.041124
IT,176.535714,226.107143,49.571429,0.043911
ES,153.392857,193.214286,39.821429,0.057954
UK,255.571429,286.321429,30.75,0.204398
CA,173.285714,159.178571,-14.107143,0.268256
US,273.178571,261.107143,-12.071429,0.36886


From the results above, at the 0.1 significance level:
* <span style='color:orange;font-weight:bold'>daily customers in 'CH' and 'DE' decreased significantly.</span>
* <span style='color:orange;font-weight:bold'>daily customers in 'MX', 'IT', 'FR' and 'ES' increased significantly.</span>

### daily transactions change in each country

In [62]:
daily_stat_bycountry.xs('n_purchase',level=1).sort_values(by='pvalue')

country,mean_pre,mean_post,mean_diff,pvalue
CH,26.821429,17.071429,-9.75,0.003072
DE,232.142857,164.035714,-68.107143,0.011798
MX,126.464286,164.0,37.535714,0.03243
IT,180.857143,227.928571,47.071429,0.057454
FR,244.142857,305.714286,61.571429,0.058996
ES,156.607143,194.5,37.892857,0.072638
CA,177.142857,160.0,-17.142857,0.233985
UK,264.285714,289.035714,24.75,0.261183
US,282.285714,263.428571,-18.857143,0.307801


### Country-based conclusion

* <span style='color:orange;font-weight:bold;font-size:1.5em'>the program fails in CH and DE: it significantly decreases sales in these two countries.</span>
* <span style='color:orange;font-weight:bold;font-size:1.5em'>the program succeeds in 'MX', 'IT', 'FR' and 'ES': it significantly increases sales.</span>
* <span style='color:orange;font-weight:bold;font-size:1.5em'>the program doesn't seem to have any significant effect in UK, CA and US, especially in CA and US.</span>

## Answer question 1
Can you estimate the impact the program had on the site?

According to the analysis above, the program [doesn't seem to have a significant impact on the company as a whole](#whole_result).

However, looking at each country separately, I find that the program has [a different impact in different countries](#Country-based-conclusion):
* **the program fails in CH and DE: it significantly decreases sales in these two countries.**
* **the program succeeds in 'MX', 'IT', 'FR' and 'ES': it significantly increases sales.**
* **the program doesn't seem to have any significant effect in UK, CA and US, especially in CA and US.**

## Answer question 2
Based on the data, what would you suggest to do as a next step?

1. First, I suggest running a more accurate A/B test ([see question 3's answer](#Answer-question-3)) and collecting more data to study the impact of the program.
2. Since the program has a different impact in different countries, I suggest studying the reason for this difference. **For example, does the program conflict with local culture in CH and DE?**
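
One possible starting point for the second suggestion is to compare how often purchases come through a referral in each country after the launch. The sketch below assumes that `is_referral` flags a purchase made through a referral; the frame `toy` is invented for illustration and only mirrors the column names of `referral.csv`.

```python
import pandas as pd

# toy purchases, invented for illustration; columns mirror referral.csv
toy = pd.DataFrame({
    'country':     ['DE', 'DE', 'DE', 'MX', 'MX', 'MX', 'MX'],
    'date':        pd.to_datetime(['2015-11-01'] * 7),
    'money_spent': [30, 40, 50, 20, 25, 30, 35],
    'is_referral': [0, 0, 1, 1, 1, 0, 1],
})

launch = pd.Timestamp('2015-10-31')
after = toy[toy['date'] >= launch]

# share of post-launch purchases that came through a referral, per country
referral_share = after.groupby('country')['is_referral'].mean()
print(referral_share)
```
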

## Answer question 3
The referral program wasn't really tested in a rigorous way. It simply started on a given day for all users and you are drawing conclusions by looking at the data before and after the test started. What kinds of risks does this approach present? Can you think of a better way to test the referral program and measure its impact?

This approach isn't an accurate A/B test, because the "User Referral" program isn't the only difference between the control group (before launch) and the test group (after launch). For example, some countries may have a special holiday after Oct 31, or demand for some goods may rise simply because the weather gets colder after Oct 31.

To measure the impact of the program more accurately, we need to run a more careful A/B test. For example:
* run both groups during the same period of time
* randomly split the customers into two groups, and let only one group know about the User Referral program
* run the experiment for some time, then perform a t-test to see whether the sales metrics (e.g., daily spent, daily customers, daily transactions) change significantly
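
The steps above can be sketched as follows. Everything here is simulated: the user ids, the group labels `control` and `test`, and the per-user spending are placeholders for real measurements, and the gamma distribution is an arbitrary choice to produce positive amounts.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

rng = np.random.default_rng(42)

# hypothetical user base; ids and spending are simulated, not taken from referral.csv
users = pd.DataFrame({'user_id': np.arange(1, 2001)})

# 1. randomly assign each user to control or test, within the same time period
users['group'] = rng.choice(['control', 'test'], size=len(users))

# 2. simulated spend per user during the experiment (placeholder for real measurements)
users['spent'] = rng.gamma(shape=2.0, scale=20.0, size=len(users))

# 3. Welch's t-test between the two groups
control = users.loc[users['group'] == 'control', 'spent']
test = users.loc[users['group'] == 'test', 'spent']
result = ss.ttest_ind(test, control, equal_var=False)
print(len(control), len(test), result.pvalue)
```

Since both groups are observed over the same days, holidays and weather affect them equally, which is what the before/after comparison cannot guarantee.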