## 1. Import libraries

In [3]:
import pandas as pd
import numpy as np
# import scipy.stats as stats
# from statsmodels.stats import proportion as proptests

import matplotlib.pyplot as plt
%matplotlib inline


## 2. Load data

In [4]:
train_df = pd.read_csv('data/training.csv')
print('Loaded training data: size {}'.format(train_df.shape))
test_df = pd.read_csv('data/test.csv')
print('Loaded testing data: size {}'.format(test_df.shape))
print('----------------')
print('train_df:')
train_df.head()

Loaded training data: size (84534, 10)
Loaded testing data: size (41650, 10)
----------------
train_df:


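|   | ID | Promotion | purchase | V1 | V2 | V3 | V4 | V5 | V6 | V7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | No | 0 | 2 | 30.443518 | -1.165083 | 1 | 1 | 3 | 2 |
| 1 | 3 | No | 0 | 3 | 32.15935 | -0.645617 | 2 | 3 | 2 | 2 |
| 2 | 4 | No | 0 | 2 | 30.431659 | 0.133583 | 1 | 1 | 4 | 2 |
| 3 | 5 | No | 0 | 0 | 26.588914 | -0.212728 | 2 | 1 | 4 | 2 |
| 4 | 8 | Yes | 0 | 3 | 28.044331 | -0.385883 | 1 | 1 | 2 | 2 |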


## 3. Calculate number of customers and purchasers in treatment and control group

* `Treatment` group: customers who received the promotion; the effect of the promotion program is measured in this group
* `Control` group: customers who did NOT receive the promotion

In [5]:
totalCustomer = train_df.shape[0]
# Index by label ('Yes'/'No') rather than by position, so the result does not depend on sort order
numberCustomerTreatment = train_df['Promotion'].value_counts()['Yes']
numberCustomerControl = train_df['Promotion'].value_counts()['No']
numberPurchaserTreatment = train_df.groupby('Promotion')['purchase'].sum()['Yes']
numberPurchaserControl = train_df.groupby('Promotion')['purchase'].sum()['No']

print('Total customers: {}'.format(totalCustomer))
print('Number of customers in treatment group: {}'.format(numberCustomerTreatment))
print('Number of customers in control group: {}'.format(numberCustomerControl))
print('Number of purchasers in treatment group: {}'.format(numberPurchaserTreatment))
print('Number of purchasers in control group: {}'.format(numberPurchaserControl))

Total customers: 84534
Number of customers in treatment group: 42364
Number of customers in control group: 42170
Number of purchasers in treatment group: 721
Number of purchasers in control group: 319


## 4. Calculate IRR and NIR

`IRR (Incremental Response Rate)` measures how much higher the purchase rate is among customers who received the promotion than among customers who did not.

$$ IRR = \frac{purchasers_{treatment}}{customers_{treatment}} - \frac{purchasers_{control}}{customers_{control}} $$


`NIR (Net Incremental Revenue)` measures how much is made (or lost) by sending out the promotion. In the formula below, each purchase is valued at 10 and each promotion sent costs 0.15.

$$ NIR = (10\cdot purchasers_{treatment} - 0.15 \cdot customers_{treatment}) - 10 \cdot purchasers_{control}$$

In [6]:
IRR = (numberPurchaserTreatment/numberCustomerTreatment) - (numberPurchaserControl/numberCustomerControl)
print('IRR = {}'.format(IRR))

NIR = (10*numberPurchaserTreatment - 0.15*numberCustomerTreatment) - 10*numberPurchaserControl
print('NIR = {}'.format(NIR))

IRR = 0.009454547819772702
NIR = -2334.5999999999995
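
Substituting the counts printed in section 3 into the two formulas gives the same values as the cell above. This is only a worked check of the arithmetic:

$$ IRR = \frac{721}{42364} - \frac{319}{42170} \approx 0.0170192 - 0.0075646 \approx 0.00945 $$

$$ NIR = (10 \cdot 721 - 0.15 \cdot 42364) - 10 \cdot 319 = (7210 - 6354.6) - 3190 = -2334.6 $$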


## 5. State and test hypothesis

The hypotheses are tested with a resampling approach based on the idea of the `permutation test`.

The permutation test is a resampling-type test used to compare the values on an outcome variable between two or more groups. In the case of the permutation test, resampling is done on the group labels. The idea here is that, under the null hypothesis, the outcome distribution should be the same for all groups, whether control or experimental. Thus, we can emulate the null by taking all of the data values as a single large group. Applying labels randomly to the data points (while maintaining the original group membership ratios) gives us one simulated outcome from the null.

The rest proceeds much like the sampling approach to a standard hypothesis test, except that we haven't specified a reference distribution to sample from: we're sampling directly from the data we've collected. After applying the labels randomly to all the data and recording the outcome statistic many times, we compare our actual, observed statistic against the simulated statistics. A p-value is obtained by counting how many simulated statistic values are at least as extreme as the one actually observed, and a conclusion is then drawn. The test cells in this section apply a variant of this idea: they repeatedly draw a 20% sample of rows with replacement and recompute the statistic on each draw.
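
The label-shuffling procedure described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the code used in the test cells: the function name `permutation_p_value`, the toy purchase flags, and the add-one correction are assumptions made for the example.

```python
import random

def permutation_p_value(treatment, control, n_permutations=2000, seed=0):
    """One-sided permutation test for mean(treatment) - mean(control) > 0."""
    rng = random.Random(seed)
    n_treatment = len(treatment)
    pooled = list(treatment) + list(control)
    observed = sum(treatment) / n_treatment - sum(control) / len(control)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled outcomes is equivalent to reassigning group
        # labels at random while keeping the original group sizes.
        rng.shuffle(pooled)
        perm_treatment = pooled[:n_treatment]
        perm_control = pooled[n_treatment:]
        simulated = (sum(perm_treatment) / n_treatment
                     - sum(perm_control) / len(perm_control))
        if simulated >= observed:
            at_least_as_extreme += 1

    # Add-one correction (an assumption of this sketch) keeps the estimate above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Made-up purchase flags (1 = purchased); these are NOT taken from the training data.
toy_treatment = [1] * 30 + [0] * 170   # 15% purchase rate
toy_control = [1] * 10 + [0] * 190     # 5% purchase rate

observed_diff, p_value = permutation_p_value(toy_treatment, toy_control)
print('observed difference: {:.3f}, p-value: {:.4f}'.format(observed_diff, p_value))
```

With this convention a small p-value means that few label shuffles produced a difference as large as the observed one.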

### 5.1. Hypothesis 1

* H0: $numberCustomerTreatment = numberCustomerControl$
* Ha: $numberCustomerTreatment \ne numberCustomerControl$

$\alpha$ = 0.05 (significance level)

This hypothesis tests whether the difference between the number of customers in the treatment group and in the control group is statistically significant. If it is, we would need to revise the random assignment procedure and redo data collection. The ultimate purpose is to make sure our inferences on the desired metrics are not biased by how the population was split into groups.


### 5.2. Test hypothesis 1

In [7]:
sample_size = int(round(train_df.shape[0] * 0.2,0)) #get 20% of the population as sample

sample_differences = []

for _ in range(10000):
    
    sub_sample = train_df.sample(sample_size, replace=True)
    treatment_ratio = ((sub_sample.Promotion == "Yes").sum())/sub_sample.shape[0]
    control_ratio = (sub_sample.Promotion == "No").sum()/sub_sample.shape[0]
    sample_differences.append(treatment_ratio - control_ratio)

observe_difference = numberCustomerTreatment/totalCustomer - numberCustomerControl/totalCustomer

# Determine the significance of our result: share of resampled differences above the observed one
p_val = (np.array(sample_differences) > observe_difference).mean()
p_val

0.5034

Because `p_value` = 0.5034 > $\alpha$ = 0.05, there is not enough evidence to reject the null hypothesis. That means the difference in size between the treatment and control groups is not statistically significant, so we can rely on this population for further inferential analysis of the desired metrics.
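
As a cross-check on the group sizes, one option is a normal approximation to the binomial, assuming each customer was assigned to treatment independently with probability 0.5. The helper `balance_p_value` is hypothetical and applies no continuity correction; the counts are the ones printed in section 3.

```python
import math

def balance_p_value(n_treatment, n_control):
    """Two-sided normal-approximation test of a 50/50 split between groups."""
    n_total = n_treatment + n_control
    expected = n_total / 2                    # expected treatment count under H0
    std_dev = math.sqrt(n_total * 0.5 * 0.5)  # binomial standard deviation under H0
    z = (n_treatment - expected) / std_dev
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z_score, p_value = balance_p_value(42364, 42170)
print('z = {:.3f}, two-sided p-value = {:.4f}'.format(z_score, p_value))
```

A p-value above $\alpha$ from this check would agree with the conclusion that the split between groups is acceptable.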

### 5.3. Hypothesis 2

* H0: Incremental Response Rate  = 0
* Ha: Incremental Response Rate > 0

$\alpha$ = 0.05 (significance level)

This hypothesis tests whether the promotion program had a positive effect on the `IRR` metric.

### 5.4. Test hypothesis 2

In [8]:
sample_size = int(round(train_df.shape[0] * 0.2,0)) #get 20% of the population as sample

sample_differences = []

n_trials = 10000

for _ in range(n_trials):
    
    sub_sample = train_df.sample(sample_size, replace=True)

    purchase_treatment = sub_sample[sub_sample['Promotion'] == "Yes"].purchase.sum()
    customer_treatment = sub_sample[sub_sample['Promotion'] == "Yes"].shape[0]
    purchase_control = sub_sample[sub_sample['Promotion'] == "No"].purchase.sum()
    customer_control = sub_sample[sub_sample['Promotion'] == "No"].shape[0]

    sample_IRR = purchase_treatment/customer_treatment - purchase_control/customer_control

    sample_differences.append(sample_IRR)

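# Reference ("null") distribution: normal draws with the spread of the resampled IRRs.
# Its center is sum(sample_differences)/sample_size; because sample_size differs from
# n_trials, this center is not the mean of the resampled IRRs.
p_null = np.random.normal(sum(sample_differences)/sample_size, np.std(sample_differences), n_trials)

# Share of resampled IRRs that exceed the matching draw from the reference distribution
p_val = (np.array(sample_differences) > p_null).mean()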
print('p_value: {}'.format(p_val))

p_value: 0.9458


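Here `p_value` = 0.9458 is the fraction of resampled IRR values that exceed the simulated null draws, so it is not compared against $\alpha$ in the usual way: a value near 1 means the resampled IRR lies mostly above the null distribution. On that basis the null hypothesis is rejected, indicating an increase in IRR when customers received the promotion.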
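
A conventional alternative for comparing two purchase rates is a pooled two-proportion z-test. The sketch below is not the method used in the cell above; it assumes independent customers and a normal approximation, and the function name `two_proportion_z_test` is made up for illustration. The counts are the ones printed in section 3.

```python
import math

def two_proportion_z_test(purchasers_a, customers_a, purchasers_b, customers_b):
    """One-sided pooled z-test for rate_a - rate_b > 0."""
    rate_a = purchasers_a / customers_a
    rate_b = purchasers_b / customers_b
    # Under H0 both groups share one purchase rate, estimated from the pooled data.
    pooled_rate = (purchasers_a + purchasers_b) / (customers_a + customers_b)
    std_error = math.sqrt(pooled_rate * (1 - pooled_rate)
                          * (1 / customers_a + 1 / customers_b))
    z = (rate_a - rate_b) / std_error
    # Upper-tail area of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return rate_a - rate_b, z, p_value

irr, z_score, p_value = two_proportion_z_test(721, 42364, 319, 42170)
print('IRR = {:.6f}, z = {:.2f}, one-sided p-value = {:.3g}'.format(irr, z_score, p_value))
```

Under this test, a small one-sided p-value counts as evidence for Ha (IRR > 0).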

### 5.5. Hypothesis 3

* H0: Net Incremental Revenue = 0
* Ha: Net Incremental Revenue > 0

$\alpha$ = 0.05 (significance level)

This hypothesis tests whether the promotion program had a positive effect on the `NIR` metric.



### 5.6. Test hypothesis 3

In [9]:
sample_size = int(round(train_df.shape[0] * 0.2,0)) #get 20% of the population as sample

sample_differences = []

n_trials = 10000

for _ in range(n_trials):
    
    sub_sample = train_df.sample(sample_size, replace=True)

    purchase_treatment = sub_sample[sub_sample['Promotion'] == "Yes"].purchase.sum()
    customer_treatment = sub_sample[sub_sample['Promotion'] == "Yes"].shape[0]
    purchase_control = sub_sample[sub_sample['Promotion'] == "No"].purchase.sum()
    customer_control = sub_sample[sub_sample['Promotion'] == "No"].shape[0]

    sample_NIR = (10*purchase_treatment - 0.15*customer_treatment) - 10*purchase_control

    sample_differences.append(sample_NIR)

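# Reference ("null") distribution: normal draws centered at 0 with the spread of the resampled NIRs
p_null = np.random.normal(0, np.std(sample_differences), n_trials)

# Share of resampled NIRs that exceed the matching draw from the reference distribution
p_val = (np.array(sample_differences) > p_null).mean()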
print('p_value: {}'.format(p_val))

p_value: 0.0123


Here `p_value` = 0.0123 is the fraction of resampled NIR values that exceed the simulated null draws, so a value near 0 means the resampled NIR lies mostly below the null distribution rather than above it. This gives no support for Ha (NIR > 0), so there is not enough evidence to reject the null hypothesis: the promotion shows no positive effect on the NIR metric.
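
Another way to look at NIR is through an approximate 95% interval. The sketch below treats the group sizes as fixed and the purchase counts as binomial, which is an assumption, and takes the revenue of 10 per purchase and the cost of 0.15 per promotion from the NIR formula in section 4. The helper `nir_interval` is hypothetical.

```python
import math

def nir_interval(purchasers_treatment, customers_treatment,
                 purchasers_control, customers_control,
                 revenue=10.0, cost=0.15, z=1.96):
    """Point estimate and approximate 95% interval for NIR (normal approximation)."""
    nir = ((revenue * purchasers_treatment - cost * customers_treatment)
           - revenue * purchasers_control)

    # Binomial variance of each purchase count, with group sizes treated as fixed.
    rate_treatment = purchasers_treatment / customers_treatment
    rate_control = purchasers_control / customers_control
    var_treatment = customers_treatment * rate_treatment * (1 - rate_treatment)
    var_control = customers_control * rate_control * (1 - rate_control)

    # Only the purchase counts are random; each is scaled by the revenue per purchase.
    std_error = revenue * math.sqrt(var_treatment + var_control)
    return nir, nir - z * std_error, nir + z * std_error

nir, lower, upper = nir_interval(721, 42364, 319, 42170)
print('NIR = {:.1f}, approx. 95% interval = ({:.1f}, {:.1f})'.format(nir, lower, upper))
```

If the whole interval lies below zero, the data give no support for Ha (NIR > 0).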