## Portfolio Exercise: Starbucks
 
#### Background Information

The dataset you will be provided in this portfolio exercise was originally used as a take-home assignment provided by Starbucks for their job candidates. The data for this exercise consists of about 120,000 data points split in a 2:1 ratio among training and test files. In the experiment simulated by the data, an advertising promotion was tested to see if it would bring more customers to purchase a specific product priced at $10. Since it costs the company 0.15 to send out each promotion, it would be best to limit that promotion only to those that are most receptive to the promotion. Each data point includes one column indicating whether or not an individual was sent a promotion for the product, and one column indicating whether or not that individual eventually purchased that product. Each individual also has seven additional features associated with them, which are provided abstractly as V1-V7.

#### Optimization Strategy

Your task is to use the training data to understand what patterns in V1-V7 to indicate that a promotion should be provided to a user. Specifically, your goal is to maximize the following metrics:

* **Incremental Response Rate (IRR)** 

IRR depicts how many more customers purchased the product with the promotion, as compared to if they didn't receive the promotion. Mathematically, it's the ratio of the number of purchasers in the promotion group to the total number of customers in the purchasers group (_treatment_) minus the ratio of the number of purchasers in the non-promotional group to the total number of customers in the non-promotional group (_control_).

$$ IRR = \frac{purch_{treat}}{cust_{treat}} - \frac{purch_{ctrl}}{cust_{ctrl}} $$


* **Net Incremental Revenue (NIR)**

NIR depicts how much is made (or lost) by sending out the promotion. Mathematically, this is 10 times the total number of purchasers that received the promotion minus 0.15 times the number of promotions sent out, minus 10 times the number of purchasers who were not given the promotion.

$$ NIR = (10\cdot purch_{treat} - 0.15 \cdot cust_{treat}) - 10 \cdot purch_{ctrl}$$

For a full description of what Starbucks provides to candidates see the [instructions available here](https://drive.google.com/open?id=18klca9Sef1Rs6q8DW4l7o349r8B70qXM).

Below you can find the training data provided.  Explore the data and different optimization strategies.

#### How To Test Your Strategy?

When you feel like you have an optimization strategy, complete the `promotion_strategy` function to pass to the `test_results` function.  
From past data, we know there are four possible outomes:

Table of actual promotion vs. predicted promotion customers:  

<table>
<tr><th></th><th colspan = '2'>Actual</th></tr>
<tr><th>Predicted</th><th>Yes</th><th>No</th></tr>
<tr><th>Yes</th><td>I</td><td>II</td></tr>
<tr><th>No</th><td>III</td><td>IV</td></tr>
</table>

The metrics are only being compared for the individuals we predict should obtain the promotion – that is, quadrants I and II.  Since the first set of individuals that receive the promotion (in the training set) receive it randomly, we can expect that quadrants I and II will have approximately equivalent participants.  

Comparing quadrant I to II then gives an idea of how well your promotion strategy will work in the future. 

Get started by reading in the data below.  See how each variable or combination of variables along with a promotion influences the chance of purchasing.  When you feel like you have a strategy for who should receive a promotion, test your strategy against the test dataset used in the final `test_results` function.

In [71]:
# load in packages
from itertools import combinations

from test_results import test_results, score
import numpy as np
import time
import pandas as pd
import scipy as sp
import sklearn as sk
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import f1_score
import hyperopt
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt import space_eval
import pickle
from sklearn.metrics import f1_score
import numpy as np
import gc
import math
def f1_eval(y_pred, dtrain):
    y_true = dtrain.get_label()
    err = 1-f1_score(y_true, np.round(y_pred))
    return 'f1_err', err
import decimal
from sklearn.model_selection import KFold
def float_range(start, stop, step):
    start = decimal.Decimal(start)
    stop = decimal.Decimal(stop)
    step = decimal.Decimal(step)
    while (start < stop):
        yield float(start)
        start += step
# load in the data
train_data = pd.read_csv('./training.csv')
train_data.head()

Unnamed: 0,ID,Promotion,purchase,V1,V2,V3,V4,V5,V6,V7
0,1,No,0,2,30.443518,-1.165083,1,1,3,2
1,3,No,0,3,32.15935,-0.645617,2,3,2,2
2,4,No,0,2,30.431659,0.133583,1,1,4,2
3,5,No,0,0,26.588914,-0.212728,2,1,4,2
4,8,Yes,0,3,28.044332,-0.385883,1,1,2,2


In [13]:
data_dir = "./data"

In [83]:
train_data.describe()

Unnamed: 0,ID,purchase,V1,V2,V3,V4,V5,V6,V7
count,84534.0,84534.0,84534.0,84534.0,84534.0,84534.0,84534.0,84534.0,84534.0
mean,62970.972413,0.012303,1.500662,29.9736,0.00019,1.679608,2.327643,2.502898,1.701694
std,36418.440539,0.110234,0.868234,5.010626,1.000485,0.46663,0.841167,1.117349,0.457517
min,1.0,0.0,0.0,7.104007,-1.68455,1.0,1.0,1.0,1.0
25%,31467.25,0.0,1.0,26.591501,-0.90535,1.0,2.0,2.0,1.0
50%,62827.5,0.0,2.0,29.979744,-0.039572,2.0,2.0,3.0,2.0
75%,94438.75,0.0,2.0,33.344593,0.826206,2.0,3.0,4.0,2.0
max,126184.0,1.0,3.0,50.375913,1.691984,2.0,4.0,4.0,2.0


In [85]:
# Cells for you to work and document as necessary - 
# definitely feel free to add more cells as you need

## Exploratory analysis

In [86]:
train_data["Promotion"].value_counts()

Yes    42364
No     42170
Name: Promotion, dtype: int64

If we treat the act of giving promotion as a treatment given by the company to its customers and those that were not given promotion as the control group, then we can see that there is nearly equal number of customers that belong to both the groups.

### Checking for the distribution of values of target variable of our interest

In [87]:
train_data["purchase"].value_counts()

0    83494
1     1040
Name: purchase, dtype: int64

It is clear from this numbers that there is a high imbalance in the number of customers who chose to purchase the product vs those who didn't. We need to take care of this while using this dataset for trainin the machine learning algorithm by using some technique like oversampling from under represented (minority) value `1` for target variable `purchase`. **SMOTE** is one useful technique that generates balanced dataset for training purpose while also introducing some variations in the input variables while oversampling the data with minority target value.

## Approach 1
### Predicting if the customer will make the purchase only after receiving at the promotion.

We are given dataset that includes customers that have been given and not given the promotion. As it costs the company 1.5$ to promote to the customer, it will try to avoid promoting it to the customers who are:
- Not likely to purchase even after receiving the promotion
- Are going to purchase even without receiving the promotion  

Company is interested in giving the promotion to the customers who are likely to make the purchase only after receiving the promotion. The job of the preditive model is to predict whether the given customer falls into this category. If yes, then our algorithm will suggest the company to give promotion to that customer, otherwise it won't suggest to give promotion to that customer.

A statistical model can be trained to decide whether to give customer the promotion or not by training it with dataset where each customer is labeled as 1 in the output variable if he has been shown promotion and has purchased to product, and 0 for the rest of the scenarios. We can name this new variable as `response` as it indicates whether the customer resonded positively to our promotion.

In [219]:
train_data_1 = train_data.copy()

In [220]:
train_data_1["response"] = (train_data_1["Promotion"] == "Yes") & (train_data_1["purchase"] == 1)

In [221]:
features = ["V"+str(x) for x in range(1,8)] + ["Promotion"]

In [222]:
X = train_data_1[features]

In [223]:
Y = train_data_1["response"]

In [224]:
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size=0.2, random_state=42)

#### Generating balanced training dataset using Synthetic Minority Over-sampling Technique (SMOTE)

In [225]:
sm = SMOTE(random_state=42, ratio=1.0)

In [226]:
X_balanced_train, Y_balanced_train = sm.fit_resample(X_train, Y_train)

Converting back to dataframe and series

In [227]:
X_balanced_train = pd.DataFrame(X_balanced_train, columns=features)

In [228]:
X_balanced_train.columns

Index(['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7'], dtype='object')

In [229]:
Y_balanced_train = pd.Series(Y_balanced_train)

In [304]:
cv = GridSearchCV(estimator=XGBClassifier(), param_grid={
        "max_depth": range(5,8,1),
        "min_child_weight": [5, 10, 20, 50],
        "gamma": [0, 0.1, 0.2],
        "random_state": [42],
        "n_estimators": [1000]
        },         
        scoring="f1", cv=3)


start_time = time.time()
fit_params= {
            "eval_set": [(X_valid, Y_valid)],
            "eval_metric": f1_eval,
            "early_stopping_rounds":20,
            "verbose": 50
        }
cv.fit(X_balanced_train, Y_balanced_train, **fit_params)
elapsed_time = (time.time() - start_time) / 60
print('Elapsed computation time: {:.3f} mins'.format(elapsed_time))

[0]	validation_0-error:0.388419	validation_0-f1_err:0.971593
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[3]	validation_0-error:0.217957	validation_0-f1_err:0.969992

[0]	validation_0-error:0.388537	validation_0-f1_err:0.971602
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[5]	validation_0-error:0.23996	validation_0-f1_err:0.969878

[0]	validation_0-error:0.388537	validation_0-f1_err:0.971602
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[5]	validation_0-error:0.214763	validation_0-f1_err:0.969042

[0]	validation_0-error:0.388419	validation_0-f1_err:0.971593
Mult

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[0]	validation_0-error:0.276631	validation_0-f1_err:0.97134

[0]	validation_0-error:0.277163	validation_0-f1_err:0.971393
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[2]	validation_0-error:0.263678	validation_0-f1_err:0.970396

[0]	validation_0-error:0.288165	validation_0-f1_err:0.972067
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[5]	validation_0-error:0.174957	validation_0-f1_err:0.971748

[0]	validation_0-error:0.277163	validation_0-f1_err:0.971393
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds

[0]	validation_0-error:0.359555	validation_0-f1_err:0.972484
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[9]	validation_0-error:0.196723	validation_0-f1_err:0.971946

[0]	validation_0-error:0.359378	validation_0-f1_err:0.972471
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[8]	validation_0-error:0.239191	validation_0-f1_err:0.971182

[0]	validation_0-error:0.359378	validation_0-f1_err:0.972471
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[1]	validation_0-error:0.253859	validation_0-f1_err:0.97148

[0]	validation_0-error:0.359378	validation_0-f1_err:0.972471
Mult

Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[5]	validation_0-error:0.239782	validation_0-f1_err:0.969856

[0]	validation_0-error:0.38836	validation_0-f1_err:0.971589
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[5]	validation_0-error:0.214586	validation_0-f1_err:0.969017

[0]	validation_0-error:0.391613	validation_0-f1_err:0.971819
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[3]	validation_0-error:0.217898	validation_0-f1_err:0.969984

[0]	validation_0-error:0.391909	validation_0-f1_err:0.971839
Multiple eval metrics have been passed: 'validation_0-f1_err' wil

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[2]	validation_0-error:0.264092	validation_0-f1_err:0.970441

[0]	validation_0-error:0.287869	validation_0-f1_err:0.972039
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[8]	validation_0-error:0.154315	validation_0-f1_err:0.971695

[0]	validation_0-error:0.280002	validation_0-f1_err:0.971675
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 rounds.
Stopping. Best iteration:
[5]	validation_0-error:0.214527	validation_0-f1_err:0.971605

[0]	validation_0-error:0.290235	validation_0-f1_err:0.972261
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 20 round

In [305]:
cv.best_params_

{'gamma': 0.2, 'max_depth': 7, 'min_child_weight': 5, 'random_state': 42}

In [306]:
# This will help us deciding number of estimators
xgb = XGBClassifier(n_estimators=1000)
best_params_xgb = cv.best_params_
xgb.set_params(**best_params_xgb)
xgb.fit(X=X_balanced_train, y=Y_balanced_train.values.ravel(), eval_set=[(X_valid, Y_valid)], eval_metric=f1_eval, early_stopping_rounds=10, verbose=10)

[0]	validation_0-error:0.290235	validation_0-f1_err:0.972261
Multiple eval metrics have been passed: 'validation_0-f1_err' will be used for early stopping.

Will train until validation_0-f1_err hasn't improved in 10 rounds.
[10]	validation_0-error:0.14497	validation_0-f1_err:0.974553
Stopping. Best iteration:
[7]	validation_0-error:0.170521	validation_0-f1_err:0.971689



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.2,
              learning_rate=0.1, max_delta_step=0, max_depth=7,
              min_child_weight=5, missing=None, n_estimators=1000, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [307]:
optimal_n_estimators = xgb.best_ntree_limit

We have found out optimal max_depth and number of estimators for XGBoost algorithm for our case. Train the XGBoost on entire training dataset for using it in promotion strategy.

In [308]:
X_balanced, Y_balanced = sm.fit_sample(X,Y)
X_balanced = pd.DataFrame(X_balanced, columns=features)
Y_balanced = pd.Series(Y_balanced)

In [309]:
xgb = XGBClassifier(max_depth=best_params_xgb["max_depth"],
                    gamma=best_params_xgb["gamma"],
                    min_child_weight=best_params_xgb["min_child_weight"],
                    n_estimators=optimal_n_estimators,
                    random_state=42)
xgb.fit(X_balanced, Y_balanced)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.2,
              learning_rate=0.1, max_delta_step=0, max_depth=7,
              min_child_weight=5, missing=None, n_estimators=8, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [310]:
pickle.dump(xgb, open(data_dir + '/xgb_best_approach_1.pkl', 'wb'))

In [None]:
model = pickle.load(open(data_dir + "/xgb_best_approach_1.pkl", 'rb'))

In [312]:
def promotion_strategy(df):
    '''
    INPUT 
    df - a dataframe with *only* the columns V1 - V7 (same as train_data)

    OUTPUT
    promotion_df - np.array with the values
                   'Yes' or 'No' related to whether or not an 
                   individual should recieve a promotion 
                   should be the length of df.shape[0]
                
    Ex:
    INPUT: df
    
    V1	V2	  V3	V4	V5	V6	V7
    2	30	-1.1	1	1	3	2
    3	32	-0.6	2	3	2	2
    2	30	0.13	1	1	4	2
    
    OUTPUT: promotion
    
    array(['Yes', 'Yes', 'No'])
    indicating the first two users would recieve the promotion and 
    the last should not.
    '''
    
    test = df
    
    preds = model.predict(test)
    promotion = []
    for pred in preds:
        if pred:
            promotion.append('Yes')
        else:
            promotion.append('No')
    promotion = np.array(promotion)
    
    
    return promotion

In [313]:
test_results(promotion_strategy)

Nice job!  See how well your strategy worked on our test data below!

Your irr with this strategy is 0.0206.

Your nir with this strategy is 259.25.
We came up with a model with an irr of 0.0188 and an nir of 189.45 on the test set.

 How did you do?


(0.020606371512773836, 259.25)

## Approach 2

- Indicating whether a person has received promotion as input variable and training a single model to predict whether the person will make purchase or not. 
- For the purpose of deciding whether to send promotion to the person, we can first calculate the probability of person making purchase after receiving promotion and without receiving promotion by senting `promotion` input variable as 1 or 0 respectively and calculating the difference between the two probabailities. If the difference turns out to be greater than some threshold value, in that case we can send promotion to the person. 
- The value of threshold can be decided using Hyper Parameter optimization technique while using NIR formula for calcualting the return value of the objective function to be minimized i.e. the score can be -NIR.
- Here I have used **log loss as evaluation metric** while tuning hyper parameters of XGBoost as I am more interested in calculating accurate probabilities of person giving certain response than just predicting right response.

In [26]:
train_data_1 = train_data.copy()

In [27]:
train_data_1["response"] = train_data_1["purchase"] == 1

In [28]:
train_data_1["response"].unique()

array([False,  True])

In [29]:
features = ["V"+str(x) for x in range(1,8)] + ["Promotion"]

In [6]:
# X = pd.concat([train_data_1[features],pd.get_dummies(train_data_1["Promotion"])], axis=1)

In [30]:
X = pd.get_dummies(train_data_1[features])

In [31]:
X.shape

(84534, 9)

In [32]:
features=X.columns

In [34]:
Y = train_data_1["response"]

In [35]:
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size=0.2, random_state=42)

#### Generating balanced training dataset using Synthetic Minority Over-sampling Technique (SMOTE)

In [390]:
sm = SMOTE(random_state=42, ratio=1.0)

In [391]:
X_balanced_train, Y_balanced_train = sm.fit_resample(X_train, Y_train)

Converting back to dataframe and series

In [392]:
X_balanced_train = pd.DataFrame(X_balanced_train, columns=features)

In [393]:
X_balanced_train.columns

Index(['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'Promotion_No',
       'Promotion_Yes'],
      dtype='object')

In [394]:
Y_balanced_train = pd.Series(Y_balanced_train)

In [395]:
cv = GridSearchCV(estimator=XGBClassifier(), param_grid={
        "max_depth": range(5,8,1),
        "min_child_weight": [5, 10, 20, 50],
        "gamma": [0, 0.1, 0.2],
        "random_state": [42],
        "n_estimators": [1000]
        },
        scoring="f1",
         cv=3)


start_time = time.time()
fit_params= {
            "eval_set": [(X_valid, Y_valid)],
            "eval_metric": "logloss",
            "early_stopping_rounds":20,
            "verbose": 50
        }
cv.fit(X_balanced_train, Y_balanced_train, **fit_params)
elapsed_time = (time.time() - start_time) / 60
print('Elapsed computation time: {:.3f} mins'.format(elapsed_time))

[0]	validation_0-logloss:0.655244
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.251369
[100]	validation_0-logloss:0.171265
[150]	validation_0-logloss:0.129084
[200]	validation_0-logloss:0.102182
[250]	validation_0-logloss:0.083953
[300]	validation_0-logloss:0.075784
[350]	validation_0-logloss:0.072174
[400]	validation_0-logloss:0.071588
Stopping. Best iteration:
[381]	validation_0-logloss:0.071486

[0]	validation_0-logloss:0.656642
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.262351
[100]	validation_0-logloss:0.179657
[150]	validation_0-logloss:0.134394
[200]	validation_0-logloss:0.109404
[250]	validation_0-logloss:0.092806
[300]	validation_0-logloss:0.08443
[350]	validation_0-logloss:0.079844
[400]	validation_0-logloss:0.077083
[450]	validation_0-logloss:0.075457
[500]	validation_0-logloss:0.074352
[550]	validation_0-logloss:0.073695
[600]	validation_0-logloss:0.073369
[650]	validation

[900]	validation_0-logloss:0.078195
[950]	validation_0-logloss:0.077766
[999]	validation_0-logloss:0.077434
[0]	validation_0-logloss:0.651467
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.224672
[100]	validation_0-logloss:0.143961
[150]	validation_0-logloss:0.106476
[200]	validation_0-logloss:0.082817
[250]	validation_0-logloss:0.074733
[300]	validation_0-logloss:0.072019
Stopping. Best iteration:
[290]	validation_0-logloss:0.071968

[0]	validation_0-logloss:0.652593
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.230816
[100]	validation_0-logloss:0.153575
[150]	validation_0-logloss:0.114057
[200]	validation_0-logloss:0.093356
[250]	validation_0-logloss:0.083356
[300]	validation_0-logloss:0.078277
[350]	validation_0-logloss:0.075792
[400]	validation_0-logloss:0.074115
[450]	validation_0-logloss:0.0733
[500]	validation_0-logloss:0.073061
Stopping. Best iteration:
[505]	validation_0-logloss:

[300]	validation_0-logloss:0.075428
[350]	validation_0-logloss:0.074214
[400]	validation_0-logloss:0.073806
[450]	validation_0-logloss:0.073513
Stopping. Best iteration:
[431]	validation_0-logloss:0.073436

[0]	validation_0-logloss:0.647655
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.20815
[100]	validation_0-logloss:0.132397
[150]	validation_0-logloss:0.093078
[200]	validation_0-logloss:0.076535
[250]	validation_0-logloss:0.072488
[300]	validation_0-logloss:0.072221
Stopping. Best iteration:
[288]	validation_0-logloss:0.071954

[0]	validation_0-logloss:0.648279
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.218324
[100]	validation_0-logloss:0.135213
[150]	validation_0-logloss:0.099237
[200]	validation_0-logloss:0.084589
[250]	validation_0-logloss:0.079065
[300]	validation_0-logloss:0.076845
[350]	validation_0-logloss:0.075437
[400]	validation_0-logloss:0.074784
[450]	validation_0-loglos

[350]	validation_0-logloss:0.080108
[400]	validation_0-logloss:0.077694
[450]	validation_0-logloss:0.075992
[500]	validation_0-logloss:0.075071
[550]	validation_0-logloss:0.074292
[600]	validation_0-logloss:0.073907
[650]	validation_0-logloss:0.073608
[700]	validation_0-logloss:0.073385
[750]	validation_0-logloss:0.073324
Stopping. Best iteration:
[764]	validation_0-logloss:0.073229

[0]	validation_0-logloss:0.656049
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.26625
[100]	validation_0-logloss:0.181644
[150]	validation_0-logloss:0.13548
[200]	validation_0-logloss:0.108537
[250]	validation_0-logloss:0.093725
[300]	validation_0-logloss:0.084899
[350]	validation_0-logloss:0.080237
[400]	validation_0-logloss:0.077672
[450]	validation_0-logloss:0.07621
[500]	validation_0-logloss:0.075337
[550]	validation_0-logloss:0.07474
[600]	validation_0-logloss:0.074344
[650]	validation_0-logloss:0.074088
[700]	validation_0-logloss:0.073755
[750]	valida

[150]	validation_0-logloss:0.111222
[200]	validation_0-logloss:0.092738
[250]	validation_0-logloss:0.082928
[300]	validation_0-logloss:0.078734
[350]	validation_0-logloss:0.076391
[400]	validation_0-logloss:0.074842
[450]	validation_0-logloss:0.074092
[500]	validation_0-logloss:0.073744
[550]	validation_0-logloss:0.073563
[600]	validation_0-logloss:0.07326
Stopping. Best iteration:
[606]	validation_0-logloss:0.073251

[0]	validation_0-logloss:0.651495
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.226877
[100]	validation_0-logloss:0.148587
[150]	validation_0-logloss:0.104609
[200]	validation_0-logloss:0.08444
[250]	validation_0-logloss:0.076021
[300]	validation_0-logloss:0.073504
[350]	validation_0-logloss:0.07236
[400]	validation_0-logloss:0.071708
[450]	validation_0-logloss:0.071724
Stopping. Best iteration:
[438]	validation_0-logloss:0.071597

[0]	validation_0-logloss:0.652618
Will train until validation_0-logloss hasn't improved in 2

[0]	validation_0-logloss:0.646686
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.213386
[100]	validation_0-logloss:0.13192
[150]	validation_0-logloss:0.099025
[200]	validation_0-logloss:0.08775
[250]	validation_0-logloss:0.082114
[300]	validation_0-logloss:0.07959
[350]	validation_0-logloss:0.077865
[400]	validation_0-logloss:0.076573
[450]	validation_0-logloss:0.07572
[500]	validation_0-logloss:0.075447
[550]	validation_0-logloss:0.075072
[600]	validation_0-logloss:0.074766
[650]	validation_0-logloss:0.074411
Stopping. Best iteration:
[663]	validation_0-logloss:0.074407

[0]	validation_0-logloss:0.647907
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.20986
[100]	validation_0-logloss:0.129292
[150]	validation_0-logloss:0.10303
[200]	validation_0-logloss:0.092937
[250]	validation_0-logloss:0.082743
[300]	validation_0-logloss:0.079527
[350]	validation_0-logloss:0.077209
[400]	validation_0-lo

[150]	validation_0-logloss:0.136704
[200]	validation_0-logloss:0.110061
[250]	validation_0-logloss:0.093739
[300]	validation_0-logloss:0.087077
[350]	validation_0-logloss:0.083379
[400]	validation_0-logloss:0.081192
[450]	validation_0-logloss:0.079281
[500]	validation_0-logloss:0.077821
[550]	validation_0-logloss:0.076821
[600]	validation_0-logloss:0.076092
[650]	validation_0-logloss:0.075642
[700]	validation_0-logloss:0.07534
[750]	validation_0-logloss:0.074843
[800]	validation_0-logloss:0.074657
[850]	validation_0-logloss:0.074449
[900]	validation_0-logloss:0.074257
Stopping. Best iteration:
[926]	validation_0-logloss:0.074075

[0]	validation_0-logloss:0.655271
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.255852
[100]	validation_0-logloss:0.172482
[150]	validation_0-logloss:0.129652
[200]	validation_0-logloss:0.106359
[250]	validation_0-logloss:0.09736
[300]	validation_0-logloss:0.088682
[350]	validation_0-logloss:0.083151
[400]	vali

[550]	validation_0-logloss:0.073816
[600]	validation_0-logloss:0.07336
[650]	validation_0-logloss:0.072527
Stopping. Best iteration:
[668]	validation_0-logloss:0.072469

[0]	validation_0-logloss:0.652623
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.238998
[100]	validation_0-logloss:0.147953
[150]	validation_0-logloss:0.114771
[200]	validation_0-logloss:0.102148
[250]	validation_0-logloss:0.095525
[300]	validation_0-logloss:0.091171
[350]	validation_0-logloss:0.086765
[400]	validation_0-logloss:0.083995
[450]	validation_0-logloss:0.082159
[500]	validation_0-logloss:0.080727
[550]	validation_0-logloss:0.079872
[600]	validation_0-logloss:0.07898
[650]	validation_0-logloss:0.078245
[700]	validation_0-logloss:0.077857
[750]	validation_0-logloss:0.077456
[800]	validation_0-logloss:0.076969
[850]	validation_0-logloss:0.076426
[900]	validation_0-logloss:0.076258
[950]	validation_0-logloss:0.07607
[999]	validation_0-logloss:0.075921
[0]	validat

[800]	validation_0-logloss:0.07677
[850]	validation_0-logloss:0.076408
[900]	validation_0-logloss:0.07622
[950]	validation_0-logloss:0.076069
[999]	validation_0-logloss:0.075875
[0]	validation_0-logloss:0.651736
Will train until validation_0-logloss hasn't improved in 20 rounds.
[50]	validation_0-logloss:0.230457
[100]	validation_0-logloss:0.152161
[150]	validation_0-logloss:0.110356
[200]	validation_0-logloss:0.088276
[250]	validation_0-logloss:0.079036
[300]	validation_0-logloss:0.074299
[350]	validation_0-logloss:0.072308
[400]	validation_0-logloss:0.071203
[450]	validation_0-logloss:0.070651
[500]	validation_0-logloss:0.070361
[550]	validation_0-logloss:0.070151
[600]	validation_0-logloss:0.069974
[650]	validation_0-logloss:0.06996
Stopping. Best iteration:
[634]	validation_0-logloss:0.069892

Elapsed computation time: 147.557 mins


In [396]:
cv.best_params_

{'gamma': 0.1,
 'max_depth': 6,
 'min_child_weight': 10,
 'n_estimators': 1000,
 'random_state': 42}

In [397]:
# This will help us deciding number of estimators
xgb = XGBClassifier(n_estimators=1000)
best_params_xgb = cv.best_params_
xgb.set_params(**best_params_xgb)
xgb.fit(X=X_balanced_train, y=Y_balanced_train.values.ravel(), eval_set=[(X_valid, Y_valid)], eval_metric="logloss", early_stopping_rounds=10, verbose=10)

[0]	validation_0-logloss:0.651736
Will train until validation_0-logloss hasn't improved in 10 rounds.
[10]	validation_0-logloss:0.467548
[20]	validation_0-logloss:0.361545
[30]	validation_0-logloss:0.298454
[40]	validation_0-logloss:0.25701
[50]	validation_0-logloss:0.230457
[60]	validation_0-logloss:0.211893
[70]	validation_0-logloss:0.19325
[80]	validation_0-logloss:0.179203
[90]	validation_0-logloss:0.163888
[100]	validation_0-logloss:0.152161
[110]	validation_0-logloss:0.139522
[120]	validation_0-logloss:0.132642
[130]	validation_0-logloss:0.124634
[140]	validation_0-logloss:0.117031
[150]	validation_0-logloss:0.110356
[160]	validation_0-logloss:0.104467
[170]	validation_0-logloss:0.100311
[180]	validation_0-logloss:0.094953
[190]	validation_0-logloss:0.090794
[200]	validation_0-logloss:0.088276
[210]	validation_0-logloss:0.085795
[220]	validation_0-logloss:0.084368
[230]	validation_0-logloss:0.082699
[240]	validation_0-logloss:0.080333
[250]	validation_0-logloss:0.079036
[260]	val

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.1,
              learning_rate=0.1, max_delta_step=0, max_depth=6,
              min_child_weight=10, missing=None, n_estimators=1000, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [398]:
optimal_n_estimators = xgb.best_ntree_limit

We have found out optimal max_depth and number of estimators for XGBoost algorithm for our case. Train the XGBoost on entire training dataset for using it in promotion strategy.

In [399]:
X_balanced, Y_balanced = sm.fit_sample(X,Y)
X_balanced = pd.DataFrame(X_balanced, columns=features)
Y_balanced = pd.Series(Y_balanced)

In [400]:
xgb = XGBClassifier(max_depth=best_params_xgb["max_depth"],
                    gamma=best_params_xgb["gamma"],
                    min_child_weight=best_params_xgb["min_child_weight"],
                    n_estimators=optimal_n_estimators,
                    random_state=42)
xgb.fit(X_balanced, Y_balanced)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0.1,
              learning_rate=0.1, max_delta_step=0, max_depth=6,
              min_child_weight=10, missing=None, n_estimators=528, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=42,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [401]:
pickle.dump(xgb, open(data_dir + '/xgb_best_approach_2.pkl', 'wb'))

In [14]:
model = pickle.load(open(data_dir + "/xgb_best_approach_2.pkl", 'rb'))

We define `diff` as difference in the probabilities of person purchasing the product with and without receiving promotion. We have to choose the threshold value and if the `diff` is higher than that threshold value than we can choose to show promotion to that person. To decide the value of threshold that maximizes the NIR for given prediction model, I evaluate the thresholds in range from 0 to 0.1 by calculating the mean of NIR on 10 folds of the validation dataset and choose threshold value with maximum NIR.

In [111]:
def evaluate(X, Y, diff_threshold, after_promotion_purchase_prob_threshold):
    def score(df, promo_pred_col = 'Promotion'):
        n_treat       = df.loc[df[promo_pred_col] == 'Yes',:].shape[0]
        n_control     = df.loc[df[promo_pred_col] == 'No',:].shape[0]
        n_treat_purch = df.loc[df[promo_pred_col] == 'Yes', 'purchase'].sum()
        n_ctrl_purch  = df.loc[df[promo_pred_col] == 'No', 'purchase'].sum()
        nir = 10 * n_treat_purch - 0.15 * n_treat - 10 * n_ctrl_purch
        return nir
    
    nir_scores = []
    kf = KFold(n_splits=10, random_state=42)
    for train_index, test_index in kf.split(X):
        X_train, X_valid = X.loc[train_index], X.loc[test_index]
        Y_train, Y_valid = Y.loc[train_index], Y.loc[test_index]
        
        # As we have already trained the hyper parameters for XGBoost, we need not train it again here
        # we can use the trained model, to calculate score for given threshold value
        model = pickle.load(open(data_dir + "/xgb_best_approach_2.pkl", 'rb'))
        
        X_valid_with_promo = X_valid.copy()
        # predict probability of purchase with promotion
        X_valid_with_promo["Promotion_Yes"] = 1
        X_valid_with_promo["Promotion_No"] = 0
        probs_with_promotion = model.predict_proba(X_valid_with_promo)[:, 1]

        # predict probability of purchase without promotion
        X_valid_with_promo["Promotion_Yes"] = 0
        X_valid_with_promo["Promotion_No"] = 1

        probs_without_promotion = model.predict_proba(X_valid_with_promo)[:, 1]

        # calculate the difference as diff
        diff = probs_with_promotion - probs_without_promotion

        # if diff is above threshold choose to promote else don't
        promos = (probs_with_promotion > after_promotion_purchase_prob_threshold) & (diff > diff_threshold)
        val_data = X_valid.copy()
        val_data["Promotion"] = "No"
        val_data.loc[val_data["Promotion_Yes"] == 1, "Promotion"] = "Yes"
        val_data["purchase"] = Y_valid.copy()
        score_df = val_data.iloc[np.where(promos)]
        nir = score(score_df)
        nir_scores.append(nir)
    return np.asscalar(np.mean(nir_scores))

In [105]:
(X_valid.index == Y_valid.index).all()

True

In [116]:
evaluated_point_scores = {}

def objective_threshold(params):
    if (str(params) in evaluated_point_scores):
        return evaluated_point_scores[str(params)]
    else:
        print(params)
        diff_threshold = params["diff_threshold"]
        after_promotion_purchase_prob_threshold = params["after_promotion_purchase_prob_threshold"]
        nir_score = evaluate(X=X_valid, Y=Y_valid, 
                             diff_threshold=diff_threshold, 
                             after_promotion_purchase_prob_threshold=after_promotion_purchase_prob_threshold)
        print("nir: " + str(nir_score))        
        evaluated_point_scores[str(params)] = -nir_score
        return -nir_score

param_space = {
    "diff_threshold": hp.choice("diff_threshold", list(float_range("0.02", "0.04", "0.001"))),
    "after_promotion_purchase_prob_threshold": hp.choice("after_promotion_purchase_prob_threshold", list(float_range("0.0", "1.0", "0.1")))
}

start_time = time.time()
best_params_threshold = space_eval(
    param_space, 
    fmin(objective_threshold, 
         param_space, 
         algo=hyperopt.tpe.suggest,
         max_evals=200))
print(best_params_threshold)
elapsed_time = (time.time() - start_time) / 60
print('Elapsed computation time: {:.3f} mins'.format(elapsed_time))
best_diff_threshold = best_params_threshold["diff_threshold"]
best_after_promotion_purchase_prob_threshold = best_params_threshold["after_promotion_purchase_prob_threshold"]

{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.036}
  0%|          | 0/200 [00:00<?, ?it/s, best loss: ?]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  del sys.path[0]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  



nir: 0.0                                             
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.02}
nir: 0.0                                                        
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.025}
nir: 0.0                                                        
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.021}
nir: 0.0                                                        
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.022}
nir: 0.0                                                        
{'after_promotion_purchase_prob_threshold': 0.7, 'diff_threshold': 0.027}
nir: 0.0                                                        
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.034}
nir: 0.0                                                        
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.021}
nir: 2.91                              

{'after_promotion_purchase_prob_threshold': 0.3, 'diff_threshold': 0.028}       
nir: 1.97                                                                       
{'after_promotion_purchase_prob_threshold': 0.9, 'diff_threshold': 0.033}       
nir: 0.0                                                                        
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.02}        
nir: 13.084999999999999                                                         
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.021}       
nir: 12.430000000000001                                                         
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.034}       
nir: 0.985                                                                      
{'after_promotion_purchase_prob_threshold': 0.2, 'diff_threshold': 0.038}       
nir: 2.91                                                                       
{'after_promotion_purchase_p

nir: 1.97                                                            
{'after_promotion_purchase_prob_threshold': 0.1, 'diff_threshold': 0.035}
nir: 13.01                                                           
{'after_promotion_purchase_prob_threshold': 0.8, 'diff_threshold': 0.027}
nir: 0.0                                                             
{'after_promotion_purchase_prob_threshold': 0.5, 'diff_threshold': 0.022}
nir: 0.985                                                           
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.031}
nir: 12.725                                                          
100%|██████████| 200/200 [01:07<00:00,  2.94it/s, best loss: -14.535]
{'after_promotion_purchase_prob_threshold': 0.0, 'diff_threshold': 0.037}
Elapsed computation time: 1.124 mins


In [117]:
def promotion_strategy(df):
    '''
    INPUT 
    df - a dataframe with *only* the columns V1 - V7 (same as train_data)

    OUTPUT
    promotion_df - np.array with the values
                   'Yes' or 'No' related to whether or not an 
                   individual should recieve a promotion 
                   should be the length of df.shape[0]
                
    Ex:
    INPUT: df
    
    V1	V2	  V3	V4	V5	V6	V7
    2	30	-1.1	1	1	3	2
    3	32	-0.6	2	3	2	2
    2	30	0.13	1	1	4	2
    
    OUTPUT: promotion
    
    array(['Yes', 'Yes', 'No'])
    indicating the first two users would recieve the promotion and 
    the last should not.
    '''
    X = df.copy()
    # predict probability of purchase with promotion

    X["Promotion_No"] = 0
    X["Promotion_Yes"] = 1
    probs_with_promotion = model.predict_proba(X)[:, 1]


    # predict probability of purchase without promotion
    
    X["Promotion_No"] = 1
    X["Promotion_Yes"] = 0
    probs_without_promotion = model.predict_proba(X)[:, 1]

    # calculate the difference as diff
    diff = probs_with_promotion - probs_without_promotion        

    should_promote = pd.DataFrame() 
    should_promote["promo"] = (probs_with_promotion > best_after_promotion_purchase_prob_threshold) & (diff > best_diff_threshold)
    
    should_promote.loc[diff >= best_threshold, "promo"] = "Yes"
    should_promote.loc[diff < best_threshold, "promo"] = "No"    
    return should_promote["promo"].to_numpy(dtype="str")

In [118]:
test_results(promotion_strategy)

Nice job!  See how well your strategy worked on our test data below!

Your irr with this strategy is 0.0179.

Your nir with this strategy is 105.75.
We came up with a model with an irr of 0.0188 and an nir of 189.45 on the test set.

 How did you do?


(0.017909013159424635, 105.75)

# Approach 3

Here I am trying the two models approach that is commonly recommended on literature related to uplift measurement. In this approach, we create one model for people who have received the promotion and another model for those who haven't received it. Each model predicts whether the person would purchase the product. The difference between the probability predicted by first model and second model is to be considered for deciding wether to promote to to that person or not. Caveat here is that the error of prediction can get doubled as we are using two separate models. Also the scale of the probabilities may not be the same.