**Market Response Models**. Offer - Promotion    

Predicting incremental gains of promotional campaigns.  If we do a discount today, how many incremental transactions should we expect?   

One strategy is to split the customers we plan to send the offer to into test and control groups, which lets us calculate incremental gains.  In the following setup, the target group was divided into three groups to answer the questions below:  

1- Does giving an offer increase conversion?     
2- If yes, what kind of offer performs best? Discount or Buy One Get One?   


* Experiment A: Discount policy, resulting in 18% conversion
* Experiment B: Buy One Get One policy, resulting in 17% conversion
* Control group C: no offer, resulting in 15% conversion

Assuming the results are statistically significant, Discount (Group A) looks the best, as it increased conversion by 3 percentage points compared to the Control group and brought 1 percentage point more conversion than Buy One Get One.
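
The comparison above rests on the phrase "assuming the results are statistically significant". A minimal sketch of how one might check that with a two-proportion z-test follows. The group size of 10,000 customers per group is a hypothetical figure chosen only for illustration, since the example does not state group sizes.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two conversion rates."""
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

n = 10_000  # hypothetical customers per group (not given in the text)
z_ac, p_ac = two_proportion_z_test(0.18, n, 0.15, n)  # Discount vs. control
z_ab, p_ab = two_proportion_z_test(0.18, n, 0.17, n)  # Discount vs. BOGO
print(f"A vs C: z={z_ac:.2f}, p={p_ac:.2g}")
print(f"A vs B: z={z_ab:.2f}, p={p_ab:.2g}")
```

Under these assumed group sizes, the 3-point gap to the control group is clearly significant, while the 1-point gap between the two offers does not reach the 5% level, so ranking A above B would need a larger sample.
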
Of course in the real world, it is much more complicated. Some offers perform better on specific segments. So you need to create a portfolio of offers for selected segments. Moreover, you can’t count on conversion as the only criterion of success. There is always a cost trade-off. Generally, while conversion rates go up, cost increases too.   

That’s why sometimes you need to select an offer that is cost-friendly but brings less conversion.
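
To make the cost trade-off concrete, here is a small sketch that compares two offers by net gain rather than by conversion alone. Every number in it (audience size, margin per order, cost per redeemed offer) is hypothetical and only illustrates the arithmetic.

```python
def net_gain(n_customers, conv_rate, base_rate, margin_per_order, cost_per_redemption):
    """Incremental margin minus the cost of honoring the offer."""
    incremental_orders = n_customers * (conv_rate - base_rate)
    # Assumption: every converting customer redeems the offer, including
    # those who would have bought anyway.
    offer_cost = n_customers * conv_rate * cost_per_redemption
    return incremental_orders * margin_per_order - offer_cost

# Hypothetical inputs: 10,000 customers per group, $25 margin per order
discount = net_gain(10_000, 0.18, 0.15, 25, cost_per_redemption=5)
bogo = net_gain(10_000, 0.17, 0.15, 25, cost_per_redemption=2)
print(discount, bogo)
```

With these made-up costs the lower-converting offer comes out ahead, which is the situation described above.
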
Now we know which offer performed well compared to others thanks to the experiment. But what about predicting it? If we can predict the effect of giving an offer, we can plan to maximize our transactions and forecast the cost. Market Response Models help us build this framework. But there is more than one way of doing it. We can group them into two:   

1- If you don’t have a control group (imagine you ran an open promotion for everyone and announced it on social media), then you cannot calculate the incrementality directly. In this situation, it is better to build a regression model that predicts overall sales. The prior assumption is that the model will predict higher sales numbers for the promo days.   

To build this kind of model, **your dataset should include promo & non-promo days sales numbers so that the machine learning model can calculate the incrementality**.
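
As a rough sketch of that idea, the simplest possible regression has a single feature, a flag marking promo days; its slope is then the estimated incremental sales per promo day. The daily sales figures below are invented for illustration, and a real model would add features such as seasonality and day of week.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical daily sales; is_promo marks the days with an open promotion
is_promo = [0, 0, 0, 0, 0, 1, 1, 1]
sales = [100, 104, 98, 102, 96, 130, 126, 134]

baseline, promo_lift = fit_simple_ols(is_promo, sales)
print(f"baseline sales/day: {baseline:.1f}, promo lift/day: {promo_lift:.1f}")
```

With a single 0/1 feature the slope equals the difference between average promo-day and non-promo-day sales, which is why the dataset needs both kinds of days.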

2- If you have a control group, you can build the response model at the segment level or the individual level. For both, the assumption is the same: giving an offer should increase the probability of conversion. The uptick in the individuals’ conversion probability gives us the incremental conversion.
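
A minimal sketch of the segment-level variant: within each segment, the response to the offer is the conversion rate of customers who got the offer minus the conversion rate of the control customers. The records below are hypothetical toy data, and the segment names are placeholders.

```python
from collections import defaultdict

# (segment, got_offer, converted) -- hypothetical toy records
records = [
    ("high_value", 1, 1), ("high_value", 1, 1), ("high_value", 1, 0), ("high_value", 1, 0),
    ("high_value", 0, 1), ("high_value", 0, 0), ("high_value", 0, 0), ("high_value", 0, 0),
    ("low_value", 1, 1), ("low_value", 1, 0), ("low_value", 1, 0), ("low_value", 1, 0),
    ("low_value", 0, 1), ("low_value", 0, 0), ("low_value", 0, 0), ("low_value", 0, 0),
]

def segment_uplift(rows):
    """Conversion rate of treated minus control customers, per segment."""
    # (segment, got_offer) -> [conversions, customers]
    counts = defaultdict(lambda: [0, 0])
    for segment, got_offer, converted in rows:
        counts[(segment, got_offer)][0] += converted
        counts[(segment, got_offer)][1] += 1
    segments = {segment for segment, _ in counts}
    return {
        s: counts[(s, 1)][0] / counts[(s, 1)][1] - counts[(s, 0)][0] / counts[(s, 0)][1]
        for s in segments
    }

uplift_by_segment = segment_uplift(records)
print(uplift_by_segment)
```

In this toy data only the `high_value` segment responds to the offer, which is the kind of difference a segment-level response model is meant to surface.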



In [1]:
from __future__ import division

from datetime import datetime, timedelta,date
import numpy as np
import seaborn as sns
import pandas as pd
%matplotlib inline

from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt


from sklearn.cluster import KMeans

import sklearn
import xgboost as xgb
from sklearn.model_selection import KFold, cross_val_score, train_test_split

In [2]:
def order_cluster(cluster_field_name, target_field_name, df, ascending):
    """Relabel clusters so that cluster ids follow the order of the target's mean."""
    # mean of the target per cluster, sorted so the row position becomes the new label
    df_new = df.groupby(cluster_field_name)[target_field_name].mean().reset_index()
    df_new = df_new.sort_values(by=target_field_name, ascending=ascending).reset_index(drop=True)
    df_new['index'] = df_new.index
    # attach the new label to every row and swap it in for the old cluster id
    df_final = pd.merge(df, df_new[[cluster_field_name, 'index']], on=cluster_field_name)
    df_final = df_final.drop([cluster_field_name], axis=1)
    df_final = df_final.rename(columns={"index": cluster_field_name})
    return df_final


In [3]:
df = pd.read_csv('./Downloads/data/ModifiedEmail_Marketing.csv')
df.head()

Unnamed: 0,recency,history,used_discount,used_bogo,zip_code,is_referral,channel,offer,conversion
0,10,142.44,1,0,Surburban,0,Phone,Buy One Get One,0
1,6,329.08,1,1,Rural,1,Web,No Offer,0
2,7,180.65,0,1,Surburban,1,Web,Buy One Get One,0
3,9,675.83,1,0,Rural,1,Web,Discount,0
4,2,45.34,1,0,Urban,0,Web,Buy One Get One,0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64000 entries, 0 to 63999
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   recency        64000 non-null  int64  
 1   history        64000 non-null  float64
 2   used_discount  64000 non-null  int64  
 3   used_bogo      64000 non-null  int64  
 4   zip_code       64000 non-null  object 
 5   is_referral    64000 non-null  int64  
 6   channel        64000 non-null  object 
 7   offer          64000 non-null  object 
 8   conversion     64000 non-null  int64  
dtypes: float64(1), int64(5), object(3)
memory usage: 4.4+ MB


The first 8 columns provide individual-level data, and the conversion column is the label to predict:   
* recency: months since last purchase
* history: dollar value of the historical purchases
* used_discount/used_bogo: indicates if the customer used a discount or buy one get one before
* zip_code: class of the zip code as Suburban/Urban/Rural (Suburban is spelled "Surburban" in the data)
* is_referral: indicates if the customer was acquired from the referral channel
* channel: channel the customer uses, Phone/Web/Multichannel
* offer: the offer sent to the customer, Discount/Buy One Get One/No Offer


In [5]:
df.conversion.mean()

0.14678125

**Uplift Formula**.     
First off, we need to build a function that calculates our uplift. To keep it simple, we will assume every conversion means 1 order and the average order value is $25.   

We are going to calculate three types of uplift:   

- **Conversion Uplift**: Conversion rate of test group - conversion rate of control group   
- **Order Uplift**: Conversion uplift * # of customers in test group
- **Revenue Uplift**: Order Uplift * Average order $ value


In [6]:
def calc_uplift(df):
    #assigning 25$ to the average order value
    avg_order_value = 25
    
    #calculate conversions for each offer type
    base_conv = df[df.offer == 'No Offer']['conversion'].mean()
    disc_conv = df[df.offer == 'Discount']['conversion'].mean()
    bogo_conv = df[df.offer == 'Buy One Get One']['conversion'].mean()
    
    #calculate conversion uplift for discount and bogo
    disc_conv_uplift = disc_conv - base_conv
    bogo_conv_uplift = bogo_conv - base_conv
    
    #calculate order uplift
    disc_order_uplift = disc_conv_uplift * len(df[df.offer == 'Discount']['conversion'])
    bogo_order_uplift = bogo_conv_uplift * len(df[df.offer == 'Buy One Get One']['conversion'])
    
    #calculate revenue uplift
    disc_rev_uplift = disc_order_uplift * avg_order_value
    bogo_rev_uplift = bogo_order_uplift * avg_order_value
    
    
    print('Discount Conversion Uplift: {0}%'.format(np.round(disc_conv_uplift*100,2)))
    print('Discount Order Uplift: {0}'.format(np.round(disc_order_uplift,2)))
    print('Discount Revenue Uplift: ${0}\n'.format(np.round(disc_rev_uplift,2)))
          
    print('-------------- \n')

    print('BOGO Conversion Uplift: {0}%'.format(np.round(bogo_conv_uplift*100,2)))
    print('BOGO Order Uplift: {0}'.format(np.round(bogo_order_uplift,2)))
    print('BOGO Revenue Uplift: ${0}'.format(np.round(bogo_rev_uplift,2)))     
    
calc_uplift(df)    

Discount Conversion Uplift: 7.66%
Discount Order Uplift: 1631.89
Discount Revenue Uplift: $40797.35

-------------- 

BOGO Conversion Uplift: 4.52%
BOGO Order Uplift: 967.4
BOGO Revenue Uplift: $24185.01


Discount looks like the better option if we want more conversion. It brings an uptick of about 7.7 percentage points compared to the customers who didn’t receive any offer. BOGO (Buy One Get One) brings an uptick of about 4.5 percentage points.   


Let’s start exploring which factors are the drivers of this incremental change.   

We check every feature one by one to find out its impact on conversion.

**1. Recency**   
Ideally, the conversion should go down while recency goes up, since inactive customers are less likely to buy again.

In [7]:
df_recency = df.groupby('recency').conversion.mean().reset_index()
df_recency

Unnamed: 0,recency,conversion
0,1,0.193029
1,2,0.17779
2,3,0.166328
3,4,0.148907
4,5,0.14235
5,6,0.140717
6,7,0.133889
7,8,0.127897
8,9,0.113957
9,10,0.112624


It goes as expected across the buckets shown above: conversion falls from about 19% at 1 month of recency to about 11% at 10 months. Any increase in later buckets (beyond the rows shown) can have many causes, such as fewer customers in those buckets or the effect of the given offers.   
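
One way to check the small-bucket explanation is to put the number of customers next to the conversion rate for each recency value. The sketch below uses a tiny made-up frame with the same column names as `df`; on the real data the same `groupby` call would be applied to `df` directly.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset
toy = pd.DataFrame({
    "recency":    [1, 1, 1, 1, 2, 2, 2, 12],
    "conversion": [1, 0, 1, 0, 0, 1, 0, 1],
})

# Bucket size and conversion rate side by side; a rate computed from
# very few customers is noisy and should be read with caution.
recency_summary = toy.groupby("recency")["conversion"].agg(["count", "mean"])
print(recency_summary)
```

In the toy frame the 12-month bucket shows 100% conversion, but it rests on a single customer, which is exactly the pattern to watch for.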

**2. History**   
We will create a history cluster and observe its impact. Let’s apply k-means clustering to define the significant groups in history:

In [8]:
kmeans = KMeans(n_clusters=5)
kmeans.fit(df[['history']])
df['history_cluster'] = kmeans.predict(df[['history']])

In [9]:
df = order_cluster('history_cluster', 'history',df,True)

df.groupby('history_cluster').agg({'history':['mean','min','max'], 'conversion':['count', 'mean']})

history_cluster,history_mean,history_min,history_max,conversion_count,conversion_mean
0,74.043981,29.99,160.66,32329,0.122553
1,246.808797,160.68,362.87,17924,0.160121
2,478.248033,362.93,644.41,9080,0.180617
3,810.203169,644.47,1109.54,3746,0.192739
4,1409.772009,1110.09,3345.93,921,0.217155


In [10]:
df_plot = df.groupby('history_cluster').conversion.mean().reset_index()
df_plot

Unnamed: 0,history_cluster,conversion
0,0,0.122553
1,1,0.160121
2,2,0.180617
3,3,0.192739
4,4,0.217155


**Customers with higher $ value of history are more likely to convert.**

In [11]:
df.groupby(['used_discount','offer']).agg({'conversion':'mean'})

used_discount,offer,conversion
0,Buy One Get One,0.169794
0,Discount,0.166388
0,No Offer,0.095808
1,Buy One Get One,0.136286
1,Discount,0.196098
1,No Offer,0.114533


In [12]:
df.groupby(['used_bogo','offer']).agg({'conversion':'mean'})

used_bogo,offer,conversion
0,Buy One Get One,0.110892
0,Discount,0.168968
0,No Offer,0.099813
1,Buy One Get One,0.18453
1,Discount,0.193974
1,No Offer,0.111416


### 3- Used Discount & BOGO

In [13]:
df.groupby(['used_discount','used_bogo','offer']).agg({'conversion':'mean'})

used_discount,used_bogo,offer,conversion
0,1,Buy One Get One,0.169794
0,1,Discount,0.166388
0,1,No Offer,0.095808
1,0,Buy One Get One,0.110892
1,0,Discount,0.168968
1,0,No Offer,0.099813
1,1,Buy One Get One,0.251653
1,1,Discount,0.314993
1,1,No Offer,0.180549


**Customers who used both offers before have the highest conversion rate.**   

### 4- Zip Code

In [16]:
df_plot = df.groupby('zip_code').conversion.mean().reset_index()
df_plot

Unnamed: 0,zip_code,conversion
0,Rural,0.188121
1,Surburban,0.139943
2,Urban,0.139044


**Rural shows better conversion than the other zip code classes.**  

### 5- Referral

In [18]:
df_plot = df.groupby('is_referral').conversion.mean().reset_index()
df_plot

Unnamed: 0,is_referral,conversion
0,0,0.17306
1,1,0.120738


**Customers from the referral channel convert less: their conversion rate is about 5 percentage points lower.**

### 6. Channel

In [19]:
df_plot = df.groupby('channel').conversion.mean().reset_index()
df_plot

Unnamed: 0,channel,conversion
0,Multichannel,0.171734
1,Phone,0.127155
2,Web,0.159407


Multichannel shows the highest conversion rate, as we expected. Using more than one channel is an indicator of high engagement.   

### 7- Offer Type

In [20]:
df_plot = df.groupby('offer').conversion.mean().reset_index()
df_plot

Unnamed: 0,offer,conversion
0,Buy One Get One,0.1514
1,Discount,0.182757
2,No Offer,0.106167


Customers who get discount offers show ~18% conversion, whereas it is ~15% for BOGO. Customers who get no offer convert at only ~11%, roughly 4.5 percentage points below BOGO and 7.7 below Discount.

In [21]:
df_model = df.copy()
df_model = pd.get_dummies(df_model)
df_model.head()

Unnamed: 0,recency,history,used_discount,used_bogo,is_referral,conversion,history_cluster,zip_code_Rural,zip_code_Surburban,zip_code_Urban,channel_Multichannel,channel_Phone,channel_Web,offer_Buy One Get One,offer_Discount,offer_No Offer
0,10,142.44,1,0,0,0,0,0,1,0,0,1,0,1,0,0
1,2,45.34,1,0,0,0,0,0,0,1,0,0,1,1,0,0
2,6,134.83,0,1,0,1,0,0,1,0,0,1,0,1,0,0
3,9,46.42,0,1,0,0,0,0,0,1,0,1,0,1,0,0
4,10,32.84,0,1,1,0,0,0,0,1,0,0,1,1,0,0


In [22]:
df_model.conversion.mean()

0.14678125

In [24]:
#create feature set and labels
X = df_model.drop(['conversion'],axis=1)
y = df_model.conversion
X.columns


Index(['recency', 'history', 'used_discount', 'used_bogo', 'is_referral',
       'history_cluster', 'zip_code_Rural', 'zip_code_Surburban',
       'zip_code_Urban', 'channel_Multichannel', 'channel_Phone',
       'channel_Web', 'offer_Buy One Get One', 'offer_Discount',
       'offer_No Offer'],
      dtype='object')

### To get the conversion probabilities, we will use the predict_proba() function for each row:

In [25]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=56)

xgb_model = xgb.XGBClassifier().fit(X_train, y_train)

# work on an explicit copy so that adding a column does not trigger SettingWithCopyWarning
X_test = X_test.copy()
X_test['proba'] = xgb_model.predict_proba(X_test)[:,1]


In [26]:
X_test.head(5)

Unnamed: 0,recency,history,used_discount,used_bogo,is_referral,history_cluster,zip_code_Rural,zip_code_Surburban,zip_code_Urban,channel_Multichannel,channel_Phone,channel_Web,offer_Buy One Get One,offer_Discount,offer_No Offer,proba
32277,10,137.29,0,1,0,0,0,0,1,0,1,0,0,1,0,0.152918
12824,4,154.85,1,0,0,0,0,0,1,0,1,0,0,1,0,0.177674
20159,1,29.99,1,0,1,0,0,1,0,0,0,1,1,0,0,0.117256
41575,4,288.92,0,1,0,1,0,0,1,0,1,0,0,1,0,0.205907
10736,9,46.57,1,0,1,0,0,1,0,0,0,1,0,1,0,0.094767


From the above, we can see that our model assigned a conversion probability (between 0 and 1) to each customer.

In [27]:
X_test.proba.mean()

0.14615918695926666

In [28]:
y_test.mean()

0.142890625

In [31]:
# attach the true labels; pandas aligns them with X_test on the index
X_test['conversion'] = y_test

In [32]:
X_test[X_test['offer_Buy One Get One'] == 1].conversion.mean()

0.14519523030161327

In [33]:
X_test[X_test['offer_Buy One Get One'] == 1].proba.mean()

0.15141303837299347

In [34]:
X_test[X_test['offer_Discount'] == 1].conversion.mean()

0.17772567409144197

In [35]:
X_test[X_test['offer_Discount'] == 1].proba.mean()

0.18134887516498566

In [36]:
X_test[X_test['offer_No Offer'] == 1].conversion.mean()

0.10568341944574917

In [37]:
X_test[X_test['offer_No Offer'] == 1].proba.mean()

0.10563493520021439
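
The six cells above can be condensed into one table. A sketch, assuming the one-hot offer columns and the `proba` and `conversion` columns created earlier; the frame here is a hypothetical stand-in for `X_test`, with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for X_test with one-hot offer columns
scored = pd.DataFrame({
    "offer_Buy One Get One": [1, 1, 0, 0, 0, 0],
    "offer_Discount":        [0, 0, 1, 1, 0, 0],
    "offer_No Offer":        [0, 0, 0, 0, 1, 1],
    "proba":                 [0.20, 0.10, 0.30, 0.10, 0.05, 0.15],
    "conversion":            [0, 1, 1, 0, 0, 0],
})

offer_cols = ["offer_Buy One Get One", "offer_Discount", "offer_No Offer"]
# Recover a single offer label from the one-hot columns
offer_label = scored[offer_cols].idxmax(axis=1).str.replace("offer_", "", regex=False)

# Actual conversion rate next to the mean predicted probability, per offer
comparison = scored.groupby(offer_label).agg(
    actual=("conversion", "mean"),
    predicted=("proba", "mean"),
)
print(comparison)
```

Reading actual and predicted rates side by side makes it easier to spot the offer type where the model is least well calibrated.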

## Results on test set   

Now we assume that the differences in predicted probability between the discount, BOGO and control groups should be similar to the differences in their actual conversion rates.
We use the test set to check this.
Let’s calculate the predicted and real order upticks for discount:


In [38]:
real_disc_uptick = int(len(X_test)*(X_test[X_test['offer_Discount'] == 1].conversion.mean() - X_test[X_test['offer_No Offer'] == 1].conversion.mean()))

pred_disc_uptick = int(len(X_test)*(X_test[X_test['offer_Discount'] == 1].proba.mean() - X_test[X_test['offer_No Offer'] == 1].proba.mean()))



In [39]:
print('Real Discount Uptick - Order: {}, Revenue: {}'.format(real_disc_uptick, real_disc_uptick*25))
print('Predicted Discount Uptick - Order: {}, Revenue: {}'.format(pred_disc_uptick, pred_disc_uptick*25))

Real Discount Uptick - Order: 922, Revenue: 23050
Predicted Discount Uptick - Order: 969, Revenue: 24225


The results are reasonably close. The real order uptick was 922 and the model predicted 969 (about 5.1% error).
Revenue uptick, real vs predicted: 23050 vs 24225.
We need to check if the results are good for BOGO as well:


In [40]:
real_bogo_uptick = int(len(X_test)*(X_test[X_test['offer_Buy One Get One'] == 1].conversion.mean() - X_test[X_test['offer_No Offer'] == 1].conversion.mean()))

pred_bogo_uptick = int(len(X_test)*(X_test[X_test['offer_Buy One Get One'] == 1].proba.mean() - X_test[X_test['offer_No Offer'] == 1].proba.mean()))



In [41]:
print('Real BOGO Uptick - Order: {}, Revenue: {}'.format(real_bogo_uptick, real_bogo_uptick*25))
print('Predicted BOGO Uptick - Order: {}, Revenue: {}'.format(pred_bogo_uptick, pred_bogo_uptick*25))

Real BOGO Uptick - Order: 505, Revenue: 12625
Predicted BOGO Uptick - Order: 585, Revenue: 14625


**Results for BOGO:**    
Order uptick - real vs predicted: 505 vs 585   
Revenue uptick - real vs predicted: 12625 vs 14625

The error rate is around 15.8%, noticeably higher than for Discount. The model can benefit from improving the prediction scores on the BOGO offer type.
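
The relative error of each prediction can be computed directly from the order upticks printed by the cells above (922 vs 969 for Discount, 505 vs 585 for BOGO). The helper name below is an illustrative choice, not part of the notebook.

```python
def relative_error(real, predicted):
    """Absolute prediction error as a fraction of the real value."""
    return abs(predicted - real) / real

# Order upticks as printed by the notebook cells above
disc_error = relative_error(real=922, predicted=969)
bogo_error = relative_error(real=505, predicted=585)

print(f"Discount error: {disc_error:.1%}")
print(f"BOGO error: {bogo_error:.1%}")
```

This gives roughly 5.1% for Discount and 15.8% for BOGO.
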
Calculating conversion probabilities helps us in other areas as well. We have predicted the return of the different types of offers, but the same probabilities can also help us find out whom to target to maximize the uplift.
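
As a sketch of that targeting idea: score each customer twice, once as if they received the offer and once as if they did not, and rank by the difference. The `toy_score` function below is a made-up stand-in for a fitted model's conversion probability (for example the `predict_proba` output used earlier); the customer records and coefficients are hypothetical.

```python
def toy_score(customer):
    """Hypothetical conversion probability; stands in for a fitted model."""
    base = 0.05 + 0.02 * customer["used_discount"]
    # Assumed effect: past discount users respond more strongly to a discount
    lift = 0.10 if customer["used_discount"] else 0.03
    return base + lift * customer["offer_Discount"]

def predicted_uplift(customer, score=toy_score):
    """Score the same customer with and without the offer and take the difference."""
    with_offer = score({**customer, "offer_Discount": 1})
    without_offer = score({**customer, "offer_Discount": 0})
    return with_offer - without_offer

customers = [
    {"id": "a", "used_discount": 1},
    {"id": "b", "used_discount": 0},
    {"id": "c", "used_discount": 1},
]

# Customers with the largest predicted uplift are the best targets
ranked = sorted(customers, key=predicted_uplift, reverse=True)
print([c["id"] for c in ranked])
```

Ranking by the difference rather than by the raw probability matters: a customer who would convert anyway has a high probability but little uplift, so sending them an offer mostly adds cost.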