# MMM Channel Attribution script

Import all the libraries used in this notebook.

In [1]:
import pandas as pd
import numpy as np
import os
import operator
import sys 
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Imputer was removed in scikit-learn 0.22 and is not used in this notebook
from sklearn.preprocessing import StandardScaler
from scipy.stats import norm

All input files must be named __ABT_(channel_name).xlsx__, for example __ABT_newspaper.xlsx__.<br>

Place the input files in the same folder as this Jupyter notebook. Any number of channels can be processed, with one input file per channel.

The code below collects every file whose name starts with ABT_, searching this folder and its subfolders.

In [2]:
files = []
path = os.getcwd()
for r, d, f in os.walk(path):
    for file in f:
        if file.startswith('ABT_') and file.endswith('.xlsx'):
            files.append(os.path.join(r, file))

for f in files:
    print(f)

/Users/sk186089/Desktop/MMM/ABT_fb.xlsx
/Users/sk186089/Desktop/MMM/ABT_radio.xlsx
/Users/sk186089/Desktop/MMM/ABT_newspaper.xlsx
/Users/sk186089/Desktop/MMM/ABT_TV.xlsx
/Users/sk186089/Desktop/MMM/ABT_twitter.xlsx
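A minimal sketch of the same file discovery using pathlib, assuming the ABT_(channel_name).xlsx naming convention above. The helper names channel_from_path and find_channel_files are hypothetical, not part of this notebook. Unlike os.walk, this version does not search subfolders, and the ABT_*.xlsx pattern skips Excel lock files such as ~$ABT_fb.xlsx.

```python
from pathlib import Path

PREFIX = "ABT_"  # naming convention: ABT_<channel>.xlsx


def channel_from_path(path):
    """Return the channel name, e.g. '.../ABT_newspaper.xlsx' -> 'newspaper'."""
    stem = Path(path).stem  # file name without the .xlsx extension
    if not stem.startswith(PREFIX):
        raise ValueError(f"{path} does not follow the {PREFIX}<channel>.xlsx convention")
    return stem[len(PREFIX):]


def find_channel_files(folder="."):
    """Map channel name -> file path for every ABT_*.xlsx directly inside `folder`."""
    return {channel_from_path(p): p for p in sorted(Path(folder).glob(PREFIX + "*.xlsx"))}


for channel, path in find_channel_files().items():
    print(channel, path)
```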


---

We define five functions, each training a different machine learning model, plus one function that selects the best model. Every input file is run through all five models, and the model with the highest ROC AUC on the test set is used for the final predictions. The loop over the input files is written directly in a later cell rather than as a function. The functions are:<br>
1. logistic_regression
2. support_vector
3. random_forrest
4. xgboost
5. random_search
6. selecting_best_model
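
The selection idea (fit every model on the same split, keep the one with the highest test-set AUC) can be sketched more compactly with scikit-learn pipelines. This is an illustration on synthetic data from make_classification, not the notebook's own code; pick_best_model is a hypothetical helper, and XGBoost is left out so the sketch needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def pick_best_model(X_train, X_test, y_train, y_test, candidates):
    """Fit each candidate (scaler + classifier); return (best_name, aucs, fitted_models)."""
    aucs, fitted = {}, {}
    for name, clf in candidates.items():
        model = make_pipeline(StandardScaler(), clf)  # scaler is fit on training data only
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]    # probability of the positive class
        aucs[name] = roc_auc_score(y_test, scores)
        fitted[name] = model
    best = max(aucs, key=aucs.get)
    return best, aucs, fitted


# Synthetic stand-in for the features and binary target of one ABT_ file
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
candidates = {
    "LR": LogisticRegression(random_state=0),
    "SVM": SVC(kernel="linear", probability=True, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
best, aucs, fitted = pick_best_model(X_train, X_test, y_train, y_test, candidates)
print(best, aucs)
```

Because each pipeline keeps its own fitted scaler, scoring a full dataset with fitted[best].predict_proba(X) reuses the training-set scaling instead of fitting a new scaler.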


## Logistic Regression

Logistic regression, a simple baseline classification model.<br> 

Following are the steps performed:
- Feature Scaling
- Model Fitting 
- Prediction 
- Model evaluation (ROC AUC)

The function returns the test-set ROC AUC (area under the ROC curve) together with the fitted model.
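The AUC returned here is the area under the ROC curve built by metrics.roc_curve. An equivalent reading: it is the probability that a randomly chosen positive record (app_flg = 1) gets a higher score than a randomly chosen negative one, with ties counting one half. A small NumPy sketch of that pairwise definition (roc_auc is a hypothetical helper name):

```python
import numpy as np


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly (only 0.35 vs 0.4 is not), hence 0.75.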

In [3]:
def logistic_regression(X_train, X_test, y_train, y_test):
    
    # Feature Scaling
    # Fit the scaler on the training data only, then apply it to the test data
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Fitting Logistic Regression to the Training set
    # Importing library for Logistic Regression Model
    from sklearn.linear_model import LogisticRegression
    classifier = LogisticRegression(random_state = 0)
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # calculate the fpr and tpr for all thresholds of the classification
    import sklearn.metrics as metrics
    probs = classifier.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    return metrics.auc(fpr, tpr), classifier

## SVM

Support vector machine (SVM) classifier with a linear kernel.<br> 

Following are the steps performed:
- Feature Scaling
- Model Fitting 
- Prediction 
- Model evaluation (ROC AUC)

The function returns the test-set ROC AUC together with the fitted model.
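Every model function repeats the same scaling step: the mean and standard deviation are learned from X_train only and then reused for X_test, so no test-set information leaks into training. A NumPy sketch of what StandardScaler does; the helper names and the random data are illustrative:

```python
import numpy as np


def fit_standardizer(X_train):
    """Learn per-column mean and standard deviation from the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0        # leave constant columns unscaled instead of dividing by zero
    return mean, std


def transform(X, mean, std):
    return (X - mean) / std


rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(200, 3))
X_test = rng.normal(loc=50, scale=10, size=(40, 3))
mean, std = fit_standardizer(X_train)
X_train_s = transform(X_train, mean, std)
X_test_s = transform(X_test, mean, std)  # test set reuses the training statistics
```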

In [4]:
def support_vector(X_train, X_test, y_train, y_test):  
    
    # Feature Scaling
    # Fit the scaler on the training data only, then apply it to the test data
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    
    # Fitting Support Vector to the Training set
    # Importing library for Support Vector Model
    from sklearn.svm import SVC
    classifier = SVC(kernel = 'linear', random_state = 0, probability=True)
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # calculate the fpr and tpr for all thresholds of the classification
    import sklearn.metrics as metrics
    probs = classifier.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    return metrics.auc(fpr, tpr), classifier

## Random Forest 

Random forest, an ensemble classification model.<br> 

Following are the steps performed:
- Feature Scaling
- Model Fitting 
- Prediction 
- Model evaluation (ROC AUC)

The function returns the test-set ROC AUC together with the fitted model.
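The random_forrest function below grows its trees with criterion = 'entropy': each candidate split is scored by how much it reduces the Shannon entropy of the class labels (the information gain). A small sketch of that quantity for a single split; it ignores details such as sample weights, and the helper names are illustrative:

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted


y = np.array([0, 0, 1, 1])
print(entropy(y))                         # 1.0
print(information_gain(y, y[:2], y[2:]))  # 1.0, a perfect split
```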

In [5]:
def random_forrest(X_train, X_test, y_train, y_test):
    
    # Feature Scaling
    # Fit the scaler on the training data only, then apply it to the test data
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    
    # Fitting Random Forest to the Training set
    # Importing library for Random Forest Model
    from sklearn.ensemble import RandomForestClassifier
    classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # calculate the fpr and tpr for all thresholds of the classification
    import sklearn.metrics as metrics
    probs = classifier.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    return metrics.auc(fpr, tpr), classifier


## XGBoost

XGBoost (gradient boosting), an ensemble classification model.<br> 

Following are the steps performed:
- Feature Scaling
- Model Fitting 
- Prediction 
- Model evaluation (ROC AUC)

The function returns the test-set ROC AUC together with the fitted model.

In [6]:
def xgboost(X_train, X_test, y_train, y_test):
    
    # Feature Scaling
    # Fit the scaler on the training data only, then apply it to the test data
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    
    # Fitting XGBoost to the Training set
    # Importing library for XGBoost Model
    from xgboost import XGBClassifier
    classifier = XGBClassifier()
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # calculate the fpr and tpr for all thresholds of the classification
    import sklearn.metrics as metrics
    probs = classifier.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    return metrics.auc(fpr, tpr), classifier
    

## Using Cross-Validation, Randomized Search and Parameter Tuning 

Random forest whose hyperparameters (n_estimators, max_depth) are tuned by randomized search with stratified 5-fold cross-validation.<br> 

Following are the steps performed:
- Feature Scaling
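- Hyperparameter search space definition
- Model fitting with randomized search and cross-validation
- Prediction 
- Model evaluation (ROC AUC)

The function returns the test-set ROC AUC together with the fitted search object, whose best estimator is refit on the full training set.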
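When every entry in param_distributions is a list, as in param_RF, RandomizedSearchCV samples n_iter = 10 of the 5 × 6 = 30 combinations without replacement, scores each one by mean ROC AUC over the stratified folds, and refits the best one on the training set. A plain-Python sketch of the sampling step only; it will not reproduce scikit-learn's exact draws, and sample_param_combinations is a hypothetical helper:

```python
import itertools
import random

param_RF = {
    "n_estimators": [10, 50, 100, 150, 200],
    "max_depth": [3, 4, 5, 8, 10, 12],
}


def sample_param_combinations(grid, n_iter, seed=42):
    """Draw n_iter distinct combinations from a grid of lists, without replacement."""
    keys = sorted(grid)
    all_combos = [dict(zip(keys, values))
                  for values in itertools.product(*(grid[k] for k in keys))]
    rng = random.Random(seed)
    return rng.sample(all_combos, k=min(n_iter, len(all_combos)))


combos = sample_param_combinations(param_RF, n_iter=10)
for c in combos:
    print(c)
```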

In [7]:
def random_search(X_train, X_test, y_train, y_test):

    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    
    from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV
    from sklearn.ensemble import RandomForestClassifier
    # Number of Folds
    fold = 5
    paramCombinations = 10
    skf = StratifiedKFold(random_state=42,shuffle=True,n_splits=fold)
    param_RF = {
                'n_estimators' : [10,50,100,150,200],
                'max_depth'    : [3, 4, 5, 8, 10, 12]
                }

    
    rf = RandomForestClassifier(random_state=42)

    # Pass the StratifiedKFold object so the folds are built from the data given to fit() (X_train, y_train)
    rand_search = RandomizedSearchCV(rf,
                                     param_distributions=param_RF,
                                     #verbose= 3, #Uncomment if you want to see details of execution
                                     scoring= 'roc_auc',
                                     random_state=42,
                                     n_iter=paramCombinations, 
                                     cv = skf)

    rand_search.fit(X_train, y_train)
    
    # Predicting the Test set results
    y_pred = rand_search.predict(X_test)
    
    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # calculate the fpr and tpr for all thresholds of the classification
    import sklearn.metrics as metrics
    probs = rand_search.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    return metrics.auc(fpr, tpr), rand_search

## Comparing and Selecting Models

This function selects the best-performing model (highest test-set AUC), scores the entire dataset with it, and generates the output used as input to the Markov chain.
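The cutoff logic inside selecting_best_model ranks the app_flg = 1 records by predicted probability (highest first), cuts them into ten equal buckets, and uses the mean probability of bucket 2 (the third decile, ranks 20-30%) as the cutoff. A pandas sketch of that step on made-up probabilities; decile_cutoff is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def decile_cutoff(probabilities, decile=2):
    """Mean probability of one decile after ranking rows from highest to lowest.

    decile=2 is the third bucket (ranks 20-30%), matching df_lift['decile'] == 2.
    """
    ranked = (pd.Series(probabilities, dtype=float)
                .sort_values(ascending=False)
                .reset_index(drop=True))
    row_id = np.arange(len(ranked))
    buckets = (row_id / (len(ranked) / 10)).astype(int)  # 0..9
    return ranked[buckets == decile].mean()


probs = [i / 100 for i in range(100)]  # 0.00, 0.01, ..., 0.99
cutoff = decile_cutoff(probs)
print(cutoff)  # mean of 0.79 ... 0.70, approximately 0.745
```

Records scoring above this cutoff (here 25 of the 100) are then attributed to the channel.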

In [8]:
def selecting_best_model(roc_auc_LR, roc_auc_RF, roc_auc_SVM, roc_auc_XGB, roc_auc_RS, classifier_LR, classifier_SVM, classifier_RF, classifier_XGB, rand_search, channel):
    
    # Creating Dictionary in which Key will be Model Synonym and Value will be AUC number.
    AUC_DICT = {'LR':roc_auc_LR, 
                'RF':roc_auc_RF, 
                'SVM':roc_auc_SVM,
                'XGB':roc_auc_XGB,
                'RS':roc_auc_RS}

    # Creating Dictionary in which Key will be Model Synonym and Value will be Models.
    classifier_DICT = {'LR':classifier_LR,
                       'SVM':classifier_SVM,
                       'RF':classifier_RF,
                       'XGB':classifier_XGB,
                       'RS':rand_search}
    
    # Select the key with the maximum AUC, i.e. the best-performing model
    max_auc = max(AUC_DICT.items(), key=operator.itemgetter(1))[0]
    
    
    # Score the entire dataset with the selected model; X_final (all rows, scaled) and df
    # are global variables set in the loop over the input files
    df['Probablity'] = classifier_DICT[max_auc].predict_proba(X_final)[:, 1]
    
    # For the lift calculation, keep only records whose application flag (app_flg) is 1;
    # columns: cust_no, app_flg, Probablity
    df_lift = df.iloc[:,[0,-2,-1]][df['app_flg'] == 1]
    # Sort the filtered dataset by probability in descending order
    df_lift = df_lift.sort_values(by='Probablity',ascending=False)
    # Generate a row id column used to group the rows
    df_lift['row_id'] = range(len(df_lift))
    # Bucket the rows into deciles (10% of the rows each), numbered 0-9
    df_lift['decile'] = (df_lift['row_id']/(len(df_lift)/10)).astype(int)
    # Cutoff = mean probability of decile 2, i.e. the third decile (ranks 20-30%)
    cutoff = df_lift.loc[df_lift['decile'] == 2, 'Probablity'].mean()
    # Create the final dataset (cust_no and Probablity for app_flg == 1)
    df_final = df.iloc[:,[0,-1]][df['app_flg'] == 1].copy()
    
    # Helper: return the channel name if the row's probability exceeds the cutoff, otherwise 0
    def f(row):
        if row['Probablity'] > cutoff:
            val = channel
        else:
            val = 0
        return val
    df_final['conversion'] = df_final.apply(f, axis=1)
    df_final['channel'] = channel
    df_final['cutoff'] = cutoff
    return df_final

## Running for all the files

Now loop over all the available files, run each one through the functions defined above, and append each channel's result to a final dataframe.

In [9]:
df_markov = pd.DataFrame()
for f in files:
    
    # Channel name = file name without the 'ABT_' prefix and the '.xlsx' extension
    channel = os.path.splitext(os.path.basename(f))[0][len('ABT_'):]
    print('Channel: ',channel)
    print('File Name: ',f)
    df = pd.read_excel(f)
    X  = df.iloc[:,1:-1].values  # features: all columns except the first (cust_no) and the last (app_flg)
    y  = df.iloc[:,-1].values    # target: app_flg
    
    # Split the dataset into train and test sets, then split a cross-validation set off the training set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
    X_train, X_cv, y_train, y_cv     = train_test_split(X_train, y_train, test_size=0.10, random_state=15)

    # Standardize the full feature matrix for final scoring. Note: this scaler is fit on all rows,
    # separately from the training-set scalers used inside each model function.
    from sklearn.preprocessing import StandardScaler
    sc_X = StandardScaler()
    X_final = sc_X.fit_transform(X)
    
    # Run all the models on this file
    roc_auc_LR,  classifier_LR  = logistic_regression(X_train, X_test, y_train, y_test)
    roc_auc_SVM, classifier_SVM = support_vector(X_train, X_test, y_train, y_test)
    roc_auc_RF,  classifier_RF  = random_forrest(X_train, X_test, y_train, y_test)
    roc_auc_XGB, classifier_XGB = xgboost(X_train, X_test, y_train, y_test)
    roc_auc_RS,  rand_search    = random_search(X_train, X_test, y_train, y_test)
    
    df_final = selecting_best_model(roc_auc_LR, roc_auc_RF, roc_auc_SVM, roc_auc_XGB, roc_auc_RS, classifier_LR, classifier_SVM, classifier_RF, classifier_XGB, rand_search, channel)
    # Append this channel's rows to the final dataset for the Markov chain
    df_markov = pd.concat([df_markov, df_final])

Channel:  fb
File Name:  /Users/sk186089/Desktop/MMM/ABT_fb.xlsx


Channel:  radio
File Name:  /Users/sk186089/Desktop/MMM/ABT_radio.xlsx


Channel:  newspaper
File Name:  /Users/sk186089/Desktop/MMM/ABT_newspaper.xlsx


Channel:  TV
File Name:  /Users/sk186089/Desktop/MMM/ABT_TV.xlsx


Channel:  twitter
File Name:  /Users/sk186089/Desktop/MMM/ABT_twitter.xlsx


In [10]:
df.head()

| | cust_no | cust_seg | tenure | gender | age | total_rev_Amt | prob_exp_mon1 | prob_exp_mon2 | prob_exp_mon3 | prob_exp_mon4 | prob_exp_mon5 | prob_exp_mon6 | prob_exp_mon7 | prob_exp_mon8 | prob_exp_mon9 | prob_exp_mon10 | prob_exp_mon11 | prob_exp_mon12 | app_flg | Probablity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8791958 | 1 | 40 | 0 | 50 | 4281552 | 0.262 | 0.572 | 0.186 | 0.92 | 0.571 | 0.972 | 0.326 | 0.181 | 0.779 | 0.027 | 0.102 | 0.962 | 1 | 0.4963 |
| 1 | 8770539 | 1 | 56 | 1 | 84 | 4551934 | 0.542 | 0.696 | 0.522 | 0.624 | 0.371 | 0.59 | 0.556 | 0.244 | 0.841 | 0.101 | 0.623 | 0.366 | 0 | 0.464562 |
| 2 | 1040153 | 4 | 37 | 1 | 50 | 896681 | 0.234 | 0.682 | 0.792 | 0.366 | 0.043 | 0.842 | 0.947 | 0.665 | 0.317 | 0.668 | 0.981 | 0.257 | 0 | 0.479115 |
| 3 | 2561218 | 2 | 10 | 0 | 83 | 3693221 | 0.261 | 0.924 | 0.046 | 0.026 | 0.605 | 0.952 | 0.219 | 0.0 | 0.056 | 0.636 | 0.06 | 0.753 | 0 | 0.492903 |
| 4 | 4973530 | 3 | 58 | 1 | 35 | 1924669 | 0.543 | 0.628 | 0.681 | 0.515 | 0.218 | 0.805 | 0.816 | 0.02 | 0.046 | 0.647 | 0.575 | 0.965 | 0 | 0.465116 |


Create the Markov dataset. After some preprocessing, this dataset serves as the input to the Markov chain R script.

In [11]:
# Replace channel names with the short codes used in the Markov chain paths ('TV' is kept as-is)
df_markov["conversion"] = df_markov["conversion"].replace({'fb': 'FB', 'radio': 'R', 'newspaper': 'NP', 'twitter': 'Tw'})

# Save to Excel and read back columns 1-5 (cust_no, Probablity, conversion, channel, cutoff), dropping the old index
df_markov.to_excel('markov.xlsx')
df_markov = pd.read_excel('markov.xlsx', usecols=list(range(1, 6)))
# Unique customer numbers to loop over when building paths
cl = list(df_markov['cust_no'].unique())

## Markov Chain input file creation

In [12]:
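def concatenate_list_data(items):
    # Join the items into one string, each followed by '>' (e.g. ['FB', 'TV'] -> 'FB>TV>');
    # the trailing '>' is stripped later
    return ''.join(str(element) + '>' for element in items)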

In [13]:
# Dictionary mapping each customer number to the path of channels attributed to that customer
d = {}
# 'i' is the customer number
for i in cl:
    # Channels whose probability exceeded the cutoff for this customer
    temp = df_markov.loc[(df_markov['cust_no'] == i) & (df_markov['conversion'] != 0), 'conversion']
    if temp.empty:
        continue  # customer not attributed to any channel
    # Channels are ordered alphabetically (as a pivot on 'conversion' orders them), not by time
    d[i] = concatenate_list_data(sorted(temp.astype(str).unique()))

In [14]:
df_markov_final = pd.DataFrame(list(d.items()))
df_markov_final.columns = ['Customer','Path']
df_markov_final['Path'] = df_markov_final['Path'].str[:-1]
#df_markov_final.to_excel('Final_Markov.xlsx')
temp = df_markov_final.Path.value_counts()
df_Markov_R_input = temp.to_frame()
df_Markov_R_input = df_Markov_R_input.reset_index()
df_Markov_R_input.columns = ['Path','Conversion']
# Write the path counts to CSV; this file is the input for the R script
df_Markov_R_input.to_csv('Final_Path_count.csv', index= False)
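The loop above filters the whole dataframe once per customer, which can get slow as the number of customers grows. A vectorized groupby sketch that builds the same Path/Conversion table on a toy example; path_counts is a hypothetical helper and, like the loop, it orders channels alphabetically within a path rather than by time:

```python
import pandas as pd


def path_counts(df_markov, sep=">"):
    """Build the Path/Conversion table for the R script from (cust_no, conversion) rows."""
    hits = df_markov[df_markov["conversion"] != 0]  # keep channels above their cutoff
    paths = (hits.groupby("cust_no")["conversion"]
                 .apply(lambda s: sep.join(sorted(s.astype(str).unique()))))
    counts = paths.value_counts()  # number of customers per distinct path
    return pd.DataFrame({"Path": counts.index, "Conversion": counts.to_numpy()})


example = pd.DataFrame({
    "cust_no":    [1, 1, 2, 2, 3, 3],
    "conversion": ["FB", "TV", "TV", "FB", 0, "R"],
})
print(path_counts(example))
```

Customers 1 and 2 both end up with the path FB>TV, and customer 3 with R.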

---

## R script for Markov Chain

For now, the R script below has to be run separately, outside this notebook.

Its input is the Final_Path_count.csv file created in the previous step.

In [None]:
# install.packages("ChannelAttribution")
# library(ChannelAttribution)
# 
# setwd('')  # set this to the folder that contains Final_Path_count.csv
# df <- read.csv('Final_Path_count.csv')
# df <- df[c(1,2)]
# df[2]
# 
# M <- markov_model(df, 'Path', var_value = 'Conversion', var_conv = 'Conversion', sep = '>', order=1, out_more = TRUE)
# 
# write.csv(M$result, file = "Markov - Output - Conversion values.csv", row.names=FALSE)
# write.csv(M$transition_matrix, file = "Markov - Output - Transition matrix.csv", row.names=FALSE)
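markov_model credits each channel by its removal effect: how much the probability of reaching a conversion drops when that channel is removed from the first-order transition graph. A NumPy sketch of that idea, useful for sanity-checking the R output from Python. Assumptions: every path in Final_Path_count.csv ends in a conversion (this notebook writes no null paths), a removed channel is treated as a dead end, and all helper names are hypothetical. The package may compute these quantities differently (for example by simulation), so its numbers can differ slightly.

```python
from collections import defaultdict

import numpy as np

START, CONV = "(start)", "(conversion)"


def transition_counts(paths, conversions, sep=">"):
    """Count first-order transitions, weighting each path by its number of conversions."""
    counts = defaultdict(float)
    for path, n in zip(paths, conversions):
        states = [START] + path.split(sep) + [CONV]
        for a, b in zip(states[:-1], states[1:]):
            counts[(a, b)] += n
    return counts


def conversion_probability(counts, removed=None):
    """P(reach CONV from START); the `removed` channel gets no outgoing transitions."""
    states = sorted({s for pair in counts for s in pair} - {CONV})
    idx = {s: i for i, s in enumerate(states)}
    out_total = defaultdict(float)
    for (a, _), n in counts.items():
        out_total[a] += n
    Q = np.zeros((len(states), len(states)))  # transient -> transient probabilities
    r = np.zeros(len(states))                 # transient -> conversion probabilities
    for (a, b), n in counts.items():
        if a == removed:
            continue  # dead end: anything entering the removed channel never converts
        p = n / out_total[a]
        if b == CONV:
            r[idx[a]] += p
        else:
            Q[idx[a], idx[b]] += p
    # Absorption probabilities x solve (I - Q) x = r
    x = np.linalg.solve(np.eye(len(states)) - Q, r)
    return x[idx[START]]


def markov_attribution(paths, conversions, sep=">"):
    """Split total conversions across channels in proportion to their removal effects."""
    counts = transition_counts(paths, conversions, sep)
    base = conversion_probability(counts)
    channels = sorted({s for pair in counts for s in pair} - {START, CONV})
    effects = {c: 1 - conversion_probability(counts, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    total_conv = sum(conversions)
    return {c: total_conv * e / total_effect for c, e in effects.items()}


print(markov_attribution(["A>B", "A"], [1, 1]))  # A about 1.333, B about 0.667
```

In the toy example, removing A blocks every conversion (removal effect 1), while removing B blocks only the A>B route (removal effect 0.5), so the two conversions are split 2:1. On the real file, markov_attribution(df['Path'], df['Conversion']) with df read by pandas.read_csv('Final_Path_count.csv') gives per-channel credit to compare with M$result.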

---

## S-curve Calculation

In [15]:
import statsmodels.formula.api as sm

Read the __s_curves_All.xlsx__ file. The data is split across several tabs (one per channel), so each tab is read into its own dataframe, and the S-curve code then loops over the dataframes, processing each one separately.

In [16]:
xls = pd.ExcelFile('s_curves_All.xlsx')
# One tab per channel
dataframes = [pd.read_excel(xls, sheet) for sheet in ['TV', 'FB', 'Tw', 'NP', 'R']]

In [17]:
for indatafr in dataframes:
    # Channel name, taken from the 'channel' column of this tab
    name = str(indatafr['channel'].unique()[0])
    # Normalize column names: strip surrounding whitespace and lower-case
    indatafr = indatafr.rename(columns=lambda x: x.strip().lower())
    # Regress the application count on spend in the three spend buckets
    lm6 = sm.ols(formula='app_cnt ~ spend_0_50k + spend_50k_100k + spend_100k', data=indatafr).fit()
    
    # Fitted parameters: a[0] is the intercept, a[1:4] are the spend-bucket coefficients
    a = lm6.params.to_numpy().copy()
    
    # Make the bucket coefficients non-decreasing: if a bucket's coefficient is not larger
    # than the previous one, replace it with the previous value (a running maximum)
    for j in range(1, len(a) - 1):
        if a[j + 1] <= a[j]:
            a[j + 1] = a[j]
    df_final = pd.DataFrame({'Spend': [50000, 100000, 150000], name: a[1:4]})
    # Write the result to Excel; this file is used later to create the S-curve graph
    df_final.to_excel('Scurve_Output_' + name + '.xlsx')
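The S-curve cell fits app_cnt on three spend-bucket columns with ordinary least squares and then forces the bucket coefficients to be non-decreasing. A NumPy-only sketch of both steps on hypothetical, noise-free data (spend expressed in thousands is an assumption, chosen to keep the example well-conditioned); the helper names are illustrative:

```python
import numpy as np


def fit_bucket_coefficients(spend_buckets, app_cnt):
    """OLS of app_cnt on an intercept plus one column per spend bucket."""
    X = np.column_stack([np.ones(len(app_cnt)), spend_buckets])
    coef, *_ = np.linalg.lstsq(X, app_cnt, rcond=None)
    return coef  # coef[0] = intercept, coef[1:] = bucket coefficients


def make_non_decreasing(bucket_coef):
    """Running maximum: same effect as the a[j + 1] = a[j] loop in the S-curve cell."""
    return np.maximum.accumulate(np.asarray(bucket_coef, dtype=float))


# Hypothetical data: 3 spend buckets (in thousands) with true coefficients 3.0, 2.0, 5.0
rng = np.random.default_rng(1)
spend = rng.uniform(0, 50, size=(30, 3))
app_cnt = 10.0 + spend @ np.array([3.0, 2.0, 5.0])
coef = fit_bucket_coefficients(spend, app_cnt)
print(np.round(coef, 6))              # approximately [10, 3, 2, 5]
print(make_non_decreasing(coef[1:]))  # approximately [3, 3, 5]
```

The second bucket's fitted coefficient (2.0) is below the first (3.0), so it is raised to 3.0; the third (5.0) is left unchanged.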