<br><br><br><br>
<h4><b>Target Modelling to Maximise Revenue</b></h4>
<br><br>

<pre>  
<b>1. Environment used</b>
    
      M1 MacBook
<br>
<b>2. Notebook Flow</b>
    
      <b>2.1 File read</b>
    
              a. Reading generated training data
              b. Splitting of training data to X and Y
    
      <b>2.2 Exploration of Target Modelling to maximise the revenue </b>
    
              a. Pipeline 1 {Training data + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund
              b. Pipeline 2 {StandardScaler(Training data) + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund
              c. Pipeline 3 {RobustScaler(Training data) + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund

              d. Pipeline 4 {Training data + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund
              e. Pipeline 5 {StandardScaler(Training data) + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund
              f. Pipeline 6 {RobustScaler(Training data) + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund

              g. Selecting the best-performing pipeline
              
              h. Finalized pipeline -----> To estimate revenue from Mutual_Fund
              i. Finalized pipeline -----> To estimate revenue from Credit_Card
              j. Finalized pipeline -----> To estimate revenue from Consumer_Loan

      <b>2.3 Takeaway</b>
              a. Pipeline 5 {StandardScaler(Training data) + PCA + XGBRFRegressor} achieved the lowest MSE of the six pipelines
                 - Its MSE on the training data was ~24 (24.192) for Mutual_Fund revenue
    
      <b>2.4 Prediction (Inference)</b>
              a. Preparing the test dataset
              b. Estimating revenue from Mutual fund, Credit Card and Consumer loan on the test dataset
              c. Estimating total revenue (MF + CC + CL)
      
      <b>2.5 Which clients to target with which offer</b>
             Given constraints
             - At most 100 clients
             - Each client receives only one offer
              
<br>
</pre> 
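Each pipeline in the outline is assembled by hand in the cells below (scaler, then PCA, then regressor). As a hedged sketch, the same chain can be expressed with scikit-learn's `make_pipeline`, which fits every step on the training data and reuses those fitted steps at prediction time. The synthetic data and the `LinearRegression` stand-in (used instead of `XGBRFRegressor` so the example needs only scikit-learn) are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 29 input features and one revenue target (assumption)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 29))
y_train = X_train[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(50, 29))

# Scaler -> PCA (keep 99% of the variance) -> regressor, fitted as one object
pipe = make_pipeline(StandardScaler(), PCA(0.99), LinearRegression())
pipe.fit(X_train, y_train)

# predict() re-applies the scaler and PCA that were fitted on the training data
y_pred = pipe.predict(X_test)
n_kept = pipe.named_steps["pca"].n_components_
print(y_pred.shape, n_kept)
```

Swapping `LinearRegression()` for `XGBRFRegressor(min_child_weight=1)` would, under the same assumptions, express pipeline 5 as a single object, since the xgboost scikit-learn wrappers implement `fit` and `predict`.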

In [1]:
import numpy as np
import pandas as pd
import os

#----- Plotting & Visualization
import matplotlib.pyplot as plt
import seaborn as sb
bold_s = '\033[1m' #----- To print bold font
bold_e = '\033[0m'

#----- Pre-Processing & Feature Engineering
from Data_Analysis.utils import Preprocess, file_read
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.decomposition import PCA

#----- Stats
from scipy.stats import norm

#----- Modelling
from sklearn.linear_model import LinearRegression
from xgboost import XGBRFRegressor

#----- Evaluation Metric
from sklearn.metrics import mean_squared_error


<br><br><br>
<h4><b>2.1 File Read</b></h4>
<pre>
              a. Reading generated training data
              b. Splitting of training data to X and Y
</pre>

In [2]:
#----- a. Reading generated dataset
train_data = pd.read_excel('./train_data.xlsx')

print(f'\n{bold_s}#----- Training dataset Shape: {train_data.shape}{bold_e}')


#----- Training dataset Shape: (949, 36)


In [3]:
#----- b. Splitting of training data to X and Y
target = ['Client', 'Sale_MF', 'Sale_CC', 'Sale_CL', 'Revenue_MF', 'Revenue_CC', 'Revenue_CL'  ]
input_var = list(train_data.columns)[1:-6]

train_y = train_data[target]
train_x = train_data[input_var]

print(f'\n{bold_s}#----- Training dataset X Shape: {train_x.shape}{bold_e}')
print(f'\n{bold_s}#----- Training dataset Y Shape: {train_y.shape}{bold_e}')


#----- Training dataset X Shape: (949, 29)

#----- Training dataset Y Shape: (949, 7)
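The input columns above are selected by position (`[1:-6]`), which silently picks the wrong columns if a column is added or reordered. A hedged alternative selects by name: every column that is neither the client identifier nor a target. The toy feature names below (`Age`, `Tenure`, `Balance`) are hypothetical and only mimic the layout described above (client ID first, six sale/revenue targets last).

```python
import pandas as pd

# Toy frame mimicking the layout: Client, features..., six targets (feature names are hypothetical)
targets = ["Sale_MF", "Sale_CC", "Sale_CL", "Revenue_MF", "Revenue_CC", "Revenue_CL"]
df = pd.DataFrame(columns=["Client", "Age", "Tenure", "Balance"] + targets)

# Positional selection, as in the cell above
by_position = list(df.columns)[1:-6]

# Name-based selection: everything except the ID and the targets
by_name = [c for c in df.columns if c not in ["Client"] + targets]
print(by_position == by_name, by_name)
```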


<br><br><br>
<h4><b>2.2 Exploration of Target Modelling to maximise the revenue</b></h4>
<pre>

              a. Pipeline 1 {Training data + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund
              b. Pipeline 2 {StandardScaler(Training data) + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund
              c. Pipeline 3 {RobustScaler(Training data) + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund

              d. Pipeline 4 {Training data + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund
              e. Pipeline 5 {StandardScaler(Training data) + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund
              f. Pipeline 6 {RobustScaler(Training data) + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund

              g. Selecting the best-performing pipeline
              
              h. Finalized pipeline -----> To estimate revenue from Mutual_Fund
              i. Finalized pipeline -----> To estimate revenue from Credit_Card
              j. Finalized pipeline -----> To estimate revenue from Consumer_Loan

</pre>

In [4]:
#----- Prepping dataset
#----- Applying Scalers

#----- Standard scaler
scaler_std = StandardScaler()
scaled_std_data_x = scaler_std.fit_transform(train_x)

#----- Robust scaler
scaler_robust = RobustScaler()
scaler_robust_data_x = scaler_robust.fit_transform(train_x)
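The two scalers differ only in the centre and spread they use: `StandardScaler` subtracts the column mean and divides by the population standard deviation, while `RobustScaler` (with its default settings) subtracts the median and divides by the interquartile range, which makes it less sensitive to outliers. A small check on made-up values with one extreme point:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature with an outlier (illustrative values)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# StandardScaler: (x - mean) / std, with std computed using ddof=0
std_manual = (x - x.mean()) / x.std()

# RobustScaler defaults: (x - median) / (Q3 - Q1), with quartiles at the 25th and 75th percentiles
q1, q3 = np.percentile(x, [25, 75])
robust_manual = (x - np.median(x)) / (q3 - q1)

std_matches = np.allclose(std_manual, StandardScaler().fit_transform(x))
robust_matches = np.allclose(robust_manual, RobustScaler().fit_transform(x))
print(std_matches, robust_matches, robust_manual.ravel())
```

The outlier stays large after robust scaling, but it no longer inflates the scale applied to the other four values.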


In [5]:
#----- Dimensionality reduction via PCA (keep enough components to explain 99% of the variance)
#----- For plain data
pca = PCA(0.99)
pca_data = pca.fit_transform(train_x)
print(f'\n{bold_s}#----- PCA on training data{bold_e}')
print(f'(#training records, #pca components): {pca_data.shape}')

#----- For StandardScaler applied data
pca_ss = PCA(0.99)
pca_ss_data = pca_ss.fit_transform(scaled_std_data_x)
print(f'\n{bold_s}#----- PCA on StandardScaler applied training data{bold_e}')
print(f'(#training records, #pca components): {pca_ss_data.shape}')


#----- For RobustScaler applied data (fit pca_rs, not the plain-data pca object)
pca_rs = PCA(0.99)
pca_rs_data = pca_rs.fit_transform(scaler_robust_data_x)
print(f'\n{bold_s}#----- PCA on RobustScaler applied training data{bold_e}')
print(f'(#training records, #pca components): {pca_rs_data.shape}')
print(f'\n\n\n')


[1m#----- PCA on training data[0m
(#training records, #pca components): (949, 4)

[1m#----- PCA on StandardScaler applied training data[0m
(#training records, #pca components): (949, 23)

[1m#----- PCA on RobustScaler applied training data[0m
(#training records, #pca components): (949, 2)
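Passing a float such as `0.99` to `PCA` keeps the smallest number of components whose cumulative explained-variance ratio exceeds that fraction. This also helps explain the very different component counts above: on unscaled data, a few large-magnitude columns dominate the variance, so very few components reach 99%, whereas after standardisation every column contributes comparably. A hedged numpy sketch of the rule on synthetic data (one column deliberately on a much larger scale; the helper `n_components_for` is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def n_components_for(X, threshold):
    """Smallest k whose cumulative explained-variance ratio exceeds threshold."""
    Xc = X - X.mean(axis=0)                  # PCA centres the data first
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values
    ratio = s**2 / np.sum(s**2)              # explained-variance ratio per component
    return int(np.searchsorted(np.cumsum(ratio), threshold, side="right") + 1)

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
X[:, 0] *= 1000  # one feature measured on a much larger scale

k_raw = n_components_for(X, 0.99)
k_std = n_components_for(StandardScaler().fit_transform(X), 0.99)
print(k_raw, k_std, PCA(0.99).fit(X).n_components_)
```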






In [6]:
#----- a. Pipeline {Train data + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund
#----- b. Pipeline {StandardScaler(Train data) + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund
#----- c. Pipeline {RobustScaler(Train data) + PCA + Linear Regression} --> To estimate revenue from Mutual_Fund

pipeline_data = [pca_data, pca_ss_data, pca_rs_data]

for ind,data in enumerate(pipeline_data):
    #------ Modelling
    pipeline = LinearRegression().fit(data, train_y['Revenue_MF'])
    pipeline_pred = pipeline.predict(data)

    #----- Training data evaluation
    mse = mean_squared_error(train_y['Revenue_MF'],pipeline_pred )
    print(f'\n{bold_s}#----- Training data\'s MSE on Pipeline - {ind +1} for MF purchases : {mse:.3f}{bold_e}')



#----- Training data's MSE on Pipeline - 1 for MF purchases : 100.679

#----- Training data's MSE on Pipeline - 2 for MF purchases : 98.842

#----- Training data's MSE on Pipeline - 3 for MF purchases : 100.709
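For reference, `mean_squared_error` is the average of the squared residuals, so lower is better and the value is in squared revenue units (an MSE of ~100 corresponds to a root-mean-squared error of ~10 revenue units). A minimal check with made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([10.0, 0.0, 25.0, 5.0])   # illustrative revenues
y_pred = np.array([12.0, 1.0, 20.0, 5.0])

mse_manual = np.mean((y_true - y_pred) ** 2)  # (4 + 1 + 25 + 0) / 4
rmse = np.sqrt(mse_manual)                    # back in revenue units
print(mse_manual, mean_squared_error(y_true, y_pred), rmse)
```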


In [7]:
#----- d. Pipeline {Train data + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund
#----- e. Pipeline {StandardScaler(Train data) + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund
#----- f. Pipeline {RobustScaler(Train data) + PCA + XGBRFRegressor} --> To estimate revenue from Mutual_Fund

pipeline_data = [pca_data, pca_ss_data, pca_rs_data]

#----- Number these pipelines 4 to 6 so they follow the three linear-regression pipelines
for ind, data in enumerate(pipeline_data, start=4):
    #------ Modelling
    pipeline = XGBRFRegressor(min_child_weight=1).fit(data, train_y['Revenue_MF'])
    pipeline_pred = pipeline.predict(data)
    
    #----- Training data evaluation
    mse = mean_squared_error(train_y['Revenue_MF'], pipeline_pred)
    print(f'\n{bold_s}#----- Training data\'s MSE on Pipeline - {ind} for MF purchases : {mse:.3f}{bold_e}')



#----- Training data's MSE on Pipeline - 4 for MF purchases : 32.538

#----- Training data's MSE on Pipeline - 5 for MF purchases : 24.192

#----- Training data's MSE on Pipeline - 6 for MF purchases : 91.058


<br><br><br>
<pre>
<b>#-----  g. Selecting the best-performing pipeline for Mutual fund revenue
</b>
    Based on the training-data MSE values above, Pipeline 5 (the second XGBRFRegressor pipeline): StandardScaler(Training data) + PCA + XGBRFRegressor works best, with an MSE of 24.192
</pre><br><br><br>
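The comparison above scores each model on the same rows it was trained on, which favours models that fit the training set closely. A hedged sketch of a held-out comparison: wrap each candidate in a scikit-learn pipeline and score it with 5-fold cross-validation, so the scaler and PCA are refitted inside each fold. The synthetic data and the two candidates (with `RandomForestRegressor` standing in for `XGBRFRegressor`) are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a non-linear term (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(size=300)

candidates = {
    "StandardScaler + PCA + LinearRegression":
        make_pipeline(StandardScaler(), PCA(0.99), LinearRegression()),
    "StandardScaler + PCA + RandomForest":
        make_pipeline(StandardScaler(), PCA(0.99),
                      RandomForestRegressor(n_estimators=100, random_state=0)),
}

cv_mse = {}
for name, model in candidates.items():
    # scoring returns the negated MSE (higher is better), so flip the sign
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores.mean()
    print(f"{name}: cross-validated MSE = {cv_mse[name]:.3f}")
```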

In [10]:
#----- h. Finalized pipeline -----> To estimate the revenue of MF 

#------ Modelling
fin_pipeline_MF = XGBRFRegressor(min_child_weight = 1).fit(pca_ss_data, train_y['Revenue_MF'])
pipeline_pred = fin_pipeline_MF.predict(pca_ss_data)
    
#----- Training data evaluation
mse = mean_squared_error(train_y['Revenue_MF'],pipeline_pred )
print(f'\n{bold_s}#----- Training data\'s MSE on finalized pipeline of Mutual fund : {mse:.3f}{bold_e}')


#----- Training data's MSE on finalized pipeline of Mutual fund : 24.192


In [11]:
#----- i. Finalized pipeline -----> To estimate the revenue of CC

#------ Modelling
fin_pipeline_CC = XGBRFRegressor(min_child_weight = 1).fit(pca_ss_data, train_y['Revenue_CC'])
pipeline_pred = fin_pipeline_CC.predict(pca_ss_data)
    
#----- Training data evaluation
mse = mean_squared_error(train_y['Revenue_CC'],pipeline_pred )
print(f'\n{bold_s}#----- Training data\'s MSE on finalized pipeline of Credit card : {mse:.3f}{bold_e}')


#----- Training data's MSE on finalized pipeline of Credit card : 28.395


In [12]:
#----- j. Finalized pipeline -----> To estimate the revenue of CL

#------ Modelling
fin_pipeline_CL = XGBRFRegressor(min_child_weight = 1).fit(pca_ss_data, train_y['Revenue_CL'])
pipeline_pred = fin_pipeline_CL.predict(pca_ss_data)
    
#----- Training data evaluation
mse = mean_squared_error(train_y['Revenue_CL'],pipeline_pred )
print(f'\n{bold_s}#----- Training data\'s MSE on finalized pipeline of Consumer Loan : {mse:.3f}{bold_e}')


#----- Training data's MSE on finalized pipeline of Consumer Loan : 24.769
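Cells h to j repeat the same fit-and-score code for each product. A hedged sketch of that step as a loop over the three revenue columns, keeping one fitted model per product in a dictionary; `LinearRegression` and the synthetic frame `y_toy` are stand-ins so the example runs without xgboost.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
X_toy = rng.normal(size=(100, 4))  # stands in for pca_ss_data
y_toy = pd.DataFrame({
    "Revenue_MF": X_toy @ np.array([1.0, 0.0, 0.0, 0.0]),
    "Revenue_CC": X_toy @ np.array([0.0, 2.0, 0.0, 0.0]),
    "Revenue_CL": X_toy @ np.array([0.0, 0.0, 3.0, 0.0]),
})

models, train_mse = {}, {}
for product in ["MF", "CC", "CL"]:
    y = y_toy[f"Revenue_{product}"]
    models[product] = LinearRegression().fit(X_toy, y)
    train_mse[product] = mean_squared_error(y, models[product].predict(X_toy))
    print(f"Training MSE for {product}: {train_mse[product]:.3f}")
```

At prediction time the same dictionary can be reused, for example `{p: m.predict(X_new) for p, m in models.items()}`.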


<br><br><br>
<h4><b>2.4 Prediction (Inference)</b></h4>
<pre>
                  a. Preparing the test dataset
                  b. Estimating revenue from Mutual fund, Credit Card and Consumer loan on the test dataset
                  c. Estimating total revenue (MF + CC + CL)
</pre>

In [39]:
#----- a. Prepping test dataset

test_data = pd.read_excel('./test_data.xlsx')

#-----  Removing Client from test data (kept separately to label the predictions)
client_list = list(test_data['Client'])
test_x = test_data.drop(columns='Client')

print(f'\n{bold_s}#----- Testing dataset X Shape: {test_x.shape}{bold_e}')



#----- Testing dataset X Shape: (635, 29)


In [14]:
#-----  Applying the fitted StandardScaler and PCA to the test data (transform only, no refitting)
scaled_std_data_test_x = scaler_std.transform(test_x)

pca_ss_data_test_x = pca_ss.transform(scaled_std_data_test_x)

print(f'\n{bold_s}#----- PCA on StandardScaler applied test data{bold_e}')
print(f'(#test records, #pca components): {pca_ss_data_test_x.shape}')


#----- PCA on StandardScaler applied test data
(#test records, #pca components): (635, 23)
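The test features are passed through `transform`, not `fit_transform`: the scaler and PCA keep the statistics learned from the training data, so training and test rows land in the same coordinate system that the regressors were fitted in. A small illustration with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[0.0], [10.0]])   # mean 5, std 5
test = np.array([[5.0], [15.0]])

scaler = StandardScaler().fit(train)
test_scaled = scaler.transform(test)                  # uses the training mean and std
refit_scaled = StandardScaler().fit_transform(test)   # would use the test mean and std instead
print(test_scaled.ravel(), refit_scaled.ravel())
```

Refitting on the test set would map the value 15 to 1 rather than 2, shifting every test row relative to what the model saw in training.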


In [24]:
#----- b. Estimating revenue from Mutual fund, Credit Card, Consumer loan on test data set

#----- Predicted revenue from MF
predicted_test_MF = fin_pipeline_MF.predict(pca_ss_data_test_x)
print(f'\n{bold_s}#----- Estimated total revenue from MF : {predicted_test_MF.sum()}{bold_e}')

#----- Predicted revenue from CC
predicted_test_CC = fin_pipeline_CC.predict(pca_ss_data_test_x)
print(f'\n{bold_s}#----- Estimated total revenue from CC : {predicted_test_CC.sum()}{bold_e}')

#----- Predicted revenue from CL
predicted_test_CL = fin_pipeline_CL.predict(pca_ss_data_test_x)
print(f'\n{bold_s}#----- Estimated total revenue from CL : {predicted_test_CL.sum()}{bold_e}\n\n')




#----- Estimated total revenue from MF : 1444.394287109375

#----- Estimated total revenue from CC : 1910.623291015625

#----- Estimated total revenue from CL : 2380.324951171875




In [23]:
#----- c. Estimating total revenue ( from MF + CC + CL)

#----- Predicted total revenue summed over every client and all three products (before applying the constraints in 2.5)
total_revenue = predicted_test_MF.sum() + predicted_test_CC.sum() + predicted_test_CL.sum()
print(f'\n{bold_s}#----- Estimated total revenue (from MF + CC + CL) : {total_revenue}{bold_e}')


#----- Estimated total revenue (from MF + CC + CL) : 5735.3427734375


<br><br><br>
<h4><b>2.5 Which clients to target with which offer</b></h4>
<pre>

             Given constraints
             - At most 100 clients
             - Each client receives only one offer

</pre>
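Because each client can receive at most one offer and clients do not affect one another, the plan that maximises total predicted revenue can be found greedily: take each client's best offer, then keep the clients whose best offer is largest, up to the limit (skipping any whose best predicted revenue is not positive, since adding them cannot raise the total). A hedged numpy sketch with a made-up revenue matrix and a limit of 2 clients instead of 100:

```python
import numpy as np

offers = np.array(["MF", "CC", "CL"])
clients = np.array([101, 102, 103, 104])
# Rows are clients, columns are predicted revenue for MF, CC, CL (illustrative values)
revenue = np.array([
    [5.0, 9.0, 1.0],
    [7.0, 2.0, 3.0],
    [0.5, 0.2, 0.1],
    [4.0, 4.0, 8.5],
])
max_clients = 2  # 100 in the notebook

best_offer = revenue.argmax(axis=1)   # constraint: one offer per client
best_revenue = revenue.max(axis=1)
order = np.argsort(-best_revenue, kind="stable")[:max_clients]  # constraint: at most N clients
order = order[best_revenue[order] > 0]                          # drop non-positive offers

plan = list(zip(clients[order].tolist(), offers[best_offer[order]].tolist()))
total = best_revenue[order].sum()
print(plan, total)
```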
              

In [48]:
#----- Long format: one row per (client, offer) with its predicted revenue
offers = pd.DataFrame({'Client': client_list,
                       'Revenue_MF': predicted_test_MF,
                       'Revenue_CC': predicted_test_CC,
                       'Revenue_CL': predicted_test_CL})

offers = pd.melt(offers,
                 id_vars='Client',
                 value_vars=['Revenue_MF', 'Revenue_CC', 'Revenue_CL'],
                 var_name='Offer',
                 value_name='Revenue')

#----- Sort client-offer pairs by predicted revenue, highest first
optimized = offers.sort_values(by='Revenue', ascending=False)

#----- Keep only each client's highest-revenue offer      #------- To satisfy constraint 2 (one offer per client)
optimized = optimized.drop_duplicates(subset='Client', keep='first')

#----- Keep the top 100 clients                            #------- To satisfy constraint 1 (at most 100 clients)
optimized = optimized.head(100)
display(optimized.head(5))

optimized.to_excel('./Target_client_target_offer.xlsx', index=False)

      Client       Offer     Revenue
1178    1349  Revenue_CC  111.645798
1114      32  Revenue_CC  104.307976
831      748  Revenue_CC   95.975700
1007     467  Revenue_CC   87.140228
531      786  Revenue_MF   84.769348
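An equivalent, more direct pandas formulation (a sketch on a made-up frame, not the notebook's data) picks each client's best offer with `idxmax` and keeps the highest values with `nlargest`; it should select the same clients and offers as the melt, sort and drop-duplicates approach above, apart from how exact ties are broken.

```python
import pandas as pd

# Wide frame: one row per client, one column per offer (illustrative values)
wide = pd.DataFrame({
    "Client": [1, 2, 3],
    "Revenue_MF": [5.0, 7.0, 1.0],
    "Revenue_CC": [9.0, 2.0, 0.5],
    "Revenue_CL": [1.0, 3.0, 0.8],
}).set_index("Client")

top_n = 2  # 100 in the notebook
plan = pd.DataFrame({
    "Offer": wide.idxmax(axis=1),   # best offer per client
    "Revenue": wide.max(axis=1),    # its predicted revenue
}).nlargest(top_n, "Revenue")
print(plan)
```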
