## Credit Card Fraud Detection Using Predictive Machine Learning Models ##


<img src= "https://ai-journey.com/wp-content/uploads/2019/06/fraud-EMV-chip-credit-card.jpg" alt ="Credit Card Fraud Detection" style='width: 600px;'>

## Introduction ##

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with **492 frauds** out of **284,807** transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for **0.172% of all transactions**.
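
To make the imbalance concrete, here is a small sketch that recomputes the class share from the two counts quoted above. The variable names are illustrative and not part of the notebook.

```python
# Counts quoted in the introduction above.
N_FRAUD = 492
N_TOTAL = 284_807

# Share of the positive class, in percent.
fraud_rate_pct = 100 * N_FRAUD / N_TOTAL

# How many legitimate transactions there are for every fraudulent one.
legit_per_fraud = (N_TOTAL - N_FRAUD) / N_FRAUD

print(f"Fraud rate: {fraud_rate_pct:.4f}%")
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")
```

With several hundred legitimate transactions for each fraud, a model can look accurate while missing most of the fraud, which is why the later sections look beyond accuracy.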

It contains only numerical input variables which are the result of a PCA transformation.

Due to **confidentiality** issues, the original features and further background information about the data are not provided.

**Features V1, V2, ... V28 are the principal components obtained with PCA.**
The only features that have not been transformed with PCA are Time and Amount. Time contains the seconds elapsed between each transaction and the first transaction in the dataset. Amount is the transaction amount; this feature can be used, for example, for example-dependent **cost-sensitive learning**.

Feature Class is the response variable and it takes value 1 in case of fraud and 0 otherwise.

# Load packages #

In [None]:
import pandas as pd 
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

import warnings
warnings.filterwarnings('ignore')

import gc
from datetime import datetime 
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier 
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from catboost import CatBoostClassifier
from sklearn import svm
import lightgbm as lgb
from lightgbm import LGBMClassifier
import xgboost as xgb

pd.set_option('display.max_columns', 100)


RFC_METRIC = 'gini'  #metric used for RandomForestClassifier
NUM_ESTIMATORS = 100 #number of estimators used for RandomForestClassifier
NO_JOBS = 4 #number of parallel jobs used for RandomForestClassifier


#TRAIN/VALIDATION/TEST SPLIT
#VALIDATION
VALID_SIZE = 0.20 # simple validation using train_test_split
TEST_SIZE = 0.20 # test size using_train_test_split

#CROSS-VALIDATION
NUMBER_KFOLDS = 5 #number of KFolds for cross-validation



RANDOM_STATE = 2018

MAX_ROUNDS = 1000 #lgb iterations
EARLY_STOP = 50 #lgb early stop 
OPT_ROUNDS = 1000  #To be adjusted based on best validation rounds
VERBOSE_EVAL = 50 #Print out metric result

IS_LOCAL = False

import os

PATH = "/kaggle/input/creditcardfraud/creditcard.csv"

# Read the Data #

In [None]:
data = pd.read_csv(PATH)

# Check the Data #

In [None]:
print("Credit Card Fraud Detection data -  rows:",data.shape[0]," columns:", data.shape[1])

# Glimpse of the data #

In [None]:
data.head()

In [None]:
data.describe()

Looking at the Time feature, we can confirm that the data contains **284,807** transactions recorded **over 2 consecutive days (172,792 seconds).**

# Check missing data #

In [None]:
total = data.isnull().sum().sort_values(ascending = False)
percent = (data.isnull().sum()/data.isnull().count()*100).sort_values(ascending = False)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent']).transpose()

### Wow - No Missing Data !!!!! ###

# Check for Data Imbalance #

In [None]:
temp = data["Class"].value_counts()
df = pd.DataFrame({'Class': temp.index,'values': temp.values})

trace = go.Bar(
    x = df['Class'],y = df['values'],
    name="Credit Card Fraud Class - data unbalance (Not fraud = 0, Fraud = 1)",
    marker=dict(color="Blue"),
    text=df['values']
)
temp_data = [trace]
layout = dict(title = 'Credit Card Fraud Class - data unbalance (Not fraud = 0, Fraud = 1)',
          xaxis = dict(title = 'Class', showticklabels=True), 
          yaxis = dict(title = 'Number of transactions'),
          hovermode = 'closest',width=600
         )
fig = dict(data=temp_data, layout=layout)
iplot(fig, filename='class')

**Only 492 (or 0.172%) of the transactions are fraudulent.** That means the data is highly unbalanced with respect to the target variable Class.

### Data Exploration ###

In [None]:
class_0 = data.loc[data['Class'] == 0]["Time"]
class_1 = data.loc[data['Class'] == 1]["Time"]

hist_data = [class_0, class_1]
group_labels = ['Not Fraud', 'Fraud']

fig = ff.create_distplot(hist_data, group_labels, show_hist=False, show_rug=False)
fig['layout'].update(title='Credit Card Transactions Time Density Plot', xaxis=dict(title='Time [s]'))
iplot(fig, filename='dist_only')

## Insights ##

1) Legitimate (non-fraud) transactions drop sharply during the night

2) Fraudulent transactions are more evenly distributed over time

3) Fraudulent transactions continue at a steady rate through the night
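
The Time feature only stores seconds elapsed since the first transaction, so reading "night time" off the density plot needs a conversion to an hour bucket. The sketch below shows one way to do it. The helper name `hour_of_day` is hypothetical, and it assumes (the dataset does not confirm this) that the first transaction happened close to midnight.

```python
import numpy as np

SECONDS_PER_HOUR = 3600

def hour_of_day(time_seconds):
    """Map seconds since the first transaction to an hour bucket in 0..23.

    Assumption: the first transaction is close to midnight, so bucket 0
    corresponds to 00:00-00:59 on either day.
    """
    elapsed_hours = np.asarray(time_seconds) // SECONDS_PER_HOUR
    return (elapsed_hours % 24).astype(int)

# Illustrative values: start of day 1, just before and after the first hour,
# start of day 2, and the last second recorded in the dataset.
sample_times = [0, 3599, 3600, 86400, 172792]
hours = hour_of_day(sample_times)
print(hours)
```

Applied to the notebook's data this could be used as `data['Hour'] = hour_of_day(data['Time'])`, after which fraud counts per hour can be compared directly.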

## Transactions amount ##

In [None]:
fig, ax1 = plt.subplots(ncols=1, figsize=(6,6))
s = sns.boxplot(ax = ax1, x="Class", y="Amount", hue="Class",data=data, palette="PRGn",showfliers=False)
plt.show();

In [None]:
plt.hist(data["Amount"], bins=20)
plt.gca().set(title='Frequency Histogram', ylabel='Frequency');

### Data is highly skewed and has a long tail towards the right side 

In [None]:
plt.hist(np.log(data["Amount"] +1), bins=50)
plt.gca().set(title='Frequency Histogram', ylabel='Frequency');

### Log transformation - reduces the skewness 
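
A short sketch of the transform used above and its inverse. `np.log(x + 1)` is the same as `np.log1p(x)`, and the matching inverse is `np.expm1`, not `np.exp`. The amounts below are made-up values for illustration only.

```python
import numpy as np

# Illustrative amounts, including 0, which plain np.log could not handle.
amounts = np.array([0.0, 1.0, 99.0, 2500.0])

log_amounts = np.log1p(amounts)       # equivalent to np.log(amounts + 1)
recovered = np.expm1(log_amounts)     # exact inverse: exp(x) - 1
naive = np.exp(log_amounts)           # forgets the "- 1", so every value is 1 too high

print(recovered)
print(naive - amounts)
```

This matters whenever transformed amounts are converted back to money, for example when summing the value of detected fraud.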

## Features correlation ##

In [None]:
plt.figure(figsize = (14,14))
plt.title('Credit Card Transactions features correlation plot (Pearson)')
corr = data.corr()
sns.heatmap(corr,xticklabels=corr.columns,yticklabels=corr.columns,linewidths=.1,cmap="Reds")
plt.show()

### Since V1 to V28 are principal components (the output of PCA), they are **uncorrelated** with one another ###

### Note: if two features are highly correlated, keep one and drop the other based on statistical significance.

### One option is to compute the Information Value (IV) of both features and keep the one with the higher IV
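
A minimal sketch of one common way to compute Information Value for a numeric feature against a binary target: bin the feature, compare the share of events and non-events in each bin, and sum `(pct_non_event - pct_event) * ln(pct_non_event / pct_event)`. The function name, the quantile binning, and the 0.5 smoothing constant are assumptions chosen for illustration; they are not taken from this notebook.

```python
import numpy as np
import pandas as pd

def information_value(feature, target, bins=10, eps=0.5):
    """Information Value of a numeric feature for a binary target (1 = event).

    eps is a small count added to every bin so that empty cells do not
    produce log(0); this smoothing is an illustrative choice.
    """
    df = pd.DataFrame({"x": np.asarray(feature), "y": np.asarray(target)})
    df["bin"] = pd.qcut(df["x"], q=bins, duplicates="drop")
    grouped = df.groupby("bin", observed=True)["y"].agg(["sum", "count"])

    events = grouped["sum"] + eps
    non_events = grouped["count"] - grouped["sum"] + eps
    pct_event = events / events.sum()
    pct_non_event = non_events / non_events.sum()

    woe = np.log(pct_non_event / pct_event)   # weight of evidence per bin
    return float(((pct_non_event - pct_event) * woe).sum())

# Synthetic demo: one feature that tracks the target and one that is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
strong_feature = y + rng.normal(0, 0.5, size=2000)
noise_feature = rng.normal(size=2000)

iv_strong = information_value(strong_feature, y)
iv_noise = information_value(noise_feature, y)
print(iv_strong, iv_noise)
```

Between two highly correlated features, the one with the higher IV would be kept under this rule.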

In [None]:
orig_data = data.copy()

data["Amount"] = np.log(data["Amount"] + 1)

# Split data into train, test and validation sets #

In [None]:
train_df, test_df = train_test_split(data, test_size=TEST_SIZE, random_state=RANDOM_STATE, shuffle=True )
train_df, valid_df = train_test_split(train_df, test_size=VALID_SIZE, random_state=RANDOM_STATE, shuffle=True )

In [None]:
print(train_df.shape)
print(test_df.shape)
print(valid_df.shape)
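
Because the validation set is carved out of what remains after the test split, the two 0.20 settings do not give a 60/20/20 split. The sketch below works out the effective fractions from the `TEST_SIZE` and `VALID_SIZE` constants defined earlier; the other variable names are illustrative.

```python
TEST_SIZE = 0.20    # fraction of all rows held out for testing
VALID_SIZE = 0.20   # fraction of the remaining rows held out for validation

test_frac = TEST_SIZE
valid_frac = (1 - TEST_SIZE) * VALID_SIZE
train_frac = (1 - TEST_SIZE) * (1 - VALID_SIZE)

print(f"train: {train_frac:.2f}, valid: {valid_frac:.2f}, test: {test_frac:.2f}")
```

So the models are trained on roughly 64% of the rows before any balancing is applied.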

### Balancing the train data

In [None]:
do_balancing = True

if do_balancing :

    # Undersample the non-fraud rows so that the event (fraud) rate becomes 1%

    train_fraud_df  = train_df[train_df['Class'] ==1]
    no_of_fraud = train_fraud_df.shape[0]
    print("Total Fraud in Train Data :" ,no_of_fraud)

    no_of_non_fraud = no_of_fraud * 99
    train_non_fraud_df = train_df[train_df['Class'] ==0].sample( no_of_non_fraud , random_state =2021)
    no_of_non_fraud = train_non_fraud_df.shape[0]
    print("Total non Fraud in Train Data :" ,no_of_non_fraud)

    # join the data 

    train_df = pd.concat([train_fraud_df, train_non_fraud_df] , axis =0 ) # concat  row wise
    train_df = train_df.sample(frac = 1, random_state=RANDOM_STATE)  # shuffle rows reproducibly
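
The same undersampling idea can be written as a reusable helper. This is a sketch on a toy frame, not the notebook's code: the function name `undersample_to_event_rate` and the toy data are hypothetical. Keeping all events and sampling `n_events * (1 - rate) / rate` non-events gives the requested event rate, which for 1% is the factor of 99 used above.

```python
import pandas as pd

def undersample_to_event_rate(df, target, event_rate, seed=0):
    """Keep every event row and sample non-event rows to reach event_rate."""
    events = df[df[target] == 1]
    non_events = df[df[target] == 0]
    # Number of non-events needed so that events / (events + non_events) == event_rate.
    n_keep = round(len(events) * (1 - event_rate) / event_rate)
    n_keep = min(n_keep, len(non_events))   # cannot sample more rows than exist
    sampled = non_events.sample(n=n_keep, random_state=seed)
    # Concatenate and shuffle so the events are not grouped at the top.
    return pd.concat([events, sampled]).sample(frac=1, random_state=seed)

# Toy data: 5 fraud rows among 2005 rows.
toy = pd.DataFrame({"Class": [1] * 5 + [0] * 2000, "Amount": range(2005)})
balanced = undersample_to_event_rate(toy, "Class", event_rate=0.01)
print(balanced["Class"].value_counts())
```

Only the training data should be rebalanced this way; validation and test sets keep the original class mix so the reported metrics reflect real conditions.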


In [None]:
target = 'Class'
predictors = ['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',\
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19',\
       'V20', 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28',\
       'Amount']


### Simple Model  - Decision Tree ###

In [None]:
clf = DecisionTreeClassifier(random_state=RANDOM_STATE,
                             max_depth= 2)

In [None]:
%%time
clf.fit(train_df[predictors], train_df[target].values)

### Plot the tree ###

In [None]:

from sklearn import tree
from sklearn.tree import export_graphviz

tree.export_graphviz(clf,out_file='tree.dot',feature_names = predictors,
class_names = ['Non-Fraud' ,'Fraud'],rounded = True, proportion = False, precision = 2, filled = True)  

!dot -Tpng tree.dot -o tree.png
from IPython.display import Image
Image(filename = 'tree.png')

In [None]:
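def model_function(row):
    # Hand-coded rule based on the depth-2 decision tree plotted above:
    # flag a transaction as fraud (1) when V17 or V14 falls below its cut-off
    if row['V17'] <= -2.04:
        return 1
    elif row['V14'] <= -4.64:
        return 1
    else :
        return 0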
    


In [None]:
valid_df['prediction'] = valid_df.apply(model_function , axis = 1)

In [None]:
valid_df['prediction'].value_counts()

## Check the performance ##

In [None]:
cm = pd.crosstab(valid_df[target].values, valid_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

### 1) 45447 - True Negatives --> non-fraud transactions that the model predicted as non-fraud

### 2) 71 - True Positives --> fraud transactions that the model predicted as fraud

### 3) 31 - False Negatives --> fraud transactions that the model failed to flag as fraud

### 4) 20 - False Positives --> non-fraud transactions that the model wrongly flagged as fraud
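
The four counts above are enough to derive the usual classification metrics by hand. A small worked sketch, using only the numbers from this confusion matrix (variable names are illustrative):

```python
# Counts read off the validation confusion matrix above.
TN, FP, FN, TP = 45447, 20, 31, 71

recall = TP / (TP + FN)                  # share of fraud that was caught (detection rate)
precision = TP / (TP + FP)               # share of fraud alerts that were real fraud
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = FP / (FP + TN)     # share of legitimate transactions wrongly flagged

print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
print(f"false positive rate={false_positive_rate:.5f}")
```

The recall of 71 out of 102 is where the "about 70% of fraud detected" figure comes from.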


## Amazing ... isn't it!!!! ##

### With a simple Decision Tree we were able to detect about 70% of the fraud (71 of 102 cases) ###

### We know exactly what the model is doing internally ###

### Implementing this model is very cost-effective, as the rule can be coded in any existing language such as Java, C# or even on a mainframe ###

<img src= "http://static.financialexpress.com/m-images/M_Id_459416_Savings.jpg" alt ="Fraud Loss Saved" style='width: 600px;'>

### Wait  !!!!!!! ###

### Business wants to know how much money is saved ###

In [None]:
metric_data = pd.DataFrame(columns =['Model Name','Detection Rate' ,'AUROC','F1 Score','Accuracy','Fraud Loss Saved'])
metric_data.shape

In [None]:
# Amount was log-transformed for modelling, so convert it back to the original scale here
from sklearn.metrics import accuracy_score , f1_score ,roc_auc_score
def fraud_loss_saved ( dataset , key) :

    df = dataset.copy()
    df['Amount']  = np.expm1(df['Amount'])  # inverse of np.log(Amount + 1)
    total_fraud_amt = df[df['Class'] ==1]['Amount'].sum()
    print("Total Fraud Amount in Validation Data : " +  str(round(total_fraud_amt,2)))
    total_fraud_amt_detected = df.loc[(df['prediction'] ==1) & (df['Class']==1) ]['Amount'].sum()
    print("Total Fraud Amount Detected in Validation Data : " +  str(round(total_fraud_amt_detected,2)))
    print("Fraud Loss Saved (%): " + str(round(100*total_fraud_amt_detected/total_fraud_amt ,2)))
    detection_rate  = 100 * (df[df['prediction']==1]['Class'].sum())/df['Class'].sum()
    print("Detection Rate (%) : " + str(round(detection_rate , 2)))
    accuracy = 100*accuracy_score(df['Class'] ,df['prediction'])
    print("Accuracy : " + str(round(accuracy ,2)))
    f1 = f1_score(df['Class'] ,df['prediction'])
    print("F1 Score : " + str(round(f1 ,4)))   
    auc_score = roc_auc_score(df['Class'],df['prediction'])  # note: computed from hard 0/1 predictions, not probabilities
    print("AUROC Score : " + str(round(auc_score,4)))
    values = []
    values.append(key)
    values.append(detection_rate)
    values.append(auc_score)
    values.append(f1)
    values.append(accuracy)
    values.append(round(100*total_fraud_amt_detected/total_fraud_amt ,2))
    
    final_values =[]
    final_values.append(values)
    temp_df = pd.DataFrame(final_values ,columns =['Model Name','Detection Rate' ,'AUROC','F1 Score','Accuracy','Fraud Loss Saved'])
    
    global metric_data
    
    metric_data = pd.concat([metric_data,temp_df ] , axis = 0 )
    
    
    

In [None]:
fraud_loss_saved(valid_df ,'Decision Tree - Valid Data')

### Let's try Machine Learning and see if it can help ###

<img src= "https://thumbs.dreamstime.com/b/machine-learning-technology-artificial-intelligence-modern-manufacturing-144923304.jpg" alt ="Machine Learning" style='width: 600px;'>

In [None]:
%%time 

from xgboost import XGBClassifier

xgb_clf = XGBClassifier()

xgb_clf.fit(train_df[predictors], train_df[target].values)

valid_df['prediction'] = xgb_clf.predict(valid_df[predictors])

cm = pd.crosstab(valid_df[target].values, valid_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

In [None]:
fraud_loss_saved(valid_df ,'XGBOOST - Valid Data')

In [None]:
from sklearn.linear_model import LogisticRegression

lr_clf = LogisticRegression()

lr_clf.fit(train_df[predictors], train_df[target].values)

valid_df['prediction'] = lr_clf.predict(valid_df[predictors])

cm = pd.crosstab(valid_df[target].values, valid_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

In [None]:
fraud_loss_saved(valid_df ,'Logistic Regression - Valid Data')

In [None]:
%%time 
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(n_estimators = 20)

rf_clf.fit(train_df[predictors], train_df[target].values)

valid_df['prediction'] = rf_clf.predict(valid_df[predictors])

cm = pd.crosstab(valid_df[target].values, valid_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

In [None]:
fraud_loss_saved(valid_df ,'Random Forest - Valid Data')

In [None]:
metric_data

### Note: Importance of the validation data set

### It is used to fine-tune and compare models; here we train 4 models and evaluate their performance on the validation data set

### We can use the results on this data set to select the model

## Let us check the performance of the Decision Tree and the XGBoost model on the test data set

In [None]:


test_df['prediction'] = test_df[predictors].apply(model_function, axis = 1)

cm = pd.crosstab(test_df[target].values, test_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

fraud_loss_saved(test_df, 'Decision Tree - Test Data')

### For simplicity, we use dollar savings as the criterion for choosing the model. In practice there are many other criteria, such as AUROC, F1 Score, False Positive Ratio, PSI, CSI, Customer Impact, Concordance-Discordance and Gini ###

In [None]:
test_df['prediction'] = xgb_clf.predict(test_df[predictors])


cm = pd.crosstab(test_df[target].values, test_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

fraud_loss_saved(test_df ,'XGBOOST - Test Data')

In [None]:
metric_data

### Caution !!!!! - Check Accuracy: every model scores about 99%, so it cannot tell a good model from a bad one ###

### AUROC and F1 Score are probably better performance metrics for final model selection ###
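
To see why accuracy is uninformative here, consider a baseline that never flags anything as fraud. The sketch below uses the validation class counts implied by the decision tree's confusion matrix (45447 + 20 non-fraud, 71 + 31 fraud); the variable names are illustrative.

```python
# Validation-set class counts implied by the decision tree's confusion matrix.
N_NON_FRAUD = 45447 + 20
N_FRAUD = 71 + 31

y_true = [0] * N_NON_FRAUD + [1] * N_FRAUD
y_pred = [0] * len(y_true)   # baseline: never flag a transaction as fraud

# Accuracy counts every correct label, and almost every label is 0.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Detection rate only looks at the fraud rows.
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
detection_rate = detected / N_FRAUD

print(f"accuracy={accuracy:.4f} detection_rate={detection_rate:.2f}")
```

A model that catches no fraud at all still scores above 99% accuracy, so differences between real models show up only in metrics that focus on the fraud class.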

In [None]:
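# Let's tune the XGBoost hyperparameters with a grid search


from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
model = xgb.XGBClassifier()
# Note: range(50, 100, 50) yields only [50]; widen the range to search more values
n_estimators = range(50, 100, 50)
param_grid = dict(n_estimators=n_estimators)

# range(5, 8, 2) yields max_depth values 5 and 7
max_depth = range(5, 8, 2)
param_grid['max_depth'] = max_depth
print(param_grid)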

In [None]:
%%time

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold,verbose = 3)
grid_result = grid_search.fit(train_df[predictors], train_df[target])
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))


In [None]:
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
# plot


In [None]:
grid_result.best_params_

In [None]:
valid_df['prediction'] = grid_result.best_estimator_.predict(valid_df[predictors])


cm = pd.crosstab(valid_df[target].values, valid_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

fraud_loss_saved(valid_df ,'XGBOOST - Hypertune - Valid Data')

In [None]:
metric_data

In [None]:
test_df['prediction'] = grid_result.best_estimator_.predict(test_df[predictors])


cm = pd.crosstab(test_df[target].values, test_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()

fraud_loss_saved(test_df , 'XGBOOST - Hypertune - Test Data')

In [None]:
metric_data

### Let's understand the business impact of implementing a machine learning model for fraud detection ###

In [None]:
total_fraud_in_data = orig_data.loc[orig_data['Class'] ==1]['Amount'].sum()
print("Total Fraud Amount in the Dataset :" + str(round(total_fraud_in_data , 2)))
print("Fraud Loss per day : " + str(round(total_fraud_in_data/2,2)))
print("Fraud Loss per year : " + str(round(365*total_fraud_in_data/2,2)))


### Using Decision Tree: ###

### Detect 70% of fraudulent transactions ###

### Save approx. 5.9 million dollars on fraud losses ###

### Using XGBOOST: ###

### Detect 80% of fraudulent transactions ###

### Save approx. 6.4 million dollars on fraud losses ###

### Incremental benefit of 500k dollars ###

## Advanced Technique ##

### Let us use the probabilities returned by the model's predict_proba method ###

### For every instance the model returns 2 probabilities, e.g. [0.3, 0.7]

### where 0.3 is the probability of Class = 0, i.e. Non Fraud

### and 0.7 is the probability of Class = 1, i.e. Fraud
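
A small sketch of how a cut-off turns those probabilities into labels. The probability rows are made up for illustration; 0.5 is the usual default cut-off of `predict`, and 0.00161 is the cut-off this notebook selects further down.

```python
import numpy as np

# Made-up predict_proba output: column 0 = P(Non Fraud), column 1 = P(Fraud).
probs = np.array([
    [0.9000, 0.1000],
    [0.3000, 0.7000],
    [0.9980, 0.0020],
    [0.9995, 0.0005],
])
prob_of_fraud = probs[:, 1]

# Default behaviour: flag only when fraud is the more likely class.
default_preds = (prob_of_fraud >= 0.5).astype(int)

# A much lower cut-off flags more transactions, trading false alarms for recall.
low_cutoff_preds = (prob_of_fraud >= 0.00161).astype(int)

print(default_preds)
print(low_cutoff_preds)
```

Lowering the cut-off can only add alerts, never remove them, so detection rate goes up while the number of false positives goes up as well.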

In [None]:
%%time
metric_list  =[]
for threshold in np.linspace(0.0001, 0.1,200 ) :
    
    threshold = round(threshold,5)
    df = valid_df.copy()
    probs = grid_result.best_estimator_.predict_proba(df[predictors])
    prob_of_fraud = probs[:,1]
    preds = prob_of_fraud >= threshold
    df['prediction'] = preds
    df['prediction'] = df['prediction'].astype(int)
    df['Amount']  = np.expm1(df['Amount'])  # inverse of np.log(Amount + 1)
    total_fraud_amt = df[df['Class'] ==1]['Amount'].sum()
    total_fraud_amt_detected = df.loc[(df['prediction'] ==1) & (df['Class']==1) ]['Amount'].sum()
    fraud_saving = round(100*total_fraud_amt_detected/total_fraud_amt ,2)
    detection_rate  = 100 * (df[df['prediction']==1]['Class'].sum())/df['Class'].sum()
    detection_rate = round(detection_rate , 2)
    metric =[]
    metric.append(threshold)
    metric.append(round(f1_score(df[target] ,df['prediction']),4))
    metric.append(detection_rate)
    metric.append(fraud_saving)
    
    metric_list.append(metric)

brute_force_df =pd.DataFrame(metric_list , columns = ['Threshold', 'F1 Score' , 'Detection Rate' ,'Fraud Loss Saved'])
brute_force_df = brute_force_df.sort_values('Fraud Loss Saved' , ascending = False)                  
brute_force_df.head(20)

In [None]:
cut_off_selected = 0.00161
probs = grid_result.best_estimator_.predict_proba(valid_df[predictors])
prob_of_fraud = probs[:,1]
preds = prob_of_fraud >= cut_off_selected
valid_df['prediction'] = preds

valid_df['prediction'] = valid_df['prediction'].astype(int)

cm = pd.crosstab(valid_df[target].values, valid_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()


fraud_loss_saved(valid_df ,'XGBOOST Optimized Cut off - Valid Data')


In [None]:
cut_off_selected = 0.00161
probs = grid_result.best_estimator_.predict_proba(test_df[predictors])
prob_of_fraud = probs[:,1]
preds = prob_of_fraud >= cut_off_selected
test_df['prediction'] = preds

test_df['prediction'] = test_df['prediction'].astype(int)

cm = pd.crosstab(test_df[target].values, test_df['prediction'], rownames=['Actual'], colnames=['Predicted'])
fig, (ax1) = plt.subplots(ncols=1, figsize=(7,7))
sns.heatmap(cm, 
            xticklabels=['Not Fraud', 'Fraud'],
            yticklabels=['Not Fraud', 'Fraud'],
            annot=True,ax=ax1,
            linewidths=.2,linecolor="Darkblue", cmap="Blues" , fmt='d')
plt.title('Confusion Matrix', fontsize=16)
plt.show()


fraud_loss_saved(test_df ,'XGBOOST Optimized Cut off - Test Data')


In [None]:
metric_data

### Using performance tuning we have saved an additional 25K dollars !!!!! ###

### Compared with the Decision Tree's 5.88 million, we have saved 6.42 million dollars for the bank using the machine learning model (XGBOOST) ###

### Incremental benefit of 520K dollars ###

### Feature Importance ###

In [None]:
tmp = pd.DataFrame({'Feature': predictors, 'Feature importance': grid_result.best_estimator_.feature_importances_})
tmp = tmp.sort_values(by='Feature importance',ascending=False)
plt.figure(figsize = (10,8))
plt.title('Features importance',fontsize=14)
s = sns.barplot(x='Feature',y='Feature importance',data=tmp)
s.set_xticklabels(s.get_xticklabels(),rotation=90)
plt.show()   

### Note: V17 is the most useful feature for XGBOOST

### The Decision Tree also chose V17 as its root node, which suggests it is the most important feature

## Future Scope ##

### 1. Try Deep Learning
### 2. Try Unsupervised Learning
### 3. Create an Ensemble of Supervised & Unsupervised Learning