## Credit Card Fraud Analysis - Predictive Models

Credit Card Fraud Detection with Machine Learning is a process of data investigation and the development of a model that will provide the best results in revealing and preventing fraudulent transactions. This is achieved through bringing together all meaningful features of card users’ transactions, such as Date, User Zone, Product Category, Amount, Provider, Client’s Behavioral Patterns, etc.

The Credit Card Fraud Detection Problem includes modeling past credit card transactions with the knowledge of the ones that turned out to be fraud. This model is then used to identify whether a new transaction is fraudulent or not. Our aim here is to detect 100% of the fraudulent transactions while minimizing the incorrect fraud classifications.

The purpose of this data analysis is therefore to identify potential fraudulent credit card transactions.

In order to detect these anomalies and propose a prediction model, I will use machine learning techniques such as Decision Tree classification, Random Forest classification and Logistic Regression, among others.


Machine Learning-based Fraud Detection:

* Detecting fraud automatically
* Real-time streaming
* Less time needed for verification methods
* Identifying hidden correlations in data

### Data
from https://www.kaggle.com/mlg-ulb/creditcardfraud

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with **492** frauds out of **284,807** transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for **0.172%** of all transactions.
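
To make the imbalance concrete, the counts quoted above can be turned into a fraud rate and into the accuracy that a trivial "always normal" classifier would reach. This is only an arithmetic sketch based on the two counts stated above; the variable names are illustrative.

```python
# Counts quoted in the dataset description above
n_transactions = 284_807
n_frauds = 492

# Share of fraudulent transactions (the positive class)
fraud_rate = n_frauds / n_transactions

# Accuracy of a hypothetical classifier that labels every transaction as normal
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Accuracy of an 'always normal' baseline: {baseline_accuracy:.2%}")
```

Because this baseline is already very high, accuracy alone says little about how well frauds are detected, which is why recall and precision are examined later on.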

The dataset consists of numerical values from the **28** "Principal Component Analysis (PCA)" transformed features, namely V1 to V28. Furthermore, there is no metadata about the original features provided, so pre-analysis or feature study could not be done.
The 'Time' and 'Amount' features are not transformed data.

### Import Standard Packages:

In [None]:
#packages
%matplotlib inline 
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set(style="ticks", color_codes=True)
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, precision_recall_curve
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report
from sklearn.pipeline import make_pipeline
import warnings
warnings.filterwarnings("ignore")

## 1. Load the dataset:

In [None]:
CRdf = pd.read_csv('../input/creditcardfraud/creditcard.csv')

In [None]:
#dataset size
CRdf.shape

In [None]:
#display 5 first rows
CRdf.head(5)

In [None]:
#display 5 last rows
CRdf.tail()

Let's check the variables types

In [None]:
#variables types
CRdf.dtypes

Let's check for missing values

In [None]:
def NA_val(data):
    """Return the count and percentage of missing values per column."""
    missing = data.isna().sum()
    missing = missing[missing>0]
    #use the DataFrame passed in, not the global CRdf
    missing_perc = missing/data.shape[0]*100
    na = pd.DataFrame([missing, missing_perc], index = ['missing_num', 'missing_perc']).T
    na = na.sort_values(by = 'missing_perc', ascending = False)

    return round(na, 2)

In [None]:
#store the result under a new name so the function NA_val is not overwritten
missing_table = NA_val(CRdf)
missing_table

Good! There are no missing values in the dataset.

## 2. Exploratory Data Analysis


In [None]:
print(CRdf.shape)
print(CRdf.describe())

In [None]:
print("Column Names", CRdf.columns) #print the column names

In [None]:
CRdf.hist(figsize=(20,20))
plt.show()


In [None]:
# designate target variable name
targetVar = 'Class'
#print(targetVar)
targetSeries = CRdf[targetVar] #notice one column is considered a series in pandas
#print(targetSeries)
#remove target from current location and insert in column number 0
del CRdf[targetVar]
CRdf.insert(0, targetVar, targetSeries)
#reprint dataframe and see target is in position 0
CRdf.head()

In [None]:
#Basic bar chart since the target is binary
groupby = CRdf.groupby(targetVar)
targetEDA=groupby[targetVar].aggregate(len)
print(targetEDA)

labels = ["Normal", "Fraud"]
plt.figure()
targetEDA.plot(kind='bar', grid=True, color='orange')
plt.axhline(0, color='k')
plt.title("Transaction Class Distribution")
plt.xticks(range(2), labels)
plt.xlabel("Class")
plt.ylabel("Frequency");

The fraud (Class 1) frequency is too low to be visible on the chart; as a reminder, there are 492 fraudulent transactions.

In [None]:
#Calculate fraud rate
nb_transactions = len(CRdf.index)
print('There are a total of %s transactions in the dataset, among which %s are anomalies (frauds).' 
      %(nb_transactions, CRdf[CRdf['Class'] == 1].shape[0]))
CR_NB = CRdf['Class'].value_counts()[1]
FraudRate = float(CR_NB) / nb_transactions
print('The fraud rate is {:.2f}%'.format(FraudRate*100))

How different are the amounts of money involved in the two transaction classes?

In [None]:
normal_transactions = CRdf[CRdf['Class'] == 0]
fraud_transactions = CRdf[CRdf['Class'] == 1]
normal_transactions.head(5)

In [None]:
normal_transactions.Amount.describe()

In [None]:
fraud_transactions.Amount.describe()

In [None]:
plt.figure(figsize = (11,3))
plt.subplot(1,2,1)
plt.scatter(normal_transactions.Time, normal_transactions.Amount)
plt.title('Normal transactions')
plt.xlabel('Time in seconds'); 
plt.ylabel('Amount')
plt.subplot(1,2,2)
plt.scatter(fraud_transactions.Time, fraud_transactions.Amount)
plt.title('Fraud transactions')
plt.xlabel('Time in seconds'); 
plt.ylabel('Amount')

plt.show()

The graphs above show that the time of a transaction does not really matter.

#### Drop the variable Time:

In [None]:
CRdf.drop(["Time"], axis=1, inplace=True)

### Correlation:

By plotting a correlation matrix, we have a very nice overview of how the features are related to one another.

In [None]:
#correlation matrix
corr = CRdf.corr()

#plot using seaborn library
# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)  # np.bool was removed from recent NumPy releases
mask[np.triu_indices_from(mask)] = True

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(17, 11))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(round(corr,2), annot=True, mask=mask, cmap=cmap, vmax=.3,
                linewidths=.5, cbar_kws={"shrink": .5}, ax=ax);
plt.show()

The correlation matrix above shows that the PCA components V1 to V28 are not correlated with each other. The target variable "Class" has some positive and negative correlations with the V components, but no notable correlation with Amount (Time was dropped before the matrix was computed). There is therefore no risk of collinearity among the PCA components.
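
The absence of correlation between V1 and V28 is expected, since principal components are uncorrelated by construction. The sketch below illustrates this on synthetic data (not the credit card data); the mixing matrix and variable names are assumptions made only for the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic, deliberately correlated features (not the credit card data)
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 4))
X = base @ rng.normal(size=(4, 4))  # random linear mixing creates correlations

# Largest absolute correlation between two different original features
corr_X = np.corrcoef(X, rowvar=False)
off_diag_mask = ~np.eye(4, dtype=bool)
print("Before PCA:", np.abs(corr_X[off_diag_mask]).max())

# Project onto the principal components
components = PCA(n_components=4).fit_transform(X)

# The same measure on the components is numerically zero
corr_components = np.corrcoef(components, rowvar=False)
max_off_diag = np.abs(corr_components[off_diag_mask]).max()
print("After PCA: ", max_off_diag)
```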

## 3. Machine Learning Models


Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset.
Hyperparameters are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm. Unlike parameters, hyperparameters are specified by the practitioner when configuring the model.
Typically, it is challenging to know what values to use for the hyperparameters of a given algorithm on a given dataset, therefore it is common to use random or grid search strategies for different hyperparameter values.
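
The distinction between hyperparameters and parameters, and the two search strategies mentioned above, can be sketched as follows. This is a minimal illustration on synthetic data; the candidate values for `C` are arbitrary assumptions, not tuned choices for the fraud dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (not the credit card dataset)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# C is a hyperparameter: chosen by the practitioner before fitting
model = LogisticRegression(C=0.5, max_iter=1000).fit(X, y)
# coef_ holds parameters: learned from the data during fit
print("Learned coefficients:", model.coef_.round(2))

# Candidate hyperparameter values (illustrative choices)
candidates = {"C": [0.01, 0.1, 1.0, 10.0]}

# Grid search evaluates every candidate value
grid = GridSearchCV(LogisticRegression(max_iter=1000), candidates, cv=3).fit(X, y)
# Random search evaluates only n_iter randomly drawn candidates
rand = RandomizedSearchCV(LogisticRegression(max_iter=1000), candidates,
                          n_iter=2, cv=3, random_state=0).fit(X, y)

print("Grid search best:  ", grid.best_params_)
print("Random search best:", rand.best_params_)
```

Random search trades completeness for speed, which can matter when the grid is large or each fit is slow.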

#### Create a training and test set with a split 70/30:

In [None]:
target = 'Class'
predictors = ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',\
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19',\
       'V20', 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28',\
       'Amount']

In [None]:
# split dataset into testing and training
# column location 1 to end of dataframe are the features.
# column location 0 is the target
features_train, features_test, target_train, target_test = train_test_split(
    CRdf.iloc[:,1:].values, CRdf.iloc[:,0].values, test_size=0.30, random_state=0)

In [None]:
print(features_test.shape)
print(features_train.shape)
print(target_test.shape)
print(target_train.shape)
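
The split above is purely random. With so few positive cases, one optional variation (not used in this notebook) is a stratified split, which keeps the fraud proportion identical in both subsets. The sketch below shows the idea on synthetic labels; the sizes are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 20 positives among 10,000 samples (illustrative)
y = np.zeros(10_000, dtype=int)
y[:20] = 1
X = np.arange(10_000).reshape(-1, 1)

# stratify=y keeps the class proportions the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

print("Positives in train:", y_train.sum())
print("Positives in test: ", y_test.sum())
```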

### 3. 1. Decision Tree Classification

In [None]:
#decision tree. Call up my model and name it dt_clf
#clf is a notation used by many people for classifier
from sklearn import tree 
dt_clf = tree.DecisionTreeClassifier()
#Call up the model to see the parameters you can tune (and their default setting)
print(dt_clf)

In [None]:
#train model
dt_model = dt_clf.fit(features_train, target_train)

In [None]:
#Predict with the DT model on the test data
target_pred_dt = dt_model.predict(features_test)

In [None]:
print("Decision Tree Accuracy Score", accuracy_score(target_test, target_pred_dt))
print(classification_report(target_test, target_pred_dt, target_names = ["Class = no", "Class = yes"]))
print(confusion_matrix(target_test, target_pred_dt))

#extracting true_positives, false_positives, true_negatives, false_negatives
tn, fp, fn, tp = confusion_matrix(target_test, target_pred_dt).ravel()
print("True Negatives: ",tn)
print("False Positives: ",fp)
print("False Negatives: ",fn)
print("True Positives: ",tp)

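The Decision Tree classifier's accuracy score is 99.92%. This looks excellent, but with a fraud rate of only 0.172%, a classifier that labels every transaction as normal would already reach about 99.83% accuracy, so accuracy alone is not enough to judge the model.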

Precision: How often is the classifier correct with its positive predictions? Precision = True Positives/(True Positives + False Positives). 

Recall: How well does the classifier predict positive cases? Recall = True Positives/(True Positives + False Negatives). Yes, recall is the same as the sensitivity rate. 

F1-score is a function of Precision and Recall. It is needed when you want to seek a balance between Precision and Recall.
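
As a worked example of these three formulas, the sketch below plugs in the test-set confusion counts reported for the logistic regression model in section 3.5 of this notebook (91 true positives, 56 false negatives, 12 false positives). The F1 formula used is the usual harmonic mean of precision and recall.

```python
# Test-set counts reported for the logistic regression model in section 3.5
tn, fp, fn, tp = 85284, 12, 56, 91

# Precision: share of predicted frauds that really are frauds
precision = tp / (tp + fp)
# Recall: share of actual frauds that the model detects
recall = tp / (tp + fn)
# F1-score: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1-score:  {f1:.3f}")
```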

Our classifier correctly identifies 77% of fraudulent transactions (Recall). Also, the DT classifier is 79% correct when it predicts "fraud" (Precision).

* We can further deepen our analysis by trying to improve the recall score (sensitivity). As we know, this metric is very important in detecting anomalies.

Feature importance:

In [None]:
imp = pd.DataFrame({'Feature': predictors, 'Feature importance': dt_model.feature_importances_})
imp = imp.sort_values(by='Feature importance',ascending=False)
plt.figure(figsize = (7,4))
plt.title('Features importance',fontsize=14)
s = sns.barplot(x='Feature',y='Feature importance',data=imp)
s.set_xticklabels(s.get_xticklabels(),rotation=90)
plt.show()  

#### **In order to tune the different methods I am going to create a tuning function based on GridSearch:**

In [None]:

# Run grid search, get the prediction array and print the accuracy and best combination
def fit_and_pred_grid_classifier(clf, param_grid, X_train, X_test, y_train, y_test, scoring = "recall", folds = 10):

    
    gs = GridSearchCV(estimator = clf, param_grid = param_grid, cv = folds, scoring = scoring, n_jobs = -1, verbose = 0)
    gs = gs.fit(X_train, y_train)

    best_score = gs.best_score_
    best_parameters = gs.best_params_

    # Get the prediction array
    grid_search_pred = gs.predict(X_test)
    
    
    # summarize results
    print("Best " +  scoring + " score: %f using %s" % (best_score, best_parameters))
    means = gs.cv_results_['mean_test_score']
    stds = gs.cv_results_['std_test_score']
    params = gs.cv_results_['params']
    
    for mean, stdev,param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

    # return the predictions and the fitted grid search object
    return grid_search_pred, gs


Now, let's call the tuning for DT classifier. 

The first parameter to tune is max_depth. This indicates how deep the tree can be. The deeper the tree, the more splits it has and the more information it captures about the data, at the risk of overfitting. Here we try depths of 3, 7 and 11. Other hyperparameters we also tune are "max_features" and the "criterion" type.
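
To illustrate how max_depth controls the number of splits, the sketch below fits trees of increasing depth on synthetic data (not the credit card data) and reports the number of leaves and the training accuracy. The depth values and dataset size are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

results = []
for depth in [1, 3, 7, 11, 32]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    # record the depth limit, the number of leaves and the training accuracy
    results.append((depth, clf.get_n_leaves(), clf.score(X, y)))

for depth, n_leaves, train_acc in results:
    print(f"max_depth={depth:>2}  leaves={n_leaves:>3}  train accuracy={train_acc:.3f}")
```

A rising training accuracy does not by itself mean a better model, which is why the grid search below scores each setting with cross-validation.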

In [None]:
param_grid = {"max_depth": [3,7,11],
              "max_features": [3,9,15],
             "criterion": ["gini", "entropy"]
             }
import time
start = time.time()

# Run grid search, print the results and get the prediction array and model
gs_pred_dt, dt_grid = fit_and_pred_grid_classifier(dt_clf, param_grid, features_train,
                                                   features_test, target_train, target_test)

end = time.time()
print("Time to run", round(end-start), "seconds")

We reach the best recall score, about 77%, with {'criterion': 'entropy', 'max_depth': 3, 'max_features': 15}.

#### *Confusion matrix:*

In [None]:
#test set
cm_dt = confusion_matrix(target_test, gs_pred_dt)#confusion matrix

print(classification_report(target_test, gs_pred_dt,target_names = ["Class = no", "Class = yes"])) 
print("Recall: {:.2%}".format(recall_score(target_test, gs_pred_dt))) 

group_counts = ["{0:0.0f}".format(value) for value in
                cm_dt.flatten()]
group_percentages = ["{0:.2%}".format(value) for value in
                     cm_dt.flatten()/np.sum(cm_dt)]
labels = [f"{v1}\n{v2}" for v1, v2 in
          zip(group_counts,group_percentages)]
labels = np.asarray(labels).reshape(2,2)
sns.heatmap(cm_dt, annot=labels, fmt='', cmap='Oranges')

The results presented above show us that:

We now correctly predict **117** entries as "Fraudulent" and increase the recall score from 77% to 80%. This is a little better than with the default parameters.

### 3. 2. Random Forest Model


Random Forest is a variant of bagging in which only a randomly chosen subset of features is considered for the split at each node. Each node is split on the "best" of the given subset of features. The random forest model has less variance than a single decision tree.

In [None]:
from sklearn.ensemble import RandomForestClassifier
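#Build ('auto' is no longer accepted by newer scikit-learn releases; 'sqrt' is its equivalent for classifiers)
rf_clf = RandomForestClassifier(max_features='sqrt', random_state=123)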
print(rf_clf)


In [None]:
#Train set
rf_model = rf_clf.fit(features_train, target_train)

#Validation set - prediction
target_pred_rf = rf_clf.predict(features_test)


In [None]:
print("Random Forest classifier Accuracy Score", accuracy_score(target_test, target_pred_rf))
print(classification_report(target_test, target_pred_rf, target_names = ["Class = no", "Class = yes"]))
print(confusion_matrix(target_test, target_pred_rf))

#extracting true_positives, false_positives, true_negatives, false_negatives
tn, fp, fn, tp = confusion_matrix(target_test, target_pred_rf).ravel()
print("True Negatives: ",tn)
print("False Positives: ",fp)
print("False Negatives: ",fn)
print("True Positives: ",tp)

With default parameters, the Random Forest classifier is 94% correct when it predicts a fraudulent transaction (precision), and the model also has a good recall score (75%). Like the DT model, the RF model reaches a very good accuracy.

*Feature importance:*

In [None]:
imp = pd.DataFrame({'Feature': predictors, 'Feature importance': rf_clf.feature_importances_})
imp = imp.sort_values(by='Feature importance',ascending=False)
plt.figure(figsize = (7,4))
plt.title('Features importance',fontsize=14)
s = sns.barplot(x='Feature',y='Feature importance',data=imp)
s.set_xticklabels(s.get_xticklabels(),rotation=90)
plt.show()   

### 3. 3. Extremely Randomized Trees (Extra Trees)

Extra Trees is like Random Forest in that it builds multiple trees and splits nodes using random subsets of features, but with two key differences: by default it does not bootstrap observations (each tree is fit on the whole training set), and split thresholds are drawn at random for each candidate feature rather than optimized, with the best of these random splits being kept.
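
The bootstrap difference mentioned above can be checked directly on the scikit-learn defaults. The sketch below only inspects the default constructor arguments and does not train anything.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

rf = RandomForestClassifier()
et = ExtraTreesClassifier()

# Random Forest draws a bootstrap sample of the training set for each tree
print("RandomForest bootstrap:", rf.bootstrap)
# Extra Trees fits each tree on the whole training set by default
print("ExtraTrees bootstrap:  ", et.bootstrap)
```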

In [None]:
from sklearn.ensemble import ExtraTreesClassifier
clf_xdt = ExtraTreesClassifier(n_estimators= 100, n_jobs=-1, random_state=123)
print(clf_xdt)

In [None]:
#train data
model_xdt = clf_xdt.fit(features_train, target_train)

#validation set
target_predicted=clf_xdt.predict(features_test)

In [None]:
print("Extra Trees Accuracy", accuracy_score(target_test,target_predicted))
target_names = ["Class = no", "Class = yes"]
print(classification_report(target_test, target_predicted,target_names=target_names))
print(confusion_matrix(target_test, target_predicted))

This classifier correctly identifies 75% of "Fraud" (recall). This classifier is also 97% correct when it predicts a fraudulent case (precision). 

In [None]:
imp = pd.DataFrame({'Feature': predictors, 'Feature importance': clf_xdt.feature_importances_})
imp = imp.sort_values(by='Feature importance',ascending=False)
plt.figure(figsize = (7,4))
plt.title('Features importance',fontsize=14)
s = sns.barplot(x='Feature',y='Feature importance',data=imp)
s.set_xticklabels(s.get_xticklabels(),rotation=90)
plt.show()   

**Tuning:** The most important parameter is the number of random features to sample at each split point (max_features).

In [None]:
# grid over the max_features parameter
import time
param_grid = {"max_features": [7,11,15]}
start = time.time()

# run grid search
xdt_gs_pred, xdt_grid = fit_and_pred_grid_classifier(clf_xdt, param_grid, features_train, 
                                                     features_test, target_train, target_test, scoring='recall')
end = time.time()
print("Time to run", round(end-start), "seconds")


Well done! We increase the recall score from 75% to 79.45%.

#### *Confusion matrix:*

In [None]:
#with the test set
cm_xdt = confusion_matrix(target_test, xdt_gs_pred)#confusion matrix

print(classification_report(target_test, xdt_gs_pred,target_names = ["Class = no", "Class = yes"])) 
print("Recall: {:.2%}".format(recall_score(target_test, xdt_gs_pred))) 

group_counts = ["{0:0.0f}".format(value) for value in
                cm_xdt.flatten()]
group_percentages = ["{0:.2%}".format(value) for value in
                     cm_xdt.flatten()/np.sum(cm_xdt)]
labels = [f"{v1}\n{v2}" for v1, v2 in
          zip(group_counts,group_percentages)]
labels = np.asarray(labels).reshape(2,2)
sns.heatmap(cm_xdt, annot=labels, fmt='', cmap='Oranges')

The classifier correctly predicted 85290 entries (99.82%) as normal (non-fraudulent) transactions and 112 entries (0.13%) as fraudulent. It incorrectly predicted 35 fraudulent entries as normal and 6 normal entries as fraudulent. On the test set, the recall score therefore improves slightly, from 75% to 76% (112 of 147 frauds).

### 3. 4. Stochastic Gradient Descent Classifier

Let's first normalize features:

In [None]:
#Standardize features
scaler = StandardScaler()  
#fit the scaler on the training data only
scaler.fit(features_train)  
#transform the training data
features_train_norm = scaler.transform(features_train)  
# apply same transformation to test data
features_test_norm = scaler.transform(features_test) 


In [None]:
from sklearn.linear_model import SGDClassifier
sgd_linear_svm_clf = SGDClassifier(random_state=0)
print(sgd_linear_svm_clf )

#Train data
model_sgd = sgd_linear_svm_clf.fit(features_train_norm, target_train)

#test data
target_pred_sgd = sgd_linear_svm_clf.predict(features_test_norm)
print("Accuracy", accuracy_score(target_test, target_pred_sgd))
target_names = ["Class = no", "Class = yes"]
print(classification_report(target_test, target_pred_sgd, target_names=target_names))
print(confusion_matrix(target_test, target_pred_sgd))

The accuracy is good (99.90%). Our classifier correctly identifies 55% of fraud transactions (Recall). The Stochastic Gradient Descent classifier is 87% correct when it predicts "Fraud" (Precision).

### Tuning...

In [None]:
import time
start = time.time()
param_grid = {'alpha': [0.0001,0.01,0.1]
             }
# Run grid search, print the results and get the prediction array and model
gsd_gs_pred, gsd_grid = fit_and_pred_grid_classifier(sgd_linear_svm_clf, param_grid, features_train_norm, 
                                                     features_test_norm, target_train, target_test)
end = time.time()
print("Time to run", round(end-start), "seconds")

We reach the best recall score (56.85%) with an alpha value of 0.0001.

#### Confusion matrix:

In [None]:
#test set
cm_gsd = confusion_matrix(target_test, gsd_gs_pred)#confusion matrix

print(classification_report(target_test, gsd_gs_pred,target_names = ["Class = no", "Class = yes"])) 
print("Recall: {:.2%}".format(recall_score(target_test, gsd_gs_pred))) 

group_counts = ["{0:0.0f}".format(value) for value in
                cm_gsd.flatten()]
group_percentages = ["{0:.2%}".format(value) for value in
                     cm_gsd.flatten()/np.sum(cm_gsd)]
labels = [f"{v1}\n{v2}" for v1, v2 in
          zip(group_counts,group_percentages)]
labels = np.asarray(labels).reshape(2,2)
sns.heatmap(cm_gsd, annot=labels, fmt='', cmap='Oranges')

### 3. 5. Logistic Regression Model



In [None]:
# building logistic regression classifier
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
import matplotlib.ticker

logit = LogisticRegression()

#Call up the model to see the parameters you can tune (and their default setting)
print(logit)

In [None]:
#train model
logit_model = logit.fit(features_train_norm, target_train)

#validate set
logit_predicted=logit.predict(features_test_norm)

In [None]:
print("Logistic classifier Accuracy Score", accuracy_score(target_test, logit_predicted))
print(classification_report(target_test, logit_predicted))
print(confusion_matrix(target_test, logit_predicted))

#extracting true_positives, false_positives, true_negatives, false_negatives
tn, fp, fn, tp = confusion_matrix(target_test, logit_predicted).ravel()
print("True Negatives: ",tn)
print("False Positives: ",fp)
print("False Negatives: ",fn)
print("True Positives: ",tp)

The logit model also reaches a good accuracy of 99.92%. It correctly identifies 62% of fraud transactions (Recall), and it is 88% correct when it predicts a fraud transaction (Precision).

The logit model correctly predicted 85284 entries as normal (non-fraudulent) transactions and 91 entries as fraudulent.



In [None]:
param_grid = {'penalty' : ['l1', 'l2'],
    'C' : np.logspace(-1, 1, 10),
    'solver' : ['liblinear']}

#restart the timer; otherwise the start time of an earlier cell would be reused
start = time.time()
# Run grid search, print the results and get the prediction array and model
logit_target_pred, logit_grid = fit_and_pred_grid_classifier(logit, param_grid, features_train_norm, 
                                                     features_test_norm, target_train, target_test)
end = time.time()
print("Time to run", round(end-start), "seconds")


In [None]:
#test set
cm_logit = confusion_matrix(target_test, logit_target_pred)#confusion matrix

print(classification_report(target_test, logit_target_pred,target_names=['Class = no','Class = yes'])) 
print("Recall: {:.2%}".format(recall_score(target_test, logit_target_pred))) 

group_counts = ["{0:0.0f}".format(value) for value in
                cm_logit.flatten()]
group_percentages = ["{0:.2%}".format(value) for value in
                     cm_logit.flatten()/np.sum(cm_logit)]
labels = [f"{v1}\n{v2}" for v1, v2 in
          zip(group_counts,group_percentages)]
labels = np.asarray(labels).reshape(2,2)
sns.heatmap(cm_logit, annot=labels, fmt='', cmap='Oranges')

The logistic regression classifier correctly predicted 85284 entries as normal (non-fraudulent) transactions and 91 entries as fraudulent. It also incorrectly predicted 56 fraudulent entries as normal and 12 normal entries (0.01%) as fraudulent. The recall score remains at 61.9%, so tuning did not improve it.

### 3. 6.  ROC Curves

In [None]:
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import GaussianNB

# Instantiate the classifiers and make a list
# ('sqrt' replaces 'auto', which newer scikit-learn releases no longer accept)
classifiers = [make_pipeline(StandardScaler(), LogisticRegression()),
               tree.DecisionTreeClassifier(),
               RandomForestClassifier(max_features='sqrt', n_estimators=100), 
               ExtraTreesClassifier(n_estimators= 100),
               ]

# Train the models and collect one result row per classifier
rows = []
for cls in classifiers:
    model = cls.fit(features_train, target_train)
    target_predicted = model.predict_proba(features_test)[:, 1]
    
    fpr, tpr, _ = roc_curve(target_test,  target_predicted)
    auc = roc_auc_score(target_test, target_predicted)
    
    # DataFrame.append was removed in pandas 2.0, so rows are gathered in a list
    # (note: the scaled logistic regression pipeline is labelled "Pipeline")
    rows.append({'classifiers':cls.__class__.__name__,
                 'fpr':fpr, 
                 'tpr':tpr, 
                 'auc':auc})

# Build the result table and set name of the classifiers as index labels
result_table = pd.DataFrame(rows, columns=['classifiers', 'fpr','tpr','auc'])
result_table.set_index('classifiers', inplace=True)


fig = plt.figure(figsize=(8,6))

for i in result_table.index:
    plt.plot(result_table.loc[i]['fpr'], 
             result_table.loc[i]['tpr'], 
             label="{}, AUC={:.3f}".format(i, result_table.loc[i]['auc']))
    
plt.plot([0,1], [0,1], color='orange', linestyle='--')

plt.xticks(np.arange(0.0, 1.1, step=0.1))
plt.xlabel("False Positive Rate", fontsize=15)

plt.yticks(np.arange(0.0, 1.1, step=0.1))
plt.ylabel("True Positive Rate", fontsize=15)

plt.title('ROC Curve Analysis', fontweight='bold', fontsize=15)
plt.legend(prop={'size':13}, loc='lower right')

plt.show()

In general, we obtained very good scores for all models. The logistic regression model leads with an AUC score of 96.9%.
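
For readers less familiar with the metric, the sketch below shows how a ROC curve and its AUC are derived from predicted scores, on a four-sample toy example. The labels and scores are illustrative values, not output from the models above.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and predicted scores (illustrative values, not model output)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold on the score yields one (FPR, TPR) point of the curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t in zip(fpr, tpr):
    print(f"FPR={f:.2f}  TPR={t:.2f}")

# AUC: probability that a random positive is scored above a random negative
toy_auc = roc_auc_score(y_true, y_score)
print("AUC:", toy_auc)
```

Here three of the four positive/negative pairs are ranked correctly, hence an AUC of 0.75. On highly unbalanced data, a high AUC can coexist with a modest recall at the default decision threshold, as the results above show.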

## 4.  Conclusion

We first explored the dataset to understand the features and the relationships between them. In the second part, we modeled the data and achieved about 99.9% accuracy for fraud detection with the different supervised ML methods implemented. On such unbalanced data, however, high accuracy alone does not mean that all frauds are caught: the models still miss a share of the fraudulent transactions and flag some normal transactions as fraudulent.

Since all algorithms performed with high accuracy, it was interesting to look at other metrics, especially the recall score. By tuning our models we managed to increase the recall score. This comes out with good results, however at the cost of computational expense.

So, overall, the Extremely Randomized Trees model (Extra Trees) was the most successful in detecting fraudulent transactions. With a high accuracy (99.95%), a good recall score of 79.45% and a good precision (97%), this model seems to be the best candidate to detect fraudulent transactions. It is followed by the Random Forest model. The gradient descent classifier has the lowest recall score.