## Neural network applications to fraud detection
## Comparing alternative methods

Neural networks are one of the most relevant learning methods currently available, and their widespread application is explained by their theoretical robustness, flexible architecture design, and strong expected predictive accuracy.
<br>
<br>
The main objective of this study is to develop a neural network application for fraud detection and, in particular, to design and implement a hyper-parameter tuning strategy, since this learning method requires a careful definition of a large set of parameters to achieve competitive performance.
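As an illustration of what a hyper-parameter tuning strategy can look like, the sketch below implements plain random search: it samples configurations from a search space and keeps the one with the best validation score. The search space, its values, and the `toy_evaluate` function are hypothetical placeholders; in a real run, the evaluation function would fit a neural network with the given configuration and return a validation metric such as ROC-AUC. This is one possible strategy, not necessarily the one adopted in this study.

```python
import random

# Hypothetical search space for a feed-forward neural network.
SEARCH_SPACE = {
    'hidden_layers': [1, 2, 3],
    'units': [16, 32, 64, 128],
    'learning_rate': [1e-4, 1e-3, 1e-2],
    'dropout': [0.0, 0.1, 0.25, 0.5],
}


def sample_config(space, rng):
    """Draw one value per hyper-parameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(evaluate, space, n_trials=20, seed=0):
    """Return (best_config, best_score) after n_trials random draws.

    `evaluate` maps a configuration dict to a validation score
    (higher is better, e.g. ROC-AUC).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float('-inf')
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical stand-in for "fit the network and score it on validation data".
def toy_evaluate(config):
    return -abs(config['learning_rate'] - 1e-3) - 0.01 * config['dropout']


best, score = random_search(toy_evaluate, SEARCH_SPACE, n_trials=50)
print(best, score)
```

Random search is often preferred to an exhaustive grid when only a few hyper-parameters really matter, because it explores more distinct values of each one for the same number of trials; fixing `seed` keeps the search reproducible.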
<br>
<br>
Before the empirical analysis, it is necessary to review the details of neural network structure, fitting, and specification, which will guide the experiment design and the implementation of tests. Accordingly, the theoretical presentation in this notebook is followed by an empirical stage in which hyper-parameters are tuned to improve the neural network's predictive accuracy, after which the best specification obtained is compared against alternative learning methods.
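To make "structure" and "fitting" concrete, here is a minimal NumPy sketch of a one-hidden-layer network for binary classification, trained by full-batch gradient descent on the binary cross-entropy. The architecture, the synthetic data, and names such as `fit_mlp` are illustrative assumptions, not the specification used in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_mlp(X, y, n_hidden=8, lr=0.5, n_epochs=500, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, sigmoid output)
    by full-batch gradient descent on the mean binary cross-entropy.
    Returns the fitted parameters and the loss recorded at every epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    eps = 1e-12  # guards against log(0)
    losses = []
    for _ in range(n_epochs):
        # Forward pass: hidden activations and predicted probabilities.
        H = np.tanh(X @ W1 + b1)
        p = sigmoid(H @ W2 + b2)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(float(loss))
        # Backward pass: with a sigmoid output and cross-entropy loss,
        # the gradient w.r.t. the output logit is (p - y) / n.
        g_out = (p - y) / n
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_hid = (g_out @ W2.T) * (1 - H ** 2)  # tanh'(a) = 1 - tanh(a)**2
        gW1 = X.T @ g_hid
        gb1 = g_hid.sum(axis=0)
        # Gradient-descent update of all weights and biases.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic data with a non-linear (XOR-like) class boundary.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)
params, losses = fit_mlp(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Every quantity passed to `fit_mlp` (number of hidden units, learning rate, number of epochs, initialization scale) is a hyper-parameter, which is why the tuning strategy discussed above matters for this method.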

-----------

After models from the different learning methods have been estimated and their results presented and discussed, this notebook compares the performance of all tested methods using several statistical metrics (ROC-AUC, average precision score, and Brier score).
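For reference, the three metrics compared below can be written in a few lines of plain Python. This is an illustrative sketch: the notebook does not show which implementation produced its numbers, and for untied scores these definitions should agree with scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss`. The labels and probabilities are made-up toy values.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random
    negative (ties count 1/2). O(n_pos * n_neg): fine for illustration."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of precision@k over the ranks k at which a positive is
    retrieved (scores sorted in decreasing order; ties are not grouped)."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome
    (a loss: lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


# Toy example: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc(y_true, y_prob))            # 8/9, about 0.889
print(average_precision(y_true, y_prob))  # (1 + 1 + 0.75) / 3, about 0.917
print(brier_score(y_true, y_prob))        # about 0.11375
```

ROC-AUC only depends on the ranking of scores, average precision focuses on the (rare) positive class, which makes it informative for imbalanced fraud data, and the Brier score also rewards well-calibrated probabilities.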

---------------

**Summary:**
1. [Libraries](#libraries).
2. [Settings](#settings).
3. [Importing data](#imports).
4. [Comparing performance of alternative methods](#comparing_performance).

<a id='libraries'></a>

## Libraries

```python
import pandas as pd
import numpy as np
import json
import os

from datetime import datetime
import time

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# print(__version__) # requires version >= 1.9.0

import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import pickle
```

<a id='settings'></a>

## Settings

```python
# Declare whether to export results:
export = False
```
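The export flag is not used in the cells shown here; a common pattern is to gate every write to disk on it. The sketch below is a hypothetical helper (`maybe_export`, the file name, and the placeholder dictionary are assumptions) that writes results as JSON only when the flag is on, mirroring the JSON format the notebook reads.

```python
import json
import os
import tempfile

export_flag = False  # stands in for the notebook's `export` setting


def maybe_export(results, path, export):
    """Write `results` as JSON to `path` only if `export` is True.
    Returns True when a file was written, False otherwise."""
    if not export:
        return False
    with open(path, 'w') as f:
        json.dump(results, f, indent=2)
    return True


# Hypothetical usage: a temporary file instead of the project folder.
tmp_path = os.path.join(tempfile.gettempdir(), 'metrics_example.json')
print(maybe_export({'example_metric': 1.0}, tmp_path, export_flag))  # False
```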

<a id='imports'></a>

## Importing data

```python
# Set the working directory to the project folder (adjust to the local path):
os.chdir('/home/matheus_rosso/Arquivo/Materiais/Codes/neural_nets/')

# Dictionary with information on model structure and performance (neural networks):
model_assessment = {}

with open('Datasets/model_assessment.json') as json_file:
    model_assessment['NN'] = json.load(json_file)
```

```python
# Dictionary with information on model structure and performance (alternative methods):
for m in ['LR', 'SVM', 'GBM']:
    with open(f'Datasets/model_assessment_{m}.json') as json_file:
        model_assessment[m] = json.load(json_file)
```
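The indexing used in the next cell implies that each file maps a model identifier to a record containing at least a `performance_metrics` dictionary. The sketch below shows that assumed layout with placeholder values (the identifier and numbers are not results from this study) and a small hypothetical helper, `describe`, that lists the available model identifiers and metric names, a quick way to check which keys exist before indexing them.

```python
import json

# Assumed layout of a model_assessment*.json file; the identifier and
# all numeric values are placeholders, not actual results.
example_json = '''
{
  "0000000000": {
    "performance_metrics": {
      "avg_roc_auc": 0.5, "std_roc_auc": 0.0,
      "avg_avg_prec_score": 0.5, "std_avg_prec_score": 0.0,
      "avg_brier_score": 0.25, "std_brier_score": 0.0
    }
  }
}
'''
assessment = {'EXAMPLE': json.loads(example_json)}


def describe(assessment):
    """Map each method to {model_id: sorted list of metric names}."""
    return {
        method: {model_id: sorted(record['performance_metrics'])
                 for model_id, record in models.items()}
        for method, models in assessment.items()
    }


print(describe(assessment))
```

Calling `describe(model_assessment)` in the notebook would list the model identifiers stored for each method.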

<a id='comparing_performance'></a>

## Comparing performance of alternative methods

```python
# Identifier of the selected model for each learning method:
selected_models = {
    'NN': '1615253754',
    'LR': '1615061729',
    'SVM': '1615402205',
    'GBM': '1615580184'
}

def collect_metric(metric):
    """Return `metric` for the selected model of each method (order: NN, LR, SVM, GBM)."""
    return [model_assessment[m][model_id]['performance_metrics'][metric]
            for m, model_id in selected_models.items()]

# Statistics for performance metrics by learning method:
avg_roc_auc = collect_metric('avg_roc_auc')
avg_avg_prec_score = collect_metric('avg_avg_prec_score')
avg_brier_score = collect_metric('avg_brier_score')
std_roc_auc = collect_metric('std_roc_auc')
std_avg_prec_score = collect_metric('std_avg_prec_score')
std_brier_score = collect_metric('std_brier_score')
```
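The cell above hard-codes one model identifier per method. If each file stores several estimated models, an alternative is to pick the identifier programmatically, for instance the one with the highest `avg_roc_auc`. The sketch assumes the same nested layout as `model_assessment` and uses placeholder data; `best_model_id` is a hypothetical helper, and this is not necessarily how the identifiers above were chosen.

```python
def best_model_id(models, metric='avg_roc_auc', higher_is_better=True):
    """Return the model id whose performance_metrics[metric] is best."""
    def key(model_id):
        value = models[model_id]['performance_metrics'][metric]
        return value if higher_is_better else -value
    return max(models, key=key)


# Placeholder data with the assumed nested layout (not real results).
toy = {
    'a': {'performance_metrics': {'avg_roc_auc': 0.90, 'avg_brier_score': 0.012}},
    'b': {'performance_metrics': {'avg_roc_auc': 0.93, 'avg_brier_score': 0.015}},
    'c': {'performance_metrics': {'avg_roc_auc': 0.91, 'avg_brier_score': 0.010}},
}
print(best_model_id(toy))                                              # b
print(best_model_id(toy, 'avg_brier_score', higher_is_better=False))  # c
```

The `higher_is_better` switch matters because the Brier score is a loss, so its best model is the one with the smallest value.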

```python
# Dataframe with statistics of performance metrics by learning method:
metrics = pd.DataFrame(data={
    'method': ['NN', 'LR', 'SVM', 'GBM'],
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_avg_prec_score': avg_avg_prec_score,
    'std_avg_prec_score': std_avg_prec_score,
    'avg_brier_score': avg_brier_score,
    'std_brier_score': std_brier_score
})

# Ratio between the average and the standard deviation of each metric:
for m in ['roc_auc', 'avg_prec_score', 'brier_score']:
    metrics[f'ratio_{m}'] = metrics[f'avg_{m}'] / metrics[f'std_{m}']

# Rank methods by average ROC-AUC, breaking ties with the ROC-AUC ratio:
metrics.sort_values(['avg_roc_auc', 'ratio_roc_auc'], ascending=[False, False])
```

| | method | avg_roc_auc | std_roc_auc | avg_avg_prec_score | std_avg_prec_score | avg_brier_score | std_brier_score | ratio_roc_auc | ratio_avg_prec_score | ratio_brier_score |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LR | 0.955409 | 5.8e-05 | 0.505008 | 0.000866 | 0.008912 | 8e-06 | 16508.084976 | 582.909927 | 1071.775044 |
| 3 | GBM | 0.947797 | 0.003961 | 0.441309 | 0.033076 | 0.009877 | 0.0005 | 239.292287 | 13.342267 | 19.752882 |
| 0 | NN | 0.946432 | 0.001816 | 0.460111 | 0.01283 | 0.010527 | 0.000353 | 521.099463 | 35.862691 | 29.859895 |
| 2 | SVM | 0.941386 | 4e-06 | 0.489918 | 3.5e-05 | 0.008749 | 2.4e-05 | 245019.126469 | 13948.539729 | 369.248814 |
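The table is sorted by average ROC-AUC only, so the ordering can differ for the other metrics. Note also that the Brier score is a loss (lower is better), so a larger `ratio_brier_score` combines a worse average with a smaller dispersion and should not be read as "higher is better". The sketch below ranks the four methods on each average metric using the values displayed in the table, sorting in the direction appropriate to each metric; `rank_methods` is a hypothetical helper.

```python
# Average metrics copied from the table above (rounded as displayed).
avg = {
    'LR':  {'roc_auc': 0.955409, 'avg_prec_score': 0.505008, 'brier_score': 0.008912},
    'GBM': {'roc_auc': 0.947797, 'avg_prec_score': 0.441309, 'brier_score': 0.009877},
    'NN':  {'roc_auc': 0.946432, 'avg_prec_score': 0.460111, 'brier_score': 0.010527},
    'SVM': {'roc_auc': 0.941386, 'avg_prec_score': 0.489918, 'brier_score': 0.008749},
}
# True when larger values of the metric are better.
higher_is_better = {'roc_auc': True, 'avg_prec_score': True, 'brier_score': False}


def rank_methods(avg, metric, higher):
    """Methods ordered from best to worst on `metric`."""
    return sorted(avg, key=lambda m: avg[m][metric], reverse=higher)


for metric, higher in higher_is_better.items():
    print(metric, rank_methods(avg, metric, higher))
# roc_auc ['LR', 'GBM', 'NN', 'SVM']
# avg_prec_score ['LR', 'SVM', 'NN', 'GBM']
# brier_score ['SVM', 'LR', 'GBM', 'NN']
```

With these values, LR leads on both ROC-AUC and average precision, while SVM has the lowest average Brier score.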
