## Neural networks applications to fraud detection
## Analysis of results

Neural networks are one of the most relevant learning methods currently available, and their widespread application is understood by theoretical robustness, flexible architecture design, and strong expected predictive accuracy.
<br>
<br>
The main objective of this study is to develop a neural network application to fraud detection, and mainly to construct and implement a strategy for hyper-parameter tuning, since this learning method requires a proper definition of a large set of parameters in order to result in a competitive performance.
<br>
<br>
Previously to empirical inquirements, it is necessary to review all details concerning neural networks structure, fitting, and specification, which will base experiments design and tests implementation. So, the theoretical presentation of this notebook will be followed by an empirical stage of tests in which hyper-parameters will be defined to improve neural networks predictive accuracy, after which the best specification obtained should be opposed to alternative learning methods.

---------------

**Hyper-parameters and other definitions**

The following attributes of a neural network should be specified in order to optimize predictive accuracy:
1. Architecture: number of hidden layers ($L$), number of neurons in each hidden layer ($J_l$).
2. Functions: cost function, activation function for neurons in each layer (except for input layer).
3. Distribution for weights initialization.
4. Learning rate ($\eta$).
5. Fitting hyper-parameters: number of epochs ($T$), mini-batch size ($S$).
6. L1 or L2 regularization and its hyper-parameter ($\lambda$).
7. Share of neurons to be dropped out at each mini-batch iteration ($\rho$).
8. Early stopping: minimum change for an improvement ($\delta$) and tolerated (consecutive) number of epochs without improvement ($P$).

-----------

**Strategy to define architecture and hyper-parameters**

The strategy applied here has followed an intinerary that begins with the choice of broader parameters, such as architecture setting, fitting hyper-parameters (mini-batch size) and functions (cost and activation). Then, tests have focused on parameters that are expected to improve generalization as they act to directly reduce overfitting: dropout parameters (input and hidden), and regularization parameters (L1 vs. L2 and penalty term). Later, tuning has covered the learning activity by approaching the optimizer (Adam vs. SGD and all of their parameters) and weights initialization.
<br>
<br>
The next step has concerned a review of all major choices previously made. So, alternative architectures were again tested, besides of new checks on the following hyper-parameters: mini-batch size, dropout parameters (input and hidden), and Adam parameters.
<br>
<br>
Since all tests mentioned above have used random samples of training and validation data, in order to speed up estimations, the regularization parameter was finally set up by making use of a grid search using the entire training and validation datasets. Similarly, the number of epochs was fixed to a small, yet representative value. Then, its final choice has relied on early stopping.
<br>
<br>
In a section below, each hyper-parameter choice will be indicated together with a graph illustrating the evolution of performance metrics by best partial specification.

-----------

#### Evolution of perfomance metrics

As the line plots of evolution in performance metrics by tuning step show, the successive hyper-parameters tuning has achieved ever growing averages of *ROC-AUC* and *average precision score* metrics. Not only the average of these metrics over 100 estimations has improved, but also their standard deviation has consistently decreased, as the ratio between the average and the standard deviation of performance metrics shows.
<br>
<br>
Two tuning steps has provided the strongest improvements in performance: the change in activation function for hidden neurons (from reLu to tanh) and the review in architecture after a first round of hyper-parameters choices (from 2 hidden layers to 1 hidden layer, besides a change in the rule for the number of hidden neurons from $J_1 = sqrt(num\_inputs*num\_outputs)$, $J_2 = J_1/2$ to $J_1 = (num\_inputs + num\_outputs)*0.1$).
<br>
<br>
This illustrates how further improvements could be obtained by iterating even more over that itinerary of hyper-parameters definition. Even so, after only two iterations the best choices of hyper-parameters are as follows:
* Model architecture:
    * One hidden layer.
    * Number of hidden neurons given by $J_1 = (num\_inputs + num\_outputs)*0.1$.
<br>
<br>
* Sigmoid activation function for the output neuron.
* Tanh activation function for the hidden neurons.
* Binary cross-entropy cost function.
* Mini-batch size: $S = 512$.
* L2 regularization with $\lambda = 1e-6$.
* No input dropout.
* Hidden dropout of $\rho = 0.1$.
* Glorot uniform distribution for weights initialization.
* Optimizer: Adam with parameters $\{\eta = 0.001, \beta_1 = 0.5, \beta_2 = 0.9999, \epsilon = 1e-7\}$.
* Default number of epochs: $T = 200$, but effective number defined through early stopping.

--------

This notebook presents and discusses the evolution of performance metrics by step of hyper-parameter tuning, while line plots will illustrate it the change in performance level.

---------------

**Summary:**
1. [Libraries](#libraries)<a href='#libraries'></a>.
2. [Importing data](#imports)<a href='#imports'></a>.
3. [Hyper-parameters choices](#hyper_parameters_choices)<a href='#hyper_parameters_choices'></a>.
    * [Collection of estimations](#collection_estimations)<a href='#collection_estimations'></a>.
    * [Evolution of performance metrics](#performance_evolution)<a href='#performance_evolution'></a>.

<a id='libraries'></a>

## Libraries

In [1]:
import pandas as pd
import numpy as np
import json
import os

from datetime import datetime
import time

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
print(__version__) # requires version >= 1.9.0

# import cufflinks as cf
# init_notebook_mode(connected=True)
# cf.go_offline()

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import pickle

4.12.0


<a id='imports'></a>

## Importing data

In [2]:
# Dictionary with information on model structure and performance:
os.chdir('..')

if 'model_assessment.json' not in os.listdir('Datasets'):
    model_assessment = {}

else:
    with open('Datasets/model_assessment.json') as json_file:
        model_assessment = json.load(json_file)

<a id='hyper_parameters_choices'></a>

## Hyper-parameters choices

<a id='collection_estimations'></a>

### Collection of estimations

In [3]:
# Dataframe with the identification of all estimations:
estimations = pd.DataFrame(data={
    'estimation_id': list(model_assessment.keys()),
    'comment': [model_assessment[k]['comment'] for k in model_assessment.keys()]
})

# Keeping only unique assessments:
estimations = estimations.drop_duplicates(['comment'], keep='last').reset_index(drop=True)

In [4]:
# Complete list of estimations description:
list(estimations.comment)

['Basic estimation for codes development.',
 'Estimation loop for variability assessment.',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.1',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.2',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.3',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.4',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.5',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.6',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.7',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.8',
 'Testing architectures. One hidden layer: J1 = (num_inputs + num_outputs)*0.9',
 'Testing architectures. One hidden layer: J1 = num_obs/(2*(num_inputs + num_outputs))',
 'Testing architectures. One hidden layer: J1 = num_obs/(4*(num_inputs + num_outputs))',
 

#### Selected estimations

In [5]:
selected_estimations = ['1609598769', '1609988509', '1610758512', '1610905503', '1611425007', '1611712007',
                        '1611780618', '1612103127', '1612635856', '1613085785', '1613223667', '1613227483',
                        '1613238130', '1613319533', '1613473761']
tuning = ['architecture', 'batch_size', 'cost_function', 'hidden_activations', 'regul_param', 'input_dropout',
          'hidden_dropout', 'opt_params', 'weights_init',
          'architecture_review', 'batch_size_review', 'input_dropout_review', 'hidden_dropout_review',
          'opt_params_review', 'regul_param_final']
param = ['architecture', 'batch_size', 'cost_function', 'hidden_activations', 'regul_param', 'input_dropout',
         'hidden_dropout', 'opt_params', 'weights_init', 'architecture', 'batch_size', 'input_dropout',
         'hidden_dropout', 'opt_params', 'regul_param']

In [6]:
outcomes = pd.DataFrame(data={
    'estimation_id': selected_estimations,
    'tuning_parameter': param,
    'tuning_step': tuning,
    'avg_roc_auc': [e['performance_metrics']['avg_roc_auc'] for e in
                    [model_assessment[e] for e in selected_estimations]],
    'std_roc_auc': [e['performance_metrics']['std_roc_auc'] for e in
                    [model_assessment[e] for e in selected_estimations]],
    'ratio_roc_auc': [e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_avg_prec_score']
                      for e in [model_assessment[e] for e in selected_estimations]],
    
    'avg_avg_prec_score': [e['performance_metrics']['avg_avg_prec_score'] for e in
                           [model_assessment[e] for e in selected_estimations]],
    'std_avg_prec_score': [e['performance_metrics']['std_avg_prec_score'] for e in
                           [model_assessment[e] for e in selected_estimations]],
    'ratio_avg_prec_score': [e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score']
                             for e in [model_assessment[e] for e in selected_estimations]],
    
    'avg_brier_score': [e['performance_metrics']['avg_brier_score'] for e in
                        [model_assessment[e] for e in selected_estimations]],
    'std_brier_score': [e['performance_metrics']['std_brier_score'] for e in
                        [model_assessment[e] for e in selected_estimations]],
    'ratio_brier_score': [e['performance_metrics']['avg_brier_score']/e['performance_metrics']['std_brier_score']
                          for e in [model_assessment[e] for e in selected_estimations]]
})

print(f'\033[1mShape of outcomes:\033[0m {outcomes.shape}.')

outcomes.head(3)

[1mShape of outcomes:[0m (15, 12).


Unnamed: 0,estimation_id,tuning_parameter,tuning_step,avg_roc_auc,std_roc_auc,ratio_roc_auc,avg_avg_prec_score,std_avg_prec_score,ratio_avg_prec_score,avg_brier_score,std_brier_score,ratio_brier_score
0,1609598769,architecture,architecture,0.868335,0.013726,26.383978,0.262224,0.032911,7.967559,0.018823,0.02068,0.910219
1,1609988509,batch_size,batch_size,0.825623,0.011539,52.242784,0.217563,0.015804,13.76669,0.010698,0.000183,58.399124
2,1610758512,cost_function,cost_function,0.825063,0.01235,49.48693,0.21683,0.016672,13.005352,0.010708,0.000223,48.06263


<a id='performance_evolution'></a>

### Evolution of performance metrics

In [8]:
metric = ['roc_auc', 'avg_prec_score']
metric_name = {'roc_auc': 'ROC-AUC', 'avg_prec_score': 'Average precision score'}

In [10]:
# Loop over performance metrics:
for m in metric:
    # Create figure:
    fig = make_subplots(specs=[[{'secondary_y': True}]])

    fig.add_trace(
        go.Scatter(x = outcomes['tuning_step'], y = outcomes['avg_{0}'.format(m)], name = 'avg_{0}'.format(m),
                   hovertemplate = 'avg_{0}'.format(m) + ' = %{y:.4f}<br>' + 'tuning_step = %{x}',
                   marker_color = 'black'),
        secondary_y = False,
    )

    fig.add_trace(
        go.Scatter(x = outcomes['tuning_step'], y = outcomes['ratio_{0}'.format(m)], name = 'ratio_{0}'.format(m),
                   hovertemplate = 'ratio_{0}'.format(m) + ' = %{y:.4f}<br>' + 'tuning_step = %{x}',
                   marker_color = 'blue'),
        secondary_y = True,
    )

    # Changing layout:
    fig.update_layout(
        title_text=f'Evolution of performance metrics - {metric_name[m]}',
        width=900, height=600
    )

    # Set labels:
    fig.update_xaxes(tickangle=45, title_text='tuning step')
    fig.update_yaxes(title_text='avg_{0}'.format(m), secondary_y = False)
    fig.update_yaxes(title_text='ratio_{0}'.format(m), secondary_y = True)

    fig.show()