## Neural network applications to fraud detection
## Hyper-parameters definition

Neural networks are among the most relevant learning methods currently available, and their widespread application is explained by their theoretical robustness, flexible architecture design, and strong expected predictive accuracy.
<br>
<br>
The main objective of this study is to develop a neural network application for fraud detection and, above all, to construct and implement a strategy for hyper-parameter tuning, since this learning method only performs competitively when a large set of hyper-parameters is properly defined.
<br>
<br>
Before the empirical investigation, it is necessary to review the details of neural network structure, fitting, and specification, which will guide the design of experiments and the implementation of tests. The theoretical presentation of this notebook is therefore followed by an empirical stage of tests in which hyper-parameters are defined to improve the predictive accuracy of neural networks, after which the best specification obtained is compared against alternative learning methods.

---------------

**Hyper-parameters and other definitions**

The following attributes of a neural network should be specified in order to optimize predictive accuracy:
1. Architecture: number of hidden layers ($L$), number of neurons in each hidden layer ($J_l$).
2. Functions: cost function, activation function for neurons in each layer (except for input layer).
3. Distribution for weights initialization.
4. Learning rate ($\eta$).
5. Fitting hyper-parameters: number of epochs ($T$), mini-batch size ($S$).
6. L1 or L2 regularization and its hyper-parameter ($\lambda$).
7. Share of neurons to be dropped out at each mini-batch iteration ($\rho$).
8. Early stopping: minimum change for an improvement ($\delta$) and tolerated (consecutive) number of epochs without improvement ($P$).
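
The attributes listed above can be collected in a single object, which makes each experiment easier to reproduce. The following is only a minimal sketch: the class name `HyperParams`, its field names, and all default values are illustrative assumptions and are not taken from the notebook's `KerasNN` class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HyperParams:
    """Hypothetical container for the attributes listed above (not part of the notebook)."""
    hidden_neurons: List[int] = field(default_factory=lambda: [1])  # J_l, one entry per hidden layer
    cost_function: str = "binary_crossentropy"
    activation: str = "relu"
    initializer: str = "glorot_uniform"
    learning_rate: float = 0.001      # eta
    epochs: int = 10                  # T
    batch_size: int = 32              # S
    regularizer: str = "l2"           # "l1" or "l2"
    reg_lambda: float = 0.0           # lambda
    dropout_rate: float = 0.0         # rho
    min_delta: float = 0.0            # delta
    patience: int = 5                 # P

    @property
    def num_hidden_layers(self) -> int:
        """L, implied by the length of hidden_neurons."""
        return len(self.hidden_neurons)

    def validate(self) -> None:
        """Raise ValueError when a value falls outside its admissible range."""
        if not self.hidden_neurons or min(self.hidden_neurons) < 1:
            raise ValueError("each hidden layer needs at least one neuron")
        if self.learning_rate <= 0 or self.epochs < 1 or self.batch_size < 1:
            raise ValueError("learning rate, epochs and batch size must be positive")
        if self.regularizer not in ("l1", "l2") or self.reg_lambda < 0:
            raise ValueError("regularizer must be 'l1' or 'l2' with lambda >= 0")
        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout rate must lie in [0, 1)")
        if self.min_delta < 0 or self.patience < 0:
            raise ValueError("early stopping parameters must be non-negative")


# Illustrative values only.
params = HyperParams(hidden_neurons=[64, 32], reg_lambda=1e-4, dropout_rate=0.2)
params.validate()
print(params.num_hidden_layers)
```

The default of a single hidden layer with a single neuron mirrors the basic estimation used as the starting point of the tuning strategy.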

-----------

**Strategy to define architecture and hyper-parameters**

1. Simplification of the learning task: produce random samples of training and validation data.
    * **Basic estimation:** using random samples of training and validation data, fit a neural network with a single hidden layer and a single neuron.
    <br>
    <br>
    * **Architecture and fitting hyper-parameters:** using random samples of training and validation data, applying Adam optimizer, and considering suitable values for the remaining hyper-parameters, define appropriate values for:
        * Number of hidden layers ($L$).
        * Number of neurons in each hidden layer ($J_l$).
        * Number of epochs ($T$).
        * Mini-batch size ($S$).
        * Parameters of early stopping ($\delta$, $P$).
    <br>
    <br>
    * **Functions:** using random samples of training and validation data, besides architecture and fitting hyper-parameters from above, and considering suitable values for the remaining hyper-parameters, apply Adam optimizer and define the most promising cost and activation functions.
    <br>
    <br>
    * **Dropout parameter ($\rho$):** using random samples of training and validation data, together with the architecture, functions and hyper-parameters from above, apply the Adam optimizer and define appropriate values for the share of neurons to be dropped out at each mini-batch iteration. Use performance metrics evaluated on the validation data.
    <br>
    <br>
    * **Regularization parameter ($\lambda$):** using random samples of training and validation data, together with the architecture, functions and hyper-parameters from above, apply the Adam optimizer and define appropriate values for the regularization parameter $\lambda$. Use performance metrics evaluated on the validation data.
    <br>
    <br>
    * **Learning rate ($\eta$):** first, explore different values for the hyper-parameters of Adam, using performance metrics evaluated on validation data as reference. Then, explore different settings for the stochastic gradient descent (SGD) optimizer. Finally, compare the best specification of Adam against the best SGD setting.
        * *Strategy for defining $\eta$ with constant SGD configuration:* starting with a small value, increase it until the threshold $\eta_0$ is found, above which the validation cost starts to oscillate (overshooting). Define appropriate values for the learning rate $\eta$ with $\lambda = 0$. Then adjust the regularization parameter $\lambda$, redefine the learning rate $\eta$, and so on.
        * If constant SGD with an optimum value for $\eta$ outperforms Adam optimizer, review architecture, fitting hyper-parameters and functions.
    <br>
    <br>
    * **Parameters initialization:** compare the default (Glorot Uniform) initialization of parameters with alternative approaches, such as a Normal distribution with zero mean and standard deviation scaled by the square root of the number of neurons in the preceding layer (conventionally $1/\sqrt{J_{l-1}}$).
    <br>
    <br>
    * **Review architecture**.
    * **Further review:** consider alternatives for hyper-parameters that have shown similar results during initial tests.
<br>
<br>
2. Hyper-parameters definition using the entire training and validation data:
    * Given an appropriate value $\lambda^*$ found during the initial trials based on random samples of training and validation data, grid or random search will be applied over $[\lambda^* - k_{\lambda}, \lambda^* + k_{\lambda}]$, for $k_{\lambda} > 0$.
    * If constant SGD is applied instead of Adam, grid or random search will be performed over $[\eta^* - k_{\eta}, \eta^* + k_{\eta}] \times [\lambda^* - k_{\lambda}, \lambda^* + k_{\lambda}]$, for $k_{\eta}, k_{\lambda} > 0$.
    * When implementing grid or random search, the architecture and all remaining hyper-parameters will be defined using the best values found during the initial trials.
<br>
<br>
3. After appropriate values for all hyper-parameters are set, estimation takes place using early stopping to define the number of training epochs. Training stops once $P^*$ consecutive epochs pass without an improvement in performance, where an improvement means $roc\_auc_1 - roc\_auc_0 > \delta$, with $\delta = 0$. Finally, performance metrics are evaluated on the test data, providing a reliable estimate of the predictive accuracy of the neural network model.
    * Note, however, that a different strategy could lead to a higher expected test set performance. Like all other hyper-parameters, the number of epochs could be defined using validation data, for instance by maximizing a performance metric. Then, with appropriate values for all hyper-parameters at hand, training and validation data could be combined so that model estimation uses a larger set of training samples.
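
The search intervals of step 2 can be generated as follows. This is a minimal sketch: the function names, the numeric values of $\lambda^*$, $\eta^*$, $k_{\lambda}$ and $k_{\eta}$, and the clipping of candidates at zero are assumptions made for illustration, not values used in the notebook.

```python
import itertools
import random


def grid_values(center, half_width, num_points, floor=0.0):
    """Evenly spaced candidates over [center - half_width, center + half_width].

    Values are clipped at `floor` (an assumption of this sketch), since
    neither a learning rate nor a regularization strength can be negative.
    """
    if num_points < 2:
        return [max(floor, center)]
    low = center - half_width
    step = 2 * half_width / (num_points - 1)
    return [max(floor, low + i * step) for i in range(num_points)]


def random_values(center, half_width, num_points, floor=0.0, seed=None):
    """Uniform random candidates over the same interval, clipped at `floor`."""
    rng = random.Random(seed)
    low, high = center - half_width, center + half_width
    return [max(floor, rng.uniform(low, high)) for _ in range(num_points)]


# Illustrative values only: lambda* = 0.01 with k_lambda = 0.005,
# eta* = 0.1 with k_eta = 0.05.
lambda_grid = grid_values(0.01, 0.005, num_points=5)
eta_grid = grid_values(0.1, 0.05, num_points=3)

# Joint search space used when constant SGD replaces Adam.
joint_grid = list(itertools.product(eta_grid, lambda_grid))

# Random search over the same lambda interval.
lambda_random = random_values(0.01, 0.005, num_points=10, seed=42)
```

Each candidate (or pair of candidates) would then be evaluated with the architecture and remaining hyper-parameters held fixed at the best values from the initial trials.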

**Note:** an interesting alternative strategy to define values for all hyper-parameters would follow this sequence: choice of functions, architecture and its parameters, fitting hyper-parameters (mini-batch size), dropout parameters, regularization parameter, learning rate (or, more broadly, choice of optimizer), parameters initialization, review of architecture, further review.
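
The learning-rate strategy for constant SGD (increase $\eta$ until the cost starts to oscillate) can be illustrated on a toy problem. The sketch below assumes a one-dimensional quadratic cost $C(w) = \frac{1}{2} a w^2$ in place of the validation cost of a network, and all function names are hypothetical. On this cost, plain gradient descent overshoots whenever $\eta > 2/a$, so the threshold is known in advance and the search can be checked against it.

```python
def cost_trace(eta, curvature=4.0, w0=1.0, steps=20):
    """Costs C(w) = 0.5 * curvature * w**2 along plain gradient descent."""
    w, costs = w0, []
    for _ in range(steps):
        costs.append(0.5 * curvature * w * w)
        w -= eta * curvature * w  # the gradient of C is curvature * w
    return costs


def oscillates(costs):
    """True if the cost ever increases from one step to the next."""
    return any(later > earlier for earlier, later in zip(costs, costs[1:]))


def find_threshold(etas, **kwargs):
    """Largest candidate (scanning upwards) reached before the cost oscillates."""
    last_stable = None
    for eta in sorted(etas):
        if oscillates(cost_trace(eta, **kwargs)):
            break
        last_stable = eta
    return last_stable


# Start small and double the learning rate at each trial: 0.01, 0.02, ..., 1.28.
etas = [0.01 * 2 ** k for k in range(8)]
eta_0 = find_threshold(etas)
print(eta_0)
```

With a curvature of 4 the theoretical limit is $2/a = 0.5$, so the search settles on the largest candidate below it. In a real network the validation cost is noisy, so a single increase is weaker evidence of overshooting than it is in this deterministic example.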

-----------

**References for implementation**

1. Mini-batch size definition:
    * [Reference](https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/) of strategies for defining mini-batch size.
    * [Reference](https://medium.com/deep-learning-experiments/effect-of-batch-size-on-neural-net-training-c5ae8516e57) about effects of mini-batch size on neural networks estimation.
<br>
<br>
2. Cost function:
    * [Theoretical reference](https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/) concerning cost functions for binary classification tasks.
    * [Reference](https://keras.io/api/losses/) of Keras implementation of different cost functions.
<br>
<br>
3. Activation function:
    * Theoretical discussion on which activation function to use and pros and cons of different alternatives: [reference 1](https://datascience.aero/aviation-function-deep-learning/), [reference 2](https://towardsdatascience.com/analyzing-different-types-of-activation-functions-in-neural-networks-which-one-to-prefer-e11649256209), [reference 3](https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/) and [reference 4](https://towardsdatascience.com/comparison-of-activation-functions-for-deep-neural-networks-706ac4284c8a).
    * [Reference](https://keras.io/api/layers/activations/) of Keras implementation of different activation functions.
    * References of Keras and Tensorflow implementations of [leaky ReLU](https://keras.io/api/layers/activation_layers/leaky_relu/), [PReLU](https://keras.io/api/layers/activation_layers/prelu/) and [Swish](https://www.tensorflow.org/api_docs/python/tf/keras/activations/swish) activations.
    * [Comprehensive list](https://en.wikipedia.org/wiki/Activation_function) of activation functions, including their definitions and properties.
<br>
<br>
4. Regularization:
    * [Tensorflow](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/L2) and [Keras](https://keras.io/api/layers/regularizers/) documentation for regularization.
    * [Reference](https://towardsdatascience.com/how-to-implement-custom-regularization-in-tensorflow-keras-4e77be082918) of implementation of L2 regularization.
<br>
<br>
5. Dropout:
    * [Tensorflow and Keras documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) of dropout layers.
    * [Reference](https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/) of implementation of dropout layers.
<br>
<br>
6. Learning rate:
    * References for using Keras optimizers: [general usage](https://keras.io/api/optimizers/), [Adam](https://keras.io/api/optimizers/adam/), [SGD](https://keras.io/api/optimizers/sgd/).
    * [Reference](https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/) with theoretical and empirical discussions on learning rate settings.
    * [Reference](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/) of theoretical presentation and implementation guide for Adam.
<br>
<br>
7. Parameters initialization:
    * [Keras documentation](https://keras.io/api/layers/initializers/) for weights and biases initialization.
    * [Tensorflow and Keras documentation](https://www.tensorflow.org/api_docs/python/tf/keras/initializers) for weights and biases initializers.
    * [Reference](https://jamesmccaffrey.wordpress.com/2017/06/21/neural-network-glorot-initialization/#:~:text=One%20common%20initialization%20scheme%20for,fan%2Dout%20of%20the%20weight.) of default Glorot initialization of parameters.
    * [Reference](https://machinelearningmastery.com/weight-initialization-for-deep-learning-neural-networks/) for theoretical discussion on weights initialization.
<br>
<br>
8. Early stopping:
    * [Keras](https://keras.io/api/callbacks/early_stopping/) and [Tensorflow](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) documentation for default early stopping callbacks.
    * [Reference](https://datascience.stackexchange.com/questions/26833/is-there-away-to-change-the-metric-used-by-the-early-stopping-callback-in-keras) for constructing custom early stopping callbacks.
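
The early-stopping rule governed by $\delta$ and $P$ can be sketched without any framework. The function below is a hypothetical illustration rather than the callback used in this notebook. It assumes a metric for which higher is better (such as ROC-AUC) and compares each epoch against the best score seen so far, which is how the default Keras callback behaves, rather than against the immediately preceding epoch.

```python
def early_stopping_epoch(scores, min_delta=0.0, patience=3):
    """Return (stop_epoch, best_epoch) for per-epoch validation scores.

    An epoch counts as an improvement when its score exceeds the best
    score so far by more than `min_delta`. Training stops after
    `patience` consecutive epochs without improvement.
    """
    best_score = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score - best_score > min_delta:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The patience was never exhausted: training ran for every epoch.
    return len(scores), best_epoch


# Made-up validation ROC-AUC values, for illustration only.
val_auc = [0.80, 0.85, 0.86, 0.86, 0.855, 0.859, 0.87]
print(early_stopping_epoch(val_auc, min_delta=0.0, patience=3))
print(early_stopping_epoch(val_auc, min_delta=0.0, patience=5))
```

With a patience of 3, training stops at epoch 6 and never reaches the better score of epoch 7, which shows how strongly the choice of $P$ can affect the final model.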

---------------

After an introductory theoretical discussion, this notebook applies a strategy to define all main hyper-parameters for the estimation of neural network models. A further notebook will present the evolution of performance metrics after successive tuning of different hyper-parameters. Then, models based on alternative learning methods will also be trained, and their performance will later be compared against that of the best neural network model.

---------------

**Summary:**
1. [Libraries](#libraries)<a href='#libraries'></a>.
2. [Functions and classes](#functions_classes)<a href='#functions_classes'></a>.
3. [Settings](#settings)<a href='#settings'></a>.
4. [Importing data](#imports)<a href='#imports'></a>.
    * [Categorical features](#categorical_features)<a href='#categorical_features'></a>.
    * [Model assessment](#model_assessment)<a href='#model_assessment'></a>.
    * [Classifying features](#classif_feat)<a href='#classif_feat'></a>.
<br>
<br>
5. [Data pre-processing](#data_pre_proc)<a href='#data_pre_proc'></a>.
    * [Assessing missing values](#assessing_missing)<a href='#assessing_missing'></a>.
    * [Transforming numerical features](#num_transf)<a href='#num_transf'></a>.
    * [Transforming categorical features](#categorical_transf)<a href='#categorical_transf'></a>.
    * [Datasets structure](#datasets_structure)<a href='#datasets_structure'></a>.
<br>
<br>
6. [Basic estimation](#basic_estimation)<a href='#basic_estimation'></a>.
    * [Random samples](#random_samples)<a href='#random_samples'></a>.
    * [Basic neural network](#basic_model)<a href='#basic_model'></a>.
    * [Variability assessment](#var_assessment)<a href='#var_assessment'></a>.
<br>
<br>
7. [Architecture definition](#architecture)<a href='#architecture'></a>.
    * [Neurons for a single hidden layer](#neurons_single_hidden_layer)<a href='#neurons_single_hidden_layer'></a>.
    * [Neurons for two hidden layers](#neurons_two_hidden_layers)<a href='#neurons_two_hidden_layers'></a>.
    * [Neurons for three hidden layers](#neurons_three_hidden_layers)<a href='#neurons_three_hidden_layers'></a>.
<br>
<br>
8. [Fitting hyper-parameters](#fitting_params)<a href='#fitting_params'></a>.
    * [Grid of mini-batch sizes](#grid_mini_batch_sizes)<a href='#grid_mini_batch_sizes'></a>.
    * [Velocity approach to mini-batch size definition](#velocity_approach)<a href='#velocity_approach'></a>.
    * [Number of epochs](#number_epochs)<a href='#number_epochs'></a>.
<br>
<br>
9. [Functions](#functions)<a href='#functions'></a>.
    * [Cost function](#cost_function)<a href='#cost_function'></a>.
    * [Activation functions](#activation_functions)<a href='#activation_functions'></a>.
<br>
<br>
10. [Regularization](#regularization)<a href='#regularization'></a>.
    * [Assessing overfitting through L2 regularization](#assessing_overfitting_l2)<a href='#assessing_overfitting_l2'></a>.
    * [Assessing overfitting through L1 regularization](#assessing_overfitting_l1)<a href='#assessing_overfitting_l1'></a>.
    * [Grid of L2 regularization parameters](#grid_l2_regul_params)<a href='#grid_l2_regul_params'></a>.
    * [Grid of L1 regularization parameters](#grid_l1_regul_params)<a href='#grid_l1_regul_params'></a>.
<br>
<br>
11. [Dropout](#dropout)<a href='#dropout'></a>.
    * [Grid of values for input dropout](#input_dropout)<a href='#input_dropout'></a>.
    * [Grid of values for dropout of hidden neurons](#hidden_dropout)<a href='#hidden_dropout'></a>.
<br>
<br>
12. [Learning rate](#learning_rate)<a href='#learning_rate'></a>.
    * [Testing Adam hyper-parameters](#adam_params)<a href='#adam_params'></a>.
    * [SGD optimizer (no momentum and no decay)](#sgd_opt1)<a href='#sgd_opt1'></a>.
    * [SGD optimizer](#sgd_opt2)<a href='#sgd_opt2'></a>.
<br>
<br>
13. [Parameters initialization](#parameters_init)<a href='#parameters_init'></a>.
    * [Grid of distributions](#distributions)<a href='#distributions'></a>.
<br>
<br>
14. [Architecture review](#architecture_review)<a href='#architecture_review'></a>.
    * [Testing alternative architectures](#testing_architectures)<a href='#testing_architectures'></a>.
<br>
<br>
15. [Further review](#further_review)<a href='#further_review'></a>.
    * [Mini-batch size](#mini_batch_size_review)<a href='#mini_batch_size_review'></a>.
    * [Input dropout](#input_dropout_review)<a href='#input_dropout_review'></a>.
    * [Hidden dropout](#hidden_dropout_review)<a href='#hidden_dropout_review'></a>.
    * [Adam hyper-parameters](#adam_params_review)<a href='#adam_params_review'></a>.
<br>
<br>
16. [Grid search for regularization parameter](#regul_param_grid_search)<a href='#regul_param_grid_search'></a>.
17. [Final estimation with early stopping](#final_estimation)<a href='#final_estimation'></a>.

<a id='libraries'></a>

## Libraries

In [1]:
import pandas as pd
import numpy as np
import json
import os

from datetime import datetime
import time

import progressbar
from time import sleep

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, LeakyReLU, PReLU
from tensorflow.keras.regularizers import l1, l2
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.callbacks import EarlyStopping, Callback
from tensorflow.keras.initializers import RandomNormal, Zeros
from tensorflow.nn import leaky_relu
from tensorflow.keras.activations import swish
from tensorflow.keras.models import load_model

from scipy.stats import uniform, norm, randint

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, average_precision_score, auc, precision_recall_curve, brier_score_loss

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# print(__version__) # requires version >= 1.9.0

import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import pickle

<a id='functions_classes'></a>

## Functions and classes

In [2]:
import utils
from utils import epoch_to_date, text_clean, is_velocity, balanced_sample, get_cat, permutation

In [3]:
from transformations import log_transformation, standard_scale, recreate_missings, impute_missing
from transformations import one_hot_encoding

In [4]:
import keras_nn
from keras_nn import KerasNN

<a id='settings'></a>

## Settings

In [5]:
# Declare whether to export results:
export = True

# Define a dataset id:
s = 6044

# Declare whether to apply logarithmic transformation over numerical data:
log_transform = True

# Declare whether to standardize numerical data:
standardize = True

<a id='imports'></a>

## Importing data

In [6]:
# Train data:
os.chdir('/home/matheus_rosso/Arquivo/Features/Datasets/')

df_train = pd.read_csv('new_additional_datasets/dataset_' + str(s) + '.csv',
                       dtype={'order_id': str, 'store_id': int})
df_train.drop_duplicates(['order_id', 'epoch', 'order_amount'], inplace=True)
df_train.reset_index(drop=True, inplace=True)
df_train['date'] = df_train.epoch.apply(epoch_to_date)

# Dropping original categorical features:
cat_vars = get_cat(df_train)
c_vars = [c for c in list(df_train.columns) if 'C#' in c]
na_vars = ['NA#' + c for c in cat_vars if 'NA#' + c in list(df_train.columns)]

df_train = df_train.drop(c_vars, axis=1).drop(na_vars, axis=1)

# Splitting data into train and test:
df_test = df_train[(df_train.date > datetime.strptime('2020-03-30', '%Y-%m-%d'))]
df_train = df_train[(df_train.date <= datetime.strptime('2020-03-30', '%Y-%m-%d'))]

# Splitting data into validation and test:
df_val = df_test[df_test.date < datetime.strptime('2020-05-01', '%Y-%m-%d')]
df_test = df_test[df_test.date >= datetime.strptime('2020-05-01', '%Y-%m-%d')]

print('\033[1mShape of df_train for store ' + str(s) + ':\033[0m ' + str(df_train.shape) + '.')
print('\033[1mShape of df_val for store ' + str(s) + ':\033[0m ' + str(df_val.shape) + '.')
print('\033[1mShape of df_test for store ' + str(s) + ':\033[0m ' + str(df_test.shape) + '.')
print('\n')

# Accessory variables:
drop_vars = ['y', 'order_amount', 'store_id', 'order_id', 'status', 'epoch', 'date', 'weight']

df_train.head(3)

Shape of df_train for store 6044: (35897, 2173).
Shape of df_val for store 6044: (20940, 2173).
Shape of df_test for store 6044: (21791, 2173).




Unnamed: 0,BILLINGLARGEAREAREPUTATION(),BILLINGSMALLAREAREPUTATION(),"BILLINGZIP(CREDITCARD,10080)","BILLINGZIP(CREDITCARD,1440)","BILLINGZIP(CREDITCARD,21600)","BILLINGZIP(CREDITCARD,360)","BILLINGZIP(CREDITCARD,43200)","BILLINGZIP(CREDITCARD,60)","BILLINGZIP(CREDITCARD,64800)","BILLINGZIP(DOCUMENT,10080)",...,ZIPFIRST3REPUTATION(),ZIPFIRST5REPUTATION(),y,order_amount,order_id,status,epoch,store_id,weight,date
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,156.75,D48D0720681E4F5D9A2767F7174B5FA6-2782006,APPROVED,1577751000000.0,6044,1.0,2019-12-30
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000992,0.0,0.0,67.96,A0EB579C0AE0452D9020C91C54565B4F-2782009,APPROVED,1577751000000.0,6044,1.0,2019-12-30
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.003344,0.0,0.0,315.72,17A1DF0F984E4B34AC512D7E9E23B7BB-2782011,APPROVED,1577751000000.0,6044,1.0,2019-12-30
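
The date-based partition used in the cell above can be expressed as a single rule per order. The sketch below restates the same boundaries with the standard library only; the function name `assign_split` and the constant names are hypothetical.

```python
from datetime import datetime

TRAIN_END = datetime(2020, 3, 30)   # last date assigned to training
TEST_START = datetime(2020, 5, 1)   # first date assigned to test


def assign_split(date):
    """Map an order date to 'train', 'validation' or 'test'."""
    if date <= TRAIN_END:
        return "train"
    if date < TEST_START:
        return "validation"
    return "test"


print(assign_split(datetime(2020, 3, 30)))  # boundary date of the training set
print(assign_split(datetime(2020, 4, 15)))
print(assign_split(datetime(2020, 5, 1)))   # boundary date of the test set
```

Note that the comparison is made against midnight of 2020-03-30. If the `date` column carried a time of day, orders placed later on that day would fall into the validation set, so this rule assumes that `epoch_to_date` returns dates without a time component, as the printed values suggest.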


In [7]:
# Assessing missing values:
num_miss_train = df_train.isnull().sum().sum()
num_miss_val = df_val.isnull().sum().sum()
num_miss_test = df_test.isnull().sum().sum()

if num_miss_train > 0:
    print('\033[1mProblem - Number of overall missings detected (training data):\033[0m ' +
          str(num_miss_train) + '.')
    print('\n')

if num_miss_val > 0:
    print('\033[1mProblem - Number of overall missings detected (validation data):\033[0m ' +
          str(num_miss_val) + '.')
    print('\n')

if num_miss_test > 0:
    print('\033[1mProblem - Number of overall missings detected (test data):\033[0m ' +
          str(num_miss_test) + '.')
    print('\n')

<a id='categorical_features'></a>

### Categorical features

In [8]:
categorical_train = pd.read_csv('new_additional_datasets/categorical_features/dataset_' + str(s) + '.csv',
                      dtype={'order_id': str, 'store_id': int})
categorical_train.drop_duplicates(['order_id', 'epoch', 'order_amount'], inplace=True)

categorical_train['date'] = categorical_train.epoch.apply(epoch_to_date)

# Splitting data into train and test:
categorical_test = categorical_train[(categorical_train.date > datetime.strptime('2020-03-30', '%Y-%m-%d'))]
categorical_train = categorical_train[(categorical_train.date <= datetime.strptime('2020-03-30', '%Y-%m-%d'))]

# Splitting data into validation and test:
categorical_val = categorical_test[categorical_test.date < datetime.strptime('2020-05-01', '%Y-%m-%d')]
categorical_test = categorical_test[categorical_test.date >= datetime.strptime('2020-05-01', '%Y-%m-%d')]

print('\033[1mShape of categorical_train (training data):\033[0m ' + str(categorical_train.shape) + '.')
print('\033[1mNumber of orders (training data):\033[0m ' + str(categorical_train.order_id.nunique()) + '.')
print('\n')

print('\033[1mShape of categorical_val (validation data):\033[0m ' + str(categorical_val.shape) + '.')
print('\033[1mNumber of orders (validation data):\033[0m ' + str(categorical_val.order_id.nunique()) + '.')
print('\n')

print('\033[1mShape of categorical_test (test data):\033[0m ' + str(categorical_test.shape) + '.')
print('\033[1mNumber of orders (test data):\033[0m ' + str(categorical_test.order_id.nunique()) + '.')
print('\n')

categorical_train.head()

Shape of categorical_train (training data): (35897, 22).
Number of orders (training data): 35897.


Shape of categorical_val (validation data): (20940, 22).
Number of orders (validation data): 20940.


Shape of categorical_test (test data): (21791, 22).
Number of orders (test data): 21791.




Unnamed: 0,BILLINGCITY(),BILLINGSTATE(),BROWSER(),CREDITCARDBRAND(),CREDITCARDCOUNTRY(),CREDITCARDSUBTYPE(),EMAILDOMAIN(),GENDERBYNAMEPTBR(),IPGEOLOCATIONCITY(),IPGEOLOCATIONCOUNTRY(),...,SHIPPINGSTATE(),UTMSOURCELASTCLICK(),y,order_amount,order_id,status,epoch,store_id,weight,date
0,,,,VISA,BR,GOLD,hotmail.com,F,Fartura,BR,...,SP,,0.0,156.75,D48D0720681E4F5D9A2767F7174B5FA6-2782006,APPROVED,1577751000000.0,6044,1.0,2019-12-30
1,,,,MASTERCARD,BR,GOLD,gmail.com,F,São Paulo,BR,...,SP,,0.0,67.96,A0EB579C0AE0452D9020C91C54565B4F-2782009,APPROVED,1577751000000.0,6044,1.0,2019-12-30
2,,,,VISA,BR,CLASSIC,gmail.com,F,Recife,BR,...,AL,,0.0,315.72,17A1DF0F984E4B34AC512D7E9E23B7BB-2782011,APPROVED,1577751000000.0,6044,1.0,2019-12-30
3,,,,MASTERCARD,BR,PLATINUM,gmail.com,F,Guarapari,BR,...,RJ,,0.0,514.15,21CA5C8AA45B400DB55985466AEE0BCD-2782028,APPROVED,1577751000000.0,6044,1.0,2019-12-30
4,,,,ELO/DISCOVER,BR,NANJING DINERS,hotmail.com,M,Curitiba,BR,...,SC,,0.0,64.74,FFC167F3C6C742C9AD26E7E07ED72115-2782055,APPROVED,1577752000000.0,6044,1.0,2019-12-30


#### Treating missing values

In [9]:
print('\033[1mAssessing missing values in categorical data (training data):\033[0m')
print(categorical_train.drop(drop_vars, axis=1).isnull().sum().sort_values(ascending=False))

Assessing missing values in categorical data (training data):
UTMSOURCELASTCLICK()      35793
BROWSER()                 35689
BILLINGSTATE()            32920
BILLINGCITY()             32920
CREDITCARDSUBTYPE()         642
IPGEOLOCATIONCITY()         522
IPGEOLOCATIONCOUNTRY()       20
GENDERBYNAMEPTBR()           12
SHIPPINGSTATE()               0
SHIPPINGCITY()                0
SELLERID()                    0
EMAILDOMAIN()                 0
CREDITCARDCOUNTRY()           0
CREDITCARDBRAND()             0
dtype: int64


In [10]:
print('\033[1mAssessing missing values in categorical data (validation data):\033[0m')
print(categorical_val.drop(drop_vars, axis=1).isnull().sum().sort_values(ascending=False))

Assessing missing values in categorical data (validation data):
UTMSOURCELASTCLICK()      20896
BROWSER()                 20846
BILLINGSTATE()            19447
BILLINGCITY()             19447
CREDITCARDSUBTYPE()         350
IPGEOLOCATIONCITY()         274
GENDERBYNAMEPTBR()           10
IPGEOLOCATIONCOUNTRY()        5
CREDITCARDCOUNTRY()           1
SHIPPINGSTATE()               0
SHIPPINGCITY()                0
SELLERID()                    0
EMAILDOMAIN()                 0
CREDITCARDBRAND()             0
dtype: int64


In [11]:
print('\033[1mAssessing missing values in categorical data (test data):\033[0m')
print(categorical_test.drop(drop_vars, axis=1).isnull().sum().sort_values(ascending=False))

Assessing missing values in categorical data (test data):
UTMSOURCELASTCLICK()      21757
BROWSER()                 21689
BILLINGSTATE()            20084
BILLINGCITY()             20084
IPGEOLOCATIONCITY()        1927
IPGEOLOCATIONCOUNTRY()      492
CREDITCARDSUBTYPE()         455
CREDITCARDCOUNTRY()           2
GENDERBYNAMEPTBR()            1
SHIPPINGSTATE()               0
SHIPPINGCITY()                0
SELLERID()                    0
EMAILDOMAIN()                 0
CREDITCARDBRAND()             0
dtype: int64


In [12]:
# Loop over categorical features:
for f in categorical_train.drop(drop_vars, axis=1).columns:
    # Training data:
    categorical_train[f] = categorical_train[f].fillna('NA_VALUE')
    
    # Validation data:
    categorical_val[f] = categorical_val[f].fillna('NA_VALUE')
    
    # Test data:
    categorical_test[f] = categorical_test[f].fillna('NA_VALUE')

In [13]:
# Assessing missing values:
if categorical_train.isnull().sum().sum() > 0:
    print('\033[1mProblem - Number of overall missings detected (training data):\033[0m ' +
          str(categorical_train.isnull().sum().sum()) + '.')
    print('\n')

if categorical_val.isnull().sum().sum() > 0:
    print('\033[1mProblem - Number of overall missings detected (validation data):\033[0m ' +
          str(categorical_val.isnull().sum().sum()) + '.')
    print('\n')
    
if categorical_test.isnull().sum().sum() > 0:
    print('\033[1mProblem - Number of overall missings detected (test data):\033[0m ' +
          str(categorical_test.isnull().sum().sum()) + '.')
    print('\n')

#### Treating text data

In [14]:
na_vars = [c for c in categorical_train.drop(drop_vars, axis=1) if 'NA#' in c]

# Loop over categorical features:
for f in categorical_train.drop(drop_vars, axis=1).drop(na_vars, axis=1).columns:
    # Training data:
    categorical_train[f] = categorical_train[f].apply(lambda x: text_clean(str(x)))
    
    # Validation data:
    categorical_val[f] = categorical_val[f].apply(lambda x: text_clean(str(x)))
    
    # Test data:
    categorical_test[f] = categorical_test[f].apply(lambda x: text_clean(str(x)))

categorical_train.head()

Unnamed: 0,BILLINGCITY(),BILLINGSTATE(),BROWSER(),CREDITCARDBRAND(),CREDITCARDCOUNTRY(),CREDITCARDSUBTYPE(),EMAILDOMAIN(),GENDERBYNAMEPTBR(),IPGEOLOCATIONCITY(),IPGEOLOCATIONCOUNTRY(),...,SHIPPINGSTATE(),UTMSOURCELASTCLICK(),y,order_amount,order_id,status,epoch,store_id,weight,date
0,na_value,na_value,na_value,visa,br,gold,hotmail.com,f,fartura,br,...,sp,na_value,0.0,156.75,D48D0720681E4F5D9A2767F7174B5FA6-2782006,APPROVED,1577751000000.0,6044,1.0,2019-12-30
1,na_value,na_value,na_value,mastercard,br,gold,gmail.com,f,sao_paulo,br,...,sp,na_value,0.0,67.96,A0EB579C0AE0452D9020C91C54565B4F-2782009,APPROVED,1577751000000.0,6044,1.0,2019-12-30
2,na_value,na_value,na_value,visa,br,classic,gmail.com,f,recife,br,...,al,na_value,0.0,315.72,17A1DF0F984E4B34AC512D7E9E23B7BB-2782011,APPROVED,1577751000000.0,6044,1.0,2019-12-30
3,na_value,na_value,na_value,mastercard,br,platinum,gmail.com,f,guarapari,br,...,rj,na_value,0.0,514.15,21CA5C8AA45B400DB55985466AEE0BCD-2782028,APPROVED,1577751000000.0,6044,1.0,2019-12-30
4,na_value,na_value,na_value,elo/discover,br,nanjing_diners,hotmail.com,m,curitiba,br,...,sc,na_value,0.0,64.74,FFC167F3C6C742C9AD26E7E07ED72115-2782055,APPROVED,1577752000000.0,6044,1.0,2019-12-30
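
The `text_clean` helper is imported from `utils`, and its source is not shown in this notebook. Judging only from the output above (lowercase values, accents removed, spaces replaced by underscores), a function with similar behavior might look like the following sketch. The name `text_clean_sketch` is hypothetical, and the actual helper may apply different or additional rules.

```python
import re
import unicodedata


def text_clean_sketch(text):
    """Lowercase a string, strip accents and replace whitespace with underscores."""
    # Decompose accented characters and drop the combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and collapse each whitespace run into one underscore.
    return re.sub(r"\s+", "_", ascii_text.strip().lower())


for raw in ["São Paulo", "NANJING DINERS", "ELO/DISCOVER", "NA_VALUE"]:
    print(raw, "->", text_clean_sketch(raw))
```

Normalizing categories in this way avoids treating values such as `São Paulo` and `SAO PAULO` as different levels when the features are later one-hot encoded.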


#### Merging all features

In [15]:
# Training data:
df_train = df_train.merge(categorical_train[[f for f in categorical_train.columns if (f not in drop_vars) or
                                             (f == 'order_id')]],
                          on='order_id', how='left')

print('\033[1mShape of df_train for store ' + str(s) + ':\033[0m ' + str(df_train.shape) + '.')
print('\n')
df_train.head()

Shape of df_train for store 6044: (35897, 2187).




Unnamed: 0,BILLINGLARGEAREAREPUTATION(),BILLINGSMALLAREAREPUTATION(),"BILLINGZIP(CREDITCARD,10080)","BILLINGZIP(CREDITCARD,1440)","BILLINGZIP(CREDITCARD,21600)","BILLINGZIP(CREDITCARD,360)","BILLINGZIP(CREDITCARD,43200)","BILLINGZIP(CREDITCARD,60)","BILLINGZIP(CREDITCARD,64800)","BILLINGZIP(DOCUMENT,10080)",...,CREDITCARDCOUNTRY(),CREDITCARDSUBTYPE(),EMAILDOMAIN(),GENDERBYNAMEPTBR(),IPGEOLOCATIONCITY(),IPGEOLOCATIONCOUNTRY(),SELLERID(),SHIPPINGCITY(),SHIPPINGSTATE(),UTMSOURCELASTCLICK()
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,gold,hotmail.com,f,fartura,br,none,fartura,sp,na_value
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,gold,gmail.com,f,sao_paulo,br,none,santos,sp,na_value
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,classic,gmail.com,f,recife,br,none,maceio,al,na_value
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,platinum,gmail.com,f,guarapari,br,none,itaperuna,rj,na_value
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,nanjing_diners,hotmail.com,m,curitiba,br,none,sao_jose,sc,na_value


In [16]:
# Validation data:
df_val = df_val.merge(categorical_val[[f for f in categorical_val.columns if (f not in drop_vars) or
                                       (f == 'order_id')]],
                      on='order_id', how='left')

print('\033[1mShape of df_val for store ' + str(s) + ':\033[0m ' + str(df_val.shape) + '.')
print('\n')
df_val.head()

Shape of df_val for store 6044: (20940, 2187).




Unnamed: 0,BILLINGLARGEAREAREPUTATION(),BILLINGSMALLAREAREPUTATION(),"BILLINGZIP(CREDITCARD,10080)","BILLINGZIP(CREDITCARD,1440)","BILLINGZIP(CREDITCARD,21600)","BILLINGZIP(CREDITCARD,360)","BILLINGZIP(CREDITCARD,43200)","BILLINGZIP(CREDITCARD,60)","BILLINGZIP(CREDITCARD,64800)","BILLINGZIP(DOCUMENT,10080)",...,CREDITCARDCOUNTRY(),CREDITCARDSUBTYPE(),EMAILDOMAIN(),GENDERBYNAMEPTBR(),IPGEOLOCATIONCITY(),IPGEOLOCATIONCOUNTRY(),SELLERID(),SHIPPINGCITY(),SHIPPINGSTATE(),UTMSOURCELASTCLICK()
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,gold,hotmail.com,f,sao_paulo,br,none,itapecerica_da_serra,sp,na_value
1,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,br,infinite,karseg.com.br,m,campinas,br,none,campinas,sp,na_value
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,gold,gmail.com,m,sao_paulo,br,none,sao_paulo,sp,na_value
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,platinum,gmail.com,f,mairinque,br,none,mairinque,sp,na_value
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,platinum,hotmail.com,f,salvador,br,none,joao_pessoa,pb,na_value


In [17]:
# Test data:
df_test = df_test.merge(categorical_test[[f for f in categorical_test.columns if (f not in drop_vars) |
                                          (f == 'order_id')]],
                        on='order_id', how='left')

print('\033[1mShape of df_test for store ' + str(s) + ':\033[0m ' + str(df_test.shape) + '.')
print('\n')
df_test.head()

Shape of df_test for store 6044: (21791, 2187).




Unnamed: 0,BILLINGLARGEAREAREPUTATION(),BILLINGSMALLAREAREPUTATION(),"BILLINGZIP(CREDITCARD,10080)","BILLINGZIP(CREDITCARD,1440)","BILLINGZIP(CREDITCARD,21600)","BILLINGZIP(CREDITCARD,360)","BILLINGZIP(CREDITCARD,43200)","BILLINGZIP(CREDITCARD,60)","BILLINGZIP(CREDITCARD,64800)","BILLINGZIP(DOCUMENT,10080)",...,CREDITCARDCOUNTRY(),CREDITCARDSUBTYPE(),EMAILDOMAIN(),GENDERBYNAMEPTBR(),IPGEOLOCATIONCITY(),IPGEOLOCATIONCOUNTRY(),SELLERID(),SHIPPINGCITY(),SHIPPINGSTATE(),UTMSOURCELASTCLICK()
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,nanjing,adp.com,m,porto_alegre,br,none,porto_alegre,rs,na_value
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,platinum,uol.com.br,f,jundiai,br,none,jundiai,sp,na_value
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,black,uol.com.br,m,itanhaem,br,none,itanhaem,sp,na_value
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,gold,gmail.com,f,santa_maria,br,none,santa_maria,rs,na_value
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,br,gold,gmail.com,m,sao_paulo,br,none,sao_paulo,sp,na_value


In [18]:
# Assessing missing values (training data):
if df_train.isnull().sum().sum() != num_miss_train:
    print('\033[1mInconsistent number of overall missing values (training data)!\033[0m')
    print('\n')

# Assessing missing values (validation data):
if df_val.isnull().sum().sum() != num_miss_val:
    print('\033[1mInconsistent number of overall missing values (validation data)!\033[0m')
    print('\n')
    
# Assessing missing values (test data):
if df_test.isnull().sum().sum() != num_miss_test:
    print('\033[1mInconsistent number of overall missing values (test data)!\033[0m')
    print('\n')

<a id='model_assessment'></a>

### Model assessment

In [19]:
# Dictionary with information on model structure and performance:
os.chdir('/home/matheus_rosso/Arquivo/Materiais/Codes/neural_nets/')

if 'model_assessment.json' not in os.listdir('Datasets'):
    model_assessment = {}

else:
    with open('Datasets/model_assessment.json') as json_file:
        model_assessment = json.load(json_file)

<a id='classif_feat'></a>

### Classifying features

In [20]:
# Categorical features:
cat_vars = list(categorical_train.drop(drop_vars, axis=1).columns)

# Dummy variables indicating missing value status:
missing_vars = [c for c in list(df_train.drop(drop_vars, axis=1).columns) if ('NA#' in c)]

# Dropping features with no variance:
no_variance = [c for c in df_train.drop(drop_vars, axis=1).drop(cat_vars,
                                                                axis=1).drop(missing_vars,
                                                                             axis=1) if df_train[c].var()==0]

if len(no_variance) > 0:
    df_train.drop(no_variance, axis=1, inplace=True)
    df_val.drop(no_variance, axis=1, inplace=True)
    df_test.drop(no_variance, axis=1, inplace=True)

# Numerical features:
cont_vars = [c for c in list(df_train.drop(drop_vars, axis=1).columns) if is_velocity(c)]

# Binary features:
binary_vars = [c for c in list(df_train.drop([c for c in df_train.columns if (c in drop_vars) |
                                             (c in cat_vars) | (c in missing_vars) | (c in cont_vars)],
                                             axis=1).columns) if set(df_train[c].unique()) == set([0,1])]

# Updating the list of numerical features:
for c in list(df_train.drop(drop_vars, axis=1).columns):
    if (c not in cat_vars) & (c not in missing_vars) & (c not in cont_vars) & (c not in binary_vars):
        cont_vars.append(c)

# Dataframe presenting the frequency of features by class:
feats_assess = pd.DataFrame(data={
    'class': ['cat_vars', 'missing_vars', 'binary_vars', 'cont_vars', 'drop_vars'],
    'frequency': [len(cat_vars), len(missing_vars), len(binary_vars), len(cont_vars), len(drop_vars)]
})
feats_assess.sort_values('frequency', ascending=False)

Unnamed: 0,class,frequency
3,cont_vars,1619
1,missing_vars,415
2,binary_vars,27
0,cat_vars,14
4,drop_vars,8


<a id='data_pre_proc'></a>

## Data pre-processing

<a id='assessing_missing'></a>

### Assessing missing values

#### Recreating missing values

In [21]:
missing_vars = [f for f in df_train.columns if 'NA#' in f]

# Loop over variables with missing values:
for f in [c for c in missing_vars if c.replace('NA#', '') not in cat_vars]:
    if f.replace('NA#', '') in df_train.columns:
        # Training data:
        df_train[f.replace('NA#', '')] = recreate_missings(df_train[f.replace('NA#', '')], df_train[f])
        
        # Validation data:
        df_val[f.replace('NA#', '')] = recreate_missings(df_val[f.replace('NA#', '')], df_val[f])
        
        # Test data:
        df_test[f.replace('NA#', '')] = recreate_missings(df_test[f.replace('NA#', '')], df_test[f])
    else:
        df_train.drop([f], axis=1, inplace=True)
        
        df_val.drop([f], axis=1, inplace=True)
        
        df_test.drop([f], axis=1, inplace=True)
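
The helper `recreate_missings` is defined outside this section, so its exact behavior is not visible here. A minimal sketch of what such a helper might do is given below, assuming each `NA#` column is a 0/1 indicator and that flagged entries should become `NaN` again. The name `recreate_missings_sketch` is hypothetical and is not the notebook's function.

```python
import numpy as np
import pandas as pd


def recreate_missings_sketch(values: pd.Series, missing_flag: pd.Series) -> pd.Series:
    """Return `values` with NaN restored wherever `missing_flag` equals 1.

    Hypothetical stand-in for the notebook's `recreate_missings`; the real
    helper may behave differently.
    """
    # Series.mask replaces entries where the condition is True (with NaN by default).
    return values.astype(float).mask(missing_flag == 1)


# Toy example: the second entry was previously imputed with 0 and flagged as missing.
example = recreate_missings_sketch(pd.Series([3.0, 0.0, 5.0]), pd.Series([0, 1, 0]))
```

Restoring the missing values before the numerical transformations keeps imputed placeholders from distorting the statistics computed later.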

In [22]:
# Dropping all variables with missing value status:
df_train.drop([f for f in df_train.columns if 'NA#' in f], axis=1, inplace=True)

df_val.drop([f for f in df_val.columns if 'NA#' in f], axis=1, inplace=True)

df_test.drop([f for f in df_test.columns if 'NA#' in f], axis=1, inplace=True)

#### Describing the frequency of missing values

In [23]:
# Dataframe with the number of missings by feature (training data):
missings_dict = df_train.isnull().sum().sort_values(ascending=False).to_dict()

missings_assess_train = pd.DataFrame(data={
    'feature': list(missings_dict.keys()),
    'missings': list(missings_dict.values())
})

print('\033[1mNumber of features with missings:\033[0m {}'.format(sum(missings_assess_train.missings > 0)) +
      ' out of {} features'.format(len(missings_assess_train)) +
      ' ({}%).'.format(round((sum(missings_assess_train.missings > 0)/len(missings_assess_train))*100, 2)))
print('\033[1mAverage number of missings:\033[0m {}'.format(int(missings_assess_train.missings.mean())) +
      ' out of {} observations'.format(len(df_train)) +
      ' ({}%).'.format(round((int(missings_assess_train.missings.mean())/len(df_train))*100,2)))
print('\n')
missings_assess_train.index.name = 'training_data'
missings_assess_train.head(10)

Number of features with missings: 389 out of 1668 features (23.32%).
Average number of missings: 7108 out of 35897 observations (19.8%).




training_data,feature,missings
0,"CUSTNAVCOUNT(cv,6M)",35601
1,"GDOCUMENT(TOTAL_AMOUNT,60)",35574
2,"GTELEPHONE(TOTAL_AMOUNT,360)",35540
3,"NAME(TOTAL_AMOUNT,1440)",35494
4,"IP(TOTAL_AMOUNT,1440)",35490
5,"EMAIL(TOTAL_AMOUNT,1440)",35482
6,"DOCUMENT(TOTAL_AMOUNT,1440)",35459
7,"CREDITCARD(TOTAL_AMOUNT,1440)",35458
8,"TELEPHONE(TOTAL_AMOUNT,1440)",35447
9,"GEMAIL(TOTAL_AMOUNT,360)",35387


In [24]:
# Dataframe with the number of missings by feature (validation data):
missings_dict = df_val.isnull().sum().sort_values(ascending=False).to_dict()

missings_assess_val = pd.DataFrame(data={
    'feature': list(missings_dict.keys()),
    'missings': list(missings_dict.values())
})

print('\033[1mNumber of features with missings:\033[0m {}'.format(sum(missings_assess_val.missings > 0)) +
      ' out of {} features'.format(len(missings_assess_val)) +
      ' ({}%).'.format(round((sum(missings_assess_val.missings > 0)/len(missings_assess_val))*100, 2)))
print('\033[1mAverage number of missings:\033[0m {}'.format(int(missings_assess_val.missings.mean())) +
      ' out of {} observations'.format(len(df_val)) +
      ' ({}%).'.format(round((int(missings_assess_val.missings.mean())/len(df_val))*100,2)))
print('\n')
missings_assess_val.index.name = 'val_data'
missings_assess_val.head(10)

Number of features with missings: 389 out of 1668 features (23.32%).
Average number of missings: 4176 out of 20940 observations (19.94%).




val_data,feature,missings
0,"CUSTNAVCOUNT(cv,6M)",20741
1,"GDOCUMENT(TOTAL_AMOUNT,60)",20731
2,"GTELEPHONE(TOTAL_AMOUNT,360)",20710
3,"NAME(TOTAL_AMOUNT,1440)",20685
4,"EMAIL(TOTAL_AMOUNT,1440)",20674
5,"CREDITCARD(TOTAL_AMOUNT,1440)",20658
6,"IP(TOTAL_AMOUNT,1440)",20657
7,"DOCUMENT(TOTAL_AMOUNT,1440)",20657
8,"TELEPHONE(TOTAL_AMOUNT,1440)",20654
9,FSBZIPPHONE(),20630


In [25]:
# Dataframe with the number of missings by feature (test data):
missings_dict = df_test.isnull().sum().sort_values(ascending=False).to_dict()

missings_assess_test = pd.DataFrame(data={
    'feature': list(missings_dict.keys()),
    'missings': list(missings_dict.values())
})

print('\033[1mNumber of features with missings:\033[0m {}'.format(sum(missings_assess_test.missings > 0)) +
      ' out of {} features'.format(len(missings_assess_test)) +
      ' ({}%).'.format(round((sum(missings_assess_test.missings > 0)/len(missings_assess_test))*100, 2)))
print('\033[1mAverage number of missings:\033[0m {}'.format(int(missings_assess_test.missings.mean())) +
      ' out of {} observations'.format(len(df_test)) +
      ' ({}%).'.format(round((int(missings_assess_test.missings.mean())/len(df_test))*100,2)))
print('\n')
missings_assess_test.index.name = 'test_data'
missings_assess_test.head(10)

Number of features with missings: 389 out of 1668 features (23.32%).
Average number of missings: 4302 out of 21791 observations (19.74%).




test_data,feature,missings
0,"GDOCUMENT(TOTAL_AMOUNT,60)",21521
1,"GTELEPHONE(TOTAL_AMOUNT,360)",21512
2,"EMAIL(TOTAL_AMOUNT,1440)",21493
3,"CUSTNAVCOUNT(cv,6M)",21490
4,"NAME(TOTAL_AMOUNT,1440)",21482
5,"TELEPHONE(TOTAL_AMOUNT,1440)",21476
6,"DOCUMENT(TOTAL_AMOUNT,1440)",21475
7,"CREDITCARD(TOTAL_AMOUNT,1440)",21472
8,"IP(TOTAL_AMOUNT,1440)",21466
9,FSBZIPPHONE(),21444
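
The three cells above repeat the same summary for training, validation, and test data. As a sketch, the per-feature count could be wrapped in a single helper; `missings_summary` and the toy dataframe are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd


def missings_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Number of missing values per feature, sorted in descending order."""
    counts = df.isnull().sum().sort_values(ascending=False)
    return pd.DataFrame({'feature': counts.index, 'missings': counts.values})


# Toy data: 'a' has two missings, 'c' has one, 'b' has none.
toy = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [1.0, 2.0, 3.0],
    'c': [np.nan, 1.0, 2.0],
})
summary = missings_summary(toy)

# Share of features with at least one missing value.
share_with_missings = (summary.missings > 0).mean()
```

Calling the helper once per dataframe would produce the same `feature`/`missings` tables shown above.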


<a id='num_transf'></a>

### Transforming numerical features

#### Logarithmic transformation

In [26]:
print('---------------------------------------------------------------------------------------------------------')
print('\033[1mAPPLYING LOGARITHMIC TRANSFORMATION OVER NUMERICAL DATA\033[0m')
print('\n')
# Variables that should not be log-transformed:
not_log = [c for c in df_train.columns if c not in cont_vars]

if log_transform:
    print('\033[1mTraining data:\033[0m')

    # Assessing missing values (before logarithmic transformation):
    num_miss_train = df_train.isnull().sum().sum()
    if num_miss_train > 0:
        print('\033[1mNumber of overall missings detected (before logarithmic transformation):\033[0m ' +
              str(num_miss_train) + '.')

    log_transf = log_transformation(not_log=not_log)
    log_transf.transform(df_train)
    df_train = log_transf.log_transformed

    # Assessing missing values (after logarithmic transformation):
    num_miss_train_log = df_train.isnull().sum().sum()
    if num_miss_train_log > 0:
        print('\033[1mNumber of overall missings detected (after logarithmic transformation):\033[0m ' + 
              str(num_miss_train_log) + '.')

    # Checking consistency in the number of missings:
    if num_miss_train_log != num_miss_train:
        print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')

    print('\n')
    print('\033[1mValidation data:\033[0m')

    # Assessing missing values (before logarithmic transformation):
    num_miss_val = df_val.isnull().sum().sum()
    if num_miss_val > 0:
        print('\033[1mNumber of overall missings detected (before logarithmic transformation):\033[0m ' +
              str(num_miss_val) + '.')

    log_transf = log_transformation(not_log=not_log)
    log_transf.transform(df_val)
    df_val = log_transf.log_transformed

    # Assessing missing values (after logarithmic transformation):
    num_miss_val_log = df_val.isnull().sum().sum()
    if num_miss_val_log > 0:
        print('\033[1mNumber of overall missings detected (after logarithmic transformation):\033[0m ' + 
              str(num_miss_val_log) + '.')

    # Checking consistency in the number of missings:
    if num_miss_val_log != num_miss_val:
        print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')
        
    print('\n')
    print('\033[1mTest data:\033[0m')

    # Assessing missing values (before logarithmic transformation):
    num_miss_test = df_test.isnull().sum().sum()
    if num_miss_test > 0:
        print('\033[1mNumber of overall missings detected (before logarithmic transformation):\033[0m ' +
              str(num_miss_test) + '.')

    log_transf = log_transformation(not_log=not_log)
    log_transf.transform(df_test)
    df_test = log_transf.log_transformed

    # Assessing missing values (after logarithmic transformation):
    num_miss_test_log = df_test.isnull().sum().sum()
    if num_miss_test_log > 0:
        print('\033[1mNumber of overall missings detected (after logarithmic transformation):\033[0m ' + 
              str(num_miss_test_log) + '.')

    # Checking consistency in the number of missings:
    if num_miss_test_log != num_miss_test:
        print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')

else:
    print('\033[1mNo transformation performed!\033[0m')

print('\n')
print('---------------------------------------------------------------------------------------------------------')
print('\n')

---------------------------------------------------------------------------------------------------------
APPLYING LOGARITHMIC TRANSFORMATION OVER NUMERICAL DATA


Training data:
Number of overall missings detected (before logarithmic transformation): 11856255.
Number of numerical variables log-transformed: 1619.
Number of overall missings detected (after logarithmic transformation): 11856255.


Validation data:
Number of overall missings detected (before logarithmic transformation): 6967103.
Number of numerical variables log-transformed: 1619.
Number of overall missings detected (after logarithmic transformation): 6967103.


Test data:
Number of overall missings detected (before logarithmic transformation): 7177024.
Number of numerical variables log-transformed: 1619.
Number of overall missings detected (after logarithmic transformation): 7177024.


---------------------------------
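
The class `log_transformation` is defined elsewhere, so the exact formula is not shown in this section. The sketch below illustrates one common choice, `log(1 + x)`, under the assumption that the numerical features are non-negative. The `L#` prefix is inferred from the `replace('L#', '')` calls used in later cells, and `log_transform_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def log_transform_sketch(df: pd.DataFrame, not_log: list) -> pd.DataFrame:
    """Apply log(1 + x) to every column not listed in `not_log`.

    Transformed columns are renamed with an 'L#' prefix (an assumption about
    the notebook's naming convention). NaN entries remain NaN.
    """
    out = df.copy()
    to_log = [c for c in out.columns if c not in not_log]
    out[to_log] = np.log1p(out[to_log])
    return out.rename(columns={c: 'L#' + c for c in to_log})


toy = pd.DataFrame({'order_id': [1, 2], 'amount': [0.0, np.nan]})
logged = log_transform_sketch(toy, not_log=['order_id'])
```

Because `log1p` propagates `NaN`, the number of missing values is the same before and after the transformation, which is the consistency property the cell above checks.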

#### Standardizing numerical features

In [27]:
print('---------------------------------------------------------------------------------------------------------')
print('\033[1mAPPLYING STANDARD SCALE TRANSFORMATION OVER NUMERICAL DATA\033[0m')
print('\n')
# Inputs that should not be standardized:
not_stand = [c for c in df_train.columns if c.replace('L#', '') not in cont_vars]

if standardize:
    print('\033[1mTraining data:\033[0m')

    stand_scale = standard_scale(not_stand = not_stand)
    
    stand_scale.scale(train = df_train, test = df_val)
    
    df_train_scaled = stand_scale.train_scaled
    print('\033[1mShape of df_train_scaled (after scaling):\033[0m ' + str(df_train_scaled.shape) + '.')

    # Assessing missing values (after standardizing numerical features):
    num_miss_train = df_train.isnull().sum().sum()
    num_miss_train_scaled = df_train_scaled.isnull().sum().sum()
    if num_miss_train_scaled > 0:
        print('\033[1mNumber of overall missings:\033[0m ' + str(num_miss_train_scaled) + '.')
    else:
        print('\033[1mNo missing values detected (training data)!\033[0m')

    if num_miss_train_scaled != num_miss_train:
        print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')
    
    print('\n')
    print('\033[1mValidation data:\033[0m')
    df_val_scaled = stand_scale.test_scaled
    print('\033[1mShape of df_val_scaled (after scaling):\033[0m ' + str(df_val_scaled.shape) + '.')

    # Assessing missing values (after standardizing numerical features):
    num_miss_val = df_val.isnull().sum().sum()
    num_miss_val_scaled = df_val_scaled.isnull().sum().sum()
    if num_miss_val_scaled > 0:
        print('\033[1mNumber of overall missings:\033[0m ' + str(num_miss_val_scaled) + '.')
    else:
        print('\033[1mNo missing values detected (val data)!\033[0m')

    if num_miss_val_scaled != num_miss_val:
        print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')
        
    print('\n')
    print('\033[1mTest data:\033[0m')
    stand_scale.scale(train = df_train, test = df_test)
    df_test_scaled = stand_scale.test_scaled
    print('\033[1mShape of df_test_scaled (after scaling):\033[0m ' + str(df_test_scaled.shape) + '.')

    # Assessing missing values (after standardizing numerical features):
    num_miss_test = df_test.isnull().sum().sum()
    num_miss_test_scaled = df_test_scaled.isnull().sum().sum()
    if num_miss_test_scaled > 0:
        print('\033[1mNumber of overall missings:\033[0m ' + str(num_miss_test_scaled) + '.')
    else:
        print('\033[1mNo missing values detected (test data)!\033[0m')

    if num_miss_test_scaled != num_miss_test:
        print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')

else:
    df_train_scaled = df_train.copy()
    df_val_scaled = df_val.copy()
    df_test_scaled = df_test.copy()
    
    print('\033[1mNo transformation performed!\033[0m')

print('\n')
print('---------------------------------------------------------------------------------------------------------')
print('\n')

---------------------------------------------------------------------------------------------------------
APPLYING STANDARD SCALE TRANSFORMATION OVER NUMERICAL DATA


Training data:
Shape of df_train_scaled (after scaling): (35897, 1668).
Number of overall missings: 11856255.


Validation data:
Shape of df_val_scaled (after scaling): (20940, 1668).
Number of overall missings: 6967103.


Test data:
Shape of df_test_scaled (after scaling): (21791, 1668).
Number of overall missings: 7177024.


---------------------------------------------------------------------------------------------------------
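
The `standard_scale` class is not shown in this section. The following is a minimal sketch of the usual approach, in which means and standard deviations are estimated on training data only and then applied to the other dataset. The function name and the handling of zero-variance columns are assumptions.

```python
import numpy as np
import pandas as pd


def standard_scale_sketch(train: pd.DataFrame, test: pd.DataFrame, not_stand: list):
    """Standardize columns outside `not_stand` using training statistics only."""
    cols = [c for c in train.columns if c not in not_stand]
    means = train[cols].mean()  # NaN entries are skipped by default
    # Population standard deviation; zeros are replaced to avoid division by zero.
    stds = train[cols].std(ddof=0).replace(0, 1.0)

    train_scaled, test_scaled = train.copy(), test.copy()
    train_scaled[cols] = (train[cols] - means) / stds
    test_scaled[cols] = (test[cols] - means) / stds
    return train_scaled, test_scaled


train_toy = pd.DataFrame({'y': [0, 1, 0], 'x': [1.0, 3.0, np.nan]})
test_toy = pd.DataFrame({'y': [1], 'x': [2.0]})
train_out, test_out = standard_scale_sketch(train_toy, test_toy, not_stand=['y'])
```

Using training statistics for validation and test data avoids leaking information from those sets into the fitted model, and missing values pass through unchanged.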




#### Treating missing values

In [28]:
print('---------------------------------------------------------------------------------------------------------')
print('\033[1mTREATING MISSING VALUES\033[0m')
print('\n')

print('\033[1mTraining data:\033[0m')
num_miss_train = df_train_scaled.isnull().sum().sum()
print('\033[1mNumber of overall missing values detected before treatment:\033[0m ' +
      str(num_miss_train) + '.')

# Loop over features:
for f in df_train_scaled.drop(drop_vars, axis=1):
    # Checking if there is missing values for a given feature:
    if df_train_scaled[f].isnull().sum() > 0:
        check_missing = impute_missing(df_train_scaled[f])
        df_train_scaled[f] = check_missing['var']
        df_train_scaled['NA#' + f.replace('L#', '')] = check_missing['missing_var']

num_miss_train_treat = int(sum([sum(df_train_scaled[f]) for f in df_train_scaled.columns if 'NA#' in f]))
print('\033[1mNumber of overall missing values detected during treatment:\033[0m ' +
      str(num_miss_train_treat) + '.')

if num_miss_train_treat != num_miss_train:
    print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')

if df_train_scaled.isnull().sum().sum() > 0:
    print('\033[1mProblem - Number of overall missings detected (training data):\033[0m ' +
          str(df_train_scaled.isnull().sum().sum()) + '.')

print('\n')
print('\033[1mValidation data:\033[0m')
num_miss_val = df_val_scaled.isnull().sum().sum()
num_miss_val_treat = 0
print('\033[1mNumber of overall missing values detected before treatment:\033[0m ' + str(num_miss_val) + '.')

# Loop over features:
for f in df_val_scaled.drop(drop_vars, axis=1):
    # Check if there is dummy variable of missing value status for training data:
    if 'NA#' + f.replace('L#', '') in list(df_train_scaled.columns):
        check_missing = impute_missing(df_val_scaled[f])
        df_val_scaled[f] = check_missing['var']
        df_val_scaled['NA#' + f.replace('L#', '')] = check_missing['missing_var']
    else:
        # Checking if there are missings for variables without missings in training data:
        if df_val_scaled[f].isnull().sum() > 0:
            num_miss_val_treat += df_val_scaled[f].isnull().sum()
            df_val_scaled[f] = df_val_scaled[f].fillna(0)

num_miss_val_treat += int(sum([sum(df_val_scaled[f]) for f in df_val_scaled.columns if 'NA#' in f]))
print('\033[1mNumber of overall missing values detected during treatment:\033[0m ' +
      str(num_miss_val_treat) + '.')

if num_miss_val_treat != num_miss_val:
    print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')

if df_val_scaled.isnull().sum().sum() > 0:
    print('\033[1mProblem - Number of overall missings detected (val data):\033[0m ' +
          str(df_val_scaled.isnull().sum().sum()) + '.')
    
print('\n')
print('\033[1mTest data:\033[0m')
num_miss_test = df_test_scaled.isnull().sum().sum()
num_miss_test_treat = 0
print('\033[1mNumber of overall missing values detected before treatment:\033[0m ' + str(num_miss_test) + '.')

# Loop over features:
for f in df_test_scaled.drop(drop_vars, axis=1):
    # Check if there is dummy variable of missing value status for training data:
    if 'NA#' + f.replace('L#', '') in list(df_train_scaled.columns):
        check_missing = impute_missing(df_test_scaled[f])
        df_test_scaled[f] = check_missing['var']
        df_test_scaled['NA#' + f.replace('L#', '')] = check_missing['missing_var']
    else:
        # Checking if there are missings for variables without missings in training data:
        if df_test_scaled[f].isnull().sum() > 0:
            num_miss_test_treat += df_test_scaled[f].isnull().sum()
            df_test_scaled[f] = df_test_scaled[f].fillna(0)

num_miss_test_treat += int(sum([sum(df_test_scaled[f]) for f in df_test_scaled.columns if 'NA#' in f]))
print('\033[1mNumber of overall missing values detected during treatment:\033[0m ' +
      str(num_miss_test_treat) + '.')

if num_miss_test_treat != num_miss_test:
    print('\033[1mProblem - Inconsistent number of overall missings!\033[0m')

if df_test_scaled.isnull().sum().sum() > 0:
    print('\033[1mProblem - Number of overall missings detected (test data):\033[0m ' +
          str(df_test_scaled.isnull().sum().sum()) + '.')

print('\n')
print('---------------------------------------------------------------------------------------------------------')
print('\n')

---------------------------------------------------------------------------------------------------------
TREATING MISSING VALUES


Training data:
Number of overall missing values detected before treatment: 11856255.
Number of overall missing values detected during treatment: 11856255.


Validation data:
Number of overall missing values detected before treatment: 6967103.
Number of overall missing values detected during treatment: 6967103.


Test data:
Number of overall missing values detected before treatment: 7177024.
Number of overall missing values detected during treatment: 7177024.


---------------------------------------------------------------------------------------------------------
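
From its usage above, `impute_missing` returns a dictionary with keys `'var'` and `'missing_var'`. The sketch below shows one way such a helper could work, assuming missing entries are filled with 0 (the training mean after standardization) and flagged with a 0/1 indicator. The fill value and the name `impute_missing_sketch` are assumptions.

```python
import numpy as np
import pandas as pd


def impute_missing_sketch(var: pd.Series, fill_value: float = 0.0) -> dict:
    """Fill missing entries and return the filled series plus a 0/1 indicator."""
    missing_var = var.isnull().astype(int)
    return {'var': var.fillna(fill_value), 'missing_var': missing_var}


result = impute_missing_sketch(pd.Series([0.5, np.nan, -1.2, np.nan]))
```

Summing the indicator columns should recover the number of missing values counted before treatment, which is the check performed in the cell above.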




<a id='categorical_transf'></a>

### Transforming categorical features

#### Creating dummies through one-hot encoding

In [29]:
# Create object for one-hot encoding:
categorical_transf = one_hot_encoding(categorical_features = cat_vars)

# Creating dummies:
categorical_transf.create_dummies(categorical_train = categorical_train,
                                  categorical_test = categorical_val)

# Selected dummies:
dummy_vars = list(categorical_transf.dummies_train.columns)

# Training data:
dummies_train = categorical_transf.dummies_train
dummies_train.index = df_train_scaled.index

# Validation data:
dummies_val = categorical_transf.dummies_test
dummies_val.index = df_val_scaled.index

# Create object for one-hot encoding:
categorical_transf = one_hot_encoding(categorical_features = cat_vars)

# Creating dummies:
categorical_transf.create_dummies(categorical_train = categorical_train,
                                  categorical_test = categorical_test)

# Test data:
dummies_test = categorical_transf.dummies_test
dummies_test.index = df_test_scaled.index

# Dropping original categorical features:
df_train_scaled.drop(cat_vars, axis=1, inplace=True)
df_val_scaled.drop(cat_vars, axis=1, inplace=True)
df_test_scaled.drop(cat_vars, axis=1, inplace=True)

print('\033[1mNumber of categorical features:\033[0m {}.'.format(len(categorical_transf.categorical_features)))
print('\033[1mNumber of overall selected dummies:\033[0m {}.'.format(dummies_train.shape[1]))
print('\033[1mShape of dummies_train for store ' + str(s) + ':\033[0m ' +
      str(dummies_train.shape) + '.')
print('\033[1mShape of dummies_val for store ' + str(s) + ':\033[0m ' +
      str(dummies_val.shape) + '.')
print('\033[1mShape of dummies_test for store ' + str(s) + ':\033[0m ' +
      str(dummies_test.shape) + '.')
print('\n')

dummies_train.head()

Number of categorical features: 14.
Number of overall selected dummies: 62.
Shape of dummies_train for store 6044: (35897, 62).
Shape of dummies_val for store 6044: (20940, 62).
Shape of dummies_test for store 6044: (21791, 62).




Unnamed: 0,C#BILLINGCITY()#NA_VALUE,C#BILLINGCITY()#SAO_PAULO,C#BILLINGSTATE()#NA_VALUE,C#BILLINGSTATE()#SP,C#CREDITCARDBRAND()#AMERICAN_EXPRESS,C#CREDITCARDBRAND()#ELO/DISCOVER,C#CREDITCARDBRAND()#HIPERCARD,C#CREDITCARDBRAND()#MASTERCARD,C#CREDITCARDBRAND()#VISA,C#CREDITCARDSUBTYPE()#BLACK,...,C#SHIPPINGSTATE()#DF,C#SHIPPINGSTATE()#ES,C#SHIPPINGSTATE()#GO,C#SHIPPINGSTATE()#MG,C#SHIPPINGSTATE()#PE,C#SHIPPINGSTATE()#PR,C#SHIPPINGSTATE()#RJ,C#SHIPPINGSTATE()#RS,C#SHIPPINGSTATE()#SC,C#SHIPPINGSTATE()#SP
0,1,0,1,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,1
1,1,0,1,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,1
2,1,0,1,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,1,0,0,0,0,1,0,0,...,0,0,0,0,0,0,1,0,0,0
4,1,0,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
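
The `one_hot_encoding` class is defined elsewhere and appears to select a subset of dummies. The sketch below covers only the alignment step: dummies are created on training data, and the second dataset is reindexed to the same columns, so unseen categories are dropped and absent ones are filled with 0. It does not reproduce the notebook's `C#` prefix or any dummy selection, and `one_hot_sketch` is a hypothetical name.

```python
import pandas as pd


def one_hot_sketch(categorical_train: pd.DataFrame, categorical_test: pd.DataFrame,
                   features: list):
    """One-hot encode `features`, forcing test columns to match training columns."""
    dummies_train = pd.get_dummies(categorical_train[features], prefix_sep='#', dtype=int)
    dummies_test = pd.get_dummies(categorical_test[features], prefix_sep='#', dtype=int)
    # Keep exactly the training columns, in the same order.
    dummies_test = dummies_test.reindex(columns=dummies_train.columns, fill_value=0)
    return dummies_train, dummies_test


cat_train = pd.DataFrame({'state': ['sp', 'rj', 'sp']})
cat_test = pd.DataFrame({'state': ['sp', 'mg']})  # 'mg' never appears in training data
d_train, d_test = one_hot_sketch(cat_train, cat_test, features=['state'])
```

Fixing the column set on training data is what guarantees that all three dummy dataframes end up with the same number of columns.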


#### Concatenating all features

In [30]:
df_train_scaled = pd.concat([df_train_scaled, dummies_train], axis=1)
df_val_scaled = pd.concat([df_val_scaled, dummies_val], axis=1)
df_test_scaled = pd.concat([df_test_scaled, dummies_test], axis=1)

print('\033[1mShape of df_train_scaled for store ' + str(s) + ':\033[0m ' + str(df_train_scaled.shape) + '.')
print('\033[1mShape of df_val_scaled for store ' + str(s) + ':\033[0m ' + str(df_val_scaled.shape) + '.')
print('\033[1mShape of df_test_scaled for store ' + str(s) + ':\033[0m ' + str(df_test_scaled.shape) + '.')
print('\n')

df_train_scaled.head()

Shape of df_train_scaled for store 6044: (35897, 2105).
Shape of df_val_scaled for store 6044: (20940, 2105).
Shape of df_test_scaled for store 6044: (21791, 2105).




Unnamed: 0,BUREAUBILLCITY(),BUREAUBILLSTATE(),BUREAUEMAIL(),BUREAUPHONE(),BUREAUPHONEAREACODE(),BUREAUSHIPCITY(),BUREAUSHIPSTATE(),CREDITCARDCOUNTRYSAMEASSHIPPING(),EMAILHASFRAUD(),EMAILSAMEAMOUNT(),...,C#SHIPPINGSTATE()#DF,C#SHIPPINGSTATE()#ES,C#SHIPPINGSTATE()#GO,C#SHIPPINGSTATE()#MG,C#SHIPPINGSTATE()#PE,C#SHIPPINGSTATE()#PR,C#SHIPPINGSTATE()#RJ,C#SHIPPINGSTATE()#RS,C#SHIPPINGSTATE()#SC,C#SHIPPINGSTATE()#SP
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0,0,0,0,0,0,0,0,0,1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0,0,0,0,0,0,0,0,0,1
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0,0,0,0,0,0,0,0,0,0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0,0,0,0,0,0,1,0,0,0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0,0,0,0,0,0,0,0,1,0


In [31]:
# Assessing missing values (training data):
num_miss_train = df_train_scaled.isnull().sum().sum() > 0
if num_miss_train:
    print('\033[1mProblem - Number of overall missings detected (training data):\033[0m ' +
          str(df_train_scaled.isnull().sum().sum()) + '.')
    print('\n')

# Assessing missing values (validation data):
num_miss_val = df_val_scaled.isnull().sum().sum() > 0
if num_miss_val:
    print('\033[1mProblem - Number of overall missings detected (validation data):\033[0m ' +
          str(df_val_scaled.isnull().sum().sum()) + '.')
    print('\n')
    
# Assessing missing values (test data):
num_miss_test = df_test_scaled.isnull().sum().sum() > 0
if num_miss_test:
    print('\033[1mProblem - Number of overall missings detected (test data):\033[0m ' +
          str(df_test_scaled.isnull().sum().sum()) + '.')
    print('\n')

<a id='datasets_structure'></a>

### Datasets structure

In [32]:
# Checking consistency of structure between training and validation dataframes:
if len(list(df_train_scaled.columns)) != len(list(df_val_scaled.columns)):
    print('\033[1mProblem - Inconsistent number of columns between dataframes for training and validation data!\033[0m')

else:
    consistency_check = 0
    
    # Loop over variables:
    for c in list(df_train_scaled.columns):
        if list(df_train_scaled.columns).index(c) != list(df_val_scaled.columns).index(c):
            print('\033[1mProblem - Feature {0} was positioned differently in training and validation dataframes!\033[0m'.format(c))
            consistency_check += 1
            
    # Reordering columns of val dataframe:
    if consistency_check > 0:
        ordered_columns = list(df_train_scaled.columns)
        df_val_scaled = df_val_scaled[ordered_columns]

In [33]:
# Checking consistency of structure between training and test dataframes:
if len(list(df_train_scaled.columns)) != len(list(df_test_scaled.columns)):
    print('\033[1mProblem - Inconsistent number of columns between dataframes for training and test data!\033[0m')

else:
    consistency_check = 0
    
    # Loop over variables:
    for c in list(df_train_scaled.columns):
        if list(df_train_scaled.columns).index(c) != list(df_test_scaled.columns).index(c):
            print('\033[1mProblem - Feature {0} was positioned differently in training and test dataframes!\033[0m'.format(c))
            consistency_check += 1
            
    # Reordering columns of test dataframe:
    if consistency_check > 0:
        ordered_columns = list(df_train_scaled.columns)
        df_test_scaled = df_test_scaled[ordered_columns]
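
The two consistency checks above can also be expressed as one helper that compares the column lists directly instead of looking up each position with `list.index`. This is a sketch; `align_columns` and the toy dataframes are illustrative names, and raising an error on mismatched column sets is a design choice not made in the notebook, which only prints a message.

```python
import pandas as pd


def align_columns(reference: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Return `other` with its columns ordered as in `reference`."""
    if set(reference.columns) != set(other.columns):
        raise ValueError('Dataframes do not share the same set of columns.')
    if list(reference.columns) == list(other.columns):
        return other  # already aligned, nothing to do
    return other[list(reference.columns)]


train_toy = pd.DataFrame({'a': [1], 'b': [2], 'y': [0]})
val_toy = pd.DataFrame({'b': [4], 'y': [1], 'a': [3]})
val_aligned = align_columns(train_toy, val_toy)
```

Column order matters here because the dataframes are later converted to arrays with `.values`, which discards column names.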

<a id='basic_estimation'></a>

## Basic estimation

<a id='random_samples'></a>

### Random samples

In [34]:
np.random.seed(1)

# Dictionaries whose keys are classes and values are their shares in the entire data:
train_classes = {
    0: 1 - df_train_scaled.y.mean(),
    1: df_train_scaled.y.mean()
}

val_classes = {
    0: 1 - df_val_scaled.y.mean(),
    1: df_val_scaled.y.mean()
}

# Randomly picked indices for training and validation data:
train_sample = balanced_sample(df_train_scaled, categorical_var = 'y',
                               classes = train_classes, sample_share = 0.5)
val_sample = balanced_sample(df_val_scaled, categorical_var = 'y',
                             classes = val_classes, sample_share = 0.5)

In [35]:
# Random samples of training and validation data:
sample_train_scaled = df_train_scaled.loc[train_sample, :]
sample_val_scaled = df_val_scaled.loc[val_sample, :]

print('\033[1mTraining data:\033[0m')
print('Shape of df_train_scaled: {0}.'.format(df_train_scaled.shape))
print('Shape of sample_train_scaled: {0}.'.format(sample_train_scaled.shape))
print('Share of class 1 in the entire training data: {0}.'.format(round(df_train_scaled.y.mean(), 4)))
print('Share of class 1 in the sample of training data: {0}.'.format(round(sample_train_scaled.y.mean(), 4)))
print('\n')

print('\033[1mValidation data:\033[0m')
print('Shape of df_val_scaled: {0}.'.format(df_val_scaled.shape))
print('Shape of sample_val_scaled: {0}.'.format(sample_val_scaled.shape))
print('Share of class 1 in the entire validation data: {0}.'.format(round(df_val_scaled.y.mean(), 4)))
print('Share of class 1 in the sample of validation data: {0}.'.format(round(sample_val_scaled.y.mean(), 4)))

Training data:
Shape of df_train_scaled: (35897, 2105).
Shape of sample_train_scaled: (17948, 2105).
Share of class 1 in the entire training data: 0.0153.
Share of class 1 in the sample of training data: 0.0153.


Validation data:
Shape of df_val_scaled: (20940, 2105).
Shape of sample_val_scaled: (10469, 2105).
Share of class 1 in the entire validation data: 0.0117.
Share of class 1 in the sample of validation data: 0.0117.
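The samples above keep the class shares of the full data (0.0153 and 0.0117). The function `balanced_sample` is not reproduced in this section, so the sketch below is only a hypothetical stand-in that illustrates the idea of drawing the same share of observations from each class. The helper name `stratified_sample_indices` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample_indices(labels, sample_share, seed=1):
    """Pick a share of indices from each class so that class shares are preserved.

    Hypothetical helper for illustration; not the notebook's balanced_sample.
    """
    rng = random.Random(seed)

    # Grouping positional indices by class label:
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Drawing the same share from every class, without replacement:
    picked = []
    for indices in by_class.values():
        n_pick = int(len(indices) * sample_share)
        picked.extend(rng.sample(indices, n_pick))

    return sorted(picked)


# Toy data: 1,000 observations with a 2% share of class 1.
labels = [1] * 20 + [0] * 980
sample = stratified_sample_indices(labels, sample_share=0.5)

share_full = sum(labels) / len(labels)
share_sample = sum(labels[i] for i in sample) / len(sample)
print(len(sample), share_full, share_sample)
```

Sampling within each class, instead of over the whole data at once, is what keeps a rare class from being under- or over-represented by chance.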


<a id='basic_model'></a>

### Basic neural network

This first estimation serves mainly to develop the code that will be used in the more in-depth tests that follow. No hyper-parameter is explored here, and the architecture is as simple as possible: a single hidden layer containing a single neuron. The cost function is the cross-entropy function, while the activations for hidden and output neurons are the rectified linear unit (ReLU) and sigmoid, respectively.
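As a minimal sketch of what this architecture computes, the plain-Python code below runs a forward pass through one ReLU neuron and one sigmoid output, and then evaluates the cross-entropy cost. The weights, inputs and function names are arbitrary values chosen for illustration, not fitted parameters or code from this notebook.

```python
import math


def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> one ReLU neuron -> one sigmoid output."""
    hidden = relu(sum(w * xi for w, xi in zip(w_hidden, x)) + b_hidden)
    return sigmoid(w_out * hidden + b_out)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average cross-entropy cost; eps guards against log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Illustrative weights and inputs (arbitrary values, not fitted parameters):
w_hidden, b_hidden = [0.5, -0.25], 0.0
w_out, b_out = 2.0, -1.0

X = [[1.0, 2.0], [4.0, 0.0]]
y = [0, 1]

probs = [forward(x, w_hidden, b_hidden, w_out, b_out) for x in X]
cost = binary_cross_entropy(y, probs)
print(probs, cost)
```

Note that the single ReLU neuron outputs zero whenever its weighted input is not positive; for such observations the prediction reduces to the constant `sigmoid(b_out)`, as happens for the first observation of the toy data.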

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

#### Model architecture and hyper-parameters

In [37]:
# Model architecture:
model_architecture = {1: {'neurons': 1,
                          'activation': 'relu',
                          'dropout_param': 0}}
model_architecture

{1: {'neurons': 1, 'activation': 'relu', 'dropout_param': 0}}

In [38]:
# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = None
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Model structure

In [101]:
# Declaring the model object:
model = Sequential()

# Dropout for the input layer (added first, so it is applied to the inputs):
model.add(Dropout(input_dropout, input_shape=(X_train.shape[1],)))

# Hidden layers, each followed by its own dropout layer:
for i in model_architecture.keys():
    model.add(Dense(units = model_architecture[i]['neurons'],
                    activation = model_architecture[i]['activation'],
                    kernel_regularizer = l2(regul_param)))
    model.add(Dropout(rate = model_architecture[i]['dropout_param']))

# Final layer with one neuron:
model.add(Dense(units = 1, activation = output_activation))

# Compiling the model to prepare it to be fitted:
if default_adam:
    model.compile(loss = cost_function, optimizer = 'adam')

else:
    model.compile(loss = cost_function, optimizer = opt)

#### Model estimation

In [102]:
estimation_id = str(int(time.time()))

nn_start_time = datetime.now()

model.fit(x = X_train, 
          y = y_train,
          validation_data = (X_val, y_val),
          epochs = num_epochs,
          batch_size = batch_size,
          shuffle = False,
          callbacks=None,
          verbose = 1
          )

# Assessing running time:
nn_end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + nn_start_time.strftime('%Y-%m-%d') + ', ' + nn_start_time.strftime('%H:%M:%S'))
print('End time: ' + nn_end_time.strftime('%Y-%m-%d') + ', ' + nn_end_time.strftime('%H:%M:%S'))
print('\n')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
------------------------------------
Overall running time: 0.13 minutes.
Start time: 2020-12-22, 13:11:24
End time: 2020-12-22, 13:11:32




#### Cost function by training epoch

In [105]:
model_costs = pd.DataFrame(model.history.history)
model_costs['epoch'] = [i + 1 for i in model_costs.index]
model_costs.head()

| | loss | val_loss | epoch |
|---|---|---|---|
| 0 | 0.132033 | 0.52125 | 1 |
| 1 | 0.403645 | 0.311269 | 2 |
| 2 | 0.266714 | 0.222395 | 3 |
| 3 | 0.200348 | 0.171816 | 4 |
| 4 | 0.16107 | 0.140236 | 5 |


In [42]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': False}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.loss, name='Training cost',
               hovertemplate =
                'loss = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.val_loss, name='Validation cost',
               hovertemplate = 'val_loss = %{y:.4f}',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=False,
)

# Changing layout:
fig.update_layout(
    title_text='Cost function by epoch of training',
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='cost', secondary_y=False)

fig.show()

#### Performance metrics on validation data

In [43]:
# Predicted probabilities on validation data (computed once and reused):
val_predictions = [p[0] for p in model.predict(X_val)]

val_roc_auc = roc_auc_score(y_val, val_predictions)
val_avg_prec_score = average_precision_score(y_val, val_predictions)
val_brier_score = brier_score_loss(y_val, val_predictions)

print('\033[1mPerformance metrics:\033[0m')
print('Validation ROC-AUC: {0}.'.format(round(val_roc_auc, 4)))
print('Validation average precision score: {0}.'.format(round(val_avg_prec_score, 4)))
print('Validation Brier score: {0}.'.format(round(val_brier_score, 4)))

Performance metrics:
Validation ROC-AUC: 0.5518.
Validation average precision score: 0.0836.
Validation Brier score: 0.0134.
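When classes are highly imbalanced, Brier scores are small almost by construction, so a reference value helps when reading them. The sketch below, with toy labels and a hand-written `brier_score` function (both assumptions for illustration), shows that a constant forecast equal to the share of positives $p$ scores $p(1 - p)$. For a share of 0.0117, as in the validation sample, this reference is approximately 0.0116.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy labels with a 30% share of positives:
y_true = [1] * 3 + [0] * 7
base_rate = sum(y_true) / len(y_true)

# A constant forecast equal to the base rate scores base_rate * (1 - base_rate):
constant_score = brier_score(y_true, [base_rate] * len(y_true))
print(constant_score, base_rate * (1 - base_rate))
```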


In [44]:
# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'batch_size': batch_size,
        'es_param': es_param,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam
    },
    'performance_metrics': {
        'application': 'validation',
        'min_train_cost': model_costs.loss.min(),
        'epoch_min_train_cost': model_costs.loss.idxmin() + 1,
        'min_val_cost': model_costs.val_loss.min(),
        'epoch_min_val_cost': model_costs.val_loss.idxmin() + 1,
        'roc_auc': val_roc_auc,
        'avg_prec_score': val_avg_prec_score,
        'brier_score': val_brier_score
    },
    'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
    "comment": "Basic estimation for codes development."
}

In [45]:
if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

<a id='var_assessment'></a>

### Variability assessment

To assess how uncertain the performance metrics of neural networks are in this empirical context, the estimation is repeated in a loop, so that the collected results provide an average and a standard deviation for each performance metric evaluated on validation data.
<br>
<br>
In addition to the code for estimation loops, another outcome of this section is a class $KerasNN$ for neural network estimations based on TensorFlow and Keras functions and classes.
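The aggregation step of such a loop can be sketched with the standard library alone. The function `summarize` and the metric values below are hypothetical. The function skips missing values and uses the population standard deviation, which matches the default behaviour of `np.nanmean` and `np.nanstd` (the latter with `ddof=0`).

```python
import math
import statistics


def summarize(values):
    """Mean and population standard deviation, skipping NaN entries."""
    clean = [v for v in values if not math.isnan(v)]
    return statistics.fmean(clean), statistics.pstdev(clean)


# Hypothetical ROC-AUC values from five repeated fits (one of them missing):
roc_auc_runs = [0.60, 0.70, float('nan'), 0.65, 0.65]
avg, std = summarize(roc_auc_runs)
print(round(avg, 4), round(std, 4))
```

Because each fit starts from different random initial weights, the standard deviation across fits measures how much a metric depends on initialization alone, with the data held fixed.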

In [46]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

#### Setting

In [57]:
# Number of estimations:
n_estimations = 1000

# Model architecture:
model_architecture = {1: {'neurons': 1,
                          'activation': 'relu',
                          'dropout_param': 0}}

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = None
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Estimation loop

In [58]:
val_roc_auc = []
val_avg_prec_score = []
val_brier_score = []

In [59]:
estimation_id = str(int(time.time()))

nn_start_time = datetime.now()

bar = progressbar.ProgressBar(maxval=n_estimations,
                              widgets=['\033[1mEstimation progress:\033[0m ',
                              progressbar.Bar('-', '[', ']'), ' ',
                              progressbar.Percentage()])

bar.start()

# Loop over estimations:
for t in range(n_estimations):
    # Creating neural network object, declaring its architecture and defining hyper-parameters:
    model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                     output_activation = output_activation, cost_function = cost_function,
                     num_epochs = num_epochs, batch_size = batch_size,
                     default_adam = default_adam,
                     regul_param = regul_param, input_dropout = input_dropout)

    # Training the model:
    model.run(train_inputs = X_train, train_output = y_train,
              val_inputs = X_val, val_output = y_val,
              verbose = 0)
    
    # Performance metrics on validation data:
    val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
    val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
    val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
    
    bar.update(t+1)
    sleep(0.01)

# Assessing running time:
nn_end_time = datetime.now()

print('\033[1mNumber of estimations:\033[0m {0}.'.format(n_estimations))
print('\n')

print('\033[1mValidation ROC-AUC:\033[0m')
print('Average: {0}.'.format(round(np.nanmean(val_roc_auc), 4)))
print('Standard deviation : {0}.'.format(round(np.nanstd(val_roc_auc), 4)))
print('\n')

print('\033[1mValidation average precision score:\033[0m')
print('Average: {0}.'.format(round(np.nanmean(val_avg_prec_score), 4)))
print('Standard deviation : {0}.'.format(round(np.nanstd(val_avg_prec_score), 4)))
print('\n')

print('\033[1mValidation Brier score:\033[0m')
print('Average: {0}.'.format(round(np.nanmean(val_brier_score), 4)))
print('Standard deviation : {0}.'.format(round(np.nanstd(val_brier_score), 4)))
print('\n')

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + nn_start_time.strftime('%Y-%m-%d') + ', ' + nn_start_time.strftime('%H:%M:%S'))
print('End time: ' + nn_end_time.strftime('%Y-%m-%d') + ', ' + nn_end_time.strftime('%H:%M:%S'))
print('\n')

Estimation progress: [---------------------------------------------------] 100%

Number of estimations: 1000.


Validation ROC-AUC:
Average: 0.6328.
Standard deviation : 0.1292.


Validation average precision score:
Average: 0.1003.
Standard deviation : 0.0553.


Validation Brier score:
Average: 0.0128.
Standard deviation : 0.0014.


------------------------------------
Overall running time: 100.05 minutes.
Start time: 2020-12-22, 08:57:32
End time: 2020-12-22, 10:37:36




In [60]:
# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'batch_size': batch_size,
        'es_param': es_param,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam
    },
    'n_estimations': n_estimations,
    'performance_metrics': {
        'application': 'validation',
        'avg_roc_auc': np.nanmean(val_roc_auc),
        'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
        'avg_brier_score': np.nanmean(val_brier_score),
        'std_roc_auc': np.nanstd(val_roc_auc),
        'std_avg_prec_score': np.nanstd(val_avg_prec_score),
        'std_brier_score': np.nanstd(val_brier_score)
    },
    'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
    "comment": "Estimation loop for variability assessment."
}

In [62]:
if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

<a id='architecture'></a>

## Architecture definition

This section tries out several distinct designs for the model architecture, using different values for their parameters. The setting for these tests is given by:
* Random samples of training and validation data.
* Cross-entropy cost function, rectified linear unit and sigmoid activation functions for hidden and output neurons, respectively.
* Default mini-batch size (from Tensorflow and Keras) and 10 epochs of training.
* Adam optimizer for model estimation (non-fixed and parameter-specific learning rates).
* No regularization, no dropout and no early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: a collection of 100 estimations will be implemented, so performance metrics can be assessed in terms of average and standard deviation.

First, only the number of neurons for a single hidden layer will be defined. The designs to be explored are the following, where $J_1$ is the number of neurons in that hidden layer:
1. Number of neurons between number of inputs and number of outputs:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \alpha*(num\_inputs + num\_outputs)
\end{equation}
<br>
For $\alpha \in \{0.1, 0.2, ..., 0.9\}$.
<br>
<br>
2. Rule-of-thumb considering the number of observations:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \frac{num\_obs}{\alpha*(num\_inputs + num\_outputs)}
\end{equation}
<br>
Where $\alpha \in \{2, 4, 6\}$.
<br>
<br>
3. Concave functions of the product between number of inputs and number of outputs:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \sqrt{num\_inputs*num\_outputs}
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \log(num\_inputs*num\_outputs)
\end{equation}

Once $J_1$ has been defined, further hidden layers will be tested, and the number of their neurons will follow the same strategy implemented for the initial single hidden layer.
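The three rules above can be sketched as a small function returning the candidate values of $J_1$, rounded down to integers. The function name is hypothetical. The example call uses 17,948 observations (the size of the training sample) and one output. The value of 2,097 inputs is an assumption, inferred from the neuron counts listed in this notebook.

```python
import math


def single_layer_candidates(num_inputs, num_outputs, num_obs, obs_alphas=(2, 4, 6)):
    """Candidate sizes J1 for a single hidden layer, as (rule, size) pairs."""
    total = num_inputs + num_outputs
    candidates = []

    # Rule 1: a share alpha of (num_inputs + num_outputs), alpha = 0.1, ..., 0.9.
    for k in range(1, 10):
        alpha = k / 10
        candidates.append(('alpha*(in+out), alpha={0}'.format(alpha),
                           int(math.floor(alpha * total))))

    # Rule 2: num_obs/(alpha*(num_inputs + num_outputs)), kept only if positive.
    for alpha in obs_alphas:
        size = int(math.floor(num_obs / (alpha * total)))
        if size > 0:
            candidates.append(('obs/(alpha*(in+out)), alpha={0}'.format(alpha), size))

    # Rule 3: concave functions of num_inputs*num_outputs.
    candidates.append(('sqrt(in*out)', int(math.sqrt(num_inputs * num_outputs))))
    candidates.append(('log(in*out)', int(math.log(num_inputs * num_outputs))))

    return candidates


# num_inputs = 2097 is an assumed value (see the text above):
grid = single_layer_candidates(num_inputs=2097, num_outputs=1, num_obs=17948)
for rule, size in grid:
    print(rule, size)
```

The rules span very different scales: the first yields hundreds to thousands of neurons, while the second and third yield only a handful, so the grid covers both wide and narrow hidden layers.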

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='neurons_single_hidden_layer'></a>

### Neurons for a single hidden layer

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = None
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of model architectures

In [38]:
# One hidden layer and J1 between number of inputs and number of outputs:
architectures = [{1: {'neurons': int(np.floor((X_train.shape[1] + 1)*q)),
                      'activation': 'relu',
                      'dropout_param': 0}} for q in [(i + 1)/10 for i in range(10) if (i + 1)/10 < 1]]

architectures_def = ['One hidden layer: J1 = (num_inputs + num_outputs)*{0}'.format(q) for q in
                     [(i + 1)/10 for i in range(10) if (i + 1)/10 < 1]]

In [39]:
# Alternative rule-of-thumb for the J1 in the hidden layer:
for q in [2, 4, 6]:
    if int(np.floor(X_train.shape[0]/(q*(X_train.shape[1] + 1)))) > 0:
        architectures.append({1: {'neurons': int(np.floor(X_train.shape[0]/(q*(X_train.shape[1] + 1)))),
                                  'activation': 'relu',
                                  'dropout_param': 0}})
        architectures_def.append('One hidden layer: J1 = num_obs/({0}*(num_inputs + num_outputs))'.format(q))

# Square root of the product between the number of inputs and the number of outputs:
architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('One hidden layer: J1 = sqrt(num_inputs*num_outputs)')

# Natural logarithm of the product between the number of inputs and the number of outputs:
architectures.append({1: {'neurons': int(np.log(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('One hidden layer: J1 = log(num_inputs*num_outputs)')

In [40]:
architectures

[{1: {'neurons': 209, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 419, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 629, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 839, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 1049, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 1258, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 1468, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 1678, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 1888, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 4, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 2, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 1, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 7, 'activation': 'relu', 'dropout_param': 0}}]

#### Estimation loop

In [69]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(architectures),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over architectures:
for a in range(len(architectures)):
# indices = [12,13]
# for a in indices:
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are kept apart):
    min_train_cost = []
    epoch_min_train_cost = []
    min_val_cost = []
    epoch_min_val_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = architectures[a], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))

        # Cost function by training epoch:
        model_costs = model.model_costs

        min_train_cost.append(model_costs.loss.min())
        epoch_min_train_cost.append(model_costs.loss.idxmin() + 1)
        min_val_cost.append(model_costs.val_loss.min())
        epoch_min_val_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(architectures[a]),
            'num_hidden_neurons': [architectures[a][l]['neurons'] for l in architectures[a].keys()],
            'hidden_activations': [architectures[a][l]['activation'] for l in architectures[a].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [architectures[a][l]['dropout_param'] for l in architectures[a].keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_train_cost': np.nanmean(min_train_cost),
            'avg_epoch_min_train_cost': np.nanmean(epoch_min_train_cost),
            'avg_min_val_cost': np.nanmean(min_val_cost),
            'avg_epoch_min_val_cost': np.nanmean(epoch_min_val_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": architectures_def[a]
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(a+1)
#     test_bar.update(indices.index(a)+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 23.5 minutes.
Start time: 2020-12-26, 20:23:46
End time: 2020-12-26, 20:47:16




#### Assessing results

In [59]:
estimation_ids = []
archs = []
num_neurons = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over architectures with a single hidden layer:
for est_id, e in model_assessment.items():
    # Keeping only estimations whose comment marks a single hidden layer:
    if 'One hidden layer: ' not in e['comment']:
        continue

    estimation_ids.append(est_id)
    archs.append(e['comment'].split('One hidden layer: ')[1].replace(' ', ''))
    num_neurons.append(e['architecture']['num_hidden_neurons'][0])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by architecture with a single hidden layer:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_neurons': num_neurons,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_neurons | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 1609025026 | J1=sqrt(num_inputs*num_outputs) | 45 | 0.860571 | 0.015237 | 0.240236 | 0.031074 | 56.47962 | 7.731185 | 12.37 |
| 0 | 1608940658 | J1=(num_inputs+num_outputs)*0.1 | 209 | 0.850727 | 0.012944 | 0.224068 | 0.035353 | 65.721111 | 6.337948 | 28.73 |
| 1 | 1608942382 | J1=(num_inputs+num_outputs)*0.2 | 419 | 0.835747 | 0.017298 | 0.216742 | 0.029311 | 48.314574 | 7.394535 | 50.73 |
| 2 | 1608945427 | J1=(num_inputs+num_outputs)*0.3 | 629 | 0.832776 | 0.021014 | 0.19637 | 0.034508 | 39.630446 | 5.690637 | 67.28 |
| 4 | 1608955195 | J1=(num_inputs+num_outputs)*0.5 | 1049 | 0.830936 | 0.019053 | 0.159038 | 0.04061 | 43.611771 | 3.916244 | 117.35 |
| 6 | 1608970776 | J1=(num_inputs+num_outputs)*0.7 | 1468 | 0.829818 | 0.021406 | 0.162789 | 0.026689 | 38.765363 | 6.099423 | 165.97 |
| 3 | 1608949465 | J1=(num_inputs+num_outputs)*0.4 | 839 | 0.828557 | 0.019698 | 0.183143 | 0.034401 | 42.062456 | 5.32373 | 80.25 |
| 5 | 1608962237 | J1=(num_inputs+num_outputs)*0.6 | 1258 | 0.828069 | 0.023164 | 0.160903 | 0.03148 | 35.748432 | 5.111314 | 142.3 |
| 7 | 1608990410 | J1=(num_inputs+num_outputs)*0.8 | 1678 | 0.827038 | 0.017709 | 0.16242 | 0.025709 | 46.701436 | 6.317679 | 184.35 |
| 8 | 1609001472 | J1=(num_inputs+num_outputs)*0.9 | 1888 | 0.825239 | 0.018819 | 0.156171 | 0.030072 | 43.851261 | 5.193195 | 211.15 |


In [38]:
print('\033[1mBest architecture based on average ROC-AUC:\033[0m\n {0}.'.format(
    metrics.loc[metrics.avg_roc_auc.idxmax(), :].architecture))
print('\033[1mBest architecture based on std ROC-AUC:\033[0m\n {0}.'.format(
    metrics.loc[metrics.std_roc_auc.idxmin(), :].architecture))
print('\n')

print('\033[1mBest architecture based on average precision score:\033[0m\n {0}.'.format(
    metrics.loc[metrics.avg_prec.idxmax(), :].architecture))
print('\033[1mBest architecture based on std of average precision score:\033[0m\n {0}.'.format(
    metrics.loc[metrics.std_prec.idxmin(), :].architecture))
print('\n')

print('\033[1mBest architecture based on ratio between avg and std ROC-AUC:\033[0m\n {0}.'.format(
    metrics.loc[metrics.ratio_roc_auc.idxmax(), :].architecture))
print('\033[1mBest architecture based on ratio between avg and std of average precision score:\033[0m\n {0}.'.format(
    metrics.loc[metrics.ratio_prec.idxmax(), :].architecture))
print('\n')

print('\033[1mBest architecture based on running time:\033[0m\n {0}.'.format(
    metrics.loc[metrics.running_time.idxmin(), :].architecture))

Best architecture based on average ROC-AUC:
 Numberofneurons=sqrt(num_inputs*num_outputs).
Best architecture based on std ROC-AUC:
 Numberofneurons=(num_inputs+num_outputs)*0.1.


Best architecture based on average precision score:
 Numberofneurons=sqrt(num_inputs*num_outputs).
Best architecture based on std of average precision score:
 Numberofneurons=(num_inputs+num_outputs)*0.8.


Best architecture based on ratio between avg and std ROC-AUC:
 Numberofneurons=(num_inputs+num_outputs)*0.1.
Best architecture based on ratio between avg and std of average precision score:
 Numberofneurons=sqrt(num_inputs*num_outputs).


Best architecture based on running time:
 Numberofneurons=num_obs/(6*(num_inputs+num_outputs)).


<a id='neurons_two_hidden_layers'></a>

### Neurons for two hidden layers

The results above show that the following rules for defining the number of neurons lead to the best expected outcomes:
<br>
<br>
\begin{equation}
    \displaystyle J = \sqrt{num\_inputs*num\_outputs}
\end{equation}
<br>
This rule has the highest average ROC-AUC and average precision score, with a reasonable running time. Also:
<br>
<br>
\begin{equation}
    \displaystyle J = 0.1*(num\_inputs + num\_outputs)
\end{equation}
<br>
This rule has the second highest average ROC-AUC and average precision score, as well as the lowest standard deviation of ROC-AUC. Consequently, to assess how adding a second hidden layer affects the outcomes, two different architectures will be tested:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \sqrt{num\_inputs*J_2}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = \sqrt{J_1*num\_outputs}
\end{equation}
<br>
Solving both equations, with $num\_outputs = 1$:
\begin{equation}
    \displaystyle J_1 = num\_inputs^{2/3}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = num\_inputs^{1/3}
\end{equation}
<br>
<br>
And the alternative:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = (num\_inputs + J_2)*0.1
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = (J_1 + num\_outputs)*0.1
\end{equation}
<br>
Solving both equations:
\begin{equation}
    \displaystyle J_1 = \frac{0.1*num\_inputs + 0.01*num\_outputs}{0.99}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = (J_1 + num\_outputs)*0.1
\end{equation}
<br>
<br>
In addition to these, alternatives that do not define $J_1$ and $J_2$ simultaneously will also be tested. While $J_1$ follows one of the rules selected above, $J_2$ is simply given by one of the following options: $J_2 = J_1/2$ or $J_2 = J_1$.
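The closed-form solutions above can be checked numerically. The sketch below solves both systems for a general number of outputs and verifies that the solutions satisfy the original pairs of equations. The function names are hypothetical, and the value of 2,097 inputs is an assumption used only for illustration.

```python
import math


def geometric_two_layers(num_inputs, num_outputs=1):
    """Solve J1 = sqrt(num_inputs*J2) and J2 = sqrt(J1*num_outputs)."""
    j1 = (num_inputs ** 2 * num_outputs) ** (1 / 3)
    j2 = (num_inputs * num_outputs ** 2) ** (1 / 3)
    return j1, j2


def linear_two_layers(num_inputs, num_outputs=1, share=0.1):
    """Solve J1 = share*(num_inputs + J2) and J2 = share*(J1 + num_outputs)."""
    j1 = (share * num_inputs + share ** 2 * num_outputs) / (1 - share ** 2)
    j2 = share * (j1 + num_outputs)
    return j1, j2


num_inputs, num_outputs = 2097, 1  # assumed values for illustration

g1, g2 = geometric_two_layers(num_inputs, num_outputs)
l1, l2 = linear_two_layers(num_inputs, num_outputs)

# Each solution should satisfy its own pair of equations:
print(math.isclose(g1, math.sqrt(num_inputs * g2)),
      math.isclose(g2, math.sqrt(g1 * num_outputs)))
print(math.isclose(l1, 0.1 * (num_inputs + l2)),
      math.isclose(l2, 0.1 * (l1 + num_outputs)))

# Sizes rounded down to whole neurons:
print(int(g1), int(g2), int(l1), int(l2))
```

With a single output, the geometric solution reduces to $J_1 = num\_inputs^{2/3}$ and $J_2 = num\_inputs^{1/3}$, which are the expressions given above.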

#### Setting

In [38]:
# Number of estimations:
n_estimations = 100

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = None
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of model architectures

In [39]:
# Two hidden layers and number of neurons between number of inputs and number of outputs:
architectures = [{1: {'neurons': int(((0.1*X_train.shape[1]) + (0.01*1))/0.99),
                          'activation': 'relu',
                          'dropout_param': 0},
                  2: {'neurons': int((int(((0.1*X_train.shape[1]) + (0.01*1))/0.99) + 1)*0.1),
                          'activation': 'relu',
                          'dropout_param': 0}}]
architectures_def = ['Two hidden layers: J1 = (num_inputs + J2)*0.1, J2 = (J1 + num_outputs)*0.1']

In [40]:
# Square-root rule solved jointly for both layers (num_outputs = 1): J1 = num_inputs^(2/3), J2 = num_inputs^(1/3):
architectures.append({1: {'neurons': int(X_train.shape[1]**(2/3)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(X_train.shape[1]**(1/3)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Two hidden layers: J1 = sqrt(num_inputs*J2), J2 = sqrt(J1*num_outputs)')

In [41]:
# Alternative specifications: J1 given by a single-layer rule, J2 set to J1/2 or J1:
architectures.append({1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1/2)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Two hidden layers: J1 = (num_inputs + num_outputs)*0.1, J2 = J1/2')

architectures.append({1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Two hidden layers: J1 = (num_inputs + num_outputs)*0.1, J2 = J1')

architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2')

architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1')

In [42]:
architectures

[{1: {'neurons': 211, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 21, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 163, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 12, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 209, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 104, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 209, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 209, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0}}]

#### Estimation loop

In [44]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(architectures),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over architectures:
for a in range(len(architectures)):
# indices = [3, 5]
# for a in indices:
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are tracked separately):
    train_min_cost = []
    train_epoch_min_cost = []
    val_min_cost = []
    val_epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = architectures[a], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_scores = [p[0] for p in model.predictions]
        val_roc_auc.append(roc_auc_score(y_val, val_scores))
        val_avg_prec_score.append(average_precision_score(y_val, val_scores))
        val_brier_score.append(brier_score_loss(y_val, val_scores))

        # Minimum cost and the epoch at which it is reached, for training and validation data:
        model_costs = model.model_costs

        train_min_cost.append(model_costs.loss.min())
        train_epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        val_min_cost.append(model_costs.val_loss.min())
        val_epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(architectures[a]),
            'num_hidden_neurons': [architectures[a][l]['neurons'] for l in architectures[a].keys()],
            'hidden_activations': [architectures[a][l]['activation'] for l in architectures[a].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [architectures[a][l]['dropout_param'] for l in architectures[a].keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_train_min_cost': np.nanmean(train_min_cost),
            'avg_train_epoch_min_cost': np.nanmean(train_epoch_min_cost),
            'avg_val_min_cost': np.nanmean(val_min_cost),
            'avg_val_epoch_min_cost': np.nanmean(val_epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": 'Testing architectures. ' + architectures_def[a]
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(a+1)
#     test_bar.update(indices.index(a)+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 45.08 minutes.
Start time: 2021-01-02, 15:40:48
End time: 2021-01-02, 16:25:53




#### Assessing results

In [60]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over stored estimations, keeping only the architecture tests:
arch_tests = ('Testing architectures. One hidden layer',
              'Testing architectures. Two hidden layers')
for est_id, e in model_assessment.items():
    if not any(tag in e['comment'] for tag in arch_tests):
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by architecture with two hidden layers:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

Unnamed: 0,estimation_id,architecture,num_layers,num_neurons,avg_roc_auc,std_roc_auc,avg_prec,std_prec,ratio_roc_auc,ratio_prec,running_time
17,1609598769,"J1 = sqrt(num_inputs*num_outputs), J2 = J1/2",2,"[45, 22]",0.868335,0.013726,0.262224,0.032911,63.260803,7.967559,13.4
12,1609025026,J1 = sqrt(num_inputs*num_outputs),1,[45],0.860571,0.015237,0.240236,0.031074,56.47962,7.731185,12.37
19,1609614745,"J1 = sqrt(num_inputs*num_outputs), J2 = J1",2,"[45, 45]",0.851741,0.020639,0.21602,0.02194,41.268569,9.845743,13.47
0,1608940658,J1 = (num_inputs + num_outputs)*0.1,1,[209],0.850727,0.012944,0.224068,0.035353,65.721111,6.337948,28.73
1,1608942382,J1 = (num_inputs + num_outputs)*0.2,1,[419],0.835747,0.017298,0.216742,0.029311,48.314574,7.394535,50.73
2,1608945427,J1 = (num_inputs + num_outputs)*0.3,1,[629],0.832776,0.021014,0.19637,0.034508,39.630446,5.690637,67.28
4,1608955195,J1 = (num_inputs + num_outputs)*0.5,1,[1049],0.830936,0.019053,0.159038,0.04061,43.611771,3.916244,117.35
6,1608970776,J1 = (num_inputs + num_outputs)*0.7,1,[1468],0.829818,0.021406,0.162789,0.026689,38.765363,6.099423,165.97
3,1608949465,J1 = (num_inputs + num_outputs)*0.4,1,[839],0.828557,0.019698,0.183143,0.034401,42.062456,5.32373,80.25
5,1608962237,J1 = (num_inputs + num_outputs)*0.6,1,[1258],0.828069,0.023164,0.160903,0.03148,35.748432,5.111314,142.3


<a id='neurons_three_hidden_layers'></a>

### Neurons for three hidden layers

The second assessment of results showed that, with two hidden layers, the following rule for the number of neurons yields the best results:
<br>
<br>
\begin{equation}
    \displaystyle J = \sqrt{num\_inputs*num\_outputs}
\end{equation}
<br>
The best specification so far applies this rule to the number of neurons in the first hidden layer ($J_1$), while the second hidden layer has $J_2 = J_1/2$ neurons. Other formulations will also be tested here, starting with an architecture in which the size of each hidden layer is the geometric mean of the sizes of its neighboring layers:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \sqrt{num\_inputs*J_2}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = \sqrt{J_1*J_3}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_3 = \sqrt{J_2*num\_outputs}
\end{equation}
<br>
<br>
Solving the system for $num\_outputs = 1$:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = num\_inputs^{3/4}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = num\_inputs^{1/2}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_3 = num\_inputs^{1/4}
\end{equation}
<br>
<br>
Additionally, the following specifications will also be used:
<br>
<br>
\begin{equation}
    \displaystyle J_1 = \sqrt{num\_inputs*num\_outputs}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = J_1/2
\end{equation}
<br>
\begin{equation}
    \displaystyle J_3 = J_1/4
\end{equation}
<br>
<br>
And:
\begin{equation}
    \displaystyle J_1 = \sqrt{num\_inputs*num\_outputs}
\end{equation}
<br>
\begin{equation}
    \displaystyle J_2 = J_1
\end{equation}
<br>
\begin{equation}
    \displaystyle J_3 = J_1
\end{equation}
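
The first system above makes each hidden layer the geometric mean of its neighbors, so the layer sizes form a geometric progression from `num_inputs` down to `num_outputs`. A minimal sketch of that closed form, with a numerical check that it satisfies the system, is given below. The function names `geometric_layer_sizes` and `satisfies_system` are hypothetical, and the value 2097 for the number of inputs is an assumption inferred from the architectures printed in this notebook.

```python
def geometric_layer_sizes(num_inputs, num_layers, num_outputs=1):
    """Hidden-layer sizes when each layer is the geometric mean of its neighbors.

    With L hidden layers, layer k (1-based) has
    num_inputs**((L+1-k)/(L+1)) * num_outputs**(k/(L+1)) neurons.
    Returns untruncated floats.
    """
    total = num_layers + 1
    return [num_inputs ** ((total - k) / total) * num_outputs ** (k / total)
            for k in range(1, num_layers + 1)]


def satisfies_system(sizes, num_inputs, num_outputs=1, rel_tol=1e-9):
    """Check J_k = sqrt(J_{k-1} * J_{k+1}) for every hidden layer."""
    chain = [num_inputs] + list(sizes) + [num_outputs]
    return all(
        abs(chain[i] - (chain[i - 1] * chain[i + 1]) ** 0.5) <= rel_tol * chain[i]
        for i in range(1, len(chain) - 1)
    )


three = geometric_layer_sizes(2097, num_layers=3)  # 2097 is an assumed value
print([int(j) for j in three])        # sizes truncated, as in the notebook
print(satisfies_system(three, 2097))  # numerical check of the three equations
```

For three hidden layers and one output, the exponents are 3/4, 1/2 and 1/4, as in the solution above; the same function with `num_layers=2` gives the exponents 2/3 and 1/3 used for two hidden layers.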

#### Setting

In [39]:
# Number of estimations:
n_estimations = 100

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = None
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of model architectures

In [40]:
# Geometric-mean rule, J_l = sqrt(J_{l-1}*J_{l+1}), solved for num_outputs = 1:
architectures = [{1: {'neurons': int(X_train.shape[1]**(3/4)),
                      'activation': 'relu',
                      'dropout_param': 0},
                  2: {'neurons': int(X_train.shape[1]**(1/2)),
                      'activation': 'relu',
                      'dropout_param': 0},
                  3: {'neurons': int(X_train.shape[1]**(1/4)),
                      'activation': 'relu',
                      'dropout_param': 0}}]
architectures_def = ['Three hidden layers: J1 = sqrt(num_inputs*J2), J2 = sqrt(J1*J3), J3 = sqrt(J2*num_outputs)']

In [41]:
# Alternative specifications: J1 = sqrt(num_inputs*num_outputs), with J2 and J3 set as fractions of J1 or equal to J1:
architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0},
                      3: {'neurons': int(np.sqrt(X_train.shape[1]*1)/4),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Three hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2, J3 = J1/4')

architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      3: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0}})
architectures_def.append('Three hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1, J3 = J1')

In [42]:
architectures

[{1: {'neurons': 309, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  3: {'neurons': 6, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'relu', 'dropout_param': 0},
  3: {'neurons': 11, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  3: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0}}]

#### Estimation loop

In [44]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(architectures_def),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over architectures:
for a in range(len(architectures)):
# indices = [3, 5]
# for a in indices:
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are tracked separately):
    train_min_cost = []
    train_epoch_min_cost = []
    val_min_cost = []
    val_epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = architectures[a], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_scores = [p[0] for p in model.predictions]
        val_roc_auc.append(roc_auc_score(y_val, val_scores))
        val_avg_prec_score.append(average_precision_score(y_val, val_scores))
        val_brier_score.append(brier_score_loss(y_val, val_scores))

        # Minimum cost and the epoch at which it is reached, for training and validation data:
        model_costs = model.model_costs

        train_min_cost.append(model_costs.loss.min())
        train_epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        val_min_cost.append(model_costs.val_loss.min())
        val_epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(architectures[a]),
            'num_hidden_neurons': [architectures[a][l]['neurons'] for l in architectures[a].keys()],
            'hidden_activations': [architectures[a][l]['activation'] for l in architectures[a].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [architectures[a][l]['dropout_param'] for l in architectures[a].keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_train_min_cost': np.nanmean(train_min_cost),
            'avg_train_epoch_min_cost': np.nanmean(train_epoch_min_cost),
            'avg_val_min_cost': np.nanmean(val_min_cost),
            'avg_val_epoch_min_cost': np.nanmean(val_epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": 'Testing architectures. ' + architectures_def[a]
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(a+1)
#     test_bar.update(indices.index(a)+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 74.62 minutes.
Start time: 2021-01-04, 23:07:22
End time: 2021-01-05, 00:21:59




#### Assessing results

In [61]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over stored estimations, keeping only the architecture tests:
arch_tests = ('Testing architectures. One hidden layer',
              'Testing architectures. Two hidden layers',
              'Testing architectures. Three hidden layers')
for est_id, e in model_assessment.items():
    if not any(tag in e['comment'] for tag in arch_tests):
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by architecture with three hidden layers:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

Unnamed: 0,estimation_id,architecture,num_layers,num_neurons,avg_roc_auc,std_roc_auc,avg_prec,std_prec,ratio_roc_auc,ratio_prec,running_time
17,1609598769,"J1 = sqrt(num_inputs*num_outputs), J2 = J1/2",2,"[45, 22]",0.868335,0.013726,0.262224,0.032911,63.260803,7.967559,13.4
12,1609025026,J1 = sqrt(num_inputs*num_outputs),1,[45],0.860571,0.015237,0.240236,0.031074,56.47962,7.731185,12.37
19,1609614745,"J1 = sqrt(num_inputs*num_outputs), J2 = J1",2,"[45, 45]",0.851741,0.020639,0.21602,0.02194,41.268569,9.845743,13.47
0,1608940658,J1 = (num_inputs + num_outputs)*0.1,1,[209],0.850727,0.012944,0.224068,0.035353,65.721111,6.337948,28.73
21,1609815126,"J1 = sqrt(num_inputs*num_outputs), J2 = J1/2, ...",3,"[45, 22, 11]",0.844691,0.022947,0.197475,0.038794,36.809745,5.090319,14.75
1,1608942382,J1 = (num_inputs + num_outputs)*0.2,1,[419],0.835747,0.017298,0.216742,0.029311,48.314574,7.394535,50.73
2,1608945427,J1 = (num_inputs + num_outputs)*0.3,1,[629],0.832776,0.021014,0.19637,0.034508,39.630446,5.690637,67.28
22,1609816012,"J1 = sqrt(num_inputs*num_outputs), J2 = J1, J3...",3,"[45, 45, 45]",0.831545,0.0406,0.170068,0.043264,20.481481,3.930913,15.12
4,1608955195,J1 = (num_inputs + num_outputs)*0.5,1,[1049],0.830936,0.019053,0.159038,0.04061,43.611771,3.916244,117.35
6,1608970776,J1 = (num_inputs + num_outputs)*0.7,1,[1468],0.829818,0.021406,0.162789,0.026689,38.765363,6.099423,165.97


<a id='fitting_params'></a>

## Fitting hyper-parameters

After a first inquiry into architecture definition, this section tests several values for the fitting hyper-parameters: mini-batch size ($S$) and number of epochs ($T$). The setting for these tests is given by:
* Random samples of training and validation data.
* Cross-entropy cost function, with rectified linear unit (ReLU) activations for hidden neurons and a sigmoid activation for the output layer.
* *Following the tests on the number of hidden layers and neurons, the architecture has two hidden layers with the following number of neurons:*
    * $J_1 = \sqrt{num\_inputs*num\_outputs}$.
    * $J_2 = J_1/2$.
* Adam optimizer for model estimation (non-fixed and parameter-specific learning rates).
* No regularization, no dropout and no early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data.

**Mini-batch size**

The strategy for experiments involves two tests:
1. **Grid of mini-batch sizes:** a grid of values, all powers of 2, will be tested sequentially: $S \in \{2, 4, 8, 16, 32, 64, 128, 256, 512, 1024\}$. For each value, a collection of 100 estimations will be run, so that the average and standard deviation of the performance metrics can be stored, in addition to the running time.
2. **Velocity approach to mini-batch size definition:** for different values of $S$, validation performance will be plotted against the cumulative running time at each training epoch. The best mini-batch size $S^*$ under this approach is the one that reaches its highest improvement in validation performance in the shortest running time. Only one estimation will take place here.
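
The selection rule of the velocity approach is stated only informally, so the sketch below is just one possible reading of it, not the procedure actually used in the notebook. It assumes that each run is summarized as a list of `(cumulative_seconds, validation_metric)` pairs, one per epoch, and it picks the mini-batch size with the shortest time to its own peak among those whose peak is close to the best one. The function names, the tolerance `tol` and all numbers are hypothetical.

```python
def time_to_peak(history):
    """Return (peak_metric, seconds_to_peak) for one training run.

    history: list of (cumulative_seconds, val_metric) tuples, one per epoch.
    """
    peak_metric = max(metric for _, metric in history)
    # Earliest cumulative time at which the peak value is reached.
    seconds = min(t for t, metric in history if metric == peak_metric)
    return peak_metric, seconds


def best_batch_size(histories, tol=0.02):
    """Pick the fastest batch size among those within `tol` of the best peak.

    histories: {batch_size: history}; `tol` is a hypothetical tolerance.
    """
    peaks = {s: time_to_peak(h) for s, h in histories.items()}
    best_peak = max(peak for peak, _ in peaks.values())
    eligible = {s: secs for s, (peak, secs) in peaks.items()
                if best_peak - peak <= tol}
    return min(eligible, key=eligible.get)


# Made-up numbers for illustration only (not results from this study).
histories = {
    32:   [(10, 0.80), (20, 0.85), (30, 0.84)],
    256:  [(3, 0.78), (6, 0.83), (9, 0.86)],
    1024: [(2, 0.70), (4, 0.75), (6, 0.79)],
}
print(best_batch_size(histories))
```

The tolerance keeps a very fast but clearly weaker configuration (here, the made-up run with $S = 1024$) from being selected only because it peaks early.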

**Number of epochs**

Early stopping will be used in the final estimation, where performance metrics are to be assessed on test data. Here, a study will help to understand which values may be appropriate for the number of epochs $T$: a relatively large number of epochs, $T = 500$, is set for a single estimation, and model costs and performance metrics are then plotted against the epoch.
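
To make the role of the early stopping hyper-parameters concrete, the sketch below applies a patience rule to a sequence of validation costs, using the minimum change for an improvement ($\delta$) and the tolerated number of consecutive epochs without improvement ($P$). It is an illustrative plain-Python version with made-up cost values, not the rule implemented inside `KerasNN`; the function names are hypothetical.

```python
def best_epoch(val_costs):
    """1-based epoch with the lowest validation cost."""
    return val_costs.index(min(val_costs)) + 1


def early_stopping_epoch(val_costs, min_delta=0.0, patience=5):
    """1-based epoch at which a patience rule would stop training.

    An epoch counts as an improvement only if it lowers the best cost so far
    by more than min_delta (delta); training stops after `patience` (P)
    consecutive epochs without improvement, or at the last epoch otherwise.
    """
    best = float('inf')
    wait = 0
    for epoch, cost in enumerate(val_costs, start=1):
        if best - cost > min_delta:
            best = cost
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_costs)


# Made-up validation costs for illustration only.
val_costs = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.37]
print(best_epoch(val_costs))
print(early_stopping_epoch(val_costs, patience=2))
print(early_stopping_epoch(val_costs, patience=3))
```

In this made-up sequence the lowest cost occurs at epoch 6, but a patience of 2 already stops training at epoch 5, which illustrates why a long run plotted against the epoch is useful before fixing $P$ and $T$.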

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='grid_mini_batch_sizes'></a>

### Grid of mini-batch sizes

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of values

In [38]:
mini_batch_sizes = [2**i for i in range(1, 11)]
mini_batch_sizes

[2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

#### Estimation loop

In [39]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(mini_batch_sizes),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over mini-batch sizes:
for b in range(len(mini_batch_sizes)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are tracked separately):
    train_min_cost = []
    train_epoch_min_cost = []
    val_min_cost = []
    val_epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = mini_batch_sizes[b],
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_scores = [p[0] for p in model.predictions]
        val_roc_auc.append(roc_auc_score(y_val, val_scores))
        val_avg_prec_score.append(average_precision_score(y_val, val_scores))
        val_brier_score.append(brier_score_loss(y_val, val_scores))

        # Minimum cost and the epoch at which it is reached, for training and validation data:
        model_costs = model.model_costs

        train_min_cost.append(model_costs.loss.min())
        train_epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        val_min_cost.append(model_costs.val_loss.min())
        val_epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': mini_batch_sizes[b],
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_train_min_cost': np.nanmean(train_min_cost),
            'avg_train_epoch_min_cost': np.nanmean(train_epoch_min_cost),
            'avg_val_min_cost': np.nanmean(val_min_cost),
            'avg_val_epoch_min_cost': np.nanmean(val_epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": '{0}. Testing values for mini-batch size using powers of 2.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(b+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 237.02 minutes.
Start time: 2021-01-06, 20:14:43
End time: 2021-01-07, 00:11:44




#### Assessing results

In [62]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over stored estimations, keeping only the mini-batch size tests:
for est_id, e in model_assessment.items():
    if 'Testing values for mini-batch size using powers of 2.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by mini-batch size:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'batch_sizes': batch_sizes,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | batch_sizes | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 1609988509 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | 0.825623 | 0.011539 | 0.217563 | 0.015804 | 71.549261 | 13.76669 | 5.2 |
| 9 | 1609988822 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 1024 | 0.825585 | 0.011405 | 0.224429 | 0.016805 | 72.387168 | 13.355286 | 4.7 |
| 7 | 1609988133 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 256 | 0.823035 | 0.013996 | 0.202985 | 0.015361 | 58.803477 | 13.214244 | 6.27 |
| 4 | 1609986526 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 32 | 0.816262 | 0.019307 | 0.183668 | 0.02847 | 42.278426 | 6.451349 | 11.4 |
| 5 | 1609987211 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 64 | 0.812532 | 0.013973 | 0.184694 | 0.01825 | 58.149015 | 10.120042 | 8.35 |
| 6 | 1609987713 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 128 | 0.811479 | 0.018083 | 0.194542 | 0.017772 | 44.875439 | 10.946804 | 6.98 |
| 3 | 1609985481 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 16 | 0.786144 | 0.026773 | 0.125664 | 0.039055 | 29.363367 | 3.217615 | 17.42 |
| 2 | 1609983695 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 8 | 0.742617 | 0.049855 | 0.066218 | 0.024114 | 14.895576 | 2.746016 | 29.75 |
| 1 | 1609980669 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 4 | 0.737145 | 0.057025 | 0.053312 | 0.024682 | 12.926707 | 2.160003 | 50.43 |
| 0 | 1609974883 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 2 | 0.719575 | 0.053166 | 0.038943 | 0.01523 | 13.534499 | 2.556971 | 96.42 |


<a id='velocity_approach'></a>

### Velocity approach to mini-batch size definition

#### Setting

In [37]:
# Number of estimations:
n_estimations = 1

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of values

In [38]:
mini_batch_sizes = [2**i for i in range(1, 11)]
mini_batch_sizes

[2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

#### Estimation

In [39]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(mini_batch_sizes),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over mini-batch sizes:
for b in range(len(mini_batch_sizes)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []
    epoch_performance = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = mini_batch_sizes[b],
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)
        
        # Running time and performance metrics on validation data by epoch of training:
        epoch_performance.append(model.epoch_performance)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': mini_batch_sizes[b],
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score),
            'avg_epoch_performance': {
                'roc_auc': [sum(l)/len(l) for l in zip(*[d['epochroc_auc'] for d in epoch_performance])],
                'avg_prec_score': [sum(l)/len(l) for l in zip(*[d['epochavg_prec_score'] for d in
                                                                epoch_performance])],
                'brier_score': [sum(l)/len(l) for l in zip(*[d['epochbrier_score'] for d in
                                                             epoch_performance])],
                'running_time': [sum(l)/len(l) for l in zip(*[d['running_time'] for d in epoch_performance])]
            }
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Velocity approach to mini-batch size definition.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(b+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 4.72 minutes.
Start time: 2021-01-10, 17:15:29
End time: 2021-01-10, 17:20:13




#### Assessing results

In [81]:
estimation_ids = []
batch_sizes = []
time_max_roc_auc = []
time_max_avg_prec = []
time_max_diff_roc_auc = []
time_max_diff_avg_prec = []

# Loop over mini-batch sizes:
for e in [model_assessment[e] for e in model_assessment.keys() if ('Velocity' in model_assessment[e]['comment'])]:
    agg_times = list(np.cumsum(e['performance_metrics']['avg_epoch_performance']['running_time']))
    roc_auc = e['performance_metrics']['avg_epoch_performance']['roc_auc']
    avg_prec = e['performance_metrics']['avg_epoch_performance']['avg_prec_score']
    # np.nan (lowercase) is valid in every NumPy version; the np.NaN alias was removed in NumPy 2.0:
    diff_roc_auc = list(np.diff(roc_auc, prepend=np.nan))
    diff_avg_prec = list(np.diff(avg_prec, prepend=np.nan))
    
    max_roc_auc = np.nanmax(roc_auc)
    max_avg_prec = np.nanmax(avg_prec)
    max_diff_roc_auc = np.nanmax(diff_roc_auc)
    max_diff_avg_prec = np.nanmax(diff_avg_prec)

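    # Store the estimation ID (dictionary key) rather than the whole assessment dictionary:
    estimation_ids.append(list(model_assessment.keys())[list(model_assessment.values()).index(e)])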
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    time_max_roc_auc.append(agg_times[roc_auc.index(max_roc_auc)])
    time_max_avg_prec.append(agg_times[avg_prec.index(max_avg_prec)])
    time_max_diff_roc_auc.append(agg_times[diff_roc_auc.index(max_diff_roc_auc)])
    time_max_diff_avg_prec.append(agg_times[diff_avg_prec.index(max_diff_avg_prec)])
    
# Dataframe with performance metrics by mini-batch size:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'batch_sizes': batch_sizes,
    'time_max_roc_auc': time_max_roc_auc,
    'time_max_avg_prec': time_max_avg_prec,
    'time_max_diff_roc_auc': time_max_diff_roc_auc,
    'time_max_diff_avg_prec': time_max_diff_avg_prec
})

metrics.sort_values('time_max_diff_roc_auc', ascending=True)

| | estimation_id | batch_sizes | time_max_roc_auc | time_max_avg_prec | time_max_diff_roc_auc | time_max_diff_avg_prec |
|---|---|---|---|---|---|---|
| 9 | {'architecture': {'num_hidden_layers': 2, 'num... | 1024 | 2.787965 | 2.787965 | 0.892296 | 0.892296 |
| 8 | {'architecture': {'num_hidden_layers': 2, 'num... | 512 | 3.081902 | 3.081902 | 0.998291 | 1.526358 |
| 7 | {'architecture': {'num_hidden_layers': 2, 'num... | 256 | 6.944381 | 6.944381 | 1.018974 | 1.891033 |
| 5 | {'architecture': {'num_hidden_layers': 2, 'num... | 64 | 7.641162 | 7.641162 | 2.115796 | 3.771834 |
| 6 | {'architecture': {'num_hidden_layers': 2, 'num... | 128 | 7.330622 | 7.330622 | 2.779716 | 4.328759 |
| 4 | {'architecture': {'num_hidden_layers': 2, 'num... | 32 | 8.348578 | 8.348578 | 4.049817 | 5.85988 |
| 2 | {'architecture': {'num_hidden_layers': 2, 'num... | 8 | 20.819293 | 20.819293 | 4.92175 | 4.92175 |
| 1 | {'architecture': {'num_hidden_layers': 2, 'num... | 4 | 32.816411 | 21.917382 | 6.840339 | 6.840339 |
| 3 | {'architecture': {'num_hidden_layers': 2, 'num... | 16 | 11.863147 | 10.700609 | 8.165246 | 10.700609 |
| 0 | {'architecture': {'num_hidden_layers': 2, 'num... | 2 | 40.473972 | 61.520196 | 12.64006 | 61.520196 |
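
The quantities in this table can be illustrated on toy data. The following is a minimal sketch, not the notebook's own code: the helper `time_to_best` and the toy series are hypothetical, and the time unit is simply whatever unit the per-epoch running times are recorded in. It returns the cumulative training time at which a validation metric peaks, and the cumulative time at which the largest epoch-to-epoch gain occurs.

```python
import numpy as np

def time_to_best(metric_by_epoch, time_by_epoch):
    """Return (time at the best metric value, time at the largest epoch-to-epoch gain)."""
    metric = np.asarray(metric_by_epoch, dtype=float)
    elapsed = np.cumsum(time_by_epoch)        # cumulative training time after each epoch
    gains = np.diff(metric, prepend=np.nan)   # gain of each epoch over the previous one
    t_best = float(elapsed[np.nanargmax(metric)])
    t_gain = float(elapsed[np.nanargmax(gains)])
    return t_best, t_gain

# Hypothetical per-epoch validation ROC-AUC and per-epoch running times:
toy_roc_auc = [0.60, 0.72, 0.75, 0.78, 0.77]
toy_times = [2.0, 2.0, 2.0, 2.0, 2.0]

t_best, t_gain = time_to_best(toy_roc_auc, toy_times)
print(t_best, t_gain)
```

Under this reading, a mini-batch size is "faster" when both times are small, which is how the rows above are ranked.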


<a id='number_epochs'></a>

### Number of epochs

*Results from the mini-batch size tests indicate $S = 512$ as an appropriate value for the current learning task.* All other [settings](#fitting_params) discussed previously also apply here.

#### Setting

In [63]:
# Number of estimations:
n_estimations = 1

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 500
es_param = None
batch_size = 512
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Estimation

In [64]:
start_time = datetime.now()

estimation_id = str(int(time.time()))

nn_start_time = datetime.now()

# Lists to store results:
epoch_costs = []
min_cost = []
epoch_min_cost = []
val_roc_auc = []
val_avg_prec_score = []
val_brier_score = []
epoch_performance = []

# Loop over estimations:
for t in range(n_estimations):
    # Creating neural network object, declaring its architecture and defining hyper-parameters:
    model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                     output_activation = output_activation, cost_function = cost_function,
                     num_epochs = num_epochs, batch_size = batch_size,
                     default_adam = default_adam,
                     regul_param = regul_param, input_dropout = input_dropout)

    # Training the model:
    model.run(train_inputs = X_train, train_output = y_train,
              val_inputs = X_val, val_output = y_val,
              verbose = 0)

    # Performance metrics on validation data:
    val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
    val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
    val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))

    # Cost function by training epoch:
    model_costs = model.model_costs
    epoch_costs.append({'epoch': list(model_costs['epoch']),
                        'loss': list(model_costs['loss']),
                        'val_loss': list(model_costs['val_loss'])})

    min_cost.append(model_costs.loss.min())
    epoch_min_cost.append(model_costs.loss.idxmin() + 1)
    min_cost.append(model_costs.val_loss.min())
    epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Running time and performance metrics on validation data by epoch of training:
    epoch_performance.append(model.epoch_performance)

# Assessing running time:
nn_end_time = datetime.now()

# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'batch_size': batch_size,
        'es_param': es_param,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam
    },
    'n_estimations': n_estimations,
    'performance_metrics': {
        'application': 'validation',
        'epoch_costs': epoch_costs,
        'avg_epoch_costs': {
            'epoch': [sum(l)/len(l) for l in zip(*[d['epoch'] for d in epoch_costs])],
            'loss': [sum(l)/len(l) for l in zip(*[d['loss'] for d in epoch_costs])],
            'val_loss': [sum(l)/len(l) for l in zip(*[d['val_loss'] for d in epoch_costs])]
        },
        'avg_min_cost': np.nanmean(min_cost),
        'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
        'avg_roc_auc': np.nanmean(val_roc_auc),
        'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
        'avg_brier_score': np.nanmean(val_brier_score),
        'std_roc_auc': np.nanstd(val_roc_auc),
        'std_avg_prec_score': np.nanstd(val_avg_prec_score),
        'std_brier_score': np.nanstd(val_brier_score),
        'avg_epoch_performance': {
            'roc_auc': [sum(l)/len(l) for l in zip(*[d['epochroc_auc'] for d in epoch_performance])],
            'avg_prec_score': [sum(l)/len(l) for l in zip(*[d['epochavg_prec_score'] for d in
                                                            epoch_performance])],
            'brier_score': [sum(l)/len(l) for l in zip(*[d['epochbrier_score'] for d in
                                                         epoch_performance])],
            'running_time': [sum(l)/len(l) for l in zip(*[d['running_time'] for d in epoch_performance])]
        }
    },
    'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
    "comment": '{0}. Assessing number of epochs by using a large value for it.'.format(model_architecture_def)
}

if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

------------------------------------
Overall running time: 12.07 minutes.
Start time: 2021-01-12, 12:26:38
End time: 2021-01-12, 12:38:42




#### Assessing results

In [19]:
outcomes = [model_assessment[e] for e in model_assessment.keys() if
            'Assessing number of epochs' in model_assessment[e]['comment']][0]

# Cost function by training epoch:
model_costs = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'loss': outcomes['performance_metrics']['avg_epoch_costs']['loss'],
    'val_loss': outcomes['performance_metrics']['avg_epoch_costs']['val_loss']
})

epoch_performances = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'roc_auc': outcomes['performance_metrics']['avg_epoch_performance']['roc_auc'],
    'avg_prec_score': outcomes['performance_metrics']['avg_epoch_performance']['avg_prec_score']
})

In [20]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': False}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.loss, name='Training cost',
               hovertemplate =
                'loss = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.val_loss, name='Validation cost',
               hovertemplate = 'val_loss = %{y:.4f}',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=False,
)

# Changing layout:
fig.update_layout(
    title_text='Cost function by epoch of training',
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='cost', secondary_y=False)

fig.show()

In [21]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': True}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=epoch_performances.epoch,
               y=epoch_performances.roc_auc, name='Val ROC-AUC',
               hovertemplate =
                'ROC-AUC = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=epoch_performances.epoch,
               y=epoch_performances.avg_prec_score, name='Val avg precision',
               hovertemplate = 'Avg precision = %{y:.4f}<br>'+
                               'epoch = %{x}<br>',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=True,
)

# Changing layout:
fig.update_layout(
    title_text='Validation performance by epoch of training',
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='performance', secondary_y=False)

fig.show()
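
Besides the visual inspection of these plots, the epoch at which validation cost bottoms out can be read directly from the cost history. The snippet below is a minimal sketch on hypothetical data: `toy_costs` only mimics the `epoch`, `loss` and `val_loss` columns of `model_costs`, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical cost history with the same columns as `model_costs`:
toy_costs = pd.DataFrame({
    'epoch': [1, 2, 3, 4, 5, 6],
    'loss': [0.70, 0.52, 0.41, 0.35, 0.31, 0.28],
    'val_loss': [0.72, 0.56, 0.47, 0.45, 0.46, 0.49],
})

# Epoch with the lowest validation cost:
best_row = toy_costs.loc[toy_costs['val_loss'].idxmin()]
best_epoch = int(best_row['epoch'])

# Gap between validation and training cost at the last epoch (a rough overfitting signal):
final_gap = float(toy_costs['val_loss'].iloc[-1] - toy_costs['loss'].iloc[-1])

print(best_epoch, round(final_gap, 2))
```

A training cost that keeps falling while validation cost rises after `best_epoch` suggests that further epochs mostly add overfitting.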

<a id='functions'></a>

## Functions

In this section, tests explore alternative **activation functions for hidden neurons** and options for the **cost function**. So far, only rectified linear units (ReLU) have been considered for hidden layers, together with the cross-entropy cost function. First, different cost functions are tested. These are the alternatives covered here:
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Binary cross-entropy: } C(y, a^L(x)) = -\frac{1}{N}\sum_x[y\log(a^L(x)) + (1 - y)\log(1 - a^L(x))]
\end{equation}
Where $y \in \{0, 1\}$.
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Hinge loss: } C(y, a^L(x)) = \frac{1}{N}\sum_x\max{(0, 1 - ya^L(x))}
\end{equation}
<br>
Where $y \in \{-1, 1\}$.
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Squared Hinge loss: } C(y, a^L(x)) = \frac{1}{N}\sum_x\left[\max{(0, 1 - ya^L(x))}\right]^2
\end{equation}
<br>
Where $y \in \{-1, 1\}$.
<br>
<br>
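
As an illustration, the three cost functions can be written in a few lines of NumPy. This is a minimal sketch using the standard definitions, in which both hinge losses are averages of non-negative terms. The function names, the clipping constant `eps` and the toy arrays are assumptions made for this example, not part of the notebook's `KerasNN` class.

```python
import numpy as np

def binary_crossentropy(y, a, eps=1e-7):
    """Mean binary cross-entropy for labels y in {0, 1} and outputs a in (0, 1)."""
    a = np.clip(a, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(a) + (1 - y) * np.log(1 - a)))

def hinge(y, a):
    """Mean hinge loss for labels y in {-1, 1}."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * a)))

def squared_hinge(y, a):
    """Mean squared hinge loss for labels y in {-1, 1}."""
    return float(np.mean(np.maximum(0.0, 1.0 - y * a) ** 2))

# Hypothetical labels and network outputs:
y01 = np.array([0.0, 1.0, 1.0, 0.0])
a = np.array([0.1, 0.8, 0.6, 0.3])
ypm = np.where(y01 == 0, -1.0, 1.0)  # recode 0 as -1 for the hinge losses

bce_value = binary_crossentropy(y01, a)
hinge_value = hinge(ypm, a)
sq_hinge_value = squared_hinge(ypm, a)
print(bce_value, hinge_value, sq_hinge_value)
```

Note that with a sigmoid output neuron the outputs lie in $(0, 1)$, so the term $1 - ya^L(x)$ never reaches zero and the hinge losses cannot vanish for any observation.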
Using the best cost function among those alternatives, tests then turn to the most appropriate activation function. While *sigmoid activation* is still applied to the output neuron, the following functions make up the grid of activation functions for hidden neurons:
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Sigmoid activation: } \sigma(z_j^l) = \frac{1}{1 + \exp{(-z_j^l)}}
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Tanh activation: } \tanh(z_j^l) = \frac{\exp{(z_j^l)} - \exp{(-z_j^l)}}{\exp{(z_j^l)} + \exp{(-z_j^l)}}
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Rectified linear unit (ReLU) activation: } relu(z_j^l) = \max{(0, z_j^l)}
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Exponential linear unit (ELU) activation: } elu(z_j^l) =
    \left \{
    \begin{array}{ll}
    \alpha(\exp{(z_j^l)}-1), & \mbox{ if } z_j^l \leq 0 \\
    z_j^l, & \mbox{ if } z_j^l > 0
    \end{array}
    \right.
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Scaled exponential linear unit (SELU) activation: } selu(z_j^l) = 
    \left \{
    \begin{array}{ll}
    \lambda \alpha(\exp{(z_j^l)}-1), & \mbox{ if } z_j^l \leq 0 \\
    \lambda z_j^l, & \mbox{ if } z_j^l > 0
    \end{array}
    \right.
\end{equation}
<br>
Where $\lambda = 1.05070098$ and $\alpha = 1.67326324$.
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Leaky rectified linear unit (Leaky ReLU) activation: } leaky\_relu(z_j^l) = 
    \left \{
    \begin{array}{ll}
    0.01z_j^l, & \mbox{ if } z_j^l \leq 0 \\
    z_j^l, & \mbox{ if } z_j^l > 0
    \end{array}
    \right.
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Parametric rectified linear unit (PReLU) activation: } prelu(z_j^l) = 
    \left \{
    \begin{array}{ll}
    \alpha z_j^l, & \mbox{ if } z_j^l \leq 0 \\
    z_j^l, & \mbox{ if } z_j^l > 0
    \end{array}
    \right.
\end{equation}
<br>
<br>
\begin{equation}
    \displaystyle \mbox{Swish activation: } swish(z_j^l) = \frac{z_j^l}{1 + \exp{(-z_j^l)}}
\end{equation}
<br>
<br>
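
For reference, the activation functions in this grid can be sketched in NumPy as below. These are the standard definitions, written only for illustration: the estimations themselves rely on the implementations provided by Keras, and the default `alpha=1.0` for ELU and the toy input are assumptions of this example. In PReLU the slope `alpha` is learned during training, so here it is simply passed as an argument.

```python
import numpy as np

SELU_LAMBDA = 1.05070098
SELU_ALPHA = 1.67326324

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def elu(z, alpha=1.0):
    # exp is only evaluated on the non-positive part to avoid overflow for large z
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1.0))

def selu(z):
    return SELU_LAMBDA * elu(z, alpha=SELU_ALPHA)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def prelu(z, alpha):
    # same functional form as leaky ReLU, but alpha is a trainable parameter
    return np.where(z > 0, z, alpha * z)

def swish(z):
    return z * sigmoid(z)

z = np.array([-2.0, 0.0, 3.0])  # hypothetical pre-activations
for name, f in [('sigmoid', sigmoid), ('tanh', np.tanh), ('relu', relu), ('elu', elu),
                ('selu', selu), ('leaky_relu', leaky_relu), ('swish', swish)]:
    print(name, f(z))
```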
All estimations will follow the best alternatives derived from tests above:
* Random samples of training and validation data.
* Fitting hyper-parameters: *after tests from last section, mini-batch size is set to $S = 512$, while number of epochs is still kept as low as possible to simplify estimations, $T = 10$.*
* *After tests for number of neurons and number of hidden layers, the architecture will be given by two hidden layers with the following number of neurons:*
    * $J_1 = \sqrt{num\_inputs*num\_outputs}$.
    * $J_2 = J_1/2$.
<br>
<br>
* Adam optimizer for model estimation (non-fixed and parameter-specific learning rates).
* No regularization, no dropout and no early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: a collection of 100 estimations will be implemented, so performance metrics can be assessed in terms of average and standard deviation.
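
The averaging step can be sketched as follows. This is a minimal illustration on invented values: the helper `summarize_runs` is hypothetical, but it uses the same `np.nanmean` and `np.nanstd` calls as the estimation loops, plus the average-to-standard-deviation ratio reported in the results tables.

```python
import numpy as np

def summarize_runs(values):
    """Average, standard deviation and their ratio over repeated estimations (NaNs ignored)."""
    values = np.asarray(values, dtype=float)
    avg = float(np.nanmean(values))
    std = float(np.nanstd(values))
    # The ratio is undefined when all runs coincide (e.g. a single estimation):
    ratio = avg / std if std > 0 else float('nan')
    return {'avg': avg, 'std': std, 'ratio': ratio}

# Hypothetical validation ROC-AUC over five estimations, one of which failed:
toy_roc_auc = [0.82, 0.84, 0.80, 0.83, np.nan]
summary = summarize_runs(toy_roc_auc)
print(summary)
```

A higher ratio indicates a specification that is both accurate on average and stable across random initializations.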

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='cost_function'></a>

### Cost function

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of cost functions

In [40]:
cost_functions = ['binary_crossentropy', 'hinge', 'squared_hinge']

#### Estimation loop

In [43]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(cost_functions),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over cost functions:
for c in range(len(cost_functions)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []
    
    # Labels from the sampled data (0/1 encoding):
    y_train = sample_train_scaled['y'].values
    y_val = sample_val_scaled['y'].values

    if cost_functions[c] in ['hinge', 'squared_hinge']:
        # Converting labels 0 into -1 in order to apply Hinge cost functions:
        y_train = np.where(y_train == 0, -1.0, 1.0)
        y_val = np.where(y_val == 0, -1.0, 1.0)

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_functions[c],
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_functions[c],
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative cost functions.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(c+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 50.88 minutes.
Start time: 2021-01-15, 21:55:12
End time: 2021-01-15, 22:46:06




#### Assessing results

In [46]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
costs = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over estimations that tested alternative cost functions:
for e in [model_assessment[e] for e in model_assessment.keys() if
          ('Testing alternative cost functions.' in model_assessment[e]['comment'])]:
    estimation_ids.append(list(model_assessment.keys())[list(model_assessment.values()).index(e)])
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    costs.append(e['architecture']['cost_function'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by cost function:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'batch_sizes': batch_sizes,
    'cost_function': costs,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | batch_sizes | cost_function | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1610758512 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | binary_crossentropy | 0.825063 | 0.01235 | 0.21683 | 0.016672 | 66.807243 | 13.005352 | 16.67 |
| 1 | 1610759513 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | hinge | 0.695188 | 0.037362 | 0.117656 | 0.03194 | 18.60675 | 3.683641 | 17.22 |
| 2 | 1610760546 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | squared_hinge | 0.685966 | 0.032641 | 0.106505 | 0.031416 | 21.015342 | 3.390197 | 16.98 |
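
For reference, the three cost functions compared above can be written out in NumPy. This is a minimal sketch of the standard definitions, not the code path used inside `KerasNN`; the function names and the toy arrays are hypothetical. It assumes, as Keras does for its hinge losses, that 0/1 labels are first mapped to -1/+1.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))))

def hinge(y_true, y_score):
    """Mean of max(0, 1 - y * score), with 0/1 labels mapped to -1/+1."""
    y_signed = 2 * y_true - 1
    return float(np.mean(np.maximum(0.0, 1.0 - y_signed * y_score)))

def squared_hinge(y_true, y_score):
    """Mean of max(0, 1 - y * score)**2, with 0/1 labels mapped to -1/+1."""
    y_signed = 2 * y_true - 1
    return float(np.mean(np.maximum(0.0, 1.0 - y_signed * y_score) ** 2))

# Toy labels and sigmoid-style outputs (illustrative values only):
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

print(binary_crossentropy(y_true, y_prob), hinge(y_true, y_prob), squared_hinge(y_true, y_prob))
```

With a sigmoid output the score always lies in (0, 1), so the hinge margin term 1 - y * score is strictly positive for every observation and the hinge cost can never reach zero. This may be one reason the hinge-based costs lag behind cross-entropy in the table, although the tests here do not isolate that effect.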


<a id='activation_functions'></a>

### Activation functions

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regul_param = 0
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of activation functions

In [38]:
activation_functions = ['sigmoid', 'tanh', 'relu', 'elu', 'selu', 'leaky_relu', 'prelu', 'swish']

model_architectures = [
    {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
         'activation': a,
         'dropout_param': 0},
     2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
         'activation': a,
         'dropout_param': 0}} for a in activation_functions]

model_architectures

[{1: {'neurons': 45, 'activation': 'sigmoid', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'sigmoid', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'relu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'elu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'elu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'selu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'selu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'leaky_relu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'leaky_relu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'prelu', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'prelu', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'swish', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'swish', 'dropout_param': 0}}]
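
As a reminder of what the grid above compares, the following sketch writes most of the activation functions in NumPy. These are the usual textbook definitions and may differ in detail from what `KerasNN` builds internally; the `activations` dictionary is hypothetical, and the negative slope of 0.01 for `leaky_relu` is an assumed illustration value. `prelu` has the same form as `leaky_relu` with the negative slope learned during training, so it is omitted.

```python
import numpy as np

# Standard SELU constants:
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

activations = {
    'sigmoid': lambda z: 1.0 / (1.0 + np.exp(-z)),
    'tanh': np.tanh,
    'relu': lambda z: np.maximum(0.0, z),
    'elu': lambda z: np.where(z > 0, z, np.exp(z) - 1.0),
    'selu': lambda z: SELU_SCALE * np.where(z > 0, z, SELU_ALPHA * (np.exp(z) - 1.0)),
    'leaky_relu': lambda z: np.where(z > 0, z, 0.01 * z),  # slope 0.01 is an assumption
    'swish': lambda z: z / (1.0 + np.exp(-z)),
}

# Evaluate each function on a few pre-activation values:
z = np.array([-2.0, 0.0, 2.0])
for name, f in activations.items():
    print(name, np.round(f(z), 4))
```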

#### Estimation loop

In [45]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(activation_functions),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over activation functions:
for a in range(len(activation_functions)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are kept separately):
    min_cost = []
    epoch_min_cost = []
    min_val_cost = []
    epoch_min_val_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architectures[a], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regul_param = regul_param, input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        # Minimum training and validation costs, and the epochs in which they occur:
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_val_cost.append(model_costs.val_loss.min())
        epoch_min_val_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architectures[a]),
            'num_hidden_neurons': [model_architectures[a][l]['neurons'] for l in model_architectures[a].keys()],
            'hidden_activations': [model_architectures[a][l]['activation'] for l in model_architectures[a].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architectures[a][l]['dropout_param'] for l in model_architectures[a].keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_min_val_cost': np.nanmean(min_val_cost),
            'avg_epoch_min_val_cost': np.nanmean(epoch_min_val_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative activation functions.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(a+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 20.7 minutes.
Start time: 2021-01-17, 17:41:12
End time: 2021-01-17, 18:01:55




#### Assessing results

In [7]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
activations = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over activation functions:
for e in [model_assessment[e] for e in model_assessment.keys() if
          ('Testing alternative activation functions.' in model_assessment[e]['comment'])]:
    estimation_ids.append(list(model_assessment.keys())[list(model_assessment.values()).index(e)])
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by activation function:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'batch_sizes': batch_sizes,
    'activation_function': activations,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | batch_sizes | activation_function | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1610905503 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | tanh | 0.840113 | 0.01224 | 0.215579 | 0.010599 | 68.636417 | 20.340183 | 16.85 |
| 4 | 1610908768 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | selu | 0.834494 | 0.012304 | 0.240145 | 0.019824 | 67.821606 | 12.113811 | 27.98 |
| 3 | 1610907581 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | elu | 0.833973 | 0.011489 | 0.238648 | 0.018832 | 72.587216 | 12.672494 | 19.78 |
| 5 | 1610910448 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | leaky_relu | 0.826563 | 0.010019 | 0.224233 | 0.016487 | 82.499836 | 13.600794 | 30.35 |
| 6 | 1610912269 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | prelu | 0.824457 | 0.010283 | 0.220836 | 0.016831 | 80.180429 | 13.120543 | 31.45 |
| 7 | 1610916072 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | swish | 0.823466 | 0.011124 | 0.216896 | 0.016818 | 74.028194 | 12.896742 | 20.7 |
| 2 | 1610906515 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | relu | 0.822166 | 0.010714 | 0.218378 | 0.015513 | 76.740598 | 14.077458 | 17.77 |
| 0 | 1610904413 | J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ... | 2 | [45, 22] | 512 | sigmoid | 0.803191 | 0.011603 | 0.185369 | 0.014967 | 69.221882 | 12.385405 | 18.15 |
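
Whether the gaps between the top activation functions exceed run-to-run noise can be gauged from the summary statistics alone. The sketch below computes a Welch-style t-statistic for tanh versus selu from the means and standard deviations in the table, with 100 estimations each. It assumes the runs are independent and that the reported standard deviations use `ddof=0` (the `np.nanstd` default); the helper name `welch_t` is hypothetical.

```python
import math

def welch_t(mean_a, std_a, mean_b, std_b, n_a, n_b):
    """Welch t-statistic from summary statistics, given population (ddof=0) standard deviations."""
    var_a = std_a ** 2 * n_a / (n_a - 1)  # convert to sample variance
    var_b = std_b ** 2 * n_b / (n_b - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)

# Values copied from the table above (tanh vs. selu), n_estimations = 100:
t_roc_auc = welch_t(0.840113, 0.012240, 0.834494, 0.012304, 100, 100)
t_avg_prec = welch_t(0.215579, 0.010599, 0.240145, 0.019824, 100, 100)

print(round(t_roc_auc, 2), round(t_avg_prec, 2))
```

Under these assumptions tanh leads selu on ROC-AUC by roughly three standard errors, while selu leads tanh on average precision by a wider margin, so the ranking depends on which metric is prioritized.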


<a id='regularization'></a>

## Regularization

This section explores **L2 regularization** as a way to reduce overfitting and improve generalization, with L1 regularization assessed alongside it for comparison. First, individual values of the regularization parameter $\lambda$ are tried in order to reveal an appropriate range for the complete grid (or random) search implemented in a second round of tests.
<br>
<br>
All estimations will follow the best alternatives derived from tests above:
* Random samples of training and validation data.
* *Following the results of earlier tests, the cross-entropy cost function will be used. Regarding activation functions, sigmoid will be applied to the neuron in the output layer, while tanh will be used for neurons in hidden layers, since it achieved the highest average validation ROC-AUC in the tests above.*
* Fitting hyper-parameters: *after previous tests, the mini-batch size is set to $S = 512$, while the number of epochs is kept low to simplify estimations, $T = 10$. For the first round of tests, however, in which only one estimation takes place per value of $\lambda$, $T = 100$ will be used*.
* *After tests for number of neurons and number of hidden layers, the architecture will be given by two hidden layers with the following number of neurons:*
    * $J_1 = \sqrt{num\_inputs*num\_outputs}$.
    * $J_2 = J_1/2$.
<br>
<br>
* Adam optimizer for model estimation (non-fixed and parameter-specific learning rates).
* No dropout and no early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: in the first round of tests, only one estimation is run per value of the regularization parameter. In the second round, 100 estimations are run per value, so that performance metrics can be assessed in terms of average and standard deviation.
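
As a reminder of what $\lambda$ controls, L2 regularization adds a penalty proportional to the sum of squared weights to the cost function. The sketch below uses plain NumPy with hypothetical random weight matrices shaped like the two-hidden-layer architecture. The 2,048 inputs are an assumed illustration value (the actual number is `X_train.shape[1]`), and the penalty follows the $\lambda \sum w^2$ convention without a 1/2 factor; whether `KerasNN` scales the penalty the same way is not shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrices: inputs -> 45 -> 22 -> 1 (2048 inputs is an assumed value).
weights = [rng.normal(scale=0.05, size=shape) for shape in [(2048, 45), (45, 22), (22, 1)]]

def l2_penalty(weights, regul_param):
    """lambda * sum of squared weights, summed over layers (biases excluded)."""
    return regul_param * sum(float(np.sum(w ** 2)) for w in weights)

# Penalty for each candidate value of the regularization parameter:
for regul_param in [0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10]:
    print(regul_param, round(l2_penalty(weights, regul_param), 4))
```

Because the penalty is linear in $\lambda$, the candidate values span several orders of magnitude. At the upper end the penalty can be far larger than a cross-entropy cost, which is about 0.69 for an uninformative binary classifier, so the optimizer would mostly shrink weights rather than fit the data.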

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='assessing_overfitting_l2'></a>

### Assessing overfitting through L2 regularization

#### Setting

In [37]:
# Number of estimations:
n_estimations = 1

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 100
es_param = None
batch_size = 512
regularization = 'l2'
input_dropout = 0

# Choose a value for the regularization parameter [0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10]:
regul_param = 0

# Defining the optimizer:
default_adam = True

#### Estimation

In [38]:
start_time = datetime.now()

estimation_id = str(int(time.time()))

nn_start_time = datetime.now()

# Lists to store results (training and validation costs are kept separately):
epoch_costs = []
min_cost = []
epoch_min_cost = []
min_val_cost = []
epoch_min_val_cost = []
val_roc_auc = []
val_avg_prec_score = []
val_brier_score = []
epoch_performance = []

# Loop over estimations:
for t in range(n_estimations):
    # Creating neural network object, declaring its architecture and defining hyper-parameters:
    model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                     output_activation = output_activation, cost_function = cost_function,
                     num_epochs = num_epochs, batch_size = batch_size,
                     default_adam = default_adam,
                     regularization = regularization, regul_param = regul_param, input_dropout = input_dropout)

    # Training the model:
    model.run(train_inputs = X_train, train_output = y_train,
              val_inputs = X_val, val_output = y_val,
              verbose = 0)

    # Performance metrics on validation data:
    val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
    val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
    val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))

    # Cost function by training epoch:
    model_costs = model.model_costs
    epoch_costs.append({'epoch': list(model_costs['epoch']),
                        'loss': list(model_costs['loss']),
                        'val_loss': list(model_costs['val_loss'])})

    # Minimum training and validation costs, and the epochs in which they occur:
    min_cost.append(model_costs.loss.min())
    epoch_min_cost.append(model_costs.loss.idxmin() + 1)
    min_val_cost.append(model_costs.val_loss.min())
    epoch_min_val_cost.append(model_costs.val_loss.idxmin() + 1)

    # Running time and performance metrics on validation data by epoch of training:
    epoch_performance.append(model.epoch_performance)

# Assessing running time:
nn_end_time = datetime.now()

# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'batch_size': batch_size,
        'es_param': es_param,
        'regularization': regularization,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam
    },
    'n_estimations': n_estimations,
    'performance_metrics': {
        'application': 'validation',
        'epoch_costs': epoch_costs,
        'avg_epoch_costs': {
            'epoch': [sum(l)/len(l) for l in zip(*[d['epoch'] for d in epoch_costs])],
            'loss': [sum(l)/len(l) for l in zip(*[d['loss'] for d in epoch_costs])],
            'val_loss': [sum(l)/len(l) for l in zip(*[d['val_loss'] for d in epoch_costs])]
        },
        'avg_min_cost': np.nanmean(min_cost),
        'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
        'avg_min_val_cost': np.nanmean(min_val_cost),
        'avg_epoch_min_val_cost': np.nanmean(epoch_min_val_cost),
        'avg_roc_auc': np.nanmean(val_roc_auc),
        'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
        'avg_brier_score': np.nanmean(val_brier_score),
        'std_roc_auc': np.nanstd(val_roc_auc),
        'std_avg_prec_score': np.nanstd(val_avg_prec_score),
        'std_brier_score': np.nanstd(val_brier_score),
        'avg_epoch_performance': {
            'roc_auc': [sum(l)/len(l) for l in zip(*[d['epochroc_auc'] for d in epoch_performance])],
            'avg_prec_score': [sum(l)/len(l) for l in zip(*[d['epochavg_prec_score'] for d in
                                                            epoch_performance])],
            'brier_score': [sum(l)/len(l) for l in zip(*[d['epochbrier_score'] for d in
                                                         epoch_performance])],
            'running_time': [sum(l)/len(l) for l in zip(*[d['running_time'] for d in epoch_performance])]
        }
    },
    'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
    "comment": '{0}. Assessing overfitting by trying different values for the L2 regularization parameter.'.format(model_architecture_def)
}

if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

------------------------------------
Overall running time: 3.43 minutes.
Start time: 2021-01-23, 12:26:14
End time: 2021-01-23, 12:29:40




#### Assessing results

In [44]:
# First stored estimation matching the comment and the chosen regularization parameter:
outcomes = [model_assessment[e] for e in model_assessment.keys() if
            ('Assessing overfitting by trying different values for the L2 regularization parameter.' in
            model_assessment[e]['comment']) and
            (model_assessment[e]['hyper_parameters']['regul_param'] == regul_param)][0]

# Cost function by training epoch:
model_costs = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'loss': outcomes['performance_metrics']['avg_epoch_costs']['loss'],
    'val_loss': outcomes['performance_metrics']['avg_epoch_costs']['val_loss']
})

epoch_performances = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'roc_auc': outcomes['performance_metrics']['avg_epoch_performance']['roc_auc'],
    'avg_prec_score': outcomes['performance_metrics']['avg_epoch_performance']['avg_prec_score']
})

In [40]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': False}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.loss, name='Training cost',
               hovertemplate =
                'loss = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.val_loss, name='Validation cost',
               hovertemplate = 'val_loss = %{y:.4f}',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=False,
)

# Changing layout:
fig.update_layout(
    title_text='Cost function by epoch of training - Regularization parameter = {0}'.format(regul_param),
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='cost', secondary_y=False)

fig.show()
fig.write_html("Plots/epoch_costs_" + regularization + "_regul_param_" + str(regul_param) + ".html")

In [45]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': True}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=epoch_performances.epoch,
               y=epoch_performances.roc_auc, name='Val ROC-AUC',
               hovertemplate =
                'ROC-AUC = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=epoch_performances.epoch,
               y=epoch_performances.avg_prec_score, name='Val avg precision',
               hovertemplate = 'Avg precision = %{y:.4f}<br>'+
                               'epoch = %{x}<br>',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=True,
)

# Changing layout:
fig.update_layout(
    title_text='Validation performance by epoch of training - Regularization parameter = {0}'.format(regul_param),
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='ROC-AUC', secondary_y=False)
fig.update_yaxes(title_text='average precision', secondary_y=True)

fig.show()
fig.write_html("Plots/epoch_roc_auc_" + regularization + "_regul_param_" + str(regul_param) + ".html")
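
The cost curves plotted above can also be summarized numerically. The following is a minimal sketch, assuming a dataframe with the same `epoch`, `loss` and `val_loss` columns as `model_costs`; the helper name `overfitting_summary` and the toy numbers are hypothetical and only illustrate the shape of the diagnosis.

```python
import pandas as pd

def overfitting_summary(costs):
    """Epoch with the lowest validation cost and train/validation gap at the last epoch."""
    best = costs.loc[costs['val_loss'].idxmin()]
    last = costs.iloc[-1]
    return {
        'best_epoch': int(best['epoch']),
        'min_val_loss': float(best['val_loss']),
        'final_gap': float(last['val_loss'] - last['loss']),
        'epochs_past_best': int(last['epoch'] - best['epoch']),
    }

# Toy curves: training cost keeps falling while validation cost turns upward after epoch 3.
toy_costs = pd.DataFrame({
    'epoch': [1, 2, 3, 4, 5],
    'loss': [0.50, 0.40, 0.33, 0.28, 0.24],
    'val_loss': [0.52, 0.45, 0.42, 0.44, 0.47],
})

summary = overfitting_summary(toy_costs)
print(summary)
```

A large positive `final_gap` together with many `epochs_past_best` suggests overfitting. Comparing these two numbers across values of the regularization parameter gives a compact complement to the plots.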

<a id='assessing_overfitting_l1'></a>

### Assessing overfitting through L1 regularization

#### Setting

In [83]:
# Number of estimations:
n_estimations = 1

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 100
es_param = None
batch_size = 512
regularization = 'l1'
input_dropout = 0

# Choose a value for the regularization parameter [0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10]:
regul_param = 10

# Defining the optimizer:
default_adam = True

#### Estimation

In [84]:
start_time = datetime.now()

estimation_id = str(int(time.time()))

nn_start_time = datetime.now()

# Lists to store results (training and validation costs are kept separately):
epoch_costs = []
min_cost = []
epoch_min_cost = []
min_val_cost = []
epoch_min_val_cost = []
val_roc_auc = []
val_avg_prec_score = []
val_brier_score = []
epoch_performance = []

# Loop over estimations:
for t in range(n_estimations):
    # Creating neural network object, declaring its architecture and defining hyper-parameters:
    model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                     output_activation = output_activation, cost_function = cost_function,
                     num_epochs = num_epochs, batch_size = batch_size,
                     default_adam = default_adam,
                     regularization = regularization, regul_param = regul_param, input_dropout = input_dropout)

    # Training the model:
    model.run(train_inputs = X_train, train_output = y_train,
              val_inputs = X_val, val_output = y_val,
              verbose = 0)

    # Performance metrics on validation data:
    val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
    val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
    val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))

    # Cost function by training epoch:
    model_costs = model.model_costs
    epoch_costs.append({'epoch': list(model_costs['epoch']),
                        'loss': list(model_costs['loss']),
                        'val_loss': list(model_costs['val_loss'])})

    # Minimum training and validation costs, and the epochs in which they occur:
    min_cost.append(model_costs.loss.min())
    epoch_min_cost.append(model_costs.loss.idxmin() + 1)
    min_val_cost.append(model_costs.val_loss.min())
    epoch_min_val_cost.append(model_costs.val_loss.idxmin() + 1)

    # Running time and performance metrics on validation data by epoch of training:
    epoch_performance.append(model.epoch_performance)

# Assessing running time:
nn_end_time = datetime.now()

# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'batch_size': batch_size,
        'es_param': es_param,
        'regularization': regularization,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam
    },
    'n_estimations': n_estimations,
    'performance_metrics': {
        'application': 'validation',
        'epoch_costs': epoch_costs,
        'avg_epoch_costs': {
            'epoch': [sum(l)/len(l) for l in zip(*[d['epoch'] for d in epoch_costs])],
            'loss': [sum(l)/len(l) for l in zip(*[d['loss'] for d in epoch_costs])],
            'val_loss': [sum(l)/len(l) for l in zip(*[d['val_loss'] for d in epoch_costs])]
        },
        'avg_min_cost': np.nanmean(min_cost),
        'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
        'avg_min_val_cost': np.nanmean(min_val_cost),
        'avg_epoch_min_val_cost': np.nanmean(epoch_min_val_cost),
        'avg_roc_auc': np.nanmean(val_roc_auc),
        'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
        'avg_brier_score': np.nanmean(val_brier_score),
        'std_roc_auc': np.nanstd(val_roc_auc),
        'std_avg_prec_score': np.nanstd(val_avg_prec_score),
        'std_brier_score': np.nanstd(val_brier_score),
        'avg_epoch_performance': {
            'roc_auc': [sum(l)/len(l) for l in zip(*[d['epochroc_auc'] for d in epoch_performance])],
            'avg_prec_score': [sum(l)/len(l) for l in zip(*[d['epochavg_prec_score'] for d in
                                                            epoch_performance])],
            'brier_score': [sum(l)/len(l) for l in zip(*[d['epochbrier_score'] for d in
                                                         epoch_performance])],
            'running_time': [sum(l)/len(l) for l in zip(*[d['running_time'] for d in epoch_performance])]
        }
    },
    'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
    "comment": '{0}. Assessing overfitting by trying different values for the L1 regularization parameter.'.format(model_architecture_def)
}

if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

------------------------------------
Overall running time: 1.58 minutes.
Start time: 2021-01-23, 13:58:01
End time: 2021-01-23, 13:59:37




#### Assessing results

In [85]:
# First stored estimation matching the comment and the chosen regularization parameter:
outcomes = [model_assessment[e] for e in model_assessment.keys() if
            ('Assessing overfitting by trying different values for the L1 regularization parameter.' in
            model_assessment[e]['comment']) and
            (model_assessment[e]['hyper_parameters']['regul_param'] == regul_param)][0]

# Cost function by training epoch:
model_costs = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'loss': outcomes['performance_metrics']['avg_epoch_costs']['loss'],
    'val_loss': outcomes['performance_metrics']['avg_epoch_costs']['val_loss']
})

epoch_performances = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'roc_auc': outcomes['performance_metrics']['avg_epoch_performance']['roc_auc'],
    'avg_prec_score': outcomes['performance_metrics']['avg_epoch_performance']['avg_prec_score']
})

In [86]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': False}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.loss, name='Training cost',
               hovertemplate =
                'loss = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.val_loss, name='Validation cost',
               hovertemplate = 'val_loss = %{y:.4f}',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=False,
)

# Changing layout:
fig.update_layout(
    title_text='Cost function by epoch of training - L1 Regularization parameter = {0}'.format(regul_param),
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='cost', secondary_y=False)

fig.show()
fig.write_html("Plots/epoch_costs_" + regularization + "_regul_param_" + str(regul_param) + ".html")

In [87]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': True}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=epoch_performances.epoch,
               y=epoch_performances.roc_auc, name='Val ROC-AUC',
               hovertemplate =
                'ROC-AUC = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=epoch_performances.epoch,
               y=epoch_performances.avg_prec_score, name='Val avg precision',
               hovertemplate = 'Avg precision = %{y:.4f}<br>'+
                               'epoch = %{x}<br>',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=True,
)

# Changing layout:
fig.update_layout(
    title_text='Validation performance by epoch of training - L1 Regularization parameter = {0}'.format(regul_param),
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='ROC-AUC', secondary_y=False)
fig.update_yaxes(title_text='average precision', secondary_y=True)

fig.show()
fig.write_html("Plots/epoch_roc_auc_" + regularization + "_regul_param_" + str(regul_param) + ".html")

<a id='grid_l2_regul_params'></a>

### Grid of L2 regularization parameters

#### Setting

In [48]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of values

In [46]:
regul_params = sorted([1/(10**i) for i in range(9)])
regul_params.append(10)
regul_params

[1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10]

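
As an illustration of what the grid above controls, the sketch below evaluates the penalty term that regularization adds to the cost for each $\lambda$ in the grid. It assumes the common convention $\lambda \sum |w|$ for L1 and $\lambda \sum w^2$ for L2 (no $1/2$ factor). How `KerasNN` passes `regul_param` on to Keras is not shown in this section, and the weight values and helper names are made up for the example.

```python
# Hypothetical weights, used only to illustrate the scale of the penalty term.
weights = [0.5, -1.2, 0.03, 0.8]

# Same grid as in the cell above: 1e-08, ..., 1.0, then 10.
regul_params = sorted([1/(10**i) for i in range(9)]) + [10]


def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights (no 1/2 factor assumed)."""
    return lam * sum(w**2 for w in weights)


for lam in regul_params:
    print('lambda = {0:g}: L1 = {1:.3e}, L2 = {2:.3e}'.format(
        lam, l1_penalty(weights, lam), l2_penalty(weights, lam)))
```

Because the grid is spaced by powers of ten, each step multiplies the penalty by ten for a fixed set of weights, which is why a coarse grid of this kind is usually enough to locate the relevant order of magnitude.
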
#### Estimation loop

In [49]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(regul_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over regularization parameters:
for r in range(len(regul_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regularization = regularization, regul_param = regul_params[r],
                         input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Validation cost by training epoch ('application' below is 'validation'):
        model_costs = model.model_costs
        
        # Minimum validation cost and the epoch where it occurs:
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_params[r],
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative values for the L2 regularization parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(r+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 23.82 minutes.
Start time: 2021-01-25, 10:32:50
End time: 2021-01-25, 10:56:40




<a id='grid_l1_regul_params'></a>

### Grid of L1 regularization parameters

#### Setting

In [50]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l1'
input_dropout = 0

# Defining the optimizer:
default_adam = True

#### Grid of values

In [38]:
regul_params = sorted([1/(10**i) for i in range(9)])
regul_params.append(10)
regul_params

[1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10]

#### Estimation loop

In [53]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(regul_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over regularization parameters:
for r in range(len(regul_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regularization = regularization, regul_param = regul_params[r],
                         input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Validation cost by training epoch ('application' below is 'validation'):
        model_costs = model.model_costs
        
        # Minimum validation cost and the epoch where it occurs:
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_params[r],
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative values for the L1 regularization parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(r+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 30.38 minutes.
Start time: 2021-01-25, 10:59:52
End time: 2021-01-25, 11:30:16




#### Assessing results

In [54]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
regul_option = []
regul_params = []
activations = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over estimations from the L1 and L2 regularization grids:
for estimation_id, e in model_assessment.items():
    if not (('Testing alternative values for the L1 regularization parameter.' in e['comment']) or
            ('Testing alternative values for the L2 regularization parameter.' in e['comment'])):
        continue
    estimation_ids.append(estimation_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    regul_option.append(e['hyper_parameters']['regularization'])
    regul_params.append(e['hyper_parameters']['regul_param'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by regularization parameter:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'regularization': regul_option,
    'regul_params': regul_params,
    'activation_function': activations,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | regularization | regul_params | activation_function | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1611445622 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l1 | 1e-08 | tanh | 0.842478 | 0.010684 | 0.219281 | 0.012745 | 78.852707 | 17.205115 | 27.73 |
| 20 | 1611581570 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 0.0 | tanh | 0.841854 | 0.011076 | 0.218019 | 0.010311 | 76.004348 | 21.144747 | 23.82 |
| 3 | 1611425007 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 1e-05 | tanh | 0.84173 | 0.010619 | 0.220132 | 0.011486 | 79.264487 | 19.165792 | 28.42 |
| 12 | 1611449073 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l1 | 1e-06 | tanh | 0.841708 | 0.01121 | 0.218036 | 0.010125 | 75.088199 | 21.53474 | 33.6 |
| 0 | 1611421300 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 1e-08 | tanh | 0.840573 | 0.010802 | 0.217344 | 0.01222 | 77.814185 | 17.786486 | 17.98 |
| 4 | 1611426713 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 0.0001 | tanh | 0.840472 | 0.011759 | 0.226658 | 0.011092 | 71.472031 | 20.434551 | 31.82 |
| 2 | 1611423572 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 1e-06 | tanh | 0.839394 | 0.011732 | 0.217219 | 0.011849 | 71.548231 | 18.332164 | 23.92 |
| 21 | 1611583192 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l1 | 0.0 | tanh | 0.839098 | 0.012917 | 0.217027 | 0.010187 | 64.962496 | 21.303333 | 30.38 |
| 5 | 1611428623 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 0.001 | tanh | 0.839048 | 0.009865 | 0.245409 | 0.01063 | 85.052495 | 23.086329 | 42.0 |
| 1 | 1611422380 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | l2 | 1e-07 | tanh | 0.838922 | 0.014618 | 0.21793 | 0.010546 | 57.391121 | 20.664231 | 19.85 |

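
When reading the table above, it may help to compare the gap between two average scores with the variability across estimations. The sketch below divides the difference in `avg_roc_auc` between two rows by the standard error of that difference. It assumes that the estimations of each configuration are independent and that both rows rely on `n_estimations = 100`; the helper name `z_score_of_difference` is hypothetical and is not part of the notebook.

```python
import math


def z_score_of_difference(avg_a, std_a, avg_b, std_b, n):
    """Difference in averages divided by the standard error of that difference.

    Assumes n independent estimations per configuration.
    """
    std_error = math.sqrt(std_a**2 / n + std_b**2 / n)
    return (avg_a - avg_b) / std_error


# Values copied from the table above as (avg_roc_auc, std_roc_auc):
l1_1e8 = (0.842478, 0.010684)   # l1, regul_param = 1e-08
l2_1e5 = (0.841730, 0.010619)   # l2, regul_param = 1e-05

z = z_score_of_difference(*l1_1e8, *l2_1e5, n=100)
print('z = {0:.2f}'.format(z))
```

A ratio well below 2 in absolute value would suggest that the two configurations are hard to tell apart by average ROC-AUC alone, so secondary criteria such as average precision or the standard deviation can reasonably guide the choice.
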

<a id='dropout'></a>

## Dropout

This section explores an additional approach to reduce overfitting and improve generalization. **Dropout layers** reduce the network's reliance on any individual neuron by setting to zero the activations of a randomly selected fraction $\rho$ of the neurons in a layer at each mini-batch update. Since these activations equal zero, the parameters attached to the dropped neurons receive no gradient from the prediction error of that mini-batch.
<br>
<br>
Dropout can be applied to both input and hidden layers, and the same grid of values $\rho \in \{0, 0.1, 0.2, 0.3, 0.4, 0.5\}$ will be tested separately for each kind of dropout. First, tests will be conducted for the input layer; then, the best dropout fraction for hidden layers will be assessed.
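
To make the mechanism concrete, the following is a minimal NumPy sketch of dropout applied to the activations of one layer. It is not the code used by `KerasNN`. It assumes the "inverted" convention (the one used by the Keras `Dropout` layer), in which the kept activations are rescaled by $1/(1-\rho)$ during training; the function name `dropout_forward` and the batch of ones are hypothetical.

```python
import numpy as np


def dropout_forward(activations, rho, rng):
    """Apply inverted dropout to a batch of activations.

    Each unit is kept with probability 1 - rho; kept units are rescaled by
    1 / (1 - rho) so that the expected activation is unchanged.
    """
    if not 0 <= rho < 1:
        raise ValueError('rho must be in [0, 1)')
    if rho == 0:
        return activations.copy(), np.ones_like(activations, dtype=bool)
    keep_mask = rng.random(activations.shape) >= rho
    return activations * keep_mask / (1 - rho), keep_mask


rng = np.random.default_rng(0)
batch = np.ones((512, 45))  # hypothetical mini-batch: S = 512 rows, 45 units

for rho in [0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    dropped, keep_mask = dropout_forward(batch, rho, rng)
    print('rho = {0}: share of zeroed activations = {1:.3f}, mean activation = {2:.3f}'.format(
        rho, 1 - keep_mask.mean(), dropped.mean()))
```

A new mask is drawn on every call, which mirrors the fact that a different random subset of neurons is dropped at each mini-batch update.
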
<br>
<br>
All estimations will follow the best alternatives derived from tests above:
* Random samples of training and validation data.
* *Following the results of tests from an earlier section, the cross-entropy cost function will be used. Regarding activation functions, sigmoid will be applied to the neuron in the output layer, while the tanh activation function will be used for neurons in hidden layers, since this alternative has shown the best results during tests.*
* Fitting hyper-parameters: *after previous tests, mini-batch size is set to $S = 512$, while the number of epochs is kept as low as possible to simplify estimations, $T = 10$.*
* *After tests for the number of neurons and the number of hidden layers, the architecture is given by two hidden layers with the following numbers of neurons:*
    * $J_1 = \sqrt{num\_inputs*num\_outputs}$.
    * $J_2 = J_1/2$.
<br>
<br>
* Adam optimizer for model estimation (non-fixed and parameter-specific learning rates).
* *The section on regularization has shown that L2 regularization with $\lambda = 10^{-5}$ is the best alternative for improving generalization with standard regularization techniques.*
* No early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: a collection of 100 estimations will be implemented, so performance metrics can be assessed in terms of average and standard deviation.

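
The averaging step in the last bullet can be sketched as follows: each configuration is estimated repeatedly, and the resulting validation scores are summarized by their average, their standard deviation, and the ratio between the two. This is a standard-library illustration only; the helper name `summarize` and the score values are hypothetical, and the population standard deviation is used to match the default behavior of `np.nanstd`.

```python
import math
import statistics


def summarize(scores):
    """Average, standard deviation and their ratio, ignoring NaN entries."""
    valid = [s for s in scores if not math.isnan(s)]
    avg = statistics.fmean(valid)
    std = statistics.pstdev(valid)  # population std, like np.nanstd with ddof=0
    ratio = avg / std if std > 0 else float('inf')
    return {'avg': avg, 'std': std, 'ratio': ratio}


# Hypothetical validation ROC-AUC values from five repeated estimations:
val_roc_auc = [0.84, 0.85, 0.83, float('nan'), 0.84]
print(summarize(val_roc_auc))
```

A higher ratio indicates a score that is both high on average and stable across estimations, which is how the `ratio_roc_auc` and `ratio_prec` columns in the results tables can be read.
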
In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='input_dropout'></a>

### Grid of values for input dropout

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5

# Defining the optimizer:
default_adam = True

#### Grid of values

In [38]:
dropout_params = [0, 0.1, 0.2, 0.3, 0.4, 0.5]
dropout_params

[0, 0.1, 0.2, 0.3, 0.4, 0.5]

#### Estimation loop

In [43]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(dropout_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over dropout parameters:
for d in range(len(dropout_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = dropout_params[d])

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Validation cost by training epoch ('application' below is 'validation'):
        model_costs = model.model_costs
        
        # Minimum validation cost and the epoch where it occurs:
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': dropout_params[d],
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative values for input dropout parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(d+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 23.78 minutes.
Start time: 2021-01-27, 08:56:01
End time: 2021-01-27, 09:19:48




#### Assessing results

In [7]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
dropout_params = []
activations = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over estimations from the input dropout grid:
for estimation_id, e in model_assessment.items():
    if 'Testing alternative values for input dropout parameter.' not in e['comment']:
        continue
    estimation_ids.append(estimation_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    dropout_params.append(e['hyper_parameters']['input_dropout'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by dropout parameter:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'input_dropout': dropout_params,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | input_dropout | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1611710233 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | 0.0 | 0.839178 | 0.012864 | 0.217772 | 0.010811 | 65.235886 | 20.143439 | 29.57 |
| 1 | 1611712007 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | 0.1 | 0.837265 | 0.010962 | 0.216944 | 0.010359 | 76.381586 | 20.942726 | 22.12 |
| 2 | 1611713335 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | 0.2 | 0.832779 | 0.012708 | 0.214787 | 0.009934 | 65.532764 | 21.622277 | 22.03 |
| 3 | 1611714657 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | 0.3 | 0.828563 | 0.010786 | 0.208723 | 0.009775 | 76.815692 | 21.352891 | 24.8 |
| 4 | 1611716146 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | 0.4 | 0.822564 | 0.012071 | 0.201364 | 0.00926 | 68.142388 | 21.746335 | 33.42 |
| 5 | 1611748561 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | 0.5 | 0.814562 | 0.014758 | 0.190004 | 0.007745 | 55.195296 | 24.532837 | 23.77 |


<a id='hidden_dropout'></a>

### Grid of values for dropout of hidden neurons

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1

# Defining the optimizer:
default_adam = True

#### Grid of values

In [41]:
dropout_params = [0, 0.1, 0.2, 0.3, 0.4, 0.5]
model_architectures = [{1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                            'activation': 'tanh',
                            'dropout_param': d},
                        2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                            'activation': 'tanh',
                            'dropout_param': d}} for d in dropout_params]

model_architectures

[{1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0}},
 {1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0.1},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0.1}},
 {1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0.2},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0.2}},
 {1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0.3},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0.3}},
 {1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0.4},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0.4}},
 {1: {'neurons': 45, 'activation': 'tanh', 'dropout_param': 0.5},
  2: {'neurons': 22, 'activation': 'tanh', 'dropout_param': 0.5}}]

#### Estimation loop

In [43]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(model_architectures),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over dropout parameters:
for d in range(len(model_architectures)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architectures[d], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architectures[d]),
            'num_hidden_neurons': [model_architectures[d][l]['neurons'] for l in model_architectures[d].keys()],
            'hidden_activations': [model_architectures[d][l]['activation'] for l in model_architectures[d].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architectures[d][l]['dropout_param'] for l in model_architectures[d].keys()],
            'default_adam': default_adam
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative values for hidden dropout parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(d+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 114.18 minutes.
Start time: 2021-01-27, 17:50:18
End time: 2021-01-27, 19:44:30




#### Assessing results

In [8]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
dropout_params = []
activations = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over estimations that tested hidden dropout parameters:
for es_id, e in [(k, v) for k, v in model_assessment.items() if
                 ('Testing alternative values for hidden dropout parameter.' in v['comment'])]:
    estimation_ids.append(es_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    dropout_params.append(e['hyper_parameters']['hidden_dropout'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by dropout parameter:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'hidden_dropout': dropout_params,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | hidden_dropout | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1611758452 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | [0, 0] | 0.836742 | 0.010091 | 0.216486 | 0.011702 | 82.91932 | 18.49974 | 20.03 |
| 1 | 1611780618 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | [0.1, 0.1] | 0.835017 | 0.011447 | 0.211302 | 0.010447 | 72.944082 | 20.226319 | 22.2 |
| 2 | 1611781951 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | [0.2, 0.2] | 0.830879 | 0.011435 | 0.204076 | 0.010755 | 72.663473 | 18.974657 | 19.65 |
| 3 | 1611783130 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | [0.3, 0.3] | 0.828517 | 0.012693 | 0.195914 | 0.010254 | 65.273031 | 19.105422 | 18.85 |
| 4 | 1611784262 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | [0.4, 0.4] | 0.824138 | 0.010985 | 0.183365 | 0.011847 | 75.025663 | 15.477594 | 22.97 |
| 5 | 1611785641 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | [0.5, 0.5] | 0.817505 | 0.013682 | 0.167977 | 0.011663 | 59.749227 | 14.402298 | 30.47 |


<a id='learning_rate'></a>

## Learning rate

So far, the learning rate has been handled by the Adam optimizer, which computes parameter-specific learning rates dynamically. All Adam hyper-parameters have been kept at their Keras default values: $\eta = 0.001$, $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Here $\eta$ is the initial learning rate, while $\beta_1$ and $\beta_2$ are the exponential decay rates of the moving averages of the gradient and of the squared gradient, respectively, from which the effective step size of each parameter is derived. Values of $\beta_1$ and $\beta_2$ closer to 1 give these averages a longer memory and hence smoother updates, whereas lower values make each update react more strongly to the most recent gradients. For the SGD optimizer, in turn, a learning rate decay causes faster learning during the initial updates, progressively slowing down as iterations take place. Note that no decay (and no momentum) results in standard stochastic gradient descent (SGD).
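
As an illustration of how $\eta$, $\beta_1$ and $\beta_2$ enter the update, the following is a minimal NumPy sketch of a single Adam step as it is usually described in the literature. The function name `adam_step` and the toy objective are assumptions made for this example only; the Keras implementation may differ in small details, such as where $\epsilon$ is applied.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Adam update of parameters `theta` given gradient `grad` at step t >= 1."""
    # Exponential moving averages of the gradient and of the squared gradient:
    m = beta_1 * m + (1 - beta_1) * grad
    v = beta_2 * v + (1 - beta_2) * grad ** 2
    # Bias correction (both averages start at zero):
    m_hat = m / (1 - beta_1 ** t)
    v_hat = v / (1 - beta_2 ** t)
    # Parameter-specific step size:
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v

# Toy usage (illustrative only): a few steps on f(theta) = theta**2, gradient 2*theta.
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

In this sketch the first update moves each parameter by roughly $\eta$, regardless of the gradient scale, because the bias-corrected ratio of the two averages is close to one in absolute value.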
<br>
<br>
Consequently, two tests should be implemented in order to fine-tune the learning process: first, alternative values for Adam hyper-parameters will be used; then, tests will focus on different settings for the SGD optimizer. The comparison between the Adam and SGD optimizers will result from these two tests.
<br>
<br>
All estimations will follow the best alternatives derived from tests above:
* Random samples of training and validation data.
* *Following results of tests from an earlier section, cross-entropy cost function will be used. Regarding activation functions, sigmoid will be applied for the neuron in the output layer, while tanh activation function will be used for neurons in hidden layers, since this alternative has shown the best results during tests.*
* Fitting hyper-parameters: *after previous tests, mini-batch size is set to $S = 512$, while number of epochs is still kept as low as possible to simplify estimations, $T = 10$.*
* *After tests for number of neurons and number of hidden layers, the architecture will be given by two hidden layers with the following number of neurons:*
    * $J_1 = \sqrt{num\_inputs*num\_outputs}$.
    * $J_2 = J_1/2$.
<br>
<br>
* *The section concerning regularization has shown that L2 regularization with $\lambda = 10^{-5}$ is the best alternative to improve generalization by using standard regularization techniques.*
* *Previous tests have pointed to the adequacy of using dropout layers with parameters $\rho_{input} = 0.1$ and $\rho_{hidden} = 0.1$ for input and hidden layers, respectively.*
* No early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data. Defining the best SGD setting will also make use of the cost function evaluated on validation data.
* Averaging: a collection of 100 estimations will be implemented, so performance metrics can be assessed in terms of average and standard deviation.
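
The averaging step in the last item can be sketched as follows. The helper name `summarize_runs` and the scores are made up for illustration only; the ratio of average to standard deviation mirrors the `ratio_roc_auc` and `ratio_prec` columns computed in the result tables of this notebook.

```python
import numpy as np

def summarize_runs(scores):
    """Average, standard deviation and their ratio for one metric over repeated estimations."""
    scores = np.asarray(scores, dtype=float)
    avg = np.nanmean(scores)
    std = np.nanstd(scores)  # population standard deviation (ddof=0), ignoring NaNs
    ratio = avg / std if std > 0 else np.nan
    return {'avg': avg, 'std': std, 'ratio': ratio}

# Made-up validation scores from three hypothetical estimations:
example_scores = [0.80, 0.82, 0.84]
summary = summarize_runs(example_scores)
```

A higher ratio indicates a metric that is both high on average and stable across estimations.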

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='adam_params'></a>

### Testing Adam hyper-parameters

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1

# Defining the optimizer:
default_adam = False
optimizer = 'adam'

#### Grid of values

In [42]:
opt_params = {
    'learning_rate': [0.001, 0.01, 0.0001],
    'beta_1': [0.9, 0.75, 0.99],
    'beta_2': [0.999, 0.9, 0.9999],
    'epsilon': [1e-07]
}

# List with all permutations of possible values for hyper-parameters:
opt_params = permutation(opt_params)
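
The `permutation` helper is not shown in this section. A minimal sketch of what such a helper could look like, assuming it returns one dictionary per combination of values, is given below; the implementation and the ordering of the combinations are assumptions and may differ from the function actually used in this notebook.

```python
from itertools import product

def permutation(grid):
    """Hypothetical stand-in: list with one dict for each combination of values in `grid`."""
    keys = list(grid.keys())
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

adam_grid = {
    'learning_rate': [0.001, 0.01, 0.0001],
    'beta_1': [0.9, 0.75, 0.99],
    'beta_2': [0.999, 0.9, 0.9999],
    'epsilon': [1e-07]
}

combinations = permutation(adam_grid)
```

With three values for each of the three Adam hyper-parameters and a single value for epsilon, the grid contains 27 combinations, each of which is estimated `n_estimations` times.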

#### Estimation loop

In [42]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(opt_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over optimizer parameters:
for o in range(len(opt_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are kept separately):
    min_cost = []
    epoch_min_cost = []
    min_val_cost = []
    epoch_min_val_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params[o],
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_preds = [p[0] for p in model.predictions]
        val_roc_auc.append(roc_auc_score(y_val, val_preds))
        val_avg_prec_score.append(average_precision_score(y_val, val_preds))
        val_brier_score.append(brier_score_loss(y_val, val_preds))

        # Cost function by training epoch:
        model_costs = model.model_costs

        # Minimum training cost and the epoch in which it occurs:
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)

        # Minimum validation cost and the epoch in which it occurs:
        min_val_cost.append(model_costs.val_loss.min())
        epoch_min_val_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params[o]
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_min_val_cost': np.nanmean(min_val_cost),
            'avg_epoch_min_val_cost': np.nanmean(epoch_min_val_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative values for Adam hyper-parameters.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(o+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 135.88 minutes.
Start time: 2021-01-31, 13:31:52
End time: 2021-01-31, 15:47:45




#### Assessing results

In [11]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
optimizers = []
learning_rates = []
beta1s = []
beta2s = []
activations = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over estimations that tested Adam hyper-parameters:
for es_id, e in [(k, v) for k, v in model_assessment.items() if
                 ('Testing alternative values for Adam hyper-parameters.' in v['comment'])]:
    estimation_ids.append(es_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by Adam parameters:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | beta_1 | beta_2 | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 1612103127 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | 0.841918 | 0.009224 | 0.220372 | 0.010299 | 91.27895 | 21.397904 | 33.63 |
| 1 | 1612035335 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.999 | 0.840332 | 0.008792 | 0.216347 | 0.010226 | 95.575174 | 21.157383 | 18.32 |
| 18 | 1612101357 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.9 | 0.9999 | 0.834922 | 0.01101 | 0.212843 | 0.012475 | 75.834169 | 17.062192 | 29.48 |
| 0 | 1612034069 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.9 | 0.999 | 0.833715 | 0.012357 | 0.211345 | 0.01098 | 67.468275 | 19.248567 | 21.08 |
| 10 | 1612061623 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9 | 0.831532 | 0.008319 | 0.21304 | 0.009796 | 99.957116 | 21.747802 | 20.62 |
| 9 | 1612060550 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.9 | 0.9 | 0.822889 | 0.013187 | 0.205331 | 0.010529 | 62.40277 | 19.501402 | 17.87 |
| 6 | 1612042453 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.0001 | 0.9 | 0.999 | 0.797746 | 0.013293 | 0.139704 | 0.012795 | 60.011605 | 10.918304 | 35.15 |
| 7 | 1612058405 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.0001 | 0.75 | 0.999 | 0.797398 | 0.010884 | 0.13882 | 0.012903 | 73.264003 | 10.75902 | 18.1 |
| 24 | 1612114011 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.0001 | 0.9 | 0.9999 | 0.796865 | 0.012053 | 0.137727 | 0.012769 | 66.114846 | 10.786355 | 20.02 |
| 25 | 1612115212 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.0001 | 0.75 | 0.9999 | 0.796087 | 0.011199 | 0.1403 | 0.013054 | 71.083088 | 10.747258 | 29.05 |


<a id='sgd_opt1'></a>

### SGD optimizer (no momentum and no decay)

This analysis tries to identify an appropriate value for the learning rate $\eta_0$ such that learning occurs rapidly enough without too much oscillation.
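
The trade-off between speed and oscillation can be illustrated with a toy example. The sketch below applies plain gradient descent (no momentum, no decay) to $f(w) = w^2$; the function `sgd_path` and the learning rates used are assumptions chosen for illustration and are unrelated to the values tested in this notebook.

```python
def sgd_path(eta, w0=1.0, steps=5):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    path = [w0]
    for _ in range(steps):
        w = path[-1]
        path.append(w - eta * 2 * w)
    return path

slow = sgd_path(0.01)        # very small steps: stable but slow
smooth = sgd_path(0.1)       # approaches the minimum at w = 0 without changing sign
oscillating = sgd_path(0.9)  # overshoots the minimum at every step, yet still converges
diverging = sgd_path(1.1)    # overshoots by more than it gains: |w| grows
```

On this objective each update multiplies $w$ by $(1 - 2\eta)$, so the path oscillates once $\eta > 0.5$ and diverges once $\eta > 1$. For a neural network cost function no such closed form is available, which is why the learning rate is assessed empirically through the cost by epoch of training.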

#### Setting

In [104]:
# Number of estimations:
n_estimations = 1

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'relu',
                          'dropout_param': 0},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'relu',
                          'dropout_param': 0}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 100
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1
opt_params = {'learning_rate': 0.005, 'momentum': 0.0, 'decay': 0.0}

# Defining the optimizer:
default_adam = False
optimizer = 'sgd'

#### Estimation

In [105]:
start_time = datetime.now()

estimation_id = str(int(time.time()))

nn_start_time = datetime.now()

# Lists to store results (training and validation costs are kept separately):
epoch_costs = []
min_cost = []
epoch_min_cost = []
min_val_cost = []
epoch_min_val_cost = []
val_roc_auc = []
val_avg_prec_score = []
val_brier_score = []
epoch_performance = []

# Loop over estimations:
for t in range(n_estimations):
    # Creating neural network object, declaring its architecture and defining hyper-parameters:
    model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                     output_activation = output_activation, cost_function = cost_function,
                     num_epochs = num_epochs, batch_size = batch_size,
                     default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                     regularization = regularization, regul_param = regul_param,
                     input_dropout = input_dropout)

    # Training the model:
    model.run(train_inputs = X_train, train_output = y_train,
              val_inputs = X_val, val_output = y_val,
              verbose = 0)

    # Performance metrics on validation data:
    val_preds = [p[0] for p in model.predictions]
    val_roc_auc.append(roc_auc_score(y_val, val_preds))
    val_avg_prec_score.append(average_precision_score(y_val, val_preds))
    val_brier_score.append(brier_score_loss(y_val, val_preds))

    # Cost function by training epoch:
    model_costs = model.model_costs
    epoch_costs.append({'epoch': list(model_costs['epoch']),
                        'loss': list(model_costs['loss']),
                        'val_loss': list(model_costs['val_loss'])})

    # Minimum training cost and the epoch in which it occurs:
    min_cost.append(model_costs.loss.min())
    epoch_min_cost.append(model_costs.loss.idxmin() + 1)

    # Minimum validation cost and the epoch in which it occurs:
    min_val_cost.append(model_costs.val_loss.min())
    epoch_min_val_cost.append(model_costs.val_loss.idxmin() + 1)

    # Running time and performance metrics on validation data by epoch of training:
    epoch_performance.append(model.epoch_performance)

# Assessing running time:
nn_end_time = datetime.now()

# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'batch_size': batch_size,
        'es_param': es_param,
        'regularization': regularization,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam,
        'optimizer': optimizer,
        'opt_params': opt_params
    },
    'n_estimations': n_estimations,
    'performance_metrics': {
        'application': 'validation',
        'avg_epoch_costs': {
            'epoch': [sum(l)/len(l) for l in zip(*[d['epoch'] for d in epoch_costs])],
            'loss': [sum(l)/len(l) for l in zip(*[d['loss'] for d in epoch_costs])],
            'val_loss': [sum(l)/len(l) for l in zip(*[d['val_loss'] for d in epoch_costs])]
        },
        'avg_min_cost': np.nanmean(min_cost),
        'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
        'avg_min_val_cost': np.nanmean(min_val_cost),
        'avg_epoch_min_val_cost': np.nanmean(epoch_min_val_cost),
        'avg_roc_auc': np.nanmean(val_roc_auc),
        'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
        'avg_brier_score': np.nanmean(val_brier_score),
        'std_roc_auc': np.nanstd(val_roc_auc),
        'std_avg_prec_score': np.nanstd(val_avg_prec_score),
        'std_brier_score': np.nanstd(val_brier_score),
        'avg_epoch_performance': {
            'roc_auc': [sum(l)/len(l) for l in zip(*[d['epoch_val_roc_auc'] for d in epoch_performance])],
            'avg_prec_score': [sum(l)/len(l) for l in zip(*[d['epoch_avg_prec_score'] for d in
                                                            epoch_performance])],
            'brier_score': [sum(l)/len(l) for l in zip(*[d['epoch_brier_score'] for d in
                                                         epoch_performance])],
            'running_time': [sum(l)/len(l) for l in zip(*[d['running_time'] for d in epoch_performance])]
        }
    },
    'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
    "comment": '{0}. Defining learning rate with no momentum and no decay.'.format(model_architecture_def)
}

if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

------------------------------------
Overall running time: 2.03 minutes.
Start time: 2021-02-01, 12:33:04
End time: 2021-02-01, 12:35:07




#### Assessing results

In [12]:
# Learning rate by estimation id for the tests with no momentum and no decay:
lr_comment = 'Defining learning rate with no momentum and no decay.'
lrs = {es_id: e['hyper_parameters']['opt_params']['learning_rate']
       for es_id, e in model_assessment.items() if lr_comment in e['comment']}
lrs

{'1612134743': 0.01,
 '1612135513': 0.1,
 '1612135652': 1,
 '1612136651': 0.001,
 '1612139237': 0.0001,
 '1612193584': 0.005}

In [13]:
# Choose an estimation_id:
es_id = '1612193584'

outcomes = model_assessment[es_id]

# Cost function by training epoch:
model_costs = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'loss': outcomes['performance_metrics']['avg_epoch_costs']['loss'],
    'val_loss': outcomes['performance_metrics']['avg_epoch_costs']['val_loss']
})

epoch_performances = pd.DataFrame(data = {
    'epoch': outcomes['performance_metrics']['avg_epoch_costs']['epoch'],
    'roc_auc': outcomes['performance_metrics']['avg_epoch_performance']['roc_auc'],
    'avg_prec_score': outcomes['performance_metrics']['avg_epoch_performance']['avg_prec_score']
})

In [14]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': False}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.loss, name='Training cost',
               hovertemplate =
                'loss = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=model_costs.epoch,
               y=model_costs.val_loss, name='Validation cost',
               hovertemplate = 'val_loss = %{y:.4f}',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=False,
)

# Changing layout:
fig.update_layout(
    title_text='Cost function by epoch of training - Learning rate = {0}'.format(lrs[es_id]),
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='cost', secondary_y=False)

fig.show()

<a id='sgd_opt2'></a>

### SGD optimizer

#### Setting

In [45]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1

# Defining the optimizer:
default_adam = False
optimizer = 'sgd'

#### Grid of values

In [47]:
opt_params = {
    'learning_rate': [0.01, 0.001, 0.005],
    'momentum': [0, 0.1, 0.9],
    'decay': [0, 0.1, 0.9],
}

# List with all permutations of possible values for hyper-parameters:
opt_params = permutation(opt_params)
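
To make the roles of `momentum` and `decay` in this grid concrete, the sketch below shows how they conventionally enter an SGD update. It assumes that `KerasNN` forwards these values to the Keras SGD optimizer and that `decay` follows the time-based schedule of older Keras versions, in which the learning rate at a given iteration equals the initial rate divided by (1 + decay * iteration). Neither assumption is confirmed in this section, and the function names are hypothetical.

```python
def decayed_lr(lr0, decay, iteration):
    """Time-based decay (assumed schedule): lr0 / (1 + decay * iteration)."""
    return lr0 / (1.0 + decay * iteration)

def sgd_momentum_step(w, grad, velocity, lr, momentum):
    """SGD update with (non-Nesterov) momentum."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy usage (illustrative only): three updates on f(w) = w**2, gradient 2*w.
w, velocity = 1.0, 0.0
for iteration in range(3):
    lr = decayed_lr(lr0=0.01, decay=0.1, iteration=iteration)
    w, velocity = sgd_momentum_step(w, 2 * w, velocity, lr, momentum=0.9)
```

Under this assumed schedule, a decay of 0.9 reduces the learning rate to 1/91 of its initial value after only 100 mini-batch updates, so large decay values in the grid would shrink the steps very early in training.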

#### Estimation loop

In [50]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(opt_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over optimizer parameters:
for o in range(len(opt_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params[o],
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
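        # Minimum validation cost and the epoch at which it occurs. Only validation
        # values are stored, so the averages match 'application': 'validation' and
        # training and validation costs are not mixed in the same list:
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)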

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params[o]
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        # total_seconds() is used because .seconds discards whole days of elapsed time:
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative values for SGD hyper-parameters.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(o+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 19.08 minutes.
Start time: 2021-02-03, 22:51:32
End time: 2021-02-03, 23:10:38




#### Assessing results

In [10]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
optimizers = []
learning_rates = []
momentums = []
decays = []
activations = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over optimizer parameters:
for est_id, e in model_assessment.items():
    # Keep only the estimations produced by the SGD hyper-parameter tests:
    if 'Testing alternative values for SGD hyper-parameters.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    momentums.append(e['hyper_parameters']['opt_params']['momentum'])
    decays.append(e['hyper_parameters']['opt_params']['decay'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by SGD parameters:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'momentum': momentums,
    'decay': decays,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | momentum | decay | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1612198223 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.01 | 0.9 | 0.0 | 0.816414 | 0.011014 | 0.192285 | 0.016794 | 74.126492 | 11.449813 | 28.55 |
| 8 | 1612233471 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.005 | 0.9 | 0.0 | 0.814363 | 0.012583 | 0.172694 | 0.018062 | 64.72075 | 9.561348 | 28.97 |
| 1 | 1612196986 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.01 | 0.1 | 0.0 | 0.795981 | 0.014481 | 0.144865 | 0.019058 | 54.966973 | 7.601138 | 20.6 |
| 0 | 1612195537 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.01 | 0.0 | 0.0 | 0.791969 | 0.015075 | 0.139983 | 0.020774 | 52.535652 | 6.738231 | 24.13 |
| 5 | 1612229152 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.001 | 0.9 | 0.0 | 0.774697 | 0.020814 | 0.12865 | 0.015749 | 37.220781 | 8.168975 | 27.78 |
| 7 | 1612232174 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.005 | 0.1 | 0.0 | 0.771879 | 0.02362 | 0.116165 | 0.018876 | 32.678861 | 6.154082 | 21.62 |
| 6 | 1612230820 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.005 | 0.0 | 0.0 | 0.768147 | 0.019225 | 0.110459 | 0.019086 | 39.956433 | 5.787545 | 22.55 |
| 11 | 1612299796 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.01 | 0.9 | 0.1 | 0.717245 | 0.027667 | 0.073812 | 0.018261 | 25.924611 | 4.041969 | 20.72 |
| 4 | 1612201860 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.001 | 0.1 | 0.0 | 0.712283 | 0.021429 | 0.058935 | 0.016131 | 33.238699 | 3.653569 | 37.37 |
| 3 | 1612199938 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | sgd | 0.001 | 0.0 | 0.0 | 0.704824 | 0.023954 | 0.062284 | 0.019388 | 29.424035 | 3.212468 | 32.02 |
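
In the results above, the only configuration with a nonzero decay (learning rate 0.01, momentum 0.9, decay 0.1) reaches an average ROC-AUC of 0.717, against 0.816 for the same setting without decay. One possible explanation is sketched below, under the assumption that `KerasNN` forwards `decay` to the legacy Keras SGD optimizer, which applies a time-based decay of the learning rate at every mini-batch iteration. The function name `decayed_learning_rate` is illustrative only, and the `KerasNN` class itself is not shown in this section.

```python
def decayed_learning_rate(initial_lr, decay, iteration):
    """Time-based decay: lr_t = lr_0 / (1 + decay * t), with t counted in mini-batch iterations.

    Assumption: this mirrors the `decay` argument of the legacy Keras SGD optimizer.
    """
    return initial_lr / (1.0 + decay * iteration)


# Effective learning rate after 0, 10 and 100 iterations for each tested decay value:
for decay in (0.0, 0.1, 0.9):
    rates = [decayed_learning_rate(0.01, decay, it) for it in (0, 10, 100)]
    print(decay, [round(r, 6) for r in rates])
```

Under this rule, a decay of 0.1 already divides the learning rate by 11 after 100 iterations, so with only $T = 10$ epochs most of the training would run at a very small step size.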


<a id='parameters_init'></a>

## Parameters initialization

The default distribution for weights initialization is the **Glorot Uniform** specification. Under this setting, initial values for the weights $w_{jk}^l$ are drawn from a Uniform distribution $Unif(-a, a)$, with:
<br>
<br>
\begin{equation}
    \displaystyle a = \sqrt{\frac{6}{fan\_in + fan\_out}}
\end{equation}
<br>
<br>
Where $fan\_in$ is the number of input units and $fan\_out$ is the number of output units of the layer to which weight $w_{jk}^l$ belongs. For a fully connected layer $l$, $fan\_in = J_{l-1}$ and $fan\_out = J_l$, since each weight of layer $l$ links one neuron in the preceding layer $l-1$ to one neuron in layer $l$ itself. An immediate alternative to Glorot Uniform replaces the Uniform distribution with a Normal distribution. This gives the **Glorot Normal** setting, under which initial parameters are drawn from $N(0, \sigma)$, with standard deviation:
<br>
<br>
\begin{equation}
    \displaystyle \sigma = \sqrt{\frac{2}{fan\_in + fan\_out}}
\end{equation}
<br>
<br>
Instead of these specifications, which scale the initial weights according to the number of connections of each layer, two further alternatives use a fixed-range **Uniform distribution**, $Unif(-a, a)$, and a fixed-scale **Normal distribution**, $N(0, \sigma)$. The tests below use $a = 0.05$ and $\sigma = 0.05$. A final option is a **truncated Normal distribution**, with specification $N(0, \sigma)$.
<br>
<br>
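To make the scale of the Glorot specifications concrete, the sketch below evaluates both formulas for a weight matrix linking a layer of 45 neurons to a layer of 22 neurons, the hidden layer sizes reported in the results tables. The function names are illustrative and are not part of the notebook or of Keras.

```python
import math


def glorot_uniform_limit(fan_in, fan_out):
    """Limit a of Unif(-a, a) under the Glorot Uniform formula."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_normal_std(fan_in, fan_out):
    """Standard deviation sigma of N(0, sigma) under the Glorot Normal formula."""
    return math.sqrt(2.0 / (fan_in + fan_out))


# Weights between a 45-neuron layer and a 22-neuron layer:
fan_in, fan_out = 45, 22
a = glorot_uniform_limit(fan_in, fan_out)
sigma = glorot_normal_std(fan_in, fan_out)

print(round(a, 4), round(sigma, 4))

# Both formulas imply the same variance, since Var[Unif(-a, a)] = a^2 / 3:
print(math.isclose(a ** 2 / 3, sigma ** 2))
```

For these layer sizes, $a \approx 0.30$ and $\sigma \approx 0.17$, so the Glorot settings start from noticeably wider weights than the fixed values $a = 0.05$ and $\sigma = 0.05$ used for the constant distributions.
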
All estimations will follow the best alternatives derived from tests above:
* Random samples of training and validation data.
* *Following results of tests from an earlier section, cross-entropy cost function will be used. Regarding activation functions, sigmoid will be applied for the neuron in the output layer, while tanh activation function will be used for neurons in hidden layers, since this alternative has shown the best results during tests.*
* Fitting hyper-parameters: *after previous tests, mini-batch size is set to $S = 512$, while number of epochs is still kept as low as possible to simplify estimations, $T = 10$.*
* *After tests for number of neurons and number of hidden layers, the architecture will be given by two hidden layers with the following number of neurons:*
    * $J_1 = \sqrt{num\_inputs*num\_outputs}$.
    * $J_2 = J_1/2$.
<br>
<br>
* *For the learning rate setting, tests have shown that Adam is a better option than the SGD optimizer. The best specification of its hyper-parameters is as follows: $learning\_rate = 0.001$, $\beta_1 = 0.75$ and $\beta_2 = 0.9999$. This represents a modification to default values ($learning\_rate = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$).*
* *The section concerning regularization has shown that L2 regularization with $\lambda = 1e-5$ is the best alternative to improve generalization by using standard regularization techniques.*
* *Previous tests have pointed to the adequacy of using dropout layers with parameters $\rho_{input} = 0.1$ and $\rho_{hidden} = 0.1$ for input and hidden layers, respectively.*
* No early stopping.
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: a collection of 100 estimations will be implemented, so performance metrics can be assessed in terms of average and standard deviation.

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='distributions'></a>

### Grid of distributions

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.75,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}

#### Grid of distributions

In [38]:
dists = ['glorot_uniform', 'glorot_normal', 'random_uniform', 'random_normal', 'truncated_normal']

#### Estimation loop

In [39]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(dists),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over weights initialization:
for d in range(len(dists)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout,
                         weights_init = dists[d])

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
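        # Minimum validation cost and the epoch at which it occurs. Only validation
        # values are stored, so the averages match 'application': 'validation' and
        # training and validation costs are not mixed in the same list:
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)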

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params,
            'weights_init': dists[d]
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        # total_seconds() is used because .seconds discards whole days of elapsed time:
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": '{0}. Testing alternative distributions for weights initialization.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(d+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 110.38 minutes.
Start time: 2021-02-06, 15:24:16
End time: 2021-02-06, 17:14:39




#### Assessing results

In [40]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
optimizers = []
learning_rates = []
beta1s = []
beta2s = []
activations = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over weights initialization:
for est_id, e in model_assessment.items():
    # Keep only the estimations produced by the weights initialization tests:
    if 'Testing alternative distributions for weights initialization.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by weights initialization:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'weights_init': weights_inits,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | beta_1 | beta_2 | weights_init | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1612635856 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.84132 | 0.008685 | 0.218564 | 0.011458 | 96.872546 | 19.074985 | 20.25 |
| 3 | 1612639360 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | random_normal | 0.840486 | 0.006709 | 0.213651 | 0.010358 | 125.281024 | 20.625824 | 22.13 |
| 1 | 1612637071 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_normal | 0.840143 | 0.009398 | 0.220747 | 0.010508 | 89.394914 | 21.008163 | 19.3 |
| 4 | 1612640689 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | truncated_normal | 0.839116 | 0.005847 | 0.215265 | 0.009317 | 143.508209 | 23.105433 | 29.83 |
| 2 | 1612638230 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | random_uniform | 0.834036 | 0.005748 | 0.21234 | 0.009228 | 145.093344 | 23.00979 | 18.83 |
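
The average ROC-AUC values above are close to each other, so it may help to relate their differences to the dispersion across the 100 estimations. The sketch below computes a rough z-score for the difference between two averages. It is only indicative: it assumes that the estimations are independent and approximately normal, and it reuses the reported standard deviations as they are. The function name `mean_difference_z` is hypothetical.

```python
import math


def mean_difference_z(mean_a, std_a, mean_b, std_b, n):
    """Rough z-score for the difference of two averages, each computed over n runs."""
    standard_error = math.sqrt(std_a ** 2 / n + std_b ** 2 / n)
    return (mean_a - mean_b) / standard_error


n_estimations = 100

# Average and standard deviation of ROC-AUC taken from the table above:
z_glorot_vs_random_normal = mean_difference_z(0.84132, 0.008685, 0.840486, 0.006709, n_estimations)
z_glorot_vs_random_uniform = mean_difference_z(0.84132, 0.008685, 0.834036, 0.005748, n_estimations)

print(round(z_glorot_vs_random_normal, 2), round(z_glorot_vs_random_uniform, 2))
```

Under these assumptions, the gap between Glorot Uniform and the constant Normal distribution is below one standard error, while the gap to the constant Uniform distribution is several standard errors wide. Read this way, the table separates the constant Uniform distribution from the rest more clearly than it ranks the top alternatives.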


<a id='architecture_review'></a>

## Architecture review

After specifying all definitions and hyper-parameters relevant for constructing a fully connected feedforward neural network, it is time to review the architecture previously defined. The following alternatives will be tested:
* Current definition: $J_1 = \sqrt{num\_inputs*num\_outputs}$ and $J_2 = J_1/2$.
* Alternative 1: $J_1 = \sqrt{num\_inputs*num\_outputs}$.
* Alternative 2: $J_1 = (num\_inputs + num\_outputs)*0.1$.
* Alternative 3: $J_1 = \sqrt{num\_inputs*J_2}$ and $J_2 =\sqrt{J_1*num\_outputs}$.
* Alternative 4: $J_1 = \sqrt{num\_inputs*num\_outputs}$ and $J_2 = J_1$.
* Alternative 5: $J_1 = (num\_inputs + J_2)*0.1$ and $J_2 = (J_1 + num\_outputs)*0.1$.
* Alternative 6: $J_1 = (num\_inputs + num\_outputs)*0.1$ and $J_2 = J_1/2$.
* Alternative 7: $J_1 = (num\_inputs + num\_outputs)*0.1$ and $J_2 = J_1$.
* Alternative 8: $J_1 = \sqrt{num\_inputs*J_2}$, $J_2 =\sqrt{J_1*J_3}$ and $J_3 = \sqrt{J_2*num\_outputs}$.
* Alternative 9: $J_1 = \sqrt{num\_inputs*num\_outputs}$, $J_2 = J_1/2$, $J_3 = J_1/4$.
* Alternative 10: $J_1 = \sqrt{num\_inputs*num\_outputs}$, $J_2 = J_1$, $J_3 = J_1$.
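
Alternatives 3, 5 and 8 define the layer sizes in terms of each other, so they have to be solved before being coded. The sketch below shows one way to obtain the closed forms, which agree with the exponents ($2/3$ and $1/3$, or $3/4$, $1/2$ and $1/4$) and with the ratio $(0.1 \cdot num\_inputs + 0.01 \cdot num\_outputs)/0.99$ used in the architecture grid further down. The function names and the value `num_inputs = 4096` are hypothetical and chosen only for illustration. Sizes are returned as floats, whereas the notebook truncates them with `int()`.

```python
def geometric_sizes(num_inputs, num_outputs, num_hidden):
    """Sizes J_1..J_L such that each J_k = sqrt(J_{k-1} * J_{k+1}),
    with J_0 = num_inputs and J_{L+1} = num_outputs (Alternatives 3 and 8)."""
    L = num_hidden
    return [num_inputs ** ((L + 1 - k) / (L + 1)) * num_outputs ** (k / (L + 1))
            for k in range(1, L + 1)]


def linear_sizes_two_layers(num_inputs, num_outputs, share=0.1):
    """Solution of J1 = share*(num_inputs + J2), J2 = share*(J1 + num_outputs) (Alternative 5)."""
    j1 = (share * num_inputs + share ** 2 * num_outputs) / (1 - share ** 2)
    j2 = share * (j1 + num_outputs)
    return j1, j2


# Hypothetical dimensions, not the ones of the fraud detection data:
num_inputs, num_outputs = 4096, 1

two_layers = geometric_sizes(num_inputs, num_outputs, 2)    # about [256, 16]
three_layers = geometric_sizes(num_inputs, num_outputs, 3)  # about [512, 64, 8]
j1, j2 = linear_sizes_two_layers(num_inputs, num_outputs)

print([round(j, 2) for j in two_layers])
print([round(j, 2) for j in three_layers])
print(round(j1, 2), round(j2, 2))
```

With a single output neuron, the geometric rule reduces to $J_k = num\_inputs^{(L+1-k)/(L+1)}$ for $L$ hidden layers.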
<br>
<br>
All estimations will follow the best alternatives derived from tests above:
* Random samples of training and validation data.
* *Following results of tests from an earlier section, cross-entropy cost function will be used. Regarding activation functions, sigmoid will be applied for the neuron in the output layer, while tanh activation function will be used for neurons in hidden layers, since this alternative has shown the best results during tests.*
* Fitting hyper-parameters: *after previous tests, mini-batch size is set to $S = 512$, while number of epochs is still kept as low as possible to simplify estimations, $T = 10$.*
* *For the learning rate setting, tests have shown that Adam is a better option than the SGD optimizer. The best specification of its hyper-parameters is as follows: $learning\_rate = 0.001$, $\beta_1 = 0.75$ and $\beta_2 = 0.9999$. This represents a modification to default values ($learning\_rate = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$).*
* *The section concerning regularization has shown that L2 regularization with $\lambda = 1e-5$ is the best alternative to improve generalization by using standard regularization techniques.*
* *Previous tests have pointed to the adequacy of using dropout layers with parameters $\rho_{input} = 0.1$ and $\rho_{hidden} = 0.1$ for input and hidden layers, respectively.*
* No early stopping.
* *Testing alternative distributions for weights initialization has shown that Glorot Uniform is the best option among those tested.*
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: a collection of 100 estimations will be implemented, so performance metrics can be assessed in terms of average and standard deviation.

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='testing_architectures'></a>

### Testing alternative architectures

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.75,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}

#### Grid of model architectures

One hidden layer

In [38]:
# First alternative with 1 hidden layer:
model_architectures = [{1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                      'activation': 'tanh',
                      'dropout_param': 0.1}}]
model_architectures_def = ['One hidden layer: J1 = sqrt(num_inputs*num_outputs)']

# Second alternative with 1 hidden layer:
model_architectures.append({1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('One hidden layer: J1 = (num_inputs + num_outputs)*0.1')

Two hidden layers

In [39]:
# First alternative with 2 hidden layers:
model_architectures.append({1: {'neurons': int(X_train.shape[1]**(2/3)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(X_train.shape[1]**(1/3)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Two hidden layers: J1 = sqrt(num_inputs*J2), J2 = sqrt(J1*num_outputs)')

# Second alternative with 2 hidden layers:
model_architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2')

# Third alternative with 2 hidden layers:
model_architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Two hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1')

# Fourth alternative with 2 hidden layers:
model_architectures.append({1: {'neurons': int(((0.1*X_train.shape[1]) + (0.01*1))/0.99),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int((int(((0.1*X_train.shape[1]) + (0.01*1))/0.99) + 1)*0.1),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Two hidden layers: J1 = (num_inputs + J2)*0.1, J2 = (J1 + num_outputs)*0.1')

# Fifth alternative with 2 hidden layers:
model_architectures.append({1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1/2)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Two hidden layers: J1 = (num_inputs + num_outputs)*0.1, J2 = J1/2')

# Sixth alternative with 2 hidden layers:
model_architectures.append({1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Two hidden layers: J1 = (num_inputs + num_outputs)*0.1, J2 = J1')

Three hidden layers

In [40]:
# First alternative with 3 hidden layers:
model_architectures.append({1: {'neurons': int(X_train.shape[1]**(3/4)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(X_train.shape[1]**(1/2)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      3: {'neurons': int(X_train.shape[1]**(1/4)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Three hidden layers: J1 = sqrt(num_inputs*J2), J2 = sqrt(J1*J3), J3 = sqrt(J2*num_outputs)')

# Second alternative with 3 hidden layers:
model_architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)/2),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      3: {'neurons': int(np.sqrt(X_train.shape[1]*1)/4),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Three hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1/2, J3 = J1/4')

# Third alternative with 3 hidden layers:
model_architectures.append({1: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      2: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1},
                      3: {'neurons': int(np.sqrt(X_train.shape[1]*1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}})
model_architectures_def.append('Three hidden layers: J1 = sqrt(num_inputs*num_outputs), J2 = J1, J3 = J1')

#### Estimation loop

In [49]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(model_architectures),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over architectures:
for a in range(len(model_architectures)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architectures[a], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout,
                         weights_init = weights_init)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architectures[a]),
            'num_hidden_neurons': [model_architectures[a][l]['neurons'] for l in model_architectures[a].keys()],
            'hidden_activations': [model_architectures[a][l]['activation'] for l in model_architectures[a].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architectures[a][l]['dropout_param'] for l in model_architectures[a].keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params,
            'weights_init': weights_init
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": '{0}. Reviewing model architecture.'.format(model_architectures_def[a])
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(a+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 40.48 minutes.
Start time: 2021-02-12, 07:42:50
End time: 2021-02-12, 08:23:20
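
The loop above scores each estimation with scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss`. As a reading aid, the following pure-Python sketch shows what two of these metrics compute. The helper names `roc_auc` and `brier_score` are hypothetical, and the toy labels and probabilities are illustrative only, not results from this study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption, for illustration only).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

toy_auc = roc_auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
print(toy_auc, toy_brier)
```

The pairwise definition is quadratic in the sample size, so it is only meant to show the idea; the scikit-learn functions remain the ones used for the actual tests.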




#### Assessing results

In [43]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
optimizers = []
learning_rates = []
beta1s = []
beta2s = []
activations = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over architectures:
for est_id, e in model_assessment.items():
    if 'Reviewing model architecture.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by architecture:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'weights_init': weights_inits,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

|   | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | beta_1 | beta_2 | weights_init | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1613085785 | `J1 = (num_inputs + num_outputs)*0.1. Reviewing...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.851381 | 0.004499 | 0.263687 | 0.008464 | 189.244136 | 31.152457 | 33.45 |
| 2 | 1613087793 | `J1 = sqrt(num_inputs*J2), J2 = sqrt(J1*num_out...` | 2 | [163, 12] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.846525 | 0.008377 | 0.224794 | 0.010787 | 101.05854 | 20.83957 | 35.68 |
| 0 | 1613084242 | `J1 = sqrt(num_inputs*num_outputs). Reviewing m...` | 1 | [45] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.845097 | 0.005432 | 0.23292 | 0.010338 | 155.589769 | 22.530213 | 25.72 |
| 3 | 1613089935 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2. ...` | 2 | [45, 22] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.842236 | 0.00808 | 0.219782 | 0.01045 | 104.23582 | 21.031618 | 34.3 |
| 9 | 1613126570 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1/2, ...` | 3 | [45, 22, 11] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.840472 | 0.00848 | 0.210435 | 0.011061 | 99.114987 | 19.025528 | 20.85 |
| 10 | 1613127822 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1, J3...` | 3 | [45, 45, 45] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.840373 | 0.00656 | 0.217723 | 0.010884 | 128.098596 | 20.004111 | 19.63 |
| 4 | 1613091994 | `J1 = sqrt(num_inputs*num_outputs), J2 = J1. Re...` | 2 | [45, 45] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.838903 | 0.007885 | 0.221652 | 0.01053 | 106.393521 | 21.04921 | 38.1 |
| 8 | 1613102037 | `J1 = sqrt(num_inputs*J2), J2 = sqrt(J1*J3), J3...` | 3 | [309, 45, 6] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.833386 | 0.011575 | 0.210855 | 0.01311 | 71.998131 | 16.083695 | 43.33 |
| 5 | 1613096606 | `J1 = (num_inputs + J2)*0.1, J2 = (J1 + num_out...` | 2 | [211, 21] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.82873 | 0.013027 | 0.227253 | 0.010914 | 63.61795 | 20.822321 | 29.05 |
| 7 | 1613100177 | `J1 = (num_inputs + num_outputs)*0.1, J2 = J1. ...` | 2 | [209, 209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.822881 | 0.007938 | 0.241864 | 0.010523 | 103.664205 | 22.983567 | 31.0 |


<a id='further_review'></a>

## Further review

This final round of tests explores further four hyper-parameters that have shown potential to improve performance metrics: mini-batch size, the input dropout parameter, the hidden dropout parameter, and the Adam hyper-parameters. Tests are run sequentially, as follows:
* Mini-batch size: the three best-ranked values are tested again: $S \in \{32, 512, 1024\}$.
* Input and hidden dropout: the best-ranked values, $\rho_{input} = 0.1$ and $\rho_{hidden} = 0.1$, are compared against no dropout, $\rho_{input} = 0$ and $\rho_{hidden} = 0$.
* Adam hyper-parameters: $\beta_1 \in \{0.25, 0.5, 0.75\}$ and $\beta_2 \in \{0.999, 0.9999\}$.
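
The Adam grid listed above amounts to six $(\beta_1, \beta_2)$ combinations. A minimal sketch of how such a grid could be enumerated follows. The names `beta_1_grid`, `beta_2_grid`, and `adam_grid` are illustrative assumptions, and the fixed learning rate and epsilon simply repeat the values used in the settings of this notebook.

```python
from itertools import product

# Grids stated in the text above.
beta_1_grid = [0.25, 0.5, 0.75]
beta_2_grid = [0.999, 0.9999]

# One dictionary of optimizer parameters per (beta_1, beta_2) pair.
# The layout mirrors the opt_params dictionaries used in this notebook.
adam_grid = [
    {'learning_rate': 0.001, 'beta_1': b1, 'beta_2': b2, 'epsilon': 1e-07}
    for b1, b2 in product(beta_1_grid, beta_2_grid)
]

for params in adam_grid:
    print(params)
```

Each dictionary could then be passed as `opt_params` in an estimation loop of the kind used throughout this notebook.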
<br>
<br>
All estimations follow the best alternatives derived from the tests above (except for the hyper-parameter under test):
* Random samples of training and validation data.
* *Following results of tests from an earlier section, cross-entropy cost function will be used. Regarding activation functions, sigmoid will be applied for the neuron in the output layer, while tanh activation function will be used for neurons in hidden layers, since this alternative has shown the best results during tests.*
* Fitting hyper-parameters: *after previous tests, mini-batch size is set to $S = 512$, while number of epochs is still kept as low as possible to simplify estimations, $T = 10$.*
* *After further tests for number of neurons and number of hidden layers, the architecture will be given by one hidden layer with the following number of neurons:* $J_1 = (num\_inputs + num\_outputs)*0.1$.
* *For the learning rate setting, tests have shown that Adam is a better option than the SGD optimizer. The best specification of its hyper-parameters is as follows: $learning\_rate = 0.001$, $\beta_1 = 0.75$ and $\beta_2 = 0.9999$. This modifies the default values ($learning\_rate = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$).*
* *The section concerning regularization has shown that L2 regularization with $\lambda = 1e-5$ is the best alternative to improve generalization by using standard regularization techniques.*
* *Previous tests have pointed to the adequacy of using dropout layers with parameters $\rho_{input} = 0.1$ and $\rho_{hidden} = 0.1$ for input and hidden layers, respectively.*
* No early stopping.
* *Testing alternative distributions for weights initialization has shown that Glorot Uniform is the best option among those tested.*
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: each setting is estimated 100 times, so performance metrics can be assessed in terms of average and standard deviation.
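
The averaging step in the last item can be sketched as follows. This is a simplified illustration, not the notebook's own code: `summarize` is a hypothetical helper, the scores are made-up values, and NaN handling (which `np.nanmean` and `np.nanstd` provide in the estimation loops) is left out.

```python
from statistics import fmean, pstdev


def summarize(scores):
    """Return the average, the population standard deviation, and their ratio."""
    avg = fmean(scores)
    std = pstdev(scores)  # population std (ddof=0), matching np.nanstd's default
    ratio = avg / std if std > 0 else float('inf')
    return {'avg': avg, 'std': std, 'ratio': ratio}


# Made-up ROC AUC values from five hypothetical estimations (not real results).
example_roc_auc = [0.85, 0.86, 0.84, 0.85, 0.85]

summary = summarize(example_roc_auc)
print(summary)
```

The ratio of average to standard deviation is the quantity reported as `ratio_roc_auc` and `ratio_prec` in the results tables: a higher value indicates a metric that is both high and stable across estimations.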

In [36]:
# Converting data from dataframes into nd-arrays:
X_train = sample_train_scaled.drop(drop_vars, axis=1).values
y_train = sample_train_scaled['y'].values

X_val = sample_val_scaled.drop(drop_vars, axis=1).values
y_val = sample_val_scaled['y'].values

<a id='mini_batch_size_review'></a>

### Mini-batch size

#### Setting

In [37]:
# Model architecture:
model_architecture = {1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'One hidden layer: J1 = (num_inputs + num_outputs)*0.1'

# Number of estimations:
n_estimations = 100

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0.1
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.75,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}
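
The hidden-layer size in the setting above comes from a rule of thumb based on the number of inputs and outputs. The sketch below restates that rule, together with the square-root rule tested earlier, as small functions. The function names are hypothetical, and `num_inputs = 2090` is an assumed value chosen only because it reproduces the layer sizes 209 and 45 reported in the results tables; the actual number of inputs is not stated in this section.

```python
import math


def fraction_rule(num_inputs, num_outputs=1, fraction=0.1):
    """J1 = floor((num_inputs + num_outputs) * fraction)."""
    return int(math.floor((num_inputs + num_outputs) * fraction))


def sqrt_rule(num_inputs, num_outputs=1):
    """J1 = int(sqrt(num_inputs * num_outputs))."""
    return int(math.sqrt(num_inputs * num_outputs))


# Assumed number of inputs, for illustration only.
num_inputs = 2090

j1_fraction = fraction_rule(num_inputs)
j1_sqrt = sqrt_rule(num_inputs)
print(j1_fraction, j1_sqrt)
```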

#### Grid of mini-batch sizes

In [40]:
batch_sizes = [32, 512, 1024]

#### Estimation loop

In [41]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(batch_sizes),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over mini-batch sizes:
for b in range(len(batch_sizes)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_sizes[b],
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout,
                         weights_init = weights_init)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_sizes[b],
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params,
            'weights_init': weights_init
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": '{0}. Reviewing mini-batch size.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(b+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 119.42 minutes.
Start time: 2021-02-13, 09:36:40
End time: 2021-02-13, 11:36:05




#### Assessing results

In [43]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
activations = []
optimizers = []
learning_rates = []
beta1s = []
beta2s = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over mini-batch sizes:
for est_id, e in model_assessment.items():
    if 'Reviewing mini-batch size.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by mini-batch size:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'batch_size': batch_sizes,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'weights_init': weights_inits,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

|   | estimation_id | architecture | num_layers | num_neurons | batch_size | activation_function | optimizer | learning_rate | beta_1 | beta_2 | weights_init | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1613225322 | `J1 = (num_inputs + num_outputs)*0.1. Reviewing...` | 1 | [209] | 1024 | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.871868 | 0.006789 | 0.276781 | 0.009435 | 128.420236 | 29.337094 | 27.38 |
| 1 | 1613223667 | `J1 = (num_inputs + num_outputs)*0.1. Reviewing...` | 1 | [209] | 512 | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.851573 | 0.004619 | 0.26205 | 0.008517 | 184.348425 | 30.768012 | 27.57 |
| 0 | 1613219800 | `J1 = (num_inputs + num_outputs)*0.1. Reviewing...` | 1 | [209] | 32 | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.797981 | 0.005497 | 0.213972 | 0.015684 | 145.161158 | 13.643076 | 64.45 |
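
The table above can be ranked either by average ROC AUC or by the ratio of average to standard deviation. The short sketch below applies both criteria to the rounded values reported in the table; the variable names are illustrative, and because the inputs are rounded, the recomputed ratios may differ slightly from the `ratio_roc_auc` column.

```python
# (batch_size, avg_roc_auc, std_roc_auc) copied from the table above.
results = [
    (1024, 0.871868, 0.006789),
    (512, 0.851573, 0.004619),
    (32, 0.797981, 0.005497),
]

# Criterion 1: highest average ROC AUC.
best_by_avg = max(results, key=lambda r: r[1])

# Criterion 2: highest average-to-standard-deviation ratio.
best_by_ratio = max(results, key=lambda r: r[1] / r[2])

print('Best average ROC AUC: S =', best_by_avg[0])
print('Best average/std ratio: S =', best_by_ratio[0])
```

On these numbers the two criteria disagree: $S = 1024$ has the highest average ROC AUC, while $S = 512$ has the highest ratio of average to standard deviation.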


<a id='input_dropout_review'></a>

### Input dropout

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'One hidden layer: J1 = (num_inputs + num_outputs)*0.1.'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.75,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}

#### Grid of values for input dropout

In [38]:
dropouts = [0, 0.1]

#### Estimation loop

In [40]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(dropouts),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over dropout parameters:
for d in range(len(dropouts)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = dropouts[d],
                         weights_init = weights_init)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': dropouts[d],
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params,
            'weights_init': weights_init
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
        "comment": '{0}. Reviewing input dropout parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(d+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 29.57 minutes.
Start time: 2021-02-13, 13:07:36
End time: 2021-02-13, 13:37:11




#### Assessing results

In [41]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
activations = []
optimizers = []
input_dropouts = []
learning_rates = []
beta1s = []
beta2s = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over dropout parameters:
for est_id, e in model_assessment.items():
    if 'Reviewing input dropout parameter.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    input_dropouts.append(e['hyper_parameters']['input_dropout'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by dropout parameter:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'weights_init': weights_inits,
    'input_dropout': input_dropouts,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

|   | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | beta_1 | beta_2 | weights_init | input_dropout | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1613232456 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.1 | 0.85246 | 0.005029 | 0.262223 | 0.008156 | 169.513525 | 32.151235 | 29.55 |
| 0 | 1613227483 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.0 | 0.852042 | 0.00395 | 0.262149 | 0.007758 | 215.717138 | 33.791632 | 33.27 |
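
The two rows above differ only slightly in average ROC AUC. One rough way to gauge whether that gap exceeds run-to-run noise is to divide the difference in averages by its standard error. The sketch below does this under two assumptions that are not part of the original analysis: the 100 estimations per setting are treated as independent draws, and the reported standard deviations (computed with `np.nanstd`, hence with `ddof=0`) are used as they are.

```python
import math

n = 100  # estimations per setting, as stated in this notebook

# avg_roc_auc and std_roc_auc copied from the table above.
avg_with, std_with = 0.85246, 0.005029        # input dropout = 0.1
avg_without, std_without = 0.852042, 0.00395  # input dropout = 0

# Difference in averages and its approximate standard error.
diff = avg_with - avg_without
std_error = math.sqrt(std_with ** 2 / n + std_without ** 2 / n)

# Standardized difference (a rough z-like statistic).
z = diff / std_error
print(round(diff, 6), round(std_error, 6), round(z, 2))
```

With the reported values the standardized difference comes out below 1, which suggests that the gap in average ROC AUC between the two settings is small relative to the variation across estimations.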


<a id='hidden_dropout_review'></a>

### Hidden dropout

#### Setting

In [42]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture_def = 'One hidden layer: J1 = (num_inputs + num_outputs)*0.1'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.75,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}

#### Grid of values for hidden dropout

In [43]:
dropout_params = [0, 0.1]
model_architectures = [{1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                            'activation': 'tanh',
                            'dropout_param': d}} for d in dropout_params]

model_architectures

[{1: {'neurons': 209, 'activation': 'tanh', 'dropout_param': 0}},
 {1: {'neurons': 209, 'activation': 'tanh', 'dropout_param': 0.1}}]
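
The `dropout_param` values in the grid above set the share of hidden neurons dropped at each mini-batch iteration. The sketch below illustrates the usual "inverted dropout" mechanics in pure Python: each activation is zeroed with probability `rate`, and the survivors are rescaled by `1/(1 - rate)`, which is how standard Keras dropout layers behave during training. Whether the `KerasNN` wrapper relies on those layers is an assumption, since its implementation is not shown here; the `dropout` helper is hypothetical.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale survivors by 1/(1 - rate)."""
    if not 0 <= rate < 1:
        raise ValueError('rate must be in [0, 1)')
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
activations = [1.0] * 10

no_dropout = dropout(activations, 0.0, rng)    # rate 0 leaves values unchanged
with_dropout = dropout(activations, 0.1, rng)  # about 10% of values zeroed

print(no_dropout)
print(with_dropout)
```

The rescaling keeps the expected value of each activation unchanged, so no adjustment is needed when dropout is switched off at prediction time.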

#### Estimation loop

In [47]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(model_architectures),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over dropout parameters:
for d in range(len(model_architectures)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results:
    min_cost = []
    epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architectures[d], num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout,
                         weights_init = weights_init)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))
        
        # Cost function by training epoch:
        model_costs = model.model_costs
        
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        min_cost.append(model_costs.val_loss.min())
        epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architectures[d]),
            'num_hidden_neurons': [model_architectures[d][l]['neurons'] for l in model_architectures[d].keys()],
            'hidden_activations': [model_architectures[d][l]['activation'] for l in model_architectures[d].keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architectures[d][l]['dropout_param'] for l in model_architectures[d].keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params,
            'weights_init': weights_init
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Reviewing hidden dropout parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(d+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 65.65 minutes.
Start time: 2021-02-13, 14:10:31
End time: 2021-02-13, 15:16:11




#### Assessing results

In [48]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
activations = []
optimizers = []
hidden_dropouts = []
learning_rates = []
beta1s = []
beta2s = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over dropout parameters:
for e in [model_assessment[e] for e in model_assessment.keys() if
          ('Reviewing hidden dropout parameter.' in model_assessment[e]['comment'])]:
    estimation_ids.append(list(model_assessment.keys())[list(model_assessment.values()).index(e)])
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    hidden_dropouts.append(e['hyper_parameters']['hidden_dropout'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by dropout parameter:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'weights_init': weights_inits,
    'hidden_dropout': hidden_dropouts,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | beta_1 | beta_2 | weights_init | hidden_dropout | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1613238130 | `J1 = (num_inputs + num_outputs)*0.1. Reviewing...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | [0.1] | 0.852702 | 0.00463 | 0.260801 | 0.00833 | 184.156093 | 31.306822 | 34.0 |
| 0 | 1613236231 | `J1 = (num_inputs + num_outputs)*0.1. Reviewing...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | [0] | 0.852324 | 0.004588 | 0.261745 | 0.008355 | 185.759093 | 31.329117 | 31.65 |


<a id='adam_params_review'></a>

### Adam hyper-parameters

#### Setting

In [37]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'One hidden layer: J1 = (num_inputs + num_outputs)*0.1.'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
regul_param = 1e-5
input_dropout = 0
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
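
As a reminder of what $\beta_1$ and $\beta_2$ control, the sketch below applies the textbook Adam update to a single scalar parameter. It is only an illustration under stated assumptions: it is not the code used by `KerasNN` or by Keras itself (implementations may differ in details such as where $\epsilon$ enters), `adam_step` and the toy gradients are hypothetical, and the default values mirror the usual Keras defaults for Adam.

```python
import math

def adam_step(theta, grad, m, v, t,
              learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07):
    """One textbook Adam update for a scalar parameter (illustrative only)."""
    m = beta_1 * m + (1 - beta_1) * grad        # moving average of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction (m starts at zero)
    v_hat = v / (1 - beta_2 ** t)               # bias correction (v starts at zero)
    theta = theta - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v

# Toy run with one of the beta_1 values reviewed in this section:
theta, m, v = 1.0, 0.0, 0.0
for t, grad in enumerate([0.4, 0.4, -0.2], start=1):  # made-up gradients
    theta, m, v = adam_step(theta, grad, m, v, t, beta_1=0.75, beta_2=0.9999)
    print(t, theta)
```

A smaller $\beta_1$ makes the gradient average react faster to recent gradients, while a $\beta_2$ closer to 1 smooths the scaling term over a longer history.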

#### Grid of values

In [44]:
opt_params = {
    'learning_rate': [0.001],
    'beta_1': [0.75, 0.5, 0.25],
    'beta_2': [0.999, 0.9999],
    'epsilon': [1e-07]
}

# List with all permutations of possible values for hyper-parameters:
opt_params = permutation(opt_params)
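
The helper `permutation` is defined elsewhere in the project and is not shown here. A minimal sketch of what such a grid expansion might look like, assuming it returns one dictionary per combination of the listed values, is given below; `expand_grid` is a hypothetical stand-in, and the ordering of its output may differ from that of the notebook's helper.

```python
from itertools import product

def expand_grid(grid):
    """Return one dict per combination of the values listed for each key."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]

grid = {
    'learning_rate': [0.001],
    'beta_1': [0.75, 0.5, 0.25],
    'beta_2': [0.999, 0.9999],
    'epsilon': [1e-07]
}

combinations = expand_grid(grid)
print(len(combinations))  # 1 * 3 * 2 * 1 = 6 combinations
print(combinations[0])
```

Each element can then be passed as `opt_params[o]` in an estimation loop, one combination per test.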

#### Estimation loop

In [48]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(opt_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over optimizer parameters:
for o in range(len(opt_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are tracked separately):
    min_cost = []
    epoch_min_cost = []
    val_min_cost = []
    val_epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params[o],
                         regularization = regularization, regul_param = regul_param,
                         input_dropout = input_dropout,
                         weights_init = weights_init)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))

        # Cost function by training epoch:
        model_costs = model.model_costs

        # Minimum cost and its (1-indexed) epoch, for training and validation data:
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        val_min_cost.append(model_costs.val_loss.min())
        val_epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_param,
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params[o],
            'weights_init': weights_init
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_val_min_cost': np.nanmean(val_min_cost),
            'avg_val_epoch_min_cost': np.nanmean(val_epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0}. Reviewing Adam hyper-parameters.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(o+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 27.83 minutes.
Start time: 2021-02-14, 14:22:26
End time: 2021-02-14, 14:50:17




#### Assessing results

In [49]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
activations = []
optimizers = []
input_dropouts = []
learning_rates = []
beta1s = []
beta2s = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over optimizer parameters:
for e in [model_assessment[e] for e in model_assessment.keys() if
          ('Reviewing Adam hyper-parameters.' in model_assessment[e]['comment'])]:
    estimation_ids.append(list(model_assessment.keys())[list(model_assessment.values()).index(e)])
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by Adam parameters:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta_1': beta1s,
    'beta_2': beta2s,
    'weights_init': weights_inits,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | optimizer | learning_rate | beta_1 | beta_2 | weights_init | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1613317710 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.9999 | glorot_uniform | 0.852777 | 0.004395 | 0.261098 | 0.008003 | 194.019065 | 32.626574 | 30.37 |
| 1 | 1613314316 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.5 | 0.999 | glorot_uniform | 0.852422 | 0.003781 | 0.278219 | 0.007551 | 225.45912 | 36.844418 | 28.62 |
| 4 | 1613319533 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.5 | 0.9999 | glorot_uniform | 0.852271 | 0.003659 | 0.277869 | 0.008265 | 232.94979 | 33.620443 | 32.45 |
| 0 | 1613312591 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.75 | 0.999 | glorot_uniform | 0.852164 | 0.004125 | 0.261219 | 0.007224 | 206.608472 | 36.159437 | 28.73 |
| 5 | 1613323346 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.25 | 0.9999 | glorot_uniform | 0.846702 | 0.004362 | 0.271481 | 0.007765 | 194.128934 | 34.963586 | 27.83 |
| 2 | 1613316034 | `J1 = (num_inputs + num_outputs)*0.1.. Reviewin...` | 1 | [209] | tanh | adam | 0.001 | 0.25 | 0.999 | glorot_uniform | 0.845258 | 0.004496 | 0.270069 | 0.008576 | 187.987971 | 31.489725 | 27.93 |


<a id='regul_param_grid_search'></a>

## Grid search for regularization parameter

Previous tests used a random sample with 50% of the training data, which sped up the assessment of the best choices for definitions and hyper-parameters such as: activation and cost functions, architecture (number of hidden layers and number of neurons in each), fitting hyper-parameters (mini-batch size), dropout parameters (input and hidden), regularization parameter, optimizer parameters, and the random distribution for weights initialization.
<br>
<br>
Now, it is time to estimate fully connected feedforward neural networks using the entire training data. In addition, the regularization parameter will be defined through a grid search based on performance metrics evaluated on validation data. Lastly, a final model will be estimated using early stopping to determine the optimal number of epochs.
<br>
<br>
The setting for estimation is given by definitions and values of hyper-parameters previously picked as the best choices.
* Activation function for hidden neurons: tanh.
* Cost function: cross-entropy.
* Architecture: one hidden layer with the number of neurons given by: $J_1 = (num\_inputs + num\_outputs)*0.1$.
* Mini-batch size: $S = 512$.
* Number of epochs: $T = 10$, in order to simplify tests.
* No early stopping so far.
* Dropout parameters: $\rho_{input} = 0$, $\rho_{hidden} = 0.1$ (the value set in the code below).
* L2 regularization parameter: defined through grid search, $\lambda \in \{10^{-8}, 10^{-7}, 10^{-6}, 10^{-5}, 10^{-4}, 0\}$.
* Optimizer: Adam.
* Optimizer hyper-parameters: $\eta = 0.001$, $\beta_1 = 0.5$, $\beta_2 = 0.9999$.
* Random distribution for weights initialization: Glorot Uniform.
<br>
<br>
* Conclusions will be guided by performance metrics evaluated on validation data.
* Averaging: 100 estimations will be run for each value, so that performance metrics can be assessed in terms of average and standard deviation.
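
The averaging step can be sketched as follows. This is a simplified illustration using only the standard library: `summarize` and the toy scores are hypothetical, the population standard deviation is used because that is the default behavior of `np.nanstd`, and, unlike `np.nanmean`, the sketch does not skip missing values.

```python
from statistics import fmean, pstdev

def summarize(scores):
    """Average, standard deviation, and average-to-std ratio of repeated estimations."""
    avg = fmean(scores)
    std = pstdev(scores)  # population std (ddof = 0), the np.nanstd default
    ratio = avg / std if std > 0 else float('inf')
    return {'avg': avg, 'std': std, 'ratio': ratio}

# Made-up validation ROC-AUC values from three hypothetical estimations:
toy_scores = [0.93, 0.94, 0.95]
summary = summarize(toy_scores)
print(summary)
```

The ratio corresponds to the `ratio_roc_auc` and `ratio_prec` columns built in the result cells: a higher value indicates a metric that is large relative to its variation across estimations.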

In [34]:
# Converting data from dataframes into nd-arrays:
X_train = df_train_scaled.drop(drop_vars, axis=1).values
y_train = df_train_scaled['y'].values

X_val = df_val_scaled.drop(drop_vars, axis=1).values
y_val = df_val_scaled['y'].values

#### Setting

In [35]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'One hidden layer: J1 = (num_inputs + num_outputs)*0.1.'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 10
batch_size = 512
es_param = None
regularization = 'l2'
input_dropout = 0
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.5,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}

#### Grid of values

In [36]:
regul_params = [1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0]
regul_params

[1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0]
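
To give a sense of scale for the values in this grid, the sketch below computes an L2 penalty of the form $\lambda \sum w^2$, which is the form used by the Keras `l2` regularizer. Whether `KerasNN` applies exactly this penalty to the hidden-layer weights is an assumption, since its implementation is not shown here; `l2_penalty` and the toy weights are hypothetical.

```python
def l2_penalty(weights, regul_param):
    """L2 penalty: regul_param times the sum of squared weights."""
    return regul_param * sum(w ** 2 for row in weights for w in row)

# Made-up 2x2 weight matrix; its sum of squared weights is 5.25:
toy_weights = [[0.5, -1.0], [2.0, 0.0]]

for lam in [1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0]:
    print(lam, l2_penalty(toy_weights, lam))
```

With $\lambda = 0$ the penalty vanishes and the cost function reduces to the unregularized cross-entropy.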

#### Estimation loop

In [38]:
start_time = datetime.now()

test_bar = progressbar.ProgressBar(maxval=len(regul_params),
                                   widgets=['\033[1mTest progress:\033[0m ',
                                   progressbar.Bar('-', '[', ']'), ' ',
                                   progressbar.Percentage()])

test_bar.start()

# Loop over regularization parameters:
for r in range(len(regul_params)):
    estimation_id = str(int(time.time()))

    nn_start_time = datetime.now()
    
    # Lists to store results (training and validation costs are tracked separately):
    min_cost = []
    epoch_min_cost = []
    val_min_cost = []
    val_epoch_min_cost = []
    val_roc_auc = []
    val_avg_prec_score = []
    val_brier_score = []

    # Loop over estimations:
    for t in range(n_estimations):
        # Creating neural network object, declaring its architecture and defining hyper-parameters:
        model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                         output_activation = output_activation, cost_function = cost_function,
                         num_epochs = num_epochs, batch_size = batch_size,
                         default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                         regularization = regularization, regul_param = regul_params[r],
                         input_dropout = input_dropout,
                         weights_init = weights_init)

        # Training the model:
        model.run(train_inputs = X_train, train_output = y_train,
                  val_inputs = X_val, val_output = y_val,
                  verbose = 0)

        # Performance metrics on validation data:
        val_roc_auc.append(roc_auc_score(y_val, [p[0] for p in model.predictions]))
        val_avg_prec_score.append(average_precision_score(y_val, [p[0] for p in model.predictions]))
        val_brier_score.append(brier_score_loss(y_val, [p[0] for p in model.predictions]))

        # Cost function by training epoch:
        model_costs = model.model_costs

        # Minimum cost and its (1-indexed) epoch, for training and validation data:
        min_cost.append(model_costs.loss.min())
        epoch_min_cost.append(model_costs.loss.idxmin() + 1)
        val_min_cost.append(model_costs.val_loss.min())
        val_epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)

    # Assessing running time:
    nn_end_time = datetime.now()

    # Dictionary with information on model structure and performance:
    model_assessment[estimation_id] = {
        'architecture': {
            'num_hidden_layers': len(model_architecture),
            'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
            'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
            'output_activation': output_activation,
            'cost_function': cost_function,
        },
        'hyper_parameters': {
            'num_epochs': num_epochs,
            'batch_size': batch_size,
            'es_param': es_param,
            'regularization': regularization,
            'regul_param': regul_params[r],
            'input_dropout': input_dropout,
            'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
            'default_adam': default_adam,
            'optimizer': optimizer,
            'opt_params': opt_params,
            'weights_init': weights_init
        },
        'n_estimations': n_estimations,
        'performance_metrics': {
            'application': 'validation',
            'avg_min_cost': np.nanmean(min_cost),
            'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
            'avg_val_min_cost': np.nanmean(val_min_cost),
            'avg_val_epoch_min_cost': np.nanmean(val_epoch_min_cost),
            'avg_roc_auc': np.nanmean(val_roc_auc),
            'avg_avg_prec_score': np.nanmean(val_avg_prec_score),
            'avg_brier_score': np.nanmean(val_brier_score),
            'std_roc_auc': np.nanstd(val_roc_auc),
            'std_avg_prec_score': np.nanstd(val_avg_prec_score),
            'std_brier_score': np.nanstd(val_brier_score)
        },
        'running_time': str(round(((nn_end_time - nn_start_time).seconds)/60, 2)) + ' minutes',
        "comment": '{0} Grid search for L2 regularization parameter.'.format(model_architecture_def)
    }
    
    if export:
        with open('Datasets/model_assessment.json', 'w') as json_file:
            json.dump(model_assessment, json_file, indent=2)
    
    test_bar.update(r+1)
    sleep(0.01)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round(((end_time - start_time).seconds)/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')

Test progress: [---------------------------------------------------------] 100%

------------------------------------
Overall running time: 70.4 minutes.
Start time: 2021-02-16, 23:22:21
End time: 2021-02-17, 00:32:45




#### Assessing results

In [20]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
activations = []
regul_params = []
weights_inits = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over regularization parameters:
for e in [model_assessment[e] for e in model_assessment.keys() if
          ('Grid search for L2 regularization parameter.' in model_assessment[e]['comment'])]:
    estimation_ids.append(list(model_assessment.keys())[list(model_assessment.values()).index(e)])
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    regul_params.append(e['hyper_parameters']['regul_param'])
    weights_inits.append(e['hyper_parameters']['weights_init'])
    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/e['performance_metrics']['std_roc_auc'])
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/e['performance_metrics']['std_avg_prec_score'])
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by regularization parameter:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'activation_function': activations,
    'weights_init': weights_inits,
    'regul_param': regul_params,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | activation_function | weights_init | regul_param | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1613507413 | `J1 = (num_inputs + num_outputs)*0.1. Grid sear...` | 1 | [209] | tanh | glorot_uniform | 1e-05 | 0.937422 | 0.001591 | 0.453413 | 0.006286 | 589.276382 | 72.132531 | 67.57 |
| 2 | 1613473761 | `J1 = (num_inputs + num_outputs)*0.1. Grid sear...` | 1 | [209] | tanh | glorot_uniform | 1e-06 | 0.93702 | 0.001405 | 0.45368 | 0.00639 | 667.111547 | 71.003575 | 80.87 |
| 0 | 1613432541 | `J1 = (num_inputs + num_outputs)*0.1. Grid sear...` | 1 | [209] | tanh | glorot_uniform | 1e-08 | 0.936917 | 0.00155 | 0.453971 | 0.004623 | 604.356581 | 98.199658 | 81.55 |
| 4 | 1613515311 | `J1 = (num_inputs + num_outputs)*0.1. Grid sear...` | 1 | [209] | tanh | glorot_uniform | 0.0001 | 0.936823 | 0.001285 | 0.427436 | 0.009849 | 729.222978 | 43.397373 | 64.23 |
| 1 | 1613441057 | `J1 = (num_inputs + num_outputs)*0.1. Grid sear...` | 1 | [209] | tanh | glorot_uniform | 1e-07 | 0.93679 | 0.001402 | 0.453518 | 0.004802 | 668.135151 | 94.44313 | 64.45 |
| 5 | 1613528541 | `J1 = (num_inputs + num_outputs)*0.1. Grid sear...` | 1 | [209] | tanh | glorot_uniform | 0.0 | 0.936585 | 0.001596 | 0.453855 | 0.004928 | 586.87305 | 92.094651 | 70.4 |


<a id='final_estimation'></a>

## Final estimation with early stopping

The last hyper-parameter that needs to be specified is the number of epochs, $T$. As introduced earlier, the strategy here is to estimate the model using training data while monitoring performance metrics on validation data. When the validation performance fails to improve (with $\delta = 0$) for $P^*$ consecutive epochs, training stops. The ensemble of models to be saved can consist either of those that maximized the validation performance in each estimation or of the last updated model in each estimation.
<br>
<br>
First, a sensitivity analysis will be carried out to define the patience parameter, i.e., the early stopping parameter $P^*$. Then, the final estimation will take place, leading to an ensemble of models that constitutes the reference, for the current dataset, of the fully connected feedforward neural network paradigm.
<br>
<br>
The setting for estimation is given by definitions and values of hyper-parameters previously picked as the best choices.
* Activation function for hidden neurons: tanh.
* Cost function: cross-entropy.
* Architecture: one hidden layer with the number of neurons given by: $J_1 = (num\_inputs + num\_outputs)*0.1$.
* Mini-batch size: $S = 512$.
* Number of epochs: defined through early stopping with $(\delta, P) = (0, P^*)$.
* Dropout parameters: $\rho_{input} = 0$, $\rho_{hidden} = 0.1$ (the value set in the code below).
* L2 regularization parameter: $\lambda = 10^{-6}$.
* Optimizer: Adam.
* Optimizer hyper-parameters: $\eta = 0.001$, $\beta_1 = 0.5$, $\beta_2 = 0.9999$.
* Random distribution for weights initialization: Glorot Uniform.
<br>
<br>
* Conclusions will be guided by performance metrics evaluated on validation data.
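
The stopping rule described above can be sketched in plain Python. This is an illustration of the consecutive-patience rule only, not the implementation inside `KerasNN`, whose `consecutive_patience` flag is not documented here; `early_stopping_epoch` and the toy scores are hypothetical, and the monitored metric is assumed to be one where higher is better.

```python
def early_stopping_epoch(val_scores, min_delta=0.0, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a higher-is-better metric."""
    best_score = float('-inf')
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score - best_score > min_delta:
            # Improvement larger than min_delta: record it and reset the counter.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran for all epochs.
    return len(val_scores), best_epoch

# Made-up validation scores by epoch:
toy_scores = [0.80, 0.85, 0.90, 0.89, 0.90, 0.88, 0.91]
stop_epoch, best_epoch = early_stopping_epoch(toy_scores, min_delta=0.0, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In this toy run, training stops at epoch 6, while the best validation score was reached at epoch 3. This gap is the difference between exporting the best model and exporting the last updated model.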

In [34]:
# Converting data from dataframes into nd-arrays:
X_train = df_train_scaled.drop(drop_vars, axis=1).values
y_train = df_train_scaled['y'].values

X_val = df_val_scaled.drop(drop_vars, axis=1).values
y_val = df_val_scaled['y'].values

X_test = df_test_scaled.drop(drop_vars, axis=1).values
y_test = df_test_scaled['y'].values

#### Setting

In [35]:
# Number of estimations:
n_estimations = 100

# Model architecture:
model_architecture = {1: {'neurons': int(np.floor((X_train.shape[1] + 1)*0.1)),
                          'activation': 'tanh',
                          'dropout_param': 0.1}}
model_architecture_def = 'One hidden layer: J1 = (num_inputs + num_outputs)*0.1.'

# Functions:
output_activation = 'sigmoid'
cost_function = 'binary_crossentropy'

# Hyper-parameters:
num_epochs = 200
batch_size = 512
es_param = {'min_delta': 0, 'patience': 40, 'consecutive_patience': False}
regularization = 'l2'
regul_param = 1e-06
input_dropout = 0
weights_init = 'glorot_uniform'

# Defining the optimizer:
default_adam = False
optimizer = 'adam'
opt_params = {
    'learning_rate': 0.001,
    'beta_1': 0.5,
    'beta_2': 0.9999,
    'epsilon': 1e-07
}

# Declare whether to export models:
export_model = False

# Declare whether the best model (True) or the last model (False) should be exported:
export_best_model = False

#### Early stopping estimation

In [None]:
start_time = datetime.now()

estimation_id = str(int(time.time()))
if export_model:
    os.makedirs('Models/{0}'.format(estimation_id))

nn_start_time = datetime.now()

# Lists to store results:
min_cost = []
epoch_min_cost = []
val_min_cost = []
val_epoch_min_cost = []
effective_num_epochs = []
test_roc_auc = []
test_avg_prec_score = []
test_brier_score = []

for t in range(n_estimations):
    # Creating neural network object, declaring its architecture and defining hyper-parameters:
    model = KerasNN(model_architecture = model_architecture, num_inputs = X_train.shape[1],
                     output_activation = output_activation, cost_function = cost_function,
                     num_epochs = num_epochs, batch_size = batch_size,
                     default_adam = default_adam, optimizer = optimizer, opt_params = opt_params,
                     regularization = regularization, regul_param = regul_param,
                     input_dropout = input_dropout,
                     weights_init = weights_init)

    # Training the model:
    model.run(train_inputs = X_train, train_output = y_train,
              val_inputs = X_val, val_output = y_val,
              test_inputs = X_test, test_output = y_test,
              verbose = 0, early_stopping = True, es_params = es_param, save_best_model = export_best_model)

    # Performance metrics on test data:
    test_roc_auc.append(roc_auc_score(y_test, [p[0] for p in model.predictions['test']]))
    test_avg_prec_score.append(average_precision_score(y_test, [p[0] for p in model.predictions['test']]))
    test_brier_score.append(brier_score_loss(y_test, [p[0] for p in model.predictions['test']]))

    # Effective number of epochs:
    effective_num_epochs.append(len(model.epoch_performance['epoch_val_roc_auc']))

    # Cost function by training epoch:
    model_costs = model.model_costs

    min_cost.append(model_costs.loss.min())
    epoch_min_cost.append(model_costs.loss.idxmin() + 1)
    val_min_cost.append(model_costs.val_loss.min())
    val_epoch_min_cost.append(model_costs.val_loss.idxmin() + 1)
    
    # Exporting the model:
    if export_model:
        if export_best_model:
            model.best_model.save('Models/{0}/best_model_{1}.h5'.format(estimation_id, t+1))
        else:
            model.model.save('Models/{0}/model_{1}.h5'.format(estimation_id, t+1))

# Assessing running time:
nn_end_time = datetime.now()

# Dictionary with information on model structure and performance:
model_assessment[estimation_id] = {
    'architecture': {
        'num_hidden_layers': len(model_architecture),
        'num_hidden_neurons': [model_architecture[l]['neurons'] for l in model_architecture.keys()],
        'hidden_activations': [model_architecture[l]['activation'] for l in model_architecture.keys()],
        'output_activation': output_activation,
        'cost_function': cost_function,
    },
    'hyper_parameters': {
        'num_epochs': num_epochs,
        'avg_effective_num_epochs': np.nanmean(effective_num_epochs),
        'batch_size': batch_size,
        'es_param': es_param,
        'regularization': regularization,
        'regul_param': regul_param,
        'input_dropout': input_dropout,
        'hidden_dropout': [model_architecture[l]['dropout_param'] for l in model_architecture.keys()],
        'default_adam': default_adam,
        'optimizer': optimizer,
        'opt_params': opt_params,
        'weights_init': weights_init,
    },
    'n_estimations': n_estimations,
    'performance_metrics': {
        'application': 'test',
        'avg_min_cost': np.nanmean(min_cost),
        'avg_epoch_min_cost': np.nanmean(epoch_min_cost),
        'avg_min_val_cost': np.nanmean(val_min_cost),
        'avg_epoch_min_val_cost': np.nanmean(val_epoch_min_cost),
        'avg_roc_auc': np.nanmean(test_roc_auc),
        'avg_avg_prec_score': np.nanmean(test_avg_prec_score),
        'avg_brier_score': np.nanmean(test_brier_score),
        'std_roc_auc': np.nanstd(test_roc_auc),
        'std_avg_prec_score': np.nanstd(test_avg_prec_score),
        'std_brier_score': np.nanstd(test_brier_score)
    },
    'running_time': str(round((nn_end_time - nn_start_time).total_seconds()/60, 2)) + ' minutes',
    "comment": '{0} Final estimation.'.format(model_architecture_def)
}

if export:
    with open('Datasets/model_assessment.json', 'w') as json_file:
        json.dump(model_assessment, json_file, indent=2)

# Assessing running time:
end_time = datetime.now()

print('------------------------------------')
print('\033[1mOverall running time:\033[0m ' + str(round((end_time - start_time).total_seconds()/60, 2)) +
      ' minutes.')
print('Start time: ' + start_time.strftime('%Y-%m-%d') + ', ' + start_time.strftime('%H:%M:%S'))
print('End time: ' + end_time.strftime('%Y-%m-%d') + ', ' + end_time.strftime('%H:%M:%S'))
print('\n')
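
The `performance_metrics` entry above summarizes the repeated estimations with `np.nanmean` and `np.nanstd`, which ignore missing values and, by default, use the population standard deviation (`ddof=0`). A minimal standard-library sketch of the same summary follows; the helper `summarize_runs` and the ROC-AUC values are made up for illustration.

```python
import math
from statistics import fmean, pstdev


def summarize_runs(values):
    """Hypothetical helper: NaN-aware mean and population std of a metric."""
    clean = [v for v in values if not math.isnan(v)]
    return {'avg': fmean(clean), 'std': pstdev(clean)}


# Made-up ROC-AUC values from four estimations, one of them missing:
runs_roc_auc = [0.94, 0.93, float('nan'), 0.95]
summary = summarize_runs(runs_roc_auc)
```

The missing run is dropped before averaging, so a single failed estimation does not turn the whole summary into NaN.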

#### Assessing results

In [85]:
estimation_ids = []
archs = []
num_layers = []
num_neurons = []
batch_sizes = []
effective_num_epochs = []
activations = []
regularizations = []
regul_params = []
input_dropouts = []
hidden_dropouts = []
weights_inits = []
optimizers = []
learning_rates = []
beta1s = []
beta2s = []
min_deltas = []
patiences = []
consecutive_patiences = []
avg_roc_auc = []
std_roc_auc = []
avg_prec = []
std_prec = []
ratio_roc_auc = []
ratio_prec = []
running_time = []

# Loop over estimations:
for est_id, e in model_assessment.items():
    # Keep only the final estimations:
    if 'Final estimation.' not in e['comment']:
        continue
    estimation_ids.append(est_id)
    archs.append(e['comment'].split(': ')[1])
    num_layers.append(e['architecture']['num_hidden_layers'])
    num_neurons.append(e['architecture']['num_hidden_neurons'])
    batch_sizes.append(e['hyper_parameters']['batch_size'])
    effective_num_epochs.append(e['hyper_parameters']['avg_effective_num_epochs'])
    
    activations.append(np.unique(e['architecture']['hidden_activations'])[0])
    
    regularizations.append(e['hyper_parameters']['regularization'])
    regul_params.append(e['hyper_parameters']['regul_param'])
    input_dropouts.append(e['hyper_parameters']['input_dropout'])
    hidden_dropouts.append(e['hyper_parameters']['hidden_dropout'])
    
    weights_inits.append(e['hyper_parameters']['weights_init'])
    
    optimizers.append(e['hyper_parameters']['optimizer'])
    learning_rates.append(e['hyper_parameters']['opt_params']['learning_rate'])
    beta1s.append(e['hyper_parameters']['opt_params']['beta_1'])
    beta2s.append(e['hyper_parameters']['opt_params']['beta_2'])
    
    min_deltas.append(e['hyper_parameters']['es_param']['min_delta'])
    patiences.append(e['hyper_parameters']['es_param']['patience'])
    consecutive_patiences.append(e['hyper_parameters']['es_param']['consecutive_patience'])

    avg_roc_auc.append(e['performance_metrics']['avg_roc_auc'])
    std_roc_auc.append(e['performance_metrics']['std_roc_auc'])
    avg_prec.append(e['performance_metrics']['avg_avg_prec_score'])
    std_prec.append(e['performance_metrics']['std_avg_prec_score'])
    ratio_roc_auc.append(e['performance_metrics']['avg_roc_auc']/(e['performance_metrics']['std_roc_auc'] + 1e-7))
    ratio_prec.append(e['performance_metrics']['avg_avg_prec_score']/(e['performance_metrics']['std_avg_prec_score'] + 1e-7))
    running_time.append(float(e['running_time'].split(' minutes')[0]))
    
# Dataframe with performance metrics by estimation:
metrics = pd.DataFrame(data={
    'estimation_id': estimation_ids,
    'architecture': archs,
    'num_layers': num_layers,
    'num_neurons': num_neurons,
    'batch_size': batch_sizes,
    'effective_num_epochs': effective_num_epochs,
    'activation_function': activations,
    'regularization': regularizations,
    'regul_param': regul_params,
    'input_dropout': input_dropouts,
    'hidden_dropout': hidden_dropouts,
    'weights_init': weights_inits,
    'optimizer': optimizers,
    'learning_rate': learning_rates,
    'beta1': beta1s,
    'beta2': beta2s,
    'min_delta': min_deltas,
    'patience': patiences,
    'consecutive_patience': consecutive_patiences,
    'avg_roc_auc': avg_roc_auc,
    'std_roc_auc': std_roc_auc,
    'avg_prec': avg_prec,
    'std_prec': std_prec,
    'ratio_roc_auc': ratio_roc_auc,
    'ratio_prec': ratio_prec,
    'running_time': running_time
})

metrics.sort_values('avg_roc_auc', ascending=False)

| | estimation_id | architecture | num_layers | num_neurons | batch_size | effective_num_epochs | activation_function | regularization | regul_param | input_dropout | ... | min_delta | patience | consecutive_patience | avg_roc_auc | std_roc_auc | avg_prec | std_prec | ratio_roc_auc | ratio_prec | running_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1615054094 | J1 = (num_inputs + num_outputs)*0.1. Final est... | 1 | [209] | 512 | 89.0725 | tanh | l2 | 1e-06 | 0 | ... | 0 | 40 | False | 0.938703 | 0.003113 | 0.457992 | 0.0131 | 301.538196 | 34.959909 | |
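
The `ratio_roc_auc` and `ratio_prec` columns divide the average of a metric by its standard deviation plus a small constant, so a higher ratio indicates performance that is both strong and stable across estimations. The sketch below recomputes both ratios from the rounded values displayed in the row above; the helper name `stability_ratio` is hypothetical, and because the inputs are rounded the results only approximate the reported ratios.

```python
def stability_ratio(avg, std, eps=1e-7):
    """Hypothetical helper: mean-to-std ratio, guarded against a zero std."""
    return avg / (std + eps)


# Rounded values taken from the results row above:
ratio_roc_auc = stability_ratio(0.938703, 0.003113)
ratio_prec = stability_ratio(0.457992, 0.0131)
```

The `eps` term mirrors the `1e-7` used in the notebook and keeps the ratio finite when every estimation yields the same score.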


#### Performance metrics by epoch of training

In [49]:
# Create figure:
fig = make_subplots(specs=[[{'secondary_y': True}]])

# Create the plot (first axis):
fig.add_trace(
    go.Scatter(x=[i+1 for i in range(len(model.epoch_performance['epoch_val_roc_auc']))],
               y=model.epoch_performance['epoch_val_roc_auc'], name='Val ROC-AUC',
               hovertemplate =
                'ROC-AUC = %{y:.4f}<br>'+
                'epoch = %{x}<br>'
              ),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=[i+1 for i in range(len(model.epoch_performance['epoch_val_avg_prec_score']))],
               y=model.epoch_performance['epoch_val_avg_prec_score'], name='Val avg precision',
               hovertemplate = 'Avg precision = %{y:.4f}<br>'+
                               'epoch = %{x}<br>',
               marker_color='orange',
               mode='lines'
              ),
    secondary_y=True,
)

# Changing layout:
fig.update_layout(
    title_text='Validation performance by epoch of training<br>' +
#                'Patience parameter = {0}<br>'.format(es_param['patience']) +
               'Effective number of epochs = {0}'.format(len(model.epoch_performance['epoch_val_roc_auc'])),
    width=700,
    height=400
)

# Set labels:
fig.update_xaxes(title_text='epoch')
fig.update_yaxes(title_text='performance', secondary_y=False)

fig.show()