# Offline decoding

In this notebook, I compare the performance of multiple decoders on the same test set. Every test is repeated K times (default: 10), with a different training set at each iteration. The same K training sets are also used to tune the hyperparameters, a step described in the notebook [decoders_hyperparameters_optimization](./decoders_hyperparameters_optimization.ipynb).
 - Sections 1-3 import packages, load the files, and preprocess the data
 - Section 4 tests the performance on the test set of the 6 decoders: DNN, RNN, GRU, LSTM, CNN, EEGNet
 - Section 5 displays plots to compare the outcomes

## 1. Import Packages

Below, we import both standard packages and functions from the accompanying .py files.

In [1]:
#Add the parent directory to the Python path so that the utils package can be imported
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [2]:
#Import standard packages
import numpy as np
np.random.seed(27) # Seed is important for having the same test set for performance comparison
import pickle as pkl
import json
from utils.functions import *
from math import ceil

#Import tensorflow
import tensorflow as tf
tf.get_logger().setLevel('ERROR')
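physical_devices = tf.config.list_physical_devices('GPU')
# Enable memory growth only if a GPU is present (indexing an empty list would raise IndexError)
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)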

#Import graphics
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
%matplotlib inline
import seaborn as sns

#Import functions to load and process the dataset
from utils.data_processing import * 
from sklearn.preprocessing import LabelEncoder

#Import metrics
from sklearn.metrics import accuracy_score, r2_score, log_loss

#Import decoder functions
from utils.decoders import *

In [3]:
#Turn off deprecation warnings

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 

## 2. Load Data
The neural data are stored as a NumPy array of windows, each of size "number of neurons" x "number of time bins", where every entry is the firing rate of a given neuron in a given time bin. Each window is associated with the object grasped in that trial.

### 2A. User Inputs
It is possible to define the file (i.e. the monkey) and the epoch to use as input to the decoder. In addition, the class ObjectSelector defines which objects to include and whether to group all the different sizes into a single class. K is the number of repetitions.

In [15]:
FILE = 'MRec40'  # MRec40, ZRec50 or ZRec50_Mini
PATH = f'../data/Objects Task DL Project/{FILE}.neo.mat' 
EPOCH = 'cue'
K = 5

selector = ObjectSelector()
new_classes = selector.get_non_special(group_labels=False)

### 2B. Load data

In [16]:
measurements, objects = load_dataset(PATH, EPOCH)

## 3. Preprocess Data
It is possible to define whether to apply one-hot encoding and whether to normalize the number of elements for each class in the dataset.
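As a rough illustration of these two options, the sketch below one-hot encodes string labels and equalizes the class counts by downsampling. It is not the implementation of `preprocess_dataset`: the helper names `one_hot` and `balance_classes`, and the choice of downsampling every class to the size of the smallest one, are assumptions made for this example.

```python
import numpy as np

def one_hot(labels):
    """Map labels to one-hot rows; also return the class list so argmax can be decoded."""
    classes, idx = np.unique(labels, return_inverse=True)
    return np.eye(len(classes), dtype=int)[idx], classes

def balance_classes(X, labels, seed=27):
    """Hypothetical balancing: keep the same number of windows for every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    # Randomly keep counts.min() windows of each class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=counts.min(), replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original trial order
    return X[keep], labels[keep]

# Toy data: 6 windows of 2 neurons x 3 time bins, with unbalanced object IDs
labels = np.array(['21', '22', '21', '23', '22', '21'])
X_demo = np.arange(len(labels) * 2 * 3).reshape(len(labels), 2, 3)

X_bal, labels_bal = balance_classes(X_demo, labels)
Y_demo, classes = one_hot(labels_bal)
decoded = classes[Y_demo.argmax(axis=1)]  # inverse mapping, as done with argmax in the next cell
```

Decoding with `argmax` mirrors the `label_encoder.inverse_transform(Y.argmax(axis=1))` call used in the cell that follows.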

In [17]:
# Preprocessing measurements
label_encoder = LabelEncoder()
X, Y = preprocess_dataset(measurements, objects, labelled_classes=new_classes, one_hot_encoder=label_encoder, norm_classes=False)

(_, channels, window) = X.shape
outputs = Y.shape[1]

print('X shape and Y shape')
print(X.shape, Y.shape)
unique_y, n_repetition = np.unique(label_encoder.inverse_transform(Y.argmax(axis=1)), return_counts=True, axis=0)
print('object ID and repetition in the dataset:' )
for elem in zip(unique_y, n_repetition):
    print(elem, end='\t')

X shape and Y shape
(432, 552, 10) (432, 36)
object ID and repetition in the dataset:
('21', 12)	('22', 12)	('23', 12)	('24', 12)	('25', 12)	('26', 12)	('31', 12)	('32', 12)	('33', 12)	('34', 12)	('35', 12)	('36', 12)	('41', 12)	('42', 12)	('43', 12)	('44', 12)	('45', 12)	('46', 12)	('51', 12)	('52', 12)	('53', 12)	('54', 12)	('55', 12)	('56', 12)	('61', 12)	('62', 12)	('63', 12)	('64', 12)	('65', 12)	('66', 12)	('71', 12)	('72', 12)	('73', 12)	('74', 12)	('75', 12)	('76', 12)	

### 3B. Split into training / testing / validation sets
Note that hyperparameters should be determined using a separate validation set.
The goodness of fit should then be tested on a testing set that is separate from both the training and validation sets.
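The idea of K repetitions that share one test set can be sketched as follows. This is only a guess at what `split_sets` does, since its code lives in the accompanying .py files: the name `split_sets_sketch`, the seed, and the rounding (chosen so that 432 windows give 302 / 65 / 65) are assumptions of this example.

```python
from math import ceil
import numpy as np

def split_sets_sketch(X, Y, tr_split=0.7, val_split=0.15, repetitions=5, seed=27):
    """Hold out one fixed test set, then draw K different train/validation partitions."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_train, n_val = int(tr_split * n), ceil(val_split * n)

    # The test indices are drawn once and never reused for training or validation
    order = rng.permutation(n)
    pool, test = order[:n_train + n_val], order[n_train + n_val:]

    x_train, y_train, x_val, y_val = [], [], [], []
    for _ in range(repetitions):
        shuffled = rng.permutation(pool)  # new partition of the remaining windows
        tr, va = shuffled[:n_train], shuffled[n_train:]
        x_train.append(X[tr])
        y_train.append(Y[tr])
        x_val.append(X[va])
        y_val.append(Y[va])
    return (x_train, y_train), (x_val, y_val), (X[test], Y[test])

# Toy data: each "window" holds its own index so that leakage is easy to check
X_demo = np.arange(432).reshape(432, 1, 1)
Y_demo = np.eye(36, dtype=int)[np.arange(432) % 36]
(xtr, ytr), (xva, yva), (xte, yte) = split_sets_sketch(X_demo, Y_demo)
```

Keeping the test indices fixed across repetitions is what makes the per-repetition scores comparable: only the training data change.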

In [18]:
(x_train, y_train), (x_val, y_val), (x_test, y_test) = split_sets(X, Y, tr_split=0.7, val_split=0.15, repetitions=K)
print(f'number of repetitions K: {K}')
print(f'train: {K}*{np.array(x_train).shape[1:]} -- validation: {K}*{np.array(x_val).shape[1:]} -- test: {np.array(x_test).shape}')

number of repetitions K: 5
train: 5*(302, 552, 10) -- validation: 5*(65, 552, 10) -- test: (65, 552, 10)


## 4. Test the decoders on the test set
Each decoder is built with the hyperparameters stored in `../utils/hyperparameters.json`, retrained from scratch on each of the K training sets, and scored on the common test set using accuracy and log loss.
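The accuracy and log loss reported for each decoder in this section can be sketched in plain NumPy, which also gives two reference points for reading the results. With 36 equally represented classes, chance accuracy is 1/36 (about 0.028), and a predictor that outputs a uniform distribution has a log loss of ln 36 (about 3.58). The helper names below are hypothetical; the notebook itself uses `accuracy_score` and `log_loss` from scikit-learn.

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of windows whose most probable class matches the true class."""
    return float(np.mean(y_true_onehot.argmax(axis=1) == y_prob.argmax(axis=1)))

def cross_entropy(y_true_onehot, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class (natural log)."""
    p = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true_onehot * np.log(p), axis=1)))

n_classes = 36
y_true = np.eye(n_classes)[np.arange(72) % n_classes]  # two windows per class
uniform = np.full((72, n_classes), 1.0 / n_classes)    # uninformative predictor

chance_accuracy = accuracy(y_true, uniform)  # argmax ties resolve to class 0
chance_loss = cross_entropy(y_true, uniform)
```

Note that `np.ndarray.std()`, as used in the evaluation cell, computes the population standard deviation (`ddof=0`) over the K repetitions.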

In [19]:
def get_decoder(n_samples, n_channels, n_outputs, decoder_name, config_dict):
    # Build the requested decoder with its hyperparameters; returns (None, None) for an unknown name
    (my_network, my_params) = (None, None)

    if decoder_name == 'DenseNN':
        my_params = config_dict['dnn']
        my_network = DenseNNClassification(n_samples, n_channels, n_outputs,
                                           units=int(my_params['num_units']),
                                           dropout=my_params['frac_dropout'])

    if decoder_name == 'SimpleRNN':
        my_params = config_dict['rnn']
        my_network = SimpleRNNClassification(n_samples, n_channels, n_outputs,
                                             units=int(my_params['num_units']),
                                             dropout=my_params['frac_dropout'])

    if decoder_name == 'GRU':
        my_params = config_dict['gru']
        my_network = GRUClassification(n_samples, n_channels, n_outputs,
                                       units=int(my_params['num_units']),
                                       dropout=my_params['frac_dropout'])

    if decoder_name == 'LSTM':
        my_params = config_dict['lstm']
        my_network = LSTMClassification(n_samples, n_channels, n_outputs,
                                        units=int(my_params['num_units']),
                                        dropout=my_params['frac_dropout'])

    if decoder_name == 'CNN':
        my_params = config_dict['cnn']
        my_network = CNNClassification(n_samples, n_channels, n_outputs,
                                       filters=int(my_params['num_filters']),
                                       size=(int(my_params['kernel_size_1']),
                                            int(my_params['kernel_size_2'])),
                                       dropout=my_params['frac_dropout'],
                                       pool_size=2)

    if decoder_name == 'EEGNet':
        # EEGNet hyperparameters are hard-coded below; the config lookup is disabled
        # my_params = config_dict['eeg_net']
        my_params = {'fac_dropout': 0.1, 'n_epochs': 10, 'batch_size': 6}
        my_network = EEGNet(dropout=my_params['fac_dropout'])

    if decoder_name == 'EEGNetv2':
        my_params = config_dict['eegnet2']
        my_network = EEGNetv2(n_channels, n_outputs,
                              filters=[my_params['n_filters_1'], my_params['n_filters_2'], my_params['n_filters_3']],
                              filters_size=[int(my_params['size_1']), int(my_params['size_2']), int(my_params['size_3'])],
                              dropout=my_params['frac_dropout'],
                              units=int(my_params['n_units']),
                              neurons=my_params['n_neurons'])

    return my_network, my_params

In [20]:
networks_to_try = ['DenseNN', 'SimpleRNN', 'GRU', 'LSTM', 'CNN', 'EEGNetv2']
with open('../utils/hyperparameters.json', 'r') as f:
    config = json.load(f)

In [None]:
for net_name in networks_to_try:
    print(f'### {net_name} ###')
    network, params = get_decoder(window, channels, outputs, net_name, config)
    #network.model.summary()

    test_accuracy = []
    test_loss = []
    print('Repetition: ', end='\t')
    for k in range(K):
        network.reset_weights()
        network.fit(x_train[k], y_train[k], num_epochs=int(params['n_epochs']),
                    batch_size=int(params['batch_size']))
        prediction = network.predict(x_test)
        test_accuracy.append(accuracy_score(y_true=y_test.argmax(axis=1), y_pred=prediction.argmax(axis=1)))
        test_loss.append(log_loss(y_test, prediction))
        print(f'{k + 1}/{K} [{round(test_accuracy[-1], 3)}]', end='\t')
    test_accuracy = np.array(test_accuracy)
    test_loss = np.array(test_loss)
    print(f'\nAccuracy [mean | std] : {test_accuracy.mean()} | {test_accuracy.std()}')
    print(f'Loss [mean | std] : {test_loss.mean()} | {test_loss.std()}\n')

### DenseNN ###
Repetition: 	1/5 [0.215]	2/5 [0.169]	3/5 [0.185]	4/5 [0.185]	5/5 [0.185]	
Accuracy [mean | std] : 0.18769230769230769 | 0.015073783032511869
Loss [mean | std] : 2.346352003904489 | 0.04533938683421757

### SimpleRNN ###
Repetition: 	1/5 [0.123]	2/5 [0.108]	3/5 [0.138]	4/5 [0.185]	5/5 [0.138]	
Accuracy [mean | std] : 0.13846153846153847 | 0.02574338543181771
Loss [mean | std] : 2.476368814615103 | 0.05254124487756506

### GRU ###
Repetition: 	1/5 [0.231]	2/5 [0.231]	3/5 [0.154]	4/5 [0.292]	5/5 [0.292]	
Accuracy [mean | std] : 0.24000000000000005 | 0.05111768531026508
Loss [mean | std] : 2.0739410378382757 | 0.057762445793382786

### LSTM ###
Repetition: 	