# Results

This notebook summarizes the best results obtained so far.

## Setup

In [1]:
import os
import sys
if os.getcwd().endswith('notebooks'):
    project_path = os.path.abspath(os.path.join('..'))
    if project_path not in sys.path:
        sys.path.append(project_path)
    os.chdir(os.pardir)

from src.models import train
from src.models.model_utils import (load_dataset, save_model, load_model,
                                    redefine_labels, standardize_per_example)
from src.models import spectrogram_models
from src.models import signal_models
from src.models import scalogram_models

In [2]:
from tensorflow.keras.utils import plot_model

## Best results

The metric used to assess the performance of a model is the F1 score:

$$ \mathrm{F1\ score} = 2 \cdot \dfrac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} $$
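
As a quick illustration of the formula above, here is a minimal sketch in plain Python. The function name `f1_score` is chosen for this example only and is not taken from the project code. Returning 0.0 when precision and recall are both zero is an assumed convention, since the formula itself is undefined in that case.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    # Assumed convention: report 0.0 when both inputs are zero,
    # because the formula would otherwise divide by zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Perfect precision with 50% recall gives an F1 of about 0.667.
print(round(f1_score(1.0, 0.5), 3))
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two values: a model cannot reach a high F1 by maximizing only precision or only recall.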

Here is a summary of the best results so far, extracted from the ```results``` CSV file.

| Model      | Disease | Dataset            | Epochs | Learning rate | Batch size | Train f1 | Dev f1 | Test f1 | Name                            |
|------------|---------|--------------------|--------|---------------|------------|----------|--------|---------|---------------------------------|
| sig_CNN_1  | als     | signals-dataset    | 300    | 0.0001        | 64         | 0.8887   | 0.7833 | 1.0000  | sig_CNN_1_als_201904201054      |
| scal_CNN_1 | control | scalograms-dataset | 300    | 0.0001        | 64         | 1.0000   | 0.9256 | 0.9148  | scal_CNN_1_control_201904201052 |
| scal_CNN_1 | hunt    | scalograms-dataset | 300    | 0.0001        | 64         | 0.9980   | 0.8398 | 1.0000  | scal_CNN_1_hunt_201904201042    |
| sig_CNN_1  | park    | signals-dataset    | 500    | 0.0001        | 64         | 0.9087   | 0.8978 | 0.0000  | sig_CNN_1_park                  |

It is surprising that in some cases, test performance is better than performance on dev and train sets.

However, the test set, which is made up of one patient per disease, might be too small to give an accurate estimate of the models' performance on unseen examples. In fact, if the models shown above are retrained with the same configuration, the test performance might not be exactly the same.
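
To see why a small test set makes F1 unstable, the sketch below computes F1 directly from confusion-matrix counts for a purely hypothetical test set with four positive examples and no false positives. The counts are illustrative and do not come from the project's data; `f1_from_counts` is a name introduced for this example.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written in terms of counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # Assumed convention: 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical test set: 4 positive examples, no false positives.
n_pos = 4
scores = []
for missed in range(n_pos + 1):
    tp = n_pos - missed
    scores.append(round(f1_from_counts(tp, fp=0, fn=missed), 4))

print(scores)
```

With so few examples, only a handful of F1 values are attainable, and a single additional error moves the score by a large step. This is one plausible reason why test scores such as 1.0000 or 0.0000 should be read with caution.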

Now let's see the two architectures that have given the best results:

## One-dimensional convolutional model ```sig_CNN_1```

This architecture takes as input the normalized (and downsampled) 1D signal segments of length 900.

In [3]:
# visualize models
model = signal_models.sig_CNN_1()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d (Conv1D)              (None, 891, 32)           352       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 882, 32)           10272     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 294, 32)           0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 285, 64)           20544     
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 276, 64)           41024     
_________________________________________________________________
global_average_pooling1d (Gl (None, 64)                0         
_________________________________________________________________
dropout (Dropout)            (None, 64)                0         
__________

In [5]:
plot_model(model, to_file='img/sig_CNN_1.png', show_shapes=True, show_layer_names=True)

Here is a visualization of the model:
<img src="../img/sig_CNN_1.png" alt="model1D" width="300"/>
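
The output shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes a kernel size of 10, stride 1, no padding, and a pooling size of 3. These values are inferred from the printed summary, not read from the definition of `sig_CNN_1`, so treat them as assumptions.

```python
def conv1d_out_len(length: int, kernel_size: int) -> int:
    """Output length of a stride-1 convolution without padding."""
    return length - kernel_size + 1


def conv1d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


KERNEL = 10  # assumption inferred from the summary
POOL = 3     # assumption inferred from the summary

length, channels = 900, 1  # input segments of length 900, one channel
report = []                # (output length, filters, parameters) per conv layer

for filters in (32, 32):
    params = conv1d_params(KERNEL, channels, filters)
    length = conv1d_out_len(length, KERNEL)
    report.append((length, filters, params))
    channels = filters

length //= POOL  # max pooling: 882 -> 294

for filters in (64, 64):
    params = conv1d_params(KERNEL, channels, filters)
    length = conv1d_out_len(length, KERNEL)
    report.append((length, filters, params))
    channels = filters

for row in report:
    print(row)
```

Under these assumptions the computed lengths (891, 882, 285, 276) and parameter counts (352, 10272, 20544, 41024) match the four `Conv1D` rows of the summary.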

## Two-dimensional convolutional model ```scal_CNN_1```

This architecture takes as input the one-channel 100x100 images representing the signal scalogram.

In [8]:
model = scalogram_models.scal_CNN_1()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 100, 100, 32)      832       
_________________________________________________________________
activation (Activation)      (None, 100, 100, 32)      0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 50, 50, 64)        51264     
_________________________________________________________________
activation_1 (Activation)    (None, 50, 50, 64)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 25, 25, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 25, 25, 64)        102464    
__________

In [9]:
plot_model(model, to_file='img/scal_CNN_1.png', show_layer_names=True, show_shapes=True)

Here is a visualization of the model:
<img src="../img/scal_CNN_1.png" alt="model2D" width="300"/>
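
The same kind of check works for the 2D model. The sketch below assumes 5x5 kernels, padding that preserves the spatial size, and 2x2 max pooling. These values are inferred from the visible part of the summary rather than from the definition of `scal_CNN_1`, and only the layers shown before the summary is cut off are covered.

```python
def conv2d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """Weights of a square kernel plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


KERNEL = 5  # assumption inferred from the summary
POOL = 2    # assumption inferred from the summary

# Layers visible in the summary (activation layers have no parameters).
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]

size, channels = 100, 1  # one-channel 100x100 scalogram
rows = []                # (spatial size, filters, parameters) per conv layer

for kind, filters in layers:
    if kind == "conv":
        # Size-preserving padding is assumed, so `size` is unchanged.
        rows.append((size, filters, conv2d_params(KERNEL, channels, filters)))
        channels = filters
    else:
        size //= POOL

for row in rows:
    print(row)
```

Under these assumptions the parameter counts (832, 51264, 102464) and spatial sizes (100, 50, 25) agree with the `Conv2D` rows shown in the summary.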

## Remarks

Some of the **conclusions** from these results, as well as from the performance obtained in other experiments, are:

1. The 2D convolutional models trained on scalograms (particularly the scal_CNN_1 model) work relatively well for predicting Control and Huntington’s disease.

2. The 1D convolutional model trained on signal segments works reasonably well for predicting ALS. Its performance is comparable to that of the same model when predicting the other classes.

3. None of the models trained on spectrograms yields better performance than the models trained on scalograms.

4. No model right now is able to accurately classify Parkinson’s disease on the test set.

5. Training for more epochs does not improve performance on the test set; it only further overfits the train set.