# 4. Validation (& Testing)

When building a predicitive pipeline, there are a lot of hyperparameters which one can choose from. During validation we try to asses whether we have found a suitable of hyperparameters. The basic idea of any validation method is to split the training data into a two separate datasets, one of which is called the validation set. 

During training, the validation set is kept seperate from the training data and, once training is completed, is fed through the trained network in order to get predictions. This validation score will give you an intuition, whether the change in hyperparameters you have made has led to an improvement in predictive performance.

Within this notebook we will introduce to you four popular methods on how to perform validation within the Human Activity Recognition community.

These methods are:
1. Train-valid split
2. k-fold cross-validation
3. Per-participant cross-validation
4. Cross-participant cross-validation

Solely relying and tuning based on the validation scores would inherit what is called overfitting. Overfitting means that your trained model would be too well optimized on the validation set and thus not general anymore resulting in bad prediction performance on unseen data. Assessing how your model would perform on completly unseen data is called testing. The test set is a separate dataset which is kept separate from the training and validation loop. It is only used to gain insights on the predictive performance of the model and must not (!) be used as a reference for tuning hyperparameters.

We thus begin by splitting the original dataset into two separate datasets, one used for the validation loop and one used for testing. There are multiple ways how to split the data into the two respective datasets, for example:

- **Subject-wise:** split according to participants within the dataset. This means that we are reserving certain subjects to be included in the train + valid and test set respectively. For example, given that there are a total of 10 subjects, you could use 8 subjects for trainig + validation and 2 for testing.
- **Percentage-wise:** state how large percentage-wise your train + valid and test dataset should be compared to the full dataset. For example, you could use 80% of your data for training and validation and 20% for testing. The two splits can then be chosen to be stratified, meaning that the relative label distribution within each of the two dataset is kept the same as in the full dataset.
- **Record-wise:** state how many records should be in your train + valid and test dataset should be contained, i.e. define a cutoff point. For example, given that there are 1 million records in your full dataset, you could have the first 800 thousand records to be contained in train + valid dataset and the remaining 200 thousand records to be contained in the test dataset.

For this notebook we will be using the data of three subjects instead of just one. For simplicity purposes we will use the first two subjects for training + validation and the remaining subject for testing.

**WARNING:** splitting your dataset record-wise, percentage-wise and/ or applying shuffling during splitting (which is needed for stratified splits) will destroy your time-dependencies among the data records. To minimize this effect, apply a sliding window on top of your data before splitting. This way, time-dependencies will at least be preserved within the windows.

**WARNING FOR COLAB USERS:**  
- Set use_colab to True if you are accessing this notebook
- Change your runtime time to GPU by clicking: Runtime -> Change runtime type -> Dropdown -> GPU -> Save

In [26]:
import os
import sys

use_colab = False

module_path = os.path.abspath(os.path.join('..'))

if use_colab:
    # clone package repository
    !git clone https://github.com/mariusbock/dl-for-har.git

    # navigate to dl-for-har directory
    %cd dl-for-har/

    # get modifications made on the repo
    !git pull origin master
else:
    os.chdir(module_path)
    
# this statement is needed so that we can use the methods of the DL-ARC pipeline
if module_path not in sys.path:
    sys.path.append(module_path)

### Task 1: Loading the dataset and splitting off the test data

1. Load the 'rwhar_3sbjs_data.csv' as done in previous notebooks.
2. Split the loaded dataset into a train_valid and test dataset based on the split criteria defined above, i.e. subjects with identifiers 0 and 1 to be in the train_valid and the subject with identifier 2 to be in the test dataset.

In [22]:
import pandas as pd


# folder where the data is located and name of dataset
data_folder = 'data'
dataset = 'rwhar_3sbjs_data.csv'

# read in the data using the pandas read_csv function; define the header as done previously
data = pd.read_csv(os.path.join(data_folder, dataset), names=['subject_id', 'acc_x', 'acc_y', 'acc_z', 'activity_label'])
print(data.head())

# label dictionary needed for converting the string label names to integers 
label_dict = {
    'climbing_down': 0,
    'climbing_up': 1,
    'jumping': 2,
    'lying': 3,
    'running': 4,
    'sitting': 5,
    'standing': 6,
    'walking': 7
}

# all activity names
class_names = ['climbing_down', 'climbing_up', 'jumping', 'lying', 'running', 'sitting', 'standing', 'walking']

# replace values within the 'activity_label' column using the label_dict (use .replace())
data['activity_label'] = data['activity_label'].replace(label_dict) 

# define the train + valid data to be the data of the first two subjects
train_valid_data = data[data.subject_id <= 1]
# define the test data to be the data of the third subject
test_data = data[data.subject_id == 2]

print(train_valid_data.shape, test_data.shape)

   subject_id     acc_x      acc_y     acc_z activity_label
0           0  0.378284  10.168175  0.847547    climbing_up
1           0  0.383671  10.172364  0.849942    climbing_up
2           0  0.372298  10.181941  0.859518    climbing_up
3           0  0.342969  10.170568  0.834379    climbing_up
4           0  0.319626  10.159795  0.818817    climbing_up
(430456, 5) (228804, 5)


Within the next part we will go step by step explaining all the different validation methods there are when dealing with Human Activity Recognition. Note that all these methods are featured in the DL-ARC and can be interchanged within the main script. Throughout all experiments, we will use the configurations as listed below. As you already know some of these parameters from the training notebook, feel free to adjust the configurations. 

In [14]:
config = {
    # sliding window settings
    'sw_length': 50,
    'sw_unit': 'units',
    'sampling_rate': 50,
    'sw_overlap': 30,
    # network settings
    'nb_conv_blocks': 2,
    'conv_block_type': 'normal',
    'nb_filters': 64,
    'filter_width': 11,
    'nb_units_lstm': 128,
    'nb_layers_lstm': 1,
    'drop_prob': 0.5,
    # training settings
    'epochs': 10,
    'batch_size': 100,
    'loss': 'cross_entropy',
    'use_weights': True,
    'weights_init': 'xavier_uniform',
    'optimizer': 'adam',
    'lr': 1e-4,
    'weight_decay': 1e-6,
    # general settings (do not alter!)
    'batch_norm': False,
    'dilation': 1,
    'pooling': False,
    'pool_type': 'max',
    'pool_kernel_width': 2,
    'reduce_layer': False,
    'reduce_layer_output': 10,
    'nb_classes': 8,
    'seed': 1,
    'gpu': 'cuda:0',
    'verbose': False,
    'print_freq': 10,
    'save_gradient_plot': False,
    'print_counts': False,
    'adj_lr': False,
    'adj_lr_patience': 5,
    'early_stopping': False,
    'es_patience': 5,
    'save_test_preds': False
}

## 4.1. Train-Valid Split

The most basic validation method is to split your train + validation data, like for the test set, into two separate datasets. As mentioned above there are multiple ways how to do so. For simplicity purposes, we will use a subject-wise split within this notebook.

### Task 2: Implementing the train-valid validation loop

1. Split the train_valid dataset into the train and valid dataset. The former shall contain all data related to the subject with the identifier 0 and the latter shall contain all data related to the subject with the identifier 1.
2. Apply a sliding window on top of both datasets. You can use the predefined method 'apply_sliding_window', which is part of the DL-ARC pipeline, to do so. The funtion needs both features and labels as input and will return you a windowed version of both.
3. Using the windowed features and labels of both the train and valid set to train a model and obtain validation results.

In [3]:
import time
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, jaccard_score
from model.train import train
from model.DeepConvLSTM import DeepConvLSTM
from data_processing.sliding_window import apply_sliding_window

# needed for saving results
log_date = time.strftime('%Y%m%d')
log_timestamp = time.strftime('%H%M%S')

# split the data into train and validation data
train_data = train_valid_data[train_valid_data.subject_id == 0]
valid_data = train_valid_data[train_valid_data.subject_id == 1]

print(train_data.shape, valid_data.shape)

# apply the sliding window on top of both the train and validation data; use the "apply_sliding_window" function
# found in data_processing.sliding_window
X_train, y_train = apply_sliding_window(train_data.iloc[:, :-1], train_data.iloc[:, -1],
                                        sliding_window_size=config['sw_length'],
                                        unit=config['sw_unit'],
                                        sampling_rate=config['sampling_rate'],
                                        sliding_window_overlap=config['sw_overlap'],
                                        )

print(X_train.shape, y_train.shape)

X_valid, y_valid = apply_sliding_window(valid_data.iloc[:, :-1], valid_data.iloc[:, -1],
                                        sliding_window_size=config['sw_length'],
                                        unit=config['sw_unit'],
                                        sampling_rate=config['sampling_rate'],
                                        sliding_window_overlap=config['sw_overlap'],
                                        )

print(X_valid.shape, y_valid.shape)

# omit the first feature column (subject_identifier) from the train and validation dataset
X_train, X_valid = X_train[:, :, 1:], X_valid[:, :, 1:]

# within the config file, set the parameters 'window_size' and 'nb_channels' accordingly
# window_size = size of the sliding window in units
# nb_channels = number of feature channels
config['window_size'] = X_train.shape[1]
config['nb_channels'] = X_train.shape[2]

# define the network to be a DeepConvLSTM object; can be imported from model.DeepConvLSTM
# pass it the config object
net = DeepConvLSTM(config=config)

# convert the features of the train and validation to float32 and labels to uint8 for GPU compatibility 
X_train, y_train = X_train.astype(np.float32), y_train.astype(np.uint8)
X_valid, y_valid = X_valid.astype(np.float32), y_valid.astype(np.uint8)

# feed the datasets into the train function; can be imported from model.train
train_valid_net, val_output, train_output = train(X_train, y_train, X_valid, y_valid,
                                                  network=net, 
                                                  config=config, 
                                                  log_date=log_date,
                                                  log_timestamp=log_timestamp)

# the next bit prints out your results if you did everything correctly
cls = np.array(range(config['nb_classes']))

print('\nVALIDATION RESULTS: ')
print("\nAvg. Accuracy: {0}".format(jaccard_score(val_output[:, 1], val_output[:, 0], average='macro')))
print("Avg. Precision: {0}".format(precision_score(val_output[:, 1], val_output[:, 0], average='macro')))
print("Avg. Recall: {0}".format(recall_score(val_output[:, 1], val_output[:, 0], average='macro')))
print("Avg. F1: {0}".format(f1_score(val_output[:, 1], val_output[:, 0], average='macro')))

print("\nVALIDATION RESULTS (PER CLASS): ")
print("\nAccuracy:")
for i, rslt in enumerate(jaccard_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
    print("   {0}: {1}".format(class_names[i], rslt))
print("\nPrecision:")
for i, rslt in enumerate(precision_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
    print("   {0}: {1}".format(class_names[i], rslt))
print("\nRecall:")
for i, rslt in enumerate(recall_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
    print("   {0}: {1}".format(class_names[i], rslt))
print("\nF1:")
for i, rslt in enumerate(f1_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
    print("   {0}: {1}".format(class_names[i], rslt))

print("\nGENERALIZATION GAP ANALYSIS: ")
print("\nTrain-Val-Accuracy Difference: {0}".format(jaccard_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                  jaccard_score(val_output[:, 1], val_output[:, 0], average='macro')))
print("Train-Val-Precision Difference: {0}".format(precision_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                   precision_score(val_output[:, 1], val_output[:, 0], average='macro')))
print("Train-Val-Recall Difference: {0}".format(recall_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                recall_score(val_output[:, 1], val_output[:, 0], average='macro')))
print("Train-Val-F1 Difference: {0}".format(f1_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                            f1_score(val_output[:, 1], val_output[:, 0], average='macro')))

(214659, 5) (168565, 5)
(6132, 50, 4) (6132,)
(4815, 50, 4) (4815,)
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |
|  lstm_layers.0.bias_hh_l0  |    512     |
|         fc.weight          |    1024    |
|          fc.bias           |     8      |
+----------------------------+------------+
Total Params: 302024
Applied weighted class weights: 
[1.06163435 0.99674902 6.33471074 0.85930493 0.88103448 0.81716418
 0.84046053 0.84416

## 4.2. K-Fold Cross-Validation

The k-fold cross-validation is the most popular form of cross-validation. In it the train + valid dataset is split into k folds, i.e. k chunks of data. Having the data split the training process is repeated k times with each time having one fold as being the validation set and all other folds being the training set. Results are then averaged across folds.

**Note:** It is recommended to use stratified folds, i.e. each fold has the same distribution of labels as the original full dataset. This avoids the risk, especially for unbalanced datasets, of having certain labels missing within the train dataset, which would cause the validation process to break as it would see unknown labels during prediction time. Nevertheless, as also stated above, when using stratifed splits, which inheritly require shuffling, one must always first apply the sliding window before applying the split. Doing so one can at least minimize the destroyed time-dependencies among records by at least maintaining them within each window.

### Task 3: Implementing the k-fold CV loop 

1. Define the stratified k-fold object.
2. Apply the sliding window on top of the train + valid data and omit the subject identifier column
3. Define the k-fold loop; use the split function of the stratified k-fold object to obtain indeces to split the train + valid data
4. Run the train function and add up obtained results to the accumulated result objects

In [4]:
from sklearn.model_selection import StratifiedKFold


# number of splits, i.e. folds
config['splits_kfold'] = 10

# needed for saving results
log_date = time.strftime('%Y%m%d')
log_timestamp = time.strftime('%H%M%S')

# define the stratified k-fold object; it is already imported for you
# pass it the number of splits, i.e. folds, and seed as well as set shuffling to true
skf = StratifiedKFold(n_splits=config['splits_kfold'],
                      shuffle=True,
                      random_state=config['seed'])
    
    
print(train_valid_data.shape)

# apply the sliding window on top of both the train + valid data; use the "apply_sliding_window" function
# found in data_processing.sliding_window
X_train_valid, y_train_valid = apply_sliding_window(train_valid_data.iloc[:, :-1], train_valid_data.iloc[:, -1],
                                                    sliding_window_size=config['sw_length'],
                                                    unit=config['sw_unit'],
                                                    sampling_rate=config['sampling_rate'],
                                                    sliding_window_overlap=config['sw_overlap'])

print(X_train_valid.shape, y_train_valid.shape)

# omit the first feature column (subject_identifier) from the train + valid dataset
X_train_valid = X_train_valid[:, :, 1:]

# result objects used for accumulating the scores across folds; add each fold result to these objects so that they
# are averaged at the end of the k-fold loop
kfold_accuracy = np.zeros(config['nb_classes'])
kfold_precision = np.zeros(config['nb_classes'])
kfold_recall = np.zeros(config['nb_classes'])
kfold_f1 = np.zeros(config['nb_classes'])
    
kfold_accuracy_gap = 0
kfold_precision_gap = 0
kfold_recall_gap = 0
kfold_f1_gap = 0

# k-fold validation loop; for each loop iteration return fold identifier and indeces which can be used to split
# the train + valid data into train and validation data according to the current fold
for j, (train_index, valid_index) in enumerate(skf.split(X_train_valid, y_train_valid)):
    print('\nFold {0}/{1}'.format(j + 1, config['splits_kfold']))
    
    # split the data into train and validation data; to do so, use the indeces produces by the split function
    X_train, X_valid = X_train_valid[train_index], X_train_valid[valid_index]
    y_train, y_valid = y_train_valid[train_index], y_train_valid[valid_index]
    
    # within the config file, set the parameters 'window_size' and 'nb_channels' accordingly
    # window_size = size of the sliding window in units
    # nb_channels = number of feature channels
    config['window_size'] = X_train.shape[1]
    config['nb_channels'] = X_train.shape[2]
    
    # define the network to be a DeepConvLSTM object; can be imported from model.DeepConvLSTM
    # pass it the config object
    net = DeepConvLSTM(config=config)
    
    # convert the features of the train and validation to float32 and labels to uint8 for GPU compatibility 
    X_train, y_train,  = X_train.astype(np.float32), y_train.astype(np.uint8)
    X_valid, y_valid = X_valid.astype(np.float32), y_valid.astype(np.uint8)
    
    # feed the datasets into the train function; can be imported from model.train
    kfold_net, val_output, train_output = train(X_train, y_train, X_valid, y_valid, network=net, config=config,
                                                log_date=log_date, log_timestamp=log_timestamp)
        
    # in the following validation and train evaluation metrics are calculated
    cls = np.array(range(config['nb_classes']))
    val_accuracy = jaccard_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
    val_precision = precision_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
    val_recall = recall_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
    val_f1 = f1_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
    train_accuracy = jaccard_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
    train_precision = precision_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
    train_recall = recall_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
    train_f1 = f1_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
    
    # add up the fold results
    kfold_accuracy += val_accuracy
    kfold_precision += val_precision
    kfold_recall += val_recall
    kfold_f1 += val_f1

    # add up the generalization gap results
    kfold_accuracy_gap += train_accuracy - val_accuracy
    kfold_precision_gap += train_precision - val_precision
    kfold_recall_gap += train_recall - val_recall
    kfold_f1_gap += train_f1 - val_f1
    
# the next bit prints out the average results across folds if you did everything correctly
print("\nK-FOLD VALIDATION RESULTS: ")
print("Accuracy: {0}".format(np.mean(kfold_accuracy / config['splits_kfold'])))
print("Precision: {0}".format(np.mean(kfold_precision / config['splits_kfold'])))
print("Recall: {0}".format(np.mean(kfold_recall / config['splits_kfold'])))
print("F1: {0}".format(np.mean(kfold_f1 / config['splits_kfold'])))
    
print("\nVALIDATION RESULTS (PER CLASS): ")
print("\nAccuracy:")
for i, rslt in enumerate(kfold_accuracy / config['splits_kfold']):
    print("   {0}: {1}".format(class_names[i], rslt))
print("\nPrecision:")
for i, rslt in enumerate(kfold_precision / config['splits_kfold']):
    print("   {0}: {1}".format(class_names[i], rslt))
print("\nRecall:")
for i, rslt in enumerate(kfold_recall / config['splits_kfold']):
    print("   {0}: {1}".format(class_names[i], rslt))
print("\nF1:")
for i, rslt in enumerate(kfold_f1 / config['splits_kfold']):
    print("   {0}: {1}".format(class_names[i], rslt))
    
print("\nGENERALIZATION GAP ANALYSIS: ")
print("\nAccuracy: {0}".format(kfold_accuracy_gap / config['splits_kfold']))
print("Precision: {0}".format(kfold_precision_gap / config['splits_kfold']))
print("Recall: {0}".format(kfold_recall_gap / config['splits_kfold']))
print("F1: {0}".format(kfold_f1_gap / config['splits_kfold']))

(383224, 5)
(10947, 50, 4) (10947,)

Fold 1/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |
|  lstm_layers.0.bias_hh_l0  |    512     |
|         fc.weight          |    1024    |
|          fc.bias           |     8      |
+----------------------------+------------+
Total Params: 302024
Applied weighted class weights: 
[0.95911215 1.77962428 5.35434783 0.77113338 1.02114428 0.75367197
 0.76681196 0.76824704]
EPOCH: 1/10 Trai

EPOCH: 10/10 Train Loss: 0.6770 Train Acc: 0.7498 Train Prec: 0.8829 Train Rcll: 0.8386 Train F1: 0.8418 Val Loss: 0.5508 Val Acc: 0.7152 Val Prec: 0.8524 Val Rcll: 0.8185 Val F1: 0.8231

Fold 4/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |
|  lstm_layers.0.bias_hh_l0  |    512     |
|         fc.weight          |    1024    |
|          fc.bias           |     8      |
+----------------------------+------------+
Total Para

EPOCH: 9/10 Train Loss: 0.7514 Train Acc: 0.6200 Train Prec: 0.6908 Train Rcll: 0.7467 Train F1: 0.7087 Val Loss: 0.6621 Val Acc: 0.5958 Val Prec: 0.6767 Val Rcll: 0.7277 Val F1: 0.6922
EPOCH: 10/10 Train Loss: 0.6669 Train Acc: 0.5333 Train Prec: 0.7071 Train Rcll: 0.6873 Train F1: 0.6245 Val Loss: 0.7759 Val Acc: 0.5303 Val Prec: 0.6649 Val Rcll: 0.6662 Val F1: 0.6229

Fold 7/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |


EPOCH: 8/10 Train Loss: 1.0157 Train Acc: 0.5721 Train Prec: 0.6837 Train Rcll: 0.6889 Train F1: 0.6619 Val Loss: 0.9280 Val Acc: 0.5320 Val Prec: 0.6701 Val Rcll: 0.6473 Val F1: 0.6357
EPOCH: 9/10 Train Loss: 0.9360 Train Acc: 0.5924 Train Prec: 0.6919 Train Rcll: 0.7048 Train F1: 0.6814 Val Loss: 0.8379 Val Acc: 0.5591 Val Prec: 0.6763 Val Rcll: 0.6761 Val F1: 0.6608
EPOCH: 10/10 Train Loss: 0.8660 Train Acc: 0.6318 Train Prec: 0.7171 Train Rcll: 0.7311 Train F1: 0.7131 Val Loss: 0.7649 Val Acc: 0.5985 Val Prec: 0.6997 Val Rcll: 0.7062 Val F1: 0.6951

Fold 10/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   

## 4.3. Per-Participant Cross-Validation

Per-Participant cross-validation validates the predictive perfomance of each subject individually. This means that for each subject contained in the dataset, one one validation loop is run. Usually, this done by applying a stratified shuffle split, i.e. multiple stratified train-valid splits with each time randomly shuffled records, of multiple rounds per subject. 

**Note:** as mentioned above when dealing with stratified splits, which inheritly require shuffling, and shuffling in general, one must always first apply the sliding window before applying the split. Doing so one can at least minimize the destroyed time-dependencies among records by at least maintaining them within each window.

### Task 4: Implementing the per-participant CV loop

1. Define a for loop which iterates over all subjects
2. Within the loop define a stratified shuffle split object
3. Define the subject data by filtering the train + valid data for the current subject
4. Apply the sliding window on top of the filtered subject data and omit the 'subject_identifier' column
5. Define the stratified shuffle split loop; use the split function of the stratified shuffle split object to obtain indeces to split the current subject data
6. Run the train function and add up obtained results to the accumulated result objects

In [8]:
from sklearn.model_selection import StratifiedShuffleSplit


# size of the train portion within each split and number of splits per subject
config['size_sss'] = 0.6
config['splits_sss'] = 10

# needed for saving results
log_date = time.strftime('%Y%m%d')
log_timestamp = time.strftime('%H%M%S')

# iterate over all subjects
for i, sbj in enumerate(np.unique(train_valid_data.iloc[:, 0])):
    print('\n VALIDATING FOR SUBJECT {0} OF {1}'.format(int(sbj) + 1, int(np.max(train_valid_data.iloc[:, 0])) + 1))
    
    # define the stratified shuffle split object for the current subject
    # pass it the size of the train portion of the split, number of splits and seed
    sss = StratifiedShuffleSplit(train_size=config['size_sss'],
                                 n_splits=config['splits_sss'],
                                 random_state=config['seed'])
    
    # define the subject data by filtering the train + valid dataset for the identifier of the current subject 
    subject_data = train_valid_data[train_valid_data.iloc[:, 0] == sbj]
    
    print(subject_data.shape)
    
    # apply the sliding window on top of both the subject data; use the "apply_sliding_window" function
    # found in data_processing.sliding_window 
    X_subject, y_subject = apply_sliding_window(subject_data.iloc[:, :-1], subject_data.iloc[:, -1],
                                                        sliding_window_size=config['sw_length'],
                                                        unit=config['sw_unit'],
                                                        sampling_rate=config['sampling_rate'],
                                                        sliding_window_overlap=config['sw_overlap'])

    print(X_subject.shape, y_subject.shape)
    
    # omit the first feature column (subject_identifier) from the subject data
    X_subject = X_subject[:, :, 1:]
    
    # result objects used for accumulating the scores across splits; add each split results to the objects so that
    # they are averaged at the end of the stratified shuffle split loop
    subject_accuracy = np.zeros(config['nb_classes'])
    subject_precision = np.zeros(config['nb_classes'])
    subject_recall = np.zeros(config['nb_classes'])
    subject_f1 = np.zeros(config['nb_classes'])
    
    subject_accuracy_gap = 0
    subject_precision_gap = 0
    subject_recall_gap = 0
    subject_f1_gap = 0
    
    # stratified shuffle split validation loop; for each loop iteration returns a split identifier and indeces 
    # which can be used to split the subject data into train and validation data according to the current split
    for j, (train_index, test_index) in enumerate(sss.split(X_subject, y_subject)):
        print('\nSPLIT {0}/{1}'.format(j + 1, config['splits_sss']))

        # split the data into train and validation data; to do so, use the indeces produces by the split function
        X_train, X_valid = X_subject[train_index], X_subject[test_index]
        y_train, y_valid = y_subject[train_index], y_subject[test_index]
        
        # within the config file, set the parameters 'window_size' and 'nb_channels' accordingly
        # window_size = size of the sliding window in units
        # nb_channels = number of feature channels
        config['window_size'] = X_train.shape[1]
        config['nb_channels'] = X_train.shape[2]
        
        # define the network to be a DeepConvLSTM object; can be imported from model.DeepConvLSTM
        # pass it the config object
        net = DeepConvLSTM(config=config)
        
        # convert the features of the train and validation to float32 and labels to uint8 for GPU compatibility 
        X_train, y_train,  = X_train.astype(np.float32), y_train.astype(np.uint8)
        X_valid, y_valid = X_valid.astype(np.float32), y_valid.astype(np.uint8)
        
        # feed the datasets into the train function; can be imported from model.train
        per_participant_net, val_output, train_output = train(X_train, y_train, X_valid, y_valid, network=net, 
                                                              config=config, log_date=log_date, 
                                                              log_timestamp=log_timestamp)
        
        # in the following validation and train evaluation metrics are calculated
        cls = np.array(range(config['nb_classes']))
        val_accuracy = jaccard_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
        val_precision = precision_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
        val_recall = recall_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
        val_f1 = f1_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)
        train_accuracy = jaccard_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
        train_precision = precision_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
        train_recall = recall_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
        train_f1 = f1_score(train_output[:, 1], train_output[:, 0], average=None, labels=cls)
        
        # add up the fold results
        subject_accuracy += val_accuracy
        subject_precision += val_precision
        subject_recall += val_recall
        subject_f1 += val_f1

        # add up train val gap evaluation
        subject_accuracy_gap += train_accuracy - val_accuracy
        subject_precision_gap += train_precision - val_precision
        subject_recall_gap += train_recall - val_recall
        subject_f1_gap += train_f1 - val_f1
    
    # the next bit prints out the average results per subject if you did everything correctly
    print("\nSUBJECT {0} VALIDATION RESULTS: ".format(int(sbj) + 1))
    print("Accuracy: {0}".format(np.mean(subject_accuracy / config['splits_sss'])))
    print("Precision: {0}".format(np.mean(subject_precision / config['splits_sss'])))
    print("Recall: {0}".format(np.mean(subject_recall / config['splits_sss'])))
    print("F1: {0}".format(np.mean(subject_f1 / config['splits_sss'])))
    
    print("\nVALIDATION RESULTS (PER CLASS): ")
    print("\nAccuracy:")
    for i, rslt in enumerate(subject_accuracy / config['splits_sss']):
        print("   {0}: {1}".format(class_names[i], rslt))
    print("\nPrecision:")
    for i, rslt in enumerate(subject_precision / config['splits_sss']):
        print("   {0}: {1}".format(class_names[i], rslt))
    print("\nRecall:")
    for i, rslt in enumerate(subject_recall / config['splits_sss']):
        print("   {0}: {1}".format(class_names[i], rslt))
    print("\nF1:")
    for i, rslt in enumerate(subject_f1 / config['splits_sss']):
        print("   {0}: {1}".format(class_names[i], rslt))
    
    print("\nGENERALIZATION GAP ANALYSIS: ")
    print("\nAccuracy: {0}".format(subject_accuracy_gap / config['splits_sss']))
    print("Precision: {0}".format(subject_precision_gap / config['splits_sss']))
    print("Recall: {0}".format(subject_recall_gap / config['splits_sss']))
    print("F1: {0}".format(subject_f1_gap / config['splits_sss']))


 VALIDATING FOR SUBJECT 1 OF 2
(221621, 5)
(6331, 50, 4) (6331,)

SPLIT 1/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |
|  lstm_layers.0.bias_hh_l0  |    512     |
|         fc.weight          |    1024    |
|          fc.bias           |     8      |
+----------------------------+------------+
Total Params: 302024
Applied weighted class weights: 
[1.09137931 0.84325044 6.41554054 0.88079777 0.90428571 0.84325044
 0.861615

EPOCH: 10/10 Train Loss: 0.5311 Train Acc: 0.7940 Train Prec: 0.8867 Train Rcll: 0.8754 Train F1: 0.8776 Val Loss: 0.4563 Val Acc: 0.7918 Val Prec: 0.8818 Val Rcll: 0.8748 Val F1: 0.8755

SPLIT 4/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |
|  lstm_layers.0.bias_hh_l0  |    512     |
|         fc.weight          |    1024    |
|          fc.bias           |     8      |
+----------------------------+------------+
Total Par

EPOCH: 9/10 Train Loss: 0.5731 Train Acc: 0.7770 Train Prec: 0.8679 Train Rcll: 0.8639 Train F1: 0.8636 Val Loss: 0.4805 Val Acc: 0.7771 Val Prec: 0.8699 Val Rcll: 0.8618 Val F1: 0.8643
EPOCH: 10/10 Train Loss: 0.5475 Train Acc: 0.7816 Train Prec: 0.8708 Train Rcll: 0.8683 Train F1: 0.8680 Val Loss: 0.4553 Val Acc: 0.7868 Val Prec: 0.8737 Val Rcll: 0.8698 Val F1: 0.8706

SPLIT 7/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |

EPOCH: 8/10 Train Loss: 0.5788 Train Acc: 0.7667 Train Prec: 0.8634 Train Rcll: 0.8578 Train F1: 0.8568 Val Loss: 0.4785 Val Acc: 0.7657 Val Prec: 0.8582 Val Rcll: 0.8549 Val F1: 0.8523
EPOCH: 9/10 Train Loss: 0.5564 Train Acc: 0.7734 Train Prec: 0.8669 Train Rcll: 0.8627 Train F1: 0.8622 Val Loss: 0.4658 Val Acc: 0.7663 Val Prec: 0.8561 Val Rcll: 0.8551 Val F1: 0.8528
EPOCH: 10/10 Train Loss: 0.5171 Train Acc: 0.7662 Train Prec: 0.8618 Train Rcll: 0.8581 Train F1: 0.8563 Val Loss: 0.4643 Val Acc: 0.7598 Val Prec: 0.8505 Val Rcll: 0.8498 Val F1: 0.8464

SPLIT 10/10
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |  

EPOCH: 1/10 Train Loss: 1.9898 Train Acc: 0.2362 Train Prec: 0.4065 Train Rcll: 0.4054 Train F1: 0.3431 Val Loss: 1.7879 Val Acc: 0.2470 Val Prec: 0.4502 Val Rcll: 0.4191 Val F1: 0.3607
EPOCH: 2/10 Train Loss: 1.5166 Train Acc: 0.4667 Train Prec: 0.6045 Train Rcll: 0.5709 Train F1: 0.5608 Val Loss: 1.1623 Val Acc: 0.4469 Val Prec: 0.5814 Val Rcll: 0.5471 Val F1: 0.5375
EPOCH: 3/10 Train Loss: 1.0495 Train Acc: 0.6277 Train Prec: 0.7519 Train Rcll: 0.7422 Train F1: 0.7328 Val Loss: 0.8619 Val Acc: 0.5928 Val Prec: 0.7118 Val Rcll: 0.7070 Val F1: 0.6985
EPOCH: 4/10 Train Loss: 0.8305 Train Acc: 0.6864 Train Prec: 0.7852 Train Rcll: 0.7938 Train F1: 0.7825 Val Loss: 0.6990 Val Acc: 0.6613 Val Prec: 0.7658 Val Rcll: 0.7737 Val F1: 0.7625
EPOCH: 5/10 Train Loss: 0.6918 Train Acc: 0.7097 Train Prec: 0.8020 Train Rcll: 0.8118 Train F1: 0.8029 Val Loss: 0.6011 Val Acc: 0.6913 Val Prec: 0.7900 Val Rcll: 0.7979 Val F1: 0.7896
EPOCH: 6/10 Train Loss: 0.6086 Train Acc: 0.7216 Train Prec: 0.8289 Tr

EPOCH: 1/10 Train Loss: 2.0209 Train Acc: 0.2396 Train Prec: 0.3334 Train Rcll: 0.3805 Train F1: 0.3171 Val Loss: 1.8872 Val Acc: 0.2314 Val Prec: 0.3898 Val Rcll: 0.3698 Val F1: 0.3061
EPOCH: 2/10 Train Loss: 1.6074 Train Acc: 0.4821 Train Prec: 0.6437 Train Rcll: 0.6199 Train F1: 0.5838 Val Loss: 1.1944 Val Acc: 0.4673 Val Prec: 0.6320 Val Rcll: 0.6035 Val F1: 0.5689
EPOCH: 3/10 Train Loss: 1.0744 Train Acc: 0.6098 Train Prec: 0.7519 Train Rcll: 0.7523 Train F1: 0.7093 Val Loss: 0.8416 Val Acc: 0.6055 Val Prec: 0.7441 Val Rcll: 0.7459 Val F1: 0.7038
EPOCH: 4/10 Train Loss: 0.8473 Train Acc: 0.6853 Train Prec: 0.7917 Train Rcll: 0.8006 Train F1: 0.7799 Val Loss: 0.6830 Val Acc: 0.6632 Val Prec: 0.7700 Val Rcll: 0.7853 Val F1: 0.7627
EPOCH: 5/10 Train Loss: 0.7169 Train Acc: 0.7241 Train Prec: 0.8312 Train Rcll: 0.8271 Train F1: 0.8168 Val Loss: 0.5942 Val Acc: 0.7061 Val Prec: 0.8149 Val Rcll: 0.8115 Val F1: 0.8007
EPOCH: 6/10 Train Loss: 0.6464 Train Acc: 0.7274 Train Prec: 0.8273 Tr

EPOCH: 1/10 Train Loss: 1.9943 Train Acc: 0.3048 Train Prec: 0.4722 Train Rcll: 0.4934 Train F1: 0.4152 Val Loss: 1.8292 Val Acc: 0.3004 Val Prec: 0.4304 Val Rcll: 0.4820 Val F1: 0.4052
EPOCH: 2/10 Train Loss: 1.5914 Train Acc: 0.4402 Train Prec: 0.5899 Train Rcll: 0.5992 Train F1: 0.5321 Val Loss: 1.2127 Val Acc: 0.4322 Val Prec: 0.5613 Val Rcll: 0.5853 Val F1: 0.5264
EPOCH: 3/10 Train Loss: 1.0869 Train Acc: 0.5586 Train Prec: 0.6938 Train Rcll: 0.7136 Train F1: 0.6547 Val Loss: 0.8400 Val Acc: 0.5618 Val Prec: 0.6887 Val Rcll: 0.7111 Val F1: 0.6544
EPOCH: 4/10 Train Loss: 0.8245 Train Acc: 0.6007 Train Prec: 0.7151 Train Rcll: 0.7490 Train F1: 0.7063 Val Loss: 0.6598 Val Acc: 0.6023 Val Prec: 0.7095 Val Rcll: 0.7455 Val F1: 0.7026
EPOCH: 5/10 Train Loss: 0.6927 Train Acc: 0.6767 Train Prec: 0.7719 Train Rcll: 0.7908 Train F1: 0.7745 Val Loss: 0.5724 Val Acc: 0.6853 Val Prec: 0.7761 Val Rcll: 0.7985 Val F1: 0.7798
EPOCH: 6/10 Train Loss: 0.6332 Train Acc: 0.7035 Train Prec: 0.7925 Tr

## 4.4. Cross-Participant Cross-Validation

Cross-Participant cross-validation, also known as Leave-One-Subject-Out Cross-Validation is the most complex, but also most expressive validation method one can apply when dealing with multi-subject data. In general, it can be seen as a variation of the k-fold cross-validation, where each fold is the data of one subject. This way, each subject is treated as the unseen data at least once. 

Leaving one subject out each fold ensures that the overall evaluation of the algorithm does not overfit on subject-specific traits, i.e. how subjects performed the activities individually.

### Task 5: Implementing the cross-participant CV loop

1. Define a for loop which iterates over all subjects
2. Define the train data to be everything but the current subject data and the validation data to be the subject data by filtering the train + valid data
3. Apply the sliding window on top of the train and validation data and omit the 'subject_identifier' column from both datasets
4. Run the train function

In [9]:
# needed for saving results
log_date = time.strftime('%Y%m%d')
log_timestamp = time.strftime('%H%M%S')

# iterate over all subjects
for i, sbj in enumerate(np.unique(train_valid_data.iloc[:, 0])):
    print('\n VALIDATING FOR SUBJECT {0} OF {1}'.format(int(sbj) + 1, int(np.max(train_valid_data.iloc[:, 0])) + 1))
    
    # define the train data to be everything, but the data of the current subject
    train_data = data[data.iloc[:, 0] != sbj]
    # define the validation data to be the data of the current subject
    valid_data = data[data.iloc[:, 0] == sbj]
    
    print(train_data.shape, valid_data.shape)
    
    # apply the sliding window on top of both the train and validation data; use the "apply_sliding_window" function
    # found in data_processing.sliding_window
    X_train, y_train = apply_sliding_window(train_data.iloc[:, :-1], train_data.iloc[:, -1],
                                            sliding_window_size=config['sw_length'],
                                            unit=config['sw_unit'],
                                            sampling_rate=config['sampling_rate'],
                                            sliding_window_overlap=config['sw_overlap'],
                                            )

    print(X_train.shape, y_train.shape)

    X_valid, y_valid = apply_sliding_window(valid_data.iloc[:, :-1], valid_data.iloc[:, -1],
                                            sliding_window_size=config['sw_length'],
                                            unit=config['sw_unit'],
                                            sampling_rate=config['sampling_rate'],
                                            sliding_window_overlap=config['sw_overlap'],
                                            )

    print(X_valid.shape, y_valid.shape)
    
    # omit the first feature column (subject_identifier) from the train and validation dataset
    X_train, X_valid = X_train[:, :, 1:], X_valid[:, :, 1:]
    
    # within the config file, set the parameters 'window_size' and 'nb_channels' accordingly
    # window_size = size of the sliding window in units
    # nb_channels = number of feature channels
    config['window_size'] = X_train.shape[1]
    config['nb_channels'] = X_train.shape[2]
    
    # define the network to be a DeepConvLSTM object; can be imported from model.DeepConvLSTM
    # pass it the config object
    net = DeepConvLSTM(config=config)

    X_train, y_train,  = X_train.astype(np.float32), y_train.astype(np.uint8)
    X_valid, y_valid = X_valid.astype(np.float32), y_valid.astype(np.uint8)

    cross_participant_net, val_output, train_output = train(X_train, y_train, X_valid, y_valid,
                                                            network=net, 
                                                            config=config, 
                                                            log_date=log_date,
                                                            log_timestamp=log_timestamp)
    
    # the next bit prints out the average results per subject if you did everything correctly
    cls = np.array(range(config['nb_classes']))
    
    print('\nVALIDATION RESULTS FOR SUBJECT {0}: '.format(int(sbj) + 1))
    print("\nAvg. Accuracy: {0}".format(jaccard_score(val_output[:, 1], val_output[:, 0], average='macro')))
    print("Avg. Precision: {0}".format(precision_score(val_output[:, 1], val_output[:, 0], average='macro')))
    print("Avg. Recall: {0}".format(recall_score(val_output[:, 1], val_output[:, 0], average='macro')))
    print("Avg. F1: {0}".format(f1_score(val_output[:, 1], val_output[:, 0], average='macro')))

    print("\nVALIDATION RESULTS (PER CLASS): ")
    print("\nAccuracy:")
    for i, rslt in enumerate(jaccard_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
        print("   {0}: {1}".format(class_names[i], rslt))
    print("\nPrecision:")
    for i, rslt in enumerate(precision_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
        print("   {0}: {1}".format(class_names[i], rslt))
    print("\nRecall:")
    for i, rslt in enumerate(recall_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
        print("   {0}: {1}".format(class_names[i], rslt))
    print("\nF1:")
    for i, rslt in enumerate(f1_score(val_output[:, 1], val_output[:, 0], average=None, labels=cls)):
        print("   {0}: {1}".format(class_names[i], rslt))

    print("\nGENERALIZATION GAP ANALYSIS: ")
    print("\nTrain-Val-Accuracy Difference: {0}".format(jaccard_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                      jaccard_score(val_output[:, 1], val_output[:, 0], average='macro')))
    print("Train-Val-Precision Difference: {0}".format(precision_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                       precision_score(val_output[:, 1], val_output[:, 0], average='macro')))
    print("Train-Val-Recall Difference: {0}".format(recall_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                    recall_score(val_output[:, 1], val_output[:, 0], average='macro')))
    print("Train-Val-F1 Difference: {0}".format(f1_score(train_output[:, 1], train_output[:, 0], average='macro') -
                                                f1_score(val_output[:, 1], val_output[:, 0], average='macro')))


 VALIDATING FOR SUBJECT 1 OF 2
(437639, 5) (221621, 5)
(12502, 50, 4) (12502,)
(6331, 50, 4) (6331,)
+----------------------------+------------+
|          Modules           | Parameters |
+----------------------------+------------+
| conv_blocks.0.conv1.weight |    704     |
|  conv_blocks.0.conv1.bias  |     64     |
| conv_blocks.0.conv2.weight |   45056    |
|  conv_blocks.0.conv2.bias  |     64     |
| conv_blocks.1.conv1.weight |   45056    |
|  conv_blocks.1.conv1.bias  |     64     |
| conv_blocks.1.conv2.weight |   45056    |
|  conv_blocks.1.conv2.bias  |     64     |
| lstm_layers.0.weight_ih_l0 |   98304    |
| lstm_layers.0.weight_hh_l0 |   65536    |
|  lstm_layers.0.bias_ih_l0  |    512     |
|  lstm_layers.0.bias_hh_l0  |    512     |
|         fc.weight          |    1024    |
|          fc.bias           |     8      |
+----------------------------+------------+
Total Params: 302024
Applied weighted class weights: 
[1.04044607 1.00112108 5.50264085 0.8735327  0.79732

## 4.5 Testing

Now, after having implemented each of the validation techniques we want to get an unbiased view of how our trained algorithm perfoms on unseen data. To so we use the testing set which we split off the original dataset within the first step of this notebook.

### Task 6: Testing your trained networks

1. Apply the sliding window on top of the test data and omit the 'subject_identifier' column from the dataset
2. Using the predict function of the DL-ARC obtain results on the test dataset using each of the trained networks
3. Which model does perform the best and why? Was this expected?

In [16]:
from model.train import predict


X_test, y_test = apply_sliding_window(test_data.iloc[:, :-1], test_data.iloc[:, -1],
                                      sliding_window_size=config['sw_length'],
                                      unit=config['sw_unit'],
                                      sampling_rate=config['sampling_rate'],
                                      sliding_window_overlap=config['sw_overlap'],
                                      )

print(X_test.shape, y_test.shape)

X_test = X_test[:, :, 1:]

print('COMPILED TEST RESULTS: ')
print('\nTest results (train-valid-split): ')
predict(X_test, y_test, train_valid_net, config, log_date, log_timestamp)

print('\nTest results (k-fold): ')
predict(X_test, y_test, kfold_net, config, log_date, log_timestamp)

print('\nTest results (per-participant): ')
predict(X_test, y_test, per_participant_net, config, log_date, log_timestamp)

print('\nTest results (cross-participant): ')
predict(X_test, y_test, cross_participant_net, config, log_date, log_timestamp)

(6536, 50, 4) (6536,)
COMPILED TEST RESULTS: 

Test results (train-valid-split): 
TEST RESULTS: 
Avg. Accuracy: 0.09035619618330168
Avg. Precision: 0.11683614302774101
Avg. Recall: 0.1316811109437625
Avg. F1: 0.11418997366534062
TEST RESULTS (PER CLASS): 
Accuracy: [0.         0.         0.00552486 0.         0.66101695 0.05630776
 0.         0.        ]
Precision: [0.         0.         0.02857143 0.         0.84086242 0.06525529
 0.         0.        ]
Recall: [0.         0.         0.00680272 0.         0.75553506 0.29111111
 0.         0.        ]
F1: [0.         0.         0.01098901 0.         0.79591837 0.10661241
 0.         0.        ]

Test results (k-fold): 
TEST RESULTS: 
Avg. Accuracy: 0.020392950378779043
Avg. Precision: 0.04028757833309204
Avg. Recall: 0.048827137250964346
Avg. F1: 0.036964828253996565
TEST RESULTS (PER CLASS): 
Accuracy: [0.         0.         0.03783102 0.         0.12507778 0.0002348
 0.         0.        ]
Precision: [0.         0.         0.0443787 