# 3. Network Architecture & Training

Within this notebook, you'll learn all you need to know about how to define your network architecture as well as train it. The notebook is structured into three main parts. 

First, we will go over a sample Deep Learning for HAR architecture called the DeepConvLSTM. You will learn how to define the architecture in PyTorch and create a trainable object. 

Second, we will go over essential steps which you need to apply on top of the data so that it is in the correct format to be fed into the network. This directly ties into what you have learned within the first notebook on preprocessing.

Third, you'll learn how to embed your defined network object into a training loop and train it using the data of the first two subjects within the RWHAR dataset. You will also learn how to perfom a simple evaluation by applying your trained network on top of the data of the third subject within the RWHAR dataset.

**WARNING FOR COLAB USERS:**  
- Set use_colab to True if you are accessing this notebook
- Change your runtime time to GPU by clicking: Runtime -> Change runtime type -> Dropdown -> GPU -> Save

In [1]:
import os
import sys

use_colab = False

module_path = os.path.abspath(os.path.join('..'))

if use_colab:
    # clone package repository
    !git clone https://github.com/mariusbock/dl-for-har.git

    # navigate to dl-for-har directory
    %cd dl-for-har/
else:
    os.chdir(module_path)
    
# this statement is needed so that we can use the methods of the DL-ARC pipeline
if module_path not in sys.path:
    sys.path.append(module_path)

## 3.1. Defining a Network Architecture

In the following we will define a network which we can train using the data which we previously preprocessed. The architecture which we will used is called DeepConvLSTM. The architecture was introduced by Francisco Javier Ordonez and Daniel Roggen in 2016 and is to this date a state-of-the-art architecture for applying Deep Learning on Human Activity Recognition. The architecture combines both convolutional and recurrent layers.

The architecture is made of three main parts:

1. **Convolutional layers:** Within the original architecture Ordonez and Roggen apply 4 convolutional layers each with 64 filters of size 5x1. 
2. **LSTM layer(s):** After applying convolutional layers, Ordonez and Roggen make us of an LSTM in order to capture time dependencies on features extracted by convolutional operations. Originally, Ordonez and Roggen employed a 2-layered LSTM with 128 hidden units. Recently, we exhibited that a 1-layered LSTM might be a better suited option when dealing with raw sensor-data. We thus employ a 1-layered instead of 2-layered LSTM within this tutorial.
3. **Classification layer:** The output of the LSTM is finally fed into a classifier which produces the final predictions.

In [35]:
from torch import nn

class DeepConvLSTM(nn.Module):
    def __init__(self, config):
        super(DeepConvLSTM, self).__init__()
        # parameters
        self.window_size = config['window_size']
        self.drop_prob = config['drop_prob']
        self.nb_channels = config['nb_channels']
        self.nb_classes = config['nb_classes']
        self.seed = config['seed']
        self.nb_filters = config['nb_filters']
        self.filter_width = config['filter_width']
        self.nb_units_lstm = config['nb_units_lstm']
        self.nb_layers_lstm = config['nb_layers_lstm']

        # define conv layers
        self.conv1 = nn.Conv2d(1, self.nb_filters, (self.filter_width, 1))
        self.conv2 = nn.Conv2d(self.nb_filters, self.nb_filters, (self.filter_width, 1))
        self.conv3 = nn.Conv2d(self.nb_filters, self.nb_filters, (self.filter_width, 1))
        self.conv4 = nn.Conv2d(self.nb_filters, self.nb_filters, (self.filter_width, 1))
        
        # define lstm layers
        self.lstm = nn.LSTM(input_size=self.nb_filters * self.nb_channels, hidden_size=self.nb_units_lstm, num_layers=self.nb_layers_lstm)

        # define dropout layer
        self.dropout = nn.Dropout(self.drop_prob)
        
        # define classifier
        self.fc = nn.Linear(self.nb_units_lstm, self.nb_classes)

    def forward(self, x):
        # reshape data for convolutions
        x = x.view(-1, 1, self.window_size, self.nb_channels)
        
        # apply convolution
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        
        # set the final sequence length to be 
        final_seq_len = x.shape[2]
        
        # permute dimensions and reshape for LSTM
        x = x.permute(0, 2, 1, 3)
        x = x.reshape(-1, final_seq_len, self.nb_filters * self.nb_channels)

        # apply LSTM
        x, _ = self.lstm(x)
            
        # reshape data for classifier
        x = x.view(-1, self.nb_units_lstm)
        
        # apply dropout and feed data through classifier
        x = self.dropout(x)
        x = self.fc(x)
        
        # reshape data and return predicted label of last sample within final sequence (determines label of window)
        out = x.view(-1, final_seq_len, self.nb_classes)
        return out[:, -1, :]

## 3.2. Preparing your data

Before training the network we just defined, we need to bring our data into the right format. The preprocessing consists of five essential parts.

1. Split the data into a training and validation dataset. The validation dataset is used to gain feedback on the perfomance of the model and functions as unseen data. Results obtained on the validation dataset can be used as an indicator whether the changes you make to a network and/ or its training process are improving or worsening results.
2. Apply the sliding window approach on top of the training and validation dataset. As you learned in the previous notebook, we do not classify a single record, but a window of records. The label of the last record within a window defines the label of the window and is our ultimate goal to predict.
3. Omit the subject identifier column.
4. Apply label encoding on top of the datasets, i.e. replace the string label names with integer values. 
5. Convert the two datasets into the correct data format so that they are compatible with the GPU.

In [53]:
import pandas as pd
import numpy as np
import warnings

from data_processing.sliding_window import apply_sliding_window

warnings.filterwarnings('ignore')

# data loading; you already know this
data_folder = 'data'
dataset = 'rwhar_3sbjs_data.csv'
data = pd.read_csv(os.path.join(data_folder, dataset), names=['subject_id', 'acc_x', 'acc_y', 'acc_z', 'activity_label'])
print("\nValue counts before label encoding: ")
print(data['activity_label'].value_counts())

# all activity names (you need them to define the label_dict!)
class_names = ['climbing_down', 'climbing_up', 'jumping', 'lying', 'running', 'sitting', 'standing', 'walking']

# apply label encoding: define a dict with all the activities and assign them integers going from zero 
# to (number of activites - 1); use the .replace() function of pandas.Series to replace the values within the dataset
label_dict = {
    'climbing_down': 0,
    'climbing_up': 1,
    'jumping': 2,
    'lying': 3,
    'running': 4,
    'sitting': 5,
    'standing': 6,
    'walking': 7
}

data['activity_label'] = data['activity_label'].replace(label_dict) 

# if you did everything correctly then the label distribution should now be printed with integer instead of strings
print("\nValue counts after label encoding: ")
print(data['activity_label'].value_counts())

print("\nShape of the dataset before splitting and windowing: ")
print(data.shape)

# define the train data to be all data belonging to the first two subjects
train_data = data[data.subject_id <= 1]
# define the validation data to be all data belonging to the third subject
valid_data = data[data.subject_id == 2]

# settings for the sliding window (change them if you want to!)
sw_length = 50
sw_unit = 'units'
sw_overlap = 50

# apply a sliding window on top of both the train and validation data; you can use our predefined method
# you can import it via from preprocessing.sliding_window import apply_sliding_window
X_train, y_train = apply_sliding_window(train_data.iloc[:, :-1], train_data.iloc[:, -1],
                                        sliding_window_size=sw_length,
                                        unit=sw_unit,
                                        sampling_rate=50,
                                        sliding_window_overlap=sw_overlap,
                                        )


X_valid, y_valid = apply_sliding_window(valid_data.iloc[:, :-1], valid_data.iloc[:, -1],
                                        sliding_window_size=sw_length,
                                        unit=sw_unit,
                                        sampling_rate=50,
                                        sliding_window_overlap=sw_overlap,
                                        )

print("\nShape of the train and validation datasets after splitting and windowing: ")
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)

# omit the first feature column (subject_identifier) from the train and validation dataset
X_train, X_valid = X_train[:, :, 1:], X_valid[:, :, 1:]
print("\nShape of the train and validation feature dataset after splitting and windowing: ")
print(X_train.shape, X_valid.shape)

# convert the features of the train and validation to float32 and labels to uint8 for GPU compatibility 
X_train, y_train = X_train.astype(np.float32), y_train.astype(np.uint8)
X_valid, y_valid = X_valid.astype(np.float32), y_valid.astype(np.uint8)


Value counts before label encoding: 
running          99204
walking          96810
sitting          95265
standing         94106
lying            94038
climbing_up      87572
climbing_down    78004
jumping          14261
Name: activity_label, dtype: int64

Value counts after label encoding: 
4    99204
7    96810
5    95265
6    94106
3    94038
1    87572
0    78004
2    14261
Name: activity_label, dtype: int64

Shape of the dataset before splitting and windowing: 
(659260, 5)

Shape of the train and validation datasets after splitting and windowing: 
(17215, 50, 4) (17215,)
(9151, 50, 4) (9151,)

Shape of the train and validation feature dataset after splitting and windowing: 
(17215, 50, 3) (9151, 50, 3)


## 3.3. Training Your Network

Since we now have brought the data into the correct format, let's train our network with it!

A typical training loop can be divided into three steps:

1. **Definition:** You define your network, optimizer and loss
2. **Training:** Iterating over the number of epochs: you chunk your training data into so-called batches and iteratively feed them through your network. After a batch has been fed through the network, you compute the loss said batch produced. Using the loss you backprogate it through the network using the optimizer which adjusts the weights accordingly. 
3. **Validation:** After you have processed your whole training dataset, you go on to validate the predictive performance of the network. To do so you again chunk your training and validation data into batches. Iterating over all batches of both all datasets, fed the batches through the trained network and obtain its predictions. **Note:** you only want to obtain predicitons and not backpropagate any loss. Using the predictions you can now use them to calculate your standard evaluation metrics which you learnt in previous notebooks.

### Task 3: Define your own train loop (ADVANCED)

1. You'll see that we already defined a config object which you can use. Nevertheless, there are three values missing, i.e. the window size, number of feature channels and number of classes. Define them correctly.
2. Define your network, optimizer and loss object
3. Write your training loop: iterate over the number of epochs and define a DataLoader object using the train features and labels.
4. Iterate over the DataLoader object; for each batch, compute the loss by passing it through the network; backprogate the computed loss using your optimizer object 
5. Obtain predictions for the train and validation dataset using the resulting trained network of the current epoch. To do so: define a DataLoader object for the validation dataset and fed both DataLoader objects batch-wise through the network. Obtain predicitons by applying softmax on top of the network output and compute the loss. 

In [57]:
import torch
from torch.utils.data import DataLoader
from sklearn.metrics import precision_score, recall_score, f1_score, jaccard_score

import time

# this is the config object which contains all relevant settings. Feel free to change them and see how it influences
# your results. Parameters which shouldn't be changed are marked.
config = {
    'nb_filters': 64,
    'filter_width': 11,
    'nb_units_lstm': 128,
    'nb_layers_lstm': 1,
    'drop_prob': 0.5,
    'seed': 1,
    'epochs': 20,
    'batch_size': 100,
    'learning_rate': 1e-4,
    'weight_decay': 1e-6,
    'gpu_name': 'cuda:0',
    'print_counts': False
}

# define the missing parameters within the config file. 
# window_size = size of the sliding window in units
# nb_channels = number of feature channels
# nb_classes = number of classes that can be predicted
config['window_size'] = X_train.shape[1]
config['nb_channels'] = X_train.shape[2]
config['nb_classes'] = len(class_names)

# initialize your DeepConvLSTM object 
network = DeepConvLSTM(config)

# send network to the GPU and set it to training mode
network.to(config['gpu_name'])
network.train()

# initialize your optimizer and loss; e.g. Adam optimizer and Cross-entropy loss
# look up the PyTorch documentation for more options
optimizer = torch.optim.Adam(network.parameters(), lr=config['learning_rate'], weight_decay=config['weight_decay'])
criterion = nn.CrossEntropyLoss()

# define your training loop; iterates over the number of epochs
for e in range(config['epochs']):
    # helper objects needed for proper documentation
    train_losses = []
    start_time = time.time()
    batch_num = 1

    # initialize train dataset in Torch format
    dataset = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
    
    # define your trainloader; use from torch.utils.data import DataLoader
    trainloader = DataLoader(dataset,
                             batch_size=config['batch_size'],
                             num_workers=2,
                             shuffle=False,
                             )

    # iterate over the trainloader object (it'll return batches which you can use)
    for i, (x, y) in enumerate(trainloader):
        # sends batch x and y to the GPU
        inputs, targets = x.to(config['gpu_name']), y.to(config['gpu_name'])
        # zero accumulated gradients
        optimizer.zero_grad()
        
        # send inputs through network to get predictions, calculate loss
        output = network(inputs)
        loss = criterion(output, targets.long())
        # backprogates your computed loss through the network
        loss.backward()
        optimizer.step()
        
        # appends the computed batch loss to list
        train_losses.append(loss.item())

        # prints out every 100 batches information about the current loss and time per batch
        if batch_num % 100 == 0 and batch_num > 0:
            cur_loss = np.mean(train_losses)
            elapsed = time.time() - start_time
            print('| epoch {:3d} | {:5d} batches | ms/batch {:5.2f} | train loss {:5.2f}'.format(e, batch_num, elapsed * 1000 / config['batch_size'], cur_loss))
            start_time = time.time()
            batch_num += 1

            
    # helper objects
    val_preds = []
    val_gt = []
    val_losses = []
    train_preds = []
    train_gt = []

    # initialize validation dataset in Torch format
    dataset = torch.utils.data.TensorDataset(torch.from_numpy(X_valid).float(), torch.from_numpy(y_valid))
    
    # define your valloader; use from torch.utils.data import DataLoader
    valloader = DataLoader(dataset,
                           batch_size=config['batch_size'],
                           num_workers=2,
                           shuffle=False,
                           )

    # sets network to eval mode and 
    network.eval()
    with torch.no_grad():
        # iterate over the valloader object (it'll return batches which you can use)
        for i, (x, y) in enumerate(valloader):
            # sends batch x and y to the GPU
            inputs, targets = x.to(config['gpu_name']), y.to(config['gpu_name'])

            # send inputs through network to get predictions
            val_output = network(inputs)
            # calculate loss by passing criterion both predicitons and true labels 
            val_loss = criterion(val_output, targets.long())
            # calculate actual predictions (i.e. softmax probabilites); use torch.nn.functional.softmax()
            val_output = torch.nn.functional.softmax(val_output, dim=1)

            # appends validation loss to list
            val_losses.append(val_loss.item())

            # creates predictions and true labels; appends them to the final lists
            y_preds = np.argmax(val_output.cpu().numpy(), axis=-1)
            y_true = targets.cpu().numpy().flatten()
            val_preds = np.concatenate((np.array(val_preds, int), np.array(y_preds, int)))
            val_gt = np.concatenate((np.array(val_gt, int), np.array(y_true, int)))

        # iterate over the trainloader object (it'll return batches which you can use)
        for i, (x, y) in enumerate(trainloader):
            # sends batch x and y to the GPU
            inputs, targets = x.to(config['gpu_name']), y.to(config['gpu_name'])

            # send inputs through network to get predictions
            train_output = network(inputs)
            # calculate actual predictions (i.e. softmax probabilites); use torch.nn.functional.softmax()
            train_output = torch.nn.functional.softmax(train_output, dim=1)

            # creates predictions and true labels; appends them to the final lists
            y_preds = np.argmax(train_output.cpu().numpy(), axis=-1)
            y_true = targets.cpu().numpy().flatten()
            train_preds = np.concatenate((np.array(train_preds, int), np.array(y_preds, int)))
            train_gt = np.concatenate((np.array(train_gt, int), np.array(y_true, int)))

        # print epoch evaluation results for train and validation dataset
        print("\nEPOCH: {}/{}".format(e + 1, config['epochs']),
                  "\nTrain Loss: {:.4f}".format(np.mean(train_losses)),
                  "Train Acc: {:.4f}".format(jaccard_score(train_gt, train_preds, average='macro')),
                  "Train Prec: {:.4f}".format(precision_score(train_gt, train_preds, average='macro')),
                  "Train Rcll: {:.4f}".format(recall_score(train_gt, train_preds, average='macro')),
                  "Train F1: {:.4f}".format(f1_score(train_gt, train_preds, average='macro')),
                  "\nVal Loss: {:.4f}".format(np.mean(val_losses)),
                  "Val Acc: {:.4f}".format(jaccard_score(val_gt, val_preds, average='macro')),
                  "Val Prec: {:.4f}".format(precision_score(val_gt, val_preds, average='macro')),
                  "Val Rcll: {:.4f}".format(recall_score(val_gt, val_preds, average='macro')),
                  "Val F1: {:.4f}".format(f1_score(val_gt, val_preds, average='macro')))

        # if chosen, print the value counts of the predicted labels for train and validation dataset
        if config['print_counts']:
            print('Predicted Train Labels: ')
            print(np.vstack((np.nonzero(np.bincount(train_preds))[0], np.bincount(train_preds)[np.nonzero(np.bincount(train_preds))[0]])).T)
            print('Predicted Val Labels: ')
            print(np.vstack((np.nonzero(np.bincount(val_preds))[0], np.bincount(val_preds)[np.nonzero(np.bincount(val_preds))[0]])).T)


    # set network to train mode again
    network.train()




EPOCH: 1/20 
Train Loss: 2.0133 Train Acc: 0.1968 Train Prec: 0.3320 Train Rcll: 0.3394 Train F1: 0.2774 
Val Loss: 2.0925 Val Acc: 0.1360 Val Prec: 0.1594 Val Rcll: 0.2392 Val F1: 0.1557

EPOCH: 2/20 
Train Loss: 1.9281 Train Acc: 0.2438 Train Prec: 0.2601 Train Rcll: 0.3578 Train F1: 0.2804 
Val Loss: 1.8990 Val Acc: 0.2780 Val Prec: 0.3474 Val Rcll: 0.3718 Val F1: 0.3131

EPOCH: 3/20 
Train Loss: 1.8311 Train Acc: 0.3060 Train Prec: 0.3454 Train Rcll: 0.4229 Train F1: 0.3721 
Val Loss: 1.8964 Val Acc: 0.2050 Val Prec: 0.2508 Val Rcll: 0.3350 Val F1: 0.2555

EPOCH: 4/20 
Train Loss: 1.8028 Train Acc: 0.3168 Train Prec: 0.3536 Train Rcll: 0.4470 Train F1: 0.3748 
Val Loss: 1.9316 Val Acc: 0.1670 Val Prec: 0.1760 Val Rcll: 0.3194 Val F1: 0.2188

EPOCH: 5/20 
Train Loss: 1.6829 Train Acc: 0.3377 Train Prec: 0.4310 Train Rcll: 0.4789 Train F1: 0.4123 
Val Loss: 1.9313 Val Acc: 0.0961 Val Prec: 0.1750 Val Rcll: 0.2397 Val F1: 0.1443

EPOCH: 6/20 
Train Loss: 1.5558 Train Acc: 0.3187 Trai