# Hands-on #3: Evaluation and Training on Patient-specific Data

In this notebook, you will:
1. Pre-process your personalized EMG data collected in Hands-on #2 and split it into training/validation and test sets.
2. Load the optimized DNN from Hands-on #1 and test it (without fine-tuning) on your data.
3. Fine-tune the DNN on your data and verify if performance improves.
4. Lastly, quantize the model to prepare it for deployment.


# Part 1: Personalized Data Preparation

## Initial Setup

Let's start by importing the required libraries:

In [1]:
import os
import copy
import sys
import random
import json

from itertools import groupby
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import butter, lfilter

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torcheval.metrics import MulticlassAccuracy, Mean
from torchinfo import summary

from utils.data import EMGDataset, get_repetitions_mask, windowing
from utils.train import CheckPoint, EarlyStopping, try_load_checkpoint
from utils.plot import plot_raw_data, plot_signal, plot_learning_curves, plot_conf_matrix



Set up the paths, training configuration, training device, and reproducibility settings, as done in Hands-on #1.

In [2]:
DATA_DIR = Path("./personal_data")
SAVE_DIR = Path("experiments/hands_on_3/")

TRAINING_CONFIG = {
    'epochs': 300,
    'batch_size': 32,
    'learning_rate': 0.0001,
    'label_smoothing': 0.25,
    'patience': 50
}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Working on: {device}")

## The Raw Dataset

At the end of Hands-on #2, you should have collected your data in a binary format and used a simple Python script to convert it to Apache Parquet, an open-source columnar data format. You should have two files, called `session_1.parquet` and `session_2.parquet`. Copy both files under `./personal_data/` and load the first session to check it.


In [3]:
# load the first session data and display the first few lines
df = pd.read_parquet(DATA_DIR / "session_1.parquet")
df.head()

The trigger signal was set by the GUI that you used to collect data. It takes a value $k>0$ during repetitions of the $k$-th gesture, and 0 during rests.
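To make the trigger encoding concrete, here is a minimal sketch that run-length encodes a trigger array into (label, length) segments. The `trigger` array is synthetic and only meant for illustration; your recorded trigger will be much longer.

```python
from itertools import groupby

import numpy as np

# Synthetic trigger (an assumption for illustration): rest, gesture 1, rest, gesture 2, rest
trigger = np.array([0] * 5 + [1] * 3 + [0] * 2 + [2] * 4 + [0] * 1)

# Run-length encode the trigger: one (label, length) pair per contiguous segment
segments = [(int(label), sum(1 for _ in run)) for label, run in groupby(trigger)]
print(segments)

# Gesture identifiers present in the recording (0 is rest)
gestures = sorted({label for label, _ in segments if label > 0})
print(gestures)
```

The same `groupby` idea is used later in this notebook to measure segment lengths when trimming the sessions.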

Use the next utility function, provided under `./utils/plot.py`, to display the first 50k samples of the raw data and trigger. The graph is automatically colored based on the trigger to highlight gestures. With 50k samples, you should be able to see at least two types of gesture. Try increasing the `to_sample` parameter to visualize a longer portion of the data.

In [4]:
plot_raw_data(df, to_sample=50000)

## Pre-processing

As discussed in Hands-on #1, we feed our DNNs with EMG data after an initial pre-processing phase. As a reminder, we apply the following pre-processing steps:

1. We apply a 4th-order high-pass Butterworth **filter** to remove the DC component of the signal. We use the SciPy `butter()` function, which takes the filter order, the normalized cutoff frequency, and the filter type (`btype`) as parameters, and returns the corresponding coefficients. Then, `lfilter()` uses these coefficients to apply the filter to a NumPy array. The normalized cutoff is computed as the absolute cutoff frequency (in Hz) divided by the Nyquist frequency (i.e., half of the sampling frequency). We filter each EMG channel separately.
2. We **trim** the initial and final parts of each session (before the first gesture and after the last one), which contain transients. To do so, we first compute the length of each segment in the trigger signal. We then remove most of the long first segment, corresponding to the initial acquisition before the first gesture (keeping only a small portion of it as "rest"). Finally, we discard all the signal after the last gesture.
3. We apply a **min-max** normalization so that all channels share the same value range. We simply transform our data as x' = (x - min) / (max - min), which maps it to the range between 0 and 1.

While the dataset used in Hands-on #1 had been pre-processed beforehand, we will have to repeat the same steps for your personal data here.

Let us define some simple functions to perform these operations.


In [5]:
def hp_filter(data, order, cutoff_frequency, sampling_frequency):
    # compute the normalized cutoff frequency, then use butter() and lfilter() to filter the data
    normalized_cutoff = cutoff_frequency / (sampling_frequency / 2)
    b, a = butter(order, normalized_cutoff, btype="highpass")
    return lfilter(b, a, data, axis=0)

def trim(data, labels):
    segment_lengths = [sum(1 for _ in group) for _, group in groupby(labels)]
    # eliminate most of the initial portion of the signal, where the trigger is zero, but the hand might not be yet at rest.
    # Precisely, we keep only some seconds before the first gesture (equal to the length of the SECOND rest).
    # Then, also eliminate all data after the last gesture, 
    # where you might have moved your hand freely. 
    start_index = segment_lengths[0] - segment_lengths[2]
    end_index = data.shape[0] - segment_lengths[-1] + 1
    data = data[start_index:end_index, :]
    labels = labels[start_index:end_index]
    return data, labels

def normalize(data, min_values, max_values):
    # normalize the data to [0:1]
    rescaled_data = (data - min_values) / (max_values - min_values)
    return rescaled_data

Let's test these functions on the whole `session_1` data and display the result. For the filtering, we use a 4th-order filter with a cutoff frequency of 10 Hz.


In [6]:
# split data and labels
data = df.drop("Trigger", axis=1)
labels = df["Trigger"].astype(int).to_numpy()

# filter, trim and normalize. Call the final arrays "normalized_data" and "trimmed_labels"
filtered_data = hp_filter(data, order=4, cutoff_frequency=10.0, sampling_frequency=500)
trimmed_data, trimmed_labels = trim(filtered_data, labels)
normalized_data = normalize(trimmed_data, trimmed_data.min(axis=0), trimmed_data.max(axis=0))

# create a new df for plotting
new_df = pd.DataFrame(np.hstack((normalized_data, trimmed_labels.reshape(-1,1))), columns=df.columns)

plot_raw_data(new_df, to_sample=50000)

You should now see much more clearly the various channels activating in correspondence with a gesture, as well as the differences between gestures.

## Splitting and Windowing

The next steps consist in separating our data into training, validation and test sets, and then constructing fixed-length, partially overlapping windows of 300 samples (as done in the pre-collected dataset) to feed our DNNs. Since this part is tedious and not particularly interesting, we provide you with two pre-cooked functions. Both are defined under `utils/data.py`. 

Let's start by defining the goal in terms of splitting. We want to use session_1 for **training** and session_2 for **testing**. As you remember from Hands-on #1, generalizing across different sessions is very important for our models. Furthermore, we also want to hold out a portion of session 1 (the _last gesture repetition_) and use it as the **validation** set. Now, let's see how we can do that with the two provided functions.

The first function, `get_repetitions_mask()`, takes the data and labels (i.e., the trigger), as well as the indexes of the first and last gesture repetitions of interest (two numbers between 0 and 4). It returns a binary mask of the same length as the data, containing 1s in correspondence with samples that belong to the selected repetitions (or to the associated rests) and 0s everywhere else. For example:
```Python
get_repetitions_mask(data, labels, 1, 3)
```
would return a mask with 1s in correspondence to the 2nd, 3rd and 4th repetition of each gesture.
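To give an intuition of what such a mask looks like, the following is a simplified, hypothetical re-implementation, not the code in `utils/data.py`. The name `repetitions_mask_sketch` is made up, the `data` argument is dropped because only the labels are needed here, and we assume that the rest "associated" with a repetition is the one immediately preceding it.

```python
from itertools import groupby

import numpy as np


def repetitions_mask_sketch(labels, first_rep, last_rep):
    """Hypothetical stand-in for get_repetitions_mask (not the course implementation)."""
    mask = np.zeros(len(labels), dtype=bool)
    seen = {}          # gesture id -> number of repetitions seen so far
    position = 0       # index of the first sample of the current segment
    rest_start = None  # start of the rest immediately preceding a gesture (assumption)
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label == 0:
            rest_start = position
        else:
            rep = seen.get(label, 0)
            seen[label] = rep + 1
            if first_rep <= rep <= last_rep:
                start = rest_start if rest_start is not None else position
                mask[start:position + length] = True
            rest_start = None
        position += length
    return mask


# Toy trigger: two repetitions each of gestures 1 and 2, separated by rests
labels = np.array([0, 1, 1, 0, 1, 1, 0, 2, 2, 0, 2, 2, 0])
mask = repetitions_mask_sketch(labels, 1, 1)  # select only the 2nd repetition
print(mask.astype(int))
```

Because the mask is boolean, it can be used directly as an index, e.g. `labels[mask]`, which is how the masks are applied to the data and labels in `prepare_data()`.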

The second function, `windowing()` takes two arrays of raw data and labels (trigger), respectively of dimension $(N_{samples}, 8)$ and $N_{samples}$, and outputs the windowed data, of dimension $(N_{windows}, 300, 8)$ and the windowed labels, of dimension $N_{windows}$. Each window label is obtained looking at the trigger value in correspondence to the *center* of the window. Windows are partially overlapped, with 70% of the samples shared between consecutive windows. Furthermore, only stable windows are considered. In other words, windows that include a transition between two gestures, or are too close to it (less than 1.5s), are excluded. All parameters of this function are configurable, but we will keep the default ones.
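The windowing logic can be sketched as follows. This is a simplified, hypothetical version (the name `windowing_sketch` is made up, and it is not the code in `utils/data.py`): it uses 300-sample windows with 70% overlap and labels each window by its center sample, but it only discards windows that contain a transition, without the additional 1.5 s guard band described above.

```python
import numpy as np


def windowing_sketch(data, labels, window=300, overlap=0.7):
    """Simplified, hypothetical version of windowing(); not the utils/data.py code."""
    stride = int(round(window * (1 - overlap)))  # 90 samples for 300 samples at 70% overlap
    windows, window_labels = [], []
    for start in range(0, len(labels) - window + 1, stride):
        segment = labels[start:start + window]
        # keep only windows with a single trigger value (no transition inside)
        if segment.min() != segment.max():
            continue
        windows.append(data[start:start + window])
        window_labels.append(segment[window // 2])  # label at the window center
    return np.stack(windows), np.array(window_labels)


# Toy input: 1500 samples, 8 channels, a rest followed by gesture 3
rng = np.random.default_rng(0)
data = rng.random((1500, 8))
labels = np.array([0] * 750 + [3] * 750)
x, y = windowing_sketch(data, labels)
print(x.shape, y.shape)
```

Note how the output has the $(N_{windows}, 300, 8)$ layout described above, and how the windows straddling the rest-to-gesture transition are dropped.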

These are all the components needed to obtain our dataset, ready for DNN training and evaluation. Let's put them all together in the next function.


<div class="alert alert-block alert-warning">
<b>Task:</b> Complete the next function to return training, validation, and test data and labels after pre-processing and windowing.
</div>

In [7]:
def prepare_data(train_val_file, test_file, concatenate_all=False):
    
    print(f"Processing {train_val_file}")

    # read session 1 data
    df = pd.read_parquet(train_val_file)

    # split data and labels
    data = df.drop("Trigger", axis=1).values 
    labels = df.Trigger.values.astype(int)

    # apply filtering to the data 
    filtered_data = hp_filter(data, order=4, cutoff_frequency=10.0, sampling_frequency=500)
    
    # trim the filtered data and labels
    trimmed_data, trimmed_labels = trim(filtered_data, labels)

    # split training and validation sets using get_repetitions_mask
    # first call the function to isolate repetitions [0,3] (for training) and
    # [4,4] (for validation). Then, use the mask as an index in the arrays
    # to separate the two subsets
    train_mask = get_repetitions_mask(trimmed_data, trimmed_labels, 0, 3)
    val_mask = get_repetitions_mask(trimmed_data, trimmed_labels, 4, 4)
    train_data = trimmed_data[train_mask]
    train_labels = trimmed_labels[train_mask]
    val_data = trimmed_data[val_mask]
    val_labels = trimmed_labels[val_mask]

    print(f"Processing {test_file}")

    # read session 2 data (test set)
    df = pd.read_parquet(test_file)

    
    # split test data and test labels (expected: 2 lines)
    # YOUR_CODE_START


    # YOUR_CODE_END

    # apply filtering to the test data (expected: 1 line)
    # YOUR_CODE_START

    # YOUR_CODE_END
    
    # trim the filtered test data and labels (expected: 1 line)
    # YOUR_CODE_START

    # YOUR_CODE_END

    # we don't need to mask anything since we use the whole session 2 as test set
    test_data, test_labels = trimmed_data, trimmed_labels

    # normalize all data arrays using the TRAINING SET's min and max values
    train_min, train_max = train_data.min(axis=0), train_data.max(axis=0)
    train_data = normalize(train_data, train_min, train_max)
    val_data = normalize(val_data, train_min, train_max)
    test_data = normalize(test_data, train_min, train_max)

    # create windows for all three datasets
    train_data, train_labels = windowing(train_data, train_labels)
    val_data, val_labels = windowing(val_data, val_labels)
    test_data, test_labels = windowing(test_data, test_labels)

    # note: the concatenate_all option allows you to obtain splits in which:
    # - train = train + test
    # - val = val
    # - test = train + test
    # this is needed if you want to retrain on ALL DATA before deployment (which is usually a good idea).
    # of course, in this case, the test accuracy shouldn't be considered as relevant.
    # for now, leave it at False, then you might come back and set it to True before deployment.
    if concatenate_all:
        train_data = test_data = np.vstack((train_data, test_data))
        train_labels = test_labels = np.hstack((train_labels, test_labels))
    
    return (train_data, train_labels), (val_data, val_labels), (test_data, test_labels), (train_min, train_max)

Notice that the function also returns the training set minimum and maximum values. We will need those values if we want to normalize newly incoming data. So, at the end of this notebook, we will save them to file. 
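As a small illustration of why these values matter, the sketch below rescales new samples with the statistics of a toy training set. All numbers are invented for the example. Note that new data is not guaranteed to fall inside $[0, 1]$ once it is rescaled with the training statistics.

```python
import numpy as np

# Toy two-channel "training" data and its per-channel statistics
train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
train_min, train_max = train.min(axis=0), train.max(axis=0)

# New samples, e.g. from a later session, rescaled with the TRAINING statistics
new = np.array([[1.0, 15.0], [5.0, 5.0]])
new_scaled = (new - train_min) / (train_max - train_min)
print(new_scaled)
```

The second sample exceeds the training range on both channels, so its rescaled values land outside $[0, 1]$.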

Invoke the function just defined to generate our three data splits:

In [8]:
train, val, test, norm_values = prepare_data(DATA_DIR / "session_1.parquet", DATA_DIR / "session_2.parquet")
train_data, train_labels = train
val_data, val_labels = val
test_data, test_labels = test
norm_min, norm_max = norm_values

Let us check the size of our datasets. Note that the test set is quite large compared to the training and validation sets. One could be tempted to add more data to the training set. However, remember that it is fundamental to test DNN models for EMG gesture recognition on *different sessions* from those used for training, to make sure that they are practically useful (i.e., that they still predict reasonably well even with a slightly different sensor mount).


In [9]:
# print dataset shapes 
print(f"Total Train Windows: {train_data.shape[0]}, Val Windows: {val_data.shape[0]}, Test Windows: {test_data.shape[0]}")

Let's reuse the `plot_signal` function seen in Hands-on #1 to visualize windows of the pre-processed training data and see if they look similar to those treated in that session. Change the two window indexes (5 and 30 by default) to find rest and gesture windows, respectively.

In [10]:
plot_signal(5, 30, train_data, train_labels)

Before we move on, let's save the preprocessed and split data to file, so that we can reuse it later (e.g. to run some test inferences on the target hardware):

In [11]:
os.makedirs(SAVE_DIR / "preprocessed_data", exist_ok=True)
np.save(SAVE_DIR / "preprocessed_data" / "train_data", train_data)
np.save(SAVE_DIR / "preprocessed_data" / "train_labels", train_labels)
np.save(SAVE_DIR / "preprocessed_data" / "val_data", val_data)
np.save(SAVE_DIR / "preprocessed_data" / "val_labels", val_labels)
np.save(SAVE_DIR / "preprocessed_data" / "test_data", test_data)
np.save(SAVE_DIR / "preprocessed_data" / "test_labels", test_labels)

Together with the data, we also ought to save the minimum and maximum values used for normalization. Unlike the other preprocessing steps (windowing and filtering), normalization has data-dependent parameters (the min/max values themselves). At inference time, we will have to reuse the **same normalization constants** to obtain good predictions from the DNN. The next cell generates a simple JSON file containing the two sets of values.

In [12]:
with open(f"{SAVE_DIR}/rescaling_values.json", "w") as f:
    json.dump({"min": norm_min.tolist(), "max": norm_max.tolist()}, f)

## Datasets and Dataloaders



The last data preparation step consists of generating Dataset instances (first) and DataLoaders (second) for PyTorch training. For the first part, we can reuse the `EMGDataset` class seen in Hands-on #1.

In [13]:
# create the datasets for training, validation and test sets
train_ds = EMGDataset(train_data, train_labels)
val_ds = EMGDataset(val_data, val_labels)
test_ds = EMGDataset(test_data, test_labels)

Concerning DataLoaders, we can reuse the function prepared in Hands-on #1 to build them. Using the `ipynb` library, we can conveniently load function definitions from another notebook. Let's use it:

In [14]:
from ipynb.fs.defs.hands_on_1 import build_dataloaders

train_dl, val_dl, test_dl = build_dataloaders(TRAINING_CONFIG, train_ds, val_ds, test_ds, num_workers=4)
print(f"Training data-loader length: {len(train_dl)}")
print(f"Validation data-loader length: {len(val_dl)}")
print(f"Test data-loader length: {len(test_dl)}")

As you can see, we have far fewer minibatches than in Hands-on #1, as expected since we're working with a single patient's data.

# Part 2: Zero-shot Performance

In Hands-on #1, we claimed that EMG processing requires subject-specific training. To verify this, let's load our optimized DNN from that session, which was trained on our multi-patient dataset, and directly test it on the newly created test set from your personal data.

Load the model saved at the end of Hands-on #1, which should be under `./experiments/hands_on_1/final_model.pt`. If you did not manage to finish that session, you can find a pre-cooked model under `./checkpoints/hands_on_1/final_model.pt`. 

In [15]:
MODEL_PATH = Path("./experiments/hands_on_1/final_model.pt")
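# weights_only=False is required by recent PyTorch versions to unpickle a full model (not just a state_dict)
model = torch.load(MODEL_PATH, map_location=device, weights_only=False).eval()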

We can now evaluate this model on the new data. As for `build_dataloaders`, let's not rewrite any code. Use the `ipynb` library to import the `evaluate()` function from Hands-on #1, and run it. Remember that `evaluate()` also requires the loss function. It's not strictly necessary here, but to avoid having to rewrite anything, we will just create a new `criterion` instance before testing.


In [16]:
# complete this line to import the right functions
from ipynb.fs.defs.hands_on_1 import evaluate, get_criterion

# define the criterion, then evaluate the loaded model on the new test set and print the result. 
criterion = get_criterion(train_dl, TRAINING_CONFIG, device)
test_loss, test_acc, test_macro_acc = evaluate(model, criterion, test_dl, device, num_classes=9)
print(f"Test Loss: {test_loss:.2f}, Test Acc: {test_acc:.2f}, Test Macro Acc: {test_macro_acc:.2f}")


As you can see, our model performs quite poorly on the new data. You should get a macro accuracy of 30-40% at most. This confirms our hypothesis on the need for personalized training.

<div class="alert alert-block alert-info">
<b>Question:</b> Why do you think the accuracy is so low? What influences the EMG signals?
</div>

So, the next obvious step is to fine-tune the model on the new data. Let's do that next.

# Part 3: Patient-specific Fine-tuning

Let's train the model on the new data. Again, no need to rewrite anything here. Just import the training loop defined in Hands-on #1 and use it. Note from the `TRAINING_CONFIG` at the beginning of this notebook that we're training for more epochs, with a longer "patience", and with a lower learning rate. This is affordable because we have less data, so each epoch is much shorter. Let's see if it improves the final results.


<div class="alert alert-block alert-warning">
<b>Task:</b> Complete the next cell to train the loaded model on the new data.
</div>


In [17]:
# import the training loop function from hands-on-1 
from ipynb.fs.defs.hands_on_1 import training_loop

# run the training loop, saving checkpoints in SAVE_DIR/fine_tuning (expected: 1 line)
# YOUR_CODE_START
history = training_loop(SAVE_DIR / 'fine_tuning', TRAINING_CONFIG, ...)
# YOUR_CODE_END

At this point, you know very well how to analyze the output of the training. Use the next cells (you can add more if you need to) to verify how the training went (plotting the learning curves). Then evaluate the model on the test set and plot the confusion matrix. All needed code has been introduced in Hands-on #1.

<div class="alert alert-block alert-warning">
<b>Task:</b> Complete the next cells as you want, to check the results of your model (plot learning curves, confusion matrices, etc.).
</div>



In [18]:
# use this cell to plot learning curves, evaluate the model on the test set, and visualize the confusion matrix
# (expected: 4-5 lines)
# YOUR_CODE_START




# YOUR_CODE_END

<div class="alert alert-block alert-info">
<b>Questions:</b> 

- Did the model recover the lost accuracy? 
- Is there still a gap between validation and test set? Why?
- Looking at the confusion matrix, are the mistakes made by the model "reasonable"?
</div>

## Saving the (Scripted) Model

Before we move on with quantization, let's save the fine-tuned floating point model to disk. This will be useful for testing real-time inference on your laptop.
This time, we will use the **TorchScript** format for saving. This format allows us to easily reload the model in a different script, even with a different environment and directory structure.

In [19]:
model_scripted = torch.jit.script(model)
model_scripted.save(SAVE_DIR / 'final_model_scripted.pt')

# Part 4: Quantization

All DNN models considered up to now used **32-bit floating point** for internal operations, and for storing weights and activations. However, our hardware target only supports Quantized DNN inference, using **8-bit integers**. Therefore, we need to convert our model to that format before we can export it and compile it.

Simply quantizing a model by replacing all floating point data with their closest integer approximation (the most basic form of Post-Training Quantization) could worsen its accuracy. Fortunately, this drop can often be recovered by running some epochs of so-called **Quantization-Aware Training (QAT)**. Essentially, QAT is a training that "simulates" the fact that weights and activations will be quantized, and allows the gradient-descent-based optimizer to modify the weights in order to compensate for the approximation error introduced by quantization.
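To see where the approximation error comes from, here is a minimal NumPy sketch of the most basic quantization scheme applied to a random weight tensor. It uses a generic symmetric, per-tensor scheme chosen for illustration; it is an assumption, not necessarily the quantizer that PLiNIO applies to our model.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(16, 8)).astype(np.float32)

# Symmetric per-tensor quantization to signed 8-bit integers
scale = np.abs(weights).max() / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# "Fake quantization": map the integers back to float to measure the error
deq_weights = q_weights.astype(np.float32) * scale
max_error = np.abs(weights - deq_weights).max()
print(f"scale={scale:.6f}, max abs error={max_error:.6f}")
```

Each weight is moved by at most half a quantization step (`scale / 2`). QAT inserts this kind of rounding in the forward pass during training, so that the optimizer can adapt the weights to it.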

The **[PLiNIO](https://github.com/eml-eda/plinio) DNN optimization library**, introduced in Hands-on #1, can be used to perform QAT on our model, and to export the final "full integer" model in a format compatible with the compiler used in one of the next sessions.

More precisely, PLiNIO's QAT function is embedded in the `MPS()` class, which is used to perform a more advanced optimization, called **Mixed-Precision Search**. 
We will not use MPS in this session, since our target hardware and backend library do not support $<8$ bit inference (*). However, we can still use PLiNIO to perform a simple QAT run, by simply reducing it to a **"corner case" of MPS, with a single precision** (8-bit) to select from.

If you're interested in the details of the MPS algorithm implemented in PLiNIO, check out these two papers: [link1](https://arxiv.org/abs/2206.08852), [link2](https://arxiv.org/abs/2004.05795). After you finish this hands-on, feel free to also try applying MPS with multiple precisions on our DNN for EMG gesture recognition, as an extra. Although we won't be able to deploy models with precisions different from 8-bit, it could still be interesting to check how much we can compress the weights without losing too much accuracy.

 
(*) Actually, the DNN accelerator present in GAP9 would support those precisions, but we will only deploy on the multi-core RISC-V cluster.

## Preparing for QAT

Let's start by importing the required PLiNIO classes and functions. Some of these will be needed only for the final export:

In [20]:
from plinio.methods import MPS
from plinio.methods.mps import get_default_qinfo
from plinio.methods.mps.quant.quantizers import PACTAct
from plinio.methods.mps.quant.backends import Backend, integerize_arch
from plinio.methods.mps.quant.backends.match import MATCHExporter

Next, we have to apply the `MPS()` constructor to our patient-specific fine-tuned DNN. The constructor expects the following inputs:

- The model to be converted
- The cost metric to be optimized (this is for an actual MPS run, since we have a single precision, we can ignore this parameter)
- The shape of a single input (for internal graph conversion passes)
- A `qinfo` dictionary, containing settings on the desired type of Quantization to apply for different parts of the network.

The settings in `qinfo` include the quantization algorithm to use for weights and activations (e.g. min-max, PaCT, etc), and optional configuration parameters. We do not have time to go over these details, but please refer to survey papers such as [this one](https://arxiv.org/abs/2106.08295) for more information. In our case, it suffices to use the reasonable default settings provided by PLiNIO, by calling the `get_default_qinfo()` function. This function expects as input parameters the tuple of weights and activations bitwidths to be included in the optimization (in our case, only 8-bit for both).

There's just one thing to customize in the default `qinfo`, namely the range of the DNN **input** quantizer. In fact, since we know that our (float) data is in the $[0, 1]$ range, we can set the initial range of the quantizer to be the same. This should facilitate the conversion.

Let's create our `MPS()` instance.

<div class="alert alert-block alert-warning">
<b>Task:</b> Complete the next cell to create an instance of the MPS() class with the correct parameters.
</div>


In [21]:
# get the default qinfo dictionary, specifying 8-bit as the only precision for both weights and activations
qinfo = get_default_qinfo((8,), (8,))

# modify the default qinfo for the input layer, since we're using non-negative data in the [0, 1] range
qinfo['input_default']['quantizer'] = PACTAct
qinfo['input_default']['kwargs'] = {'init_clip_val': +1}

model_copy = copy.deepcopy(model)
# create a MPS() instance passing model_copy and the correct parameters to the constructor
# don't forget to move the newly created model to our training device (expected: 2-3 lines)
# YOUR_CODE_START


# YOUR_CODE_END

The model generated by the MPS constructor has approximated weights and operations that simulate int8 precision. Furthermore, other optimizations are performed during the conversion, such as folding Batch Normalization layers into the preceding Convolution or Linear layers, so that they do not have to be executed at all in the final deployed model. Overall, the result of the conversion is similar to what we would get with a (very basic) post-training quantization. Let's check how this model performs on our test data.
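The Batch Normalization folding mentioned above can be verified numerically. The following NumPy sketch uses random parameters (all invented for the example) to show that a Linear layer followed by Batch Normalization in inference mode is equivalent to a single Linear layer with rescaled weights and bias. It illustrates the general technique and is not PLiNIO's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))

# Linear layer parameters
W, b = rng.normal(size=(4, 8)), rng.normal(size=4)
# Batch-norm parameters (inference mode: fixed running statistics)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var, eps = rng.normal(size=4), rng.random(4) + 0.5, 1e-5

# Reference: linear layer followed by batch normalization
y_ref = gamma * ((x @ W.T + b) - mean) / np.sqrt(var + eps) + beta

# Folded: a single linear layer with rescaled weights and bias
factor = gamma / np.sqrt(var + eps)
W_fold = W * factor[:, None]
b_fold = (b - mean) * factor + beta
y_fold = x @ W_fold.T + b_fold

print(np.allclose(y_ref, y_fold))
```

The same algebra applies to convolutions, with one scaling factor per output channel.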


In [22]:
# evaluate the post-conversion mps_model on the test set and print the results
test_loss, test_acc, test_macro_acc = evaluate(mps_model, criterion, test_dl, device, num_classes=9)
print(f"Test Loss: {test_loss:.2f}, Test Acc: {test_acc:.2f}, Test Macro Acc: {test_macro_acc:.2f}")

**Interesting!!**

You should see that the test accuracy is preserved quite well, possibly with a minimal drop! It seems that, for our simple task, Post-Training Quantization could suffice.
We could stop here, and directly export the model for deployment. But since QAT is very useful on more complex problems, let's see how we could run it.


## Running QAT

The actual execution of QAT is nothing more than an additional training run, using the model generated by the `MPS()` constructor. Note that, if we wanted to actually *select* the bitwidth using MPS, we would have to run something similar to the `nas_loop` seen in Hands-on #1. However, we're keeping a fixed precision, and do not aim to optimize the DNN cost (e.g. total memory occupation) in this phase. More precisely, we have already reduced the memory by a factor of 4 by moving from float32 to int8, so our goal now is just to recover the lost accuracy.

So, in our case, a simple `training_loop` on the converted model will suffice.

<div class="alert alert-block alert-warning">
<b>Task:</b> Complete the next cell to run a standard training loop on the MPS model.
</div>

In [23]:
# run the training loop, saving checkpoints in SAVE_DIR/qat (expected: 1 line)
# YOUR_CODE_START
history = training_loop(SAVE_DIR / 'qat', TRAINING_CONFIG, ....)
# YOUR_CODE_END

Let's test the final model after QAT.


In [24]:
# evaluate the mps_model after QAT on the test set and print the results
test_loss, test_acc, test_macro_acc = evaluate(mps_model, criterion, test_dl, device, num_classes=9)
print(f"Test Loss: {test_loss:.2f}, Test Acc: {test_acc:.2f}, Test Macro Acc: {test_macro_acc:.2f}")

After QAT, it is possible that your model has become even **slightly more accurate** than the floating point version! This sometimes happens when quantizing: the approximation introduced by quantization has a *regularizing* effect, which makes the model behave slightly better on unseen data.

<div class="alert alert-block alert-info">
<b>Question:</b> Combining Channel-pruning with PIT (in Hands-on #1) and 8-bit Quantization (in this notebook), by how much did we compress the model in total? Where do you think there's room to reduce it even more?
</div>

## Export for Deployment

We can now call the `.export()` method of the PLiNIO MPS model, exactly as done in Hands-on #1 for PIT. Notice, however, that in this case the export step behaves slightly differently. Rather than outputting a model that includes standard torch layers, it replaces each quantized layer with a new class (for instance, `nn.Conv2d` becomes `QuantConv2D`). These layers work analogously to their torch equivalents, but also store the quantization parameters (e.g. min/max values for each weight tensor), and use them to simulate the effect of quantization during inference.


In [25]:
final_mps_model = copy.deepcopy(mps_model)
quant_model = final_mps_model.export()
quant_model = quant_model.eval()

Moreover, the model exported by MPS still only applies a so-called "fake quantization". This means that the DNN is not yet using *only* integer data. Rather, some parameters, such as the scale factors for (re-)quantization, are still in floating point. To deploy on our target, however, **all data should be integer**. For instance, in the PULP-NN backend library that we will use on our target, the multiplication by a floating point scale factor is replaced by the sequence of: i) an integer multiplication and ii) a right shift.
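The multiply-and-shift trick can be sketched in a few lines of NumPy. The scale factor, the accumulator values, and the 16-bit shift below are assumptions chosen for illustration; this is not the actual PULP-NN or PLiNIO code.

```python
import numpy as np

shift = 16                      # number of fractional bits (assumed for illustration)
float_scale = 0.0123            # example floating point requantization factor
int_mult = int(round(float_scale * (1 << shift)))  # integer multiplier

acc = np.array([-20000, -150, 0, 977, 15000], dtype=np.int64)  # toy accumulators

# Integer-only rescaling: multiply, then arithmetic right shift
y_int = (acc * int_mult) >> shift
# Floating point reference, rounded down like the shift does
y_ref = np.floor(acc * float_scale).astype(np.int64)

print(int_mult, y_int, y_ref)
```

The two results can differ by at most one unit here, because `int_mult / 2**shift` only approximates `float_scale`. A larger shift gives a finer approximation, at the cost of wider intermediate products.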

To obtain a model that fully complies with this execution model, we need a further conversion step. This is implemented by the next cell, which calls the `integerize_arch` function with parameters that specify the desired backend, among the supported ones, and some other optional parameters. The backend essentially refers to the compiler that will be used to take the model and convert it to inference code for the hardware. In our case, it will be the [MATCH](https://github.com/eml-eda/match) compiler.


In [26]:
# convert the model to full-integer, compiler-compliant format
full_int_model = integerize_arch(quant_model, Backend.MATCH, backend_kwargs={'shift_pos': 16})
full_int_model = full_int_model.to(device)

Since the previous conversion removes all floating point operations from the network, it might affect the accuracy (minimally). So, let's verify by how much.


In [27]:
# evaluate the full-integer model on the test set and print the results
test_loss, test_acc, test_macro_acc = evaluate(full_int_model, criterion, test_dl, device, num_classes=9)
print(f"Test Loss: {test_loss:.2f}, Test Acc: {test_acc:.2f}, Test Macro Acc: {test_macro_acc:.2f}")

We now have to save the final model. In this case, unlike in Hands-on #1, we don't need to save it *just* in PyTorch format. Rather, we want to generate an ONNX file compatible with what the MATCH compiler expects. The following cell does that.

It essentially generates the ONNX using torch's built-in utility, and then adds some custom annotations as desired by MATCH. Let's run it (you can safely ignore the warnings that appear).


In [28]:
# export to onnx (after making a copy of the model to avoid destroying it).
single_batch_shape = (1,) + train_ds[0][0].shape
exporter = MATCHExporter()
exporter.export(copy.deepcopy(full_int_model), single_batch_shape, SAVE_DIR)

Last but not least, we also have to save a new rescaling parameter for later reuse. In fact, while the floating point model takes as input normalized samples in the range $[0, 1]$, the full-integer model also uses **8-bit input data**. 

The MPS class in PLiNIO (and the following integerization step) use an "Input Quantizer" layer to perform this transformation. The Input Quantizer is of type `PACTAct`, as specified in the `qinfo` data structure above. This type of quantizer implements the following transformation:

$$X_{int} = \left\lfloor \frac{255}{\alpha} \cdot \min(X_{float}, \alpha) \right\rfloor $$


where $\alpha$ is a **learned** clipping value.

Note that, in order to run the full pipeline on GAP9, we would also need to run the **filtering** using integers. However, for simplicity, in the following sessions we will still perform filtering *in floating point*, then rescale the (float) data to $[0, 1]$ using the training-set minimum and maximum saved earlier, and finally convert them to integers. Overall, the complete normalization pipeline will be the following:

$$X\mathrm{\ (Unnormalized,\ filtered\ input\ window)} \rightarrow X_{float} = \frac{X - \mathrm{min}_{train}}{\mathrm{max}_{train} - \mathrm{min}_{train}} \rightarrow X_{int} =  \left\lfloor \frac{255}{\alpha} \cdot \min(X_{float}, \alpha) \right\rfloor  $$
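The pipeline above can be sketched as a single function. The name `to_int_input` and the toy values are hypothetical; the rescaling uses the saved training-set minimum and maximum, and the lower clip at 0 is an added assumption (it is not part of the formula above) so that the result always fits an unsigned 8-bit integer.

```python
import numpy as np


def to_int_input(window, train_min, train_max, alpha, n_bits=8):
    """Sketch of the float -> integer input pipeline described above."""
    x_float = (window - train_min) / (train_max - train_min)
    levels = 2 ** n_bits - 1  # 255 for 8 bits
    # clip to [0, alpha]; the lower clip at 0 is an added assumption for unsigned data
    x_clipped = np.clip(x_float, 0.0, alpha)
    return np.floor(levels / alpha * x_clipped).astype(np.uint8)


# Toy filtered window: 4 samples, 2 channels (values chosen for illustration)
window = np.array([[-1.0, 0.0], [0.0, 5.0], [1.0, 10.0], [3.0, 20.0]])
train_min, train_max = np.array([-1.0, 0.0]), np.array([1.0, 10.0])
x_int = to_int_input(window, train_min, train_max, alpha=1.0)
print(x_int)
```

Samples above the training maximum (the last row) saturate at 255, which is the effect of the clipping value $\alpha$.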


Let's extract the clipping value from the full integer model, and save it on disk for later usage.

We will update the same JSON file used for the floating point scaling normalization factors.


In [29]:
qtz = full_int_model.input_1_input_quantizer
clip_val = float(qtz.out_quantizer.clip_val)
print(f"Input clipping value: {clip_val}")

with open(f"{SAVE_DIR}/rescaling_values.json", "r") as f:
    scaling_dict = json.load(f)

scaling_dict['clip_val'] = clip_val
with open(f"{SAVE_DIR}/rescaling_values.json", "w") as f:
    json.dump(scaling_dict, f)

You will probably see that the clipping value remained very close to the initialization value (+1). That is, our model prefers to avoid any clipping, and to map the entire input range to the full 8-bit integer range.

# Extras

## Retraining on all data

**NOTE**: Do this only if you have spare time, your model will work regardless.


One important thing that you might want to do before the next Hands-on session is rerunning this notebook (after changing `SAVE_DIR`), setting `concatenate_all=True` in the call to `prepare_data()`. This will retrain everything using the combined training and test sets as the training set (keeping just a small held-out validation set for things like early stopping).

This can give you a slightly more accurate final model, since it will be trained on data from both sessions. Of course, the test accuracies computed in this notebook won't be meaningful anymore, but doing this is fair, since your actual "test" data will be the one that your model will encounter in-field.

## Others

Other things you could play with include:
- Using PLiNIO to actually perform a mixed-precision search on the model, and see if you can compress it even more without losing accuracy. Remember: this won't be supported by our hardware target, but remains an interesting test. It will require you to use a `nas_loop` similar to the one seen in Hands-on #1, and impose a cost metric such as `params_bit`, which estimates the total model size in bits (thus accounting for bitwidth), whereas `params` only *counts* the parameters.
- Playing with the training configuration (learning rate, batch size, patience, type of optimizer, label smoothing, etc) to try and further increase the test accuracy of the model.
- etc.