# Hands-on #5: Real-time Inference with Streaming Data from BioGAP

In this session, you will:
1. Review a script that receives data in real time from BioGAP (through the same GUI application that you used for data acquisition) and feeds it to the patient-specific fine-tuned model developed in Hands-on #3 (i.e., the one saved under `/scratch/$USER/efcl-school-t3/experiments/hands_on_3/`).
2. Use this script on your local machine to perform real-time predictions on your own data.
3. Play with the inference parameters (stride between consecutive sliding windows, number of aggregated predictions), looking for the combination that achieves the best performance.
4. Devise possible alternative mechanisms to further improve the prediction quality.


# Part 1: Analyzing the Inference Script

In this session, you will not have to write any code to complete the basic assignment. A script to run inference on your laptop is already provided to you as `hands_on_5.py`. Your first task will be to read and analyze the code in this script.

Before you do that, a brief clarification: this script runs inference on data *streamed in real time from BioGAP's EMG sensors*. However, the DNN runs *on your laptop (and in floating point)*. In the previous session, you saw how to deploy the quantized model on the GAP9 evaluation board and test it with some pre-collected inputs (passed via UART). However, even with the DNN deployed, running the entire inference on BioGAP is not so straightforward, as it would require:
- Writing low-level code to interface GAP9 with the Nordic nRF52811 MCU, where the EMG sensor data are currently acquired, or alternatively writing device drivers to acquire the data directly on GAP9.
- Implementing the entire pre-processing chain (filtering and normalization) in C, using integer data only.

For this summer school, we will check the real-time inference behaviour of our DNNs on your laptops only. A scheme of the setup that we will use is shown in the following picture:

<br>
<center><img src="./assets/ho5_setup.png" alt="setup" class="bg-primary" width="50%"></center>
<center> Fig. 1: The inference setup for this session.</center>
<br>

As shown, you will stream data to the acquisition GUI via Bluetooth. Then, these data will be forwarded to the Python script through a UDP socket. The Python script will prepare the data for inference and invoke the floating point DNN that you saved in a previous session. Let's now open the inference script and understand how it works.

The script uses three separate execution threads. An overview scheme of its functionality is shown in the following figure.

<br>
<center><img src="./assets/ho5_threads.png" alt="threads" class="bg-primary" width="80%"></center>
<center> Fig. 2: Architecture of the inference script.</center>
<br>


## The "Main" Program

Starting from the bottom of the script, look at the "main" part. This part mainly handles command line arguments. Our script takes the following 4 arguments:
- The path to the trained (and scripted) PyTorch model that we want to use for inference (take the one saved during Hands-on #3)
- The path to the JSON file containing the rescaling information for normalization
- The number of windows to consider for averaging predictions
- The stride between consecutive windows, as a fraction of the window length.

While the first two inputs are self-explanatory, we will analyze the other two in more detail later.
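As a rough illustration, the argument handling described above could look like the following sketch. The flag names are the ones used in this notebook; the default values, help strings, and the file names in the usage example are assumptions made for illustration and may differ from those in `hands_on_5.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four arguments described above."""
    parser = argparse.ArgumentParser(description="Real-time gesture inference")
    parser.add_argument("--model_path", required=True,
                        help="path to the scripted PyTorch model")
    parser.add_argument("--rescaling_path", required=True,
                        help="path to the JSON file with the rescaling values")
    # The defaults below are assumed, not taken from the actual script.
    parser.add_argument("--num_windows", type=int, default=1,
                        help="number of windows whose predictions are averaged")
    parser.add_argument("--window_stride", type=float, default=0.5,
                        help="stride between windows, as a fraction of the window length")
    return parser


# Usage example with hypothetical file names.
args = build_parser().parse_args(
    ["--model_path", "model.pt", "--rescaling_path", "rescaling.json",
     "--num_windows", "5"]
)
print(args.num_windows, args.window_stride)
```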

After parsing the arguments, the script invokes the `main()` function, which first loads the model and the rescaling values. It then creates two separate execution threads, `DataReader` and `DataConsumer`, in charge of getting data from the acquisition GUI and performing the inference, respectively. It also sets up two `Queue` FIFO data structures through which the threads can exchange information. Namely:
- `data_queue` will contain chunks of EMG signal, and will be written by `DataReader` and read by `DataConsumer`
- `pred_queue` will contain the labels predicted by the DNN, and will be written by `DataConsumer` and read in the main thread.

After this setup phase, the main function waits for predictions on the `pred_queue`. When one arrives, it prints the predicted label on the console and updates a simple GUI window that shows an image of the predicted gesture. It also checks whether the user has closed the GUI window or interrupted the execution via a `KeyboardInterrupt`. If either of the two happens, it sets the global variable `alive` to `False` to terminate the execution.


## DataReader

Next, look at the `DataReader` thread. This thread replaces the code that, on the actual BioGAP board, would use the device drivers of the sensor ADCs to collect a batch of data to be processed by the DNN.

The constructor of this class, after setting some attributes, opens a **UDP Socket** on address localhost, and port 4040. This is where the acquisition GUI will stream the data received from BioGAP. 

The thread execution method (`run()`) loops until an interrupt is encountered, collecting packets of data from the UDP socket. Each packet is 38 bytes long, of which only 24 contain the actual data from the BioGAP ADCs (24 bits, i.e., 3 bytes, per channel). These bytes are first converted to integers, taking into account the little-endian organization of the packet, and then rescaled from unitless digital values to actual voltages (in volts), based on hardware-dependent parameters. As soon as one **chunk** of data is available, it is added to the processing queue. The chunk size depends on the window length, which is 300 samples (or 0.6 s), as set during our DNN training. It also depends on the stride $S$, that is, the fraction of data shared between two consecutive windows. Namely, a chunk is a portion of a window of length $(1 - S) \cdot 300$ samples.
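The decoding and chunk-size computation can be illustrated with the sketch below. It makes several assumptions that are not stated in this notebook: the ADC values are taken to be signed (two's complement), the `V_REF` and `GAIN` constants are hypothetical placeholders for the hardware-dependent parameters, and the position of the 24 data bytes inside the 38-byte packet is left out because the packet layout is not described here.

```python
import socket
from typing import List

WINDOW_LEN = 300      # samples per window, as set during DNN training
PACKET_SIZE = 38      # bytes per UDP packet
DATA_BYTES = 24       # bytes of ADC data in each packet
BYTES_PER_VALUE = 3   # 24-bit ADC values

# Hypothetical hardware parameters, for illustration only.
V_REF = 4.0           # ADC reference voltage in volts (assumed)
GAIN = 6.0            # analog front-end gain (assumed)


def open_socket() -> socket.socket:
    """Open the UDP socket on which the acquisition GUI streams the data."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("localhost", 4040))
    return sock


def decode_payload(payload: bytes) -> List[float]:
    """Convert the 24 data bytes of one packet into one voltage per channel."""
    if len(payload) != DATA_BYTES:
        raise ValueError(f"expected {DATA_BYTES} bytes, got {len(payload)}")
    volts_per_count = V_REF / (GAIN * (2 ** 23 - 1))
    volts = []
    for i in range(0, DATA_BYTES, BYTES_PER_VALUE):
        # Assumption: each value is a little-endian, two's complement integer.
        count = int.from_bytes(payload[i:i + BYTES_PER_VALUE],
                               byteorder="little", signed=True)
        volts.append(count * volts_per_count)
    return volts


def chunk_length(stride: float, window_len: int = WINDOW_LEN) -> int:
    """Number of samples in a chunk, following (1 - S) * window_len."""
    # round() guards against floating-point results such as 29.999999999999993.
    return int(round((1.0 - stride) * window_len))
```

For instance, with $S = 0.5$, `chunk_length(0.5)` gives 150 samples per chunk.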

## DataConsumer

The `DataConsumer` thread is where the DNN inference is implemented. The constructor of this class initializes the parameters of the Butterworth filter for pre-processing, sets the normalization scaling factors, and lastly sends the DNN model to the correct inference device (CPU or GPU), setting it to `eval()` mode.
Notice that, since we apply filtering to a stream of newly incoming data, we have to store the initial conditions, and update them before each application of the `lfilter()` function. Otherwise, we would experience spiking transients every time we apply the filter.
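The following sketch shows one way to carry the filter state across chunks with SciPy. The filter design (4th-order high-pass at 20 Hz) and the number of channels are assumptions for illustration and may not match the script; the 500 Hz sampling rate follows from the 300-sample, 0.6 s window mentioned earlier.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 500.0       # sampling rate in Hz (300 samples in 0.6 s)
N_CHANNELS = 8   # assumed number of EMG channels

# Assumed filter design, for illustration only.
b, a = butter(4, 20.0, btype="highpass", fs=FS)


class StreamingFilter:
    """Apply an IIR filter chunk by chunk, preserving its internal state."""

    def __init__(self, b, a, n_channels):
        self.b, self.a = b, a
        # One state vector per channel, starting from rest (all zeros).
        self.zi = np.zeros((max(len(a), len(b)) - 1, n_channels))

    def __call__(self, chunk):
        """Filter a (n_samples, n_channels) chunk and update the state."""
        out, self.zi = lfilter(self.b, self.a, chunk, axis=0, zi=self.zi)
        return out


# Filtering in chunks with carried-over state matches filtering in one shot.
rng = np.random.default_rng(0)
signal = rng.standard_normal((300, N_CHANNELS))
stream_filter = StreamingFilter(b, a, N_CHANNELS)
chunked = np.concatenate(
    [stream_filter(signal[i:i + 75]) for i in range(0, 300, 75)]
)
print(np.allclose(chunked, lfilter(b, a, signal, axis=0)))
```

If `zi` were reset to zeros at every call, each chunk would be filtered as if the signal started from rest, which is what produces the transients mentioned above.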

The `run()` method of the consumer thread initially creates two empty buffers, which store one window of data and `num_windows` DNN predictions, respectively. A first loop (lines 196-198) fills the input buffer by loading `n_chunks` chunks of data coming from the `DataReader`. Once the buffer is full, the main loop starts, which iteratively:
- Invokes the DNN to classify the current window.
- Stores the prediction (i.e., the last-layer logits) in the `predictions` array, replacing the oldest entry.
- Obtains the final prediction as the mean of the `num_windows` outputs present in the array, and uses those values to extract the predicted class (with an argmax operation).
- Outputs the predicted class to the `pred_queue` FIFO.
- Gets a new chunk of data from the `DataReader` and uses it to replace the oldest chunk in the buffer.
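The buffer handling in the loop above can be sketched with NumPy as follows. The names `update_window` and `LogitAverager` are hypothetical helpers introduced for this example, and the DNN itself is omitted: the logits are simply passed in as arrays.

```python
import numpy as np


def update_window(window, chunk):
    """Drop the oldest len(chunk) samples and append the new chunk."""
    return np.concatenate([window[len(chunk):], chunk], axis=0)


class LogitAverager:
    """Keep the last num_windows logit vectors and classify on their mean."""

    def __init__(self, num_windows, num_classes):
        # Entries start at zero, so the first few averages are diluted
        # until the buffer has been filled once.
        self.predictions = np.zeros((num_windows, num_classes))
        self.index = 0  # position of the oldest entry

    def update(self, logits):
        """Replace the oldest entry and return the class of the mean logits."""
        self.predictions[self.index] = logits
        self.index = (self.index + 1) % len(self.predictions)
        return int(np.argmax(self.predictions.mean(axis=0)))


# Usage example with made-up logits for a 3-class problem.
averager = LogitAverager(num_windows=2, num_classes=3)
labels = [
    averager.update(np.array(logits, dtype=float))
    for logits in ([0, 1, 0], [3, 0, 0], [0, 0, 1], [0, 0, 5])
]
print(labels)
```

In this example, the third prediction still returns class 0 even though the newest logits favor class 2, because the large logit from the previous window dominates the mean.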

<div class="alert alert-block alert-info">
<b>Question:</b> Given what you have learned about the architecture of GAP9, how would you organize the "main" inference code for that platform? How would you schedule the collection of data and the inference execution?
</div>

# Part 2: Running the Inference Script

Having understood how the script works, you are now ready to run it. 

Here are the steps:
- Mount the electrodes on your arm, as done during Hands-on #2. (**IMPORTANT**: try to position the electrodes in the same way, or the accuracy will drop significantly.)
- Start the acquisition GUI in `./biowolf_gui` by running `source run_app.sh`, and set it up to forward the data via UDP. Then start the streaming process. See the Hands-on #2 notebook for a detailed description.
- Launch the inference script, as indicated below.

The script can be triggered with the following command:

`python3 hands_on_5.py --model_path /scratch/$USER/efcl-school-t3/<PATH_TO_SCRIPTED_MODEL> --rescaling_path /scratch/$USER/efcl-school-t3/<PATH_TO_RESCALING_JSON>`

where the two paths are respectively:
- The one of the fine-tuned floating point model saved during Hands-on #3
- The one of the JSON file containing normalization min/max values, also saved during Hands-on #3.

If, for some reason, you were not able to generate those files, you can find pre-cooked versions in the `./checkpoints` directory. However, *those pre-cooked checkpoints were trained on someone else's data*, so they won't work well on you. If you are in this situation, please contact the instructors, who will provide you with an alternative way to test the script with the pre-cooked checkpoints (although less fun than testing on your own data in real time! &#x1F641;).

<div class="alert alert-block alert-info">
<b>Question:</b> Is real-time inference equally accurate on all gestures? If not, are misclassified gestures "reasonable"?
</div>


# Part 3: Playing with the Inference Settings

Once you have tested the inference script at least once, you can play with the `--num_windows` and `--window_stride` parameters to see if you can obtain more robust performance. Of course, the evaluation will only be qualitative.

<div class="alert alert-block alert-info">
<b>Question:</b> What are the trade-offs caused by changing the number of windows for prediction averaging and the stride between consecutive windows? What's the best setup in your opinion?
</div>



# Extra

If you managed to finish the basic tests early, here are some additional things you can think about or try in order to improve the results:
- Are there alternatives to averaging consecutive predictions? For instance, another option we considered was to perform individual classifications and then apply majority voting directly on the labels. Do you expect this to perform better or worse than averaging logits? If you want, try to modify the code to implement it. It shouldn't require too many changes.
- Do you see other ways to improve the robustness of the real-time inference?
- What would be the pros and cons of collecting a small amount of new data with the GUI and fine-tuning the model for a few epochs? Would that be a practically feasible option for a "product"?
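As a possible starting point for the majority-voting idea in the first bullet, here is a minimal standalone sketch. `MajorityVoter` is a hypothetical helper, not part of `hands_on_5.py`, and the tie-breaking behaviour shown here (the label that entered the buffer first wins) is just one possible choice.

```python
from collections import Counter, deque


class MajorityVoter:
    """Return the most frequent label among the last num_windows predictions."""

    def __init__(self, num_windows):
        # A deque with maxlen discards the oldest label automatically.
        self.votes = deque(maxlen=num_windows)

    def update(self, label):
        """Add a new per-window label and return the current majority label."""
        self.votes.append(label)
        # On ties, most_common() favors the label encountered first.
        return Counter(self.votes).most_common(1)[0][0]


# Usage example with made-up per-window labels.
voter = MajorityVoter(num_windows=3)
for label in (1, 2, 2, 0, 0):
    print(label, "->", voter.update(label))
```

To plug this into the script, each window's label would be obtained with an argmax on its own logits before voting, instead of averaging the logits first.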