# General Introduction

This section provides you with a general introduction to the hands-on sessions.

The idea is that you will work on your own, following the instructions reported in the notebooks. The instructors will be available to answer questions, discuss issues and problems individually and/or with the rest of the class. At the beginning of each notebook, there's a small summary of its content. You should be able to complete the whole notebook in 2 hours.

Throughout the notebooks, whenever you find a blue box like this one:

<div class="alert alert-block alert-info">
<b>Question:</b> What should you do when you find a blue box?
</div>

It means that there's a question for you to think about. Instead, a yellow box like this one:

<div class="alert alert-block alert-warning">
<b>Task:</b> Complete the following cell by adding X lines of code.
</div>

It means that there's some code for you to write. Finally, only in this notebook, you can also find a green box like this one:

<div class="alert alert-block alert-success">
<b>Code block:</b> Check the following code in the codebase.
</div>

It means that you have to check the codebase to understand what is happening in the functions.

# Hands-on #4: DNN Compilation and Deployment

In this notebook, you will:
1. Take the quantized model exported to ONNX in Hands-on #3, import it into the MATCH compiler, visualize it, and compare it with the ONNX graph.
2. Transform the graph with MATCH Compiler and generate the final C code.
3. Test the GAP_SDK and run a hello-world on the virtual simulator.
4. Test the connection to the GAP9 board.
5. Test the network generated on the board by sending data from your collected dataset and reading back the result.

# Setup

1. Define paths for match_output, network, and dataset.  
2. Specify the paths of the GAP_SDK to be used on GAP.  
3. Specify the paths of MATCH and TVM tools.

In [1]:
import pathlib
import os
import sys

#compilation tools
# set the paths to correctly set up match
username = os.environ.get('USER')
summerschool_path = f'/scratch/{username}/efcl-school-t3/' # ADAPT IF NECESSARY

MATCH_OUTPUT = pathlib.Path(f'/scratch/{username}/match_gap9/match_gap/match_output')
NETWORK_PATH = summerschool_path + "checkpoints/hands_on_3/GraphModule.onnx"
DATASET_DIR = pathlib.Path(summerschool_path + "experiments/hands_on_3/preprocessed_data")
os.environ["GAP_SDK"]=f'/scratch/{username}/gap_sdk_private/'

os.environ["MATCH_PATH"]=f'/scratch/{username}/match'
os.environ["TVM_HOME"]=os.environ["MATCH_PATH"]+"/match-tvm"
sys.path.insert(0,f"{os.environ.get('TVM_HOME')}/python")
sys.path.insert(0,f"{os.environ.get('MATCH_PATH')}/zigzag")
print(os.environ.get("TVM_HOME"))

First, we have to load Python packages for the different tasks:
1. **torch** and **matplotlib**: needed to load data for the final application and visualize them;
2. our compilation tools: **match** and **match_gap**. To load MATCH, you set up above the correct paths for MATCH, TVM, and ZigZag;
3. library for UART communication for the final application: **serial**;
4. library for ONNX import and display: **onnx** and **netron**;
5. ipython widgets and HTML packages: **ipywidgets** and **HTML**;
6. **utils.deployment**: custom functions for GAP9 deployment;

In [2]:
import numpy as np
import json

import torch
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

import match
import match_gap

# needed to support UART Communication
import serial

#import onnx graph and display it
import onnx
import netron

#import widgets and display libraries
import ipywidgets as widgets
from IPython.display import HTML

#deployment functions
import utils.deployment as utils

# Step 1: Import the graph in MATCH

During the first hands-on step, we'll take our generated network from Hands-on #3.  
This network has been quantized and exported with the PLiNIO tool using the MPS algorithm. Note that the exporter is specifically designed to create an ONNX file in a format that can be read by the compilation tool, MATCH.  
First, we import the network. Then, as its first step, MATCH transforms the ONNX graph into a Relay graph, which uses Relay IR to identify each network operator.

<div class="alert alert-block alert-info">
<b>Task:</b> Check the format of the <b>get_relay_network</b> function in match/match/run.py. Note that the params_filename should be empty when using the onnx format.
</div>



In [3]:
# look at the relay network by using the get_relay_network. 
relay_mod, _ = match.get_relay_network(filename=NETWORK_PATH)

Compare the ONNX graph and the RELAY graph.

<div class="alert alert-block alert-info">
<b>Question:</b> Do you see any difference between the graphs? 
</div>

In [4]:
# Load your ONNX model
onnx_model = onnx.load(NETWORK_PATH)

# Save the ONNX model locally if it's not already saved
onnx.save(onnx_model, NETWORK_PATH)

# Serve the model using Netron
netron.start(NETWORK_PATH, browse=False)
netron_url = "http://localhost:5997" # ADAPT IF NECESSARY

# Assuming relay_mod is already defined in your environment
relay_mod_str = str(relay_mod)  # Convert relay_mod to a string representation

# Create the HTML content
html_content = f'''
<div style="display: flex;">
    <div style="flex: 1; padding: 10px;">
        <h3>ONNX Model Graph</h3>
        <iframe src="{netron_url}" width="600" height="5000"></iframe>
    </div>
    <div style="flex: 1; padding: 10px;">
        <h3>Relay Module</h3>
        <pre>{relay_mod_str}</pre>
    </div>
</div>
'''

# Display the HTML content
display(HTML(html_content))

# Step 2: Partition the graph and match with HW primitives

In Step 2, we will see how the input network is transformed in MATCH, using HW-specific features and exploiting HW-specific kernels and models.  
The first thing to define is the HW target: in our case, we will use the GAP9 SoC included in your GAP9 Evaluation board.

<br>
<center><img src="./assets/GAP9_blockdiagram.png" alt="setup" class="bg-primary" width="800px"></center>
<center> Fig.1: GAP9 block diagram. </center>
<br>

The GAP9 SoC has a main core (*Fabric Controller*) to manage I/O and 9 cores (*Compute Cluster*) for workload acceleration. All 10 cores are based on the RISC-V Instruction Set Architecture, extended with custom instructions that are automatically used by the GAP toolchain. The Cluster is well suited to combinations of neural network and digital signal processing tasks, delivering programmable compute power at high energy efficiency. In terms of memory, GAP9 features a 2-level internal hierarchy, with a larger (2 MB) but slower L2 memory, and a faster but smaller 128 kB L1 memory, accessible from the Cluster cores with single-cycle latency. The L1 is the memory to use to maximize the efficiency of your neural network.  
GAP9 also features the NE16 accelerator for CNNs that however **will not be used in this hands-on**.

When setting the target in MATCH, you have to exclude the NE16 HW accelerator from the modules included in the deployment.

In [5]:
#define GAP9 HW Target inside match
target=match.target.Gap9()

#disable the utilization of the NE16 in the GAP9 target
target.disabled_exec_modules=[]
target.disable_exec_module("NE16")

In the following, we will parse the Onnx input network with the function *match.match()* to generate the compiled application.  
Despite being a single call to a function, it is important to understand what is going on inside match.

<div class="alert alert-block alert-success">
<b>Code block:</b> In order to understand, check the following code in the codebase of MATCH at MATCH_PATH.<br>
    1. <b>match</b> function in match/match/run.py;  <br>
    2. <b>driver</b>, tvm_compile function in match/match/driver/driver.py;<br>
    3. <b>tvmc_compile_and_unpack</b> match/match/relay/utils/utils.py;<br>
</div>

Some comments on these functions:  
1. **match** has a series of default operations for TVM and the driver function needed to create the output C code; the *match* function generates the layer functions, which are then called inside the main. You can check them in *MATCH_OUTPUT/codegen/host/src/*.
2. **driver** function. This is the entry point of the compilation process. Inside the function, you specify the input, the output directory, and the target platform for which you want to generate the code. **Note that this is crucial, given that MANY optimization steps depend on the target HW**. Its arguments are:
    - *relay_mod*: relay graph obtained in *get_relay_from* function;
    - *relay_params*: needed only if not giving an onnx as an input;
    - *target*: platform for which you want to generate the code;
    - *output_path*: directory where the code is saved;
3. **tvmc_compile_and_unpack** function: it calls the final tvm compilation
4. the **compile_model** function is the TVM-called function that does the job. In particular, you can check the functions that do two specific passes you saw in the lecture:
    -  *apply_graph_transforms*
    -  *partition_function*
   
<div class="alert alert-block alert-success">
<b>Code block:</b> Check carefully <b>compile_model</b> at MATCH_PATH/match-tvm/python/tvm/driver/tvmc/compiler.py
</div>

Inside the function, you will see that most of the variables used depend on the HW target.

#### match.match function
Reading the *compile_model* function, you may have explored the partitioning function.  
To create the final code for a target HW, MATCH performs the following operations:  
1. *pattern matching*: it checks the patterns in the relay graph and compares them with a series of patterns saved for each HW target. MATCH saves for each new HW target a list of patterns that have to be compared with the relay graph; Check the patterns for GAP9 at **MATCH_PATH/match/target/gap9/cluster/partitioning_patterns.py**
2. *graph_transformation*: after matching the pattern, the relay graph is modified, and the corresponding patterns are mapped to the target HW;
3. *BYOC offloading*: for each block, the mapping is generated with ZigZag. Note that in the output below, you will read an estimated latency and energy: these estimations are used by ZigZag to split and tile the layer loops optimally.


The output in the next cell will show the mapping of each node.
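
To make the pattern-matching and partitioning steps above more concrete, here is a toy sketch in plain Python. It is not how MATCH implements them: the real patterns are TVM Relay dataflow patterns applied to a graph, while this sketch works on a flat list of operator names. The operator sequences, the `dense_out` pattern, and the `partition` helper are all hypothetical and only serve the illustration.

```python
# Toy illustration only: operator names and pattern contents are hypothetical.
PATTERNS = {
    "conv2d_bnorm_requant": ["conv2d", "multiply", "add", "right_shift", "clip", "cast"],
    "dense_out": ["dense", "add"],
}


def partition(ops, patterns):
    """Greedily cover `ops` with the longest pattern that matches at each position."""
    # Try longer patterns first so that a short pattern cannot shadow a long one.
    ordered = sorted(patterns.items(), key=lambda kv: len(kv[1]), reverse=True)
    result, i = [], 0
    while i < len(ops):
        for name, seq in ordered:
            if ops[i:i + len(seq)] == seq:
                result.append((name, ops[i:i + len(seq)]))  # block mapped to the HW pattern
                i += len(seq)
                break
        else:
            result.append(("unmatched", [ops[i]]))  # no pattern matched this operator
            i += 1
    return result


network = ["conv2d", "multiply", "add", "right_shift", "clip", "cast",
           "reshape", "dense", "add"]
blocks = partition(network, PATTERNS)
for name, ops in blocks:
    print(f"{name:22s} <- {ops}")
```

In this toy network, the first six operators collapse into a single block, the `reshape` stays unmatched, and the last two operators form a second block.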

## BYOC and ZigZag
As you saw during the lesson, MATCH uses the ZigZag tool as BYOC add-on to TVM: https://github.com/KULeuven-MICAS/zigzag  
ZigZag is a tool from KU Leuven University that gives us the possibility to perform two main operations:
1. *Spatial Tiling*: distribute operations among different processing elements (PE) in our architecture;
2. *Temporal Tiling*: distribute operations over time;

Let's have a simple example: consider a systolic array with dimension 4x4, which parallelizes on output channel (K) and one of the output dimensions (OX)
<br>
<center><img src="./assets/Systolic.png" alt="setup" class="bg-primary" width="600px"></center>
<center> Fig.2: Systolic array of dimension 4x4. </center>
<br>

In this case, the spatial tiling will be K=4, OX=4. In other words, it means that 4 output channels and 4 output positions will be processed in parallel by our HW.
On top of the spatial tiling, all the other loops will be temporally tiled: at every new time step, a new tile will be executed.
Let's consider a simple convolutional layer with input 16x16 (IX = IY = OX = OY), C=8, K=16, and Fx=Fy=3:

<br>
<center><img src="./assets/tiling.png" alt="setup" class="bg-primary" width="600px"></center>
<center> Fig.3: Tiling split of ZigZag. </center>
<br>

As the figure shows, the K and OX loops are split to be temporally and spatially tiled.  
Note that besides the tiling, ZigZag also performs loop re-ordering and memory allocation. If you are interested in all the details about ZigZag, you can read the paper from KU Leuven:  
https://arxiv.org/abs/2007.11360
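
The split between spatial and temporal loops for the example layer can be sketched as a plain Python loop nest. This is only an illustration under the numbers given above (K=16, C=8, 16x16 output, 3x3 filter, 4x4 array on K and OX). The order of the temporal loops is chosen arbitrarily here, whereas ZigZag searches over it.

```python
# Layer from the example above: 16x16 output, C=8, K=16, 3x3 filter.
K, C, OY, OX, FY, FX = 16, 8, 16, 16, 3, 3
# Spatial unrolling of the 4x4 array: 4 output channels x 4 output columns.
K_SPATIAL, OX_SPATIAL = 4, 4

temporal_steps = 0
macs = 0
# Temporal loops: each innermost iteration is one time step of the array.
for k_t in range(K // K_SPATIAL):
    for oy in range(OY):
        for ox_t in range(OX // OX_SPATIAL):
            for c in range(C):
                for fy in range(FY):
                    for fx in range(FX):
                        temporal_steps += 1
                        # Spatial loops: on the real HW these 16 MACs run in parallel.
                        for k_s in range(K_SPATIAL):
                            for ox_s in range(OX_SPATIAL):
                                macs += 1

print(temporal_steps)  # 18432 time steps
print(macs)            # 294912 multiply-accumulates in total
```

The total number of MACs equals K·C·OY·OX·FY·FX, and the 4x4 array reduces the number of time steps by a factor of 16.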

Now, moving to our GAP9 architecture, we do not have a "fixed" systolic array, but 8 general purpose cores. We use the PULP-NN library (https://github.com/pulp-platform/pulp-nn) as convolutional backend. This library does not have spatial tiling constraints but is optimized to execute with OY = 8, OX = 2, K = 4. Therefore, while we never force a specific spatial tiling, we always try to stay as close to these values as possible.

### Pattern Matching & Partitioned Graph

Final note before looking at the actual output of the function. MATCH prints the expected latency and energy for the final configuration of every pattern matched (*do you remember gap9.cluster.conv2d_bnorm_requant from the partitioning_patterns.py file?*). Note that to choose the final tiling, loop order, and loop partitioning, ZigZag uses internal HW models that compute energy and latency based on each different configuration. After a model-based search, the final hyperparameters are chosen. The ones that you see as output of the function, are only the best latency and energy values found for each layer.

In the res.mod variable, you can also read each specific node of your network processed by MATCH.

In the first part, you can see the main structure with the call to the different individual layers of the network.
As you may notice, most of the layers are identified with the acronym tvmgen to specify they have been processed by the BYOC section of MATCH.

In the second part, you can look at which individual functions are called inside each *pattern* recognized by the BYOC.

In [6]:
res=match.match(filename=NETWORK_PATH,target=target,output_path=MATCH_OUTPUT)
print(res.mod)

### MATCH Tiling and solution: ZigZag output

After the pattern recognition, each individual recognized pattern is processed by ZigZag as detailed above.

ZigZag not only spatially and temporally tiles the layer, but also assigns a memory level to each different tensor at each loop.

In the following code, you can check how MATCH temporally tiles the layer and which memory locations are assigned to the different tensors.

**You will probably notice that all tensors are allocated in L1 (the memory level closest to the cores) and never in L2 (the main memory). Therefore, we will not need to tile any layer to reduce its memory footprint, since each layer, individually, already fits in L1.**
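
A quick back-of-the-envelope check shows why a small layer fits in L1 without tiling. The sketch below assumes 8-bit tensors and counts only input, weights, and output. It ignores biases, temporary buffers, and double buffering, so treat it as a lower bound. The `conv_footprint_bytes` helper is hypothetical, and the layer is the example from the ZigZag section, not a layer of your network.

```python
L1_BYTES = 128 * 1024  # L1 size quoted earlier for GAP9


def conv_footprint_bytes(c, k, iy, ix, oy, ox, fy, fx, bytes_per_elem=1):
    """Bytes needed to hold input, weights and output of one conv layer at once."""
    inputs = c * iy * ix
    weights = k * c * fy * fx
    outputs = k * oy * ox
    return (inputs + weights + outputs) * bytes_per_elem


# Example layer from the ZigZag section: C=8, K=16, 16x16 feature maps, 3x3 filter.
need = conv_footprint_bytes(c=8, k=16, iy=16, ix=16, oy=16, ox=16, fy=3, fx=3)
print(need)              # 7296
print(need <= L1_BYTES)  # True
```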

In [7]:
!cat {MATCH_OUTPUT}/match_schedules.log

### MATCH (TVM) Memory Plan

A final check you may want to do is the L2 memory allocation (think of the algorithm you saw during the lesson).  
For this operation, you will extract data from the TVM outputs, using functions defined inside *utils.deployment*.  
1. With the first function, you define the anchors, i.e., the starting address of each tensor;
2. With the second function, you define the size of each tensor.

In [8]:
sids=dict()
sids = utils.define_memory_anchors(sids, match_output_path = f'{MATCH_OUTPUT}')

for key in sorted(sids.keys(), key=int):
    print(f"{sids[key]}\n")

In [9]:
sids = utils.annotate_memory_size(res.mod["main"].body, sids)

for key in sorted(sids.keys(), key=int):
    print(f"{sids[key]}\n")

In [10]:
import matplotlib.pyplot as plt

# Define a list of colors for the bars
COLORS = ["maroon", "orange", "cyan", "cadetblue", "goldenrod", "rebeccapurple", "darkmagenta",
          "khaki", "darkseagreen", "steelblue", "firebrick", "green", "royalblue", "limegreen",
          "olive", "indianred", "dimgray", "black"]

# Create a new figure for the plot
plt.figure(figsize = (12,4))

# Iterate over the values in the 'sids' dictionary
for idx, sid in enumerate(sids.values()):
    # Plot a bar for each entry in 'sids'
    # x position is the midpoint of the interval [first, last]
    # width is the length of the interval plus 1 (to include the last time step)
    # bottom is the workspace offset
    # height is the size of the tensor
    # color is chosen from the COLORS list, cycling through if there are more bars than colors
    # edgecolor is set to black to color the borders of the bars
    plt.bar(x=sid["first"] + (sid["last"] + 1 - sid["first"]) / 2, width=(sid["last"] + 1 - sid["first"]),
            bottom=sid["workspace_offset"], height=sid["size"],
            color=COLORS[idx % len(COLORS)], edgecolor='black')

# Add a vertical line at x=0 for reference
plt.axvline(x=0, color='k', linewidth=1.5)

# Label the x and y axes
plt.xlabel("Time (1 layer at each step)")
plt.ylabel("Buffer")

# Set the title of the plot
plt.title("Memory allocation of intermediate tensors")

# Enable grid lines for both major and minor ticks
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Set x-axis ticks to be at every integer within the range of the data
plt.xticks(range(int(min(sid["first"] for sid in sids.values())), 
                 int(max(sid["last"] for sid in sids.values())) + 1))

# Display the plot
plt.show()
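
Besides plotting, the same four fields (`first`, `last`, `workspace_offset`, `size`) are enough to compute the peak of the memory plan and to check that no two buffers collide. The sketch below uses a small made-up dictionary in place of the real `sids`. The values and the two helper functions are hypothetical, and `last` is treated as inclusive, as in the plot.

```python
# Hypothetical entries, using only the four fields read by the plot above.
example_sids = {
    "0": {"first": 0, "last": 1, "workspace_offset": 0, "size": 4096},
    "1": {"first": 1, "last": 2, "workspace_offset": 4096, "size": 2048},
    "2": {"first": 2, "last": 3, "workspace_offset": 0, "size": 1024},
}


def peak_workspace(sids):
    """Highest address reached by any buffer (offset + size)."""
    return max(s["workspace_offset"] + s["size"] for s in sids.values())


def overlaps(a, b):
    """True if two buffers are alive at the same step and share addresses."""
    alive_together = a["first"] <= b["last"] and b["first"] <= a["last"]
    a_end = a["workspace_offset"] + a["size"]
    b_end = b["workspace_offset"] + b["size"]
    share_addresses = a["workspace_offset"] < b_end and b["workspace_offset"] < a_end
    return alive_together and share_addresses


items = list(example_sids.values())
conflicts = [(i, j) for i in range(len(items)) for j in range(i + 1, len(items))
             if overlaps(items[i], items[j])]
print(peak_workspace(example_sids))  # 6144
print(conflicts)                     # []
```

In the made-up plan, buffer 2 reuses the addresses of buffer 0 because their lifetimes do not intersect, which is why the peak is lower than the sum of all sizes.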


<div class="alert alert-block alert-info">
<b>Question:</b> Look at the graph above. Analyze the memory allocation considering the algorithms you saw during the lesson. Which algorithm does it seem to use?
</div>

# Step 3: Load and test the GAP9 SDK

For Deployment, we need two main components:  
1. **C Code** of the input deep neural network;
2. A working **Software Development Kit (SDK)**: the SDK is provided by the GAP9 vendor. For you, it is already pre-installed in the machine;

If you want to navigate the SDK, check it on the website: *https://github.com/GreenWaves-Technologies/gap_sdk*
Note that this is not the version you are using, since it is still restricted.

First of all, we want to load the correct SDK and try the board on a simple ***hello world*** application.
To do so, we need to execute the following steps:
1. add the GAP_SDK to the user’s OS environmental variables (we did it in the second cell of the notebook already);
2. navigate inside the folder;
3. source the correct board with the *sourceme.sh* file
4. Execute a simple example on the GAP9 online simulator, GVSoC;

Note that at first we don't use the board, but the virtual simulator. To do so, you have to select the GVSoC platform when configuring the build (in the cell below, through the CMake option *-DCONFIG_PLATFORM_GVSOC=YES*).

## GVSoC
A few words about the simulator. The GreenWaves GAP GVSoC is a versatile simulation framework tailored for GreenWaves Technologies' GAP series of processors. It provides cycle-accurate modeling of the GAP8 and GAP9 SoCs, enabling detailed performance analysis and verification. You can use GVSoC to efficiently debug, optimize, and validate software and hardware components before deployment. Its integration with development tools and open-source nature enhances the design and testing process in embedded systems.

In [11]:
!cd $GAP_SDK && source sourceme.sh 1 <<< 1 && pwd && cd examples/gap9/basic/helloworld && cmake -B build && cmake ../helloworld/build -DCONFIG_PLATFORM_BOARD=NO -DCONFIG_PLATFORM_GVSOC=YES && cmake --build build --target run

# Step 4: Test the connection to the board and the network loading

## MATCH Network
Now let's first have a look at the files generated by MATCH:
1. in the *match_output/src/* folder there is the main.c file and all the support files needed for GAP9.
2. in the *match_output/codegen/host/src/* folder there is all the code generated by MATCH: in *default_lib0.c* and *default_lib1.c* the full network is reported. From *default_lib2.c*, each file contains the code generated for each layer. 

<div class="alert alert-block alert-info">
<b>Question:</b> Which functions are called inside the layer? Which are the dimensions of the layer? Check in the onnx if they correspond.
</div>

<div class="alert alert-block alert-success">
<b>Code block:</b> Check the code of the <b>default_lib2.c</b> file, understand which layer it corresponds to, and answer the questions.
</div>

In [12]:
!cat {MATCH_OUTPUT}/codegen/host/src/default_lib2.c

<div class="alert alert-block alert-success">
<b>Code block:</b> Check the code of the <b>default_lib0.c</b> and <b>default_lib1.c</b> files and understand the network execution in C. It is a bit more complicated than net.eval() &#128513;.
</div>

Then, let's test our network on the data you previously collected in hands-on 3.
Despite the data having already been pre-processed and normalized, we still need to integerize them before feeding into the network (you saw in the previous hands-on that this step is automatically done in the MPS class in PLiNIO using an "Input Quantizer"):

$$X_{int} = \left\lfloor \frac{255}{\alpha} \cdot \min(X_{float}, \alpha) \right\rfloor $$
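
As a small numeric illustration of the formula above, the sketch below applies it to a handful of made-up values with a hypothetical clip value of 2.0. It follows the formula literally and uses NumPy. The function used later in the notebook is slightly different: it clamps negative values to 0 and adds a small epsilon to the clip value when computing the scale factor.

```python
import numpy as np


def integerize(x_float, alpha):
    """Apply X_int = floor(255 / alpha * min(X_float, alpha)) element-wise."""
    return np.floor(255.0 / alpha * np.minimum(x_float, alpha)).astype(np.int64)


x = np.array([0.0, 0.5, 1.0, 2.0, 3.0])  # made-up samples
x_int = integerize(x, alpha=2.0)         # hypothetical clip value
print(x_int.tolist())  # [0, 63, 127, 255, 255]
```

Values above the clip value saturate at 255, and all others are mapped onto the 8-bit range.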

To perform the data integerization, we load all the data you saved in the previous hands-on.  
**Note that these data are patient-specific!! You can not take them from your colleagues**


In [13]:
data = np.load(DATASET_DIR / "train_data.npy")
labels = np.load(DATASET_DIR / "train_labels.npy")

# Open and load the scaling value
with open(summerschool_path+'checkpoints/hands_on_3/rescaling_values.json', 'r') as file:
    scaling_values = json.load(file)

# Fill this value with the ones found in the hands-on 3
clip_value = scaling_values['clip_val']
scale_factor = (2**8 - 1) / (clip_value + 1e-3)

# Index of the data to plot
i = 320
print(labels[i])

<div class="alert alert-block alert-info">
<b>Task:</b> Look at the function that integerizes the data and at the following graph. Are the integerized data identical to the floating-point ones? What is the theoretical error between the two?
</div>


In [14]:
def integerize_data(data, clip_value, scale_factor):
    data = torch.clamp(torch.tensor(data), 0, clip_value)
    data = torch.floor(scale_factor * data)
    return data


Next, we need to verify that the integerization of the input data works correctly.
Therefore, we plot the original and the integerized version of the data, before feeding them to the GAP9 network.

## Data Format

Normally, in torch, networks receive the data in the CHW format, which means that 2 consecutive data belong to the same channel in successive spatial positions. On the other hand, GAP9 needs data to be ordered in the HWC format.  
**Check your data format!**
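
The difference between the two layouts is easiest to see on a tiny tensor. The sketch below uses a made-up array with 2 channels, height 1, and width 3, and shows how the flattened order changes when the channel axis is moved last. Whether your own data needs this transposition depends on how it was saved.

```python
import numpy as np

C, H, W = 2, 1, 3
chw = np.arange(C * H * W).reshape(C, H, W)  # channel 0 holds 0,1,2; channel 1 holds 3,4,5
hwc = chw.transpose(1, 2, 0)                 # move the channel axis to the last position

print(chw.flatten().tolist())  # [0, 1, 2, 3, 4, 5] -> consecutive values share a channel
print(hwc.flatten().tolist())  # [0, 3, 1, 4, 2, 5] -> consecutive values share a position
```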

In [15]:
# Assume data, clip_value, scale_factor, and integerize_data are already defined

data_integer = integerize_data(data[i], clip_value, scale_factor)

# Define the zoom range
zoom_start = 165
zoom_end = 170

# Create a figure and set of subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

# Plot data_integer in the first subplot
ax1.plot(data_integer, label='Integerized Data')
ax1.set_title('Integerized Data')
ax1.set_xlabel('time [samples]')
ax1.set_ylabel('Value')
ax1.grid(True)

# Plot data[i] in the second subplot
ax2.plot(data[i], label='Original Data')
ax2.set_title('Original Data')
ax2.set_xlabel('time [samples]')
ax2.set_ylabel('Value')
ax2.grid(True)

# Add insets for zoomed-in plots
axins1 = inset_axes(ax1, width="40%", height="40%", loc='upper right')
axins2 = inset_axes(ax2, width="40%", height="40%", loc='upper right')

# Plot zoomed-in data_integer
axins1.plot(data_integer)
axins1.set_xlim(zoom_start, zoom_end)
axins1.set_xticklabels([])  # Remove x-axis ticks
axins1.set_yticklabels([])  # Remove y-axis ticks
axins1.grid(True)

# Plot zoomed-in data[i]
axins2.plot(data[i])
axins2.set_xlim(zoom_start, zoom_end)
axins2.set_xticklabels([])  # Remove x-axis ticks
axins2.set_yticklabels([])  # Remove y-axis ticks
axins2.grid(True)

# Highlight the zoomed area on the main plots
ax1.axvline(x=zoom_start, color='gray', linestyle='--')
ax1.axvline(x=zoom_end, color='gray', linestyle='--')
ax2.axvline(x=zoom_start, color='gray', linestyle='--')
ax2.axvline(x=zoom_end, color='gray', linestyle='--')

# Adjust layout
plt.tight_layout()

# Show the plots
plt.show()

Now, let's verify the deployment of the network on the board works.  
In order:
1. Load an input of your collected data, with the corresponding label; **Pay attention to the format**. The data type should be integer, HWC format and flattened.
2. Use the match_gap command **match_gap.network_at** to deploy and execute the network on the board;
3. Compare the output with the golden label

**network_at** is a function from *match_gap*. You can check it at *match_gap9/match_gap/match_gap.py*.
This function builds a network that MATCH has compiled to C code. It performs the following steps:
1. Copies the libraries (back-end kernels that provide efficient computation, such as pulp-nn, helpers for memory movement, etc.);
2. Creates the *Makefile* and the *main.c* for the application. The main.c simply reads an input from a memory location and runs the layers created by MATCH sequentially.
  
As parameters, you have to provide the network path, the input, the SDK path, and the *board* parameter: True if you want to run on the board, False if you want to run on GVSoC.

In [16]:
input_data = integerize_data(data[i], clip_value, scale_factor).flatten().numpy().astype(int)

network_res=match_gap.network_at(match_res=res,network_path=str(pathlib.Path(f'{MATCH_OUTPUT}')),
                                 inputs=[{"name":res.match_inputs[idx]["name"],
                                          "size":res.match_inputs[idx]["size"],
                                          "values":input_data}
                                          for idx in range(len(res.match_inputs))],
                                 board = False,
                                 gap_sdk_path=os.environ.get("GAP_SDK"))
network_res

# Step 5: Create a small application that send data from the collected dataset to the board and classify them

Finally, let's create a small application which simulates the inference of a complete system. Your notebook will act as the sensing part of the board:
1. Send the "collected" data to the GAP9 board. Instead of directly collecting the data, we will use the data that you collected in the previous Hands-on #3. We will iterate over the dataset and send one sample at a time to the board;
2. Connect to the board through UART to receive the data: to do so, we'll use a class called **MatchUART**. It contains 4 functions:
    - *open_connection* to open the connection on the correct serial port;
    - *infer* that will first send a *continue code* to the board, then it sends the real data, and finally it waits to receive back the network output;
    - *close_connection* which will simply send a *stop code* to the board;
    - *wait_board*, that is used to synchronize with the board;
3. Execute the network on board as soon as a continue code and the data are received.

In [17]:
from utils.deployment import MatchUART

To create an object from this class, we'll need the correct USB connection.
You can retrieve it by simply connecting the board and running the command **ll /dev/ttyUSB\*** before and after connecting the board to get which 2 USB connections were assigned to the board.
Note that the board generates 2 different USB connections, one which leads to the JTAG, to load and run the binary and the other one which is instead the UART.
The UART can be recognized because it will be assigned to the USB with the highest number out of the 2 board USB connections.
If more than 2 USB connections are generated, ttyUSB2 is usually the UART connection, but we encourage you to test and discover the correct one.
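
The before/after comparison can also be done from Python with the standard library. The sketch below is only a convenience under the assumption that the board shows up as `/dev/ttyUSB*` devices. The helper names are hypothetical, and the device names in the offline illustration are made up.

```python
import glob


def list_usb_serial(pattern="/dev/ttyUSB*"):
    """Return the set of device nodes currently matching `pattern`."""
    return set(glob.glob(pattern))


def new_ports(before, after):
    """Ports present after plugging in the board but not before, sorted by index."""
    # Sort on the trailing number so that ttyUSB10 comes after ttyUSB2.
    return sorted(after - before, key=lambda p: int(p.rsplit("ttyUSB", 1)[1]))


# On the lab machine you would call list_usb_serial() once with the board
# unplugged and once with it plugged in. Offline illustration with made-up names:
before = {"/dev/ttyUSB0"}
after = {"/dev/ttyUSB0", "/dev/ttyUSB1", "/dev/ttyUSB2"}
added = new_ports(before, after)
print(added)      # ['/dev/ttyUSB1', '/dev/ttyUSB2']
print(added[-1])  # /dev/ttyUSB2, the UART candidate according to the rule above
```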

<br>
<center><img src="./assets/tty.png" alt="setup" class="bg-primary" width="800px"></center>
<center> Fig.4: ttyUSB list example. </center>
<br>

In [None]:
match_uart=MatchUART("/dev/ttyUSB2",res.match_output) # ADAPT IF NECESSARY
match_uart.open_connection()

The **uart_network** function from *match_gap* is similar to **network_at**.
The only important difference is the creation of the main (look at it at *./match_output/src/main.c*).
The new main sets up the UART connection and sends the status of the UART connection.
Then, it will busy wait for the status from the client: if it is 0, it will wait for the network inputs and then run the inference.
Then, it will return through UART the output from the network.
Note that **uart_network** will run the application in the background, letting it act as a server.

In [None]:
match_gap.uart_network(match_res=res,network_path=str(pathlib.Path(f'{MATCH_OUTPUT}')),
                                   gap_sdk_path=os.environ.get("GAP_SDK"),board=True)

Finally, we write the consumer part of our application.
The application asks for confirmation before sending more data and performing inference, then compares the output with the golden label.
Any character other than c/C stops the application.

Note: If you repeat Step 5, make sure to also reset the GAP9 board. 


In [None]:
ret=match_uart.wait_board()
CLASSES = ["Rest", "Open Hand", "Fist", "Index Pointed", "Ok", "Right Flexion", "Left Flexion", "Fist", "Horns", "Shaka" ]
if ret!=0:
    print("Problem with the board")
else:
    while True:
        status=input("Do you wish to run another inference? (c/C to continue)")
        if status.capitalize()!="C":
            break
        idx_s = input("Write the number of the index of the data to infer from the dataset(max size:1200)")
        print(idx_s)
        if not idx_s.isnumeric():
            print("Not a number!")
            continue
        idx = int(idx_s)
        print(idx)
        inputs,label=integerize_data(data[idx], clip_value, scale_factor).numpy().flatten().astype(np.uint8), labels[idx]
        quantized_net_status,quantized_net_value=match_uart.infer([inputs])
        if quantized_net_status!=0:
            if quantized_net_value!=0:
                print("Error in inference")
                continue
            else:
                break
        else:
            print(f"Quantized model classified {CLASSES[np.argmax(quantized_net_value)]} (correct label is {CLASSES[label]})")
match_uart.close_connection()


# Extras
You can play with the code to produce a classification output using the post-processing:
- create a postprocessing with a majority voting on the consumer side of the application (this notebook);
- create a postprocessing with a majority voting on the server side of the application (GAP9);

### Majority voting 
Majority Voting is a technique used to improve the stability and accuracy of predictions by considering the most frequent prediction over a series of recent outputs rather than relying on a single prediction. In the context of gesture recognition, this method can smooth out temporary misclassifications due to noise or transient errors in the input data.

Implementation steps:
1. Collect N predictions: keep a sliding window of the last N predictions made by your gesture recognition model.
2. Count frequencies: count how many times each class (gesture) appears in the window.
3. Determine the majority class: the class with the highest count in this window is considered the final prediction.
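
A minimal sketch of these steps for the consumer side (this notebook) could look as follows. The `MajorityVoter` class is a hypothetical helper, not part of *utils.deployment*, and the sequence of raw predictions is made up. In the application, you would feed it the class index obtained with `np.argmax` after each inference.

```python
from collections import Counter, deque


class MajorityVoter:
    """Sliding-window majority vote over the last `window` predictions."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # old predictions drop out automatically

    def update(self, prediction):
        """Add a new prediction and return the current majority class."""
        self.history.append(prediction)
        # On ties, Counter.most_common returns the class that entered the window first.
        return Counter(self.history).most_common(1)[0][0]


voter = MajorityVoter(window=3)
raw = [2, 2, 7, 2, 2, 4, 4, 4]  # made-up class indices with one spurious "7"
smoothed = [voter.update(p) for p in raw]
print(smoothed)  # [2, 2, 2, 2, 2, 2, 4, 4]
```

The isolated misclassification is filtered out, at the cost of a delay of one prediction when the gesture really changes from 2 to 4.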