# FINN - End-to-End Flow
-----------------------------------------------------------------

In this experiment, we will show how to take a simple, binarized, fully-connected network trained on the MNIST data set all the way down to a customized bitfile running on a PYNQ board.

In this notebook, we will export the Brevitas model as an .onnx file.

## 1. Brevitas export <a id='brev_exp'></a>
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training, and the FINN Docker image comes with several [example Brevitas networks](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq). To show the FINN end-to-end flow, we'll use the TFC-w1a1 model as the example network.

First, a few things have to be imported. Then the model can be loaded with the pretrained weights.


In [6]:
import torch
import onnx


from finn.util.test import get_test_model_trained
from finn.util.visualization import showSrc, showInNetron
from brevitas.export import export_qonnx
from qonnx.util.cleanup import cleanup as qonnx_cleanup

In [3]:
build_dir = "./"

In [4]:
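# Load the TFC network with 1-bit weights and 1-bit activations, including pretrained weights
tfc = get_test_model_trained("TFC", 1, 1)
export_onnx_path = build_dir + "/tfc_w1_a1.onnx"
# Export to QONNX using a dummy MNIST-shaped input (batch, channels, height, width)
export_qonnx(tfc, torch.randn(1, 1, 28, 28), export_onnx_path); # semicolon added to suppress log
# Clean up the exported graph in place
qonnx_cleanup(export_onnx_path, out_file=export_onnx_path)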

The model has now been exported in QONNX format with the pretrained weights and saved under the name "tfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on individual nodes and view their properties.

In [7]:
showInNetron(build_dir + "/tfc_w1_a1.onnx")

Serving './/tfc_w1_a1.onnx' at http://0.0.0.0:2222
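
The doubled slash in the served path above comes from concatenating `build_dir = "./"` with a file name that starts with `/`. The path still resolves to the same file, so nothing is broken. As a small aside, and only as a sketch using the Python standard library (the variable names `concatenated`, `joined` and `normalized` are illustrative and not part of the notebook), the same path could be built without the duplicate separator:

```python
import os

build_dir = "./"

# String concatenation as used in the notebook: keeps both slashes.
concatenated = build_dir + "/tfc_w1_a1.onnx"

# os.path.join inserts a separator only where one is missing.
joined = os.path.join(build_dir, "tfc_w1_a1.onnx")

# os.path.normpath collapses redundant separators and the leading "./".
normalized = os.path.normpath(concatenated)

print(concatenated)
print(joined)
print(normalized)
```

All three strings refer to the same file relative to the current working directory.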


Now that we have the model in .onnx format, we can work with it using FINN. For that, `ModelWrapper` is used. It is a wrapper around the ONNX model that provides several helper functions to make it easier to work with the model. `ModelWrapper` is imported from the [QONNX repo](https://github.com/fastmachinelearning/qonnx), which contains a range of functionality used in FINN. Because the model was exported in QONNX format, our first step in feeding it into the FINN flow is to convert it to the FINN-ONNX format.

In [8]:
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.qonnx.convert_qonnx_to_finn import ConvertQONNXtoFINN

model = ModelWrapper(build_dir + "/tfc_w1_a1.onnx")
model = model.transform(ConvertQONNXtoFINN())

After the conversion, we save the model and visualize it using Netron. As you can see, quantization is now expressed differently: where we had Quant nodes before, there are now MultiThreshold nodes in the graph.

In [12]:
model.save(build_dir + "/tfc_w1_a1_finn.onnx")
showInNetron(build_dir + "/tfc_w1_a1_finn.onnx", port=2222)

Serving './/tfc_w1_a1_finn.onnx' at http://0.0.0.0:2222
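
To make the MultiThreshold nodes mentioned above more concrete, here is a minimal pure-Python sketch of the idea behind multi-thresholding: each input is compared against a sorted list of thresholds, the number of thresholds it reaches is counted, and the count is scaled and shifted. The function name `multithreshold` and the flat list of thresholds are simplifications chosen for illustration. The actual QONNX operator works on tensors with per-channel thresholds, so treat this as a sketch of the concept rather than the library implementation.

```python
def multithreshold(values, thresholds, out_scale=1.0, out_bias=0.0):
    """Count how many thresholds each value reaches, then scale and shift.

    Simplified illustration: one shared list of thresholds for all values.
    """
    outputs = []
    for v in values:
        steps = sum(1 for t in thresholds if v >= t)
        outputs.append(out_scale * steps + out_bias)
    return outputs


# 1-bit bipolar activation: a single threshold at 0, mapped onto {-1, +1}.
bipolar = multithreshold(
    [-0.7, 0.0, 1.3], thresholds=[0.0], out_scale=2.0, out_bias=-1.0
)

# 2-bit unsigned activation: three thresholds give the levels 0, 1, 2, 3.
two_bit = multithreshold([-0.7, 0.4, 1.3, 9.0], thresholds=[0.0, 1.0, 2.0])

print(bipolar)
print(two_bit)
```

With a single threshold and the scale and bias chosen as above, the output can only be -1 or +1, which corresponds to the 1-bit activations of a w1a1 network.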


Now the model is prepared and could be simulated using Python. How this works is described in the [Jupyter notebook about verification](tfc_end2end_verification.ipynb#simpy).

The model can now also be processed in different ways. FINN is built around **analysis and transformation passes**, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary. A transformation pass changes the model and returns the changed model back to the FINN flow.
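
The difference between the two kinds of passes can be illustrated with a small, self-contained toy. Everything in the sketch below is hypothetical: `ToyModel`, `count_ops` and `RemoveIdentity` are invented names, and the "model" is only a list of operator names. The shape of the interface is loosely modeled on the pattern described above (an analysis returns a dictionary, a transformation returns a changed model), but it should not be read as the real FINN or QONNX API.

```python
class ToyModel:
    """Hypothetical stand-in for a wrapped model: just a list of op names."""

    def __init__(self, ops):
        self.ops = list(ops)

    def analysis(self, analysis_fn):
        # An analysis pass only reads the model and returns a dictionary.
        return analysis_fn(self)

    def transform(self, transformation):
        # A transformation pass returns (model, modified).
        # It is reapplied until it reports that nothing changed.
        model, modified = self, True
        while modified:
            model, modified = transformation.apply(model)
        return model


def count_ops(model):
    """Toy analysis pass: count how often each op type occurs."""
    counts = {}
    for op in model.ops:
        counts[op] = counts.get(op, 0) + 1
    return counts


class RemoveIdentity:
    """Toy transformation pass: removes one 'Identity' op per call."""

    def apply(self, model):
        ops = list(model.ops)
        if "Identity" in ops:
            ops.remove("Identity")
            return ToyModel(ops), True
        return ToyModel(ops), False


original = ToyModel(["Identity", "MatMul", "Identity", "MultiThreshold"])
cleaned = original.transform(RemoveIdentity())

print(original.analysis(count_ops))
print(cleaned.analysis(count_ops))
```

The analysis leaves the model untouched and only reports on it, while the transformation produces a new model and leaves `original` unchanged.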

Since the goal in this notebook is to process the model to such an extent that a bitstream can be generated from it, the focus is on the transformations that are necessary for this. In the next section these are discussed in more detail.