# fpgaConvNet deployment on PYNQ

This notebook shows how to run [fpgaConvNet](https://alexmontgomerie.github.io/fpgaconvnet-website/) IPs under the PYNQ framework using the fpgaConvNet PYNQ driver. The model used here is the same as the one in the [fpgaConvNet tutorial](https://github.com/AlexMontgomerie/fpgaconvnet-tutorial/). Its IP can be generated with the [end-to-end example](https://github.com/AlexMontgomerie/fpgaconvnet-tutorial/blob/main/end-to-end-example.ipynb), and its bitstream can be generated by following Parts 1 to 4 of the [hardware tutorial](https://github.com/AlexMontgomerie/fpgaconvnet-tutorial/blob/main/hardware-tutorial.ipynb). Make sure to select a PYNQ-supported board/part during the fpgaConvNet and [SAMO](https://github.com/AlexMontgomerie/samo) workflow and when creating the Vivado project.

To load the model with the fpgaConvNet PYNQ driver, the following files are required:
- The bitstream file (in this example, [`single_layer_tutorial.bit`](single_layer_tutorial.bit))
- The hardware handoff file (in this example, [`single_layer_tutorial.hwh`](single_layer_tutorial.hwh))
- The `.json` file describing the partitions of the fpgaConvNet model (in this example, [`single_layer_tutorial.json`](single_layer_tutorial.json))

**The files (`.bit`, `.hwh` and `.json`) in this tutorial have been generated for the PYNQ-Z2 board.**
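Before creating the overlay, it can help to confirm that all three files are in place. PYNQ expects the `.hwh` file to sit in the same directory as the `.bit` file and share its base name. The sketch below checks for `<basename>.bit`, `<basename>.hwh` and `<basename>.json` in one directory; `check_deployment_files` is a hypothetical helper, not part of PYNQ or the fpgaConvNet driver.

```python
from pathlib import Path


def check_deployment_files(basename, directory="."):
    """Return {'bit': ..., 'hwh': ..., 'json': ...} paths for `basename`.

    Hypothetical helper: raises FileNotFoundError listing every missing file.
    """
    directory = Path(directory)
    paths = {ext: directory / f"{basename}.{ext}" for ext in ("bit", "hwh", "json")}
    missing = [str(path) for path in paths.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("missing deployment files: " + ", ".join(missing))
    return paths


# Usage in this tutorial (run from the notebook directory):
# check_deployment_files("single_layer_tutorial")
```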

In [1]:
from pynq import Overlay
from assets.tutorial_library import *
import numpy as np
from fpgaconvnet_pynq_driver import *
import os

## Get MNIST dataset

This cell downloads the MNIST dataset into the `assets/MNIST` folder (by running `get_mnist.sh`) if it is not already present.

In [2]:
if not os.path.isfile('assets/MNIST/t10k-images-idx3-ubyte'):
    os.system('cd assets/MNIST && ./get_mnist.sh')
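The file checked above, `t10k-images-idx3-ubyte`, uses the IDX format: a 16-byte big-endian header (magic number 2051, image count, rows, cols) followed by one unsigned byte per pixel. As a sketch of what reading it involves, the header and pixels can be decoded with `struct` and NumPy. This is not necessarily how `get_MNIST_image` in `assets/tutorial_library` does it, and `parse_idx3_images` is a hypothetical name.

```python
import struct

import numpy as np


def parse_idx3_images(data: bytes) -> np.ndarray:
    """Parse the bytes of an IDX3 image file (e.g. t10k-images-idx3-ubyte)
    into an (N, rows, cols) uint8 array."""
    # Header: four big-endian unsigned 32-bit ints: magic, count, rows, cols.
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:  # 0x00000803 = unsigned-byte data with 3 dimensions
        raise ValueError(f"not an IDX3 image file (magic number {magic})")
    # Pixels follow the header as one unsigned byte each, row-major.
    pixels = np.frombuffer(data, dtype=np.uint8, count=count * rows * cols, offset=16)
    return pixels.reshape(count, rows, cols)


# Usage on the downloaded test set (10000 images of 28x28 pixels):
# with open("assets/MNIST/t10k-images-idx3-ubyte", "rb") as f:
#     images = parse_idx3_images(f.read())
```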

## Load the fpgaConvNet overlay
To load an overlay on PYNQ, you need to upload the `.bit` (bitstream) and `.hwh` (hardware handoff) files to the board. Both can be found inside the `.xsa` file that Vivado generates when exporting the hardware. The fpgaConvNet PYNQ driver additionally needs the `.json` file describing the implemented partition.
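A Vivado `.xsa` file is a zip archive, so the two overlay files can be pulled out with Python's `zipfile` module. Inside the archive, the bitstream and the handoff file often have different base names, while PYNQ expects them to share one, so this sketch renames both. `extract_overlay_files` is a hypothetical helper; if the archive contains several `.hwh` files it simply keeps the first one, so check that it is the one describing the top-level design.

```python
import zipfile
from pathlib import Path


def extract_overlay_files(xsa_path, out_dir, basename):
    """Copy the first .bit and .hwh members of a Vivado .xsa archive into
    `out_dir` as <basename>.bit and <basename>.hwh (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    with zipfile.ZipFile(xsa_path) as xsa:  # an .xsa file is a zip archive
        for member in xsa.namelist():
            ext = Path(member).suffix.lstrip(".")
            if ext in ("bit", "hwh") and ext not in written:
                target = out_dir / f"{basename}.{ext}"
                target.write_bytes(xsa.read(member))
                written[ext] = target
    if set(written) != {"bit", "hwh"}:
        raise ValueError(f"{xsa_path} does not contain both a .bit and a .hwh file")
    return written


# Usage (the .xsa file name here is hypothetical):
# extract_overlay_files("design_1_wrapper.xsa", ".", "single_layer_tutorial")
```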

In [3]:
overlay = Overlay('single_layer_tutorial.bit')

In [4]:
fpgaconvnet_ip = overlay.fpgaconvnet_ip_0

The fpgaConvNet-generated IPs are automatically bound to the `FpgaConvnetDriver` class defined in `fpgaconvnet_pynq_driver.py`. The `load_partition` method reads the dimensions of the partition from the `.json` file and allocates the input and output buffers, leaving the model ready for inference.

In [5]:
fpgaconvnet_ip.load_partition('single_layer_tutorial.json', 0)
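To make the buffer-allocation step concrete, here is a sketch of how input and output shapes could be derived from a partition description and turned into arrays. The schema used here (a `batch_size` key plus per-layer `channels_in`/`rows_in`/`cols_in`/`channels_out`/`rows_out`/`cols_out` keys) is a hypothetical simplification, not the actual layout of `single_layer_tutorial.json`, and `np.int16` stands in for whatever fixed-point word the hardware uses. On the board, `pynq.allocate(shape, dtype)` would replace `np.zeros` so that the buffers live in physically contiguous memory reachable by DMA.

```python
import numpy as np


def io_shapes(partition):
    """Return the (batch, channels, rows, cols) shapes of a partition's input
    and output, read from a HYPOTHETICAL simplified partition dict."""
    batch = partition["batch_size"]
    first, last = partition["layers"][0], partition["layers"][-1]
    in_shape = (batch, first["channels_in"], first["rows_in"], first["cols_in"])
    out_shape = (batch, last["channels_out"], last["rows_out"], last["cols_out"])
    return in_shape, out_shape


def allocate_io(partition, dtype=np.int16):
    """Allocate zeroed input/output arrays; on PYNQ, pynq.allocate would be used."""
    in_shape, out_shape = io_shapes(partition)
    return np.zeros(in_shape, dtype=dtype), np.zeros(out_shape, dtype=dtype)


# Hypothetical example: one 3x3 convolution (no padding), 1x28x28 -> 8x26x26.
example = {
    "batch_size": 1,
    "layers": [{"channels_in": 1, "rows_in": 28, "cols_in": 28,
                "channels_out": 8, "rows_out": 26, "cols_out": 26}],
}
inp, out = allocate_io(example)
```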

## Test the implementation

Load the example input data.

In [6]:
img = get_MNIST_image(0)[0]

In [7]:
img.shape

(1, 28, 28)

> The fpgaConvNet driver expects the input array to have 4 dimensions: batch size, channels, rows and cols.

In [8]:
img = np.expand_dims(img, 0)
img.shape

(1, 1, 28, 28)
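When feeding images of varying rank, a small helper can add the missing leading axes. `to_nchw` is a hypothetical convenience function, not part of the driver: it prepends singleton axes until the array is 4-D, so `(28, 28)` and `(1, 28, 28)` both become `(1, 1, 28, 28)`. It treats a 3-D array as `(channels, rows, cols)`, so a stack of single-channel images shaped `(N, rows, cols)` would need `np.expand_dims(x, 1)` instead.

```python
import numpy as np


def to_nchw(x):
    """Prepend singleton axes until `x` is 4-D: (batch, channels, rows, cols)."""
    x = np.asarray(x)
    if x.ndim > 4:
        raise ValueError(f"expected at most 4 dimensions, got shape {x.shape}")
    while x.ndim < 4:
        x = np.expand_dims(x, 0)  # each new axis is inserted in front
    return x
```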

**The entire inference functionality of the fpgaConvNet IP under PYNQ is embedded in the `.run()` function.**

In [9]:
Y = fpgaconvnet_ip.run(img)
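Since `.run()` takes a 4-D array, running several samples is a matter of looping over them. This sketch assumes the loaded partition processes one sample per call (batch size 1, matching the input above); `run_many` is a hypothetical helper, and `run` can be any callable with the same interface as `fpgaconvnet_ip.run`.

```python
import numpy as np


def run_many(run, images):
    """Call `run` once per (channels, rows, cols) image and stack the outputs
    along a new leading axis."""
    outputs = []
    for img in images:
        batch = np.expand_dims(np.asarray(img), 0)  # -> (1, channels, rows, cols)
        outputs.append(np.asarray(run(batch)))
    return np.stack(outputs)


# Usage on the board (first ten MNIST test samples):
# Ys = run_many(fpgaconvnet_ip.run, [get_MNIST_image(i)[0] for i in range(10)])
```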

Now, let's check whether the model works as expected. This cell loads the outputs for sample `0` of the ONNX model used as the starting point in the [end-to-end example](https://github.com/AlexMontgomerie/fpgaconvnet-tutorial/blob/main/end-to-end-example.ipynb).

In [10]:
Y_onnx = np.load('assets/output.npy')

Compute the _Mean Squared Error_ (MSE) between the FPGA output and the ONNX reference:

In [11]:
((Y.flatten() - Y_onnx.flatten())**2).mean()

1.3678097e-05
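The MSE is the mean of the squared element-wise differences, $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$, so a single large deviation can be diluted by many small ones. A slightly more defensive check, sketched here with a hypothetical `compare_outputs` helper, flattens both arrays explicitly (so NumPy broadcasting cannot silently pair the wrong elements if `output.npy` has a different shape) and also reports the maximum absolute error.

```python
import numpy as np


def compare_outputs(y, y_ref):
    """Return the MSE and maximum absolute error between two arrays of the
    same size, compared element by element after flattening."""
    y = np.asarray(y, dtype=np.float64).ravel()
    y_ref = np.asarray(y_ref, dtype=np.float64).ravel()
    if y.size != y_ref.size:
        raise ValueError(f"size mismatch: {y.size} vs {y_ref.size}")
    err = y - y_ref
    return {"mse": float(np.mean(err ** 2)),
            "max_abs_err": float(np.max(np.abs(err)))}


# Usage with the arrays from this notebook:
# compare_outputs(Y, Y_onnx)
```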

## Performance testing

Finally, the driver also provides a function to measure the performance of the model in terms of latency and throughput. If the platform supports power measurement through PMBus (as the ZCU104 board does), the power consumption is also measured and the energy per inference is computed.

In [12]:
fpgaconvnet_ip.test_performance(img, int(1e4))

Getting latency...Done
Latency: 2.42 ms
Throughput: 413.45 inferences/s
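When samples are processed strictly one after another, throughput is the reciprocal of latency: $1 / 2.42\ \text{ms} \approx 413$ inferences/s, in line with the figures above. Energy per inference then follows as $E = P / \text{throughput}$; at a purely hypothetical 2 W, for instance, $2 / 413.45 \approx 4.84$ mJ. The sketch below is a host-side timing loop under those assumptions, with hypothetical helper names. It is not how `test_performance` is implemented, and it includes Python call overhead in every measurement.

```python
import time


def measure(run, x, n_iters=100):
    """Time `n_iters` back-to-back calls of run(x) and derive the mean
    latency and the corresponding sequential throughput."""
    start = time.perf_counter()
    for _ in range(n_iters):
        run(x)
    elapsed_s = time.perf_counter() - start
    latency_s = elapsed_s / n_iters
    return {"latency_ms": latency_s * 1e3, "throughput_per_s": 1.0 / latency_s}


def energy_per_inference_mj(power_w, throughput_per_s):
    """Energy per inference in millijoules: E = P / throughput."""
    return power_w / throughput_per_s * 1e3


# Usage on the board:
# stats = measure(fpgaconvnet_ip.run, img, n_iters=1000)
```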
