# Part 7: Deployment

In the previous sections we saw how to train a Neural Network with a small resource footprint using QKeras, then convert it with `hls4ml` and create an IP. That IP can be integrated into a larger design and deployed on an FPGA device. In this section, we introduce the `VivadoAccelerator` backend of `hls4ml`, which lets us target a few supported devices and get up and running quickly. Specifically, we'll deploy the model on a [pynq-z2 board](http://www.pynq.io/).

In [None]:
from tensorflow.keras.models import load_model
from qkeras.utils import _add_supported_quantized_objects
co = {}
_add_supported_quantized_objects(co)
import os
os.environ['PATH'] = '/opt/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']
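
If you want to confirm that the `PATH` change above took effect before starting a long synthesis run, a check along these lines may help. It uses only the standard library; the helper name `tool_on_path` is ours, and we assume the executable is called `vivado`.

```python
import shutil

def tool_on_path(name, path=None):
    """Return the full path of an executable found on PATH, or None.

    `path` is an optional search path string; by default the current
    PATH environment variable is used.
    """
    return shutil.which(name, path=path)

vivado = tool_on_path('vivado')
if vivado is None:
    print("vivado not found: check the install location prepended to PATH")
else:
    print("Using", vivado)
```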

## Load model
Load the model from `part4: quantization` (note that you need to have trained the model in part 4 first).

In [None]:
model = load_model('model_3/KERAS_check_best_model.h5', custom_objects=co)

## Convert to hls4ml
We'll convert our model with `hls4ml`, with a few small changes compared to how we used the same model in part 4.
We target the `VivadoAccelerator` backend with `backend='VivadoAccelerator'`: this wraps the HLS NN model, providing an AXI-Stream interface to our IP. We also specify `board='pynq-z2'`.

The pynq-z2 board is a popular board with a Zynq 7020 SoC. Since this device is much smaller than the Alveo we specified in previous sections, we set the `ReuseFactor` of all the `Dense` layers of the model to 64.
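
As a rough illustration of why a larger `ReuseFactor` helps on a small device: in `hls4ml` the reuse factor is the number of times each multiplier is reused, so a `Dense` layer with `n_in * n_out` multiplications needs roughly `n_in * n_out / ReuseFactor` multipliers. The sketch below applies that estimate. The layer shapes are assumptions for illustration (they aren't stated in this section), and `hls4ml` may adjust the reuse factor to a value it supports for a given layer, so treat the numbers as a back-of-the-envelope guide only.

```python
import math

def estimate_multipliers(n_in, n_out, reuse_factor):
    """Rough count of parallel multipliers for a Dense layer."""
    return math.ceil(n_in * n_out / reuse_factor)

# Hypothetical (n_in, n_out) shapes, for illustration only
layer_shapes = {'fc1': (16, 64), 'fc2': (64, 32), 'fc3': (32, 32), 'output': (32, 5)}

for name, (n_in, n_out) in layer_shapes.items():
    fully_parallel = estimate_multipliers(n_in, n_out, 1)
    reused = estimate_multipliers(n_in, n_out, 64)
    print(f"{name}: {fully_parallel} multipliers at ReuseFactor=1, "
          f"about {reused} at ReuseFactor=64")
```

The trade-off is latency: reusing each multiplier more often means a layer takes more clock cycles to finish.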

In [None]:
import hls4ml
import plotting
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['LayerName']['softmax']['exp_table_t'] = 'ap_fixed<18,8>'
config['LayerName']['softmax']['inv_table_t'] = 'ap_fixed<18,4>'
for layer in ['fc1', 'fc2', 'fc3', 'output']:
    config['LayerName'][layer]['ReuseFactor'] = 64
print("-----------------------------------")
plotting.print_dict(config)
print("-----------------------------------")
hls_model = hls4ml.converters.convert_from_keras_model(model,
                                                       hls_config=config,
                                                       output_dir='model_3/hls4ml_prj_pynq',
                                                       backend='VivadoAccelerator',
                                                       board='pynq-z2')
hls_model.compile()

We can query which other boards are currently supported (we're working to add more). The `VivadoAccelerator` backend introduces the `AcceleratorConfig` section of the configuration, where we can change some details of the interface to the accelerator IP.

The `create_initial_config` method (of any backend) creates a template dictionary with the default parameters, which you can use as a starting point. In the conversion above we didn't change any of these settings, so all the defaults are used.

In [None]:
print(hls4ml.templates.get_supported_boards_dict().keys())
plotting.print_dict(hls4ml.templates.get_backend('VivadoAccelerator').create_initial_config())

## Predict
Run the CPU emulation of the hls4ml NN and save the output to a file so we can compare it against the hardware result later.

In [None]:
import numpy as np
X_test = np.load('X_test.npy')
y_hls = hls_model.predict(np.ascontiguousarray(X_test))
np.save('model_3/y_hls.npy', y_hls)

## Synthesize
Now synthesize the model, and also export the IP.

In [None]:
hls_model.build(csim=False, export=True)

## Make bitfile
Now that we've exported the NN IP, let's create a bitfile! The `VivadoAccelerator` backend design scripts create a Block Design in Vivado IPI containing our Neural Network IP, as well as the other IPs needed to create a complete system.

In the case of our `pynq-z2`, we add a DMA IP to transfer data between the PS and the PL containing the Neural Network. If you want to create a different design, for example to connect your NN to a sensor, you can use our block designs as a starting point and add in the relevant IP for your use case.

Our block diagram looks like this:

<img src="images/part7_block_design.png" alt="Block Design" style="width: 400px;"/>

In [None]:
hls4ml.templates.VivadoAcceleratorBackend.make_bitfile(hls_model)

The floorplan of our NN placed on the `pynq-z2` is shown below, with the hls4ml Neural Network highlighted in purple. You can reproduce this yourself if running the tutorial with a local Vivado installation by opening the project at `model_3/hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.xpr` in the Vivado GUI and clicking "Open Implemented Design".
<img src="images/part7_floorplan.png" alt="Floorplan" style="width: 500px;"/>

Let's also inspect the final resource usage after placement:

In [None]:
!sed -n '30,45p' model_3/hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper_utilization_placed.rpt

## Part 7b: running on a pynq-z2
The following section contains the code to run in the pynq-z2 Jupyter notebook to perform NN inference.

First, you'll need to follow the setup instructions for the pynq-z2 board, then transfer the following files from the earlier part of this notebook into a directory on the pynq-z2:
- bitfile: `model_3/hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper.bit` -> `hls4ml_nn.bit`
- hardware handoff: `model_3/hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh` -> `hls4ml_nn.hwh`
- driver: `model_3/hls4ml_prj_pynq/axi_stream_driver.py` -> `axi_stream_driver.py`
- data: `X_test.npy`, `y_test.npy`

The following commands archive these files into `model_3/hls4ml_prj_pynq/package.tar.gz` that can be copied over to the pynq-z2 and extracted.

In [None]:
!mkdir -p model_3/hls4ml_prj_pynq/package/
!cp model_3/hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.runs/impl_1/design_1_wrapper.bit model_3/hls4ml_prj_pynq/package/hls4ml_nn.bit
!cp model_3/hls4ml_prj_pynq/myproject_vivado_accelerator/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh model_3/hls4ml_prj_pynq/package/hls4ml_nn.hwh
!cp model_3/hls4ml_prj_pynq/axi_stream_driver.py model_3/hls4ml_prj_pynq/package/
!cp X_test.npy y_test.npy model_3/hls4ml_prj_pynq/package
!tar -czvf model_3/hls4ml_prj_pynq/package.tar.gz -C model_3/hls4ml_prj_pynq/package/ .
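
If you'd rather not rely on shell commands, the same packaging step can be sketched in Python with the standard library. This is an alternative under our own assumptions, not part of `hls4ml`: the helper names `make_package` and `tutorial_files` are ours, and the paths are the ones listed above. It checks that every file exists before writing the archive, which catches a failed bitfile build early.

```python
import tarfile
from pathlib import Path

def make_package(files, archive_path):
    """Write a .tar.gz containing each source file under its target name.

    `files` maps source path -> name inside the archive.
    Raises FileNotFoundError up front if any source file is missing.
    """
    missing = [src for src in files if not Path(src).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {missing}")
    with tarfile.open(archive_path, 'w:gz') as tar:
        for src, name in files.items():
            tar.add(src, arcname=name)
    return archive_path

def tutorial_files(prj):
    """Source -> archive name mapping for the files listed above."""
    vivado_prj = f'{prj}/myproject_vivado_accelerator'
    return {
        f'{vivado_prj}/project_1.runs/impl_1/design_1_wrapper.bit': 'hls4ml_nn.bit',
        f'{vivado_prj}/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh': 'hls4ml_nn.hwh',
        f'{prj}/axi_stream_driver.py': 'axi_stream_driver.py',
        'X_test.npy': 'X_test.npy',
        'y_test.npy': 'y_test.npy',
    }

PRJ = 'model_3/hls4ml_prj_pynq'
if Path(PRJ).is_dir():  # only package when the project has been built
    make_package(tutorial_files(PRJ), f'{PRJ}/package.tar.gz')
```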

The following cells are intended to run on a pynq-z2; they will not run on the server used to train and synthesize models!

First, import our driver `Overlay` class. We'll also load the test data.

In [None]:
from axi_stream_driver import NeuralNetworkOverlay
import numpy as np
X_test = np.load('X_test.npy')
y_test = np.load('y_test.npy')

Create a `NeuralNetworkOverlay` object. This will download the `Overlay` (bitfile) onto the PL of the pynq-z2. We provide the `X_test.shape` and `y_test.shape` to allocate some buffers for the data transfer.

In [None]:
nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)

Now run the prediction! When we set `profile=True` the function times the inference, and prints out a summary as well as returning the profiling information. We also save the output to a file so we can do some validation.

In [None]:
y_hw, latency, throughput = nn.predict(X_test, profile=True)
np.save('y_hw.npy', y_hw)  # save the hardware output for the validation in part 7c

An example printout looks like:

Classified 166000 samples in 0.402568 seconds (412352.6956936468 inferences / s)
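
The figures in that summary are related by simple arithmetic: throughput is the number of samples divided by the elapsed wall-clock time. The sketch below reproduces the numbers from the example printout and shows one way such a timing could be taken with `time.perf_counter`. The helpers `throughput` and `timed_call` are hypothetical, and this isn't necessarily how `axi_stream_driver.py` measures it.

```python
import time

def throughput(n_samples, seconds):
    """Inferences per second for a batch processed in `seconds`."""
    return n_samples / seconds

def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Numbers taken from the example printout
rate = throughput(166000, 0.402568)
print(f"{rate:.1f} inferences / s, {1e6 / rate:.2f} us per sample on average")
```

Note that the average time per sample over a large batch is not the same as the latency of a single inference, since the batch time also includes the data transfers to and from the PL.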

## Part 7c: final validation
We executed NN inference on the pynq-z2! Now we can copy `y_hw.npy` back to the host we've been using for training and synthesis, and make a final plot to check that the output from the board is as expected.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics import accuracy_score

y_hw = np.load('y_hw.npy')
y_test = np.load('y_test.npy')
classes = np.load('classes.npy', allow_pickle=True)
y_qkeras = model.predict(X_test)

print("Accuracy QKeras, CPU:     {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_qkeras, axis=1))))
print("Accuracy hls4ml, pynq-z2: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_hw, axis=1))))

fig, ax = plt.subplots(figsize=(9, 9))
_ = plotting.makeRoc(y_test, y_qkeras, classes, linestyle='-')
plt.gca().set_prop_cycle(None) # reset the colors
_ = plotting.makeRoc(y_test, y_hw, classes, linestyle='--')

from matplotlib.lines import Line2D
lines = [Line2D([0], [0], ls='-'),
         Line2D([0], [0], ls='--')]
from matplotlib.legend import Legend
leg = Legend(ax, lines, labels=['QKeras, CPU', 'hls4ml, pynq-z2'],
            loc='lower right', frameon=False)
ax.add_artist(leg)
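
We saved the CPU emulation output to `model_3/y_hls.npy` earlier, so it may also be worth checking how often the board agrees with the emulation, not just with the labels. Here is a minimal sketch, assuming both outputs have one row of class scores per sample; the helper name `argmax_agreement` is ours. It works on nested lists and NumPy arrays alike.

```python
def argmax_agreement(a, b):
    """Fraction of rows where the highest-scoring class matches."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of rows")
    if len(a) == 0:
        raise ValueError("inputs must not be empty")

    def top(row):
        # index of the largest score in this row
        return max(range(len(row)), key=lambda i: row[i])

    matches = sum(top(ra) == top(rb) for ra, rb in zip(a, b))
    return matches / len(a)

# Toy scores for illustration only (not real model output)
emulated = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
hardware = [[0.1, 0.8, 0.1], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
print(argmax_agreement(emulated, hardware))
```

In this notebook it could be called as `argmax_agreement(np.load('model_3/y_hls.npy'), y_hw)`.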