# Vivado Flow

## Before You Start

The current set of notebooks is under constant development.

### Update Tutorial Repository

If you have previously cloned the tutorial repository, you may need to get the latest versions of the notebooks.

First check the status of your repository:
```
cd hls4ml-tutorial
make clean
git status 
```

You may have some _modified_ notebooks. For example:

```
# On branch csee-e6868-spring2021
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   part1_getting_started.ipynb
#	modified:   part2_advanced_config.ipynb
#	modified:   part2b_advanced_config.ipynb
#
no changes added to commit (use "git add" and/or "git commit -a")
```

If you made significant changes, make a copy of the modified notebooks first. Otherwise, the easiest thing to do is to discard those changes.
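If you want to keep copies before discarding anything, a minimal sketch along these lines could do it. The helper name `backup_notebooks` and the backup directory are hypothetical, and the helper copies every top-level notebook rather than only the modified ones.

```python
import shutil
from pathlib import Path


def backup_notebooks(repo_dir, backup_dir):
    """Copy every top-level .ipynb file in repo_dir into backup_dir.

    Hypothetical helper: it does not ask git which notebooks changed,
    it simply copies all of them and returns the copied file names.
    """
    repo_dir, backup_dir = Path(repo_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for notebook in sorted(repo_dir.glob("*.ipynb")):
        # copy2 preserves timestamps along with the file contents
        shutil.copy2(notebook, backup_dir / notebook.name)
        copied.append(notebook.name)
    return copied
```

For example, `backup_notebooks("hls4ml-tutorial", "notebook_backup")` would place the copies in a `notebook_backup` directory next to the repository.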

**ATTENTION** You will lose your local changes!

```
git checkout *.ipynb
```

At this point, you can update your copy of the repository:
```
git pull
```


### Update Conda Environment

It is likely that you are running this notebook in the Conda environment `hls4ml-tutorial-cu`.

If you have not done so yet, update the `hls4ml` package with the latest changes from the working branch.

```
conda activate hls4ml-tutorial-cu
pip uninstall hls4ml
pip install git+https://github.com/GiuseppeDiGuglielmo/hls4ml.git@gdg/cosmetics#egg=hls4ml[profiling]
```

You may need to restart the Jupyter notebook.
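After restarting, one way to confirm which `hls4ml` package the environment picked up is to query the installed metadata. This is a small sketch using the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None if hls4ml is not installed in the active environment.
print("hls4ml version:", installed_version("hls4ml"))
```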


## Introduction

We're going to train a fully connected neural network with QKeras on the jet tagging dataset and run it bare-metal on Zynq-class boards (ZCU106, Ultra96, Pynq-Z1, MiniZed).

This is an overview of the flow. We reference some of the steps in this notebook.

![vivado-flow](doc/vivado_flow.png)

## Setup

Choose the target board. For the time being, you can use `minized`, `pynqz1`, `pynqz2`, or `cmoda735t`. You may need to install the proper board files for the chosen board.

In [1]:
## ZCU106
#board_name='zcu106'
#fpga_part='xczu7ev-ffvc1156-2-e'
 
## Ultra96
#board_name='ultra96'
#fpga_part='xczu3eg-sbva484-1-e'

## Pynq-Z1
#board_name='pynqz1'
#fpga_part='xc7z020clg400-1'

## Pynq-Z2
board_name='pynqz2'
fpga_part='xc7z020clg400-1'

## MiniZed
#board_name='minized'
#fpga_part='xc7z007sclg225-1'

##Cmod A7-35t
#board_name='cmoda735t'
#fpga_part='xc7a35tcpg236-1'
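
As an alternative to commenting lines in and out, the same board-to-part pairs could be kept in a lookup table. This is only a sketch: the names `BOARD_PARTS` and `select_board` are hypothetical, while the part numbers are copied from the cell above.

```python
# Board name -> FPGA part, copied from the selection cell above.
BOARD_PARTS = {
    "zcu106": "xczu7ev-ffvc1156-2-e",
    "ultra96": "xczu3eg-sbva484-1-e",
    "pynqz1": "xc7z020clg400-1",
    "pynqz2": "xc7z020clg400-1",
    "minized": "xc7z007sclg225-1",
    "cmoda735t": "xc7a35tcpg236-1",
}


def select_board(name):
    """Return (board_name, fpga_part); raise ValueError for an unknown board."""
    if name not in BOARD_PARTS:
        raise ValueError(f"unknown board {name!r}; choose from {sorted(BOARD_PARTS)}")
    return name, BOARD_PARTS[name]


board_name, fpga_part = select_board("pynqz2")
```

A misspelled board name then fails immediately instead of surfacing later as a tool error.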

Let's import the libraries, call the magic functions, and set up the environment variables.

In [2]:
import tensorflow as tf

from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l1

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

from qkeras.qlayers import QDense, QActivation
from qkeras.quantizers import quantized_bits, quantized_relu
from qkeras.utils import _add_supported_quantized_objects

import numpy as np

import hls4ml

from callbacks import all_callbacks
import plotting

%matplotlib inline

import os
os.environ['PATH'] = '/tools/Xilinx/Vivado/2019.1/bin:' + os.environ['PATH']

def is_tool(name):
    # shutil.which replaces distutils.spawn.find_executable (distutils is deprecated)
    from shutil import which
    return which(name) is not None

print('-----------------------------------')
if not is_tool('vivado_hls'):
    print('Xilinx Vivado HLS is NOT in the PATH')
else:
    print('Xilinx Vivado HLS is in the PATH')
print('-----------------------------------')

-----------------------------------
Xilinx Vivado HLS is in the PATH
-----------------------------------


## Load the dataset

This is a lot like the previous notebooks, so we will go through quickly.

First, we fetch the dataset from file, do the normalization and make a train and test split.

We save the test dataset to files so that we can use them later.

In [3]:
# Load the preprocessed test data
from sklearn.utils import shuffle
X = np.load('./test_data/test_data.npy', allow_pickle=True)
y = np.load('./test_data/test_data_ground_truths.npy', allow_pickle=True)
y_keras = []
# X and y must hold the same number of entries
assert len(X) == len(y)
# Use a quarter of each test set to save time
for i in range(len(X)):
    quarter = len(X[i]) // 4
    # Shuffle samples and labels together so that they stay aligned
    X[i], y[i] = shuffle(X[i], y[i])
    X[i], y[i] = X[i][:quarter], y[i][:quarter]

## Train

In [4]:
import keras_model
train = False
#not os.path.exists('model/KERAS_check_best_model.h5')
if train:
    model.compile(loss="mean_squared_error", optimizer="adam")
        
    print("Shape of training data element is: {}".format(train_data[0].shape))
    history = model.fit(train_data,
                        train_data,
                        epochs=100,
                        batch_size=512,
                        shuffle=True,
                        validation_split=0.1,
                        verbose=1,
                        callbacks=callbacks)
    

else:
    model_file = "{model}/model_{machine_type}.hdf5".format(model="./model/train_config_bits_6_frames_5_mels_128_encDims_8_bn_True_l1reg_0_expPower_3_beginSpar_0_finSpar_0.8",
                                                              machine_type="ToyCar")
    #model_file = "model/KERAS_check_best_model.hdf5"
    if not os.path.exists(model_file):
        print("{} model not found at path ".format(model_file))
    model = keras_model.load_model(model_file)
    model.summary()





Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 640)]             0         
_________________________________________________________________
q_dense (QDense)             (None, 128)               82048     
_________________________________________________________________
batch_normalization (BatchNo (None, 128)               512       
_________________________________________________________________
q_activation (QActivation)   (None, 128)               0         
_________________________________________________________________
q_dense_1 (QDense)           (None, 128)               16512     
_________________________________________________________________
batch_normalization_1 (Batch (None, 128)               512       
_________________________________________________________________
q_activation_1 (QActivation) (None, 128)              

## Check accuracy

Do not expect good accuracy, because the network has a small number of neurons. The model could be improved, but as long as it fits both the Pynq-Z1 and the MiniZed, it is good enough for our purposes.

## Make an hls4ml configuration (Step 2)

Notice we're using `Strategy: Resource` for every layer, and `ReuseFactor: 128`. The Programmable Logic (FPGA part) of the Pynq-Z1 SoC is small compared to VU9P-class parts.
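
To get a feel for why a large reuse factor helps on a small part, here is a rough back-of-the-envelope sketch. It assumes the usual reading of the reuse factor as the number of times each multiplier is reused, so a dense layer needs about `n_in * n_out / ReuseFactor` multipliers in parallel. The helper `multipliers_needed` is hypothetical, and the real resource usage depends on the HLS tool and the chosen precision.

```python
import math


def multipliers_needed(n_in, n_out, reuse_factor):
    """Estimate the parallel multipliers for a dense layer with n_in * n_out products."""
    return math.ceil(n_in * n_out / reuse_factor)


# First dense layer from the model summary: 640 inputs, 128 outputs.
print(multipliers_needed(640, 128, 1))    # fully parallel: 81920
print(multipliers_needed(640, 128, 128))  # reuse factor 128, as in the configuration cell: 640
```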

We also use some settings which are good for QKeras.

Notice the `fpga_part:'xc7z020clg400-1'`.
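
The configuration cell sets precisions such as `ap_fixed<7,4>` and `ap_fixed<7,1>`. As a reminder, a signed `ap_fixed<W,I>` has `W` total bits, of which `I` are integer bits including the sign. The sketch below computes the representable range and step for the types used in this notebook; the helper `ap_fixed_range` is hypothetical and ignores the rounding and saturation modes.

```python
def ap_fixed_range(width, integer):
    """Return (lowest, highest, step) of a signed ap_fixed<width, integer> value."""
    step = 2.0 ** -(width - integer)       # weight of the least significant bit
    lowest = -(2.0 ** (integer - 1))       # most negative two's-complement value
    highest = 2.0 ** (integer - 1) - step  # one step below the positive bound
    return lowest, highest, step


for width, integer in [(7, 4), (7, 1), (7, 7)]:
    print(f"ap_fixed<{width},{integer}>:", ap_fixed_range(width, integer))
```

With these types, weights stored as `ap_fixed<7,1>` lie in [-1, 1) with a step of 1/64, while `ap_fixed<7,4>` results lie in [-8, 8) with a step of 1/8.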

In [5]:

hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'

hls_config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_config['Model'] = {}
hls_config['Model']['ReuseFactor'] = 128
hls_config['Model']['Strategy'] = 'Resource'
hls_config['Model']['Precision'] = 'ap_fixed<7,4>'
hls_config['LayerName']['input_1']['Precision'] = 'ap_fixed<7,7>'

hls_config['LayerName']['q_dense']['Precision']['weight'] = 'ap_fixed<7,1>'
hls_config['LayerName']['q_dense']['Precision']['bias'] = 'ap_fixed<7,1>'
hls_config['LayerName']['q_dense']['ReuseFactor'] = 128

hls_config['LayerName']['batch_normalization']['Precision']['scale'] = 'ap_fixed<7,4>'
hls_config['LayerName']['batch_normalization']['Precision']['bias'] = 'ap_fixed<7,4>'
hls_config['LayerName']['batch_normalization']['ReuseFactor'] = 128

hls_config['LayerName']['q_activation']['Precision']['result'] = 'ap_fixed<7,4>'
hls_config['LayerName']['q_activation']['ReuseFactor'] = 128

for i in range(1,9):
    
    hls_config['LayerName']['q_dense_{}'.format(i)]['Precision']['weight'] = 'ap_fixed<7,1>'
    hls_config['LayerName']['q_dense_{}'.format(i)]['Precision']['bias'] = 'ap_fixed<7,1>'
    hls_config['LayerName']['q_dense_{}'.format(i)]['ReuseFactor'] = 128

    hls_config['LayerName']['batch_normalization_{}'.format(i)]['Precision']['scale'] = 'ap_fixed<7,4>'
    hls_config['LayerName']['batch_normalization_{}'.format(i)]['Precision']['bias'] = 'ap_fixed<7,4>'
    hls_config['LayerName']['batch_normalization_{}'.format(i)]['ReuseFactor'] = 128

    hls_config['LayerName']['q_activation_{}'.format(i)]['Precision']['result'] = 'ap_fixed<7,4>'
    hls_config['LayerName']['q_activation_{}'.format(i)]['ReuseFactor'] = 128
    
#final output
hls_config['LayerName']['q_dense_9']['Precision']['weight'] = 'ap_fixed<7,1>'
hls_config['LayerName']['q_dense_9']['Precision']['bias'] = 'ap_fixed<7,1>'
hls_config['LayerName']['q_dense_9']['ReuseFactor'] = 128

print("-----------------------------------")
plotting.print_dict(hls_config)
print("-----------------------------------")

-----------------------------------
Model
  ReuseFactor:       128
  Strategy:          Resource
  Precision:         ap_fixed<7,4>
LayerName
  input_1
    Precision:       ap_fixed<7,7>
  q_dense
    Precision
      weight:        ap_fixed<7,1>
      bias:          ap_fixed<7,1>
    ReuseFactor:     128
  batch_normalization
    Precision
      scale:         ap_fixed<7,4>
      bias:          ap_fixed<7,4>
    ReuseFactor:     128
  q_activation
    Precision
      result:        ap_fixed<7,4>
    ReuseFactor:     128
  q_dense_1
    Precision
      weight:        ap_fixed<7,1>
      bias:          ap_fixed<7,1>
    ReuseFactor:     128
  batch_normalization_1
    Precision
      scale:         ap_fixed<7,4>
      bias:          ap_fixed<7,4>
    ReuseFactor:     128
  q_activation_1
    Precision
      result:        ap_fixed<7,4>
    ReuseFactor:     128
  q_dense_2
    Precision
      weight:        ap_fixed<7,1>
      bias:          ap_fixed<7,1>
    ReuseFactor:     128
  batch_

## Convert and Compile

You can set some target-specific configurations:

- Define the `interface`, which for our current setup should always be `m_axi`.
- Define the width of the AXI bus. For the time being, use `16`, which means that each clock cycle transfers a single input or output value (`ap_fixed<16,*>`).
- Define the implementation. For the time being, use `serial`.
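
The three settings above also determine the name of the project directory. The sketch below checks them against the values listed in the comments of the configuration cell and builds the directory name the same way the notebook does. The helper `project_dir` and the constant names are hypothetical.

```python
# Allowed values, taken from the comments in the configuration cell.
INTERFACES = ("s_axilite", "m_axi", "hls_stream")
AXI_WIDTHS = (16, 32, 64)
IMPLEMENTATIONS = ("serial", "dataflow")


def project_dir(board_name, interface, axi_width, implementation):
    """Validate the target settings and build the project directory name."""
    if interface not in INTERFACES:
        raise ValueError(f"interface must be one of {INTERFACES}")
    if axi_width not in AXI_WIDTHS:
        raise ValueError(f"axi_width must be one of {AXI_WIDTHS}")
    if implementation not in IMPLEMENTATIONS:
        raise ValueError(f"implementation must be one of {IMPLEMENTATIONS}")
    return f"hls/{board_name}_{interface}_{axi_width}_{implementation}_prj"


print(project_dir("pynqz2", "m_axi", 16, "serial"))
```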

In [6]:
interface = 'm_axi' # 's_axilite', 'm_axi', 'hls_stream'
axi_width = 16 # 16, 32, 64
implementation = 'serial' # 'serial', 'dataflow'

In [7]:
output_dir='hls/' + board_name + '_' + interface + '_' + str(axi_width) + '_' + implementation + '_prj' 

backend_config = hls4ml.converters.create_backend_config(fpga_part=fpga_part)
backend_config['ProjectName'] = 'jet_tagger'
backend_config['KerasModel'] = model
backend_config['HLSConfig'] = hls_config
backend_config['OutputDir'] = output_dir
backend_config['Backend'] = 'Pynq'
backend_config['Interface'] = interface
backend_config['IOType'] = 'io_parallel'
backend_config['AxiWidth'] = str(axi_width)
backend_config['Implementation'] = implementation
backend_config['ClockPeriod'] = 10

#print("-----------------------------------")
#plotting.print_dict(backend_config)
#print("-----------------------------------")

hls_model = hls4ml.converters.keras_to_hls(backend_config)

_ = hls_model.compile()

In [8]:
plotting.print_dict(backend_config)

OutputDir:           hls/pynqz2_m_axi_16_serial_prj
ProjectName:         jet_tagger
XilinxPart:          xc7z020clg400-1
ClockPeriod:         10
Backend:             Pynq
IOType:              io_parallel
HLSConfig
  Model
    ReuseFactor:     128
    Strategy:        Resource
    Precision:       ap_fixed<7,4>
  LayerName
    input_1
      Precision:     ap_fixed<7,7>
    q_dense
      Precision
        weight:      ap_fixed<7,1>
        bias:        ap_fixed<7,1>
      ReuseFactor:   128
    batch_normalization
      Precision
        scale:       ap_fixed<7,4>
        bias:        ap_fixed<7,4>
      ReuseFactor:   128
    q_activation
      Precision
        result:      ap_fixed<7,4>
      ReuseFactor:   128
    q_dense_1
      Precision
        weight:      ap_fixed<7,1>
        bias:        ap_fixed<7,1>
      ReuseFactor:   128
    batch_normalization_1
      Precision
        scale:       ap_fixed<7,4>
        bias:        ap_fixed<7,4>
      ReuseFactor:   128
    q_activation

## Prediction and Comparison


## Synthesis

In [None]:
hls_model.build(csim=False,synth=True,export=True)

hls4ml.report.read_vivado_report(output_dir)

## Resource Reference

See the resources available on different boards.

```
+-----------------+---------+-------+--------+-------+-----+                    
|                 |               Resource                 |
+-----------------+---------+-------+--------+-------+-----+
|      Board      | BRAM_18K| DSP48E|   FF   |  LUT  | URAM|
+-----------------+---------+-------+--------+-------+-----+
|   PYNQ-Z1/Z2    |      280|    220|  106400|  53200|    0|
+-----------------+---------+-------+--------+-------+-----+
|     MiniZed     |      100|     66|   28800|  14400|    0|
+-----------------+---------+-------+--------+-------+-----+
``` 
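
To compare a synthesis report against these budgets, a small helper like the following could compute the utilization in percent. The resource totals are copied from the table above. The helper name `utilization` is hypothetical, and the usage numbers in the example are made up for illustration, not taken from a real report.

```python
# Available resources per board, copied from the table above.
BOARD_RESOURCES = {
    "PYNQ-Z1/Z2": {"BRAM_18K": 280, "DSP48E": 220, "FF": 106400, "LUT": 53200, "URAM": 0},
    "MiniZed": {"BRAM_18K": 100, "DSP48E": 66, "FF": 28800, "LUT": 14400, "URAM": 0},
}


def utilization(used, board):
    """Return percent utilization per resource; None where the board has none."""
    available = BOARD_RESOURCES[board]
    return {
        name: (round(100.0 * used.get(name, 0) / total, 1) if total else None)
        for name, total in available.items()
    }


# Hypothetical usage numbers, NOT taken from a real synthesis report.
example_usage = {"BRAM_18K": 70, "DSP48E": 11, "FF": 26600, "LUT": 39900}
print(utilization(example_usage, "PYNQ-Z1/Z2"))
```

Any value above 100 means the design does not fit on that board.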

## Generate .dat Files (Step 3)

The .dat files are used:
- during the following `csim` step
- to generate the header files for SDK
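
As an illustration of what such a file could look like, the sketch below writes one sample per line with space-separated values and reads it back. The file layout and the helper names `write_dat` and `read_dat` are assumptions made for this example; check the generated project for the exact format the testbench expects.

```python
def write_dat(path, samples, fmt="{:.6f}"):
    """Write one sample per line, values separated by single spaces."""
    with open(path, "w") as f:
        for sample in samples:
            f.write(" ".join(fmt.format(float(v)) for v in sample) + "\n")


def read_dat(path):
    """Read a file written by write_dat back into a list of float lists."""
    with open(path) as f:
        return [[float(tok) for tok in line.split()] for line in f if line.strip()]
```

The same writer would work for both the input features and the reference predictions, as long as each row is a flat sequence of numbers.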

## Run Vivado HLS csim (Step 4)

At this step, we generate simulation traces from the hls4ml model.

Run the following cell to open the Vivado HLS GUI:

**IMPORTANT** Click the `Run C Simulation` button.

This will generate simulation traces with fixed-point arithmetic.

When the simulation completes, close the Vivado HLS GUI.

## Integrate IP in a Vivado Project and Generate Bitstream (Step 5)

**TODO** Tell the user how to visualize the `Block Diagram` to get a better understanding of the IP integration with both Zynq and MicroBlaze PS.

## Configure Software in Vivado SDK and Run HW/SW on the Board (Step 6)

Create the Vivado SDK project with one of the following targets:

- `make sdk` to configure an application with register polling
- `make sdk-irq` to configure an application with interrupts (default)

You can open a serial console, for example
```
sudo minicom -D /dev/ttyUSB0
```
and see the output of the application:

![serial-console](doc/serial_console.png)