## Introduction
We're going to train a fully connected NN with QKeras on the jet tagging dataset and run it on the Pynq.

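**Note**: Vivado must be on the `$PATH` before you start. The first cell prepends `/opt/Xilinx/Vivado/2019.2/bin`; adjust that path to match your installation.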

In [None]:
import tensorflow as tf
from qkeras.utils import _add_supported_quantized_objects
import hls4ml
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
import os
os.environ['PATH'] = '/opt/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l1
from qkeras.qlayers import QDense, QActivation
from qkeras.quantizers import quantized_bits, quantized_relu
from callbacks import all_callbacks

## Dataset fetching
This is a lot like the hls4ml tutorial, so we will go through quickly.

First, we fetch the dataset from OpenML, one-hot encode the labels, make a train/test split, and standardize the features using statistics from the training split only.

We save the test dataset to files so that we can use them on the Pynq card later.

In [None]:
data = fetch_openml('hls4ml_lhc_jets_hlf')
X, y = data['data'], data['target']
le = LabelEncoder()
y = le.fit_transform(y)
y = to_categorical(y, 5)
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_val = scaler.fit_transform(X_train_val)
X_test = scaler.transform(X_test)
np.save('y_test.npy', y_test)
np.save('X_test.npy', X_test)
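To make the normalization step concrete, here is a minimal NumPy sketch of what fitting the scaler on the training split and reusing it on the test split amounts to. The arrays are synthetic stand-ins (the names `X_train_demo` and `X_test_demo` are made up for illustration), not the jet dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the 16 jet features (NOT the real dataset)
X_train_demo = rng.normal(loc=5.0, scale=3.0, size=(1000, 16))
X_test_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 16))

# "fit": per-feature mean and standard deviation from the training split only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)

# "transform": the same statistics are applied to both splits
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print("train mean (first 4 features):", X_train_scaled.mean(axis=0).round(3)[:4])
print("test std (first 4 features):  ", X_test_scaled.std(axis=0).round(3)[:4])
```

The test split ends up close to, but not exactly at, zero mean and unit variance, because it never contributes to the statistics.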

## Model training
Our favourite model with three hidden layers, using 6-bit quantizers for all weights, biases, and ReLU activations.

In [None]:
model = Sequential()
model.add(QDense(64, input_shape=(16,), name='fc1',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu1'))
model.add(QDense(32, name='fc2',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu2'))
model.add(QDense(32, name='fc3',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(QActivation(activation=quantized_relu(6), name='relu3'))
model.add(QDense(5, name='output',
                 kernel_quantizer=quantized_bits(6,0,alpha=1), bias_quantizer=quantized_bits(6,0,alpha=1),
                 kernel_initializer='lecun_uniform', kernel_regularizer=l1(0.0001)))
model.add(Activation(activation='softmax', name='softmax'))
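For intuition about what a 6-bit quantizer with 0 integer bits does to a weight, here is a NumPy sketch of signed fixed-point rounding with saturation. It is an approximation of the forward pass under the assumption of a fixed scale (`alpha=1`), not the QKeras implementation; `quantize_fixed` is a made-up helper name.

```python
import numpy as np

def quantize_fixed(x, bits=6, integer=0):
    """Round x onto a signed fixed-point grid and saturate at the range limits."""
    frac_bits = bits - 1 - integer       # one bit is spent on the sign
    scale = 2.0 ** frac_bits             # 32 steps per unit for (bits=6, integer=0)
    lo = -(2.0 ** integer)               # most negative representable value
    hi = 2.0 ** integer - 1.0 / scale    # most positive representable value
    return np.clip(np.round(np.asarray(x) * scale) / scale, lo, hi)

# Values outside [-1, 1) saturate; values inside snap to multiples of 1/32
w = np.array([-3.0, -0.51, 0.0, 0.013, 0.5, 2.0])
print(quantize_fixed(w))
```

With these settings every weight lands on a multiple of 1/32 in the range [-1, 0.96875], which is why small weights such as 0.013 collapse to zero.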

## Prune
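Because why not? We apply magnitude pruning with a constant target sparsity of 75%, starting at training step 2000 and updating the pruning mask every 100 steps.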

In [None]:
from tensorflow_model_optimization.python.core.sparsity.keras import prune, pruning_callbacks, pruning_schedule
from tensorflow_model_optimization.sparsity.keras import strip_pruning
pruning_params = {"pruning_schedule" : pruning_schedule.ConstantSparsity(0.75, begin_step=2000, frequency=100)}
model = prune.prune_low_magnitude(model, **pruning_params)
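To see what 75% sparsity by magnitude means for a single kernel, here is a NumPy sketch that zeroes the smallest-magnitude weights in one shot. This is only an illustration: `prune_by_magnitude` is a made-up helper, and the real pruning wrapper updates its masks during training according to the schedule rather than all at once.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction set to zero."""
    flat = np.abs(weights).ravel()
    n_prune = int(round(sparsity * flat.size))
    if n_prune == 0:
        return weights.copy()
    # Indices of the n_prune smallest magnitudes
    idx = np.argsort(flat, kind="stable")[:n_prune]
    mask = np.ones(flat.size, dtype=bool)
    mask[idx] = False
    return weights * mask.reshape(weights.shape)

rng = np.random.default_rng(0)
w_demo = rng.normal(size=(16, 64))   # same shape as the fc1 kernel, random values
w_pruned = prune_by_magnitude(w_demo, 0.75)
print("fraction of zeros:", np.mean(w_pruned == 0))
```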

## Train

In [None]:
train = not os.path.exists('model_1/KERAS_check_best_model.h5')
if train:
    adam = Adam(learning_rate=0.0001)
    model.compile(optimizer=adam, loss=['categorical_crossentropy'], metrics=['accuracy'])
    callbacks = all_callbacks(stop_patience = 1000,
                              lr_factor = 0.5,
                              lr_patience = 10,
                              lr_epsilon = 0.000001,
                              lr_cooldown = 2,
                              lr_minimum = 0.0000001,
                              outputDir = 'model_1')
    callbacks.callbacks.append(pruning_callbacks.UpdatePruningStep())
    model.fit(X_train_val, y_train_val, batch_size=1024,
              epochs=30, validation_split=0.25, shuffle=True,
              callbacks = callbacks.callbacks)
    # Save the model again but with the pruning 'stripped' to use the regular layer types
    model = strip_pruning(model)
    model.save('model_1/KERAS_check_best_model.h5')
else:
    from tensorflow.keras.models import load_model
    from qkeras.utils import _add_supported_quantized_objects
    co = {}
    _add_supported_quantized_objects(co)
    model = load_model('model_1/KERAS_check_best_model.h5', custom_objects=co)

## Check accuracy

In [None]:
import plotting
from sklearn.metrics import accuracy_score
y_keras = model.predict(X_test)
print("Accuracy: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_keras, axis=1))))

## Make an hls4ml configuration
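Notice we're using `Strategy: Resource` for every layer, and `ReuseFactor: 64`. The Programmable Logic (FPGA part) of the Pynq SoC is small compared to VU9P-class parts, so we reuse each multiplier many times to save resources at the cost of latency.

We also set rounding (`AP_RND`) and saturation (`AP_SAT`) on the outputs of the activation layers, which suits QKeras models.

Notice the `fpga_part='xc7z020clg400-1'` and the `'Pynq'` backend.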

In [None]:
import hls4ml
hls4ml.model.optimizer.OutputRoundingSaturationMode.layers = ['Activation']
hls4ml.model.optimizer.OutputRoundingSaturationMode.rounding_mode = 'AP_RND'
hls4ml.model.optimizer.OutputRoundingSaturationMode.saturation_mode = 'AP_SAT'
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model'] = {}
config['Model']['ReuseFactor'] = 64
config['Model']['Strategy'] = 'Resource'
config['Model']['Precision'] = 'ap_fixed<16,6>'
config['LayerName']['fc1']['ReuseFactor'] = 64
config['LayerName']['fc2']['ReuseFactor'] = 64
config['LayerName']['fc3']['ReuseFactor'] = 64
config['LayerName']['output']['ReuseFactor'] = 64
config['LayerName']['softmax']['exp_table_t'] = 'ap_fixed<18,8>'
config['LayerName']['softmax']['inv_table_t'] = 'ap_fixed<18,4>'

cfg = hls4ml.converters.create_vivado_config(fpga_part='xc7z020clg400-1')
cfg['HLSConfig'] = config
cfg['Backend'] = 'Pynq'
cfg['KerasModel'] = model
cfg['OutputDir'] = 'hls4ml_prj_gui'

print("-----------------------------------")
plotting.print_dict(cfg)
print("-----------------------------------")
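As a rough feel for what `ReuseFactor: 64` buys us, the arithmetic below estimates how many multipliers each dense layer needs if every multiplier is reused 64 times. This is a back-of-the-envelope sketch: it ignores pruning, assumes a simple ceiling division, and hls4ml may adjust the reuse factor of a layer to a value it considers valid. `multipliers_needed` is a made-up helper name.

```python
import math

# (inputs, outputs) of each dense layer in the model above
layers = {"fc1": (16, 64), "fc2": (64, 32), "fc3": (32, 32), "output": (32, 5)}
reuse_factor = 64

def multipliers_needed(n_in, n_out, reuse):
    """Estimate parallel multipliers when each one is reused `reuse` times."""
    return math.ceil(n_in * n_out / reuse)

for name, (n_in, n_out) in layers.items():
    total = n_in * n_out
    estimate = multipliers_needed(n_in, n_out, reuse_factor)
    print(f"{name}: {total} multiplications -> about {estimate} multipliers")
```

Under these assumptions the whole network needs a few dozen multipliers instead of several thousand, which is the point of the setting on a small part.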

## Convert, `predict`, synthesize

In [None]:
hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()
y_hls = hls_model.predict(X_test)
print("Accuracy: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_hls, axis=1))))
hls_model.build(csim=False,synth=True,export=True)
hls4ml.report.read_vivado_report('hls4ml_prj_gui/')

## Bitfile time
At this point we can make a bitfile with `hls4ml.templates.PynqBackend.make_bitfile(hls_model)`. For the first run through, let's use the Vivado GUI to get a better idea of what's going on. We'll run the "board designer" flow, run Synthesis and Implementation, then check the various reports.

## Bitfile time 2
To avoid any issues from missteps during the GUI flow, let's make another project and bitfile.

The only difference from the above is the output directory, and now we run `hls4ml.templates.PynqBackend.make_bitfile(hls_model)` to make the bitfile! This basically executes the `tcl` command for each of the GUI clicks we made before.

To see some Vivado output, check the terminal where you started the notebook, or run `tail -f <a log file>` in another terminal (e.g. one started from the Jupyter notebook).

In [None]:
cfg['OutputDir'] = 'hls4ml_prj'
hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()
y_hls = hls_model.predict(X_test)
np.save('y_hls.npy', y_hls)
print("Accuracy: {}".format(accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_hls, axis=1))))
hls_model.build(csim=False,synth=True,export=True)
hls4ml.report.read_vivado_report('hls4ml_prj/')
hls4ml.templates.PynqBackend.make_bitfile(hls_model)

## PYNQ time
Now we can run this on the Pynq board. Set up your Pynq board. You should have the SD card that came with the board. You'll need to connect to your laptop/PC using an ethernet cable for data and USB for power. You may need to change some network settings to be able to connect to the Jupyter notebook.
There are more details here: https://pynq.readthedocs.io/en/latest/getting_started/pynq_z2_setup.html

Copy (e.g. with `scp` or jupyter notebook upload) these files to the Pynq:
- hls4ml_prj/myproject_pynq/project_1.runs/impl_1/design_1_wrapper.bit
- hls4ml_prj/myproject_pynq/project_1.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh

For a reason that I don't understand, these files don't seem to be available immediately when Vivado exits. You may have to wait a few minutes.
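One thing to watch when copying: as far as I know, `pynq.Overlay` looks for the `.hwh` file next to the `.bit` file under the same base name, while the two files above are named differently. Assuming that, a small helper can stage both files under a shared name before copying them to the board. The helper `stage_overlay` and the name `hls4ml_demo` are made up for illustration, and the demonstration uses empty placeholder files rather than real Vivado output.

```python
import shutil
import tempfile
from pathlib import Path

def stage_overlay(bit_path, hwh_path, dest_dir, name="hls4ml_demo"):
    """Copy the bitfile and hardware handoff into dest_dir under a shared base name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    bit_out = dest / f"{name}.bit"
    hwh_out = dest / f"{name}.hwh"
    shutil.copyfile(bit_path, bit_out)
    shutil.copyfile(hwh_path, hwh_out)
    return bit_out, hwh_out

# Demonstration with empty placeholder files instead of real Vivado output
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    bit = tmp / "design_1_wrapper.bit"
    hwh = tmp / "design_1.hwh"
    bit.touch()
    hwh.touch()
    staged = stage_overlay(bit, hwh, tmp / "to_pynq")
    staged_names = sorted(p.name for p in staged)
    staged_ok = all(p.exists() for p in staged)

print(staged_names)
```

On the board, the staged pair would then be loaded with something like `Overlay('hls4ml_demo.bit')` from the `pynq` package.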