# Importing Brevitas networks into FINN with the QONNX interchange format

**Note: Previously it was possible to export the FINN-ONNX interchange format directly from Brevitas and pass it to the FINN compiler. This support is deprecated: FINN now uses the QONNX export as its front end, although internally it still uses the FINN-ONNX format.**

In this notebook we'll go through an example of how to import a Brevitas-trained QNN into FINN. The steps will be as follows:

1. Load up the trained PyTorch model
2. Call Brevitas QONNX export and visualize with Netron
3. Import into FINN and convert QONNX to FINN-ONNX

We'll use the following utility functions to print the source code of function calls (`showSrc()`) and to visualize a network using Netron (`showInNetron()`) in the Jupyter notebook:

In [1]:
import onnx
from finn.util.visualization import showSrc, showInNetron

## 1. Load up the Data in tensor format

The FINN Docker image comes with several [example Brevitas networks](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq), and we'll use the LFC-w1a1 model as the example network here. This is a binarized fully connected network trained on the MNIST dataset. Let's start by looking at what the PyTorch network definition looks like:

We can see that the network topology is constructed using a few helper functions that generate the quantized linear layers and quantized activations. The bitwidth of the layers is actually parametrized in the constructor, so let's instantiate a 1-bit weights and activations version of this network. We also have pretrained weights for this network, which we will load into the model.

We have now instantiated our trained PyTorch network. Let's try to run an example MNIST image through the network using PyTorch.

In [2]:
!pip install opencv-python

Defaulting to user installation because normal site-packages is not writeable
Collecting opencv-python
  Downloading opencv_python-4.8.1.78-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.7 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.8.1.78


In [2]:
import cv2
import numpy as np
import onnx.numpy_helper as nph

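img = cv2.imread("/home/omaribrahim/Omar/thesis/finn/notebooks/zidane.jpg")
# cv2.resize takes (width, height), so this yields a 384x640 (HxW) image
img = cv2.resize(img, (640, 384))
img = np.float32(img)
# reorder HWC -> CHW and add a batch dimension to get shape (1, 3, 384, 640);
# a plain reshape would keep the HWC memory order and scramble the pixels
img = np.ascontiguousarray(np.transpose(img, (2, 0, 1))[np.newaxis, ...])
img_tensor = nph.from_array(img)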

In [None]:
img_tensor

## 2. Visualize with Netron

Let's examine what the exported ONNX model looks like. For this, we will use the Netron visualizer:

In [2]:
model_path = "/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant.onnx"

In [3]:
showInNetron(model_path)

Serving '/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant.onnx' at http://0.0.0.0:8082


When running this notebook in the FINN Docker container, you should be able to see an interactive visualization of the imported network above, and click on individual nodes to inspect their parameters. If you look at any of the MatMul nodes, you should be able to see that the weights are all {-1, +1} values.
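
Weights restricted to {-1, +1} are what make binarized layers cheap in hardware: a dot product between two such bipolar vectors reduces to an XNOR followed by a popcount. The following is a minimal sketch of that identity in plain Python, not FINN's implementation. The helper names `to_bits` and `bipolar_dot_xnor` are made up for illustration, and the sketch assumes the activations are bipolar as well.

```python
def to_bits(values):
    """Encode a bipolar vector as an int bitmask: +1 -> bit 1, -1 -> bit 0."""
    mask = 0
    for i, v in enumerate(values):
        if v == 1:
            mask |= 1 << i
    return mask


def bipolar_dot_xnor(a, w):
    """Dot product of two {-1, +1} vectors via XNOR and popcount."""
    n = len(a)
    full = (1 << n) - 1                       # keep only the n valid bits
    xnor = ~(to_bits(a) ^ to_bits(w)) & full  # bit is 1 where the signs agree
    matches = bin(xnor).count("1")            # popcount
    # each match contributes +1, each mismatch -1
    return 2 * matches - n


a = [1, -1, -1, 1, 1, -1, 1, 1]
w = [1, 1, -1, -1, 1, -1, -1, 1]
reference = sum(x * y for x, y in zip(a, w))  # ordinary dot product
result = bipolar_dot_xnor(a, w)
print(reference, result)
```

Both computations give the same value, since `matches` positions contribute +1 and the remaining `n - matches` positions contribute -1.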

## 3. Import into FINN and call cleanup transformations

We will now import this ONNX model into FINN using the ModelWrapper, and examine some of the graph attributes from Python.

In [3]:
from qonnx.util.cleanup import cleanup

model_path_clean = "/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant_clean.onnx"
cleanup(model_path, out_file=model_path_clean)

                i.e. domain=finn to domain=qonnx.custom_op.<general|fpgadataflow|...>


In [5]:
showInNetron(model_path_clean)

Stopping http://0.0.0.0:8082
Serving '/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant_clean.onnx' at http://0.0.0.0:8082


We will now import this QONNX model into FINN using the ModelWrapper. Here we can immediately execute the model to verify correctness.

Using the `ConvertQONNXtoFINN` transformation, we can convert the model to FINN's internal FINN-ONNX representation. Notably, all Quant and BipolarQuant nodes will have disappeared, having been converted into MultiThreshold nodes.
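
To see why a quantizer can be replaced by thresholds at all, it helps to recall that a MultiThreshold node outputs, for each input value, the number of thresholds that value is greater than or equal to. The sketch below shows this for the simplest case only: an unsigned uniform quantizer with zero offset and round-half-up behaviour, which are assumptions made here for illustration. The helpers `quant_uint` and `thresholds_for` are hypothetical and do not reproduce what the FINN conversion computes for arbitrary Quant nodes.

```python
import math


def multithreshold(x, thresholds):
    """Count how many thresholds x is greater than or equal to."""
    return sum(1 for t in thresholds if x >= t)


def quant_uint(x, scale, bits):
    """Reference quantizer (assumed): round half up, then clip to [0, 2^bits - 1]."""
    q = math.floor(x / scale + 0.5)
    return max(0, min(2 ** bits - 1, q))


def thresholds_for(scale, bits):
    """Thresholds halfway between adjacent quantization levels."""
    return [(k - 0.5) * scale for k in range(1, 2 ** bits)]


scale, bits = 0.25, 2
ths = thresholds_for(scale, bits)  # three thresholds for four output levels
samples = [-1.0, 0.0, 0.1, 0.2, 0.3, 0.49, 0.7, 5.0]
pairs = [(multithreshold(x, ths), quant_uint(x, scale, bits)) for x in samples]
print(ths)
print(pairs)
```

For every sample the thresholded value and the quantized integer agree, so the activation can be expressed purely as comparisons against constants.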

And once again we can execute the model with the FINN/QONNX execution engine.

We have successfully verified that the transformed and cleaned-up FINN graph still produces the same output, and can now use this model for further processing in FINN.

## 4. Further cleanup and running the model

In [3]:

# Print information about input and output tensors
for n in model.graph.node:
    for i in n.input:
        i_shape = model.get_tensor_shape(i)
        print("input: ",i,i_shape)
    for o in n.output:
        o_shape = model.get_tensor_shape(o)
        print("output: ",o,o_shape)

In [9]:
len(model.graph.node[0].output)

1

In [2]:
from qonnx.transformation.base import Transformation
from qonnx.transformation.batchnorm_to_affine import BatchNormToAffine
from qonnx.transformation.general import (
    ConvertDivToMul,
    ConvertSubToAdd,
    GiveReadableTensorNames,
    GiveUniqueNodeNames,
)
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.remove import RemoveIdentityOps

from finn.transformation.streamline.absorb import (
    Absorb1BitMulIntoConv,
    Absorb1BitMulIntoMatMul,
    AbsorbAddIntoMultiThreshold,
    AbsorbMulIntoMultiThreshold,
    AbsorbSignBiasIntoMultiThreshold,
    FactorOutMulSignMagnitude,
)
from finn.transformation.streamline.collapse_repeated import (
    CollapseRepeatedAdd,
    CollapseRepeatedMul,
)
from finn.transformation.streamline.reorder import (
    MoveAddPastConv,
    MoveAddPastMul,
    MoveMulPastMaxPool,
    MoveScalarAddPastMatMul,
    MoveScalarLinearPastInvariants,
    MoveScalarMulPastConv,
    MoveScalarMulPastMatMul,
)
from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds
from finn.transformation.streamline.sign_to_thres import ConvertSignToThres
from qonnx.transformation.infer_shapes import InferShapes

def StreamLineCustom(model):
    model = model.transform(ConvertSubToAdd())
    model = model.transform(ConvertDivToMul())
    model = model.transform(BatchNormToAffine())
    model = model.transform(ConvertSignToThres())
    model = model.transform(MoveMulPastMaxPool())
    model.save(model_path_transformed.split(".onnx")[0]+"mul_past_max.onnx")
    model = model.transform(MoveScalarLinearPastInvariants())
    model.save(model_path_transformed.split(".onnx")[0]+"linear_past_invariants.onnx")
    model = model.transform(AbsorbSignBiasIntoMultiThreshold())
    model = model.transform(MoveAddPastMul())
    model = model.transform(MoveScalarAddPastMatMul())
    model = model.transform(MoveAddPastConv())
    model = model.transform(MoveScalarMulPastMatMul())
    model = model.transform(MoveScalarMulPastConv())
    model = model.transform(MoveAddPastMul())
    model = model.transform(CollapseRepeatedAdd())
    model = model.transform(CollapseRepeatedMul())
    model = model.transform(MoveMulPastMaxPool())
    model = model.transform(AbsorbAddIntoMultiThreshold())
    model = model.transform(FactorOutMulSignMagnitude())
    model = model.transform(AbsorbMulIntoMultiThreshold())
    model = model.transform(Absorb1BitMulIntoMatMul())
    model = model.transform(Absorb1BitMulIntoConv())
    model = model.transform(RoundAndClipThresholds())
    return model

In [7]:
model.save(model_path_transformed.split(".onnx")[0]+"linear_past_invariants.onnx")

In [23]:
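for n in model.graph.node:
    if n.op_type == "Transpose":
        # get_tensor_shape() expects a tensor name such as n.output[0];
        # passing the node itself matches no tensor, so it returns None
        print(n , model.get_tensor_shape(n))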

input: "global_in"
output: "iiyUEm"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 2
  ints: 3
  ints: 1
  type: INTS
}
 None
input: "TAfAKh"
output: "MultiThreshold_1_out0"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 3
  ints: 1
  ints: 2
  type: INTS
}
 None
input: "Mul_0_out0"
output: "scBJuI"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 2
  ints: 3
  ints: 1
  type: INTS
}
 None
input: "Mul_0_out0"
output: "Whz7cX"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 2
  ints: 3
  ints: 1
  type: INTS
}
 None
input: "4mdrQK"
output: "MultiThreshold_2_out0"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 3
  ints: 1
  ints: 2
  type: INTS
}
 None
input: "9nrTzU"
output: "MultiThreshold_3_out0"
op_type: "Transpose"
attribute {
  name: "perm"
  ints: 0
  ints: 3
  ints: 1
  ints: 2
  type: INTS
}
 None
input: "Mul_1_out0"
output: "lZhlV6"
op_type: "Transpose"
attribute {
  name: "perm"
  ints:
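
The `perm` attributes printed for these Transpose nodes take two forms, `[0, 2, 3, 1]` and `[0, 3, 1, 2]`. The sketch below works out what each does to a tensor shape, using the ONNX convention that output dimension `i` takes the size of input dimension `perm[i]`. The shape `[1, 3, 384, 640]` is the input shape used earlier in this notebook, and the helper names are hypothetical.

```python
def transpose_shape(shape, perm):
    """Shape after an ONNX-style Transpose: out[i] = shape[perm[i]]."""
    return [shape[p] for p in perm]


def inverse_perm(perm):
    """Permutation that undoes perm."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


nchw = [1, 3, 384, 640]   # batch, channels, height, width
to_nhwc = [0, 2, 3, 1]    # first perm seen in the output
to_nchw = [0, 3, 1, 2]    # second perm seen in the output

nhwc = transpose_shape(nchw, to_nhwc)
round_trip = transpose_shape(nhwc, to_nchw)
print(nhwc, round_trip, inverse_perm(to_nhwc))
```

The two permutations are inverses of each other: the first moves channels last (NCHW to NHWC) and the second moves them back.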

In [22]:
import onnx

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
from qonnx.transformation.fold_constants import FoldConstants
from qonnx.transformation.general import (
    ApplyConfig,
    GiveReadableTensorNames,
    GiveUniqueNodeNames,
    RemoveStaticGraphInputs,
    RemoveUnusedTensors,
)
from qonnx.transformation.infer_data_layouts import InferDataLayouts
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.lower_convs_to_matmul import LowerConvsToMatMul
from qonnx.transformation.qcdq_to_qonnx import QCDQToQuant
from qonnx.util.cleanup import cleanup

import finn.transformation.streamline.absorb as absorb
from finn.transformation.qonnx.convert_qonnx_to_finn import ConvertQONNXtoFINN
from finn.transformation.streamline import Streamline
from finn.transformation.streamline.reorder import MakeMaxPoolNHWC
from finn.util.visualization import showSrc, showInNetron

model_path = "/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant.onnx"

model_path_clean = "/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant_clean.onnx"
cleanup(model_path, out_file=model_path_clean)

model = ModelWrapper(model_path_clean)
model_path_transformed = "/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant_transformed.onnx"

# model = model.transform(QCDQToQuant())
#model = model.transform(ConvertQONNXtoFINN())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(RemoveStaticGraphInputs())
model = model.transform(InferDataTypes())
model = model.transform(InferShapes())
model = model.transform(FoldConstants())

model.save(model_path_transformed.split(".onnx")[0]+"before.onnx")
model = model.transform(absorb.AbsorbSignBiasIntoMultiThreshold())
model.save(model_path_transformed.split(".onnx")[0]+"absorb_sign_thres.onnx")
model = model.transform(Streamline())
model.save(model_path_transformed.split(".onnx")[0]+"streamline_1.onnx")

if True:
    model = model.transform(LowerConvsToMatMul())
    model.save(model_path_transformed.split(".onnx")[0]+"lower_conv.onnx")
    model = model.transform(MakeMaxPoolNHWC())
    model.save(model_path_transformed.split(".onnx")[0]+"max_pool_1.onnx")
    model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
    model.save(model_path_transformed.split(".onnx")[0]+"absorb_transpose_thres.onnx")
    model = model.transform(MakeMaxPoolNHWC())
    model.save(model_path_transformed.split(".onnx")[0]+"max_pool_2.onnx")
    model = model.transform(absorb.AbsorbConsecutiveTransposes())
    model.save(model_path_transformed.split(".onnx")[0]+"absorb_transpose_consec.onnx")
    
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model.save(model_path_transformed.split(".onnx")[0]+"convert_bipolar_matmul_xnor.onnx")
model = StreamLineCustom(model)

# absorb final add-mul nodes into TopK
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
model.save(model_path_transformed)

InferenceError: [ShapeInferenceError] (op_type:Transpose): [ShapeInferenceError] Inferred shape and existing shape differ in dimension 1: (160) vs (96)

In [21]:
showInNetron(model_path_transformed.split(".onnx")[0]+"before.onnx")

Stopping http://0.0.0.0:8082
Serving '/home/omaribrahim/Omar/thesis/finn/notebooks/yolov5s_quant_transformedbefore.onnx' at http://0.0.0.0:8082


In [145]:
model.graph.node[3]

input: "/model.0/conv/export_handler/Add_output_0"
input: "/model.0/act/export_handler/Constant_output_0"
output: "/model.0/act/export_handler/MultiThreshold_output_0"
name: "/model.0/act/export_handler/MultiThreshold"
op_type: "MultiThreshold"
attribute {
  name: "out_dtype"
  s: "UINT8"
  type: STRING
}
domain: "qonnx.custom_op.general"

In [189]:
model.graph.node[0].input[0]

'inp.1'

In [163]:
print(model.get_tensor_shape(model.graph.output[0].name))

[0, 0, 0]


In [None]:
for idx in range(len(model.graph.node)):
    print(model.get_tensor_datatype(model.graph.node[idx].input[0]))

In [None]:
for n in model.graph.node:
    for i in n.input:
        i_shape = model.get_tensor_shape(i)
        print("input: ",i,i_shape)
    for o in n.output:
        o_shape = model.get_tensor_shape(o)
        print("output: ",o,o_shape)


In [60]:
import finn.core.onnx_exec as oxe
model = ModelWrapper(model_path_transformed)
input_dict = {"global_in": nph.to_array(img_tensor)}
output_dict = oxe.execute_onnx(model, input_dict)
produced_finn = output_dict[list(output_dict.keys())[0]]

produced_finn[0]

array([[ 3.62163496e+00,  3.21588707e+00,  7.71233654e+00, ...,
         8.37445259e-06,  9.53376293e-05,  1.89006329e-04],
       [ 1.15097218e+01,  1.08520880e+01,  2.03394566e+01, ...,
         1.36196613e-05,  5.64157963e-05,  6.37471676e-05],
       [ 2.29630623e+01, -4.54718113e-01,  2.08706188e+01, ...,
         1.52289867e-05,  4.77880239e-04,  7.60197639e-04],
       ...,
       [ 5.60907227e+02,  3.58839844e+02,  2.42322083e+02, ...,
         1.63346529e-04,  4.01556492e-04,  2.67982483e-04],
       [ 6.00256470e+02,  3.56950806e+02,  1.41924469e+02, ...,
         1.26272440e-04,  2.22861767e-04,  1.05172396e-04],
       [ 6.10311401e+02,  3.60233521e+02,  1.45730011e+02, ...,
         6.19769096e-04,  7.67916441e-04,  5.12659550e-04]], dtype=float32)

In [74]:
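import logging
import time

import torch
import torchvision
from torchvision.ops import box_iou  # used by the optional merge-NMS branch

LOGGER = logging.getLogger(__name__)  # used for the NMS time-limit warning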

def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y

def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=300,
        nm=0,  # number of masks
):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections

    Returns:
         list of detections, one (n,6) tensor per image [xyxy, conf, cls]
    """

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation mode, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[2] - nm - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.5 + 0.05 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    mi = 5 + nc  # mask start index
    output = [torch.zeros((0, 6 + nm))] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5))
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box/Mask
        box = xywh2xyxy(x[:, :4])  # (center_x, center_y, width, height) to (x1, y1, x2, y2)
        mask = x[:, mi:]  # zero columns if no masks

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = torch.max(x[:, 5:mi],1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            LOGGER.warning(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')
            break  # time limit exceeded

    return output

In [75]:
non_max_suppression(torch.from_numpy(produced_finn))

[tensor([[3.2080e+02, 1.0635e+02, 3.5202e+02, 1.9971e+02, 9.9185e-01, 6.0000e+01],
         [3.4261e+02, 1.0078e+02, 4.0714e+02, 1.5949e+02, 9.9144e-01, 6.0000e+01],
         [8.6478e+00, 2.0767e+02, 6.4876e+01, 2.1052e+02, 9.9133e-01, 4.7000e+01],
         ...,
         [2.9370e+02, 1.1101e+02, 4.0217e+02, 1.4398e+02, 3.8496e-01, 6.0000e+01],
         [1.6854e+02, 1.2156e+02, 1.8602e+02, 1.2414e+02, 3.8415e-01, 9.0000e+00],
         [3.3624e+01, 2.0696e+02, 5.7407e+01, 2.6348e+02, 3.8405e-01, 6.0000e+01]])]
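
For readers unfamiliar with the suppression step, the following is a minimal pure-Python sketch of greedy NMS and of the class-offset trick that `non_max_suppression` applies before calling `torchvision.ops.nms`. It is an illustration under simplifying assumptions, not the torchvision implementation. The helper names and the three example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres=0.45):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


def offset_by_class(boxes, classes, max_wh=7680):
    """Shift each box by class * max_wh so different classes never overlap."""
    return [[v + c * max_wh for v in box] for box, c in zip(boxes, classes)]


boxes = [
    [10.0, 10.0, 50.0, 50.0],      # detection 0
    [12.0, 12.0, 52.0, 52.0],      # detection 1, overlaps detection 0 heavily
    [100.0, 100.0, 140.0, 140.0],  # detection 2, far from the others
]
scores = [0.9, 0.8, 0.7]
classes = [0, 1, 0]

agnostic_keep = greedy_nms(boxes, scores)
per_class_keep = greedy_nms(offset_by_class(boxes, classes), scores)
print(agnostic_keep, per_class_keep)
```

Without the offset, detection 1 is suppressed by detection 0. With the offset, it survives because it belongs to a different class, which corresponds to the `agnostic=False` behaviour of the function above.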