# FPGA usage

In this notebook we use the neural network implemented in the programmable logic (PL) of the PYNQ board to run inference. The notebook is based on this [tutorial](https://pynq.readthedocs.io/en/v2.3/overlay_design_methodology/overlay_tutorial.html), [this YouTube video](https://www.youtube.com/watch?v=Dupyek4NUoI) and [this post from the PYNQ forum](https://discuss.pynq.io/t/how-to-use-ap-fixed-data-type-to-communicate-with-the-ip-made-by-the-vivado-hls/679).

First, we load the overlay (both a `.bit` file and a `.hwh` file must be present).

In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [2]:
from pynq import Overlay
overlay = Overlay('/home/xilinx/pynq/overlays/MeatNet102/MeatNet102.bit')
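
Since the overlay needs both files, it can help to check for them before calling `Overlay`. The following is a minimal sketch, assuming the `.hwh` file sits next to the `.bit` file and shares its base name; `check_overlay_files` is a hypothetical helper, not part of PYNQ.

```python
from pathlib import Path

def check_overlay_files(bit_path):
    """Return the paths of the required overlay files that are missing.

    Hypothetical helper: assumes the .hwh file sits next to the .bit file
    and shares its base name.
    """
    bit = Path(bit_path)
    required = [bit, bit.with_suffix('.hwh')]
    return [str(p) for p in required if not p.is_file()]

missing = check_overlay_files('/home/xilinx/pynq/overlays/MeatNet102/MeatNet102.bit')
if missing:
    print('Missing overlay files:', missing)
```

An empty list means both files were found.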

If we open this file in the HLS project

`.../[HLSproject]/[solutionX]/impl/misc/drivers/[ip_name]/src/x[ip_name]_hw.h`

we can find the list of MMIO ports that can be used to communicate with the neural network. In our case it contains the following header:

```
// ==============================================================
// Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC v2019.2 (64-bit)
// Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.
// ==============================================================
// AXILiteS
// 0x020 ~
// 0x03f : Memory 'in_V' (11 * 10b)
//         Word n : bit [ 9: 0] - in_V[2n]
//                  bit [25:16] - in_V[2n+1]
//                  others      - reserved
// 0x040 ~
// 0x07f : Memory 'w01_0_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_0_V[2n]
//                  bit [25:16] - w01_0_V[2n+1]
//                  others      - reserved
// 0x080 ~
// 0x0bf : Memory 'w01_1_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_1_V[2n]
//                  bit [25:16] - w01_1_V[2n+1]
//                  others      - reserved
// 0x0c0 ~
// 0x0ff : Memory 'w01_2_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_2_V[2n]
//                  bit [25:16] - w01_2_V[2n+1]
//                  others      - reserved
// 0x100 ~
// 0x13f : Memory 'w01_3_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_3_V[2n]
//                  bit [25:16] - w01_3_V[2n+1]
//                  others      - reserved
// 0x140 ~
// 0x17f : Memory 'w01_4_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_4_V[2n]
//                  bit [25:16] - w01_4_V[2n+1]
//                  others      - reserved
// 0x180 ~
// 0x1bf : Memory 'w01_5_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_5_V[2n]
//                  bit [25:16] - w01_5_V[2n+1]
//                  others      - reserved
// 0x1c0 ~
// 0x1ff : Memory 'w01_6_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_6_V[2n]
//                  bit [25:16] - w01_6_V[2n+1]
//                  others      - reserved
// 0x200 ~
// 0x23f : Memory 'w01_7_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_7_V[2n]
//                  bit [25:16] - w01_7_V[2n+1]
//                  others      - reserved
// 0x240 ~
// 0x27f : Memory 'w01_8_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_8_V[2n]
//                  bit [25:16] - w01_8_V[2n+1]
//                  others      - reserved
// 0x280 ~
// 0x2bf : Memory 'w01_9_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_9_V[2n]
//                  bit [25:16] - w01_9_V[2n+1]
//                  others      - reserved
// 0x2c0 ~
// 0x2ff : Memory 'w01_10_V' (32 * 10b)
//         Word n : bit [ 9: 0] - w01_10_V[2n]
//                  bit [25:16] - w01_10_V[2n+1]
//                  others      - reserved
// 0x300 ~
// 0x33f : Memory 'b01_V' (32 * 10b)
//         Word n : bit [ 9: 0] - b01_V[2n]
//                  bit [25:16] - b01_V[2n+1]
//                  others      - reserved
// 0x340 ~
// 0x35f : Memory 'w12_0_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_0_V[2n]
//                  bit [25:16] - w12_0_V[2n+1]
//                  others      - reserved
// 0x360 ~
// 0x37f : Memory 'w12_1_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_1_V[2n]
//                  bit [25:16] - w12_1_V[2n+1]
//                  others      - reserved
// 0x380 ~
// 0x39f : Memory 'w12_2_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_2_V[2n]
//                  bit [25:16] - w12_2_V[2n+1]
//                  others      - reserved
// 0x3a0 ~
// 0x3bf : Memory 'w12_3_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_3_V[2n]
//                  bit [25:16] - w12_3_V[2n+1]
//                  others      - reserved
// 0x3c0 ~
// 0x3df : Memory 'w12_4_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_4_V[2n]
//                  bit [25:16] - w12_4_V[2n+1]
//                  others      - reserved
// 0x3e0 ~
// 0x3ff : Memory 'w12_5_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_5_V[2n]
//                  bit [25:16] - w12_5_V[2n+1]
//                  others      - reserved
// 0x400 ~
// 0x41f : Memory 'w12_6_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_6_V[2n]
//                  bit [25:16] - w12_6_V[2n+1]
//                  others      - reserved
// 0x420 ~
// 0x43f : Memory 'w12_7_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_7_V[2n]
//                  bit [25:16] - w12_7_V[2n+1]
//                  others      - reserved
// 0x440 ~
// 0x45f : Memory 'w12_8_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_8_V[2n]
//                  bit [25:16] - w12_8_V[2n+1]
//                  others      - reserved
// 0x460 ~
// 0x47f : Memory 'w12_9_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_9_V[2n]
//                  bit [25:16] - w12_9_V[2n+1]
//                  others      - reserved
// 0x480 ~
// 0x49f : Memory 'w12_10_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_10_V[2n]
//                  bit [25:16] - w12_10_V[2n+1]
//                  others      - reserved
// 0x4a0 ~
// 0x4bf : Memory 'w12_11_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_11_V[2n]
//                  bit [25:16] - w12_11_V[2n+1]
//                  others      - reserved
// 0x4c0 ~
// 0x4df : Memory 'w12_12_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_12_V[2n]
//                  bit [25:16] - w12_12_V[2n+1]
//                  others      - reserved
// 0x4e0 ~
// 0x4ff : Memory 'w12_13_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_13_V[2n]
//                  bit [25:16] - w12_13_V[2n+1]
//                  others      - reserved
// 0x500 ~
// 0x51f : Memory 'w12_14_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_14_V[2n]
//                  bit [25:16] - w12_14_V[2n+1]
//                  others      - reserved
// 0x520 ~
// 0x53f : Memory 'w12_15_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_15_V[2n]
//                  bit [25:16] - w12_15_V[2n+1]
//                  others      - reserved
// 0x540 ~
// 0x55f : Memory 'w12_16_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_16_V[2n]
//                  bit [25:16] - w12_16_V[2n+1]
//                  others      - reserved
// 0x560 ~
// 0x57f : Memory 'w12_17_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_17_V[2n]
//                  bit [25:16] - w12_17_V[2n+1]
//                  others      - reserved
// 0x580 ~
// 0x59f : Memory 'w12_18_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_18_V[2n]
//                  bit [25:16] - w12_18_V[2n+1]
//                  others      - reserved
// 0x5a0 ~
// 0x5bf : Memory 'w12_19_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_19_V[2n]
//                  bit [25:16] - w12_19_V[2n+1]
//                  others      - reserved
// 0x5c0 ~
// 0x5df : Memory 'w12_20_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_20_V[2n]
//                  bit [25:16] - w12_20_V[2n+1]
//                  others      - reserved
// 0x5e0 ~
// 0x5ff : Memory 'w12_21_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_21_V[2n]
//                  bit [25:16] - w12_21_V[2n+1]
//                  others      - reserved
// 0x600 ~
// 0x61f : Memory 'w12_22_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_22_V[2n]
//                  bit [25:16] - w12_22_V[2n+1]
//                  others      - reserved
// 0x620 ~
// 0x63f : Memory 'w12_23_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_23_V[2n]
//                  bit [25:16] - w12_23_V[2n+1]
//                  others      - reserved
// 0x640 ~
// 0x65f : Memory 'w12_24_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_24_V[2n]
//                  bit [25:16] - w12_24_V[2n+1]
//                  others      - reserved
// 0x660 ~
// 0x67f : Memory 'w12_25_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_25_V[2n]
//                  bit [25:16] - w12_25_V[2n+1]
//                  others      - reserved
// 0x680 ~
// 0x69f : Memory 'w12_26_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_26_V[2n]
//                  bit [25:16] - w12_26_V[2n+1]
//                  others      - reserved
// 0x6a0 ~
// 0x6bf : Memory 'w12_27_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_27_V[2n]
//                  bit [25:16] - w12_27_V[2n+1]
//                  others      - reserved
// 0x6c0 ~
// 0x6df : Memory 'w12_28_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_28_V[2n]
//                  bit [25:16] - w12_28_V[2n+1]
//                  others      - reserved
// 0x6e0 ~
// 0x6ff : Memory 'w12_29_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_29_V[2n]
//                  bit [25:16] - w12_29_V[2n+1]
//                  others      - reserved
// 0x700 ~
// 0x71f : Memory 'w12_30_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_30_V[2n]
//                  bit [25:16] - w12_30_V[2n+1]
//                  others      - reserved
// 0x720 ~
// 0x73f : Memory 'w12_31_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w12_31_V[2n]
//                  bit [25:16] - w12_31_V[2n+1]
//                  others      - reserved
// 0x740 ~
// 0x75f : Memory 'b12_V' (12 * 10b)
//         Word n : bit [ 9: 0] - b12_V[2n]
//                  bit [25:16] - b12_V[2n+1]
//                  others      - reserved
// 0x760 ~
// 0x77f : Memory 'w23_V' (12 * 10b)
//         Word n : bit [ 9: 0] - w23_V[2n]
//                  bit [25:16] - w23_V[2n+1]
//                  others      - reserved
// 0x780 ~
// 0x787 : Memory 'b23_V' (1 * 10b)
//         Word n : bit [ 9: 0] - b23_V[2n]
//                  bit [25:16] - b23_V[2n+1]
//                  others      - reserved
// 0x788 ~
// 0x78f : Memory 'out_V' (1 * 10b)
//         Word n : bit [ 9: 0] - out_V[2n]
//                  bit [25:16] - out_V[2n+1]
//                  others      - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
```
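
The offsets in this header can also be extracted programmatically instead of being copied by hand. The sketch below parses the comment format shown above with a regular expression; `MEMORY_RE` and `parse_address_map` are hypothetical names, and the pattern assumes every memory is described by exactly the two-line `0x... ~` / `0x... : Memory '...' (N * Wb)` form used in this header.

```python
import re

# Matches the two comment lines that describe one memory, e.g.
#   // 0x020 ~
#   // 0x03f : Memory 'in_V' (11 * 10b)
MEMORY_RE = re.compile(
    r"//\s*(0x[0-9a-fA-F]+)\s*~\s*\n"
    r"//\s*(0x[0-9a-fA-F]+)\s*:\s*Memory\s+'(\w+)'\s*\((\d+)\s*\*\s*(\d+)b\)"
)

def parse_address_map(header_text):
    """Map each memory name to (start offset, end offset, depth, bit width)."""
    return {
        name: (int(start, 16), int(end, 16), int(depth), int(width))
        for start, end, name, depth, width in MEMORY_RE.findall(header_text)
    }

# Short excerpt of the header above, used only to exercise the parser.
sample = """\
// 0x020 ~
// 0x03f : Memory 'in_V' (11 * 10b)
//         Word n : bit [ 9: 0] - in_V[2n]
// 0x788 ~
// 0x78f : Memory 'out_V' (1 * 10b)
"""
address_map = parse_address_map(sample)
print(hex(address_map['in_V'][0]), hex(address_map['out_V'][0]))  # 0x20 0x788
```

On the full header, the same call would return one entry per memory, which could replace the hard-coded offsets in a driver.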

So, we are going to use a driver similar to the one developed [here](https://github.com/eneriz-daniel/sensorialfusionQNNs/blob/master/Usage.ipynb) to automatically load the model parameters and infer results. We only have to use the ports shown in the `x[ip_name]_hw.h` header.

## The custom driver

A class based on the `DefaultIP` driver is created to contain everything needed to read from and write to the MMIO ports.

In [27]:
from pynq import DefaultIP
from bitstring import Bits
from math import ceil

class MeatNet(DefaultIP):
    def __init__(self,description):
        super().__init__(description=description)
    
    def set_params(self, path_pesos):
        
        """
        Loads the model parameters. The path to a .npz file must be passed,
        containing the arrays 'w01', 'b01', 'w12', 'b12', 'w23' and 'b23'.
        """
        # Open the archive once instead of once per array
        params = np.load(path_pesos)
        w01 = params['w01']
        b01 = params['b01']
        w12 = params['w12']
        b12 = params['b12']
        w23 = params['w23']
        b23 = params['b23']
        
        
        # Column k of w01 goes to memory w01_k_V: 0x40 bytes each, starting at 0x0040
        for k in range(11):
            self.write_array(w01[:,k], 0x0040 + 0x40*k)
        
        self.write_array(b01, 0x0300)
         
        # Column k of w12 goes to memory w12_k_V: 0x20 bytes each, starting at 0x0340
        for k in range(32):
            self.write_array(w12[:,k], 0x0340 + 0x20*k)
        
        self.write_array(b12, 0x0740)
        
        self.write_array(w23.flatten('A'), 0x0760)  
        
        self.write_array(b23, 0x0780)
            
    def pred(self, input_data):

        """
        Infers a result from the passed input, which must be an array with
        the normalized sensor readings.
        """
        
        self.write_array(input_data, 0x0020)
        
        # The result sits in the 10 least significant bits of the word at 0x0788
        out = Bits(int=self.read(0x0788), length=32)
        # Read them as a signed integer and undo the 2**8 scaling
        return (out[22:]).int/(2**8)
    
    def write_array(self, inputArray, offset):
        """
        Takes the passed array and the initial offset and writes the data of
        the array starting from that address, using the following format:
           Memory 'in_V' (11 * 10b)
           Word n : bit [ 9: 0] - in_V[2n]
                    bit [25:16] - in_V[2n+1]
                    others      - reserved
        """
        for i in range(0, len(inputArray), 2): # Every two parameters
            
            # Scale by 2**8 and round to get the integer stored in each 10-bit field
            low = int(round(inputArray[i]*(2**8)))
            # An odd-length array leaves the upper field of the last word at zero
            high = int(round(inputArray[i+1]*(2**8))) if i+1 < len(inputArray) else 0
            
            # The 32-bit integer for each address is built following the _hw.h layout
            try:
                word = ('0b000000'+Bits(int=high, length=10)+
                        '0b000000'+Bits(int=low, length=10)).int
            except Exception:
                # Bits raises if a value does not fit in 10 signed bits
                print(hex(offset), low, high)
                raise
            
            self.write(offset, word)
            offset+=4
        
 
        
    
    bindto = [overlay.ip_dict['MeatNet_0']['type']] #This must be 'xilinx.com:hls:MeatNet:1.0'
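
The packing done by `write_array` and the decoding done by `pred` can be checked off-board with plain integer arithmetic. The sketch below assumes what the driver code implies: 10-bit two's-complement values with 8 fractional bits, two per 32-bit word. The helper names are hypothetical and the code deliberately avoids `bitstring`, so it is not the driver's implementation.

```python
FRAC_BITS = 8    # scale factor 2**8, as used in write_array and pred
WIDTH = 10       # bits per value, as listed in the _hw.h header
MASK = (1 << WIDTH) - 1

def to_fixed(value):
    """Quantize a float to a 10-bit two's-complement field."""
    raw = int(round(value * (1 << FRAC_BITS)))
    if not -(1 << (WIDTH - 1)) <= raw < (1 << (WIDTH - 1)):
        raise ValueError('{} does not fit in {} bits'.format(value, WIDTH))
    return raw & MASK

def from_fixed(field):
    """Decode a 10-bit two's-complement field back to a float."""
    raw = field - (1 << WIDTH) if field & (1 << (WIDTH - 1)) else field
    return raw / (1 << FRAC_BITS)

def pack_word(even, odd=0.0):
    """Place element 2n in bits [9:0] and element 2n+1 in bits [25:16]."""
    return (to_fixed(odd) << 16) | to_fixed(even)

word = pack_word(0.5, -0.25)
print(hex(word))                                                 # 0x3c00080
print(from_fixed(word & MASK), from_fixed((word >> 16) & MASK))  # 0.5 -0.25
```

With this format the representable range is [-2, 2) in steps of 1/256, so parameters outside that range raise an error instead of wrapping silently.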

In [28]:
overlay = Overlay('/home/xilinx/pynq/overlays/MeatNet102/MeatNet102.bit')

In [30]:
model = overlay.MeatNet_0

The parameters are loaded:

In [31]:
model.set_params('./FPGA/Brisket/params-quant.npz')

The data is loaded:

In [32]:
t, rTVC, pyTVC = np.loadtxt('./FPGA/Brisket/test.txt', skiprows=1, usecols=(0,1,2), unpack=True)
sens = np.loadtxt('./FPGA/Brisket/test.txt', skiprows=1, usecols=(3,4,5,6,7,8,9,10,11,12,13))
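
Both calls above read the same file. As a sketch, the file can also be read once and split by column, assuming the layout implied by the `usecols` arguments (`t`, real TVC, PyTorch prediction, then 11 sensor readings). `split_columns` is a hypothetical helper, and the in-memory sample below is made up only to show the resulting shapes; it is not real data.

```python
import io
import numpy as np

def split_columns(data):
    """Split a (rows, 14) table into t, real TVC, PyTorch TVC and sensors."""
    data = np.atleast_2d(data)
    return data[:, 0], data[:, 1], data[:, 2], data[:, 3:14]

# Made-up two-row sample with a header line, only to show the shapes.
sample = io.StringIO(
    "t TVC pyTVC " + " ".join("s{}".format(k) for k in range(11)) + "\n"
    + " ".join(["0.0"] * 14) + "\n"
    + " ".join(["1.0"] * 14) + "\n"
)
t, rTVC, pyTVC, sens = split_columns(np.loadtxt(sample, skiprows=1))
print(t.shape, sens.shape)  # (2,) (2, 11)
```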

Let's try a single inference:

In [33]:
elto=230

In [37]:
print('FPGA:    {}\nReal:    {}\nPyTorch: {}'.format(model.pred(sens[elto,:]), rTVC[elto], pyTVC[elto]))
elto+=1

FPGA:    0.83984375
Real:    0.8596818286105865
PyTorch: 0.8545974493026733
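
The FPGA value is an exact multiple of 1/256, the resolution implied by the 8 fractional bits used in the driver. A short check of the numbers printed above, expressing the differences in units of that resolution (LSB):

```python
LSB = 2 ** -8                      # resolution with 8 fractional bits

fpga, real, pytorch = 0.83984375, 0.8596818286105865, 0.8545974493026733

raw = fpga / LSB                   # integer code behind the FPGA output
print(raw)                         # 215.0
print('FPGA vs PyTorch: {:.4f} ({:.1f} LSB)'.format(pytorch - fpga, (pytorch - fpga) / LSB))
print('FPGA vs real:    {:.4f} ({:.1f} LSB)'.format(real - fpga, (real - fpga) / LSB))
```

The gap to the PyTorch prediction is a few LSB, which is larger than the rounding of the output alone would explain.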


It works: the FPGA output is close to both the real value and the PyTorch prediction. Let's test the whole dataset.

In [40]:
overlay = Overlay('/home/xilinx/pynq/overlays/MeatNet102/MeatNet102.bit')
modelo = overlay.MeatNet_0

cut_names = ["Inside-Outside", "Round", "Top Sirloin", "Tenderloin", "Flap meat", "Striploin", "Rib eye", "Skirt meat", "Brisket", "Clod Chuck", "Shin", "Fat"]

for cut in cut_names:
    # Columns: t, real TVC, PyTorch prediction, then the 11 sensor readings
    t, TVC, PypredTVC = np.loadtxt('./FPGA/{}/test.txt'.format(cut), skiprows=1, usecols=(0,1,2), unpack=True)
    sens = np.loadtxt('./FPGA/{}/test.txt'.format(cut), skiprows=1, usecols=(3,4,5,6,7,8,9,10,11,12,13))
    
    # Each cut has its own set of quantized parameters
    modelo.set_params('./FPGA/{}/params-quant.npz'.format(cut))
    
    TVCFPGA = np.empty_like(TVC)
    for j in range(len(t)):
        TVCFPGA[j] = modelo.pred(sens[j, :])
    
    np.savez('./FPGA/{}/FPGA-results.npz'.format(cut), t=t, TVC=TVC, PypredTVC=PypredTVC, TVCFPGA=TVCFPGA)
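
Once the results are saved, they can be compared per cut. The sketch below computes the root-mean-square error of the FPGA output against the real values and against the PyTorch predictions, using the keys written by `np.savez` above; `summarize` is a hypothetical helper and RMSE is just one possible metric.

```python
import numpy as np

def summarize(path):
    """Return the RMSE of the FPGA output against the real and PyTorch values."""
    with np.load(path) as res:
        fpga = res['TVCFPGA']
        rmse_real = float(np.sqrt(np.mean((fpga - res['TVC']) ** 2)))
        rmse_torch = float(np.sqrt(np.mean((fpga - res['PypredTVC']) ** 2)))
    return rmse_real, rmse_torch

# Usage on the board, after the loop above has run:
# print(summarize('./FPGA/Brisket/FPGA-results.npz'))
```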