# MIBCI-QCNNs: Usage

This notebook contains the code to test the EEGNet-based model implemented in the Red Pitaya's programmable logic (PL).

In this notebook only one fold is validated. The fold's parameters and its validation dataset must be provided, so the following tree is expected to be in the same directory as this notebook:
```
global_model/
├── npyparams/
│   ├── conv2d_w.npy
│   ├── dense_b.npy
│   ├── dense_w.npy
│   ├── depthconv2d_w.npy
│   ├── sepdepthconv2d_w.npy
│   └── seppointconv2d_w.npy
└── validationDS/
    ├── X_samples/
    │   ├── X_0.npy
    │   ├── X_1.npy
    │   ├── ·······
    │   └── X_3527.npy
    ├── y_hls_16_8.txt
    ├── y_pred.npy
    └── y_true.npy

```
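Before running the rest of the notebook, it can help to confirm that the tree above is in place. The following is a minimal sketch, not part of the original notebook: `missing_paths` is a hypothetical helper, and it only checks the first sample file (`X_0.npy`) rather than all of them.

```python
from pathlib import Path

# Parameter files listed in the tree above (without the .npy extension).
PARAM_FILES = ['conv2d_w', 'dense_b', 'dense_w', 'depthconv2d_w',
               'sepdepthconv2d_w', 'seppointconv2d_w']
# Label files listed in the tree above.
LABEL_FILES = ['y_hls_16_8.txt', 'y_pred.npy', 'y_true.npy']


def missing_paths(root: str = 'global_model') -> list:
    """Hypothetical helper: return the expected paths that do not exist under root."""
    base = Path(root)
    expected = [base / 'npyparams' / '{}.npy'.format(name) for name in PARAM_FILES]
    expected += [base / 'validationDS' / name for name in LABEL_FILES]
    # Only the first sample is checked; the remaining X_<i>.npy files are assumed to follow.
    expected.append(base / 'validationDS' / 'X_samples' / 'X_0.npy')
    return [str(path) for path in expected if not path.exists()]
```

Calling `missing_paths()` from the notebook's directory returns an empty list when every checked file is present.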

Make sure the following dependencies are installed for the ARM Cortex-A9 architecture (`armv7l`).

In [1]:
import mmap
import os
import struct
from numpy import clip
import numpy as np
from tabulate import tabulate
import progressbar
import time

Because of an error when using the `accuracy_score` function of `scikit-learn`, a custom implementation based on `numpy` is provided here. It is used to compute the validation accuracy.

In [2]:
def accuracy_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    
    if len(y_true.shape) != 1 or len(y_pred.shape) != 1:
        raise ValueError('Both y_true and y_pred must be 1-dimensional.')
        
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must be equally sized.')
    
    return (y_true == y_pred).sum()/len(y_true)
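
As a quick sanity check, the function can be exercised on a few made-up labels. The sketch below repeats the same logic so that it runs on its own; the toy arrays are invented for illustration. It also shows that integer labels compare correctly against predictions stored in a float array, which is how `y_fpga` is stored later in this notebook.

```python
import numpy as np


def accuracy_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Same logic as the cell above, repeated so this block is self-contained."""
    if len(y_true.shape) != 1 or len(y_pred.shape) != 1:
        raise ValueError('Both y_true and y_pred must be 1-dimensional.')
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must be equally sized.')
    return (y_true == y_pred).sum()/len(y_true)


# Made-up labels for illustration only (four classes, 0 to 3).
y_true_toy = np.array([0, 1, 2, 3, 1])
# Float labels, like np.argmax results stored in an array created with np.empty.
y_pred_toy = np.array([0.0, 1.0, 2.0, 0.0, 1.0])

print(accuracy_score(y_true_toy, y_pred_toy))  # 4 of 5 labels match -> 0.8
```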

The inputs and outputs of the model can be accessed through the AXI-reserved memory registers, which start at address `0x40000000`. To access these registers from Python, the `/dev/mem` device file can be memory-mapped.

In the next cell a driver class named `overlay` is defined. Its `__init__()` method loads the bitstream and opens `/dev/mem` with an offset of `0x40000000`, the same offset that applies to the addresses listed in the `x<name-of-the-HLS-project>_hw.h` file inside the `<name-of-the-HLS-project>/solutionX/impl/ip/drivers/<top-func-name>_vX_0/src/` directory of the HLS project. The remaining methods read and write a single 16-bit fixed-point value, and generalize those operations to arrays (N-dimensional arrays are flattened before being written).
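The fixed-point conversion can be tried without any hardware. The sketch below mirrors the arithmetic used by the driver's read and write methods, but packs into plain `bytes` instead of the memory map; `encode_fp16` and `decode_fp16` are hypothetical names introduced here for illustration.

```python
import struct


def encode_fp16(value: float, bits_int: int = 8) -> bytes:
    """Pack a real number as a little-endian signed 16-bit fixed-point value."""
    scaled = round(value * 2**(16 - bits_int))
    # Saturate to the signed 16-bit range instead of wrapping around.
    saturated = max(-2**15, min(2**15 - 1, scaled))
    return struct.pack('<h', saturated)


def decode_fp16(raw: bytes, bits_int: int = 8) -> float:
    """Unpack a little-endian signed 16-bit fixed-point value into a float."""
    return struct.unpack('<h', raw)[0] * 2**-(16 - bits_int)


print(encode_fp16(0.5))                  # b'\x80\x00' (0.5 * 256 = 128)
print(decode_fp16(encode_fp16(-1.25)))   # -1.25, exactly representable
print(decode_fp16(encode_fp16(0.6)))     # 0.6015625, nearest multiple of 1/256
print(decode_fp16(encode_fp16(200.0)))   # 127.99609375, saturated at the maximum
```

With 8 integer bits and 8 fractional bits, the resolution is 1/256 and the representable range is -128 to 127.99609375, so values outside that range are clipped rather than wrapped.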

In [3]:
class overlay():
    
    def __init__(self, bitfile: str) -> None:
        """Loads the bitfile into the FPGA and opens the `/dev/mem` file to access the AXI interface.
        """
        
        if not bitfile.endswith('.bit'):
            raise ValueError('The overlay must be initialized with a .bit file.')
        os.system('cat {} > /dev/xdevcfg'.format(bitfile))
        
        fd = os.open('/dev/mem', os.O_RDWR)
        self.m = mmap.mmap(fileno=fd, length=0x1100f+1, offset=0x40000000)
    
    def writefp16(self, addr: int, value: float, BitsInt: int = 8) -> None:
        """Writes a real number as a fixed-point 16-X (16-8 as default) in the addr address.
        """
        self.m[addr:addr+2] = struct.pack('<h', int(clip(round(value*(2**(16-BitsInt))), -2**15, 2**15-1)))
    
    def readfp16(self, addr: int, BitsInt: int = 8) -> float:
        """Reads a real number as fixed-point 16-X (16-8 as default) in the addr address.
        """
        return struct.unpack('<h', self.m[addr:addr+2])[0]*2**-(16-BitsInt)
    
    def write_array(self, initial_addr: int, array: np.ndarray) -> None:
        """Writes a 1-D array as consecutive fixed-point 16-8 values starting at initial_addr.
        """
        # Each value takes 2 bytes, so two values share one 32-bit word.
        # Iterating per element also handles arrays with an odd length.
        for i, value in enumerate(array):
            try:
                self.writefp16(initial_addr + 2*i, value)
            except Exception:
                # Report the offending index and value before re-raising.
                print(i)
                print(value)
                raise
    
    def read_array(self, initial_addr: int, array_len: int) -> np.ndarray:
        """Reads array_len consecutive fixed-point 16-8 values starting at initial_addr.
        """
        array = np.empty(array_len)
        for i in range(array_len):
            array[i] = self.readfp16(initial_addr + 2*i)
        
        return array
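
The memory layout produced by the array methods can be inspected off-board by replacing the memory map with a `bytearray`. This is only a sketch under that assumption: the free functions below are stand-ins that copy the driver's arithmetic, and the values are made up. It also compares the element-by-element encoding with a vectorized `numpy` encoding; whether a bulk write of such a buffer behaves correctly on the real `/dev/mem` mapping has not been verified here.

```python
import struct
import numpy as np

buf = bytearray(16)  # stand-in for the mmap object; no hardware involved


def writefp16(mem, addr, value, bits_int=8):
    """Same arithmetic as overlay.writefp16, applied to any writable buffer."""
    scaled = int(np.clip(round(value * 2**(16 - bits_int)), -2**15, 2**15 - 1))
    mem[addr:addr+2] = struct.pack('<h', scaled)


def readfp16(mem, addr, bits_int=8):
    """Same arithmetic as overlay.readfp16, applied to any readable buffer."""
    return struct.unpack('<h', mem[addr:addr+2])[0] * 2**-(16 - bits_int)


values = np.array([0.5, -1.25, 3.0, 0.25])  # made-up values for illustration
for i, v in enumerate(values):
    writefp16(buf, 2*i, v)  # consecutive values are 2 bytes apart

# Two 16-bit values share one 32-bit word; the first one sits in the low half.
first_word = struct.unpack('<I', bytes(buf[0:4]))[0]
print(hex(first_word))  # 0xfec00080 -> low half 0x0080 (0.5), high half 0xfec0 (-1.25)

# Vectorized encoding of the same values, for comparison.
encoded = np.clip(np.round(values * 256), -2**15, 2**15 - 1).astype('<i2').tobytes()
print(bytes(buf[:8]) == encoded)  # True

decoded = [readfp16(buf, 2*i) for i in range(len(values))]
print(decoded)  # [0.5, -1.25, 3.0, 0.25]
```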

First, the bitstream is loaded.

In [4]:
MIBCI_QCNN = overlay('MIBCI-QCNNs.bit')

Here are the parameters' names and their AXI addresses.

In [5]:
npyParamsNames = ['conv2d_w', 'depthconv2d_w', 'sepdepthconv2d_w', 'seppointconv2d_w', 'dense_w', 'dense_b']

In [6]:
params_addrs = [0x10000, 0x10400, 0x10900, 0x10a00, 0x10c00, 0x11000]
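
Since every access goes through a fixed-size mapping, a simple bounds check can catch an address or length mistake before it reaches the hardware. The sketch below is an illustration only: `fits_in_mapping` is a hypothetical helper, and the mapping length is the value passed to `mmap.mmap` in the `overlay` class.

```python
MAP_LENGTH = 0x1100f + 1   # length passed to mmap.mmap in the overlay class
BYTES_PER_VALUE = 2        # each fixed-point value is 16 bits wide


def fits_in_mapping(addr: int, n_values: int, map_length: int = MAP_LENGTH) -> bool:
    """Hypothetical helper: True if n_values 16-bit values starting at addr fit in the mapping."""
    return addr >= 0 and addr + n_values * BYTES_PER_VALUE <= map_length


# The four outputs read at 0x11008 end exactly at the last mapped byte.
print(fits_in_mapping(0x11008, 4))  # True
print(fits_in_mapping(0x11008, 5))  # False, one value past the end of the mapping
```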

The model parameters are loaded.

In [7]:
for i, param in enumerate(npyParamsNames):
    MIBCI_QCNN.write_array(params_addrs[i], np.load('global_model/npyparams/{}.npy'.format(param)).flatten())

MIBCI_QCNN.writefp16(0x10200, 0.6)
MIBCI_QCNN.writefp16(0x10800, 0.5)
MIBCI_QCNN.writefp16(0x10a80, 0.4)

The FPGA is now ready to be tested on the whole validation set.

In [8]:
Nsamples = len(os.listdir('global_model/validationDS/X_samples/'))

In [9]:
y_fpga = np.empty(Nsamples)
for i in progressbar.progressbar(range(Nsamples)):
    X = np.load('global_model/validationDS/X_samples/X_{}.npy'.format(i))
    MIBCI_QCNN.write_array(0x08000, X.flatten())
    #time.sleep(0.07)
    y_fpga[i] = np.argmax(MIBCI_QCNN.read_array(0x11008, 4))

100% (3528 of 3528) |####################| Elapsed Time: 1:26:58 Time:  1:26:58


In [10]:
y_hls = np.loadtxt('global_model/validationDS/y_hls_16_8.txt', usecols=[0])[:Nsamples]

In [11]:
y_true = np.load('global_model/validationDS/y_true.npy')[:Nsamples]

In [12]:
y_pred = np.load('global_model/validationDS/y_pred.npy')[:Nsamples]

The validation accuracy of each implementation is:

In [13]:
table = [['Keras', accuracy_score(y_true, y_pred)], ['HLS', accuracy_score(y_true, y_hls)], ['FPGA',  accuracy_score(y_true, y_fpga)]]

print(tabulate(table))
print('Nsamples: {}'.format(Nsamples))

-----  --------
Keras  0.680839
HLS    0.681122
FPGA   0.670635
-----  --------
Nsamples: 3528
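
Accuracy alone does not show whether two implementations fail on the same samples. A complementary check is the sample-by-sample agreement between two sets of predictions, for instance `y_hls` and `y_fpga`. The sketch below uses made-up arrays in their place, so the numbers it prints say nothing about the actual model.

```python
import numpy as np

# Made-up predictions for illustration only; in the notebook these would be y_hls and y_fpga.
y_hls_toy = np.array([0, 1, 2, 3, 1, 0, 2, 2])
y_fpga_toy = np.array([0, 1, 2, 3, 1, 0, 3, 2], dtype=float)

# Fraction of samples on which both implementations predict the same class.
agreement = np.mean(y_hls_toy == y_fpga_toy)
# Indices of the samples where the predictions differ, useful for closer inspection.
disagreeing = np.flatnonzero(y_hls_toy != y_fpga_toy)

print(agreement)    # 0.875
print(disagreeing)  # [6]
```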
