# Numpy acceleration example

Jan Marjanovic, December 2021

This Jupyter notebook is a proof of concept for accelerating Numpy with an FPGA. It uses [AXI Proxy](https://github.com/j-marjanovic/chisel-stuff/tree/master/example-12-axi-proxy) to read from and write to a Numpy array.

![AXI Proxy in Vivado IP Integrator](axi_proxy_in_vivado.png)

## Logging

In [1]:
import logging

logging.basicConfig(level=logging.INFO)

## FPGA configuration

Before running the script, we have to make sure that the FPGA is programmed with the right image. The Yocto image should have installed the FPGA bitstream in `/lib/firmware/xilinx/axi_proxy/`.

In [2]:
from fpga_mngr_interface.FpgaManagerInterface import FpgaManagerInterface

fmi = FpgaManagerInterface()

if not fmi.is_programmed():
    print("Programming FPGA...")
    fmi.program_bitstream(
        "/lib/firmware/xilinx/axi_proxy/axi_proxy.bit.bin",
        "/lib/firmware/xilinx/axi_proxy/axi_proxy.dtbo")
    print("Programming done.")
else:
    print("FPGA already programmed.")

FPGA already programmed.


## Basic config

We prepare the module to control the AXI Proxy IP; this will allow us to perform AXI transactions on the PL/PS ports.

In [3]:
from zynqmp_pl_ps import UioDev, AxiProxy

In [4]:
INTERFACE = "hpc"

uio_dev = UioDev.get_uio_dev_file("AxiProxy", INTERFACE)
axi_proxy = AxiProxy.AxiProxy(uio_dev)

In [5]:
axi_proxy.print_info()

INFO:_AxiProxy:id  = a8122081
INFO:_AxiProxy:ver = 00000301


In [6]:
axi_proxy.config_axi(cache=0xF, prot=0x2, user=0x1)
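
The `cache` and `prot` values can be read as AXI attribute fields. The sketch below decodes them using the AxCACHE and AxPROT bit assignments from the AMBA AXI4 specification. It assumes that the AXI Proxy IP forwards these values unchanged to the bus, which the notebook does not state; `decode_axcache` and `decode_axprot` are hypothetical helpers, not part of `zynqmp_pl_ps`. The `user` field is platform-specific and is not decoded here.

```python
# Bit names follow the AMBA AXI4 signal definitions. That the IP passes the
# configured values straight through as AxCACHE/AxPROT is an assumption.
AXCACHE_BITS = {
    0: "bufferable",
    1: "modifiable",
    2: "read-allocate",
    3: "write-allocate",
}


def decode_axcache(value):
    """Return the names of the AxCACHE attribute bits set in a 4-bit value."""
    return [name for bit, name in AXCACHE_BITS.items() if value & (1 << bit)]


def decode_axprot(value):
    """Describe the three AxPROT bits of a 3-bit value."""
    return {
        "privileged": bool(value & 0b001),
        "non_secure": bool(value & 0b010),
        "instruction": bool(value & 0b100),
    }


print(decode_axcache(0xF))
print(decode_axprot(0x2))
```

Under these assumptions, `cache=0xF` requests a write-back, read- and write-allocate transaction, and `prot=0x2` marks it as an unprivileged, non-secure data access.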

## Numpy array

We now create a Numpy array which we will try to modify from the FPGA. We use the [Array Interface](https://numpy.org/doc/stable/reference/arrays.interface.html) to get the (virtual) memory address of the data.

In [7]:
import numpy as np

In [8]:
xs = np.arange(16, dtype=np.int32)
xs

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int32)

In [9]:
vaddr, read_only = xs.__array_interface__["data"]
print(f"Data virtual addr = {vaddr:#x}, read only = {read_only}")

Data virtual addr = 0xaaaac9290340, read only = False


## Virtual address to physical address

The FPGA and the AXI interconnect work with physical addresses, while the software uses virtual addresses. Here we use the [pagemap](https://www.kernel.org/doc/Documentation/vm/pagemap.txt) interface to obtain the physical address of the Numpy data store.

In [10]:
import os
import mmap
import struct

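class VirtualToPhysical:
    """Translate virtual addresses of a process via /proc/<pid>/pagemap."""

    def __init__(self, pid):
        self.fd = os.open(f"/proc/{pid}/pagemap", os.O_RDONLY)

    def get_paddr(self, vaddr):
        # One 64-bit entry per virtual page, indexed by virtual page number
        vpn = vaddr // mmap.PAGESIZE
        bs = os.pread(self.fd, 8, vpn * 8)
        data = struct.unpack("Q", bs)[0]
        # Bit 63: page present in RAM; bits 0-54: page frame number (PFN)
        if not data & (1 << 63):
            raise RuntimeError(f"page at {vaddr:#x} is not present in RAM")
        pfn = data & ((1 << 55) - 1)
        # The kernel reports PFN 0 to callers without CAP_SYS_ADMIN
        if pfn == 0:
            raise PermissionError("PFN hidden; reading it requires CAP_SYS_ADMIN")
        return pfn * mmap.PAGESIZE + (vaddr % mmap.PAGESIZE)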

In [11]:
vtp = VirtualToPhysical(os.getpid())
paddr = vtp.get_paddr(vaddr)
print(f"{vaddr:#x} -> {paddr:#x}")

0xaaaac9290340 -> 0x1a705340
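
The bit layout of a pagemap entry can also be illustrated in isolation, without access to `/proc`. The following is a minimal sketch: `decode_pagemap_entry` and `to_physical` are hypothetical helpers, the 4 KiB page size is an assumption (the notebook itself uses `mmap.PAGESIZE`), and the example entry is constructed by hand to match the addresses printed above rather than read from the kernel.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
PFN_MASK = (1 << 55) - 1  # bits 0-54 hold the page frame number (PFN)
SWAPPED_BIT = 1 << 62     # bit 62: page is swapped out
PRESENT_BIT = 1 << 63     # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into the fields used for translation."""
    present = bool(entry & PRESENT_BIT)
    return {
        "present": present,
        "swapped": bool(entry & SWAPPED_BIT),
        # Bits 0-54 are a PFN only when the page is present
        "pfn": (entry & PFN_MASK) if present else None,
    }


def to_physical(entry, vaddr, page_size=PAGE_SIZE):
    """Combine the PFN of an entry with the in-page offset of vaddr."""
    fields = decode_pagemap_entry(entry)
    if fields["pfn"] is None:
        raise ValueError("page is not present, no physical address available")
    return fields["pfn"] * page_size + vaddr % page_size


# Hand-built entry: present bit set, PFN chosen to match the output above
entry = PRESENT_BIT | 0x1A705
print(f"{to_physical(entry, 0xAAAAC9290340):#x}")  # prints 0x1a705340
```

The translation is valid for a single page only. Consecutive virtual pages are not guaranteed to be physically contiguous, so a buffer that spans several pages needs one lookup per page.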


## Reading the data

We first use the AXI proxy to read the data from the Numpy array.

In [12]:
data, dur_read = axi_proxy.read(paddr)
print(f"Read duration {dur_read} cycles")

Read duration 72 cycles


In [13]:
data

[0, 1, 2, 3]

We see that the AXI Proxy has read the first four elements of the Numpy array.

## Writing the data

Now we perform the same operation in the other direction, i.e. writing into the Numpy array from the FPGA.

In [14]:
dur_write = axi_proxy.write(paddr, [100, 200, 300, 400])
print(f"Write duration {dur_write} cycles")

Write duration 47 cycles


In [15]:
xs

array([100, 200, 300, 400,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0], dtype=int32)

We see that the content of the Numpy array has changed. Keep in mind that we do a full 64-byte write (one full cache line), but the IP only allows setting the first 16 bytes (the first 4 `int32` samples); the remaining 48 bytes are set to 0.
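
The effect of the full-line write can be reproduced in software. The sketch below packs four `int32` values into a zero-padded 64-byte buffer and reinterprets it as 16 `int32` values. `build_cache_line` and `is_line_aligned` are hypothetical helpers that model the behaviour described above, not the IP's implementation; little-endian byte order is assumed.

```python
import struct

CACHE_LINE_BYTES = 64  # line size stated in the text
SETTABLE_WORDS = 4     # the IP only lets us set the first four int32 values


def build_cache_line(words):
    """Pack up to four int32 values and zero-pad to a full 64-byte line."""
    if len(words) > SETTABLE_WORDS:
        raise ValueError("only the first 4 int32 values can be set")
    payload = struct.pack(f"<{len(words)}i", *words)  # little-endian (assumed)
    return payload.ljust(CACHE_LINE_BYTES, b"\x00")


def is_line_aligned(addr):
    """Check that an address starts on a 64-byte cache-line boundary."""
    return addr % CACHE_LINE_BYTES == 0


line = build_cache_line([100, 200, 300, 400])
print(list(struct.unpack("<16i", line)))
print(is_line_aligned(0x1A705340))  # physical address printed earlier
```

The 16-element `int32` array occupies exactly 64 bytes, which is why one write overwrites all of it. For a larger array, or one whose data does not start on a 64-byte boundary, the zero padding would presumably land on memory outside the four target elements, so an alignment check before writing is a reasonable precaution.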

## Conclusion

We have demonstrated that we can use a PL/PS port to modify the content of a Numpy array in a cache-aware fashion. While the example presented here is very primitive, it provides all the building blocks needed to develop custom accelerators.