# Python Cache/Memory Error

This notebook reproduces a problem seen when an SDSoC hardware function is called in a loop while Python does some computation on its inputs and outputs in each iteration. The aim is to reproduce the issue in as few lines as possible.

## 1. Load Bitstream and Set Up Xlnk and CFFI Interfaces:
This is how we allocate physically contiguous memory (Xlnk) and bind to the C-callable library (CFFI).

```python
from pynq import Overlay, Xlnk
import cffi

bitstream = "/home/xilinx/jupyter_notebooks/pynq-ekf/extras/cache-error/ekf_n2m2.bit"
library = "/home/xilinx/jupyter_notebooks/pynq-ekf/extras/cache-error/libekf_n2m2.so"
ffi_interface = "void _p0_top_ekf_1_noasync(int obs[2], int fx_i[2], int hx_i[2], int F_i[4], int H_i[4], int params[14], int output[2], int ctrl, int w1, int w2);"

# Download the bitstream to the programmable logic
ol = Overlay(bitstream)

# Set up CFFI: declare the C prototype first, then open the shared library
_ffi = cffi.FFI()
_ffi.cdef(ffi_interface)
interface = _ffi.dlopen(library)

# Xlnk allocates physically contiguous (CMA) buffers
xlnk = Xlnk()
```
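
The C prototype declares fixed-size arrays such as `int params[14]`, but array parameters decay to plain pointers, so neither C nor CFFI checks that the buffer behind each pointer is large enough or holds the right element type. A minimal sketch of a pre-call check follows, assuming C `int` is 32 bits (as on ARM Linux); `check_args` and `EXPECTED_SIZES` are hypothetical names, not part of PYNQ or CFFI:

```python
import numpy as np

# Element counts copied from the C prototype given to _ffi.cdef() above.
# Assumes C int is 32 bits, so each element must be np.int32.
EXPECTED_SIZES = {"obs": 2, "fx_i": 2, "hx_i": 2, "F_i": 4,
                  "H_i": 4, "params": 14, "output": 2}

def check_args(**buffers):
    """Raise ValueError if a buffer cannot safely back its C array argument."""
    for name, buf in buffers.items():
        expected = EXPECTED_SIZES[name]
        if buf.dtype != np.int32:
            raise ValueError(f"{name}: dtype {buf.dtype}, expected int32")
        if not buf.flags["C_CONTIGUOUS"]:
            raise ValueError(f"{name}: buffer is not C-contiguous")
        if buf.size < expected:
            raise ValueError(f"{name}: {buf.size} elements, C code uses {expected}")
```

For example, `check_args(obs=inData, params=params, output=outData[0])` could be called once before the first accelerator call; it only inspects the NumPy side and cannot detect cache-coherency problems.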

## 2. Error Function:
This function does the following:

1. Allocates physically contiguous buffers.
2. In a loop:
   - copies new data into the input buffers;
   - calls the FPGA accelerator, which reads the input buffers and writes one row of the output buffer.

This is about the smallest amount of code required to reproduce the error.

```python
import os
import numpy as np

def run():

    # Reset Xlnk, freeing all previously allocated CMA buffers
    xlnk.xlnk_reset()

    # Largest int32 value, used as the upper bound for random data
    max_val = 2147483647

    # Fixed seed, so every call to run() generates the same inputs
    np.random.seed(23)

    # I/O buffers (physically contiguous, shared with the accelerator)
    outData = xlnk.cma_array(shape=(100, 2), dtype=np.int32)
    inData = xlnk.cma_array(shape=(2,), dtype=np.int32)
    params = xlnk.cma_array(shape=(14,), dtype=np.int32)
    F = xlnk.cma_array(shape=(4,), dtype=np.int32)
    H = xlnk.cma_array(shape=(4,), dtype=np.int32)
    fx = xlnk.cma_array(shape=(2,), dtype=np.int32)
    hx = xlnk.cma_array(shape=(2,), dtype=np.int32)

    np.copyto(params, np.random.randint(max_val, size=14, dtype=np.int32))
    np.copyto(F, np.random.randint(max_val, size=4, dtype=np.int32))
    np.copyto(H, np.random.randint(max_val, size=4, dtype=np.int32))
    np.copyto(fx, np.zeros(2, dtype=np.int32))
    np.copyto(hx, np.zeros(2, dtype=np.int32))

    # -------------------------------------------------------------------------

    # Algorithm starts: first call uses ctrl=0
    np.copyto(inData, np.random.randint(max_val, size=2, dtype=np.int32))
    np.copyto(fx, np.random.randint(max_val, size=2, dtype=np.int32))
    np.copyto(hx, np.random.randint(max_val, size=2, dtype=np.int32))

    # Byte offset of the output row written by the current call
    offset = 0
    out_ptr = outData.pointer + offset

    interface._p0_top_ekf_1_noasync(inData.pointer, fx.pointer, hx.pointer,
                                    F.pointer, H.pointer, params.pointer,
                                    out_ptr, 0, 2, 2)

    # 99 further calls with ctrl=1, each writing the next output row
    for i in range(99):
        # Uncomment to drop the kernel caches on every iteration:
        # os.system("sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' ")

        np.copyto(inData, np.random.randint(max_val, size=2, dtype=np.int32))
        np.copyto(fx, np.random.randint(max_val, size=2, dtype=np.int32))
        np.copyto(hx, np.random.randint(max_val, size=2, dtype=np.int32))

        # Advance one row: 2 x int32 = 8 bytes
        offset += 8
        out_ptr = outData.pointer + offset

        interface._p0_top_ekf_1_noasync(inData.pointer, fx.pointer, hx.pointer,
                                        F.pointer, H.pointer, params.pointer,
                                        out_ptr, 1, 2, 2)

    return outData
```
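
In `run()`, `offset` advances by 8 bytes per call because each row of the `(100, 2)` int32 output buffer holds two 4-byte values. The sketch below checks the same address arithmetic on an ordinary NumPy array, with `ctypes.memmove` standing in for the accelerator's write purely for illustration; `row_address` is a hypothetical helper:

```python
import ctypes
import numpy as np

# Same layout as outData in run(): 100 rows of 2 int32 values
out = np.zeros((100, 2), dtype=np.int32)

# Bytes between consecutive rows: 2 * 4 = 8, matching `offset += 8`
row_bytes = out.strides[0]

def row_address(arr, i):
    """Virtual address of row i of a C-contiguous 2-D array."""
    return arr.ctypes.data + i * arr.strides[0]

# Stand-in for the accelerator: call k writes two int32 values at
# base + 8 * k, just as run() passes outData.pointer + offset
for k in range(out.shape[0]):
    src = (ctypes.c_int32 * 2)(k, -k)
    ctypes.memmove(row_address(out, k), src, ctypes.sizeof(src))
```

After the loop, `out[k]` holds `[k, -k]` for every `k`, which confirms that row `k` of the output corresponds to the `k`-th accelerator call (row 0 is the initial `ctrl=0` call).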

## 3. Checking the error:

We can call `run()` twice and check that both calls return the same output. If the caches are dropped on every iteration using `os.system("sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' ")`, the two results are identical.
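
The same cache drop can be done without spawning a shell. Note that this `/proc` file controls the kernel's page, dentry and inode caches; it does not explicitly flush the CPU data cache. A minimal sketch, where `drop_page_cache` is a hypothetical helper and the `path` parameter exists only so the function can be tried against an ordinary file; writing to the real `/proc/sys/vm/drop_caches` requires root:

```python
import os

def drop_page_cache(path="/proc/sys/vm/drop_caches", level=3):
    """Equivalent of `sync; echo <level> > /proc/sys/vm/drop_caches`.

    level 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Write dirty pages back first, as `sync` does; only clean entries are dropped
    os.sync()
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

Inside the loop in `run()`, `drop_page_cache()` would replace the commented-out `os.system(...)` line.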

```python
# Copy each result out of its CMA buffer: the next run() resets Xlnk and frees it
out = run()
outA = np.array(out)

out = run()
outB = np.array(out)

print(np.array_equal(outA, outB))
```

```
True
```


We can also print the differences between the two runs (first output column) and the last 20 values of that column from each run:

```python
print((outA - outB)[:, 0])
print(outA[-20:, 0])
print(outB[-20:, 0])
```

```
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[-1658762079  1373097320  1686997839  -604786714   214072831  -537726990
  1281104970  1943631315 -1735902993   494108635  1082973484  1934139322
 -1756123439  -702183237  -563006125 -1169906626  1450915096  1873140528
   707351511  2146224642]
[-1658762079  1373097320  1686997839  -604786714   214072831  -537726990
  1281104970  1943631315 -1735902993   494108635  1082973484  1934139322
 -1756123439  -702183237  -563006125 -1169906626  1450915096  1873140528
   707351511  2146224642]
```
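
When the runs do differ, it helps to know which loop iterations diverged rather than scanning printed columns. A minimal sketch with a hypothetical helper `mismatched_rows`; comparing with `!=` also avoids the int32 wrap-around that `outA - outB` can show when large values are subtracted:

```python
import numpy as np

def mismatched_rows(a, b):
    """Indices of output rows (accelerator calls) where two runs disagree."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # A row mismatches if any of its elements differ
    return np.flatnonzero(np.any(a != b, axis=1))
```

For example, `print(mismatched_rows(outA, outB))` prints an empty array when the runs agree; otherwise row `k` points at the `k`-th call of the hardware function.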


## 4. C Code:
This problem does not occur when the same hardware function is called from an equivalent C program.

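## 5. Solution:
Ensure that only the HP ports are used. This is done by declaring the hardware function's array arguments as physically contiguous and non-cacheable with the SDSoC `mem_attribute` pragma, where `x` and `y` are placeholders for the argument names:

```c
#pragma SDS data mem_attribute(x: PHYSICAL_CONTIGUOUS|NON_CACHEABLE, y: PHYSICAL_CONTIGUOUS|NON_CACHEABLE)
```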