<a href="https://colab.research.google.com/github/neworderofjamie/riscv_ise/blob/master/tutorials/mnist_classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

There are lots of rough edges here: error checking is lacking in places, the compiler supports an even smaller subset of C than it should and the wrapping of various bits of API is not very Pythonic.

# Installation
The current prototype FeNN toolchain is a little tricky to build because it re-uses parts of GeNN (mostly the type system and the GeNNCode scanner, parser and type checker) so, on Colab, we can install a prebuilt wheel from my Google Drive:

In [6]:
if "google.colab" in str(get_ipython()):
    !gdown 1aO3CLhWJeoDXJ-lb7FqrtxDy-7WRsNYK
    !pip install pyfenn-0.0.1-cp312-cp312-linux_x86_64.whl

Downloading...
From: https://drive.google.com/uc?id=1aO3CLhWJeoDXJ-lb7FqrtxDy-7WRsNYK
To: /content/pyfenn-0.0.1-cp312-cp312-linux_x86_64.whl
100% 6.55M/6.55M [00:00<00:00, 112MB/s]
Processing ./pyfenn-0.0.1-cp312-cp312-linux_x86_64.whl
Installing collected packages: pyfenn
Successfully installed pyfenn-0.0.1


In [27]:
!wget -q https://github.com/neworderofjamie/riscv_ise/raw/refs/heads/master/bin/mnist_bias.bin
!wget -q https://github.com/neworderofjamie/riscv_ise/raw/refs/heads/master/bin/mnist_in_hid.bin
!wget -q https://github.com/neworderofjamie/riscv_ise/raw/refs/heads/master/bin/mnist_hid_out.bin

Install the trusty mnist package so we can easily access a dataset:

In [4]:
!pip install mnist

Collecting mnist
  Downloading mnist-0.2.2-py2.py3-none-any.whl.metadata (1.6 kB)
Downloading mnist-0.2.2-py2.py3-none-any.whl (3.5 kB)
Installing collected packages: mnist
Successfully installed mnist-0.2.2


# Imports
Import a bunch of stuff from PyFeNN:

In [8]:
import mnist
import numpy as np

from pyfenn import (BackendFeNNSim, EventContainer, Model, NeuronUpdateProcess,
                    Parameter, ProcessGroup, Runtime, Variable)
from pyfenn.models import Linear, Memset

from pyfenn import init_logging
from pyfenn.utils import get_array_view, get_latency_spikes, load_and_push, zero_and_push
from tqdm.auto import trange

# Layer classes
FeNN is programmed using a small number of primitive objects:
*   ``Processes`` perform computation
*   ``Variables`` are used to hold model state e.g. neuron variables and weights
*   ``EventContainers`` are the primary means of communication between neuron processes

The FeNN tools don't really enforce any particular style of modelling but you can easily use these primitives to create PyTorch-esque layer objects. We start by creating a leaky integrator for the output layer. This integrates an input current plus a bias into a membrane voltage which is averaged over the trial.

The update to be performed each timestep is implemented in a ``NeuronUpdateProcess`` which applies the same update to each neuron (the number of neurons is dictated by the shape of the variables). In future, these processes might be Just-in-Time compiled from Python but, right now, they are implemented in [GeNNCode](https://genn-team.github.io/genn/documentation/5/custom_models.html#genncode). This is basically a subset of C with extensions for fixed-point types inspired by the [ISO standard extension](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1005.pdf). In the ``LI`` model, this is most obvious in the ``0.0h6`` literal, whose suffix indicates a fixed-point literal with 6 fractional bits (type promotion doesn't work 100% right now...):

In [9]:
class LI:
    def __init__(self, shape, tau_m: float, num_timesteps: int):
        self.shape = shape
        dtype = "s9_6_sat_t"

        self.v = Variable(self.shape, dtype)
        self.i = Variable(self.shape, dtype)
        self.v_avg = Variable(self.shape, dtype)
        self.bias = Variable(self.shape, dtype)
        self.process = NeuronUpdateProcess(
            """
            V = (Alpha * V) + I + Bias;
            I = 0.0h6;
            VAvg += (VAvgScale * V);
            """,
            {"Alpha": Parameter(np.exp(-1.0 / tau_m), dtype),
             "VAvgScale": Parameter(1.0 / (num_timesteps / 2), dtype)},
            {"V": self.v, "VAvg": self.v_avg, "I": self.i, "Bias": self.bias})
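To make the update rule concrete, here is a minimal floating-point sketch of the same dynamics in plain NumPy. It is not part of PyFeNN and ignores all fixed-point rounding and saturation; ``simulate_li`` and the example input are hypothetical names and values chosen for illustration. The per-timestep input is passed in directly, so the ``I = 0.0h6`` reset has no counterpart here.

```python
import numpy as np


def simulate_li(currents, bias, tau_m, num_timesteps):
    """Floating-point reference for the LI update (ignores fixed-point effects)."""
    alpha = np.exp(-1.0 / tau_m)
    v_avg_scale = 1.0 / (num_timesteps / 2)
    v = np.zeros(bias.shape)
    v_avg = np.zeros(bias.shape)
    for t in range(num_timesteps):
        v = (alpha * v) + currents[t] + bias   # V = (Alpha * V) + I + Bias
        v_avg += v_avg_scale * v               # VAvg += (VAvgScale * V)
    return v, v_avg


# Hypothetical input: two neurons, constant current into the first only
li_currents = np.zeros((79, 2))
li_currents[:, 0] = 0.1
li_v, li_v_avg = simulate_li(li_currents, np.zeros(2), tau_m=20.0, num_timesteps=79)
print(li_v, li_v_avg)
```

With a constant input the voltage relaxes towards ``I / (1 - Alpha)``, which is a handy sanity check when comparing against the fixed-point result.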


The Leaky Integrate-and-Fire model we use for the hidden layer is slightly more complex, but is defined in basically the same way. Because the LIF neuron emits spikes, it has an ``EventContainer``, as well as variables, to hold the emitted spikes. In the process code, events are emitted by calling the name assigned to the event container, i.e. ``Spike()``:

In [17]:
class LIF:
    def __init__(self, shape, tau_m: float, tau_refrac: int, v_thresh: float):
        self.shape = shape
        dtype = "s10_5_sat_t"
        self.v = Variable(self.shape, dtype)
        self.i = Variable(self.shape, dtype)
        self.refrac_time = Variable(self.shape, "int16_t")
        self.out_spikes = EventContainer(self.shape)
        self.process = NeuronUpdateProcess(
            """
            V = (Alpha * V) + I;
            I = 0.0h5;
            if (RefracTime > 0) {
               RefracTime -= 1;
            }
            else if(V >= VThresh) {
               Spike();
               V -= VThresh;
               RefracTime = TauRefrac;
            }
            """,
            {"Alpha": Parameter(np.exp(-1.0 / tau_m), dtype),
             "VThresh": Parameter(v_thresh, dtype),
             "TauRefrac": Parameter(tau_refrac, "int16_t")},
            {"V": self.v, "I": self.i, "RefracTime": self.refrac_time},
            {"Spike": self.out_spikes})
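The control flow of the LIF update can be checked with a small floating-point sketch in NumPy. As before, this is an illustration rather than PyFeNN code: ``simulate_lif`` and the constant input current are hypothetical, and fixed-point effects are ignored. Note that, as in the GeNNCode above, the voltage keeps integrating during the refractory period; only the threshold test is suppressed.

```python
import numpy as np


def simulate_lif(currents, tau_m, tau_refrac, v_thresh):
    """Floating-point reference for the LIF update; returns a spike raster."""
    alpha = np.exp(-1.0 / tau_m)
    num_timesteps, num_neurons = currents.shape
    v = np.zeros(num_neurons)
    refrac_time = np.zeros(num_neurons, dtype=int)
    spikes = np.zeros((num_timesteps, num_neurons), dtype=bool)
    for t in range(num_timesteps):
        v = (alpha * v) + currents[t]            # V = (Alpha * V) + I
        refractory = refrac_time > 0             # if (RefracTime > 0)
        spiking = ~refractory & (v >= v_thresh)  # else if (V >= VThresh)
        refrac_time[refractory] -= 1
        v[spiking] -= v_thresh                   # soft reset
        refrac_time[spiking] = tau_refrac
        spikes[t] = spiking
    return spikes


# Hypothetical input: one neuron driven by a constant current
lif_spikes = simulate_lif(np.full((50, 1), 0.2), tau_m=20.0, tau_refrac=5, v_thresh=0.61)
lif_times = np.flatnonzero(lif_spikes[:, 0])
print(lif_times)
```

Successive spike times are always more than ``tau_refrac`` timesteps apart, because the neuron spends ``tau_refrac`` timesteps counting down before it can cross threshold again.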

# Parameters

In [11]:
num_examples = 10000
num_timesteps = 79
input_shape = 28 * 28
hidden_shape = 128
output_shape = 10
input_hidden_shape = [input_shape, hidden_shape]
hidden_output_shape = [hidden_shape, output_shape]

# Dataset
Convert MNIST into latency-coded spikes. Yann LeCun's original site has been down for some time (or is blocking Colab) so we override the dataset URL:

In [12]:
mnist.datasets_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
mnist_spikes = get_latency_spikes(mnist.test_images())
mnist_labels = mnist.test_labels().astype(np.int16)
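For intuition, latency coding maps brighter pixels to earlier spikes. The sketch below uses a conversion of the form ``t = tau * ln(i / (i - threshold))``. It is only an illustration: the function name ``latency_encode`` and the values of ``tau`` and ``threshold`` are assumptions, and the output format of the real ``get_latency_spikes`` helper may well differ.

```python
import numpy as np


def latency_encode(image, tau=20.0, threshold=51.0, num_timesteps=79):
    """Return per-pixel spike times (-1 = no spike) and a boolean spike raster.

    tau and threshold are illustrative values, not PyFeNN defaults.
    """
    i = image.astype(float).ravel()
    times = np.full(i.shape, -1, dtype=int)
    # Only pixels brighter than the threshold ever spike; masking them first
    # also avoids taking the log of a zero or negative ratio
    active = i > threshold
    times[active] = np.round(
        tau * np.log(i[active] / (i[active] - threshold))).astype(int)
    # Spikes that would land after the end of the trial are dropped
    times[times >= num_timesteps] = -1
    spikes = np.zeros((num_timesteps, i.size), dtype=bool)
    idx = np.flatnonzero(times >= 0)
    spikes[times[idx], idx] = True
    return times, spikes


# Hypothetical three-pixel "image": dark, just above threshold, and saturated
lat_times, lat_spikes = latency_encode(np.array([[0, 52, 255]], dtype=np.uint8))
print(lat_times)
```

Here the dark pixel never spikes, the barely-above-threshold pixel would spike too late to fit in the trial, and the saturated pixel spikes within the first few timesteps.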




# Model definition
The FeNN tools can produce lots of helpful logging information so we initialise this system before we do anything else (if you use ``from pyfenn import PlogSeverity`` to import the enum you can then use e.g. ``PlogSeverity.DEBUG`` to control the logging level):

In [13]:
init_logging()

Input spikes can be injected directly into FeNN without needing any sort of layer, so we define an ``EventContainer`` to hold them:

In [15]:
input_spikes = EventContainer(input_shape, num_timesteps)

Then create hidden and output layers using the classes we defined above. The fixed-point types are specified as strings, for example ``s10_5_sat_t`` is a signed 16-bit fixed-point type (this is all FeNN currently supports) with 10 integer and 5 fractional bits, to which saturation should be applied (currently only when adding and subtracting):

In [18]:
hidden = LIF(hidden_shape, 20.0, 5, 0.61)
output = LI(output_shape, 20.0, num_timesteps)
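To get a feel for what a type like ``s10_5_sat_t`` means numerically, here is a small NumPy sketch of 16-bit fixed-point quantisation and saturating addition. The helper names are hypothetical and the rounding mode (round-to-nearest) is an assumption; FeNN's own conversion may round differently.

```python
import numpy as np

INT16_MIN, INT16_MAX = -32768, 32767


def to_fixed(x, frac_bits):
    """Quantise floats to signed 16-bit fixed point (round-to-nearest assumed)."""
    scaled = np.round(np.asarray(x, dtype=float) * (1 << frac_bits))
    return np.clip(scaled, INT16_MIN, INT16_MAX).astype(np.int16)


def from_fixed(q, frac_bits):
    """Convert signed 16-bit fixed point back to floats."""
    return np.asarray(q, dtype=float) / (1 << frac_bits)


def sat_add(a, b):
    """Add in a wider type, then clamp to the int16 range instead of wrapping."""
    wide = a.astype(np.int32) + b.astype(np.int32)
    return np.clip(wide, INT16_MIN, INT16_MAX).astype(np.int16)


# With 5 fractional bits, the threshold 0.61 is stored as 20 / 32 = 0.625
fx_thresh = to_fixed(np.array([0.61]), 5)
print(fx_thresh, from_fixed(fx_thresh, 5))

# 1000 + 1000 overflows the 10 integer bits and saturates just below 1024
fx_sum = sat_add(to_fixed(np.array([1000.0]), 5), to_fixed(np.array([1000.0]), 5))
print(fx_sum, from_fixed(fx_sum, 5))
```

The trade-off between the two types used in this tutorial is range against resolution: 5 fractional bits give steps of 1/32 with a range of roughly ±1024, while 6 fractional bits give steps of 1/64 with a range of roughly ±512.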

Now we connect spiking outputs to input variables using the ``Linear`` layer class we imported earlier:

In [19]:
input_hidden = Linear(input_spikes, hidden.i, "s10_5_sat_t")
hidden_output = Linear(hidden.out_spikes, output.i, "s9_6_sat_t")

Finally, to reduce the amount of data movement between the FPGA and the CPU, we define a process to zero the classification output at the beginning of every trial:

In [20]:
avg_zero = Memset(output.v_avg)

Process groups define computation that can be performed in parallel (in fact, on FeNN it isn't, but this won't be the case with e.g. GPU backends) so we group our neuron update processes, event propagation processes and zeroing processes into separate groups:

In [21]:
neuron_update_processes = ProcessGroup([hidden.process, output.process])
synapse_update_processes = ProcessGroup([input_hidden.process, hidden_output.process])
zero_processes = ProcessGroup([avg_zero.process])


The model, which groups together all parts of our simulation, takes the backend as an argument, so it is created alongside the backend in the next section.

# Simulation
Sadly, Google has yet to install FeNN nodes in its cloud, so for now we create a simulation backend (if you are lucky enough to be running on a Kria KV260 with the bitstream loaded, you should substitute ``BackendFeNNHW`` here) and use it to create a generic simulation kernel. The control flow of these kernels *will* be fully programmable but, for now, you can either create a really simple kernel which just runs a list of process groups or a 'simulation' kernel which offloads running a loop over time, with a list of process groups in the body and separate lists that run at the beginning (which we use here to zero the output average) and at the end.

In [24]:
backend = BackendFeNNSim()
model = Model([neuron_update_processes, synapse_update_processes, zero_processes], backend)
code = backend.generate_simulation_kernel([synapse_update_processes, neuron_update_processes],
                                          [zero_processes], [],
                                          num_timesteps, model)
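Conceptually, the 'simulation' kernel generated here has the control flow sketched below. This is plain Python for illustration only: ``run_simulation_kernel`` is a hypothetical function, process groups are modelled as lists of callables, and the real kernel is FeNN machine code rather than a Python loop.

```python
def run_simulation_kernel(body_groups, begin_groups, end_groups, num_timesteps):
    """Run begin groups once, body groups every timestep, then end groups once."""
    for group in begin_groups:
        for process in group:
            process()
    for _ in range(num_timesteps):
        for group in body_groups:
            for process in group:
                process()
    for group in end_groups:
        for process in group:
            process()


# Stand-in "processes" which just record the order they run in
kernel_trace = []
synapse_group = [lambda: kernel_trace.append("synapse")]
neuron_group = [lambda: kernel_trace.append("neuron")]
zero_group = [lambda: kernel_trace.append("zero")]

run_simulation_kernel([synapse_group, neuron_group], [zero_group], [], 3)
print(kernel_trace)
```

The argument order mirrors the call above: body groups first, then the groups run at the beginning and at the end of the trial. Within each timestep, spikes are propagated through the synapses before the neurons are updated.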

Now that we have some code, we create a ``Runtime`` object to interact with FeNN. We first use this to allocate the memory required for our model on FeNN:

In [25]:
runtime = Runtime(model, backend)
runtime.allocate()

Now we use some helper functions to load weights into the appropriate variables:

In [28]:
load_and_push("mnist_in_hid.bin", input_hidden.weight, runtime)
load_and_push("mnist_hid_out.bin", hidden_output.weight, runtime)
load_and_push("mnist_bias.bin", output.bias, runtime)

and set the remaining variables to zero:

In [29]:
zero_and_push(hidden.v, runtime)
zero_and_push(hidden.i, runtime)
zero_and_push(hidden.refrac_time, runtime)
zero_and_push(output.v, runtime)
zero_and_push(output.i, runtime)

Finally we upload the code generated by the backend to FeNN:

In [30]:
runtime.set_instructions(code)

The ``Runtime`` object creates a bunch of 'Array' objects which are used to interact with model state at runtime. To save typing later on, we look these up now:

In [31]:
input_spike_array, input_spike_view = get_array_view(runtime, input_spikes,
                                                     np.uint32)
hidden_spike_array = runtime.get_array(hidden.out_spikes)

output_v_avg_array, output_v_avg_view = get_array_view(runtime, output.v_avg, np.int16)
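The ``np.uint32`` view of the input spikes suggests that events are stored as a bitfield, with one bit per neuron packed into 32-bit words (784 inputs would then need 25 words, i.e. 100 bytes, per timestep). The sketch below shows one way such a layout could be built with NumPy. The function name, the bit order within each word and the (timesteps, words) shape are all assumptions, so check them against the output of ``get_latency_spikes`` before relying on this.

```python
import numpy as np


def pack_spikes(spikes):
    """Pack a (timesteps, neurons) boolean raster into little-endian 32-bit words.

    Assumed layout: neuron n lives in bit (n % 32) of word (n // 32).
    """
    num_timesteps, num_neurons = spikes.shape
    num_words = (num_neurons + 31) // 32
    # Pad each timestep up to a whole number of 32-bit words
    padded = np.zeros((num_timesteps, num_words * 32), dtype=np.uint8)
    padded[:, :num_neurons] = spikes
    packed_bytes = np.packbits(padded, axis=1, bitorder="little")
    return np.ascontiguousarray(packed_bytes).view("<u4")


# Hypothetical raster: neuron 0 spikes at t=0 and neuron 33 spikes at t=1
raster = np.zeros((2, 784), dtype=bool)
raster[0, 0] = True
raster[1, 33] = True
words = pack_spikes(raster)
print(words.shape, words[0, 0], words[1, 1])
```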

Finally, we're ready to go! Now we can loop through the MNIST digits and:
1.   Copy each digit into the input spike array
2.   Run the kernel
3.   Copy the averaged output voltage back from FeNN
4.   Check whether this matches the correct label

In [32]:
num_correct = 0
for i in trange(num_examples):
    # Copy data to array host pointer and push it to the device
    input_spike_view[:] = mnist_spikes[i]
    input_spike_array.push_to_device()

    # Classify
    runtime.run()

    # Copy averaged output V back from device
    output_v_avg_array.pull_from_device()

    # Determine if output is correct
    classification = np.argmax(output_v_avg_view)
    if classification == mnist_labels[i]:
        num_correct += 1

print(f"{num_correct} / {num_examples} correct {100.0 * (num_correct / num_examples)}%")


9583 / 10000 correct 95.83%


# Disassembling 😥
Sometimes it's cool to know what's happening under the hood so, using the ``disassemble`` function, you can disassemble the code produced by the backend into a slightly friendlier form. A slightly outdated description of the instruction set is provided at https://github.com/neworderofjamie/riscv_ise/blob/master/docs/instruction_set.pdf

In [33]:
from pyfenn import disassemble, init_logging
for i, c in enumerate(code):
    print(f"{i * 4} : {disassemble(c)}")

0 : ADDI X1, X0, 0
4 : ADDI X2, X0, 79
8 : LW X3, 60(X0)
12 : VLUI V0, 0
16 : VSTORE V0, 0(X3)
20 : LW X3, 48(X0)
24 : ADDI X5, X0, 16
28 : ADD X4, X5, X3
32 : ADDI X5, X0, 64
36 : ADDI X7, X0, 1
40 : ADDI X6, X0, 31
44 : LW X8, 0(X3)
48 : ADDI X3, X3, 4
52 : BEQ X8, X0, 80
56 : ADDI X9, X6, 0
60 : CLZ X10, X8
64 : BEQ X8, X7, 80
68 : ADDI X11, X10, 1
72 : SLL X8, X8, X11
76 : SUB X9, X9, X10
80 : LW X12, 52(X0)
84 : MUL X13, X9, X5
88 : ADD X12, X12, X13
92 : LW X13, 56(X0)
96 : VLOAD V1, 0(X13)
100 : ADDI X14, X0, 1023
104 : VLOAD V0, 0(X12)
108 : VLOAD V2, 64(X13)
112 : VADD_S V3, V1, V0
116 : VSEL V1, X14, V3
120 : VSTORE V1, 0(X13)
124 : ADDI X9, X9, -1
128 : BNE X8, X0, -68
132 : ADDI X6, X6, 32
136 : BNE X3, X4, -92
140 : BEQ X0, X0, 12
144 : ADDI X8, X0, 0
148 : BEQ X0, X0, -72
152 : LW X3, 36(X0)
156 : ADDI X5, X0, 100
160 : MUL X6, X1, X5
164 : ADD X3, X3, X6
168 : ADD X4, X5, X3
172 : ADDI X5, X0, 256
176 : ADDI X7, X0, 1
180 : ADDI X6, X0, 31
184 : LW X8, 0(X3)
188 : ADDI X