# Training an SNN using surrogate gradients!

Train your first SNN in JAX in less than 10 minutes without needing a heavy-duty GPU!

In [1]:
```python
import os
# have JAX preallocate 70% of GPU memory; must be set before JAX initializes the GPU
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".70"

import spyx
import spyx.nn as snn

# JAX imports
import jax
from jax import numpy as jnp
import numpy as np

from tqdm import tqdm

# implement our SNN in DeepMind's Haiku
import haiku as hk

# optimizers for surrogate-gradient training
import optax

# rendering tools
import matplotlib.pyplot as plt
%matplotlib notebook
import graphviz
import mediapy as media
```



## Data Loading

In [2]:
```python
nmnist_dl = spyx.data.NMNIST_loader(64)
```

In [3]:
```python
nmnist_dl.train_step().obs.shape
```

```
(64, 64, 2, 34, 34)
```
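Reading the printed shape as `[batch, time, polarity, height, width]` (our assumption: 64 samples per batch, 64 time bins, and the two ON/OFF event polarities of N-MNIST's 34×34 sensor), the arithmetic below shows how many features `hk.Flatten()` would hand to the first linear layer at each time step.

```python
import math

# Shape printed by nmnist_dl.train_step().obs.shape above.
obs_shape = (64, 64, 2, 34, 34)

# Assumed axis order: batch first, then time, then the per-step event frame.
batch_size, time_steps, *frame_shape = obs_shape

# hk.Flatten() keeps the leading (batch) axis and flattens the rest of each
# per-time-step frame, so the first hk.Linear sees this many input features.
features_per_step = math.prod(frame_shape)

print(f"{batch_size} samples x {time_steps} steps, {features_per_step} features per step")
```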

## SNN

Here we define a simple feed-forward SNN using Haiku's RNN features, placing our LIF neuron models where activation functions would usually go. Haiku manages all of the state for us, so once we transform the function and get an `apply()` function, we only need to pass it the params!
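To make the LIF layers less of a black box, here is a plain-Python sketch of a single discrete-time leaky integrate-and-fire neuron. The decay factor `beta = 0.9`, the threshold of 1.0 and the reset-by-subtraction rule are assumptions chosen for illustration; Spyx's `snn.LIF` may parameterize or reset its neurons differently, and in the network it updates whole vectors of neurons at once.

```python
def lif_step(v, x, beta=0.9, threshold=1.0):
    """One update of a leaky integrate-and-fire neuron (illustrative form)."""
    v = beta * v + x                        # leak the old potential, add input current
    spike = 1.0 if v > threshold else 0.0   # Heaviside step on (v - threshold)
    v = v - spike * threshold               # reset by subtracting the threshold
    return v, spike


def run_lif(inputs, beta=0.9, threshold=1.0):
    """Unroll one neuron over a sequence of input currents."""
    v = 0.0
    spikes, voltages = [], []
    for x in inputs:
        v, s = lif_step(v, x, beta, threshold)
        spikes.append(s)
        voltages.append(v)
    return spikes, voltages


spikes, voltages = run_lif([0.6, 0.6, 0.6, 0.0, 0.0])
print(spikes)  # [0.0, 1.0, 0.0, 0.0, 0.0]
```

The readout layer `snn.LI(10)` is, roughly speaking, the same update without the spike and reset lines, so its membrane voltages can be read directly as output traces.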

Since spiking neurons have a discrete, all-or-nothing activation, doing gradient descent requires approximating the derivative of the Heaviside step function with something smoother. Here we use the SuperSpike surrogate gradient from Zenke & Ganguli 2017.
Also note that we don't use bias terms on the linear layers, and since each time step of the input is an image-like event frame, we flatten it before feeding it to the first layer.
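The surrogate-gradient idea can be sketched in plain Python: the forward pass uses the hard Heaviside step, while the backward pass swaps its derivative (zero almost everywhere) for the SuperSpike surrogate, the derivative of a fast sigmoid, `1 / (1 + k*|x|)**2`. The sharpness `k = 25` is an assumed value for illustration; `spyx.activation.SuperSpike` may use a different default or scaling.

```python
def heaviside(x):
    """Forward pass: the all-or-nothing spike nonlinearity."""
    return 1.0 if x > 0 else 0.0


def superspike_grad(x, k=25.0):
    """Backward pass: SuperSpike surrogate for dH/dx (fast-sigmoid derivative)."""
    return 1.0 / (1.0 + k * abs(x)) ** 2


def dloss_dv(v, threshold, dloss_dspike, k=25.0):
    """Chain rule through one spike, using the surrogate instead of the true derivative."""
    return dloss_dspike * superspike_grad(v - threshold, k)


# The true derivative of heaviside is 0 away from x = 0, so no learning signal
# would reach v. The surrogate peaks at the threshold and decays smoothly.
for v in (0.5, 0.9, 1.0, 1.1, 1.5):
    print(v, heaviside(v - 1.0), round(dloss_dv(v, 1.0, dloss_dspike=1.0), 4))
```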

Depending on computational constraints, we can use Haiku's dynamic unroll (`hk.dynamic_unroll`) to iterate the SNN, or static unroll (`hk.static_unroll`), which unrolls the SNN during JIT compilation to further increase speed when training on GPU. Static unrolling takes longer to compile, but once compiled it runs 2x-3x more iterations per second than dynamic unrolling.
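Conceptually, both unroll functions apply the recurrent core once per time step and collect its outputs. The plain-Python sketch below is only an analogy, not Haiku's implementation: with `time_major=False` the inputs are batch-major (`[batch][time]`), so each step slices out one time index across the batch. A Python loop like this is what gets traced into a fixed chain of T copies under `jax.jit` in the static case, whereas the dynamic version keeps a single loop body (built on a scan), which compiles faster.

```python
def unroll_batch_major(core, inputs, state):
    """Apply core(x_t, state) -> (y_t, state) at every time step.

    inputs is batch-major: inputs[b][t] is sample b at time t.
    Returns batch-major outputs and the final state.
    """
    num_steps = len(inputs[0])
    outputs_time_major = []
    for t in range(num_steps):
        x_t = [sample[t] for sample in inputs]  # one time slice across the batch
        y_t, state = core(x_t, state)
        outputs_time_major.append(y_t)
    # transpose [time][batch] back to [batch][time]
    outputs = [list(per_sample) for per_sample in zip(*outputs_time_major)]
    return outputs, state


def running_sum_core(x_t, state):
    """Toy core: a per-sample running sum, standing in for a stack of neuron layers."""
    new_state = [s + x for s, x in zip(state, x_t)]
    return new_state, new_state


outs, final = unroll_batch_major(running_sum_core, [[1, 2, 3], [10, 20, 30]], [0, 0])
print(outs)  # [[1, 3, 6], [10, 30, 60]]
```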

In [4]:
```python
def nmnist_snn(x):
    # x is batch-major: [batch, time, *frame], e.g. (64, 64, 2, 34, 34)
    core = hk.DeepRNN([
        hk.Flatten(),
        hk.Linear(784, with_bias=False),
        snn.LIF(784, activation=spyx.activation.SuperSpike()),
        hk.Linear(128, with_bias=False),
        snn.LIF(128, activation=spyx.activation.SuperSpike()),
        hk.Linear(10, with_bias=False),
        snn.LI(10)
    ])
    # static unroll for maximum performance; time_major=False because axis 0 is the batch
    traces, final_state = hk.static_unroll(core, x.astype(jnp.float32), core.initial_state(x.shape[0]), time_major=False)
    # traces: the readout (LI) layer's output at every time step
    return traces, final_state
```
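As a sanity check on the architecture, each bias-free linear layer holds one weight per input/output pair. Assuming 2 × 34 × 34 = 2312 flattened inputs per time step (from the shape printed in the data-loading section), the weight count works out as below; any per-neuron parameters of the LIF/LI layers are not counted here.

```python
# (fan_in, fan_out) of each hk.Linear(..., with_bias=False) layer in nmnist_snn
layer_shapes = [(2 * 34 * 34, 784), (784, 128), (128, 10)]

weights_per_layer = [fan_in * fan_out for fan_in, fan_out in layer_shapes]
total_weights = sum(weights_per_layer)

print(weights_per_layer)  # [1812608, 100352, 1280]
print(total_weights)      # 1914240
```

Almost all of the weights sit in the first layer, so the input resolution dominates the model's parameter count.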

In [5]:
```python
key = jax.random.PRNGKey(0)
# The network has no stochastic components, so apply() doesn't need an RNG key!
SNN = hk.without_apply_rng(hk.transform(nmnist_snn))
params = SNN.init(rng=key, x=nmnist_dl.train_step().obs)
```

In [6]:
```python
print(hk.experimental.tabulate(SNN)(nmnist_dl.train_step().obs))
```

```
+-------------------------------------+----------------------------------------------------------------------------------+------------------+---------------------------------------------------------+------------------------------------------------------+---------------+---------------+
| Module                              | Config                                                                           | Module params    | Input                                                   | Output                                               |   Param count |   Param bytes |
| deep_rnn (DeepRNN.initial_state)    | DeepRNN(                                                                         |                  | 64                                                      | (f16[64,784], f16[64,128], f32[64,10])               |           912 |       1.82 KB |
|                                     |     layers=[Flatten(),                                                           |
```

## Gradient Descent

We define a training loop below.

We use the Lion optimizer from Optax, a more memory-efficient alternative to the popular Adam optimizer (it keeps one momentum buffer per parameter instead of two). The evaluation and update steps are JIT-compiled to maximize time spent in optimized GPU code and minimize time spent in higher-level Python.
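For intuition, the Lion update fits in a few lines of plain Python. This sketch follows the published Lion rule with what we understand to be Optax's default momentum coefficients (`b1=0.9`, `b2=0.99`), but it omits weight decay for brevity, so treat it as an illustration rather than a drop-in replacement for `optax.lion`.

```python
def sign(x):
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=3e-4, b1=0.9, b2=0.99):
    """One Lion step on flat lists of floats (weight decay omitted)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # interpolate momentum and gradient, then keep only the sign
        update = sign(b1 * m + (1 - b1) * g)
        new_params.append(p - lr * update)
        # the single momentum buffer is tracked with a slower coefficient b2
        new_momentum.append(b2 * m + (1 - b2) * g)
    return new_params, new_momentum


params, momentum = lion_step([1.0, -2.0], [0.5, -1e-6], [0.0, 0.0], lr=0.1)
print(params)  # both weights move by exactly lr, regardless of gradient size
```

Because every coordinate moves by exactly ±lr, Lion is usually run with a smaller learning rate than Adam.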

The use of regularizers in spiking networks will be covered in a separate tutorial.

In [7]:
```python
def gd(SNN, params, dl, epochs=50, test_every=5):
    
    # create and initialize the optimizer
    opt = optax.lion(3e-4)
    opt_state = opt.init(params)
    grad_params = params
        
    # define and compile our eval function that computes the loss for our SNN
    @jax.jit
    def net_eval(weights, events, targets):
        readout = SNN.apply(weights, events)
        traces, V_f = readout
        return spyx.loss.integral_crossentropy(traces, targets)
        
    # Use JAX to create a function that calculates the loss and the gradient!
    surrogate_grad = jax.value_and_grad(net_eval)
        
    # compile the meat of our training loop for speed
    @jax.jit
    def step(grad_params, opt_state, events, targets):
        # compute loss and gradient
        loss, grads = surrogate_grad(grad_params, events, targets)
        # generate updates based on the gradients and optimizer
        updates, opt_state = opt.update(grads, opt_state, grad_params)
        # return the updated parameters
        return optax.apply_updates(grad_params, updates), opt_state, loss
    
    # For validation epochs, do the same as before but compute the
    # accuracy, predictions and losses (no gradients needed)
    @jax.jit
    def eval_step(grad_params, events, targets):
        readout = SNN.apply(grad_params, events)
        traces, V_f = readout
        acc, pred = spyx.loss.integral_accuracy(traces, targets)
        loss = spyx.loss.integral_crossentropy(traces, targets)
        return acc, pred, loss
        
    # Here's the start of our training loop!
    for gen in range(epochs):
        # make a progress bar with tqdm so things look official
        pbar = tqdm([*range(dl.train_len//dl.batch_size)])
        pbar.set_description("Epoch #{}".format(gen))
        # reset our training data loader so we're at the beginning of the train set
        dl.train_reset()
        for _ in pbar:
            # fetch the batch and the labels
            events, targets = dl.train_step()
            # compute new params and loss
            grad_params, opt_state, loss = step(grad_params, opt_state, events, targets)
            # update progress bar
            pbar.set_postfix(Loss=loss)
            
        # after a number of epochs, check performance on validation set
        if gen % test_every == test_every-1:
            # reset validation iterator
            dl.val_reset()
            
            # containers for SNN results. Can return these if desired.
            accs = []
            preds = []
            losses = []
            
            # progress bars!
            pbar = tqdm([*range(dl.val_len//dl.batch_size)])
            pbar.set_description("Validating")
            for _ in pbar:
                # get validation batch
                events, targets = dl.val_step()
                # get performance on validation batch
                acc, pred, loss = eval_step(grad_params, events, targets)
                # save accuracy, prediction, loss
                accs.append(acc)
                preds.append(pred)
                losses.append(loss)
                # update progress bar, showing running loss and accuracy
                pbar.set_postfix(Loss=np.mean(losses), Accuracy=np.mean(accs))
                
    # return our final, optimized network.
    return grad_params
```
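The loss and accuracy above come from `spyx.loss`. As we understand them (an assumption, not a transcription of Spyx's code), these helpers integrate each readout neuron's trace over time, use the totals as class logits, and then apply softmax cross-entropy or an argmax. A plain-Python sketch of that idea for a single sample:

```python
import math


def integrate(trace):
    """Sum a [time][class] readout trace over time -> one logit per class."""
    return [sum(column) for column in zip(*trace)]


def integral_crossentropy_single(trace, target):
    """Softmax cross-entropy on the time-integrated readout (illustrative)."""
    logits = integrate(trace)
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]


def integral_prediction_single(trace):
    """Predicted class = readout neuron with the largest integrated trace."""
    logits = integrate(trace)
    return max(range(len(logits)), key=logits.__getitem__)


# 3 time steps x 3 classes: class 2 accumulates the most voltage
trace = [[0.1, 0.0, 0.5],
         [0.2, 0.1, 0.6],
         [0.0, 0.1, 0.7]]
print(integral_prediction_single(trace))
print(round(integral_crossentropy_single(trace, target=2), 4))
```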

In [8]:
```python
def test_gd(SNN, params, dl):
    # compile an evaluation step: accuracy, predictions and loss (no gradients needed)
    @jax.jit
    def eval_step(params, events, targets):
        traces, V_f = SNN.apply(params, events)
        acc, pred = spyx.loss.integral_accuracy(traces, targets)
        loss = spyx.loss.integral_crossentropy(traces, targets)
        return acc, pred, loss
    
    dl.test_reset()
    accs = []
    preds = []
    losses = []
    pbar = tqdm([*range(dl.test_len//dl.batch_size)])
    pbar.set_description("Validating")
    for _ in pbar:
        events, targets = dl.test_step()
        # evaluate the params passed in, not a global variable
        acc, pred, loss = eval_step(params, events, targets)
        
        accs.append(acc)
        preds.append(pred)
        losses.append(loss)
        
        pbar.set_postfix(Loss=np.mean(losses), Accuracy=np.mean(accs))
    
    return accs, preds, losses
```

## Training Time

We train the network for 30 epochs below, although it converges quickly: validation accuracy is already about 96.6% after the fifth epoch.

In [9]:
```python
grad_params = gd(SNN, params, nmnist_dl, epochs=30)
```

```
Epoch #0: 100%|████████████████████████████████████████| 656/656 [00:55<00:00, 11.81it/s, Loss=1.2976865]
Epoch #1: 100%|████████████████████████████████████████| 656/656 [00:30<00:00, 21.19it/s, Loss=1.2516055]
Epoch #2: 100%|████████████████████████████████████████| 656/656 [00:31<00:00, 20.97it/s, Loss=1.2466475]
Epoch #3: 100%|██████████████████████████████████████████| 656/656 [00:30<00:00, 21.39it/s, Loss=1.26389]
Epoch #4: 100%|████████████████████████████████████████| 656/656 [00:34<00:00, 19.25it/s, Loss=1.2356188]
Validating: 100%|███████████████████████████| 281/281 [00:21<00:00, 12.82it/s, Accuracy=0.966, Loss=1.26]
Epoch #5: 100%|████████████████████████████████████████| 656/656 [00:30<00:00, 21.86it/s, Loss=1.2381525]
Epoch #6: 100%|████████████████████████████████████████| 656/656 [00:45<00:00, 14.53it/s, Loss=1.2045215]
Epoch #7: 100%|████████████████████████████████████████| 656/656 [00:36<00:00, 18.11it/s, Loss=1.2187369]
Epoch #8: 100%|███████████████████████████████
```

## Evaluation Time

Now we'll run the trained network on the test set and see what happens:

In [10]:
```python
acc, preds, losses = test_gd(SNN, grad_params, nmnist_dl)
```

```
Validating: 100%|███████████████████████████| 156/156 [00:11<00:00, 13.86it/s, Accuracy=0.976, Loss=1.24]
```


Not bad! Now we can investigate the network's predictions using a confusion matrix or other techniques!
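One simple way to start is to tally a confusion matrix from predicted and true labels. The sketch below uses hypothetical flat label lists; with the cells above you would first flatten the per-batch `preds` returned by `test_gd` and collect the matching targets from the test loader, which `test_gd` does not currently return.

```python
def confusion_matrix(y_true, y_pred, num_classes=10):
    """counts[i][j] = number of samples with true class i predicted as class j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


# hypothetical labels, for illustration only
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)

# the diagonal holds the correct predictions
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
```

Off-diagonal entries show which digits the network confuses with each other, which the single accuracy number hides.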