# Optimizing structural manipulations with semantic pointers

The purpose of this notebook is to illustrate how Nengo DL can be used to optimize the parameters of a Nengo model so that it more effectively supports the retrieval of information from highly structured [semantic pointers](https://www.nengo.ai/build-a-brain/index.html). A related and simpler [example notebook](https://github.com/nengo/nengo_examples/blob/nengo_dl/deeplearning/CircularConvolution-Optimized-SoftLIFRate.ipynb) illustrates using Nengo DL to learn circular convolution, which is the basic binding operation used within Nengo to create semantic pointers with internal structure (i.e., pointers that arrange symbol-like representations into things such as lists and trees). Here, we will provide semantic pointers composed of multiple bound items as inputs to a simple network, and then optimize the network's parameters such that when a particular cue is presented, the item associated with this cue in a particular pointer is produced as the network's output.

In [None]:
import string
import nengo
import nengo.spa as spa
import nengo_dl
import numpy as np
import tensorflow as tf

from urllib.request import urlretrieve
import zipfile

import matplotlib.pyplot as plt
%matplotlib inline

## 1. Generate random semantic pointers

The first step is to define a function that produces random examples of structured semantic pointers, cues, and the outputs that correspond to these cues. The generation function below takes parameters that determine how many examples are produced (`n_items`), how many bound pairs each example contains (`n_pairs`), and the vector dimensionality (`dims`). By default, the returned cues and outputs come from the first binding in each example. (Since the generated bindings are all random, it doesn't really matter which one is picked for the target cue-output pairing in this simple case. But if the cues have a specific interpretation, randomization can help ensure that many different cues are included in the generated data.) Each example consists of a collection of role-filler pairs of the following form:

$SP_{TRACE} = Role_A \circledast Filler_A + Role_B \circledast Filler_B + Role_C \circledast Filler_C$ 

where terms like $Role_A$ refer to simpler semantic pointers (i.e., distributed representations), the $\circledast$ symbol denotes circular convolution, and the $SP_{TRACE}$ subscript highlights that the resulting pointer is a compressed or lossy encoding, in accordance with research on the nature of [holographic reduced representations](https://pdfs.semanticscholar.org/6456/98cdb52b0f1fdcac55da91e56f7ffd935d15.pdf).

So, for a given cue (e.g., $Role_B$), the correct output or item to retrieve is the corresponding filler (i.e., $Filler_B$). The model we'll build performs such retrieval with a computation of the following sort:

$SP_{TRACE} \:\: \circledast \sim Role_A \approx Filler_A$

Then, with the Nengo DL simulator, we'll optimize the model's parameters to help ensure that this computation is performed with a high degree of accuracy (i.e., such that the presented cue always produces the correct model output).
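
As a rough, library-free illustration of the binding and retrieval equations above, the sketch below uses plain NumPy rather than Nengo. The helper names (`cconv`, `involution`, `unit_vector`, `cosine`) are made up for this example, and the dimensionality of 1024 is an assumption chosen to keep the retrieval noise small; the notebook itself uses 32 dimensions and neural ensembles.

```python
import numpy as np


def cconv(a, b):
    """Circular convolution of two real vectors, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def involution(a):
    """Approximate inverse (~a): keep a[0] and reverse the remaining elements."""
    return np.concatenate(([a[0]], a[:0:-1]))


def unit_vector(rng, dims):
    """Random unit-length vector, standing in for a semantic pointer."""
    v = rng.randn(dims)
    return v / np.linalg.norm(v)


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


rng = np.random.RandomState(0)
dims = 1024  # assumption: much larger than the notebook's 32, to reduce noise

role_a, fill_a, role_b, fill_b = (unit_vector(rng, dims) for _ in range(4))

# SP_TRACE = Role_A (*) Filler_A + Role_B (*) Filler_B
trace = cconv(role_a, fill_a) + cconv(role_b, fill_b)

# SP_TRACE (*) ~Role_A is approximately Filler_A
retrieved = cconv(trace, involution(role_a))

sim_correct = cosine(retrieved, fill_a)
sim_other = cosine(retrieved, fill_b)
print("similarity to Filler_A:", round(sim_correct, 2))
print("similarity to Filler_B:", round(sim_other, 2))
```

The retrieved vector is only approximately equal to the filler, so the useful comparison is relative: it should be clearly more similar to the bound filler than to any other item.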

In [None]:
def get_data(n_items, n_pairs, dims, seed, randomize_cues=False):

    state = np.random.RandomState(seed)
    vocab = spa.Vocabulary(dimensions=dims, rng=state, max_similarity=1)
    
    # create keys for identifying roles and fillers in each example
    roles = ['ROLE_' + char for char in string.ascii_uppercase[:n_pairs]]
    fills = ['FILL_' + char for char in string.ascii_uppercase[:n_pairs]]
    
    # initialize arrays of shape (n_items, n_steps, dims)
    traces = np.zeros((n_items, 1, dims))
    cues = np.zeros((n_items, 1, dims))
    targets = np.zeros((n_items, 1, dims))
    
    # iterate through all of the examples to be generated
    for n in range(n_items):
        n_roles = [role + str(n) for role in roles]
        n_fills = [fill + str(n) for fill in fills]
        
        pairs = zip(n_roles, n_fills)
        pair_keys = []
        
        # create a binding key for each pair and add bound items to vocab
        for x, y in pairs:
            pair_keys.append(x + '*' + y)
            vocab.add(x, vocab.create_pointer())
            vocab.add(y, vocab.create_pointer())

        # create key for the 'trace' of bound pairs (i.e. a structured SP)
        trace_key = 'TRACE_' + str(n)
        trace_ptr = vocab.parse('+'.join(pair_keys))
        trace_ptr.normalize()
        vocab.add(trace_key, trace_ptr) 
        
        # pick which bound pair supplies the role (cue) and filler (output);
        # randint excludes its upper bound, so this covers all n_pairs, and
        # drawing from the seeded state keeps the choice reproducible
        idx = state.randint(0, n_pairs) if randomize_cues else 0
        
        # fill array elements corresponding to this example
        traces[n, 0, :] = vocab[trace_key].v
        cues[n, 0, :] = vocab[n_roles[idx]].v
        targets[n, 0, :] = vocab[n_fills[idx]].v

    return traces, cues, targets, vocab

## 2. Define the model

Next, we'll define a Nengo model that retrieves items from structured semantic pointers that are provided as input. We'll also produce some data for testing retrieval accuracy. Note that in this notebook, we won't use the Nengo [SPA library](https://github.com/nengo/nengo_spa) for defining model components, since it is useful to first understand what is going on strictly in terms of basic ensembles and connections.

In [None]:
seed = 98
dims = 32

n_inputs = 20
n_pairs = 2

pointers, cues, targets, vocab = get_data(n_inputs, n_pairs, dims, seed=seed)

with nengo.Network(seed=seed) as net:
    # use rectified linear neurons to ensure differentiability
    net.config[nengo.Ensemble].neuron_type = nengo.RectifiedLinear()
    net.config[nengo.Connection].synapse = None
    
    # provide a pointer and a cue as input to the network
    ptr_inp = nengo.Node(vocab['TRACE_0'].v)
    cue_inp = nengo.Node(vocab['ROLE_A0'].v)
    
    # create a convolution network to use the cue to retrieve some item
    cconv = nengo.networks.CircularConvolution(5, dims, invert_b=True)
    
    # connect the trace and cue inputs to the circular convolution network
    nengo.Connection(ptr_inp, cconv.input_a)
    nengo.Connection(cue_inp, cconv.input_b)

    # probe the output
    out = nengo.Probe(cconv.output)
    out_filtered = nengo.Probe(cconv.output, synapse=0.01)
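
The model above passes `invert_b=True` to the convolution network so that the cue is inverted before binding. The NumPy sketch below is an illustration of the underlying math, not of Nengo's internal implementation: it assumes the inversion reverses the order of all elements except the first (the involution), and checks that convolving with an involved vector is the same as circular correlation. The helper names are hypothetical.

```python
import numpy as np


def involution(b):
    """Keep b[0] and reverse the rest: the approximate inverse ~b."""
    return np.roll(b[::-1], 1)


def cconv(a, b):
    """Circular convolution via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def ccorr(a, b):
    """Circular correlation from its definition: c[k] = sum_j b[j] * a[(j + k) % n]."""
    n = len(a)
    return np.array(
        [sum(b[j] * a[(j + k) % n] for j in range(n)) for k in range(n)]
    )


rng = np.random.RandomState(1)
dims = 32
a, b = rng.randn(dims), rng.randn(dims)

# convolving with the involution of b ...
via_involution = cconv(a, involution(b))
# ... gives the same vector as correlating with b directly
direct = ccorr(a, b)

# in the Fourier domain, involution of a real vector is complex conjugation
same_in_fourier = np.allclose(np.fft.fft(involution(b)), np.conj(np.fft.fft(b)))

print(np.allclose(via_involution, direct), same_in_fourier)
```

Because involution is just a fixed permutation of the input, it adds no extra neural computation; only the product step of the convolution has to be approximated by neurons.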

## 3. Test baseline retrieval accuracy

Because the decoded outputs of the network are vectors, we'll first define a simple function that can be used to evaluate whether these decoded outputs are nearest to the vectors corresponding to each target output (i.e., the semantic pointer for the filler associated with the particular role provided as an input cue).  
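
Before the actual function, here is a small NumPy-only sketch of the nearest-neighbour idea it relies on. The three-item vocabulary, the noise level, and the deliberately mismatched second target are all made up for illustration; this is not the notebook's `accuracy` function, which works on simulator data.

```python
import numpy as np

rng = np.random.RandomState(0)
dims = 32

# hypothetical three-item vocabulary of unit vectors (one row per item)
keys = ['FILL_A0', 'FILL_A1', 'FILL_A2']
vectors = rng.randn(len(keys), dims)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# pretend network outputs: slightly noisy copies of items 2 and 0
outputs = vectors[[2, 0]] + 0.05 * rng.randn(2, dims)

# dot product of every vocabulary item with every output: shape (3, 2)
sims = np.dot(vectors, outputs.T)

# best-matching vocabulary key for each output (argmax over vocabulary items)
predicted = [keys[i] for i in np.argmax(sims, axis=0)]

# second target is deliberately wrong, to show a partial score
targets = ['FILL_A2', 'FILL_A1']
ratio = sum(p == t for p, t in zip(predicted, targets)) / len(targets)
print(predicted, ratio)
```

The score is a fraction of exact key matches, so an output that is close to the right vector but slightly closer to another vocabulary item still counts as an error.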

In [None]:
def accuracy(sim, probe, vocab, targets, t_step=-1):
    # provide a simulator instance, the probe being evaluated, the vocab,
    # the target vocab keys, and the time step at which to evaluate

    # determine batch_size and create (batch_size, dims) array of outputs
    bsize = sim.data[probe].shape[0]
    output = sim.data[probe][:, t_step, :]
    output = np.reshape(output, (bsize, vocab.dimensions))

    # compute similarity between each output and vocab item, then get key
    # for the vocab item with highest similarity to each output
    sims = np.dot(vocab.vectors, output.T)
    idxs = np.argmax(sims, axis=0)
    predicted = [vocab.keys[idxs[i]] for i in range(len(idxs))]

    # compare the targets to predicted, return percent accuracy
    pairs = list(zip(targets, predicted))
    ratio = sum([x == y for x, y in pairs]) / len(pairs)

    return ratio

Now, we can run the model on some test data to see what the baseline retrieval accuracy is. Since we used only a small number of neurons for each product computation in the circular convolution network, we should expect mediocre results.

In [None]:
# create a list of target keys for the outputs produced by each input cue
target_keys = ['FILL_A' + str(n) for n in range(n_inputs)]

# create input and output data feeds for running the Nengo DL simulator 
test_inputs = {ptr_inp: pointers, cue_inp: cues}
test_outputs = {out: targets}

with nengo_dl.Simulator(net, minibatch_size=n_inputs, seed=seed) as sim:
    # run the simulator for one time step to compute the network outputs 
    sim.step(input_feeds=test_inputs)

print('Retrieval accuracy: ', accuracy(sim, out, vocab, target_keys))

These results indicate that the model is only performing accurate retrieval ten percent of the time, which means that this network is not very capable of manipulating structured semantic pointers in a useful way. 

We can also run the simulator with the default inputs specified in the model definition to create a plot for visualizing the retrieval procedure. 

In [None]:
with nengo_dl.Simulator(net, seed=seed) as sim:
    sim.run(0.1)

In [None]:
plt.figure(figsize=(8, 5))
output_vocab = vocab.create_subset(["FILL_A%d" % i for i in range(10)]) 
plt.plot(sim.trange(), nengo.spa.similarity(sim.data[out_filtered], output_vocab))
plt.legend(output_vocab.keys, loc=4)
plt.ylim([-1, 1])
plt.xlabel("t [s]")
plt.ylabel("Similarity");

Recall that in the model definition above, we provided `ROLE_A0` as the default input cue, in which case the correct output is `FILL_A0`. The actual output, by comparison, is not particularly similar to this desired output, which illustrates that the model is not performing accurate retrieval.

## 4. Optimize the model parameters

Now, we'll train the network parameters to optimize retrieval accuracy by trying to minimize the mean squared error between the model's output vectors and the vectors corresponding to the correct output items for each input cue. We'll use a large number of training examples that are distinct from our test data, so as to avoid explicitly fitting the model parameters to the test items.
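
To make the training objective concrete, the following NumPy sketch computes a mean squared error between output and target arrays with the same `(n_items, n_steps, dims)` layout used for the data feeds. The `mse` helper and the synthetic arrays are illustrative only, and averaging over every element is an assumption; the exact reduction Nengo DL applies for its `'mse'` objective may differ in detail.

```python
import numpy as np


def mse(outputs, targets):
    """Squared error averaged over all examples, time steps, and dimensions."""
    return float(np.mean((outputs - targets) ** 2))


rng = np.random.RandomState(0)

# synthetic targets with the (n_items, n_steps, dims) layout
targets = rng.randn(20, 1, 32)

# two hypothetical sets of model outputs: one far from the targets, one close
noisy_outputs = targets + 0.5 * rng.randn(20, 1, 32)
cleaner_outputs = targets + 0.1 * rng.randn(20, 1, 32)

print("error before training (hypothetical):", round(mse(noisy_outputs, targets), 3))
print("error after training (hypothetical): ", round(mse(cleaner_outputs, targets), 3))
```

Lowering this error pulls each output vector toward its target, which in turn makes the target the most likely nearest neighbour when accuracy is measured.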

To make the example run a bit quicker, we'll download some pretrained model parameters by default. Set `do_training = True` to train the model yourself.

In [None]:
sim = nengo_dl.Simulator(net, minibatch_size=20, seed=seed)

# pick an optimizer and learning rate
optimizer = tf.train.RMSPropOptimizer(1e-3)    

do_training = False
if do_training:
    # create training data and data feeds
    pointers, cues, targets, _ = get_data(n_items=5000, n_pairs=2, dims=dims, seed=seed+1)
    train_inputs = {ptr_inp: pointers, cue_inp: cues}
    train_outputs = {out: targets}

    # train the model
    sim.train(train_inputs, train_outputs, optimizer, n_epochs=100, objective='mse')
    sim.save_params('./spa_retrieval_params')

else:
    # download pretrained parameters
    urlretrieve(
        "https://drive.google.com/uc?export=download&id=0BxRAh6Eg1us4aGo3LVdkYm5xeFE",
        "spa_retrieval_params.zip")
    with zipfile.ZipFile("spa_retrieval_params.zip") as f:
        f.extractall()
        
    # load parameters
    sim.load_params('./spa_retrieval_params')
    

## 5. Test improved retrieval accuracy

We can now recompute the network outputs with our test data using the trained model. As these results illustrate, it is possible to boost retrieval accuracy from approximately 10% to 95% using a relatively small amount of training data. You can modify the dimensionality of the SPs and the number of bound pairs in each SP to determine how these variables influence the upper bound on retrieval accuracy.
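
For a rough feel of how the number of bound pairs affects retrieval, the sketch below measures the similarity between an ideally unbound vector and the true filler, using exact NumPy arithmetic instead of neurons. It is only a proxy for the upper bound mentioned above, not a measurement of the trained network; the helper names, the 200-trial average, and the chosen pair counts are assumptions made for this example.

```python
import numpy as np


def cconv(a, b):
    """Circular convolution via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def involution(a):
    """Approximate inverse: keep a[0] and reverse the remaining elements."""
    return np.roll(a[::-1], 1)


def mean_similarity(n_pairs, dims, n_trials=200, seed=0):
    """Average cosine similarity between the unbound vector and the true filler."""
    rng = np.random.RandomState(seed)
    total = 0.0
    for _ in range(n_trials):
        # random unit-length roles and fillers, one row per bound pair
        roles = rng.randn(n_pairs, dims)
        fills = rng.randn(n_pairs, dims)
        roles /= np.linalg.norm(roles, axis=1, keepdims=True)
        fills /= np.linalg.norm(fills, axis=1, keepdims=True)

        # superimpose all bound pairs, then unbind with the first role
        trace = sum(cconv(r, f) for r, f in zip(roles, fills))
        retrieved = cconv(trace, involution(roles[0]))

        # fills[0] has unit length, so only the retrieved norm is needed
        total += np.dot(retrieved, fills[0]) / np.linalg.norm(retrieved)
    return total / n_trials


results = {n: mean_similarity(n, dims=32) for n in (2, 4, 8)}
for n_pairs, sim_value in results.items():
    print(n_pairs, "pairs:", round(sim_value, 2))
```

Calling `mean_similarity` with a larger `dims` is one way to explore the other variable mentioned above.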

In [None]:
sim.step(input_feeds=test_inputs)
print('Retrieval accuracy: ', accuracy(sim, out, vocab, target_keys))

# reset and run for 100 milliseconds to create plot
sim.soft_reset()
sim.run(0.1)

In [None]:
plt.figure(figsize=(8, 5))
plt.plot(sim.trange(), nengo.spa.similarity(sim.data[out_filtered][0], output_vocab))
plt.legend(output_vocab.keys, loc=4)
plt.ylim([-1, 1])
plt.xlabel("t [s]")
plt.ylabel("Similarity");

sim.close()

Notice that in the plot above, the output is now most similar to `FILL_A0`, which is the correct output for the default input cue `ROLE_A0`. 

In the next tutorial, we'll look at optimizing the temporal trajectory of a similar model, in which a structured SP is built up over time by binding together sequentially presented input items. 