# Predictive Coding

Although earlier work from the 1980s (and even theoretical work dating back to the 1860s!) had already laid out many of its components, predictive coding was popularized in January 1999 by the paper "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects" ([Rao and Ballard 1999](https://www.nature.com/articles/nn0199_79)).

The two authors had studied visual processing in the visual cortex from both a neuroscience and a computer-science angle. By the late 90s, there were already well-established findings on how single neurons in the visual cortex respond to visual stimuli. However, some peculiar properties, such as the end-stopping of receptive fields, were not yet well explained. To better understand where Rao and Ballard were coming from, let's do a very brief crash course on visual cortex circuitry as it was understood up to 1999.

## Neuroscience background

- Hubel and Wiesel (1965, 1968) showed that neurons in the early visual cortex of cats and monkeys respond selectively to bars of light.
- Some of these tuned neurons show 'end-stopping' (end-inhibition): their response is reduced or eliminated when the stimulus extends beyond the classical receptive field (RF). These 'extra-classical' RF neurons occur in V1, V2, V4, and MT.
- Early explanations of extra-classical RF effects relied on specifically visual arguments, such as corner detection and line termination (Bolz, Gilbert & Wiesel, "Pharmacological analysis of cortical circuitry", *Trends in Neurosciences* 12(8), 1989, pp. 292-296) or occlusion (Gorea (ed.), *Representations of Vision: Trends and Tacit Assumptions in Vision Research*, Cambridge University Press, 1991). Such purely visual arguments are hard to extend to other cortical areas, because they treat the visual cortex as a special part of the brain without generalisable properties.
- Early versions of what later became known as predictive coding had already been proposed for stages of the visual system that come before the visual cortex:
    - The retina
        - Atick, J. J. (1992). Could information theory provide an ecological theory of sensory processing? *Network: Computation in Neural Systems*.
        - Buchsbaum, G., & Gottschalk, A. (1983). Trichromacy, opponent colours coding and optimum colour information transmission in the retina. *Proceedings of the Royal Society of London. Series B …*
        - Srinivasan, M. V., Laughlin, S. B., & Dubs, A. (1982). Predictive coding: a fresh view of inhibition in the retina. *Proceedings of the Royal Society of London …*

    - The lateral geniculate nucleus (LGN)
        - Dan, Y., Atick, J. J., & Reid, R. C. (1996). Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. *Journal of Neuroscience*.
        - Dong, D. W., & Atick, J. J. (1995). Temporal decorrelation: a theory of lagged and nonlagged responses in the lateral geniculate nucleus. *Network: Computation in Neural Systems*.


## Predictive Coding - A model of the visual cortex

Rao and Ballard propose a 'predictive coding model' to explain the end-stopping behaviour of neurons in the visual cortex. In contrast to the purely visual explanations listed above, their model is a general one: in principle it can be applied to coding mechanisms of any kind, even outside visual perception.


The key idea they propose is that the brain tries to predict the incoming stimulus based on what it has already learned, and only reacts to the errors in that prediction. This drastically changes the way of thinking. Previously, the predominant view was that inputs to a neural network are transformed (compressed or expanded, and combined with other inputs) to compute some new representation, so that the information flowing through the network always encodes some representation of the input. Rao and Ballard take a different direction: they propose that the information flowing forward through the network is an error signal, whereas the information flowing backward encodes the network's prediction of its input. The authors explain this idea using this schematic:

![Rao and Ballard 1999, fig1a](attachment:image.png)

Let's focus only on the left half of the figure and start at the 'predictive estimator' block to understand what is going on. Given a signal entering the predictive estimator block, the network produces a prediction signal that is sent backwards through the network. This prediction arrives at inhibitory synapses in the input block, so the correctly predicted part of the input is inhibited and only the incorrectly predicted part flows forward. This residual is what Rao and Ballard call the 'feedforward error signal'. The error signal arrives at the predictive estimator, which in turn produces its next prediction, and the loop continues. The authors also give an efficiency argument for their concept:

> "The approach postulates that neural networks learn the statistical regularities of the natural world, signaling deviations from such regularities to higher processing centers. This reduces redundancy by removing the predictable, and hence redundant, components of the input signal."
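To make the loop concrete, here is a minimal sketch of a single predictive estimator. It assumes a linear prediction `x ≈ U @ r` with fixed weights `U` (given orthonormal columns only so the toy example converges predictably); Rao and Ballard additionally learn the weights and include prior terms, both of which are left out here. The estimate `r` plays the role of the predictive estimator, `U @ r` is the feedback prediction, `x - U @ r` is the residual that survives inhibition, and `r` is updated by gradient descent on the squared prediction error. All sizes, the step size `lr` and the variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: a 16-d input explained by 4 latent causes.
n_input, n_latent = 16, 4

# Assumed fixed weights with orthonormal columns (in the paper they are learned).
U, _ = torch.linalg.qr(torch.randn(n_input, n_latent))

# An input the estimator can fully explain, generated from a hidden cause.
r_true = torch.randn(n_latent)
x = U @ r_true

r = torch.zeros(n_latent)  # the predictive estimator's current estimate
lr = 0.3                   # hypothetical inference step size

errors = []
for _ in range(50):
    prediction = U @ r          # feedback prediction sent back to the input stage
    error = x - prediction      # inhibition leaves only the unpredicted residual
    errors.append(error.norm().item())
    r = r + lr * (U.T @ error)  # feedforward error signal updates the estimate

print(f"error norm: first step {errors[0]:.3f}, last step {errors[-1]:.2e}")
```

Because `x` lies in the span of `U`, the residual shrinks towards zero: once the estimator has explained the input, almost nothing is left to send forward, which is the redundancy-reduction argument in miniature. An input component that `U` cannot represent would instead remain as a persistent error signal.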

They designed their model hierarchically: the error signal of one predictive estimator can be thought of as the input to a second predictive estimator, whose role in turn is to predict the activity of the first estimator. In this way, the whole model consists of modules that predict the activity of other modules, and the signals flowing through the network are either 'feedforward error signals' or 'feedback predictions'.
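The hierarchy can be sketched by stacking two such estimators, again assuming linear predictions with fixed weights (`U1` and `U2` are hypothetical names, given orthonormal columns purely so the toy example converges predictably). Level 2 never sees the input directly: it only receives level 1's error signal `e1 = r1 - U2 @ r2`, while level 1 balances the error coming from below against the prediction coming from above. The updates are plain gradient descent on the summed squared prediction errors, which roughly corresponds to the linear case of the Rao and Ballard dynamics with the prior terms dropped, not their full model.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: 16-d input, 8 level-1 causes, 2 level-2 causes.
n_x, n_1, n_2 = 16, 8, 2

# Assumed fixed weights with orthonormal columns (learning them is omitted).
U1, _ = torch.linalg.qr(torch.randn(n_x, n_1))  # level 1 predicts the input
U2, _ = torch.linalg.qr(torch.randn(n_1, n_2))  # level 2 predicts level 1

# An input with structure at both levels: x = U1 @ (U2 @ r2_true).
r2_true = torch.randn(n_2)
x = U1 @ (U2 @ r2_true)

r1 = torch.zeros(n_1)
r2 = torch.zeros(n_2)
lr = 0.3  # hypothetical inference step size

for _ in range(300):
    e0 = x - U1 @ r1   # feedforward error from the input stage to level 1
    e1 = r1 - U2 @ r2  # feedforward error from level 1 to level 2
    # Level 1 is pushed by what it failed to explain below (U1.T @ e0)
    # and pulled towards the feedback prediction from above (-e1).
    r1 = r1 + lr * (U1.T @ e0 - e1)
    # Level 2 only ever sees the error signal of level 1.
    r2 = r2 + lr * (U2.T @ e1)

print("input error:", (x - U1 @ r1).norm().item())
print("level-1 error:", (r1 - U2 @ r2).norm().item())
```

Both errors are computed before either estimate is updated, so each step is a simultaneous gradient step on both levels. After convergence, both error signals are close to zero and `r2` recovers the hidden cause `r2_true`.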

```python
import torch
import torch.nn as nn
```

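```python
class Net(nn.Module):
    # Layer sizes are placeholder assumptions; adjust them to the input at hand.
    def __init__(self, n_in=16, n_hidden=8, n_out=16):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)   # input -> hidden
        self.fc2 = nn.Linear(n_hidden, n_out)  # hidden -> output

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x
```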