---
title: "An illustrated overview of how our brains (might) think — the fascinating intuition of the generative predictive model"
layout: page
---

# An illustrated overview of how our brains (might) think — the fascinating intuition of the generative predictive model

> “Perception is controlled hallucination.”
> — Andy Clark

Paradigm shifts are rare and incredible events. I recently came across an idea that prompted such a shift: an introduction to a modern unified theory of perception. Specifically, the idea that our brain’s primary function is to reduce surprise by developing an increasingly nuanced model of the world. It is constantly predicting the future (on multiple time scales), noticing incorrect predictions, and updating its model to explain away the difference.

Before I came across any of these ideas, I had no sense of the current state of the science around perception and consciousness. Now that I’ve seen the tip of the iceberg, I’ve begun to believe that modern theories of mind across multiple disciplines (psychology, neuroscience, data science & machine learning, not to mention multiple schools of philosophical thought) are beginning to resolve into something that feels much more unified, and surprisingly intuitive. By definition, perception mediates every aspect of human experience, and so any change to the way we understand perception comes with a huge breadth of implications (which I would argue extend beyond the personal to the societal).

These theories come with many names, but I find Andy Clark’s term ‘generative predictive model’ most descriptive. As a disclaimer, much of the below is still speculative theory, though theory supported by a growing body of evidence, so only the future will inform us of the specific errors in this paradigm itself (meta!). My purpose here is to visually share the basics of my understanding in case others who haven’t seen these ideas may find them similarly intriguing. 

## Building Blocks — Neurons and Spikes

We’ve been aware of the significant role of neurons in how brains (and the entire nervous system) work since the 1800s (!), thanks to the work of Santiago Ramon y Cajal, sometimes known as the father of modern neuroscience.

Those looking to understand the neuron in order to analyze and predict its function have created an array of simplified models that abstract away some or all of the lower-level biochemical dynamics. Artificial neural networks, and their modern evolution, deep learning (and deep reinforcement learning), are a category of machine learning techniques that has brought the world increasingly impressive computational feats (e.g. beating humans at image recognition, Chess, Go, old-school video games, and Texas Hold’em, among others). These networks are built from a vastly simplified neuron that does nothing more than take a bunch of inputs, perform some addition and multiplication, and pass the result on. In these cases, of course, it is the structure of the network (the way the neurons are connected) and the algorithm by which these networks learn that enable advanced capabilities.
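
To make that abstraction concrete, here is a minimal sketch (in Python, with made-up weights and inputs) of the simplified neuron such networks are built from: multiply each input by a weight, sum, and squash the result through a nonlinearity.

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    """The vastly simplified 'neuron' of artificial neural networks:
    a weighted sum of inputs passed through a nonlinearity."""
    total = np.dot(inputs, weights) + bias       # the addition and multiplication
    return 1.0 / (1.0 + np.exp(-total))          # sigmoid activation, between 0 and 1

# Hypothetical values: three incoming signals and their learned weights.
inputs = np.array([0.9, 0.1, 0.4])
weights = np.array([1.5, -2.0, 0.7])
print(artificial_neuron(inputs, weights, bias=-0.5))  # ~0.72: fairly strongly 'active'
```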

*Three views of a neuron. Left: real neuron (calcium imaging); center: illustration; right: schematic.*

In reality, an incredibly complex system of biochemical pathways is at work in any given neuron, but it can be useful to zoom out a bit and view neurons as information processing units.

Information is passed in from other neurons as voltage spikes through the vast branching network of root-like dendrites, and moves toward the cell body. If the right concurrence of these signals arrives at the right time, the cell will activate and pass its own voltage spike down its axon (the main outgoing channel, which can be up to a meter long for motor neurons) to the dendrites of other downstream neurons.

*Propagation of an action potential (voltage spike) down an axon.*

The neuron’s anatomy allows it to precisely detect patterns (possibly many thousands of distinct ones) and pass on their detection to any other neurons it is connected with.

## Hierarchy

There is good evidence that our neocortex (the millimeters-thick, dinner-napkin-sized sheet of tissue that evolved most recently in mammal brains) is organized hierarchically, with the lower levels handling raw sensory input — data from our eyes, ears, nose, skin, muscles, etc. — and each higher region taking inputs from the outputs of the regions below. There is also evidence that each region, regardless of which senses it receives information from, is performing exactly the same kinds of processing: A) finding and encoding relevant structure in its inputs, B) building a model to explain the structure seen, and C) using this model to predict future events.

### A) Finding Structure (pattern recognition)

In the nomenclature of Hawkins et al. at Numenta, finding structure is a process of spatial pooling. Individual neurons in a region (as introduced above) learn spatial patterns of incoming inputs. For example, a neuron can learn to detect the coincidence of a few thousand incoming messages from the region below. It does so by learning associations over time. There are multiple mechanisms by which this learning occurs, and the actual algorithms our biology uses are a topic of debate and ongoing research, but a basic form, called Hebbian learning, is summarized as ‘neurons that fire together, wire together’.

*Finding structure in a hierarchically lower region. In this simple example, the highlighted cell takes input from three neurons in the region below, and therefore activates when a pattern involving those 3 cells is detected. The following diagrams represent cells and regions in 2D, as shown on the right.*

In the above diagram, if neurons a, b and c consistently fire together over time, their connections to d (in region 2) will strengthen, and as a result d will learn to represent (or summarize) this feature of the activity in region 1. Because each neuron in region 2 learns to represent a similar coincidence (or a more complex pattern), together they cover the patterns seen in R1, and R2 forms a representation of its inputs using fewer neurons. In the data science world this is called dimensionality reduction, and it means that the region has efficiently encoded the information coming in.
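
As a rough sketch of the ‘fire together, wire together’ idea (a toy rule, not the actual biological mechanism): whenever an input cell and the downstream cell d are active at the same time, the connection between them is strengthened.

```python
import numpy as np

def hebbian_update(weights, pre_activity, post_active, rate=0.1):
    """Strengthen the connections whose presynaptic cells fired
    while the postsynaptic cell also fired."""
    if post_active:
        weights = weights + rate * pre_activity
    return weights

# Connections from neurons a, b, c in region 1 onto neuron d in region 2.
w = np.zeros(3)
for _ in range(20):                       # a, b and c repeatedly fire together
    pre = np.array([1.0, 1.0, 1.0])       # activity of a, b, c
    d_fired = True                        # d happens to fire at the same time
    w = hebbian_update(w, pre, d_fired)
print(w)  # all three connections strengthened: d now summarizes the a+b+c pattern
```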

*Feed-forward pattern detection. The cell in the top region is set up to be activated by brightness at the top of the image, and inhibited by brightness at the bottom. Result: ultra-simplified detection of images that are bright at the top and dark at the bottom. (Key — green: active excitatory neuron; red: active inhibitory neuron; gray/white: inactive; rings: spatial source or ‘receptive field’ of each cell; each ring passes the brightness from that section of the image to the cell above it.)*
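
A minimal sketch of the detector in that figure, assuming the image is just a small grayscale array: pixels in the top half connect through excitatory (positive) weights and pixels in the bottom half through inhibitory (negative) weights, so the cell fires only for images that are bright on top and dark on the bottom.

```python
import numpy as np

def bright_top_detector(image, threshold=0.5):
    """Excitatory input from the top half of the image, inhibitory from the bottom."""
    top, bottom = np.split(image, 2, axis=0)
    drive = top.mean() - bottom.mean()   # brightness above excites, brightness below inhibits
    return drive > threshold             # the cell 'fires' only when the drive is strong enough

bright_sky_dark_ground = np.vstack([np.ones((2, 4)), np.zeros((2, 4))])
print(bright_top_detector(bright_sky_dark_ground))      # True  -> cell active
print(bright_top_detector(1 - bright_sky_dark_ground))  # False -> cell inactive
```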

The illustration above demonstrates the way these feed-forward connections can allow a higher region to ‘summarize’ the raw activity below in a more abstract representation. As a direct result of this abstraction, these representations change less frequently over time, and as such are called invariant representations. In the words of Hawkins:

> “Each region of the cortex learns sequences, develops what I call ‘names’ for the sequences it knows, and passes these names to the next region higher in the cortical hierarchy … This ‘name’ is a group of cells whose collective firing represents the set of objects in the sequence … By collapsing predictable sequences into ‘named objects’ at each region in our hierarchy, we achieve more and more stability the higher we go. This creates invariant representations.”
> — Hawkins, *On Intelligence*

To illustrate this, imagine we are scanning a scene, and our eyes saccade (rapidly move from one point to another) between small pieces of the scene. Imagine our scene contains a dog and a tree, each with a number of its own features.


*Feed-forward invariant representation.*

At left, we have cells that have learned to represent ‘dog’ sensory inputs (underlined in gold), and those that have learned the features of ‘tree’ (underlined in green). The top region in the hierarchy stays active while any of its learned inputs are active. Here the input object (the thing in our view) alternates between tree and dog, and during each, the encodings of its features (bottom layer) are iterated in random order (e.g. as the observer scans the object).

Just as neuroscientists identified a ‘Bill Clinton’ neuron in the minds of study participants, the top right neuron in this illustration activates when our network is exposed to a dog.
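
A toy sketch of this kind of invariant representation, using hypothetical feature names: the higher-level ‘dog’ cell stays active as long as any of the features it has learned are present, even though the lower-level input changes with every saccade.

```python
DOG_FEATURES = {"fur", "snout", "wagging tail", "collar"}   # hypothetical learned inputs
TREE_FEATURES = {"bark", "leaves", "branches"}

def invariant_cell(learned_features, current_feature):
    """A higher-level cell that activates whenever any of its learned features is seen."""
    return current_feature in learned_features

# Saccading over a dog: the low-level input keeps changing, the 'dog' cell stays stable.
for glance in ["snout", "fur", "collar"]:
    print(glance, "-> dog cell:", invariant_cell(DOG_FEATURES, glance),
          "| tree cell:", invariant_cell(TREE_FEATURES, glance))
```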

### B) Modeling & Prediction

Detecting patterns and structure in the region below is one important step in model-building, but to build a model based solely on present sensory input would be to ignore all the cues of present context, and the history that led to the present moment. That’s why actual brain regions are deeply recurrent — they take inputs not only from below, but also receive extensive feedback from regions above, as well as lateral inputs from other cells in the same region. 

Our brains are well adapted to perpetually noisy and incomplete information, and as a result we aggressively extrapolate and interpolate hints from our senses into complete pictures (models) of our environment. Going back to a previous example, if a higher region has built a model of its environment that includes ‘dog’, it likely does two things (both sketched in code after this list):

  1. It biases other cells in the same region (and thus at a similar level of abstraction) via historical associations that have learned probabilistic links to ‘dog’. Perhaps in 30% of recent situations where we perceived dog, we were also in a park. These biases might prime certain networks of cells in the region (learned patterns such as ‘frisbee’, ‘grass’, ‘picnic blanket’) to be more likely to activate.
  2. This group of associated patterns in a region passes down contextual feedback, in the form of predictions or expectations, to lower regions. If this brain region could talk, it may say: “I’m perceiving a dog, so lower auditory region, you may experience a bark, and lower visual region, you may see a tail or a collar”.
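
A sketch of both effects, with invented association strengths and feature names: perceiving ‘dog’ primes laterally associated patterns in the same region, and sends expectations down to lower sensory regions.

```python
# Hypothetical associations a region might have learned from past experience.
LATERAL_ASSOCIATIONS = {"dog": {"park": 0.3, "frisbee": 0.2, "grass": 0.25}}
TOP_DOWN_PREDICTIONS = {"dog": {"auditory": ["bark"], "visual": ["tail", "collar"]}}

def prime_region(percept, baseline):
    """1) Bias (prime) associated patterns in the same region."""
    primed = dict(baseline)
    for pattern, strength in LATERAL_ASSOCIATIONS.get(percept, {}).items():
        primed[pattern] = primed.get(pattern, 0.0) + strength
    return primed

def predict_downward(percept):
    """2) Pass expectations down to lower sensory regions as predictions."""
    return TOP_DOWN_PREDICTIONS.get(percept, {})

print(prime_region("dog", baseline={"park": 0.1, "beach": 0.1}))
print(predict_downward("dog"))
```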

From Clark:

> “…naturally intelligent systems do not passively await sensory stimulation. Instead, they are constantly active, trying to predict (and actively elicit…) the streams of sensory stimulation before they arrive. Before an ‘input’ arrives on the scene, these pro-active cognitive systems are already busy predicting its most probable shape.”
> — Clark, *Surfing Uncertainty*

## Information as Error

Clark offers a number of compelling arguments that the information traveling up through the hierarchy may be more efficiently encoded as surprise — deviations from expectation — rather than as pure positive information. Various forms of this prediction error mechanism have gained traction, and notably, the 2017 Brain Prize went to three researchers focused on, among other things, demonstrating the role of dopamine in communicating prediction error.

Conveniently, this paradigm matches intuition about how one might design an instrument like the brain to efficiently encode a changing world. Since the state of the world tends to change smoothly (and rarely abruptly), much of our perception at any given moment is similar to what we experienced (saw, heard, felt) a moment before. A favorite illustration: video streaming software leans heavily on exactly this kind of efficiency. Imagine the data required for your favorite streaming service to show a few frames of video of a hawk flying through a blue sky. Rather than transmit the color of each pixel of your screen (mostly blue, still mostly blue, still mostly blue) many times per second, it transmits only the changes from the previous frame (perhaps a handful of pixels become brown and black in front of the hawk, and blue behind).
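
A sketch of that streaming trick, assuming a frame is just an array of pixel values: rather than send every pixel of every frame, send only the (index, new value) pairs that changed since the last frame.

```python
import numpy as np

def frame_delta(previous, current):
    """Encode only what changed since the previous frame:
    (pixel index, new value) pairs instead of the full picture."""
    changed = np.flatnonzero(previous != current)
    return [(int(i), int(current.flat[i])) for i in changed]

sky = np.full((4, 6), 200)        # a mostly blue sky, one grayscale value per pixel
with_hawk = sky.copy()
with_hawk[1, 2:4] = 40            # a couple of dark pixels where the hawk now is

delta = frame_delta(sky, with_hawk)
print(f"{len(delta)} pixels transmitted instead of {sky.size}")   # 2 instead of 24
```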

As we’ve seen, unlike video services, which know very little about the meaning of what they’re displaying, each region of our brain leverages its model of the dynamics of the world to predict the incoming stream of information. As we watch a hawk cross the sky, our model tells us that this is an object in front of a distant background moving to the right. Each level of the hierarchy already has its predictions for exactly what it’s going to ‘see’ next, and thus, in order to stay in sync with the world, all we need to do is adjust our models based on the error between our top-down and lateral predictions and the upward flow of sensory input: that is, our surprise.

*Prediction error (deviation from the expected position of the circle) is shown in red. Our simplistic model learns to predict linear motion, but is ‘surprised’ by each bounce, resulting in a spike in prediction error.*
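
A minimal recreation of that figure’s logic, with invented numbers: a predictor that simply extrapolates the most recent motion is accurate while the circle moves in a straight line, and registers a spike of prediction error right after each bounce.

```python
def simulate(steps=20, floor=0.0, ceiling=10.0):
    """Track a bouncing point with a naive linear predictor and print the surprise."""
    position, velocity = 5.0, 1.0
    previous = position
    for _ in range(steps):
        predicted = position + (position - previous)   # assume the recent motion continues
        previous = position
        position += velocity                           # the world actually moves the point
        if position >= ceiling or position <= floor:
            velocity = -velocity                       # the bounce the model never expects
        error = abs(position - predicted)              # prediction error: the 'surprise'
        print(f"position {position:5.1f}   error {error:.1f}")

simulate()
```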

## Putting it Together

So, to summarize, at any given moment in time, each region of our brain runs a continuous loop:

  1. Find and encode structure in the inputs arriving from the regions below.
  2. Use its model (plus context from regions above and from cells alongside) to predict what those inputs will be next.
  3. Compare the prediction with what actually arrives, and pass the difference (the surprise) up the hierarchy.
  4. Update the model to explain away that surprise, and issue new predictions downward.
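
As a very loose sketch of that loop (a caricature with a single scalar ‘model’, not anyone’s actual proposal): predict the incoming signal, measure the surprise, pass only the surprise onward, and update the model to explain it away.

```python
class TinyRegion:
    """One region's loop in caricature: predict, compare, pass up the surprise, update."""
    def __init__(self):
        self.expected = 0.0

    def step(self, bottom_up):
        prediction = self.expected          # what the region expects to receive
        error = bottom_up - prediction      # surprise: how wrong was the prediction?
        self.expected += 0.5 * error        # update the model to explain away the error
        return error                        # only the surprise travels up the hierarchy

region = TinyRegion()
for signal in [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]:    # a steady world that suddenly changes
    print(f"input {signal:.1f}   surprise {region.step(signal):+.2f}")
```

Notice that the surprise shrinks while the input stays predictable and spikes the moment the world changes, exactly the behavior described above.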

## Implications

When we zoom out from all of this, we can see that our brains are in essence prediction machines that strive to minimize surprise by recognizing patterns and associating them with other patterns. Since the information we receive is noisy and often extremely incomplete, we’ve adapted to aggressively fill in gaps or generalize out from a small set of perceptions. We’re not always successful, but overall we’re shockingly good at these tasks. We are constantly predicting what’s about to happen, and because our predictions are never exactly right we continuously update an ever-changing model of our ever-changing environment.

Most, if not all, of our cognitive biases flow naturally from this understanding of perception. Many such biases are not weaknesses or failures of the system evolution has developed for us to understand the world, but instances of maladaptation to the modern context.

Another surprising implication is an intuitive reframing of a recently popular argument that we never experience raw reality. What we think of as perception is a ‘controlled hallucination’ to borrow Clark’s words, which our brains have designed to be precisely as simple and useful to us as possible. 

Events that are predicted eventually fade from our conscious experience when they’re no longer useful — we call this habituation. Again, our experience of the world is itself the downward-flowing predictions and feedback generated by our model. Think about the well-known experience of reaching for a staticky doorknob (you ‘feel’ the feared spark well before your hand contacts metal). Our brain’s model integrates and predicts across all of our senses, which is why we can close our eyes and continue walking a surprisingly long distance without tripping (our model of the road is good enough for quite a while).

Regardless of whether the prophets are right, we are most certainly living in a simulation — one that evolved over a few hundred million years, and which we’ve been updating continuously since the day we were born.


## Other resources

  • UTHealth’s neuroscience lectures are all online and totally free, which is incredible. I’m still working through them, as the level of detail goes quite deep, but as a reference for those interested in the biology, they’re invaluable.
  • Computational Neuroscience (Coursera, UW). I can recommend this course to anyone new to computational neuroscience, and not too afraid of differential equations.
  • HTM School, Numenta’s series on learning NuPIC, an open-source implementation of Hierarchical Temporal Memory.