CLA for ML AI Researchers
The Cortical Learning Algorithm (CLA) was developed largely in isolation from the traditional Machine Learning and Artificial Intelligence communities. This document should help members of those communities understand CLA as well as give CLA enthusiasts a reference back into the traditional literature.
Specifically this document has three goals:
- Bridge the terminology gap between CLA and other approaches.
- Highlight novel components of CLA that researchers might integrate into their work.
- Highlight ideas from conventional approaches that CLA developers might want to use to extend CLA.
This should be a short, high level description of CLA using mainline terminology. This description should be accurate, but is intended to provide a context for ML/AI researchers who want to dive into the specifics of CLA.
The Cortical Learning Algorithm is a model of certain aspects of the function of the neocortex. A Hierarchical Temporal Memory (HTM) system (such as NuPIC) comprises:
- Sensory components, which receive data from outside the system, whether from CSV files, databases, network sources, devices, APIs, etc.;
- Encoders, which convert sensory data (floating-point, integer, date, or string values) into large, relatively sparse binary representations similar to Sparse Distributed Representations (SDRs);
- A hierarchy of Regions, each of which learns spatial patterns in its input and temporal sequences of those spatial patterns, and outputs both a representation of the current spatial pattern (as an SDR of active Columns) and a representation of its predictions of future inputs;
- A Classifier, which extracts useful information from the system's outputs. This may include a standard "classification" of the current input, a set of predictions for values of the input at future steps, and an indication of the "anomaly" or surprise level of the current input.
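To make the Encoder stage concrete, here is a minimal sketch of a scalar encoder in the spirit of the one NuPIC provides (the function name and parameter values below are illustrative assumptions, not NuPIC's actual API): a float in a known range is mapped to a wide binary array containing a contiguous run of active bits, so that nearby values share bits.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=400, w=21):
    """Map a scalar to a binary array with a contiguous block of w active bits.

    Nearby values produce overlapping blocks, so semantic similarity
    becomes bit overlap, which is the property the CLA relies on.
    """
    # Clamp into range, then find the start of the active block.
    value = max(min_val, min(max_val, value))
    span = n_bits - w
    start = round((value - min_val) / (max_val - min_val) * span)
    return [1 if start <= i < start + w else 0 for i in range(n_bits)]

a = encode_scalar(50.0)
b = encode_scalar(52.0)
overlap = sum(x & y for x, y in zip(a, b))
```

Because 50.0 and 52.0 map to overlapping blocks, downstream Columns that learn to respond to one will partially respond to the other; this similarity-preserving property is what the Spatial Pooler exploits.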
Each Region is an instance of the CLA. The sequence of processing (Spatial and Temporal Pooling) is as follows:
Spatial Pooling (SP)
- The input is presented to the region as an SDR or bit-array.
- Each Column (a stack of Cells sharing a feedforward input) in the region is connected to a subset of the input bits (its fan-in) via a Proximal Dendrite.
- Each Synapse on the Proximal Dendrite has a Permanence level which, if over a threshold, means that the Synapse is Connected.
- The Column adds up the number of Connected inputs which are also 1s; this count is called its Activation Potential.
- A local or global competitive process (Inhibition) chooses the Columns with the highest Activation Potential and deems them Active.
- A reinforcement process strengthens the connections between Active columns and 1-bits in the input by raising the Permanence of successful Synapses.
- The pattern of active Columns is a learned SDR of the input, and is called the output of the SP.
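The steps above can be sketched in a few lines of Python. This is a toy illustration with made-up sizes, thresholds and learning rates, not NuPIC's implementation; it shows only the overlap count, global inhibition, and permanence update.

```python
import random

random.seed(0)

N_INPUTS, N_COLS, ACTIVE = 64, 32, 4    # toy sizes; real regions use thousands
THRESH, P_INC, P_DEC = 0.5, 0.05, 0.02  # permanence threshold and learning rates

# Each column's Proximal Dendrite: a dict of {input_index: permanence}
# over a random subset (the fan-in) of the input bits.
columns = [
    {i: random.random() for i in random.sample(range(N_INPUTS), 16)}
    for _ in range(N_COLS)
]

def spatial_pool(input_bits, learn=True):
    # 1. Activation Potential: count Connected synapses landing on 1-bits.
    potentials = [
        sum(1 for i, perm in col.items() if perm >= THRESH and input_bits[i])
        for col in columns
    ]
    # 2. Global inhibition: the ACTIVE columns with highest potential win.
    active = sorted(range(N_COLS), key=lambda c: potentials[c], reverse=True)[:ACTIVE]
    # 3. Hebbian-style reinforcement on the winning columns only.
    if learn:
        for c in active:
            for i in columns[c]:
                delta = P_INC if input_bits[i] else -P_DEC
                columns[c][i] = min(1.0, max(0.0, columns[c][i] + delta))
    return sorted(active)

pattern = [1 if i % 4 == 0 else 0 for i in range(N_INPUTS)]
sdr1 = spatial_pool(pattern)
sdr2 = spatial_pool(pattern)  # repeated input -> the same, stabilising SDR
```

Note that learning only ever strengthens a winner's response to the bits that made it win, so repeated presentations of the same input reliably reproduce the same SDR.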
Temporal Prediction or TP (previously known as Temporal Pooling)
1. The SP output (the collection of active Columns) is presented to the TP component.
2. If an active Column contains any predictive Cells (see below), the Cell with the highest Predictive Potential is chosen and becomes Active. If not, see step 8.
3. Each Cell has Distal Dendrites which connect laterally within the Region. Active Cells transmit their signals (as if on "axons") to many Dendrites of neighbouring Cells.
4. Each Dendrite is a coincidence detector, becoming Active if its number of Connected, active inputs exceeds its threshold.
5. A Cell sums its number of active Dendrites to produce its Predictive Potential.
6. There is now a pattern of active Columns (from the SP) and a pattern of prediction.
7. If a Cell was chosen in step 2 because it is in an active Column and had a high Predictive Potential, it reinforces the Synapses from the previously active Cells which caused its successful prediction.
8. If no Cell in a newly active Column is sufficiently predictive, the TP is "confused" and makes all the Cells in that Column active, in a process called Bursting. This usually indicates the start of a new sequence, or may signal an anomaly in the data.
9. The pattern of prediction is used in the next timestep to choose active Cells.
The next Region up (or the Classifier) uses these patterns as input.
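The choose-or-burst logic above can be sketched as follows. This is a deliberately stripped-down illustration: the data structures are assumptions, the dendrite and permanence machinery is omitted, and for simplicity every correctly predicted cell in a column is activated rather than just the one with the highest Predictive Potential.

```python
CELLS_PER_COLUMN = 4

def tp_step(active_columns, predictive_cells):
    """One simplified Temporal Prediction step.

    active_columns   -- column indices chosen by the SP this timestep
    predictive_cells -- set of (column, cell) pairs predicted last timestep
    Returns (active_cells, bursting_columns).
    """
    active_cells, bursting = set(), []
    for col in active_columns:
        predicted = [(col, c) for c in range(CELLS_PER_COLUMN)
                     if (col, c) in predictive_cells]
        if predicted:
            # The prediction was correct: predicted cells become active.
            active_cells.update(predicted)
        else:
            # No prediction matched: the column bursts and every cell fires.
            bursting.append(col)
            active_cells.update((col, c) for c in range(CELLS_PER_COLUMN))
    return active_cells, bursting

# A predicted column activates a single cell; an unpredicted one bursts.
cells, burst = tp_step([3, 7], predictive_cells={(3, 1)})
```

The asymmetry is the point: a correctly predicted column produces a precise, single-cell code for "this input in this sequence context", while bursting produces an ambiguous all-cells code that both marks surprise and lets learning begin on a new sequence.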
Characterisation of Learning in CLA
Learning in CLA is superficially similar to Hebbian learning in classical Neural Networks, in that there is a scalar value, called Permanence, on the connection (Synapse) between an input and the Cell. In a NN the number on the connection is called a weight, and the input to the neuron is the scalar product of the input values and the weights. In CLA, the input is a binary value, and the Synapse acts as a gate which passes the input through unchanged if and only if the Permanence exceeds a threshold.
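The contrast can be shown side by side (an illustrative sketch; the function names and values are made up):

```python
def nn_input(inputs, weights):
    # Classical NN: the contribution is a weighted sum (scalar product).
    return sum(x * w for x, w in zip(inputs, weights))

def cla_input(input_bits, permanences, threshold=0.5):
    # CLA: each synapse is a binary gate; a 1-bit passes through
    # unchanged iff its permanence is at or above the threshold.
    return sum(1 for bit, perm in zip(input_bits, permanences)
               if bit and perm >= threshold)

bits  = [1, 1, 0, 1]
perms = [0.9, 0.2, 0.8, 0.6]
# nn_input accumulates fractional contributions from every connection;
# cla_input counts only the connected 1-bits (synapses 0 and 3 here).
```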
Another vital distinction is that the "sum" of the gated inputs is not directly transformed into the "output" of the Cell, as happens in NN neurons. Instead, each Column stores this value as a Potential, which is used in a competitive process to choose the top x% (usually 2%) from the many thousands of Columns in the Region. The losing Columns are inhibited and their output is clamped to zero.
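The inhibition step in isolation might look like this (again an illustrative sketch with assumed parameter values):

```python
import random

def inhibit(potentials, sparsity=0.02):
    # Keep the top `sparsity` fraction of columns; everything else is
    # inhibited, i.e. its output is clamped to zero.
    k = max(1, int(len(potentials) * sparsity))
    cutoff = sorted(potentials, reverse=True)[k - 1]
    winners = [i for i, p in enumerate(potentials) if p >= cutoff]
    return winners[:k]  # break ties among equal potentials by column index

random.seed(1)
potentials = [random.randint(0, 50) for _ in range(2048)]
active = inhibit(potentials)  # 2% of 2048 -> 40 active columns
```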
The Permanences are updated using a Hebbian-style learning rule, in that successful recognitions by Columns lead to strengthening of the Synapses on active (1-bit) inputs. SP Columns can thus be regarded as "feature recognisers".
One distinction which needs to be noted is that a "feature" or combination of features is recognised not by individual Cells or Columns (as in Boltzmann machines) but by a set of Columns which together represent the input as an SDR. If the input changes by a small amount, only a few Columns will drop out of or join the SDR; thanks to their sparseness, SDRs are very robust to noise in the inputs.
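A quick simulation shows this robustness (illustrative only; the noise model of swapping four columns is an assumption):

```python
import random

random.seed(2)
N, ACTIVE = 2048, 40  # typical region size and 2% sparsity

sdr = set(random.sample(range(N), ACTIVE))

# Perturb the representation: a few active columns drop out and are
# replaced by others, as happens when the input changes slightly.
dropped = set(random.sample(sorted(sdr), 4))
added = set(random.sample(sorted(set(range(N)) - sdr), 4))
noisy = (sdr - dropped) | added

overlap = len(sdr & noisy)  # 36 of 40 bits still match

# An unrelated SDR of the same sparsity, by contrast, barely overlaps:
# the expected overlap of two random 40-of-2048 codes is under one bit.
unrelated = set(random.sample(range(N), ACTIVE))
```

So a matcher that accepts, say, 30 of 40 overlapping bits tolerates substantial noise while having a vanishingly small chance of confusing two unrelated representations.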
It is possible to view the Spatial Pooler as a special variety of Self-Organising Map, one in which very high-dimensional binary vectors are mapped to very high-dimensional binary vectors using a competitive reinforcement learning algorithm to control an extremely large binary gate array.
The TP uses an even larger version of the same process to learn transitions between SDRs. In a typical NuPIC Region, there are 2048 Columns (of which 2%, or about 40, are active at any time), each containing 32 Cells, and there may be 2–300 million lateral predictive connections.
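These figures imply an enormous representational capacity, which a few lines of arithmetic make concrete (the overlap probability uses the standard hypergeometric formula; the specific sizes are just the typical region figures quoted above):

```python
import math

n, w = 2048, 40  # columns in a typical region; active columns (2%)

# Number of distinct SDRs with exactly w active columns out of n.
capacity = math.comb(n, w)  # roughly 10**85 distinct codes

# Probability that a random SDR shares exactly half of its active
# columns with a given one: hypergeometric P(X = 20), effectively zero,
# which is why partial SDR matching is so noise-tolerant.
p_half_overlap = math.comb(w, 20) * math.comb(n - w, w - 20) / math.comb(n, w)
```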