Matthew Taylor edited this page Jan 5, 2017 · 22 revisions

Encoders turn different data types into Sparse Distributed Representations (SDRs). They take external inputs and convert them into a binary representation understood by the CLA, similar to how the retina or cochlea take external signals and convert them into binary neural representations.

Video lecture: See this part of the CLA Basics talk to understand how encoders work for scalars.

Scalar Encoding

Datetime Encoding

Creating Encoders

When creating an encoder, the main focus is to make sure the semantics of the particular data type are captured. Specifically, this means that similar values should have a lot of overlapping 1's in their encodings while values that are not similar should have very few or none.
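One way to check this property is to count the 1 bits two encodings share. The helper below is a minimal sketch, not part of the library; the name overlap and the use of bit strings are assumptions made for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Count the positions where both bit strings have a 1.

    Hypothetical helper for illustration; encodings are plain strings of
    '0' and '1' characters here.
    """
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(1 for x, y in zip(a, b) if x == "1" and y == "1")


# Two encodings that share two on bits, and two that share none.
similar = overlap("111000000000", "011100000000")
dissimilar = overlap("111000000000", "000111000000")
print(similar, dissimilar)
```

A well-designed encoder should give a high count for similar inputs and a count near zero for dissimilar ones.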

Current Encoders (see code)

  • Scalar
  • Adaptive Scalar
  • Category
  • Date
  • Coordinate
  • Geospatial Coordinate
  • Delta (derived encoder for scalars)
  • Log
  • Multi - Takes multiple values and creates a combined SDR from multiple other encoders.
  • Non-uniform scalar
  • PassThru (Identity)
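As a rough illustration of the Multi entry above, the sketch below combines the outputs of several encoders by concatenating them. Concatenation is an assumption about how the combined SDR is formed, and encode_multi and the two toy sub-encoders are hypothetical names, not the library's API.

```python
def encode_multi(encoders, values):
    """Encode each value with its own encoder and join the results.

    Assumption: the combined SDR is the concatenation of the parts.
    """
    if len(encoders) != len(values):
        raise ValueError("need exactly one value per encoder")
    return "".join(encode(value) for encode, value in zip(encoders, values))


# Toy stand-in encoders, for illustration only.
def encode_color(name):
    return {"red": "1100", "green": "0011"}[name]


def encode_flag(flag):
    return "110" if flag else "011"


combined = encode_multi([encode_color, encode_flag], ["red", True])
print(combined)
```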

Scalar Encoders

Our scalar encoders use a few parameters to determine the encoding for a given value. A minval and a maxval determine the range of values that can be encoded. Then there is a total number of bits, n, and a width, w. The encoding has n total bits, of which w are on (1's). Values are put into buckets: there are (n-w)+1 buckets, each representing an equally sized value range between minval and maxval. The smallest bucket is represented with the first w bits on and the rest off. Each next larger bucket is represented by shifting the on bits to the right by one position. In this way, adjacent buckets have the most overlap, which helps to capture the semantics of scalar values.

Example: A scalar encoder with a range from 0 to 100 with n=12 and w=3 will produce the following encodings:

  • 1   becomes 111000000000
  • 7   becomes 111000000000
  • 15 becomes 011100000000
  • 36 becomes 000111000000

The first thing to note is that values that fall into the same bucket are represented identically, as you can see with 1 and 7. For values that fall into separate buckets, however, the closest buckets share the most overlapping bits. For instance, there are two overlapping bits for 7 and 15 but only one for 15 and 36, and there aren't any for 7 and 36.
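The bucket scheme described above can be sketched in a few lines. This follows the description on this page rather than the library's implementation, which may treat bucket boundaries differently; encode_scalar is a hypothetical name, and placing maxval in the last bucket is an assumption.

```python
def encode_scalar(value, minval, maxval, n, w):
    """Return an n-character bit string with w consecutive on bits.

    Sketch of the bucket scheme described above, not the library's code.
    """
    if not minval <= value <= maxval:
        raise ValueError("value is outside [minval, maxval]")
    num_buckets = n - w + 1
    bucket_width = (maxval - minval) / num_buckets
    # Assumption: maxval itself falls into the last bucket.
    bucket = min(int((value - minval) / bucket_width), num_buckets - 1)
    # The bucket index is the number of positions the on bits are shifted right.
    return "0" * bucket + "1" * w + "0" * (n - w - bucket)


for value in (1, 7, 15, 36):
    print(value, encode_scalar(value, minval=0, maxval=100, n=12, w=3))
```

With n=12 and w=3 there are 10 buckets of width 10, so 1 and 7 share bucket 0, 15 falls in bucket 1, and 36 in bucket 3.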

Adaptive Scalar Encoder

This encoder is identical to the scalar encoder except that it increases maxval if it sees a larger value and decreases minval if it sees a smaller value. Our implementation does not update the spatial pooler's connections to the input bits, so previously learned spatial patterns become out of date as the min and max values change. As such, we recommend using the regular scalar encoder with a fixed min/max range.
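The sketch below illustrates why a moving range is a problem: once the range widens, the same value lands in a different bucket and gets a different encoding. It is a simplified illustration under the bucket scheme described on this page, not the library's adaptive encoder; the class name AdaptiveScalarSketch is hypothetical.

```python
class AdaptiveScalarSketch:
    """Scalar bucket encoder whose range widens to cover every value seen."""

    def __init__(self, minval, maxval, n, w):
        if not minval < maxval:
            raise ValueError("minval must be less than maxval")
        self.minval, self.maxval, self.n, self.w = minval, maxval, n, w

    def encode(self, value):
        # Widen the range first, so the value always fits.
        self.minval = min(self.minval, value)
        self.maxval = max(self.maxval, value)
        num_buckets = self.n - self.w + 1
        bucket_width = (self.maxval - self.minval) / num_buckets
        bucket = min(int((value - self.minval) / bucket_width), num_buckets - 1)
        return "0" * bucket + "1" * self.w + "0" * (self.n - self.w - bucket)


encoder = AdaptiveScalarSketch(minval=0, maxval=100, n=12, w=3)
before = encoder.encode(36)   # encoded against the range 0 to 100
encoder.encode(200)           # widens the range to 0 to 200
after = encoder.encode(36)    # same value, different bucket
print(before, after)
```

Anything downstream that learned the first encoding of 36 no longer sees that pattern after the range changes.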

Coordinate Encoder and Geospatial Coordinate Encoder

The Geospatial Coordinate Encoder (GCE) converts a GPS position to an SDR. It has the following desired properties:

  1. Positions spatially close together have overlapping bits in the encoding.
  2. When moving at low speeds, the resolution of movement is finer, and when moving at high speeds, it is coarser. Thus, at higher speeds, bigger movements still retain overlapping bits.
  3. It works anywhere in the world; in fact, it works for an infinitely large space.

The Coordinate Encoder (CE) is a generalization of this. In fact, the GCE is implemented as a subclass of the CE.

While the GCE takes (latitude, longitude, speed), the CE takes (coordinates, radius). coordinates determines the position to be encoded, and can be in any number of dimensions.

See this video and the associated slides for more details on these encoders and how they work.

Identity / Pass through Encoder

This encoder takes an SDR as input and outputs it without changes. It is useful when your preprocessing already creates an SDR or when you are experimenting with a new encoder scheme.