### MNIST using really basic hyperdimensional computing

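The premise is that in a high-dimensional space, two vectors drawn at random (with independent, zero-mean entries) are nearly orthogonal with high probability: their cosine similarity concentrates around 0, with a spread on the order of 1/√D for dimension D.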
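A quick numerical check of this premise follows. It is a sketch that assumes Gaussian (zero-mean) entries and an arbitrary seed. It averages the absolute cosine similarity over pairs of independent random vectors, and that average should shrink toward 0 as the dimension grows.

```python
import numpy as np

def mean_abs_cosine(dim, pairs=200, seed=0):
    # Average |cosine similarity| over independent pairs of Gaussian random vectors
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((pairs, dim))
    b = rng.standard_normal((pairs, dim))
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.mean(np.abs(cos)))

# Low dimensions give noticeably non-zero similarities; 10,000 dimensions give nearly 0
for dim in (10, 100, 10000):
    print(dim, round(mean_abs_cosine(dim), 4))
```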

To turn an MNIST digit into a hypervector, I map each image into the high-dimensional space with a random projection. First, load the data and hold out a third of the training images for testing:

```python
import numpy as np
from mnist import MNIST  # python-mnist package
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import accuracy_score

def shuffle(X, y):
    # Apply the same random permutation to the images and their labels
    permutation = np.arange(X.shape[0])
    np.random.shuffle(permutation)
    return X[permutation], y[permutation]

def load_dataset():
    mndata = MNIST('./data/')
    X_train, labels_train = map(np.array, mndata.load_training())
    X_test, _ = map(np.array, mndata.load_testing())
    return X_train, labels_train, X_test

# Only the official training set is used; the test split below is carved out of it
X_train, labels_train, _ = load_dataset()
X_train, labels_train = shuffle(X_train, labels_train)
X_train, X_test, y_train, y_test = train_test_split(X_train, labels_train, test_size=0.33)

D = 10000  # dimensionality of the hypervector space
IMG_LEN = 28  # MNIST images are 28x28 pixels
NUM_SAMPLES = X_train.shape[0]
```

Then I create the random projection and transform the images into their hypervectors:

```python
# Create a random map to the high-dimensional space: a D x 784 matrix
# whose entries are drawn uniformly from [0, 1)
print("Generating random projection...")
proj = np.random.rand(D, IMG_LEN * IMG_LEN)

def get_scene(img, proj):
    # Project a single flattened image (length 784) to a D-dimensional hypervector
    return np.dot(proj, img)

def get_scenes(images, proj):
    # Project a batch of flattened images (one per row) to hypervectors (one per row)
    return np.dot(images[:NUM_SAMPLES, :], proj.T)

print("Projecting images to higher dim space...")
X_train = get_scenes(X_train, proj)
```

```
Generating random projection...
Projecting images to higher dim space...
```
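Because the projection is linear, how well it preserves the angles between images depends on the distribution of its entries. The sketch below compares two matrices. One is a zero-mean Gaussian matrix, which is an assumption here and not what the code above uses. The other is a uniform [0, 1) matrix like the one above. Both are applied to two hypothetical non-negative 784-pixel "images" with random intensities. The Gaussian projection keeps their cosine similarity roughly unchanged. The all-positive uniform matrix pushes it close to 1, because every projected coordinate shares a large common term proportional to the total pixel intensity.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two 1-D vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
DIM, PIXELS = 5000, 28 * 28

# Two hypothetical non-negative "images" (random pixel intensities in [0, 1))
x, y = rng.random(PIXELS), rng.random(PIXELS)

gaussian_proj = rng.standard_normal((DIM, PIXELS))  # zero-mean entries (assumption)
uniform_proj = rng.random((DIM, PIXELS))            # entries in [0, 1), like np.random.rand

print("original:           ", round(cosine(x, y), 3))
print("Gaussian projection:", round(cosine(gaussian_proj @ x, gaussian_proj @ y), 3))
print("uniform projection: ", round(cosine(uniform_proj @ x, uniform_proj @ y), 3))
```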


A digit's representation is just the sum of the hypervectors of its training images.

```python
# Bundle: add each training hypervector into the vector of its digit class
digit_vectors = np.zeros((10, D))
for i in range(NUM_SAMPLES):
    digit_vectors[y_train[i]] += X_train[i]
```
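The same bundling can also be written without a Python-level loop. The sketch below uses `np.add.at`, which accumulates correctly even when a class index repeats. It runs on small hypothetical arrays in place of the real hypervectors and checks that the result matches the loop.

```python
import numpy as np

def bundle_loop(X, y, num_classes):
    # Sum the rows of X per class label, one row at a time
    out = np.zeros((num_classes, X.shape[1]))
    for i in range(X.shape[0]):
        out[y[i]] += X[i]
    return out

def bundle_vectorized(X, y, num_classes):
    # np.add.at performs unbuffered accumulation, so every row with a repeated label is summed
    out = np.zeros((num_classes, X.shape[1]))
    np.add.at(out, y, X)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))   # 50 hypothetical hypervectors of dimension 8
y = rng.integers(0, 10, size=50)   # hypothetical digit labels
print(np.allclose(bundle_loop(X, y, 10), bundle_vectorized(X, y, 10)))  # True
```

A plain fancy-indexed `out[y] += X` would not work here: with buffered assignment, only one row per repeated label would be kept.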


To classify an image, find the digit vector with the highest cosine similarity to the image's hypervector.

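```python
def classify(images, digit_vectors):
    # Cosine similarity between every image hypervector and every digit vector
    similarities = cosine_similarity(images, digit_vectors)
    # Predict the digit whose vector is most similar
    classifications = np.argmax(similarities, axis=1)
    return classifications
```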
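To make the decision rule concrete, here is a toy sketch with hypothetical 3-dimensional class vectors. It uses NumPy only, so the cosine similarity is computed by hand rather than with scikit-learn. Each query goes to the class whose vector points in the most similar direction, whatever that vector's length.

```python
import numpy as np

def classify_cosine(queries, class_vectors):
    # Normalize rows to unit length; their dot products are then cosine similarities
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = class_vectors / np.linalg.norm(class_vectors, axis=1, keepdims=True)
    return np.argmax(q @ c.T, axis=1)

class_vectors = np.array([[10.0, 0.0, 0.0],   # class 0 (long vector)
                          [0.0, 1.0, 0.0],    # class 1
                          [0.0, 0.0, 5.0]])   # class 2
queries = np.array([[0.1, 2.0, 0.3],
                    [3.0, 0.5, 0.0],
                    [0.2, 0.1, 0.9]])
print(classify_cosine(queries, class_vectors))  # [1 0 2]
```

The first query is assigned to class 1 even though class 0's vector is ten times longer. Cosine similarity ignores magnitude, so classes with more training images do not win just because their summed vectors are larger.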

```python
print("Train accuracy:")
predictions = classify(X_train, digit_vectors)
acc = accuracy_score(y_train[:X_train.shape[0]], predictions)
print(acc)

print("Test accuracy:")
X_test = get_scenes(X_test, proj)
predictions = classify(X_test, digit_vectors)
acc = accuracy_score(y_test[:X_test.shape[0]], predictions)
print(acc)
```

```
Train accuracy:
0.815572139303
Test accuracy:
0.811717171717
```
