In [1]:
import tensorflow as tf
import numpy as np
import scipy.linalg as ln
from model.ntm_ops import *
%load_ext autoreload
%autoreload 2

In [2]:
key_vector = tf.constant(dtype=tf.float32, value=np.random.randn(2, 20))
mem = tf.constant(dtype=tf.float32, value=[[4, 5, 6]])

In [7]:
shift_conv = ln.circulant(np.arange(20)).T[
            np.arange(-(3 // 2), (3 // 2) + 1)
        ][::-1]

In [19]:
a = cosine_similarity(key_vector, mem)

<tf.Tensor 'cos_similarity_14/div:0' shape=(2, 128) dtype=float32>

In [27]:
a = np.random.randn(2, 20)
b = np.random.randn(2, 128, 20)
np.dot(b, a.T).shape

(2, 128, 2)

## Introduction
Neural Turing Machines (NTMs) combine the capabilities of Turing machines and neural networks to infer simple algorithms. The controller (usually an LSTM) can be viewed as the CPU, and the external memory as the RAM.

An NTM has four components: a controller, read heads, write heads, and an external memory.

High-level overview:
1. Addressing: The addressing mechanism produces the weighting of each head. There are two types of addressing: content-based and location-based. At every time step, the controller outputs five elements to produce the weighting of each head: a key vector, a key strength, an interpolation gate, a shift weighting, and a scalar used to sharpen the weighting.
2. Read: Each read head has a weighting vector that specifies how much information is read from each memory location.
3. Write: Each write head has a weighting vector, an erase vector, and an add vector. This is inspired by the LSTM's forget gate and input gate.
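The read and write steps above can be sketched in plain NumPy. This is a minimal illustration rather than the code used later in this notebook: the helper names `read_memory` and `write_memory` are hypothetical, and the toy memory size (4 locations, 3 elements each) is an assumption chosen for readability.

```python
import numpy as np

def read_memory(memory, w):
    """Read vector: weighted sum of memory rows. memory is (N, M), w is (N,)."""
    return w @ memory

def write_memory(memory, w, erase, add):
    """Erase, then add: M_new[i] = M[i] * (1 - w[i] * erase) + w[i] * add."""
    erased = memory * (1.0 - np.outer(w, erase))
    return erased + np.outer(w, add)

# Toy example (assumed sizes): 4 locations, 3 elements per location.
memory = np.ones((4, 3))
w = np.array([0.0, 1.0, 0.0, 0.0])   # weighting fully focused on location 1
erase = np.ones(3)                   # erase everything at the focused location
add = np.array([4.0, 5.0, 6.0])      # then write this vector there

new_memory = write_memory(memory, w, erase, add)
r = read_memory(new_memory, w)       # reads back the vector just written
```

With a weighting focused on a single location, the write replaces that row and leaves the others unchanged; a softer weighting would blend the erase and add across several rows.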

## Section 1 Hyperparameters

### 1.1 Memory matrix
Define two hyperparameters for the $N \times M$ memory matrix, where $N$ is the number of memory locations and $M$ is the vector size at each memory location.

In [2]:
# N memory locations and each has M elements
N, M = 128, 20

### 1.2 Controller dimension
Define the LSTM hidden state dimension h and the number of stacked hidden layers a. This is the same as a traditional LSTM, with a hidden state and a cell state.

Define the input and output dimension. In an NTM, this is usually the number of bits in each input vector; e.g., if one of the input vectors is [0, 1, 0, 1, 0, 1], the dimension is 6.

In [5]:
a, h = 1, 100
input_dim = 8

### 1.3 The range of allowed location shift
Define the range s of allowed location shifts in location-based addressing (convolutional shift). E.g., if s = 3, the allowed location shifts are [-1, 0, 1].

In [4]:
s = 3
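As a quick check of how s maps to the allowed shifts, the offsets can be computed directly. This is a small sketch that assumes s is odd, so the range is symmetric around 0; the helper name `allowed_shifts` is hypothetical.

```python
import numpy as np

def allowed_shifts(s):
    """Symmetric integer offsets for a shift range of size s (assumes s is odd)."""
    return np.arange(-(s // 2), (s // 2) + 1)

shifts_3 = allowed_shifts(3)   # array([-1,  0,  1])
shifts_5 = allowed_shifts(5)   # array([-2, -1,  0,  1,  2])
```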

## Section 2 Controller (LSTM)
At every time step, the controller outputs the weighting of each head and its hidden states (including the cell states, as in the original LSTM). The weighting is determined by the addressing mechanism:
1. Content Addressing
2. Interpolation
3. Convolutional Shift
4. Sharpening
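The four addressing steps listed above can be sketched in NumPy for a single head. This is an illustrative sketch, not the TensorFlow implementation in `model.ntm_ops`: all function names are hypothetical, the softmax over scaled cosine similarity follows the usual NTM formulation, and the toy memory (an identity matrix) is an assumption that makes the result easy to verify by hand.

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Softmax over key-strength-scaled cosine similarity of key vs. each row."""
    eps = 1e-8  # guards against division by zero for all-zero rows
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    z = beta * sim
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def interpolate(w_content, w_prev, gate):
    """Blend the content weighting with the previous weighting."""
    return gate * w_content + (1.0 - gate) * w_prev

def convolutional_shift(w, shift_weights):
    """Circular convolution; shift_weights[j] weights the offset (j - len // 2)."""
    half = len(shift_weights) // 2
    out = np.zeros_like(w)
    for s_j, offset in zip(shift_weights, range(-half, half + 1)):
        out += s_j * np.roll(w, offset)
    return out

def sharpen(w, gamma):
    """Raise to the power gamma and renormalize."""
    p = w ** gamma
    return p / p.sum()

# Toy example (assumed values): 5 locations, key equal to row 2 of memory.
memory = np.eye(5)
w_prev = np.full(5, 0.2)
w_c = content_addressing(memory, key=memory[2], beta=20.0)
w_g = interpolate(w_c, w_prev, gate=1.0)          # gate = 1 ignores w_prev
w_s = convolutional_shift(w_g, [0.0, 0.0, 1.0])   # all weight on offset +1
w = sharpen(w_s, gamma=2.0)
```

Here the content step focuses on location 2, and the shift moves that focus to location 3; the final weighting still sums to 1.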

In [9]:
with tf.variable_scope("external_memory"):
    # initialize the N x M memory with small values
    memory = tf.fill((N, M), 1e-6, name="memory")
    # initialize the read head weighting (one entry per memory location) with small values
    read_weighting = tf.constant(value=np.full(N, 1e-6), dtype=tf.float32, name="read_weighting")
    # initialize the write head weighting (one entry per memory location) with small values
    write_weighting = tf.constant(value=np.full(N, 1e-6), dtype=tf.float32, name="write_weighting")

In [11]:
memory

<tf.Tensor 'external_memory_5/memory:0' shape=(128, 20) dtype=float32>