# Introduction
I am reading up on Recurrent Neural Networks (RNNs) and their ability to handle sequential inputs and outputs.
[This article](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) starts off with a small toy problem that I want to implement. 
It teaches a small RNN to generate text character by character.
Characters are one-hot encoded, the vocabulary is limited to `helo`, and there is only a single training example: `hello`.


# Vocabulary
I would like to extend the problem to model three different words:

In [1]:
from functools import reduce

In [77]:
words = ['hello', 'bench', 'aloha']
vocab = list(set(reduce(lambda x, y: x+y, words)))

In [78]:
words

['hello', 'bench', 'aloha']

In [79]:
vocab, len(vocab)

(['l', 'o', 'b', 'a', 'h', 'c', 'n', 'e'], 8)

The vocabulary of my toy problem is twice the size, and I am training on words with some amount of overlap.
I expect `bench` to be the easiest word to learn, and possibly `hello` and `aloha` a bit harder as they overlap more.
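
To make the overlap concrete, here is a small sketch that lists the characters each pair of words shares. The name `overlap` is just something I picked for this illustration.

```python
from itertools import combinations

words = ['hello', 'bench', 'aloha']

# For every pair of words, collect the characters they have in common.
overlap = {
    (a, b): sorted(set(a) & set(b))
    for a, b in combinations(words, 2)
}

for (a, b), shared in overlap.items():
    print(a, b, shared)
# hello bench ['e', 'h']
# hello aloha ['h', 'l', 'o']
# bench aloha ['h']
```

`hello` and `aloha` share three characters, while `bench` shares at most two with either of the others, which is consistent with my guess that `bench` is the easiest word to tell apart.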

I keep one important restriction: all examples are 5 characters long.

In [65]:
import numpy as np

In [63]:
char_encoding = {}
for i, c in enumerate(vocab):
    char_encoding[c] = np.zeros(len(vocab))
    char_encoding[c][i]=1

In [76]:
char_encoding

{'a': array([ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.]),
 'b': array([ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]),
 'c': array([ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.]),
 'e': array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]),
 'h': array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]),
 'l': array([ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]),
 'n': array([ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.]),
 'o': array([ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.])}

In [80]:
def encode_char(char):
    assert(char in vocab), "{} not in vocabulary".format(char)
    return char_encoding[char]
def decode_char(arr):
    assert(len(arr) == len(vocab)), "Array of shape {} does not match vocab of length {}".format(arr.shape, len(vocab))
    return vocab[np.argmax(arr)]
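
As a sanity check, an encoding like this should survive a round trip from word to array and back. The sketch below is a self-contained variant, not the cells above: it assumes a sorted vocabulary (the order of a `set` of strings can change between Python sessions), and `encode_word` and `decode_word` are hypothetical helper names.

```python
import numpy as np

# Assumption: sort the vocabulary so every character gets a reproducible index.
vocab = sorted(set("hello" + "bench" + "aloha"))
one_hot = np.eye(len(vocab))  # row i is the one-hot vector for vocab[i]

def encode_word(word):
    """Stack the one-hot vectors of a word into a (len(word), len(vocab)) array."""
    return np.array([one_hot[vocab.index(c)] for c in word])

def decode_word(arr):
    """Map each row back to the character with the largest entry."""
    return "".join(vocab[i] for i in np.argmax(arr, axis=1))

encoded = encode_word("bench")
print(encoded.shape)         # (5, 8)
print(decode_word(encoded))  # bench
```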

# Training data
As input I use the first four characters of each word, and as targets I use the last four characters.

In [171]:
X = np.array([[encode_char(c) for c in word[:-1]] for word in words])
y = np.array([[encode_char(c) for c in word[1:]] for word in words])

In [172]:
y.shape

(3, 4, 8)

In [173]:
X.shape

(3, 4, 8)
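
Before encoding, the shift between inputs and targets looks like this for a single word. This is only an illustration on plain strings; `pairs` is a name I made up for it.

```python
word = 'hello'

# Inputs are all characters but the last, targets are all characters but the first.
inputs, targets = word[:-1], word[1:]

# At each time step the network sees one character and should predict the next one.
pairs = list(zip(inputs, targets))
for step, (seen, nxt) in enumerate(pairs):
    print(step, seen, '->', nxt)
# 0 h -> e
# 1 e -> l
# 2 l -> l
# 3 l -> o
```

Note that `l` has to map to `l` at step 2 and to `o` at step 3, so the current character alone is not enough to pick the right target.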

# The RNN

In [178]:
from keras.layers import LSTM, Dense
from keras.models import Sequential

In [289]:
model = Sequential()
model.add(LSTM(units=100, input_shape=(4, len(vocab)), return_sequences=True))
model.add(Dense(units=8, activation='softmax'))

In [290]:
model.compile(optimizer='adam', loss='categorical_crossentropy')

In [291]:
model.fit(X, y, epochs=200, verbose=0)

<keras.callbacks.History at 0x2271a2aa6a0>

In [292]:
predictions = model.predict(X)

In [294]:
for word, pred in zip(words, predictions):
    print(word[:-1], "->",  "".join(list(map(decode_char, pred))))

hell -> ello
benc -> ench
aloh -> loha


Cool, 100% accuracy!
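
Instead of comparing the printed strings by eye, the accuracy can also be computed per character. The sketch below uses small stand-in arrays rather than real model output, and `char_accuracy` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def char_accuracy(y_true, y_pred):
    """Fraction of time steps where the most probable predicted class matches the target."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))

# Stand-in data (not real model output): 1 sequence, 3 time steps, 4 classes.
y_true = np.eye(4)[[0, 1, 2]][np.newaxis]       # shape (1, 3, 4)
y_pred = np.array([[[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.6, 0.1, 0.1],
                    [0.1, 0.1, 0.2, 0.6]]])      # the last step is wrong on purpose

print(char_accuracy(y_true, y_pred))  # 2 of the 3 steps match
```

With the arrays from this notebook, `char_accuracy(y, predictions)` should come out as 1.0, given that all three decoded predictions match their targets.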

# Summary

I created a super simple RNN to predict the next character in a sequence of 4 characters, using a very small vocabulary of 8 characters. I got perfect accuracy on my three simple training examples.

My RNN is very limited:
* It cannot handle variable-length input.
* It takes four characters of a five-character word as input, so the only real contribution of the network is predicting the last character.