# Recurrent Neural Networks

Recall that in feedforward networks:

$h^{(t)} = f(x^{(t)}; \theta)$

Here $x^{(t)}$ is the input at data point t, and $f$ is essentially the dot product of the weights and the inputs plus the bias, passed through an activation function. The difference in recurrent networks is that the hidden layer at data point t is:

$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$

The current hidden state is a function of the previous hidden state and the current input. $\theta$ denotes the function's parameters, which are shared across all time steps. The network learns to use $h^{(t)}$ as a kind of lossy summary of the past inputs up to point t.
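
To make the recurrence concrete, here is a minimal NumPy sketch that applies the same function, with the same parameters, at every time step. The tanh nonlinearity, the sizes, and the names `step`, `W_x`, `W_h`, and `b` are illustrative assumptions, not something fixed by the definition above.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, seq_len = 3, 4, 5   # illustrative sizes (assumed)

# theta: one set of parameters, reused at every time step
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def step(h_prev, x_t):
    """One application of f: h(t) = f(h(t-1), x(t); theta), with tanh assumed for f."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h = np.zeros(hidden_size)                    # initial state: nothing seen yet
states = []
for x_t in xs:
    h = step(h, x_t)      # h now depends on every input seen so far
    states.append(h)

states = np.stack(states)
print(states.shape)       # (5, 4): one hidden state per time step
```

Because `h` is fed back into `step`, the final state depends on every earlier input, which is what lets it act as a summary of the sequence.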

The function that runs one forward and backward pass will take:
* A list of input chars
* A list of target chars
* The previous hidden state

And will output:
* The loss
* The gradient for each parameter between layers
* The last hidden state
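
A sketch of how the two input lists might be prepared, assuming a character-level language model in which each target is simply the next character in the text. The toy string, the mappings `char_to_ix` and `ix_to_char`, and `seq_length` are hypothetical names chosen for illustration.

```python
text = "hello world"                       # toy corpus (assumed)
chars = sorted(set(text))                  # vocabulary of distinct characters
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for ch, i in char_to_ix.items()}

seq_length = 5                             # characters per training chunk (assumed)

# Inputs are a chunk of text; targets are the same chunk shifted one character ahead
inputs = [char_to_ix[ch] for ch in text[0:seq_length]]
targets = [char_to_ix[ch] for ch in text[1:seq_length + 1]]

print([ix_to_char[i] for i in inputs])     # ['h', 'e', 'l', 'l', 'o']
print([ix_to_char[i] for i in targets])    # ['e', 'l', 'l', 'o', ' ']
```

The previous hidden state would start as a vector of zeros for the first chunk and then be carried over from one chunk to the next.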

### Forward Pass

$h_{t} = \phi(W_{xh}x_{t} + W_{hh}h_{t-1} + b_{h})$

The forward pass in pseudocode: <br>
hs = phi((inputs * Wxh) + (prev_hidden * Whh) + bh) <br>
ys = (hs * Why) + by <br>
ps = softmax(ys) <br>
Here ys holds the unnormalized log probabilities, which the softmax function normalizes into the probabilities ps: <br>
$p_{k}  = \dfrac{e^{f_{k}}}{\Sigma_{j}e^{f_{j}}}$ <br>
where $f$ is the vector ys. The loss for example $i$ is the negative log probability assigned to the correct class $y_{i}$: <br>
$L_{i} = -\log(p_{y_{i}})$
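
The forward pass can be sketched in NumPy as follows. This assumes one-hot encoded inputs, a tanh hidden activation, and small random weights; the sizes and the helper name `forward` are illustrative rather than taken from a specific implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 4, 8             # illustrative sizes (assumed)

Wxh = rng.normal(scale=0.01, size=(hidden_size, vocab_size))   # input -> hidden
Whh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))  # hidden -> hidden
Why = rng.normal(scale=0.01, size=(vocab_size, hidden_size))   # hidden -> output
bh = np.zeros(hidden_size)
by = np.zeros(vocab_size)

def forward(inputs, targets, h_prev):
    """Run the RNN over a sequence of character indices.

    Returns the summed loss, the per-step probabilities, and the last hidden state.
    """
    h = h_prev
    loss = 0.0
    probs = []
    for ix, target in zip(inputs, targets):
        x = np.zeros(vocab_size)
        x[ix] = 1.0                              # one-hot input
        h = np.tanh(Wxh @ x + Whh @ h + bh)      # hidden state (tanh assumed)
        y = Why @ h + by                         # unnormalized log probabilities
        p = np.exp(y - y.max())                  # subtract the max for numerical stability
        p /= p.sum()                             # softmax
        loss += -np.log(p[target])               # cross-entropy for this step
        probs.append(p)
    return loss, np.stack(probs), h

loss, ps, h_last = forward([0, 1, 2], [1, 2, 3], np.zeros(hidden_size))
print(ps.sum(axis=1))   # every row of probabilities sums to 1
print(loss)             # close to 3 * log(4) while the weights are near zero
```

With weights this small the network predicts a nearly uniform distribution, so the loss per step starts near the log of the vocabulary size.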

### Backward Pass

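Let $J = \Sigma_{t} J_{t}$ be the total loss over the sequence and $z_{t} = W_{hy}h_{t} + b_{y}$ the unnormalized outputs at step t. Because the same weights are used at every step, each gradient is a sum over time steps:

$\dfrac{\partial J}{\partial W_{hy}} = \Sigma_{t} \dfrac{\partial J_{t}}{\partial z_{t}} \dfrac{\partial z_{t}}{\partial W_{hy}}$

$\dfrac{\partial J}{\partial W_{hh}} = \Sigma_{t} \dfrac{\partial J}{\partial h_{t}} \dfrac{\partial h_{t}}{\partial W_{hh}}$

$\dfrac{\partial J}{\partial W_{xh}} = \Sigma_{t} \dfrac{\partial J}{\partial h_{t}} \dfrac{\partial h_{t}}{\partial W_{xh}}$

In the last two equations, $\dfrac{\partial J}{\partial h_{t}}$ collects the contributions from the loss at step t and from every later step (through $h_{t+1}$), while $\dfrac{\partial h_{t}}{\partial W}$ is the direct dependence at step t with $h_{t-1}$ held fixed.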
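
Putting the pieces together, the following is a sketch of a function with the interface listed earlier (inputs, targets, and previous hidden state in; loss, gradients, and last hidden state out), using backpropagation through time. It assumes tanh hidden units, one-hot inputs, and a softmax output with cross-entropy loss; the names `loss_fun` and `params` and the sizes are illustrative. The finite-difference comparison at the end is only a sanity check, not part of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
V, H = 4, 6   # vocabulary and hidden sizes (illustrative)
params = {
    "Wxh": rng.normal(scale=0.1, size=(H, V)),
    "Whh": rng.normal(scale=0.1, size=(H, H)),
    "Why": rng.normal(scale=0.1, size=(V, H)),
    "bh": np.zeros(H),
    "by": np.zeros(V),
}

def loss_fun(inputs, targets, h_prev, p):
    """Return (loss, gradients, last hidden state) for one sequence."""
    xs, hs, ps = {}, {-1: h_prev}, {}
    loss = 0.0
    # Forward pass: store every intermediate value needed for the backward pass
    for t, ix in enumerate(inputs):
        xs[t] = np.zeros(V)
        xs[t][ix] = 1.0
        hs[t] = np.tanh(p["Wxh"] @ xs[t] + p["Whh"] @ hs[t - 1] + p["bh"])
        z = p["Why"] @ hs[t] + p["by"]
        e = np.exp(z - z.max())
        ps[t] = e / e.sum()
        loss -= np.log(ps[t][targets[t]])
    # Backward pass: walk the sequence in reverse, summing gradients over time
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    dh_next = np.zeros(H)
    for t in reversed(range(len(inputs))):
        dz = ps[t].copy()
        dz[targets[t]] -= 1.0                # dJ_t/dz_t for softmax + cross-entropy
        grads["Why"] += np.outer(dz, hs[t])
        grads["by"] += dz
        dh = p["Why"].T @ dz + dh_next       # from the output at t and from step t+1
        da = (1.0 - hs[t] ** 2) * dh         # back through tanh
        grads["Wxh"] += np.outer(da, xs[t])
        grads["Whh"] += np.outer(da, hs[t - 1])
        grads["bh"] += da
        dh_next = p["Whh"].T @ da            # passed on to step t-1
    return loss, grads, hs[len(inputs) - 1]

inputs, targets = [0, 1, 2, 3], [1, 2, 3, 0]
h0 = np.zeros(H)
loss, grads, h_last = loss_fun(inputs, targets, h0, params)

# Sanity check: compare one analytic gradient entry with a central finite difference
eps = 1e-5
i, j = 2, 3
old = params["Whh"][i, j]
params["Whh"][i, j] = old + eps
loss_plus = loss_fun(inputs, targets, h0, params)[0]
params["Whh"][i, j] = old - eps
loss_minus = loss_fun(inputs, targets, h0, params)[0]
params["Whh"][i, j] = old                    # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(abs(numeric - grads["Whh"][i, j]))     # should be very small
```

The `+=` updates inside the reverse loop are the sums over t in the equations above, and `dh_next` is what carries the loss from later steps back to earlier hidden states.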

In [1]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
import keras

Using TensorFlow backend.


In [2]:
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [3]:
print(x_train.shape)
print(x_train[0].shape)
x_train = x_train / 255.0
x_test = x_test / 255.0

(60000, 28, 28)
(28, 28)


In [4]:
model = Sequential()
model.add(LSTM(128, 
               input_shape = (x_train.shape[1:]), 
               activation = 'relu',
               return_sequences = True))
model.add(Dropout(0.2))
model.add(LSTM(128, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation = 'softmax'))
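
The model above treats each 28 x 28 image as a sequence of 28 time steps (rows) with 28 features (pixels) each, and the first LSTM sets `return_sequences = True` so that the second LSTM still receives a full sequence. The NumPy sketch below illustrates that distinction with a plain tanh recurrence instead of an LSTM and a random stand-in image instead of real MNIST data; the weights, sizes, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))         # stand-in for one normalized MNIST image
hidden_size = 16                     # illustrative (the model above uses 128)

W_x = rng.normal(scale=0.1, size=(hidden_size, 28))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

h = np.zeros(hidden_size)
all_states = []
for row in image:                    # one row of pixels per time step
    h = np.tanh(W_x @ row + W_h @ h)
    all_states.append(h)

all_states = np.stack(all_states)
print(all_states.shape)   # (28, 16): one state per time step
print(h.shape)            # (16,): only the final state
```

Keeping every state corresponds to what `return_sequences = True` requests from a Keras recurrent layer, while the final state alone corresponds to the default, which is why the second LSTM can feed a `Dense` layer directly.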

In [5]:
opt = keras.optimizers.Adam(lr = 1e-3, decay = 1e-5)

In [6]:
model.compile(loss = 'sparse_categorical_crossentropy',
             optimizer = opt,
             metrics = ['accuracy'])

In [7]:
model.fit(x_train, 
          y_train, 
          epochs = 3, 
          validation_data = (x_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0xb4855ee80>

# RNN Model 2