
# Understanding recurrent neural networks

**A major characteristic of all neural networks you’ve seen so far, such as densely connected networks and convnets, is that they have no memory. Each input shown to them is processed independently, with no state kept in between inputs.**

**With such networks, in order to process a sequence or a temporal series of data points, you have to show the entire sequence to the network at once: turn it into a single data point.**

For instance, this is what you did in the IMDB example: an entire movie review was transformed into a single large vector (flattened) and processed in one go. Such networks are called **feedforward networks**.
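As a rough illustration of what "a single data point" means here, the sketch below flattens a toy sequence into one vector and passes it through a single dense transformation. The sizes and the random weights are arbitrary assumptions for the example, not the IMDB model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

timesteps, input_features, units = 5, 3, 4   # toy sizes, chosen arbitrarily

# One sequence: 5 timesteps, 3 features each.
sequence = rng.random((timesteps, input_features))

# A feedforward network sees the whole sequence at once, as one flat vector.
flat = sequence.reshape(-1)                  # shape (15,)

# A single dense layer: one matrix multiply, no state carried anywhere.
W = rng.normal(scale=0.1, size=(units, flat.size))
b = np.zeros(units)
output = np.tanh(W @ flat + b)               # shape (4,)

print(flat.shape, output.shape)
```

Note that the order of the timesteps only matters through which weight column each value happens to meet; nothing in the computation is stepped through in time.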

In contrast, as you’re reading the present sentence, you’re processing it word by word—or rather, eye saccade by eye saccade—while keeping memories of what came before; this gives you a fluid representation of the meaning conveyed by this sentence.

**Biological intelligence processes information incrementally while maintaining an internal model of what it’s processing, built from past information and constantly updated as new information comes in**.

A **recurrent neural network (RNN)** adopts the same principle, albeit in an extremely simplified version: **it processes sequences by iterating through the sequence elements and maintaining a state containing information relative to what it has seen so far. In effect, an RNN is a type of neural network that has an internal loop.**

The state of the RNN is reset between processing two different, independent sequences (such as two different IMDB reviews), so you still consider one sequence a single data point: **a single input to the network. What changes is that this data point is no longer processed in a single step; rather, the network internally loops over sequence elements.**

<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/recurrent-neural-network.png?raw=1' width='800'/>
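A minimal sketch of that reset, assuming a hypothetical `process_sequence` helper and a toy update rule (a tanh-squashed running sum, used purely for illustration): the state is carried from element to element within a sequence, but starts from zero again for each new sequence.

```python
import numpy as np

def process_sequence(sequence, state_size):
    """Loop over one sequence, carrying a state from element to element."""
    state = np.zeros(state_size)            # fresh state for every new sequence
    for element in sequence:
        # Toy update rule (an assumption for illustration, not a trained RNN).
        state = np.tanh(state + element)
    return state

rng = np.random.default_rng(0)
review_a = rng.random((6, 4))               # two independent toy "sequences"
review_b = rng.random((9, 4))

# Each call handles one data point; nothing leaks from review_a into review_b.
out_a = process_sequence(review_a, state_size=4)
out_b = process_sequence(review_b, state_size=4)
out_b_alone = process_sequence(review_b, state_size=4)

print(np.allclose(out_b, out_b_alone))      # True: review_a left no trace
```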

## Setup

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, LSTM

from tensorflow.keras.datasets import imdb

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np
import pandas as pd

import string
import os

import matplotlib.pyplot as plt
%matplotlib inline

## NumPy implementation of a simple RNN

To make these notions of loop and state clear, let’s implement the forward pass of a toy RNN in NumPy. This RNN takes as input a sequence of vectors, which you’ll encode as a 2D tensor of shape (timesteps, input_features). It loops over timesteps, and at each timestep, it considers its current state at t and the input at t (of shape (input_features,)), and combines them to obtain the output at t. You’ll then set the state for the next step to be this output.

For the first timestep, the previous output isn’t defined; hence, there is no current state. So, you’ll initialize the state as an all-zero vector called the initial state of the network.

In pseudocode, this is the RNN.


In [0]:
state_t = 0                       # the state at t
for input_t in input_sequence:    # iterates over sequence elements
  output_t = f(input_t, state_t)
  state_t = output_t              # the previous output becomes the state for the next iteration.

You can even flesh out the function f: the transformation of the input and state into an output will be parameterized by two matrices, W and U, and a bias vector. It’s similar to the transformation operated by a densely connected layer in a feedforward network.

In [0]:
state_t = 0                       # the state at t
for input_t in input_sequence:    # iterates over sequence elements
  output_t = activation(dot(W, input_t) + dot(U, state_t) + b)
  state_t = output_t              # the previous output becomes the state for the next iteration.

To make these notions absolutely unambiguous, let’s write a naive NumPy implementation of the forward pass of the simple RNN.

In [0]:
timesteps = 100               # number of timesteps in the input sequence
input_features = 32           # dimensionality of the input feature space
output_features = 64          # dimensionality of the output feature space

# input data: random noise for the sake of the example
inputs = np.random.random((timesteps, input_features))

# initial state: an all-zero vector
state_t = np.zeros((output_features, ))

# creates random weight matrices
W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features, ))

successive_outputs = []
# input_t is a vector of shape (input_features,).
for input_t in inputs:
  # combines the input with the current state (the previous output) to obtain the current output
  output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
  successive_outputs.append(output_t)

  # updates the state of the network for the next timestep
  state_t = output_t

# the final output is a 2D tensor of shape (timesteps, output_features).
# np.stack adds a new leading axis; np.concatenate would flatten everything to (6400,).
final_output_sequence = np.stack(successive_outputs, axis=0)

In [3]:
final_output_sequence.shape

(100, 64)

In [4]:
final_output_sequence[0, :10]

array([0.99999926, 0.99999999, 0.99999763, 0.99999973, 0.99999797,
       0.99999984, 0.9999999 , 0.99999997, 0.9999993 , 0.99999992])
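The printed values are all very close to 1 because the weights and inputs above are drawn uniformly from [0, 1), so the pre-activation sums are large and positive and `tanh` saturates. As a sketch, the same loop with small, zero-centered weights keeps the outputs spread over (-1, 1). The 0.1 scale, the zero bias, and the seed are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

timesteps, input_features, output_features = 100, 32, 64
inputs = rng.random((timesteps, input_features))

# Small, zero-centered weights instead of uniform [0, 1) values.
W = rng.normal(scale=0.1, size=(output_features, input_features))
U = rng.normal(scale=0.1, size=(output_features, output_features))
b = np.zeros(output_features)

state_t = np.zeros(output_features)
outputs = []
for input_t in inputs:
    # Same step as above; the new output doubles as the next state.
    state_t = np.tanh(W @ input_t + U @ state_t + b)
    outputs.append(state_t)

outputs = np.stack(outputs)                 # shape (timesteps, output_features)
print(outputs.shape, outputs.min(), outputs.max())
```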

Easy enough: in summary, **an RNN is a for loop that reuses quantities computed during the previous iteration of the loop, nothing more.** 

Of course, there are many different RNNs fitting this definition that you could build—this example is one of the simplest RNN formulations. RNNs are characterized by their step function, such as the following function in this case.

```python
output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
```

<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/simple-RNN.png?raw=1' width='800'/>
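To make the role of the step function concrete, here is a sketch in which the loop is written once and the step function is passed in as an argument. The helper names `run_rnn` and `tanh_step` are hypothetical, introduced only for this example, and the small random weights are an arbitrary choice.

```python
import numpy as np

def run_rnn(step_fn, inputs, initial_state):
    """Generic RNN loop: the step function alone defines the kind of RNN."""
    state_t = initial_state
    outputs = []
    for input_t in inputs:
        output_t = step_fn(input_t, state_t)
        outputs.append(output_t)
        state_t = output_t                  # previous output becomes the state
    return np.stack(outputs)

rng = np.random.default_rng(0)
timesteps, input_features, output_features = 10, 3, 5
W = rng.normal(scale=0.1, size=(output_features, input_features))
U = rng.normal(scale=0.1, size=(output_features, output_features))
b = np.zeros(output_features)

def tanh_step(input_t, state_t):
    # The step function of the simple RNN from this section.
    return np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)

inputs = rng.random((timesteps, input_features))
outputs = run_rnn(tanh_step, inputs, np.zeros(output_features))
print(outputs.shape)                        # (10, 5)
```

Swapping in a different `step_fn` changes the RNN without touching the loop.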

In this example, the final output is a 2D tensor of shape (timesteps, output_features), where each timestep is the output of the loop at time t. Each timestep t in the output tensor contains information about timesteps 0 to t in the input sequence, that is, about the entire past.

For this reason, in many cases, you don’t need this full sequence of outputs; you just need the last output (output_t at the end of the loop), because it already contains information about the entire sequence.
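One way to check this dependence on the past, sketched with a hypothetical `simple_rnn` helper and arbitrary random weights: changing the input at a single timestep `k` leaves every output before `k` untouched, while the output at `k` (and, through the state, later outputs) can change.

```python
import numpy as np

def simple_rnn(inputs, W, U, b):
    """Forward pass of the simple RNN; returns all outputs, shape (timesteps, units)."""
    state_t = np.zeros(b.shape)
    outputs = []
    for input_t in inputs:
        state_t = np.tanh(W @ input_t + U @ state_t + b)
        outputs.append(state_t)
    return np.stack(outputs)

rng = np.random.default_rng(0)
timesteps, input_features, units = 20, 4, 8
W = rng.normal(scale=0.5, size=(units, input_features))
U = rng.normal(scale=0.5, size=(units, units))
b = np.zeros(units)

inputs = rng.random((timesteps, input_features))
perturbed = inputs.copy()
k = 12
perturbed[k] += 1.0                         # change a single timestep

out = simple_rnn(inputs, W, U, b)
out_p = simple_rnn(perturbed, W, U, b)

print(np.array_equal(out[:k], out_p[:k]))   # outputs before k: unaffected
print(np.allclose(out[k], out_p[k]))        # output at k: compare the two runs
last_output = out[-1]                       # the single vector you would keep
print(last_output.shape)
```

Keeping only `out[-1]` discards the per-timestep outputs but still reflects every input seen along the way.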

## A recurrent layer in Keras