# 15. Processing Sequences Using RNNs and CNNs

In this chapter we cover recurrent neural networks (RNNs), which are especially useful for sequential data such as time series.

### Recurrent Neurons and Layers

A recurrent neural network looks very much like a feedforward neural network, except it also has connections pointing backward.

![RNN](images/15.RNN.png)

Each recurrent neuron has two sets of weights: $w_x$ for the inputs $x_{(t)}$ and $w_y$ for the outputs of the previous time step $y_{(t-1)}$. For a whole layer, we can stack these vectors into two matrices, $W_x$ and $W_y$.

The output vector for the layer would therefore be ($b$ = bias vector; $\phi$ = activation function):

$y_{(t)}= \phi(W_x^T x_{(t)} + W_y^T y_{(t-1)} + b)$
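The formula above can be sketched in plain NumPy. This is only an illustration: the function name, the zero initial state and the choice of `tanh` as $\phi$ are assumptions, not something the notes prescribe.

```python
import numpy as np

def recurrent_layer_forward(X, W_x, W_y, b, phi=np.tanh):
    """Run a basic recurrent layer over one sequence.

    X:   [n_steps, n_inputs]
    W_x: [n_inputs, n_neurons]
    W_y: [n_neurons, n_neurons]
    b:   [n_neurons]
    """
    n_steps = X.shape[0]
    n_neurons = W_x.shape[1]
    y_prev = np.zeros(n_neurons)            # assumed initial state: all zeros
    outputs = np.empty((n_steps, n_neurons))
    for t in range(n_steps):
        # y_(t) = phi(W_x^T x_(t) + W_y^T y_(t-1) + b)
        y_prev = phi(W_x.T @ X[t] + W_y.T @ y_prev + b)
        outputs[t] = y_prev
    return outputs

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                 # 4 time steps, 3 input features
W_x = rng.normal(size=(3, 5))               # 5 recurrent neurons
W_y = rng.normal(size=(5, 5))
b = np.zeros(5)
Y = recurrent_layer_forward(X, W_x, W_y, b)
print(Y.shape)  # (4, 5)
```

At the first step the recurrent term vanishes (the state is zero), so the layer behaves like an ordinary dense layer; from the second step on, each output also depends on the previous one.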

#### Memory Cells

Since the output of a recurrent neuron at time step $t$ is a function of all the inputs from previous time steps, we can say it has some sort of memory. 

A part of a neural network that preserves some state across time steps is called a **memory cell**.

#### Input and Output Sequences

There are several types of input-output sequences:

* Sequence-to-sequence (e.g. stock price prediction)
* Sequence-to-vector = **encoder** (e.g. sentiment score)
* Vector-to-sequence = **decoder** (e.g. caption for image)

We can also combine them. A typical example is an encoder followed by a decoder (an encoder-decoder), as used for machine translation.
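A toy sketch of the three patterns, using a single hand-written recurrent neuron. The helper `run_rnn` and all weight values are made up for illustration.

```python
import numpy as np

def run_rnn(X, w_x, w_y, b=0.0):
    """Single tanh recurrent neuron; returns the output at every time step."""
    y, outputs = 0.0, []
    for x_t in X:
        y = np.tanh(w_x * x_t + w_y * y + b)
        outputs.append(y)
    return np.array(outputs)

X = np.array([0.1, 0.4, -0.2, 0.3])            # one input sequence, 4 steps

# Sequence-to-sequence: keep one output per input step
seq_to_seq = run_rnn(X, w_x=0.5, w_y=0.8)

# Sequence-to-vector (encoder): ignore everything except the last output
seq_to_vec = seq_to_seq[-1]

# Vector-to-sequence (decoder): feed the same vector at every step.
# Chaining it after the encoder gives a tiny encoder-decoder.
vec_to_seq = run_rnn(np.full(3, seq_to_vec), w_x=1.0, w_y=0.5)

print(seq_to_seq.shape, np.ndim(seq_to_vec), vec_to_seq.shape)
```

Note that the decoder's output length (3 here) does not have to match the encoder's input length (4), which is what makes the combination useful for translation.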

### Training RNNs

The trick is to _unroll the RNN through time_ and then use regular backpropagation. This is called **backpropagation through time** (BPTT).

Simply put, we have:

1. Forward pass through the unrolled network
2. Output sequence evaluated using a cost function
3. Gradients of that cost function are then propagated backward through the unrolled network
4. Model parameters are updated using the gradients computed during BPTT
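The four steps can be sketched for a single tanh recurrent neuron. This is a minimal illustration under assumptions: the cost is the squared error on the last output only, the initial state is zero, and the input, target, parameter values and learning rate are arbitrary.

```python
import numpy as np

def forward(x, w_x, w_y, b):
    """Step 1: forward pass through the unrolled network."""
    y = np.zeros(len(x) + 1)                # y[0] is the initial state (assumed 0)
    for t in range(len(x)):
        y[t + 1] = np.tanh(w_x * x[t] + w_y * y[t] + b)
    return y

def bptt(x, target, w_x, w_y, b):
    """Steps 2 and 3: evaluate the cost, then propagate gradients backward."""
    y = forward(x, w_x, w_y, b)
    loss = (y[-1] - target) ** 2            # cost on the last output only
    grad_w_x = grad_w_y = grad_b = 0.0
    dy = 2.0 * (y[-1] - target)             # dLoss/dy at the last time step
    for t in reversed(range(len(x))):
        dz = dy * (1.0 - y[t + 1] ** 2)     # back through tanh
        grad_w_x += dz * x[t]               # the same weights are shared by
        grad_w_y += dz * y[t]               # every time step, so their
        grad_b += dz                        # gradients are summed
        dy = dz * w_y                       # pass the gradient one step back
    return loss, np.array([grad_w_x, grad_w_y, grad_b])

x = np.array([0.5, -0.1, 0.3, 0.8])
target = 0.2
params = np.array([0.7, -0.4, 0.1])         # w_x, w_y, b
loss, grads = bptt(x, target, *params)

# Step 4: one gradient descent update
params_new = params - 0.1 * grads
```

Because the weights are shared across time steps, each parameter's gradient is the sum of its contributions from every step of the unrolled network.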

### Forecasting a Time Series

Time series are either **univariate** (a single value per time step) or **multivariate** (several values per time step).  
The task is either **forecasting** (predicting future values) or **imputation** (filling in missing past values).

In [1]:
import numpy as np

def generate_time_series(batch_size, n_steps):
    freq1, freq2, offsets1, offsets2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10)) # wave 1
    series += 0.2 * np.sin((time - offsets2) * (freq2 * 20 + 20)) # + wave 2
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5) # + noise
    return series[..., np.newaxis].astype(np.float32)

Time series inputs are usually represented as 3D arrays of shape [batch size, time steps, dimensionality].
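A quick look at that shape convention, with made-up random data (the batch size and the three features of the multivariate example are arbitrary):

```python
import numpy as np

batch_size, n_steps = 4, 50
rng = np.random.default_rng(42)

# Univariate: one value per time step, so the last dimension is 1
univariate = rng.random((batch_size, n_steps, 1)).astype(np.float32)

# Multivariate: several values per time step (3 hypothetical features)
multivariate = rng.random((batch_size, n_steps, 3)).astype(np.float32)

# A 2D [batch, steps] array becomes 3D by adding a trailing axis,
# which is what series[..., np.newaxis] does in generate_time_series
flat = rng.random((batch_size, n_steps))
as_3d = flat[..., np.newaxis]

print(univariate.shape, multivariate.shape, as_3d.shape)
# (4, 50, 1) (4, 50, 3) (4, 50, 1)
```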

In [2]:
n_steps = 50
series = generate_time_series(10000, n_steps + 1)
X_train, y_train = series[:7000, :n_steps], series[:7000, -1]
X_valid, y_valid = series[7000:9000, :n_steps], series[7000:9000, -1]
X_test, y_test = series[9000:, :n_steps], series[9000:, -1]

#### Baseline Metrics

The simplest approach is to predict the last value in each series (**naive forecasting**):

In [3]:
y_pred = X_valid[:, -1]

In [6]:
from tensorflow import keras 

np.mean(keras.losses.mean_squared_error(y_valid, y_pred))

0.020701446
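To make the computation concrete, here is the same naive forecast and its MSE on two tiny hand-made series, using plain NumPy instead of Keras. The values are invented so the result can be checked by hand.

```python
import numpy as np

# Two toy series of 3 steps each, shape [batch, steps, 1]
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 2.0, 2.0]])[..., np.newaxis]
y = np.array([[4.0], [2.0]])          # true next values, shape [batch, 1]

y_naive = X[:, -1]                    # last observed value of each series
mse = np.mean((y - y_naive) ** 2)     # ((4 - 3)^2 + (2 - 2)^2) / 2
print(mse)  # 0.5
```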

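Another simple approach is a fully connected network applied to the flattened series. The example below uses a single `Dense` neuron with no activation function, which amounts to linear regression: each prediction is a linear combination of the 50 input values plus a bias.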

In [9]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[50, 1]),
    keras.layers.Dense(1)
])

In [12]:
model.compile(loss="mse", optimizer="adam")
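As a cross-check that does not need Keras, the same linear model can be fitted in closed form with least squares. This is a sketch, not the approach used in the notes: it swaps gradient descent for `np.linalg.lstsq`, uses a smaller dataset (3,000 series) and fixes the random seed, so the numbers will differ from the cells above.

```python
import numpy as np

def generate_time_series(batch_size, n_steps):
    # Same generator as in cell [1], repeated so this sketch runs on its own
    freq1, freq2, offsets1, offsets2 = np.random.rand(4, batch_size, 1)
    time = np.linspace(0, 1, n_steps)
    series = 0.5 * np.sin((time - offsets1) * (freq1 * 10 + 10))
    series += 0.2 * np.sin((time - offsets2) * (freq2 * 20 + 20))
    series += 0.1 * (np.random.rand(batch_size, n_steps) - 0.5)
    return series[..., np.newaxis].astype(np.float32)

np.random.seed(42)
n_steps = 50
series = generate_time_series(3000, n_steps + 1)
# Drop the trailing axis: flattening [50, 1] gives 50 plain features
X_train, y_train = series[:2000, :n_steps, 0], series[:2000, -1, 0]
X_valid, y_valid = series[2000:, :n_steps, 0], series[2000:, -1, 0]

# Append a column of ones so the bias is learned together with the weights
A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
A_valid = np.hstack([X_valid, np.ones((len(X_valid), 1))])
weights, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

mse_linear = np.mean((A_valid @ weights - y_valid) ** 2)
mse_naive = np.mean((X_valid[:, -1] - y_valid) ** 2)
print(mse_linear, mse_naive)
```

The linear model has 51 parameters (50 weights and one bias), and on this data it should land well below the naive forecast's error.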

### Implementing a Simple RNN

Now let's try to beat our baseline metrics! Here is the simplest possible RNN: a single layer with a single recurrent neuron.

In [8]:
model = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=[None, 1])
])
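It helps to count how few parameters this model has. The helper below is hypothetical (it is not part of Keras); it just applies the arithmetic for a basic recurrent layer with a bias term: one weight per input per unit, one recurrent weight per pair of units, and one bias per unit.

```python
def simple_rnn_param_count(n_inputs, n_units):
    """Number of trainable parameters in a basic recurrent layer with bias."""
    input_weights = n_inputs * n_units       # W_x
    recurrent_weights = n_units * n_units    # W_y
    biases = n_units                         # b
    return input_weights + recurrent_weights + biases

print(simple_rnn_param_count(1, 1))    # 3
print(simple_rnn_param_count(1, 20))   # 440
```

So `SimpleRNN(1)` on a univariate series has only 3 parameters, compared with 51 for the linear model on 50 time steps.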

In [18]:
model.compile(loss="mse", optimizer="adam")

In [None]:
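# Pass the targets and the validation set explicitly; fit() returns a History object
simple_RNN = model.fit(X_train, y_train, epochs=5,
                       validation_data=(X_valid, y_valid))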