# Recurrent Neural Networks - Basics

A Recurrent Neural Network is a type of neural network able to handle sequential data. But what does it mean for data to be "sequential"?

# Introducing sequential data

Sequence data, also known as **sequences**, have a unique characteristic compared to other types of data: *Order matters*. Sequence data, as the name suggests, appear in a certain order and individual data points are **not** independent of each other.

This is unlike what I have seen so far. In what I am used to, training examples are independent of one another. I learned that this type of data is called **independent and identically distributed (IID)** data. Because the examples are independent, the order in which training samples are fed into the neural network does not matter.

When dealing with sequences, however, this is no longer true.
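
To make the contrast concrete, here is a minimal sketch using made-up toy data (the numbers and the two sentences are purely illustrative assumptions). Shuffling the rows of an IID-style dataset leaves the dataset unchanged as a collection, whereas reordering the words of a sentence produces a different sequence with a different meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

# IID-style data (toy values): each row is an independent example,
# so the order of the rows carries no information.
samples = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]])
shuffled_samples = rng.permutation(samples)  # same rows, possibly in another order

# Sequence data (toy sentences): the order of the elements is part of the information.
sentence = ["the", "dog", "bit", "the", "man"]
reordered = ["the", "man", "bit", "the", "dog"]

# Both pairs contain exactly the same elements...
same_rows = sorted(map(tuple, samples)) == sorted(map(tuple, shuffled_samples))
same_words = sorted(sentence) == sorted(reordered)

# ...but as sequences, the two sentences are different.
different_sequences = sentence != reordered

print(same_rows, same_words, different_sequences)
```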

# Sequential data versus time series data 📃📈

Before moving forward and discussing how to represent sequences, I would like to talk about what time series data is, and how it differs from sequential data in general.

**Time series data** is *a special type* of sequential data in which each data point is associated with a point in time. In other words, samples are captured at **successive timestamps**. Stock prices and voice recordings, for instance, are time series data.

On the other hand, **not all sequential data is time series data**. Text data and DNA sequences, for instance, are sequential data because the order is important, but they do *not* qualify as time series data.

RNNs can be used to handle both sequential data and time series data.

# How is sequential data represented?

Moving forward, sequences will be represented as follows:

$$
\langle x^{(1)}, x^{(2)}, \dots , x^{(T)} \rangle
$$

The superscript indices indicate *the order of the instances*, and $T$ is *the length of the sequence*. If the sequence represents time series data, the superscript refers to a particular time. As such, $x^{(t)}$ represents the example point that belongs to time $t$.
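
As a rough illustration of this notation, a sequence can be stored as a 2D array with one row per step. The sketch below assumes a hypothetical toy sequence with $T = 4$ steps and 3 features per step; the names `x`, `T` and `n_features` are chosen for this example only.

```python
import numpy as np

# Hypothetical toy sequence: T = 4 steps, 3 features per step.
T, n_features = 4, 3
x = np.arange(T * n_features, dtype=float).reshape(T, n_features)

# Python indexing starts at 0, so x[t - 1] plays the role of x^{(t)}.
x_1 = x[0]      # x^{(1)}, the first element of the sequence
x_T = x[T - 1]  # x^{(T)}, the last element of the sequence

for t in range(1, T + 1):
    print(f"x^({t}) = {x[t - 1]}")
```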

RNNs are designed to model sequences: they are able to "remember" past information and use it when producing new information. MLPs and CNNs, by contrast, do not incorporate ordering information, since they treat training examples as independent of each other. In that regard, RNNs can be said to have "memory".

# The different categories of sequence modeling

As previously mentioned, RNNs are used to model sequences. In this section, I would like to describe the different modeling tasks, based on the explanation in Andrej Karpathy's article [The unreasonable effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).

![Type of modeling tasks](./images/img-1.png)

*Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state (more on this soon).*

- **One-to-one**: From fixed-sized input to fixed-sized output. Image classification is an example of such a task.

- **One-to-many**: The input is in a standard (fixed-sized) format, and the output is a sequence. Image captioning, which takes an image and outputs a sequence of words (i.e. a sentence), falls into this category.

- **Many-to-one**: The input data is a sequence this time, and the output is a fixed-sized vector or a scalar. Sentiment analysis falls into this category: a given sequence of words is classified as expressing "positive" or "negative" sentiment.

- **Many-to-many**: Here, both the input and the output are sequences. This category can be further separated into two subcategories.
    
    - Delayed: Machine translation falls into this category. An RNN reads a whole sentence in English and then outputs its translation in French.

    - Synchronized: Video classification falls into this category. An RNN reads the successive frames of a video and labels each of them.
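
One way to keep these categories apart is to look at the shapes of the input and output arrays. The sketch below is only an illustration: the sizes `T_in`, `T_out`, `n_in` and `n_out` are hypothetical, and a "sequence" is assumed to be a 2D array with one row per step, while a fixed-sized input or output is a 1D vector.

```python
# Hypothetical sizes, chosen only for illustration.
T_in, T_out = 5, 7   # input and output sequence lengths
n_in, n_out = 3, 2   # size of each input / output vector

# Shapes of the input and output for each modeling task.
shapes = {
    "one-to-one": {"input": (n_in,), "output": (n_out,)},
    "one-to-many": {"input": (n_in,), "output": (T_out, n_out)},
    "many-to-one": {"input": (T_in, n_in), "output": (n_out,)},
    "many-to-many (delayed)": {"input": (T_in, n_in), "output": (T_out, n_out)},
    "many-to-many (synchronized)": {"input": (T_in, n_in), "output": (T_in, n_out)},
}

def is_sequence(shape):
    """A 2D shape (steps, features) is treated as a sequence here."""
    return len(shape) == 2

for task, io in shapes.items():
    kind_in = "sequence" if is_sequence(io["input"]) else "fixed-size"
    kind_out = "sequence" if is_sequence(io["output"]) else "fixed-size"
    print(f"{task}: {kind_in} {io['input']} -> {kind_out} {io['output']}")
```

Note that in the synchronized case the output has exactly one entry per input step, whereas in the delayed case the two lengths are free to differ.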

# Understanding the flow of data in RNNs

In this section of the notebook, I would like to talk about how data flows through a Recurrent Neural Network. Let's consider the flow in a standard feedforward neural network and in an RNN side by side for comparison:

![data flow in a NN and in an RNN](./images/img-2.jpg)

Both networks have only one hidden layer.

In classic feedforward networks (MLP, CNN), like the ones we have seen so far, data flows from the input layer to the hidden layer, then from the hidden layer to the output layer.

In an RNN, on the other hand, the hidden layer receives its input from the **input layer of the current time step** and from **the hidden layer of the previous time step**. In other words, the output vector at time step $t$ is obtained from the input vector at time step $t$ and the state of the hidden layer at time step $t-1$, thus giving the network a "memory" of past inputs when computing an output.
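
The text above does not give an explicit formula, so the following is only a sketch under a common assumption: the hidden state is updated as $h^{(t)} = \tanh(W_{xh} x^{(t)} + W_{hh} h^{(t-1)} + b_h)$. The names `W_xh`, `W_hh`, `b_h`, the choice of `tanh`, and all sizes are assumptions made for this example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step of a simple recurrent layer (assumed tanh activation).

    The new hidden state depends on the current input x_t AND on the
    hidden state from the previous time step h_prev.
    """
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4  # hypothetical sizes

# Randomly initialized parameters, for illustration only.
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

x_t = rng.normal(size=n_in)

# The same input combined with two different previous states
# yields two different hidden states: this is the "memory".
h_t = rnn_step(x_t, np.zeros(n_hidden), W_xh, W_hh, b_h)
h_t_other = rnn_step(x_t, np.ones(n_hidden), W_xh, W_hh, b_h)

print(h_t)
print(h_t_other)
```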

The flow of information is displayed as a **loop**, also known as a **recurrent edge** in graph notation, which is how the general RNN architecture got its name. To make it easier to reason about, RNNs are often represented **unfolded**, as shown below:

![RNN Unfolded](./images/img-3.jpg)
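
In code, unfolding roughly corresponds to a plain loop over the time steps in which the same weights are reused at every step. The sketch below assumes a simple tanh recurrence and hypothetical names and sizes (`rnn_forward`, `W_xh`, `W_hh`, `b_h`); it is not tied to any particular library.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Run a simple recurrent layer over a whole sequence.

    xs has shape (T, n_in); the result has shape (T, n_hidden),
    one hidden state per unfolded time step.
    """
    h = h0
    hidden_states = []
    for x_t in xs:  # one loop iteration per unfolded time step
        # The same weights are shared across all time steps.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)

rng = np.random.default_rng(0)
T, n_in, n_hidden = 3, 3, 4  # hypothetical sizes

W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
h0 = np.zeros(n_hidden)

xs = rng.normal(size=(T, n_in))
hs = rnn_forward(xs, h0, W_xh, W_hh, b_h)

# Change only the FIRST input and rerun: the LAST hidden state changes too,
# because information is carried forward through the recurrent edge.
xs_changed = xs.copy()
xs_changed[0] += 1.0
hs_changed = rnn_forward(xs_changed, h0, W_xh, W_hh, b_h)
last_state_diff = np.abs(hs[-1] - hs_changed[-1]).max()

print(hs.shape, last_state_diff)
```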

Like MLPs, RNNs can also have multiple layers. The previous illustration showcases an RNN with only one hidden layer. The following RNN has two hidden layers. Unfolded, a 2-layer RNN looks like this:

![Two layer RNN Unfolded](./images/img-4.jpg)
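
A minimal sketch of this stacking, assuming the same simple tanh recurrence in both layers: the sequence of hidden states produced by the first layer is fed as the input sequence to the second layer. The helper names `rnn_layer` and `init_layer` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_out, n_inp):
    """Random parameters (W_xh, W_hh, b_h) for one recurrent layer, for illustration only."""
    W_xh = rng.normal(scale=0.5, size=(n_out, n_inp))
    W_hh = rng.normal(scale=0.5, size=(n_out, n_out))
    b_h = np.zeros(n_out)
    return W_xh, W_hh, b_h

def rnn_layer(xs, W_xh, W_hh, b_h):
    """Run one recurrent layer over a sequence xs of shape (T, n_inp)."""
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    hidden_states = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)

T, n_in, n_h1, n_h2 = 5, 3, 4, 2  # hypothetical sizes

xs = rng.normal(size=(T, n_in))

# Layer 1 reads the input sequence.
h1 = rnn_layer(xs, *init_layer(n_h1, n_in))
# Layer 2 reads the hidden states of layer 1, step by step.
h2 = rnn_layer(h1, *init_layer(n_h2, n_h1))

print(h1.shape, h2.shape)
```

Each layer keeps its own hidden state and its own weights; only the per-step hidden states of the lower layer are passed upward.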