#  1.  Recurrent Neural Networks - introduction

With supervised learning - regression and classification - we aim to learn the relationship between fixed-size input and output data points.  For example, in object detection (a classification problem) we can use a classification algorithm to learn a model that distinguishes image patches containing an object of interest (e.g., a human face) from those that do *not* contain it.  This classification problem is illustrated in the image below (used with permission from [[1]](#bib_cell)).

<img src="images/wright_bros_face_detect.png" width=600 height=600/>

## 1.1  Problem 1 - fixed size input / output

Note one key aspect of the problems we can solve with supervised learning: *every input must have the same dimension, and so must every output.*  With object detection the inputs - image patches - are all the same size, e.g., a 64 x 64 grid of pixels, and the outputs - labels - are integer-valued scalars.  An example of a small image patch showing the value of each pixel (taken from [[2]](#bib_cell)) is shown below.

<img src="images/sample_grid_a_square.png" width=400 height=400/>
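As a rough illustration of the fixed-size requirement, the sketch below stacks a handful of 64 x 64 patches into a single data matrix. The pixel values and labels are random placeholders, and the names `patches`, `labels`, and `num_patches` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

patch_size = 64    # 64 x 64 pixel patches, as in the text
num_patches = 10   # hypothetical number of training patches

# Made-up data: random grayscale pixel values and random 0/1 labels
patches = rng.integers(0, 256, size=(num_patches, patch_size, patch_size))
labels = rng.integers(0, 2, size=num_patches)

# Because every patch has the same size, each one flattens to a vector of
# the same length, and the whole dataset fits in one rectangular matrix.
X = patches.reshape(num_patches, -1)

print(X.shape)       # (10, 4096)
print(labels.shape)  # (10,)
```

Each row of `X` is one input of length 64 * 64 = 4096, and each entry of `labels` is one scalar output. This rectangular layout is exactly what breaks down when inputs or outputs vary in length.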

But not all pattern recognition problems satisfy this condition.  A program that translates sentences from English to Spanish (or from any one language to another) does not; this task is called *Machine Translation*.  For example, say we wanted to train a model to automatically translate English sentences to Spanish using the following set of two sentence pairs.

| English sentence | Spanish translation |
| ------------- |:-------------:|
| I don't like cats.     | Los gatos me cae mal. | 
| My dog is an angel from heaven.    | Mi perro es un ángel del cielo.  | 

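Measuring some basic statistics for these input / output datapoints (with spaces and punctuation counted as characters), we can see that their dimensions are not consistent: they differ between the input and output of the first pair, and from one pair to the next.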


- **first sentence:** English input - 4 words, 18 characters; Spanish output - 5 words, 21 characters

- **second sentence:** English input - 7 words, 31 characters; Spanish output - 7 words, 31 characters

Because of these inconsistencies, if we want to learn a relationship from data (many English / Spanish sentence pairs) we need a more flexible machine learning model.
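The word and character counts for the two sentence pairs can be checked with a few lines of Python. This is a minimal sketch: splitting on whitespace to count words, and counting spaces and punctuation as characters, are conventions assumed here, and the helper name `basic_stats` is hypothetical.

```python
sentence_pairs = [
    ("I don't like cats.", "Los gatos me cae mal."),
    ("My dog is an angel from heaven.", "Mi perro es un ángel del cielo."),
]

def basic_stats(sentence):
    """Return (number of whitespace-separated words, number of characters)."""
    return len(sentence.split()), len(sentence)

# One ((words, chars), (words, chars)) entry per English / Spanish pair
stats = [(basic_stats(en), basic_stats(es)) for en, es in sentence_pairs]

for (en_words, en_chars), (es_words, es_chars) in stats:
    print(f"English: {en_words} words, {en_chars} characters | "
          f"Spanish: {es_words} words, {es_chars} characters")
```

Note that `len` counts Unicode code points, so an accented letter such as "á" counts as one character when it is stored as a single precomposed character.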

## 1.2  Problem 2 - complicated relationships

A more subtle issue with pattern recognition tasks like machine translation is that the input / output data have a more complicated relationship than in a typical supervised learning problem.  Take our first training sentence

I don't like cats. --> Los gatos me cae mal.

If we look at this datapoint on a word-by-word level, it is **not** the case that each word in the English sentence translates directly to the word in the same position of the Spanish sentence: "I" does not translate to "Los", "don't" does not translate to "gatos", and so on.  Moreover, "cats" is at the end of the English sentence while its translation ("los gatos") is at the front of the Spanish one.  So while on the whole these two sentences mean the same thing, the words cannot be translated correctly one at a time, in sequence.
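The mismatch can be made concrete by pairing the words of the two sentences position by position. The sketch below is only an illustration of naive positional alignment (punctuation is dropped for simplicity), not a translation method.

```python
from itertools import zip_longest

english = "I don't like cats".split()
spanish = "Los gatos me cae mal".split()

# Pair words by position; the shorter sentence is padded with None
pairs = list(zip_longest(english, spanish, fillvalue=None))

for position, (en_word, es_word) in enumerate(pairs, start=1):
    print(position, en_word, "->", es_word)

# Zero-based positions of the corresponding nouns in each sentence
print(english.index("cats"), spanish.index("gatos"))  # 3 1
```

None of the positional pairs is a correct word translation, the last Spanish word has no English partner at all, and "cats" / "gatos" sit at different positions in their sentences.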

## 1.3  Enter Recurrent Neural Networks

Recurrent Neural Networks (RNNs) offer a flexible pattern recognition system for data whose inputs and outputs have inconsistent dimensions, and where the relationship between inputs and outputs is rather complex.
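To give a feel for how one model can accept sequences of different lengths, here is a minimal sketch assuming a generic tanh recurrence, $\mathbf{h}_t = \tanh\left(\mathbf{W}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}\right)$. This recurrence, the function name `run_rnn`, the parameter names, and the sizes are all assumptions made for illustration, and are not necessarily the formulation used in the rest of this notebook. The weights and inputs are random placeholders.

```python
import numpy as np

def run_rnn(inputs, W_h, W_x, b):
    """Apply h_t = tanh(W_h h_{t-1} + W_x x_t + b) over a sequence of any length."""
    h = np.zeros(W_h.shape[0])          # initial hidden state
    for x in inputs:                    # one update per sequence element
        h = np.tanh(W_h @ h + W_x @ x + b)
    return h

rng = np.random.default_rng(0)
hidden_size, input_size = 3, 5          # hypothetical sizes

# The same parameters are reused at every step of every sequence
W_h = rng.normal(size=(hidden_size, hidden_size))
W_x = rng.normal(size=(hidden_size, input_size))
b = np.zeros(hidden_size)

short_sequence = rng.normal(size=(4, input_size))   # 4 input vectors
long_sequence = rng.normal(size=(7, input_size))    # 7 input vectors

h_short = run_rnn(short_sequence, W_h, W_x, b)
h_long = run_rnn(long_sequence, W_h, W_x, b)
print(h_short.shape, h_long.shape)  # (3,) (3,)
```

Because the same update is applied once per element, the number of parameters does not depend on the sequence length, which is why sequences of length 4 and 7 can share one model.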

In the remainder of this notebook we give a high level overview of the basic modeling of RNNs, as well as the basic method of parameter tuning.

# 2.  RNN basic modeling

Here we lay out the basic terminology and math for creating a basic RNN.

## 2.1  Sequence notation

Let's introduce some notation for sequences of input / output.  We can denote one input sequence of data as 

$\mathbf{x}^{\left(1\right)},\,\mathbf{x}^{\left(2\right)},...,\mathbf{x}^{\left(S\right)}$

Here each vector $\mathbf{x}^{\left(s\right)}$ can have arbitrary length.  Likewise each corresponding output sequence is denoted as 

$\mathbf{y}^{\left(1\right)},\,\mathbf{y}^{\left(2\right)},...,\mathbf{y}^{\left(T\right)}$

and each output vector $\mathbf{y}^{\left(t\right)}$ also has arbitrary length.  Notice that the sequence lengths $S$ and $T$ are arbitrary as well, and need not be equal.

------

For example, take our first English / Spanish sentence - the input / output vectors are

$\mathbf{x}^{\left(1\right)}=\text{I},\,\,\,\mathbf{x}^{\left(2\right)}=\text{don't},\,\,\,\mathbf{x}^{\left(3\right)}=\text{like},\,\,\,\mathbf{x}^{\left(4\right)}=\text{cats}$

and 

$\mathbf{y}^{\left(1\right)}=\text{Los},\,\,\,\mathbf{y}^{\left(2\right)}=\text{gatos},\,\,\,\mathbf{y}^{\left(3\right)}=\text{me},\,\,\,\mathbf{y}^{\left(4\right)}=\text{cae},\,\,\,\mathbf{y}^{\left(5\right)}=\text{mal}$

------

Likewise, our second sentence input / output vectors are

$\mathbf{x}^{\left(1\right)}=\text{My},\,\,\mathbf{x}^{\left(2\right)}=\text{dog},...,\,\mathbf{x}^{\left(7\right)}=\text{heaven}$

and

$\mathbf{y}^{\left(1\right)}=\text{Mi},\,\,\mathbf{y}^{\left(2\right)}=\text{perro},...,\,\mathbf{y}^{\left(7\right)}=\text{cielo}$
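To make the notation concrete, the sketch below stores the first sentence pair as token sequences and turns each word into a numerical vector. One-hot encoding over a tiny per-sentence vocabulary is an assumption made purely for illustration (the text does not specify how words become vectors), and `one_hot_sequence` is a hypothetical helper.

```python
import numpy as np

x_tokens = ["I", "don't", "like", "cats"]          # x^(1), ..., x^(S)
y_tokens = ["Los", "gatos", "me", "cae", "mal"]    # y^(1), ..., y^(T)

def one_hot_sequence(tokens, vocabulary):
    """Return an array whose row s is the one-hot vector for tokens[s]."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = np.zeros((len(tokens), len(vocabulary)))
    for s, word in enumerate(tokens):
        vectors[s, index[word]] = 1.0
    return vectors

# Toy vocabularies built from the sentences themselves
x_vectors = one_hot_sequence(x_tokens, sorted(set(x_tokens)))
y_vectors = one_hot_sequence(y_tokens, sorted(set(y_tokens)))

S, T = len(x_vectors), len(y_vectors)
print(S, T)                              # 4 5
print(x_vectors.shape, y_vectors.shape)  # (4, 4) (5, 5)
```

Here $S = 4$ and $T = 5$, so the input and output sequences of a single datapoint already differ in length.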



<a id='bib_cell'></a>

## Bibliography

[1] Watt, Jeremy et al. [Machine Learning Refined](www.mlrefined.com). Cambridge University Press, 2016

[2] Image taken from http://pippin.gimp.org/image_processing/chap_dir.html