# Recurrent Neural Networks

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=brightgreen)
[![Source](https://img.shields.io/static/v1.svg?label=GitHub&message=Source&color=181717&logo=GitHub)](https://github.com/particle1331/inefficient-networks/blob/master/docs/notebooks/tensorflow/05-tensorflow-cnn.ipynb)
[![Stars](https://img.shields.io/github/stars/particle1331/inefficient-networks?style=social)](https://github.com/particle1331/inefficient-networks)

---

## Introduction

Recall that convolutions rest on the idea that nearby pixels are related, while pixels far from each other are not. For sequences, this is not necessarily true: there can be long-term dependencies, for example between words at the beginning and at the end of a paragraph. For modelling sequences, we will look at **recurrent connections** (i.e. cyclic dependencies) and their extensions. As with convolutional layers, recurrent units also use weight sharing, but over time instead of over space.

The key idea for sequence modelling is that while sequences $\langle \boldsymbol{\mathsf{x}}_1, \boldsymbol{\mathsf{x}}_2, \ldots, \boldsymbol{\mathsf{x}}_T \rangle$ have arbitrary length $T \in \mathbb{N},$ each time step can be modelled with a state vector $\boldsymbol{\mathsf{x}}_t \in \mathbb{R}^d$ with a fixed number of entries. For example, when modelling daily temperatures, we can use the maximum, minimum, and average temperature of each day, giving a 3-dimensional vector that represents the state of that day. For this to work well, each step in a sequence must be semantically equivalent, and the order of the steps must matter. Recurrent connections are able to capture this information in the data by using a memory vector $\boldsymbol{\mathsf{h}}_t.$
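
To make the fixed-size state vector concrete, here is a minimal sketch of the temperature example. The function name `make_temperature_sequence`, the variables `week` and `fortnight`, and the synthetic values are all made up for illustration; only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_temperature_sequence(num_days):
    """Return an array of shape (T, 3): one (max, min, mean) vector per day.

    The values are synthetic and purely illustrative.
    """
    t_min = rng.uniform(10, 20, size=num_days)
    t_max = t_min + rng.uniform(5, 10, size=num_days)
    t_avg = (t_min + t_max) / 2
    return np.stack([t_max, t_min, t_avg], axis=1)


week = make_temperature_sequence(7)        # T = 7
fortnight = make_temperature_sequence(14)  # T = 14

# The sequence length T varies, but every step is a vector in R^3.
print(week.shape, fortnight.shape)
```

Both arrays share the second dimension $d = 3$, so the same model can consume either sequence one step at a time.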

In [68]:
import warnings
from pathlib import Path

import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib_inline import backend_inline

DATASET_DIR = Path("./data").absolute()
RANDOM_SEED = 42

warnings.simplefilter(action="ignore")
backend_inline.set_matplotlib_formats('svg')

## Recurrent connections

Recurrent connections compute a hidden state vector $\boldsymbol{\mathsf{h}}_t$ which evolves based on the new system state $\boldsymbol{\mathsf{x}}_t$ and the existing hidden state $\boldsymbol{\mathsf{h}}_{t-1}.$ The unit has weights both for the past history, carried by $\boldsymbol{\mathsf{h}}_{t-1},$ and for the current state of the system $\boldsymbol{\mathsf{x}}_t,$ and blends the two:

$$
\begin{aligned}
\boldsymbol{\mathsf{h}}_t 
&= \textsf{A}(\boldsymbol{\mathsf{h}}_{t-1}, \boldsymbol{\mathsf{x}}_{t}) \\
&= \tanh\left( \boldsymbol{\mathsf{h}}_{t-1} \boldsymbol{\mathsf{W}}_{\mathsf{h}} +  {\boldsymbol{\mathsf{x}}}_t\, \boldsymbol{\mathsf{W}}_{\mathsf{x}}  + \boldsymbol{\mathsf{b}}\right)
\end{aligned}
$$

such that $\boldsymbol{\mathsf{h}}_0 = \boldsymbol 0.$ Note that the same weights are applied at every step, so this is able to process an entire sequence regardless of its length. Because of the choice of nonlinearity, the components of the hidden state are confined to the range $[-1, 1]$ and saturate near its endpoints.
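
The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `rnn_forward`, the dimensions, and the random weights are assumptions made for the example. The same `Wx`, `Wh`, and `b` are reused at every step, which is why sequences of different lengths can be processed with a fixed number of parameters.

```python
import numpy as np


def rnn_forward(x, Wx, Wh, b):
    """Apply h_t = tanh(h_{t-1} Wh + x_t Wx + b) over a sequence x of shape (T, d).

    Starts from h_0 = 0 and returns h_1, ..., h_T stacked into shape (T, k).
    """
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(h @ Wh + x_t @ Wx + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
d, k = 3, 2                       # input and hidden dimensions (illustrative)
Wx = rng.normal(size=(d, k))
Wh = rng.normal(size=(k, k))
b = rng.normal(size=k)

# The same parameters handle a sequence of length 3 and one of length 50.
short = rnn_forward(rng.normal(size=(3, d)), Wx, Wh, b)
long = rnn_forward(rng.normal(size=(50, d)), Wx, Wh, b)

print(short.shape, long.shape)    # one hidden state per time step
print(np.abs(long).max() <= 1.0)  # tanh keeps every component within [-1, 1]
```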

Unrolling recurrent connections makes it look more familiar:

```{margin}
Source:<br>
[`d2l.ai/ch9`](https://www.d2l.ai/chapter_recurrent-neural-networks/index.html)
```
```{figure} ../../img/unfolded-rnn.svg
---
width: 80%
name: unfolded-rnn
---
```

**Example.** Input order matters:

In [73]:
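T = 3
x  = np.random.random((T, 3))   # sequence of T inputs, each in R^3
Wx = np.random.random((3, 2))   # input-to-hidden weights
Wh = np.random.random((2, 2))   # hidden-to-hidden weights


def run(order=(0, 1, 2)):
    """Print the hidden states obtained by feeding rows of x in the given order."""
    h = np.zeros((1, 2))        # h_0 = 0

    print('\nh=')
    for i in order:
        # Recurrent update (bias omitted for brevity).
        h = np.tanh(h @ Wh + x[[i]] @ Wx)
        print(h)


run([0, 1, 2])
run([1, 0, 2])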


h=
[[0.7637734  0.74654836]]
[[0.77538518 0.8957538 ]]
[[0.79465479 0.91582463]]

h=
[[0.62483806 0.66089363]]
[[0.85413636 0.91019435]]
[[0.79689722 0.92114369]]


## Backpropagation Through Time (BPTT)

## PyTorch implementation

In this section, we will look at how to use RNN units in PyTorch to predict the likely language of origin of a name. The code for this section is based on [this notebook](https://github.com/EdwardRaff/Inside-Deep-Learning/blob/main/Chapter_4.ipynb). For this task, we classify a name's source language by feeding the characters of the name as a sequence into the RNN unit. Each character updates the hidden state vector. Once the complete name has been processed, we get a final state vector, which is passed to a classification subnetwork for prediction.

```{margin}
Source:<br>

```
```{figure} ../../img/rnn-names.png
---
width: 500px
name: rnn-names
---

Classifying the source language for the name Frank. The characters of the name are sequentially passed to an RNN unit, resulting in a final hidden state $\boldsymbol{\mathsf{h}}_5$ that is passed to a linear layer.
```
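
The architecture described above can be sketched with PyTorch's built-in modules as follows. This is a hedged illustration rather than the implementation from the referenced notebook: the class name `NameClassifier`, the helper `encode`, the use of an embedding layer to turn character indices into vectors, and all layer sizes are assumptions chosen for the example.

```python
import string

import torch
from torch import nn

ALPHABET = string.ascii_lowercase + " .,;'"   # assumed character set
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}


class NameClassifier(nn.Module):
    """Character-level RNN followed by a linear classification layer."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (B, T) integer character indices
        _, h_final = self.rnn(self.embed(x))  # h_final: (1, B, hidden_dim)
        return self.fc(h_final[-1])           # logits: (B, num_classes)


def encode(name):
    """Map a lowercase ASCII name to a (1, T) tensor of character indices."""
    return torch.tensor([[CHAR_TO_INDEX[c] for c in name]])


# Hypothetical sizes; num_classes is the number of target languages.
model = NameClassifier(len(ALPHABET), embed_dim=16, hidden_dim=32, num_classes=18)
logits = model(encode("frank"))
print(logits.shape)
```

Only the final hidden state is passed to the linear layer, which mirrors the role of $\boldsymbol{\mathsf{h}}_5$ in the figure for a five-character name.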

In [87]:
import torch
from torch import nn
from torch.nn import functional as F

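# Use Apple's MPS backend when available, otherwise fall back to the CPU.
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')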

Downloading the dataset:

In [89]:
import requests, zipfile, io

file_url = "https://download.pytorch.org/tutorial/data.zip"
zip = zipfile.ZipFile(io.BytesIO(requests.get(file_url).content))
zip.extractall()

Converting Unicode names to plain ASCII (e.g. Ślusàrski to Slusarski):

In [95]:
import string

string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [120]:
import unicodedata
import string


# https://stackoverflow.com/a/518232/2809427
ALPHABET = set(string.ascii_letters + " .,;'")  # characters kept after conversion


def unicode_to_ascii(s):
    """Strip combining marks (accents) and drop characters outside ALPHABET."""
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn' and c in ALPHABET
    )


# Loop through every language file in the archive:
#   1. Open the zip file entry.
#   2. Get the target language from the filename.
#   3. Read the text file, convert each name to lowercase ASCII, and
#      store the list of names under the target language.
data = {}
for path in (p for p in zip.namelist() if "names" in p and p.endswith(".txt")):
    lang = path.split('/')[-1].replace(".txt", "")
    with zip.open(path) as f:
        lines = f.read().decode('utf-8').strip().split("\n")
        data[lang] = [unicode_to_ascii(line).lower() for line in lines]
    
for lang, lang_names in data.items():
    print(f"{lang}: {len(lang_names):>10}")

Arabic:       2000
Chinese:        268
Czech:        519
Dutch:        297
English:       3668
French:        277
German:        724
Greek:        203
Irish:        232
Italian:        709
Japanese:        991
Korean:         94
Polish:        139
Portuguese:         74
Russian:       9408
Scottish:        100
Spanish:        298
Vietnamese:         73
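
To see why the conversion above works, the sketch below traces it on the example name. NFD normalization splits an accented letter into a base letter followed by combining marks (Unicode category `Mn`), which are then filtered out. The helper is redefined here so the snippet is self-contained, and the set name `ALLOWED` is an assumption for the example. The name `Łukasz` is an added illustration of a limitation: `Ł` has no canonical decomposition, so it is dropped by the alphabet filter rather than mapped to `L`.

```python
import string
import unicodedata

ALLOWED = set(string.ascii_letters + " .,;'")


def unicode_to_ascii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn' and c in ALLOWED
    )


name = "\u015alus\u00e0rski"  # Ślusàrski, written with precomposed characters
decomposed = unicodedata.normalize('NFD', name)

# Two accented letters each gain one combining mark: 9 -> 11 code points.
print(len(name), len(decomposed))
print([unicodedata.name(c) for c in decomposed if unicodedata.category(c) == 'Mn'])
print(unicode_to_ascii(name))            # Slusarski
print(unicode_to_ascii("\u0141ukasz"))   # ukasz (the leading Ł is dropped)
```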

