# Recurrent Neural Nets

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=brightgreen)
[![Source](https://img.shields.io/static/v1.svg?label=GitHub&message=Source&color=181717&logo=GitHub)](https://github.com/particle1331/ok-transformer/blob/master/docs/notebooks/tensorflow/05-tensorflow-cnn.ipynb)
[![Stars](https://img.shields.io/github/stars/particle1331/ok-transformer?style=social)](https://github.com/particle1331/ok-transformer)

---

## Introduction

Recall that MLPs and CNNs take an input $\boldsymbol{\mathsf x} \in \mathbb R^d$ and transform it sequentially through a stack of layers. These networks are designed to capture hierarchical features in the input data, and the tasks they are applied to have inputs of fixed length. However, many tasks involve data that cannot be represented as fixed-length vectors. Their inputs are variable-length sequences, such as video or natural language. A key insight is that while such data have variable length, we can model each input as a sequence $(\boldsymbol{\mathsf x}_1, \ldots, \boldsymbol{\mathsf x}_T)$ of fixed-length vectors $\boldsymbol{\mathsf x}_t \in \mathbb R^d$, where the length $T$ may differ between examples. For example, a video can be modeled as a sequence of images of fixed shape.
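As a small illustration for natural language, the sketch below one-hot encodes two strings of different lengths over a shared character vocabulary. Each sequence becomes a $(T, d)$ tensor: $T$ varies with the string, while every step $\boldsymbol{\mathsf x}_t$ lives in the same $\mathbb R^d$. The toy vocabulary and the `encode` helper are hypothetical, chosen only for this example.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy vocabulary built from one example string.
vocab = sorted(set("hello world"))
char_to_idx = {c: i for i, c in enumerate(vocab)}
d = len(vocab)  # every step x_t is a vector in R^d

def encode(text: str) -> torch.Tensor:
    """Hypothetical helper: one-hot encode a string into a (T, d) tensor, T = len(text)."""
    idx = torch.tensor([char_to_idx[c] for c in text])  # int64 indices
    return F.one_hot(idx, num_classes=d).float()

short_seq = encode("hello")        # T = 5
long_seq = encode("hello world")   # T = 11
print(short_seq.shape, long_seq.shape)
```

Only the first dimension differs between the two tensors, which is exactly the kind of input a recurrent model consumes one step at a time.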

In this notebook, we look at **recurrent connections**, which use a hidden state $\boldsymbol{\mathsf{h}}_{t-1}$ to capture information about the inputs up to time step $t-1.$ This framework lets inputs be processed step by step in a feed-forward manner, with the layer weights shared across time steps. Moreover, we can backpropagate through the recurrent units by tracking the dependence of the hidden state on all previous time steps. This introduces difficulties in training, which motivate the modern recurrent architectures discussed in later notebooks.
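A minimal sketch of this recurrence, assuming the same update rule as the `RNNLMScratch` cell defined later in this notebook, $\boldsymbol{\mathsf h}_t = \tanh(\boldsymbol{\mathsf x}_t \boldsymbol{\mathsf W}_x + \boldsymbol{\mathsf h}_{t-1} \boldsymbol{\mathsf W}_h + \boldsymbol{\mathsf b})$ with $\boldsymbol{\mathsf h}_0 = \boldsymbol{\mathsf 0}$. One set of weights is reused at every step, and a toy loss on the final hidden state is backpropagated to every input. The sizes, seed, and loss are arbitrary choices for illustration. With this small initialization, the gradient reaching the first input is typically many orders of magnitude smaller than the gradient reaching the last one, which hints at the training difficulties mentioned above.

```python
import torch

torch.manual_seed(0)
d, num_hidden, T = 8, 16, 10
sigma = 0.01  # small init scale, same default as RNNLMScratch

# A single set of weights shared across all time steps.
Wx = torch.randn(d, num_hidden) * sigma
Wh = torch.randn(num_hidden, num_hidden) * sigma
b = torch.zeros(num_hidden)

# Input sequence of shape (T, d); track gradients w.r.t. every x_t.
xs = torch.randn(T, d, requires_grad=True)

h = torch.zeros(num_hidden)  # h_0 = 0
for t in range(T):
    # h_t depends on x_t and, through h_{t-1}, on all earlier inputs.
    h = torch.tanh(xs[t] @ Wx + h @ Wh + b)

loss = h.sum()  # toy loss on the final hidden state only
loss.backward()  # backpropagation through time

grad_norms = xs.grad.norm(dim=1)  # ||dL/dx_t|| for each t
print(grad_norms[0].item(), grad_norms[-1].item())
```

Comparing `grad_norms[0]` with `grad_norms[-1]` shows how the signal shrinks each time it passes back through `Wh` and the `tanh` derivative.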

```python
import torch
import torch.nn.functional as F
import torch.nn as nn

import math
import warnings
import matplotlib
import matplotlib.pyplot as plt
from pathlib import Path
from matplotlib_inline import backend_inline

DATASET_DIR = Path("../input").absolute()
RANDOM_SEED = 42
GENERATOR = torch.Generator().manual_seed(RANDOM_SEED)

warnings.simplefilter(action="once")
backend_inline.set_matplotlib_formats('svg')
matplotlib.rcParams["image.interpolation"] = "none"

print(torch.__version__)
```

```
1.13.0
```


## Character-based language model

```python
class RNNLMScratch(nn.Module):
    """Recurrent cell computing h_t = tanh(x_t Wx + h_{t-1} Wh + b)."""

    def __init__(self, num_inputs, num_hidden, sigma=0.01):
        super().__init__()
        self.num_inputs = num_inputs
        self.num_hidden = num_hidden
        self.sigma = sigma

        # These weights are shared across all time steps.
        self.Wx = nn.Parameter(torch.randn(num_inputs, num_hidden) * sigma)
        self.Wh = nn.Parameter(torch.randn(num_hidden, num_hidden) * sigma)
        self.b  = nn.Parameter(torch.zeros(num_hidden))

    def forward(self, x, h=None):
        # Start from a zero state matching x's batch shape, device, and dtype.
        if h is None:
            h = x.new_zeros((*x.shape[:-1], self.num_hidden))
        return torch.tanh(x @ self.Wx + h @ self.Wh + self.b)
```
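The cell above only produces hidden states. As a hedged sketch of how it could be used for next-character prediction, the block below one-hot encodes a toy string, unrolls the cell over it, and maps each hidden state to vocabulary logits with a hypothetical `readout` layer (`nn.Linear`), scored with cross-entropy against the next character. The toy text, vocabulary, hidden size, and readout layer are assumptions for illustration, not part of the original model. The class is repeated so the block runs standalone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNLMScratch(nn.Module):
    """Repeated from above so this block runs on its own."""
    def __init__(self, num_inputs, num_hidden, sigma=0.01):
        super().__init__()
        self.num_hidden = num_hidden
        self.Wx = nn.Parameter(torch.randn(num_inputs, num_hidden) * sigma)
        self.Wh = nn.Parameter(torch.randn(num_hidden, num_hidden) * sigma)
        self.b = nn.Parameter(torch.zeros(num_hidden))

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros((*x.shape[:-1], self.num_hidden))
        return torch.tanh(x @ self.Wx + h @ self.Wh + self.b)

torch.manual_seed(0)
text = "hello world"                       # hypothetical toy corpus
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
V, H = len(vocab), 32                      # vocab size, hidden size (arbitrary)

idx = torch.tensor([stoi[c] for c in text])
X = F.one_hot(idx, num_classes=V).float()  # (T, V)

cell = RNNLMScratch(V, H)
readout = nn.Linear(H, V)  # hypothetical output layer: hidden state -> next-char logits

# Feed characters 0..T-2 and predict characters 1..T-1.
h = None
logits = []
for t in range(len(text) - 1):
    h = cell(X[t], h)          # same cell weights at every step
    logits.append(readout(h))
logits = torch.stack(logits)   # (T-1, V)

loss = F.cross_entropy(logits, idx[1:])
loss.backward()
print(logits.shape, loss.item())
```

At initialization the hidden states are small and the logits are close to uniform, so the loss should be roughly $\log V$; training would lower it by adjusting `Wx`, `Wh`, `b`, and the readout jointly.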