# Lesson 7

In [7]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.fastai.learner import *
from fastai.fastai.column_data import *
from fastai.fastai.io import *
from fastai.fastai.lm_rnn import *

## 00:00:00 - Part 1 recap

* Part 1 theme = classification and regression with DL.
  * Identifying and learning best practices.
* First 4 lessons: image classification, structured data and NLP in practice.
* Last 3 lessons: understanding more detail about what is going on under the hood.

## 00:01:01 - Part 2 preview

* Move from classification focus to generative models:
  * Chat responses.
  * Images.
  * Text.
* Move from best practices to speculative stuff:
  * Recent papers that haven't been fully tested.
* Learn how to read papers.

## 00:02:51 - RNNs (recap of Lesson 6)

* RNNs are just standard fully-connected networks, refactored into a loop that reuses the same layers at every time step.
* Recap of lesson from last week (see Lesson 6 notebook).
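
To make the "just fully-connected layers" point concrete, here is a minimal sketch of a character RNN written as three ordinary linear layers applied in a loop. The class name `LoopRNN` and the sizes used below are hypothetical, chosen only for illustration; the Lesson 6 notebook has the version used in the course.

```python
import torch
import torch.nn as nn

class LoopRNN(nn.Module):
    """Hypothetical minimal char RNN: plain linear layers reused in a loop."""
    def __init__(self, vocab_size, n_fac, n_hidden):
        super().__init__()
        self.n_hidden = n_hidden
        self.e = nn.Embedding(vocab_size, n_fac)
        self.l_in = nn.Linear(n_fac, n_hidden)         # input -> hidden
        self.l_hidden = nn.Linear(n_hidden, n_hidden)  # hidden -> hidden
        self.l_out = nn.Linear(n_hidden, vocab_size)   # hidden -> output

    def forward(self, cs):
        # cs: LongTensor of character ids, shape (seq_len, bs)
        bs = cs.size(1)
        h = torch.zeros(bs, self.n_hidden)
        # Iterating over cs walks through the time steps; each c has shape (bs,).
        # The same three layers are applied at every step.
        for c in cs:
            inp = torch.relu(self.l_in(self.e(c)))
            h = torch.tanh(self.l_hidden(h + inp))
        return self.l_out(h)

model = LoopRNN(vocab_size=10, n_fac=4, n_hidden=8)
cs = torch.randint(0, 10, (5, 3))   # 5 time steps, batch size 3
out = model(cs)                     # one prediction per batch item
n_param_tensors = len(list(model.parameters()))
```

The number of parameter tensors does not depend on the sequence length, which is the sense in which the loop is one small network reused many times.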

### 00:06:20 - Multi-output model

* Split the text into non-overlapping pieces, then use each piece to predict the same characters offset by 1.
* Problem with RNN model created earlier: each time we start a new sequence, we have to learn the hidden state from scratch:
  ```
  def forward(self, *cs):
      bs = cs[0].size(0)
      
      # Problem: we create a brand new hidden state on every forward pass
      h = V(torch.zeros(1, bs, n_hidden))
      
      inp = self.e(torch.stack(cs))
      outp, h = self.rnn(inp, h)
  ```
  * Can improve on that by storing the hidden state as `self.h`, initialised in the constructor and carried over between calls to `forward`:

In [9]:
class CharSeqStatefulRnn(nn.Module):
    def __init__(self, vocab_size, n_fac, bs):
        self.vocab_size = vocab_size
        
        super().__init__()
        
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        
        # This handles the last batch, if we don't have enough
        # text for a batch size.
        if self.h.size(1) != bs:
            self.init_hidden(bs)
            
        outp, h = self.rnn(self.e(cs), self.h)
        
        # Keep the values of the hidden state but throw away its history of
        # operations, so backprop through time stops at this batch.
        self.h = repackage_var(h)
        
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(1, bs, n_hidden)))
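
The non-overlapping split described at the start of this section can be sketched in plain Python. This is only an illustration under simplifying assumptions: the helper name `make_pieces` and the parameter `piece_len` are hypothetical, and the sketch ignores batching, which the real data loader also handles.

```python
def make_pieces(idx, piece_len):
    """Split a list of character ids into non-overlapping (input, target) pairs.

    Each target is the input shifted one position to the right, so the model
    predicts the next character at every position.
    """
    pairs = []
    # Stop one short of the end so every input character has a "next" character.
    for start in range(0, len(idx) - 1, piece_len):
        ys = idx[start + 1:start + 1 + piece_len]
        xs = idx[start:start + len(ys)]  # trim the last piece to match its target
        pairs.append((xs, ys))
    return pairs

text = "hello world"
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx = [char_to_idx[c] for c in text]

pairs = make_pieces(idx, piece_len=4)
for xs, ys in pairs:
    print(xs, "->", ys)
```

Because consecutive pieces follow each other in the text, the hidden state left over from one piece is a sensible starting state for the next, which is what makes the stateful model worthwhile.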

### 00:10:50 - Backprop through time

* In the multi-output model, if the hidden state kept its full history, the unrolled RNN would be the size of the corpus. E.g. if it's a million words, `self.h` would have a million layers of history behind it, which would be expensive to backprop through.
  * Want to remember state but not history, which is what `repackage_var` does: `Variable(h.data) if type(h) == Variable else tuple(repackage_var(v) for v in h)`
    * By wrapping `h.data` in a new `Variable`, you keep the values but lose the history.
* The process of running backprop through the hidden state history is called backprop through time.
* Usually set a cap on how many layers to run bptt on.
  * In original RNN lesson, we had a var called `bptt = 70`, which sets how many layers to run backprop through.
* Longer values may let you capture more state about the problem, but may also result in exploding / vanishing gradients.
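
The "remember state but not history" idea can be sketched without the fastai helpers. The sketch assumes PyTorch 0.4 or later, where `Variable` has been merged into `Tensor` and `.detach()` does the job of `Variable(h.data)`. The function name `repackage` and the tensor sizes are hypothetical.

```python
import torch
import torch.nn as nn

def repackage(h):
    """Return h with the same values but no autograd history.

    Handles a single tensor or a tuple of tensors (e.g. an LSTM's state).
    """
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage(v) for v in h)

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8)
h = torch.zeros(1, 2, 8)            # (num_layers, bs, n_hidden)

for _ in range(3):                  # three consecutive pieces of a sequence
    x = torch.randn(5, 2, 4)        # (seq_len, bs, input_size)
    outp, h_new = rnn(x, h)
    had_history = h_new.grad_fn is not None   # h_new is tied to the graph
    h = repackage(h_new)                      # same values, history dropped
    same_values = torch.equal(h, h_new)
```

After each piece, `h` carries the state forward, but gradients from later pieces cannot flow back into earlier ones, so the depth of backprop is capped at the length of one piece.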
  
### 00:16:00 - ??

