## Download the Dataset File

In [0]:
# !pip install fastai==0.7.0
# import sys
# !{sys.executable} -m pip install torchtext==0.2.3

In [7]:
# !curl https://s3.amazonaws.com/text-datasets/nietzsche.txt -o nietzsche.txt 

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100  586k  100  586k    0     0  1676k      0 --:--:-- --:--:-- --:--:-- 1676k


In [0]:
# !mkdir - nietzsche
# !mv nietzsche.txt nietzsche

In [0]:
# !ls

## Start 

In [0]:
%matplotlib inline

from fastai.io import *
from fastai.conv_learner import *

from fastai.column_data import *

In [0]:
PATH = 'nietzsche/'

In [11]:
text = open(f'{PATH}nietzsche.txt').read()
print('corpus length', len(text))

corpus length 600893


In [12]:
text[:400]

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not ground\nfor suspecting that all philosophers, in so far as they have been\ndogmatists, have failed to understand women--that the terrible\nseriousness and clumsy importunity with which they have usually paid\ntheir addresses to Truth, have been unskilled and unseemly methods for\nwinning a woman? Certainly she has never allowed herself '

In [13]:
chars = sorted(list(set(text)))
vocab_size = len(chars) + 1
print('total chars:', vocab_size)

total chars: 85


Sometimes it's useful to have a zero value in the dataset, e.g. for padding

In [14]:
chars.insert(0, "\0")
''.join(chars[1:-6])

'\n !"\'(),-.0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxy'

Map from chars to indices and back again

In [0]:
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

*idx* will be the data we use from now on - it simply converts all the characters to their index (based on the mapping above)

In [16]:
idx = [char_indices[c] for c in text]

idx[:10]

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]

In [17]:
''.join(indices_char[i] for i in idx[:70])

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not gro'

## Three char model

- Generally, you want to combine character level model and word level model (e.g. for translation).
- Character level model is useful when a vocabulary contains unusual words — which word level model will just treat as “unknown”. When you see a word you have not seen before, you can use a character level model.
- There is also something in between that is called Byte Pair Encoding (BPE) which looks at n-gram of characters.

### Create inputs

Create a list of every 4th character, starting at the 0th, 1st, 2nd, then 3rd characters

In [0]:
cs = 3
c1_dat = [idx[i]   for i in range(0, len(idx) - cs, cs)]
c2_dat = [idx[i+1] for i in range(0, len(idx) - cs, cs)]
c3_dat = [idx[i+2] for i in range(0, len(idx) - cs, cs)]
c4_dat = [idx[i+3] for i in range(0, len(idx) - cs, cs)]

Our inputs

In [0]:
x1 = np.stack(c1_dat)
x2 = np.stack(c2_dat)
x3 = np.stack(c3_dat)

Our output

In [0]:
y = np.stack(c4_dat)

The first 4 inputs and outputs

In [21]:
x1[:4], x2[:4], x3[:4]

(array([40, 30, 29,  1]), array([42, 25,  1, 43]), array([29, 27,  1, 45]))

In [22]:
y[:4]

array([30, 29,  1, 40])

In [23]:
x1.shape, y.shape

((200297,), (200297,))

### Create and train model

Pick a size for our hidden state

In [0]:
n_hidden = 256

The number of latent factors to create (i.e. the size of the embedding matrix)

In [0]:
n_fac = 42

Our Model

In [0]:
class Char3Model(nn.Module):
  def __init__(self, vocab_size, n_hidden, n_fac):
    super().__init__()
    self.e = nn.Embedding(vocab_size, n_fac)
    self.l_in = nn.Linear(n_fac, n_hidden)
    self.l_hidden = nn.Linear(n_hidden, n_hidden)
    self.l_out = nn.Linear(n_hidden, vocab_size)
    
  def forward(self, c1, c2, c3):
    in1 = F.relu(self.l_in(self.e(c1)))
    in2 = F.relu(self.l_in(self.e(c2)))
    in3 = F.relu(self.l_in(self.e(c3)))
    
    h = V(torch.zeros(in1.size()).cuda())
    h = F.tanh(self.l_hidden(h+in1))
    h = F.tanh(self.l_hidden(h+in2))
    h = F.tanh(self.l_hidden(h+in3))
    
    return F.log_softmax(self.l_out(h))

- It is important that this l_hidden uses a square weight matrix whose size matches the output of l_in. Then h and in2 will be the same shape allowing us to sum them together as you see in self.l_hidden(h+in2)
- V(torch.zeros(in1.size()).cuda()) is only there to make the three lines identical to make it easier to put in a for loop later.

In [0]:
md = ColumnarModelData.from_arrays('.', [-1], np.stack([x1,x2,x3], axis=1), y, bs=512)

We will reuse ColumnarModelData. If we stack x1 , x2, and x3, we will get c1, c2, c3 in the forward method. ColumnarModelData.from_arrays will come in handy when you want to train a model in raw-er approach, what you put in [x1, x2, x3] , you will get back in def forward(self, c1, c2, c3)

In [0]:
m = Char3Model(vocab_size, n_hidden, n_fac).cuda()

We create a standard PyTorch model with cuda

In [0]:
it = iter(md.trn_dl)
*xs, yt = next(it)
t = m(*V(xs))

- `iter` to grab an iterator
- `next` returns a mini-batch
- `“Variabize”` the xs tensor, and put it through the model — which will give us 512x85 tensor containing prediction (batch size * unique character)

In [0]:
opt = optim.Adam(m.parameters(), 1e-2)

Create a standard PyTorch optimizer — for which you need to pass in a list of things to optimize, which is returned by m.parameters()

In [31]:
fit(m, md, 1, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      2.109444   0.514621  



[array([0.51462])]

In [0]:
set_lrs(opt, 0.001)

In [33]:
fit(m, md, 1, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.851351   0.538242  



[array([0.53824])]

We do not find a learning rate finder and SGDR because we are not using Learner, so we would need to manually do learning rate annealing (set LR a little bit lower)

# Test Model

In [0]:
def get_next(inp):
  idxs = T(np.array([char_indices[c] for c in inp]))
  p = m(*VV(idxs))
  i = np.argmax(to_np(p))
  return chars[i]

This function takes three characters and return what the model predict as the fourth. Note: np.argmax returns index of the maximum values.

In [35]:
get_next('y. ')

'T'

In [36]:
get_next('ppl')

'e'

In [37]:
get_next(' th')

'e'

In [38]:
get_next('and')

' '

## Our First RNN

### Create inputs

This is the size of our unrolled RNN.

In [0]:
cs = 8

For each of 0 through 7, create a list of every 8th character with that starting point. These will be the 8 inputs to our model.

In [0]:
c_in_dat = [[idx[i+j] for i in range(cs)] for j in range(len(idx)-cs)]

Then create a list of the next character in each of these series. This will be the labels for our model.

In [0]:
c_out_dat = [idx[j+cs] for j in range(len(idx) - cs)]

In [0]:
xs = np.stack(c_in_dat, axis=0)

In [43]:
xs.shape

(600885, 8)

In [0]:
y = np.stack(c_out_dat)

In [45]:
y.shape

(600885,)

So each column below is one series of 8 characters from the text.

In [46]:
xs[:cs, :cs]

array([[40, 42, 29, 30, 25, 27, 29,  1],
       [42, 29, 30, 25, 27, 29,  1,  1],
       [29, 30, 25, 27, 29,  1,  1,  1],
       [30, 25, 27, 29,  1,  1,  1, 43],
       [25, 27, 29,  1,  1,  1, 43, 45],
       [27, 29,  1,  1,  1, 43, 45, 40],
       [29,  1,  1,  1, 43, 45, 40, 40],
       [ 1,  1,  1, 43, 45, 40, 40, 39]])

...and this is the next character after each sequence.

In [47]:
y[:cs]

array([ 1,  1, 43, 45, 40, 40, 39, 43])

### Create and train model

In [0]:
val_idx = get_cv_idxs(len(idx)-cs-1)

In [0]:
md = ColumnarModelData.from_arrays('.', val_idx, xs, y, bs=512)

In [0]:
class CharLoopModel(nn.Module):
  def __init__(self, vocab_size, n_hidden, n_fac):
    super().__init__()
    self.e = nn.Embedding(vocab_size, n_fac)
    self.l_in = nn.Linear(n_fac, n_hidden)
    self.l_hidden = nn.Linear(n_hidden, n_hidden)
    self.l_out = nn.Linear(n_hidden, vocab_size)
    
  def forward(self, *cs):
    bs = cs[0].size(0)
    h = V(torch.zeros(bs, n_hidden).cuda())
    for c in cs:
      inp = F.relu(self.l_in(self.e(c)))
      h = F.tanh(self.l_hidden(h+inp))
      
    return F.log_softmax(self.l_out(h), dim=-1)

In [0]:
m = CharLoopModel(vocab_size, n_hidden, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-2)

In [52]:
fit(m, md, 1, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      2.016595   2.031998  



[array([2.032])]

In [0]:
set_lrs(opt, 0.001)

In [54]:
fit(m, md, 1, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.725105   1.719362  



[array([1.71936])]

**Adding vs. Contatenating**
We now will try something else for self.l_hidden(h+inp). The reason is that the input state and the hidden state are qualitatively different. Input is the encoding of a character, and h is an encoding of series of characters. So adding them together, we might lose information. Let’s concatenate them instead. Don’t forget to change the input to match the shape (n_fac+n_hidden instead of n_fac).

## Concat RNN Model

In [0]:
class CharLoopConcatModel(nn.Module):
  def __init__(self, vocab_size, n_hidden, n_fac):
    super().__init__()
    self.e = nn.Embedding(vocab_size, n_fac)
    self.l_in = nn.Linear(n_fac+n_hidden, n_hidden)
    self.l_hidden = nn.Linear(n_hidden, n_hidden)
    self.l_out = nn.Linear(n_hidden, vocab_size)
    
  def forward(self, *cs):
    bs = cs[0].size(0)
    h = V(torch.zeros(bs, n_hidden).cuda())
    for c in cs:
      inp = torch.cat((h, self.e(c)), 1)
      inp = F.relu(self.l_in(inp))
      h = F.tanh(self.l_hidden(h+inp))
      
    return F.log_softmax(self.l_out(h), dim=-1)

In [0]:
m = CharLoopConcatModel(vocab_size, n_hidden, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-2)

In [0]:
it = iter(md.trn_dl)
*xs,yt = next(it)
t = m(*V(xs))

In [58]:
fit(m, md, 1, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      3.233407   3.213059  



[array([3.21306])]

In [0]:
set_lrs(opt, 1e-4)

In [60]:
fit(m, md, 1, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      3.120584   3.12364   



[array([3.12364])]

### Test model

In [0]:
def get_next(inp):
    idxs = T(np.array([char_indices[c] for c in inp]))
    p = m(*VV(idxs))
    i = np.argmax(to_np(p))
    return chars[i]

In [75]:
get_next('for thos')

'e'

In [63]:
get_next('part of ')

' '

In [64]:
get_next('queens a')

' '

## RNN with Pytorch

In [0]:
class CharRnn(nn.Module):
    def __init__(self, vocab_size, n_hidden, n_fac):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        
    def forward(self, *cs):
        bs = cs[0].size(0)
        h = V(torch.zeros(1, bs, n_hidden))
        inp = self.e(torch.stack(cs))
        outp, h = self.rnn(inp, h)
        
        return F.log_softmax(self.l_out(outp[-1]), dim=-1)

In [0]:
m = CharRnn(vocab_size, n_hidden, vocab_size).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [0]:
it = iter(md.trn_dl)
*xs, yt = next(it)

In [68]:
t = m.e(V(torch.stack(xs)))
t.size()

torch.Size([8, 512, 85])

In [69]:
ht = V(torch.zeros(1, 512, n_hidden))
outp, hn = m.rnn(t, ht)
outp.size(), hn.size()

(torch.Size([8, 512, 256]), torch.Size([1, 512, 256]))

In [70]:
t = m(*V(xs)); t.size()

torch.Size([512, 85])

In [71]:
fit(m, md, 4, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.803296   1.784751  
    1      1.638045   1.636439  
    2      1.556172   1.570006  
    3      1.514498   1.532784  



[array([1.53278])]

In [0]:
set_lrs(opt, 1e-4)

In [73]:
fit(m, md, 4, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.447283   1.488441  
    1      1.44738    1.48315   
    2      1.426072   1.479919  
    3      1.427735   1.476746  



[array([1.47675])]

### Test Model

In [0]:
def get_next(inp):
    idxs = T(np.array([char_indices[c] for c in inp]))
    p = m(*VV(idxs))
    i = np.argmax(to_np(p))
    return chars[i]

In [77]:
get_next('for thos')

'e'

In [0]:
def get_next_n(inp, n):
    res = inp
    for i in range(n):
        c = get_next(inp)
        res += c
        inp = inp[1:] + c
    return res

In [79]:
get_next_n('for thos', 40)

'for those of the stronger the stronger the stron'

## Multi-output model

Let's take a non-overlapping sets of characters this time

In [0]:
c_in_dat = [[idx[i+j] for i in range(cs)] for j in range(0, len(idx)-cs-1, cs)]

In [0]:
c_out_dat = [[idx[i+j] for i in range(cs)] for j in range(1, len(idx)-cs, cs)]

In [82]:
xs = np.stack(c_in_dat)
xs.shape

(75111, 8)

In [83]:
ys = np.stack(c_out_dat)
ys.shape

(75111, 8)

In [84]:
xs[:cs, :cs]

array([[40, 42, 29, 30, 25, 27, 29,  1],
       [ 1,  1, 43, 45, 40, 40, 39, 43],
       [33, 38, 31,  2, 73, 61, 54, 73],
       [ 2, 44, 71, 74, 73, 61,  2, 62],
       [72,  2, 54,  2, 76, 68, 66, 54],
       [67,  9,  9, 76, 61, 54, 73,  2],
       [73, 61, 58, 67, 24,  2, 33, 72],
       [ 2, 73, 61, 58, 71, 58,  2, 67]])

In [85]:
ys[:cs, :cs]

array([[42, 29, 30, 25, 27, 29,  1,  1],
       [ 1, 43, 45, 40, 40, 39, 43, 33],
       [38, 31,  2, 73, 61, 54, 73,  2],
       [44, 71, 74, 73, 61,  2, 62, 72],
       [ 2, 54,  2, 76, 68, 66, 54, 67],
       [ 9,  9, 76, 61, 54, 73,  2, 73],
       [61, 58, 67, 24,  2, 33, 72,  2],
       [73, 61, 58, 71, 58,  2, 67, 68]])

### Create and train model

In [0]:
val_idx = get_cv_idxs(len(xs)-cs-1)

In [0]:
md = ColumnarModelData.from_arrays('.', val_idx, xs, ys, bs=512)

In [0]:
class CharSeqRnn(nn.Module):
    def __init__(self, vocab_size, n_hidden, n_fac):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
    
    def forward(self, *cs):
        bs = cs[0].size(0)
        h = V(torch.zeros(1, bs, n_hidden))
        inp = self.e(torch.stack(cs))
        outp, h = self.rnn(inp, h)
        return F.log_softmax(self.l_out(outp), dim=-1)

Notice that we are no longer doing outp[-1] since we want to keep all of them. But everything else is identical. One complexity is that we want to use the negative log-likelihood loss function as before, but it expects two rank 2 tensors (two mini-batches of vectors). But here, we have rank 3 tensor:

- 8 characters (time steps)
- 84 probabilities
- for 512 minibatch

In [0]:
m = CharSeqRnn(vocab_size, n_hidden, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [0]:
it = iter(md.trn_dl)
*xst,yt = next(it)

In [0]:
def nll_loss_seq(inp, targ):
    sl,bs,nh = inp.size()
    targ = targ.transpose(0,1).contiguous().view(-1)
    return F.nll_loss(inp.view(-1,nh), targ)

- F.nll_loss is the PyTorch loss function.
- Flatten our inputs and targets.
- Transpose the first two axes because PyTorch expects 1. sequence length (how many time steps), 2. batch size, 3. hidden state itself. yt.size() is 512 by 8, whereas sl, bs is 8 by 512.
- PyTorch does not generally actually shuffle the memory order when you do things like ‘transpose’, but instead it keeps some internal metadata to treat it as if it is transposed. When you transpose a matrix, PyTorch just updates the metadata . If you ever see an error that says “this tensor is not continuous” , add .contiguous() after it and error goes away.
- .view is same as np.reshape. -1 indicates as long as it needs to be.

In [92]:
fit(m, md, 4, opt, nll_loss_seq)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      2.581464   2.400267  
    1      2.288379   2.198611  
    2      2.137271   2.087877  
    3      2.040412   2.012075  



[array([2.01208])]

In [0]:
set_lrs(opt, 1e-4)

In [94]:
fit(m, md, 1, opt, nll_loss_seq)

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.996641   1.995987  



[array([1.99599])]

### Identity Init

In [0]:
m = CharSeqRnn(vocab_size, n_hidden, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-2)

In [96]:
m.rnn.weight_hh_l0.data.copy_(torch.eye(n_hidden))


    1     0     0  ...      0     0     0
    0     1     0  ...      0     0     0
    0     0     1  ...      0     0     0
       ...          ⋱          ...       
    0     0     0  ...      1     0     0
    0     0     0  ...      0     1     0
    0     0     0  ...      0     0     1
[torch.cuda.FloatTensor of size 256x256 (GPU 0)]

In [97]:
fit(m, md, 4, opt, nll_loss_seq)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      2.393394   2.212105  
    1      2.116801   2.071176  
    2      2.006494   1.978114  
    3      1.95075    1.945249  



[array([1.94525])]

In [0]:
set_lrs(opt, 1e-3)

In [99]:
fit(m, md, 4, opt, nll_loss_seq)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.853032   1.868892  
    1      1.841726   1.861415  
    2      1.834748   1.854703  
    3      1.82582    1.849395  



[array([1.8494])]

## Stateful Model

### Setup

In [0]:
from torchtext import vocab, data
from fastai.nlp import *
from fastai.lm_rnn import *

In [112]:
PATH = 'nietzsche/'
TRN_PATH = 'trn/'
VAL_PATH = 'val/'

TRN = f'{PATH}{TRN_PATH}'
VAL = f'{PATH}{VAL_PATH}'

%ls {PATH}

nietzsche.txt  [0m[01;34mtrn[0m/  [01;34mval[0m/


In [102]:
text = open(f'{PATH}nietzsche.txt').read()
print('corpus length', len(text))

corpus length 600893


In [0]:
n = len(text)
n_trn = int(n * 0.8)

os.makedirs(TRN, exist_ok=True)
os.makedirs(VAL, exist_ok=True)

f = open(f"{TRN}trn.txt", 'w')
f.write(text[:n_trn])
f.close()

f = open(f"{VAL}val.txt", 'w')
f.write(text[n_trn:])
f.close()

In [128]:
%ls {PATH}trn

trn.txt


In [129]:
trn_text = open(f'{TRN}trn.txt').read()
print('corpus length', len(trn_text))

corpus length 480714


In [130]:
val_text = open(f'{VAL}val.txt').read()
print('corpus length', len(val_text))

corpus length 120179


In [131]:
TEXT = data.Field(lower=True, tokenize=list)
bs=64
bptt=8
n_fac=42
n_hidden=256

FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=3)

len(md.trn_dl), md.nt, len(md.trn_ds), len(md.trn_ds[0].text)

(922, 55, 1, 472943)

## RNN

In [0]:
class CharSeqStatefulRnn(nn.Module):
  def __init__(self, vocab_size, n_fac, bs):
    super().__init__()
    self.vocab_size = vocab_size
    self.e = nn.Embedding(vocab_size, n_fac)
    self.rnn = nn.RNN(n_fac, n_hidden)
    self.l_out = nn.Linear(n_hidden, vocab_size)
    self.init_hidden(bs)
    
  def forward(self, cs):
    bs = cs[0].size(0)
    if self.h.size(1) != bs:
      self.init_hidden(bs)
    outp, h = self.rnn(self.e(cs), self.h)
    self.h = repackage_var(h)
    return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
  
  def init_hidden(self, bs):
    self.h = V(torch.zeros(1, bs, n_hidden))

In [0]:
m = CharSeqStatefulRnn(md.nt, n_fac, 512).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [142]:
fit(m, md, 4, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.880302   1.863951  
    1      1.695875   1.719525  
    2      1.616021   1.64921   
    3      1.566189   1.611505  



[array([1.6115])]

In [143]:
set_lrs(opt, 1e-4)

fit(m, md, 4, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.485798   1.564128  
    1      1.47818    1.558715  
    2      1.478557   1.55525   
    3      1.473487   1.551316  



[array([1.55132])]

## RNN Loop

In [0]:
def RNNCell(input, hidden, w_ih, w_hh, b_ih, b_hh):
  return F.tanh(F.linear(input, w_ih, b_ih) + F.linear(hidden, w_hh, b_hh))

In [0]:
class CharSeqStatefulRnn2(nn.Module):
  def __init__(self, vocab_size, n_fac, bs):
    super().__init__()
    self.vocab_size = vocab_size
    self.e = nn.Embedding(vocab_size, n_fac)
    self.rnn = nn.RNNCell(n_fac, n_hidden)
    self.l_out = nn.Linear(n_hidden, vocab_size)
    self.init_hidden(bs)
   
  def forward(self, cs):
    bs = cs[0].size(0)
    if self.h.size(1) != bs:
      self.init_hidden(bs)
    outp = []
    o = self.h
    for c in cs:
      o = self.rnn(self.e(c), o)
      outp.append(o)
    outp = self.l_out(torch.stack(outp))
    self.h = repackage_var(o)
    return F.log_softmax(outp, dim=-1).view(-1, self.vocab_size)
  
  def init_hidden(self, bs):
    self.h = V(torch.zeros(1, bs, n_hidden))

In [0]:
m = CharSeqStatefulRnn2(md.nt, n_fac, 512).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [150]:
fit(m, md, 4, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.883648   1.876018  
    1      1.699205   1.726509  
    2      1.611374   1.650179  
    3      1.555448   1.611744  



[array([1.61174])]

## GRU

In [0]:
def GRUCell(input, hidden, w_ih, w_hh, b_ih, b_hh):
  gi = F.linear(input, w_ih, b_ih)
  gh = F.linear(hidden, w_hh, b_hh)
  i_r, i_i, i_n = gi.chunk(3, 1)
  h_r, h_i, h_n = gh.chunk(3, 1)
  
  resetgate = F.sigmoid(i_r + h_r)
  inputgate = F.sigmoid(i_i + h_i)
  newgate = F.tanh(i_n + resetgate * h_n)
  return newgate + inputgate * (hidden - newgate)

In [0]:
class CharSeqStatefulGRU(nn.Module):
  def __init__(self, vocab_size, n_fac, bs):
      super().__init__()
      self.vocab_size = vocab_size
      self.e = nn.Embedding(vocab_size, n_fac)
      self.rnn = nn.GRU(n_fac, n_hidden)
      self.l_out = nn.Linear(n_hidden, vocab_size)
      self.init_hidden(bs)
        
  def forward(self, cs):
      bs = cs[0].size(0)
      if self.h.size(1) != bs: self.init_hidden(bs)
      outp,h = self.rnn(self.e(cs), self.h)
      self.h = repackage_var(h)
      return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
  def init_hidden(self, bs): self.h = V(torch.zeros(1, bs, n_hidden))

In [0]:
m = CharSeqStatefulGRU(md.nt, n_fac, 512).cuda()

opt = optim.Adam(m.parameters(), 1e-3)

In [158]:
fit(m, md, 6, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=6, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.77719    1.767721  
    1      1.576549   1.605618  
    2      1.489494   1.536542  
    3      1.434435   1.515936  
    4      1.392823   1.490049  
    5      1.367838   1.476448  



[array([1.47645])]

In [159]:
set_lrs(opt, 1e-4)
fit(m, md, 3, opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=3, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.277034   1.438563  
    1      1.280386   1.435211  
    2      1.278807   1.433204  



[array([1.4332])]

### Putting it all together: LSTM

In [0]:
from fastai import sgdr

n_hidden=512

In [0]:
class CharSeqStatefulLSTM(nn.Module):
    def __init__(self, vocab_size, n_fac, bs, nl):
        super().__init__()
        self.vocab_size,self.nl = vocab_size,nl
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.LSTM(n_fac, n_hidden, nl, dropout=0.5)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        if self.h[0].size(1) != bs: self.init_hidden(bs)
        outp,h = self.rnn(self.e(cs), self.h)
        self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(self.nl, bs, n_hidden)),
                  V(torch.zeros(self.nl, bs, n_hidden)))

In [0]:
m = CharSeqStatefulLSTM(md.nt, n_fac, 512, 2).cuda()
lo = LayerOptimizer(optim.Adam, m, 1e-2, 1e-5)

In [0]:
os.makedirs(f'{PATH}models', exist_ok=True)

In [175]:
fit(m, md, 2, lo.opt, F.nll_loss)

HBox(children=(IntProgress(value=0, description='Epoch', max=2, style=ProgressStyle(description_width='initial…

epoch      trn_loss   val_loss   
    0      1.763661   1.696266  
    1      1.668685   1.619517  



[array([1.61952])]

In [176]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**4-1, lo.opt, F.nll_loss, callbacks=cb)

HBox(children=(IntProgress(value=0, description='Epoch', max=15, style=ProgressStyle(description_width='initia…

epoch      trn_loss   val_loss   
    0      1.529979   1.486906  
    1      1.562865   1.507932  
    2      1.448414   1.436844  
    3      1.589658   1.535136  
    4      1.509773   1.475784  
    5      1.433641   1.416734  
    6      1.369753   1.392788  
    7      1.56645    1.516121  
    8      1.536335   1.499027  
    9      1.485714   1.472932  
    10     1.458203   1.443127  
    11     1.412446   1.408095  
    12     1.366569   1.378757  
    13     1.321687   1.358049  
    14     1.291299   1.351896  



[array([1.3519])]

In [177]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**6-1, lo.opt, F.nll_loss, callbacks=cb)

HBox(children=(IntProgress(value=0, description='Epoch', max=63, style=ProgressStyle(description_width='initia…

epoch      trn_loss   val_loss   
    0      1.287838   1.350554  
    1      1.29167    1.349028  
    2      1.28663    1.348659  
    3      1.287032   1.348009  
    4      1.289938   1.347187  
    5      1.281283   1.346411  
    6      1.276328   1.346247  
    7      1.279426   1.345653  
    8      1.28285    1.344392  
    9      1.277433   1.343854  
    10     1.280573   1.343602  
    11     1.273803   1.342896  
    12     1.277723   1.342669  
    13     1.273317   1.342537  
    14     1.26977    1.342495  
    15     1.272144   1.342238  
    16     1.272301   1.341745  
    17     1.269518   1.341448  
    18     1.270307   1.340427  
    19     1.268745   1.340067  
    20     1.26456    1.339679  
    21     1.265118   1.339124  
    22     1.263928   1.338394  
    23     1.265168   1.338415  
    24     1.259128   1.338296  
    25     1.255036   1.33806   
    26     1.260908   1.337901  
    27     1.256735   1.337867  
    28     1.259983   1.337836  
    29   

[array([1.33222])]

In [0]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**8-1, lo.opt, F.nll_loss, callbacks=cb)

## TEST

In [0]:
def get_next(inp):
  idxs = TEXT.numericalize(inp)
  p = m(VV(idxs.transpose(0, 1)))
  r = torch.multinomial(p[-1].exp(), 1)
  return TEXT.vocab.itos[to_np(r)[0]]

In [179]:
get_next('for thos')

'e'

In [0]:
def get_next_n(inp, n):
  res = inp
  for i in range(n):
    c = get_next(inp)
    res += c
    inp = inp[1:]+c
  return res

In [181]:
print(get_next_n('for thos', 400))

for those of a lofty stones prehension, plepetay, side understandhaips him andtranslave in a really means3 charms us_ is aloher.these morals a fimality and fundamentay! metaphysic; that the rearing showly into catched as its amit a new-nature andmisposition. but why confinces thatexterpationi(7-bodythem? the liaken and an'quiscetions: is a best, but i father, certain. their certain moralsares, and even to
