# Recurrent Neural Networks (RNN) 
edited by GSG based on [Lesson6-rnn](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson6-rnn.ipynb).
See [Lecture Notes](http://forums.fast.ai/t/deeplearning-lec7notes/8939)


In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# Data Setup for works of Nietzsche by char

Download the collected works of Nietzsche to use as our data.

In [2]:
PATH='data/nietzsche/'
TRN_PATH = 'trn/'
VAL_PATH = 'val/'
TRN = f'{PATH}{TRN_PATH}'
VAL = f'{PATH}{VAL_PATH}'
%ls {PATH}

[0m[01;34mmodels[0m/  nietzsche.txt  [01;34mtrn[0m/  [01;34mval[0m/


In [3]:
from fastai.io import get_data # *

In [4]:
get_data("https://s3.amazonaws.com/text-datasets/nietzsche.txt", f'{PATH}nietzsche.txt')
text = open(f'{PATH}nietzsche.txt').read()
print('corpus length:', len(text))  # number of characters in the whole text

corpus length: 600893


In [5]:
text[:310]

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not ground\nfor suspecting that all philosophers, in so far as they have been\ndogmatists, have failed to understand women--that the terrible\nseriousness and clumsy importunity with which they have usually paid\ntheir addresses to Truth, have been un'

### Validation set
Use the first 80% of the text as the training set and the last 20% as the validation set.
It does not need to be a random sample, since a test set is more likely to come from a separate corpus.
Holding out a separate part of Nietzsche's corpus is therefore a more realistic validation test for the model.


#### Scripts for validation sets.

JH is more "lazy than the students" and fits the data to the existing API, in this case `torchtext`, which expects a training part, a validation part, etc. So he copied the text into 2 paths using the `sed` commands below.

sed -n '1,7947p' nietzsche.txt > trn/trn.txt

sed -n '7950,9935p' nietzsche.txt > val/val.txt
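
For readers without `sed`, the same kind of split can be sketched in Python. This is a minimal sketch, not the script used above; `slice_lines` is a hypothetical helper that mimics `sed -n 'first,lastp'` with 1-based, inclusive line numbers, and the ten toy lines stand in for the real file.

```python
def slice_lines(lines, first, last):
    """Return lines first..last (1-based, inclusive), like `sed -n 'first,lastp'`."""
    return lines[first - 1:last]


# Toy stand-in for nietzsche.txt: ten short lines.
toy_lines = [f"line {i}\n" for i in range(1, 11)]

trn_lines = slice_lines(toy_lines, 1, 8)    # first 80% of the lines
val_lines = slice_lines(toy_lines, 9, 10)   # remaining 20%

print(len(trn_lines), len(val_lines))
```

On the real file, the lists would come from `readlines()` and be written back with `writelines()`.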

In [6]:
%ls -l {TRN} {VAL}    # the training and validation sets

data/nietzsche/trn/:
total 480
-rw-rw-r-- 1 german german 490861 Jan 21 18:32 trn.txt

data/nietzsche/val/:
total 108
-rw-rw-r-- 1 german german 109974 Jan 21 18:34 val.txt


#### Using torchtext data.Field
A field is a description of how to pre-process the text, eg lowercase, and how to tokenize.
Below we use Python's `list` as a tokenizer so we simply get the characters.
So each minibatch gets a list of characters.

`data.Field(sequential=True, use_vocab=True, init_token=None, eos_token=None, fix_length=None, tensor_type=<class 'torch.LongTensor'>, preprocessing=None, postprocessing=None, lower=False, tokenize=<function Field.<lambda> at 0x7f5cc269fea0>, include_lengths=False, batch_first=False, pad_token='<pad>', unk_token='<unk>', pad_first=False)`
    
Defines a datatype together with instructions for converting to Tensor.
Field class models common text processing datatypes that can be represented
by tensors.  It holds a Vocab object that defines the set of possible values
for elements of the field and their corresponding numerical representations.
The Field object also holds other parameters relating to how a datatype
should be numericalized, such as a tokenization method and the kind of
Tensor that should be produced.

If a Field is shared between two columns in a dataset (e.g., question and
answer in a QA dataset), then they will have a shared vocabulary.

In [7]:
from torchtext import vocab, data

In [8]:
TEXT = data.Field(lower=True, tokenize=list)
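
To see what `lower=True` and `tokenize=list` amount to, here is a plain-Python sketch that does not use torchtext. `char_tokenize` is a hypothetical helper for illustration, not part of the torchtext API.

```python
def char_tokenize(s, lower=True):
    """Split a string into single-character tokens, optionally lowercasing first."""
    if lower:
        s = s.lower()
    return list(s)   # list() on a str yields one element per character


tokens = char_tokenize("Truth is")
print(tokens)
```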

Now create a small dictionary for the FILES. Since we don't have a separate test set we use the validation set again.

In [9]:
FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)

`set()` collects the unique characters. Then we turn them into a list and sort them.

In [10]:
chars = sorted(list(set(text)))  # chars is our vocabulary
vocab_size = len(chars)+1
print('total chars:', vocab_size)

total chars: 85


Sometimes it's useful to have a zero value in the dataset, e.g. for padding.

In [11]:
chars.insert(0, "\0")
''.join(chars[1:-6])  # show the characters, skipping the padding char and the last 6

'\n !"\'(),-.0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxy'

Map from chars to indices and back again, creating 2 dictionaries.

In [12]:
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

*idx* will be the data we use from now on - it has all the characters in the text by their index (based on the mapping above)

In [13]:
idx = [char_indices[c] for c in text]

print(idx[:10]); text[:10]   # these are the indices for the first 10 characters

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]


'PREFACE\n\n\n'

Confirm that we got the mapping correct.

In [14]:
''.join(indices_char[i] for i in idx[:70]) 

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not gro'
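
The whole mapping can be followed end to end on a tiny string. This is a self-contained sketch; the `toy_` names are made up for illustration so they do not clash with the notebook's variables.

```python
toy_text = "hello world"

toy_chars = sorted(set(toy_text))      # unique characters, sorted
toy_chars.insert(0, "\0")              # reserve index 0, e.g. for padding

toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

toy_idx = [toy_char_indices[c] for c in toy_text]                 # text -> indices
decoded = "".join(toy_indices_char[i] for i in toy_idx)           # indices -> text

print(toy_chars)
print(toy_idx)
print(decoded)
```

Because index 0 is reserved for the padding character, no real character in the text maps to 0.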

# Three char model

## Create inputs

Create four lists, each taking every 3rd character (step `cs=3`), starting at the 0th, 1st, 2nd and 3rd characters respectively. The first three lists are the inputs; the fourth holds the character to predict.

In [15]:
cs=3
c1_dat = [idx[i]   for i in range(0, len(idx)-1-cs, cs)]  # 0th character
c2_dat = [idx[i+1] for i in range(0, len(idx)-1-cs, cs)]
c3_dat = [idx[i+2] for i in range(0, len(idx)-1-cs, cs)]
c4_dat = [idx[i+3] for i in range(0, len(idx)-1-cs, cs)]  # 3rd character

In [16]:
len(c1_dat), c1_dat[:6], c2_dat[:6], c3_dat[:6], c4_dat[:6]

(200297,
 [40, 30, 29, 1, 40, 43],
 [42, 25, 1, 43, 40, 33],
 [29, 27, 1, 45, 39, 38],
 [30, 29, 1, 40, 43, 31])
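
The same slicing is easier to follow on a toy sequence where each "character index" equals its position. This is only an illustrative sketch; `toy_idx` and `starts` are made-up names.

```python
toy_idx = list(range(11))   # pretend character indices 0..10
cs = 3

starts = range(0, len(toy_idx) - 1 - cs, cs)   # 0, 3, 6
c1 = [toy_idx[i]     for i in starts]
c2 = [toy_idx[i + 1] for i in starts]
c3 = [toy_idx[i + 2] for i in starts]
c4 = [toy_idx[i + 3] for i in starts]   # the character to predict

for row in zip(c1, c2, c3, c4):
    print(row)
```

Each printed row is one training example: three inputs followed by the label. Note that the label of one example is the first input of the next one.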

In [17]:
import numpy as np

Our inputs are converted to NumPy arrays using `np.asarray`. JH was using `np.stack`, noting: "np.stack is going to be different for when axis!=0. I used ‘stack’ here because I think this is a better semantic match for what we’re doing."

In [18]:
x1 = np.asarray(c1_dat[:-2])  # 0 character
x2 = np.asarray(c2_dat[:-2])  # 1 character
x3 = np.asarray(c3_dat[:-2])  # 2 character

In [19]:
type(x1), x1

(numpy.ndarray, array([40, 30, 29, ..., 67, 68, 72]))
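
A small sketch of the difference mentioned in the quote: with the default `axis=0`, `np.stack` on equal-length lists gives the same array as `np.asarray`, while `axis=1` turns each input into a column. The toy lists `a` and `b` are made up for illustration.

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

as_rows = np.stack([a, b])           # axis=0: same result as np.asarray([a, b])
as_cols = np.stack([a, b], axis=1)   # axis=1: each input list becomes a column

print(as_rows.shape, as_cols.shape)
```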

Our output

In [20]:
y = np.asarray(c4_dat[:-2])

The first 4 inputs, and the corresponding outputs in the following cell.

In [21]:
x1[:4], x2[:4], x3[:4]

(array([40, 30, 29,  1]), array([42, 25,  1, 43]), array([29, 27,  1, 45]))

In [22]:
y[:4]

array([30, 29,  1, 40])

In [23]:
x1.shape, y.shape

((200295,), (200295,))

## Create and train model
Now using pytorch nn

In [24]:
import torch 
import torch.nn as nn

Pick a size for our hidden state

In [25]:
n_hidden = 256   # Activations

The number of latent factors to create (i.e. the size of the embedding matrix)

In [26]:
n_fac = 42   # about half the number of characters we have (experimental)

Below, `Char3Model` is a standard fully connected model.
Each character goes through an embedding, a linear layer and a ReLU.

`nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)`    
A simple lookup table that stores embeddings of a fixed dictionary and size.

`nn.Linear(in_features, out_features, bias=True)`   
Applies a linear transformation to the incoming data: `y = Ax + b`

`F.relu(input, inplace=False)`
relu(input, threshold, value, inplace=False) -> Variable
Applies the rectified linear unit function element-wise.

`F.log_softmax(input, dim=None, _stacklevel=3)`
Applies a softmax followed by a logarithm.
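
As a rough illustration of what "a softmax followed by a logarithm" computes, here is a plain-Python sketch for a single list of scores. It is not PyTorch's implementation; the `log_softmax` function below is written from the definition, with the usual max-subtraction for numerical stability.

```python
import math


def log_softmax(scores):
    """Log of the softmax of a list of scores, computed stably."""
    m = max(scores)                                    # subtract the max to avoid overflow
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


log_probs = log_softmax([2.0, 1.0, 0.1])
probs = [math.exp(lp) for lp in log_probs]

print(log_probs)
print(sum(probs))
```

Exponentiating the outputs gives probabilities that sum to 1, and the largest score keeps the largest (least negative) log probability.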

In [27]:
class Char3Model(nn.Module):
    def __init__(self, vocab_size, n_fac):
        super().__init__()    # call the __init__ method of nn.Module 
        self.e = nn.Embedding(vocab_size, n_fac)  #one embedding

        # The 'green arrow' from our diagram - the layer operation from input to hidden
        self.l_in = nn.Linear(n_fac, n_hidden)

        # The 'orange arrow' from our diagram - the layer operation from hidden to hidden
        self.l_hidden = nn.Linear(n_hidden, n_hidden)   # this is a square matrix: "the trick"
        
        # The 'blue arrow' from our diagram - the layer operation from hidden to output
        self.l_out = nn.Linear(n_hidden, vocab_size)
        
    def forward(self, c1, c2, c3):
        in1 = F.relu(self.l_in(self.e(c1)))
        in2 = F.relu(self.l_in(self.e(c2)))
        in3 = F.relu(self.l_in(self.e(c3)))
        
        # now the activations
        h = V(torch.zeros(in1.size()).cuda())  # start from zeros so the next 3 lines are identical and can later become a loop
        h = F.tanh(self.l_hidden(h+in1))
        h = F.tanh(self.l_hidden(h+in2))
        h = F.tanh(self.l_hidden(h+in3))
        
        return F.log_softmax(self.l_out(h), dim=-1)

In [28]:
from fastai.column_data import ColumnarModelData # *

`ColumnarModelData.from_arrays(path, val_idxs, xs, y, is_reg=True, is_multi=False, bs=64, test_xs=None, shuffle=True)`

`ColumnarModelData.from_arrays` is the fastai helper used to create the data object for training.

In [29]:
md = ColumnarModelData.from_arrays('.', [-1], np.stack([x1,x2,x3], axis=1), y, bs=512)

In [30]:
m = Char3Model(vocab_size, n_fac).cuda()  #this is a standard pytorch model (not fastai) so we need to add cuda()

In [31]:
from fastai.conv_learner import V, F   # *

In [32]:
it = iter(md.trn_dl) #grab the iterator to iterate thru the training set
*xs,yt = next(it)   #grab a minibatch of size bs=512, returns all the xs and ys tensors
t = m(*V(xs))  #invoke the model as a function passing the tensor as Variable

In [33]:
type(xs), type(xs[0]), len(xs), len(xs[0]) # xs is a list of 3 tensors of size bs

(list, torch.cuda.LongTensor, 3, 512)

In [34]:
xs[0].size(), xs[0]

(torch.Size([512]), 
  61
  72
  73
  65
  56
  72
  67
  69
  73
  72
   2
  68
  65
  73
  67
   2
   2
  29
  62
  74
  58
  67
  38
  10
  62
  73
  67
  73
  61
  58
   2
  55
  62
   2
   2
   2
  58
   2
  72
   8
  58
  55
  58
   2
  22
  57
  72
   4
  73
   2
  72
  74
  73
  57
  58
  67
  74
  73
  58
  78
  56
  68
  73
  77
  61
  67
   2
  65
   2
  74
  24
  72
  10
  10
  76
  67
  73
  73
  20
  56
  73
  59
  10
   2
  61
  67
   2
  57
  69
  60
  68
  33
   2
  73
  73
  75
  39
  58
   2
  54
  58
   2
   2
  74
  56
  58
  55
  56
  62
  77
  73
  69
  69
   2
   2
  67
  73
  71
  67
   2
  56
   1
  73
  58
   8
  54
  68
  62
  58
  62
   8
  58
  45
  73
  32
  61
  73
  61
  68
   2
  62
  71
  72
  55
   2
  58
   2
  61
   2
  71
  55
  73
  72
  71
  73
  62
  60
  74
  67
  73
  74
  72
  60
  62
  65
  68
  67
  57
  62
  66
   2
  73
   2
  67
  72
  72
  58
  72
  65
  74
  72
   2
  73
  73
   2
  59
   1
  62
  56
  67
  71
  58
  67
  62
   2
  72

Below are the (log) probabilities of the characters: one row for each of the 512 items in the minibatch, and one column for each of the 85 characters in the vocabulary.

In [35]:
t  

Variable containing:
-4.6422 -4.5425 -4.4591  ...  -4.6062 -4.2883 -4.3331
-4.3910 -4.5908 -4.4742  ...  -4.3545 -4.2008 -4.5668
-4.4441 -4.5255 -4.3257  ...  -4.5036 -4.3068 -4.5682
          ...             ⋱             ...          
-4.4045 -4.7033 -4.5676  ...  -4.3706 -4.1562 -4.4312
-4.1991 -4.8139 -4.5314  ...  -4.4237 -4.4263 -4.6011
-4.5181 -4.4486 -4.5114  ...  -4.5055 -4.4911 -4.3749
[torch.cuda.FloatTensor of size 512x85 (GPU 0)]
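
Values near -4.4 are roughly what one would expect before any training: if all 85 characters were equally likely, each log probability would be ln(1/85). A quick check of that arithmetic (the uniform-prediction assumption is ours, for illustration):

```python
import math

vocab_size = 85
uniform_log_prob = math.log(1 / vocab_size)   # log probability under a uniform guess

print(round(uniform_log_prob, 4))
```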

`optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)`  
See [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)

In [36]:
import torch.optim as optim

For the pytorch optimizer, pass a list of things to optimize, which are the parameters of the model (m)

In [37]:
opt = optim.Adam(m.parameters(), lr=1e-2)    #pytorch optimizer

In [38]:
from fastai.learner import fit

`fit(model, data, epochs, opt, crit, metrics=None, callbacks=None, stepper=<class 'fastai.model.Stepper'>, **kwargs)`
`fit` now returns the validation loss (`val_loss`).

In [39]:
%time fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      2.061463   4.895084  

CPU times: user 8.72 s, sys: 397 ms, total: 9.11 s
Wall time: 4.6 s


[array([ 4.89508])]

In [40]:
from fastai.layer_optimizer import set_lrs
set_lrs(opt, 0.001)   # fastai set the lr for the optimizer

In [41]:
%time fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.820131   5.26033   

CPU times: user 8.86 s, sys: 450 ms, total: 9.31 s
Wall time: 4.71 s


[array([ 5.26033])]

In [42]:
%time vl = fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.778682   5.130069  

CPU times: user 8.94 s, sys: 396 ms, total: 9.33 s
Wall time: 4.68 s


In [43]:
vl

[array([ 5.13007])]

### Test model

`get_next(inp)` will return the next (predicted) character

In [44]:
from fastai.core import T, VV, to_np

In [45]:
def get_next(inp):
    idxs = T(np.array([char_indices[c] for c in inp]))  # T converts to Tensor
    p = m(*VV(idxs))  # convert to variables and pass to model
    i = np.argmax(to_np(p)) # grab the character number, converting it first to np
    return chars[i]

In [46]:
get_next('y. ')  #pass it 3 characters

'T'

In [47]:
get_next('ppl'), get_next(' th'), get_next('and')

('i', 'e', ' ')

# Our first RNN!

## Create inputs

This is the size of our unrolled RNN.

In [48]:
cs=8    # 8 characters

For each starting position `j`, create a list of the `cs=8` consecutive characters starting at `j`. Each list holds the 8 inputs to the model for one example, and consecutive lists overlap. `j` ranges over `range(len(idx)-cs-1)`.

In [49]:
c_in_dat = [[idx[i+j] for i in range(cs)] for j in range(len(idx)-cs-1)]

In [50]:
c_in_dat[:10]

[[40, 42, 29, 30, 25, 27, 29, 1],
 [42, 29, 30, 25, 27, 29, 1, 1],
 [29, 30, 25, 27, 29, 1, 1, 1],
 [30, 25, 27, 29, 1, 1, 1, 43],
 [25, 27, 29, 1, 1, 1, 43, 45],
 [27, 29, 1, 1, 1, 43, 45, 40],
 [29, 1, 1, 1, 43, 45, 40, 40],
 [1, 1, 1, 43, 45, 40, 40, 39],
 [1, 1, 43, 45, 40, 40, 39, 43],
 [1, 43, 45, 40, 40, 39, 43, 33]]

Then create a list of the next (+cs) character in each of these series. 
This list will have the labels for the model.

In [51]:
c_out_dat = [idx[j+cs] for j in range(len(idx)-cs-1)]

In [52]:
c_out_dat[:10]

[1, 1, 43, 45, 40, 40, 39, 43, 33, 38]
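
The overlapping windows and their labels are easier to see on a toy sequence. This is an illustrative sketch with made-up names (`toy_idx`, `c_in`, `c_out`) and a shorter window than the 8 used above.

```python
toy_idx = list(range(12))   # pretend character indices 0..11
cs = 4                      # shorter window than the 8 used above, for readability

n_windows = len(toy_idx) - cs - 1
c_in  = [[toy_idx[i + j] for i in range(cs)] for j in range(n_windows)]
c_out = [toy_idx[j + cs] for j in range(n_windows)]

for window, label in zip(c_in, c_out):
    print(window, '->', label)
```

Each window starts one position after the previous one, and its label is the character immediately following the window.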

JH used the slower `xs = np.stack(c_in_dat, axis=0)`; here we use `xs = np.asarray(c_in_dat)` instead.

In [53]:
xs = np.asarray(c_in_dat)

In [54]:
xs.shape

(600884, 8)

In [55]:
xs[:10]

array([[40, 42, 29, 30, 25, 27, 29,  1],
       [42, 29, 30, 25, 27, 29,  1,  1],
       [29, 30, 25, 27, 29,  1,  1,  1],
       [30, 25, 27, 29,  1,  1,  1, 43],
       [25, 27, 29,  1,  1,  1, 43, 45],
       [27, 29,  1,  1,  1, 43, 45, 40],
       [29,  1,  1,  1, 43, 45, 40, 40],
       [ 1,  1,  1, 43, 45, 40, 40, 39],
       [ 1,  1, 43, 45, 40, 40, 39, 43],
       [ 1, 43, 45, 40, 40, 39, 43, 33]])

In [56]:
y = np.asarray(c_out_dat)

So each row below is one series of 8 characters from the text.

In [57]:
xs[:cs,:cs]

array([[40, 42, 29, 30, 25, 27, 29,  1],
       [42, 29, 30, 25, 27, 29,  1,  1],
       [29, 30, 25, 27, 29,  1,  1,  1],
       [30, 25, 27, 29,  1,  1,  1, 43],
       [25, 27, 29,  1,  1,  1, 43, 45],
       [27, 29,  1,  1,  1, 43, 45, 40],
       [29,  1,  1,  1, 43, 45, 40, 40],
       [ 1,  1,  1, 43, 45, 40, 40, 39]])

...and this is the next character after each sequence.

In [58]:
y[:cs]

array([ 1,  1, 43, 45, 40, 40, 39, 43])

## Create and train model

`get_cv_idxs(n, cv_idx=0, val_pct=0.2, seed=42)` from fastai

Get a list of index values for Validation set from a dataset

Arguments:
    n : int, Total number of elements in the data set.
    cv_idx : int, starting index [idx_start = cv_idx*int(val_pct*n)] 
    val_pct : (int, float), validation set percentage 
    seed : seed value for RandomState
   
Returns:
    list of indexes 

In [59]:
from fastai.dataset import get_cv_idxs

In [60]:
val_idx = get_cv_idxs(len(idx)-cs-1)

In [61]:
len(val_idx), val_idx

(120176, array([480310, 419017, 232803, ..., 134355, 389158, 330599]))
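
A rough idea of what such a helper does can be sketched with NumPy. This is not the fastai source; `random_val_idxs` is a hypothetical stand-in that shuffles the row indices with a fixed seed and keeps the first `val_pct` of them.

```python
import numpy as np


def random_val_idxs(n, val_pct=0.2, seed=42):
    """Pick int(val_pct * n) distinct row indices at random, reproducibly."""
    rng = np.random.RandomState(seed)
    return rng.permutation(n)[:int(val_pct * n)]


toy_val_idx = random_val_idxs(100)
print(len(toy_val_idx), toy_val_idx[:5])
```

The count is consistent with the output above: 20% of the 600884 rows, rounded down, is 120176.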

In [62]:
md = ColumnarModelData.from_arrays('.', val_idx, xs, y, bs=512)

`CharLoopModel(nn.Module)` is an RNN!

In [63]:
class CharLoopModel(nn.Module):
    def __init__(self, vocab_size, n_fac):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)    #Embedding Layer
        self.l_in = nn.Linear(n_fac, n_hidden)      #in linear
        self.l_hidden = nn.Linear(n_hidden, n_hidden) #hidden
        self.l_out = nn.Linear(n_hidden, vocab_size) # out
        
    def forward(self, *cs):
        bs = cs[0].size(0)
        h = V(torch.zeros(bs, n_hidden).cuda())
        for c in cs:   #here is the loop
            inp = F.relu(self.l_in(self.e(c)))
            h = F.tanh(self.l_hidden(h+inp))
        
        return F.log_softmax(self.l_out(h), dim=-1)

In [64]:
m = CharLoopModel(vocab_size, n_fac).cuda() 
opt = optim.Adam(m.parameters(), 1e-2)

In [65]:
%time fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      2.015361   1.984964  

CPU times: user 33.6 s, sys: 1.61 s, total: 35.2 s
Wall time: 22.3 s


[array([ 1.98496])]

In [66]:
set_lrs(opt, 0.001)

In [67]:
%time fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.72084    1.719567  

CPU times: user 33.1 s, sys: 1.27 s, total: 34.4 s
Wall time: 21.6 s


[array([ 1.71957])]

Adding things together may lose information, so let's use concatenation instead.
Even when two pieces of information have the same dimension, concatenating them preserves more information than adding them.
Notice that we pass `n_fac+n_hidden` to the first Linear layer to get the right dimensions in the input layer,
and then use `torch.cat((h, self.e(c)), 1)` in the forward loop.

In [68]:
class CharLoopConcatModel(nn.Module):
    def __init__(self, vocab_size, n_fac):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.l_in = nn.Linear(n_fac+n_hidden, n_hidden) # concatenation
        self.l_hidden = nn.Linear(n_hidden, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        
    def forward(self, *cs):
        bs = cs[0].size(0)
        h = V(torch.zeros(bs, n_hidden).cuda())
        for c in cs:
            inp = torch.cat((h, self.e(c)), 1)   #now concatenate instead of adding
            inp = F.relu(self.l_in(inp))
            h = F.tanh(self.l_hidden(inp))
        
        return F.log_softmax(self.l_out(h), dim=-1)
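
The shape bookkeeping behind `n_fac+n_hidden` can be checked with a NumPy analogue of `torch.cat`. The sizes below are toy values chosen for illustration, much smaller than the 256 and 42 used in the notebook.

```python
import numpy as np

bs, n_hidden, n_fac = 2, 4, 3          # toy sizes

h   = np.zeros((bs, n_hidden))         # hidden state
emb = np.ones((bs, n_fac))             # embedded character

joined = np.concatenate((h, emb), axis=1)   # NumPy analogue of torch.cat((h, emb), 1)
print(joined.shape)
```

After concatenation both parts are still recoverable by slicing, which is not the case once two vectors have been added.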

In [69]:
m = CharLoopConcatModel(vocab_size, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [70]:
it = iter(md.trn_dl)
*xs,yt = next(it)
t = m(*V(xs))

In [71]:
%time vl1 = fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.814632   1.793261  

CPU times: user 34.1 s, sys: 1.35 s, total: 35.4 s
Wall time: 22.4 s


In [72]:
set_lrs(opt, 1e-4)

In [73]:
%time vls2 = fit(m, md, 1, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.699837   1.707809  

CPU times: user 34.2 s, sys: 1.43 s, total: 35.7 s
Wall time: 22.7 s


The validation loss improved from 1.793261 to 1.707809.

### Test model

In [74]:
get_next('ppl'), get_next(' th'), get_next('and')

('e', 'e', ' ')

# RNN with pytorch
Now let's have PyTorch write the loop in `forward` for us (given a starting hidden state) and create the input linear layers.
For this we use `nn.RNN`.
PyTorch keeps the hidden state from every time step and returns all of them in `outp`.
Since we only care about the last one, we use `outp[-1]`.

The problem is that in the `forward` below we keep throwing away the hidden state $h$.
We will fix this in the next part.

In [75]:
class CharRnn(nn.Module):
    def __init__(self, vocab_size, n_fac):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        
    def forward(self, *cs):
        bs = cs[0].size(0)
        h = V(torch.zeros(1, bs, n_hidden))
        inp = self.e(torch.stack(cs))
        outp,h = self.rnn(inp, h)    # outp holds the hidden state at every time step; h is the last one
        
        return F.log_softmax(self.l_out(outp[-1]), dim=-1)   #outp[-1] because we only care for the last one

In [76]:
m = CharRnn(vocab_size, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [77]:
it = iter(md.trn_dl)
*xs,yt = next(it)

In [78]:
t = m.e(V(torch.stack(xs)))
t.size()

torch.Size([8, 512, 42])

In [79]:
ht = V(torch.zeros(1, 512,n_hidden))
outp, hn = m.rnn(t, ht)
outp.size(), hn.size()

(torch.Size([8, 512, 256]), torch.Size([1, 512, 256]))
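
The two shapes above can be reproduced with a small NumPy sketch of the recurrence a single-layer tanh RNN computes. This is an illustration, not PyTorch's implementation: `simple_rnn` is a hypothetical function, it uses one bias vector where PyTorch keeps two, and the sizes are toy values.

```python
import numpy as np


def simple_rnn(inp, h0, w_ih, w_hh, b):
    """Run a tanh RNN over inp of shape (seq_len, batch, n_in).

    Returns (outp, h): every hidden state, and the last one with a leading axis of 1.
    """
    h = h0[0]
    states = []
    for x_t in inp:                                  # loop over time steps
        h = np.tanh(x_t @ w_ih.T + h @ w_hh.T + b)
        states.append(h)
    return np.stack(states), h[None]


rng = np.random.RandomState(0)
sl, bs, n_in, n_hid = 8, 5, 3, 4                     # toy sizes
inp = rng.randn(sl, bs, n_in)
h0 = np.zeros((1, bs, n_hid))
outp, hn = simple_rnn(inp, h0, rng.randn(n_hid, n_in), rng.randn(n_hid, n_hid), np.zeros(n_hid))

print(outp.shape, hn.shape)
```

The last entry of `outp` is the same as the returned final state, which is why the model can use `outp[-1]`.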

In [80]:
t = m(*V(xs)); t.size()

torch.Size([512, 85])

In [81]:
%time fit(m, md, 4, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.881316   1.85602   
    1      1.677138   1.67785                               
    2      1.590103   1.598457                              
    3      1.539169   1.551392                              

CPU times: user 2min 3s, sys: 6.66 s, total: 2min 9s
Wall time: 1min 19s


[array([ 1.55139])]

In [82]:
set_lrs(opt, 1e-4)

In [83]:
%time fit(m, md, 2, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.465953   1.512221  
    1      1.456044   1.506954                              

CPU times: user 1min 1s, sys: 3.27 s, total: 1min 5s
Wall time: 39.6 s


[array([ 1.50695])]

## Test model

In [84]:
get_next('for thos'), get_next('part of '), get_next('queens a')

('e', 't', 'n')

`get_next_n()` is used to predict multiple characters ahead.

In [85]:
def get_next_n(inp, n):
    res = inp
    for i in range(n):
        c = get_next(inp)
        res += c
        inp = inp[1:]+c
    return res

In [86]:
get_next_n('for thos', 40), get_next_n('part of ', 40), get_next_n('queens a', 40)

('for those stand the same the same the same the s',
 'part of the same the same the same the same the ',
 'queens and the same the same the same the same t')
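
The sliding-window logic can be tried without a trained model by plugging in a stand-in predictor. In this sketch `generate` mirrors the loop above, and `echo_first` is a made-up predictor (an assumption for illustration only) that simply returns the first character of the window.

```python
def generate(seed, n, predict_next):
    """Append n predicted characters to seed, sliding a fixed-size window forward."""
    window, result = seed, seed
    for _ in range(n):
        c = predict_next(window)
        result += c
        window = window[1:] + c     # drop the oldest character, append the newest
    return result


# Stand-in predictor: always echo the window's first character.
echo_first = lambda window: window[0]

print(generate('abc', 5, echo_first))
```

Because the prediction depends only on the current window and is deterministic, the output starts cycling as soon as a window repeats, which is consistent with the "the same the same" loop seen above.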

## Multi-output model

### Setup

Let's take non-overlapping sets of characters this time. Recall the overlapping was done by
`c_in_dat = [[idx[i+j] for i in range(cs)] for j in range(len(idx)-cs-1)]`

using `range(start, stop[, step])`

In [87]:
c_in_dat = [[idx[i+j] for i in range(cs)] for j in range(0, len(idx)-cs-1, cs)]

In [88]:
c_in_dat[:10]

[[40, 42, 29, 30, 25, 27, 29, 1],
 [1, 1, 43, 45, 40, 40, 39, 43],
 [33, 38, 31, 2, 73, 61, 54, 73],
 [2, 44, 71, 74, 73, 61, 2, 62],
 [72, 2, 54, 2, 76, 68, 66, 54],
 [67, 9, 9, 76, 61, 54, 73, 2],
 [73, 61, 58, 67, 24, 2, 33, 72],
 [2, 73, 61, 58, 71, 58, 2, 67],
 [68, 73, 2, 60, 71, 68, 74, 67],
 [57, 1, 59, 68, 71, 2, 72, 74]]

Then create the exact same thing, offset by 1, as our labels

In [89]:
c_out_dat = [[idx[i+j] for i in range(cs)] for j in range(1, len(idx)-cs, cs)]

In [90]:
xs = np.stack(c_in_dat)
xs.shape  # shape is now 600884/8, as we are looking at 8 non-overlapping characters

(75111, 8)

In [91]:
ys = np.stack(c_out_dat)
ys.shape

(75111, 8)

In [92]:
xs[:cs,:cs]   # using non-overlapping characters

array([[40, 42, 29, 30, 25, 27, 29,  1],
       [ 1,  1, 43, 45, 40, 40, 39, 43],
       [33, 38, 31,  2, 73, 61, 54, 73],
       [ 2, 44, 71, 74, 73, 61,  2, 62],
       [72,  2, 54,  2, 76, 68, 66, 54],
       [67,  9,  9, 76, 61, 54, 73,  2],
       [73, 61, 58, 67, 24,  2, 33, 72],
       [ 2, 73, 61, 58, 71, 58,  2, 67]])

In [93]:
ys[:cs,:cs]

array([[42, 29, 30, 25, 27, 29,  1,  1],
       [ 1, 43, 45, 40, 40, 39, 43, 33],
       [38, 31,  2, 73, 61, 54, 73,  2],
       [44, 71, 74, 73, 61,  2, 62, 72],
       [ 2, 54,  2, 76, 68, 66, 54, 67],
       [ 9,  9, 76, 61, 54, 73,  2, 73],
       [61, 58, 67, 24,  2, 33, 72,  2],
       [73, 61, 58, 71, 58,  2, 67, 68]])
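
On a toy sequence the relationship between inputs and labels is easy to verify: every label row is the input row shifted by one position. The names `toy_idx`, `xs_toy` and `ys_toy` are made up for this sketch.

```python
toy_idx = list(range(20))   # pretend character indices 0..19
cs = 4

xs_toy = [[toy_idx[i + j] for i in range(cs)] for j in range(0, len(toy_idx) - cs - 1, cs)]
ys_toy = [[toy_idx[i + j] for i in range(cs)] for j in range(1, len(toy_idx) - cs, cs)]

for x_row, y_row in zip(xs_toy, ys_toy):
    print(x_row, '->', y_row)
```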

### Create and train model

In [94]:
val_idx = get_cv_idxs(len(xs)-cs-1)

In [95]:
md = ColumnarModelData.from_arrays('.', val_idx, xs, ys, bs=512)

In the earlier `CharRnn` the softmax used `outp[-1]`, because we only cared about the last time step.
This time we use the full `outp`, so we get a prediction for every time step.

In [96]:
class CharSeqRnn(nn.Module):
    def __init__(self, vocab_size, n_fac):
        super().__init__()
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        
    def forward(self, *cs):
        bs = cs[0].size(0)
        h = V(torch.zeros(1, bs, n_hidden))
        inp = self.e(torch.stack(cs))
        outp,h = self.rnn(inp, h)
        return F.log_softmax(self.l_out(outp), dim=-1)

In [97]:
m = CharSeqRnn(vocab_size, n_fac).cuda()   # cuda(1) to use the 2nd GPU 
opt = optim.Adam(m.parameters(), 1e-3)

In [98]:
it = iter(md.trn_dl)
*xst,yt = next(it)

The model output is a rank 3 tensor (8 x 512 x 85): for each of the 8 characters (time steps), and for each of the 512 items in the minibatch, we have 85 (log) probabilities, one per character in the vocabulary.
The pytorch loss function expects rank 2 tensors (bad design).
So we now write our own custom loss function for sequences, `nll_loss_seq(inp, targ)`.
We need to flatten our input (using `.size()` to pull out the dimensions) and flatten our targets.
We also need to transpose the first two axes of the targets.
In pytorch the axes of an RNN output are:
- sl is the sequence length (eg 8)
- bs is batch size (eg 512)
- nh is hidden state (eg 256, n_hidden)

In `nll_loss_seq` the input is the model's final output, so its last axis has the vocabulary size (85) rather than 256, even though the code names it `nh`.
Because of an issue with Pytorch we need to invoke `.contiguous()` after the transpose, and then `.view`, which is like reshape, to flatten the tensor.

In [99]:
def nll_loss_seq(inp, targ):
    sl,bs,nh = inp.size() # inp is (sl, bs, nh): here 8, 512, 85, since inp is the model output
    targ = targ.transpose(0,1).contiguous().view(-1)  
    # transpose the first 2 axes of targ to (sl, bs); contiguous() avoids a pytorch error; -1 == as long as it needs to be
    return F.nll_loss(inp.view(-1,nh), targ)  # invoke the pytorch nll loss function
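
Why the targets must be transposed before flattening can be seen with a NumPy analogue of `.view`. This is an illustrative sketch with toy sizes and made-up label values, not the PyTorch code itself.

```python
import numpy as np

sl, bs, nv = 3, 2, 4                         # toy sequence length, batch size, vocab size

# Model output: one row of scores per (time step, batch item).
inp = np.arange(sl * bs * nv).reshape(sl, bs, nv)
# Targets arrive batch-first: targ[b, t] is the label for item b at step t.
targ = np.array([[10, 11, 12],
                 [20, 21, 22]])

flat_inp = inp.reshape(-1, nv)               # rows ordered (t=0,b=0), (t=0,b=1), (t=1,b=0), ...
flat_targ = targ.T.reshape(-1)               # transpose first so the order matches flat_inp

print(flat_inp.shape, flat_targ)
```

Flattening `targ` without the transpose would give the labels in batch-major order, so each row of scores would be compared against the wrong label.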

So now we can pass the custom loss function `nll_loss_seq` to `fit`.
Recall that in `fit()` the parameters are all standard pytorch, except for the first 2 parameters, `m`, and `md`. 

In [100]:
%time fit(m, md, 4, opt, nll_loss_seq)

epoch      trn_loss   val_loss                              
    0      2.602346   2.415453  
    1      2.294791   2.203934                              
    2      2.144559   2.090983                              
    3      2.050911   2.014842                              

CPU times: user 15.2 s, sys: 853 ms, total: 16 s
Wall time: 9.72 s


[array([ 2.01484])]

In [101]:
set_lrs(opt, 1e-4)

In [102]:
%time fit(m, md, 1, opt, nll_loss_seq)

epoch      trn_loss   val_loss                             
    0      1.999156   1.999102  

CPU times: user 3.85 s, sys: 244 ms, total: 4.09 s
Wall time: 2.45 s


[array([ 1.9991])]

### Identity init!
One problem is that we could get exploding gradients. To avoid them, let's use an identity initialization (instead of random), based on the paper [A Simple Way to Initialize Recurrent Networks of Rectified Linear Units](https://arxiv.org/abs/1504.00941), co-authored by G. Hinton.

In [103]:
m = CharSeqRnn(vocab_size, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-2)

Following G. Hinton's paper, let's use the identity matrix to initialize the hidden-to-hidden weights.
See `m.rnn` for the attributes that are learnable.
Specifically, `weight_hh_l[k] : the learnable hidden-hidden weights of the k-th layer`.
This initialization improves our results significantly.            

In [104]:
#??m.rnn

In [105]:
m.rnn.weight_hh_l0.data.copy_(torch.eye(n_hidden))   #eye is the identity matrix


    1     0     0  ...      0     0     0
    0     1     0  ...      0     0     0
    0     0     1  ...      0     0     0
       ...          ⋱          ...       
    0     0     0  ...      1     0     0
    0     0     0  ...      0     1     0
    0     0     0  ...      0     0     1
[torch.cuda.FloatTensor of size 256x256 (GPU 0)]
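
A simplified way to see why the identity is a neutral starting point: multiply a hidden state repeatedly by the hidden-to-hidden matrix and watch its norm. This toy sketch deliberately ignores the inputs and the nonlinearity, so it only illustrates the intuition; `norm_after` is a made-up helper.

```python
import numpy as np

h = np.array([1.0, -2.0, 0.5, 3.0])    # toy hidden state
steps = 8


def norm_after(w, h, steps):
    """Norm of h after multiplying by w `steps` times (no input, no nonlinearity)."""
    for _ in range(steps):
        h = w @ h
    return np.linalg.norm(h)


n = len(h)
print(norm_after(np.eye(n), h, steps))          # identity: norm unchanged
print(norm_after(1.5 * np.eye(n), h, steps))    # larger weights: norm grows by 1.5**8
print(norm_after(0.5 * np.eye(n), h, steps))    # smaller weights: norm shrinks by 0.5**8
```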

In [106]:
fit(m, md, 4, opt, nll_loss_seq)

epoch      trn_loss   val_loss                              
    0      2.49338    2.309519  
    1      2.193046   2.125766                              
    2      2.06573    2.029975                              
    3      1.994678   1.980016                              



[array([ 1.98002])]

In [107]:
set_lrs(opt, 1e-3)

In [108]:
%time fit(m, md, 4, opt, nll_loss_seq)

epoch      trn_loss   val_loss                              
    0      1.90354    1.912661  
    1      1.892017   1.904126                              
    2      1.883681   1.898293                              
    3      1.877339   1.894527                              

CPU times: user 15.3 s, sys: 893 ms, total: 16.2 s
Wall time: 9.86 s


[array([ 1.89453])]

## Stateful RNN model

### Setup

In [109]:
from fastai.nlp import LanguageModelData #*
from fastai.lm_rnn import repackage_var  #*

Recall that at the beginning we did:

TEXT = data.Field(lower=True, tokenize=list)

FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)

Parameters for the model:
- bs: batch size
- bptt: backprop thru time
- n_fac: size of embedding
- n_hidden: size of the hidden "circles" 

In [110]:
bs=64; bptt=8; n_fac=42; n_hidden=256

md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=3)
print(" minibatchlength=", len(md.trn_dl), "number of tokens=", md.nt)
len(md.trn_ds), len(md.trn_ds[0].text)   # definitions?

 minibatchlength= 942 number of tokens= 55


(1, 482908)

In [111]:
md.trn_ds

<fastai.nlp.ConcatTextDataset at 0x7f84b0935780>

Recall that after the model is created, the TEXT object also contains many additional fields, eg, vocab.

In [112]:
#TEXT.vocab

Pytorch "randomizes" the size of bptt for each minibatch. Sometimes a little smaller, etc.

## RNN

### BPTT
Wrinkle #1: we could end up back-propagating through too many layers.
The difference below is that we invoke `init_hidden()` as part of the constructor, so the hidden state is kept between minibatches.
If we did not use `repackage_var`, the history of operations would keep growing from one minibatch to the next, which is very memory intensive.
To avoid that, from time to time we want to forget the history.

**Solution: Forget some of your history**
We want to remember the current state, but not all the history of how we got there.

`self.h = repackage_var(h)` takes the tensor (activations) out of the Variable (`.data`) and makes a new Variable out of it, with no history of operations, so we start afresh.
The model then back-propagates through 8 layers and throws away the history before that. This approach is called `Back-Prop Thru Time (BPTT)`.
Another reason not to backprop all the way back is exploding gradients: the more layers, the more chances the gradients will go through the roof, and the harder the model is to train.

The BPTT parameter sets how many layers to back-prop through;
a larger value requires more memory.

`repackage_var(h):` Wraps h in new Variables, to detach them from their history
    return Variable(h.data) if type(h) == Variable else tuple(repackage_var(v) for v in h)

File:      ~/fastai/courses/dl1/fastai/lm_rnn.py
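
The effect of throwing away history can be mimicked in plain Python. This is a toy model, not autograd: `Node` is a made-up stand-in for a Variable that remembers which node it was computed from, and `repackage` plays the role of `repackage_var`.

```python
class Node:
    """Toy stand-in for a Variable: a value plus a link to the node it was computed from."""
    def __init__(self, value, parent=None):
        self.value, self.parent = value, parent

    def history_length(self):
        n, node = 0, self
        while node.parent is not None:
            n, node = n + 1, node.parent
        return n


def step(h, x):
    return Node(h.value + x, parent=h)      # each step remembers where it came from


def repackage(h):
    return Node(h.value)                    # same value, no history


def run(n_batches, bptt, detach):
    h = Node(0)
    longest = 0
    for _ in range(n_batches):
        for _ in range(bptt):
            h = step(h, 1)
        longest = max(longest, h.history_length())
        if detach:
            h = repackage(h)
    return h.value, longest


print(run(3, 8, detach=False))   # history keeps growing across minibatches
print(run(3, 8, detach=True))    # history is capped at bptt steps
```

The final value is the same in both runs; only the length of the remembered history differs. In later PyTorch versions, where Variable and Tensor are merged, `h.detach()` plays the same role.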

### How to split the data (chunks) into batches
Wrinkle #2: we need to split the text into chunks properly. First we split the (large) text into $n$ equal-size chunks, e.g., $n=64$.
Then we look at subsets of size `bptt` from all the chunks in parallel.
See [Lesson 6](https://www.youtube.com/watch?v=H3g26EVADgY&t=1195s):
00:17:50 Creating mini-batches, “split in 64 equal size chunks” not “split in chunks of size 64”, questions on data augmentation and choosing a BPTT size, PyTorch QRNN.
Each minibatch is of size `BPTT * BS`, and this should fit into the memory of the GPU.
If training is too slow, lowering the BPTT may speed it up.
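
The "split in 64 equal size chunks" idea can be sketched in plain Python with small numbers. This is an illustration under our own assumptions, not the fastai or torchtext code; `batchify` is a hypothetical helper and the token list is a toy stand-in for the text.

```python
def batchify(tokens, bs):
    """Split tokens into bs equal-length chunks and lay them side by side as columns."""
    chunk_len = len(tokens) // bs                 # leftover tokens at the end are dropped
    chunks = [tokens[k * chunk_len:(k + 1) * chunk_len] for k in range(bs)]
    return [list(row) for row in zip(*chunks)]    # chunk_len rows x bs columns


tokens = list(range(12))
rows = batchify(tokens, bs=3)                     # 3 chunks of 4 tokens each
bptt = 2
minibatches = [rows[i:i + bptt] for i in range(0, len(rows), bptt)]

for mb in minibatches:
    print(mb)
```

Each column of a minibatch continues exactly where the same column stopped in the previous minibatch, which is what makes it meaningful to carry the hidden state over.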

In this section we will use torchtext again....  

The class is similar to before, but now we invoke `init_hidden()` in __init__, so **h** is now an attribute,
which starts as a Variable set as zeros.  

00:35:43 Dealing with PyTorch not accepting a “Rank 3 Tensor”, only Rank 2 or 4, ‘F.log_softmax()’
Also, pytorch loss functions are not "happy" receiving tensors of rank 3 (JH: there is no good reason for this);
they expect a rank 2 (or rank 4) tensor.
So we need to use `.view()` to flatten the model output before it goes into the loss. The number of columns will be the `vocab_size`, while the number of rows is "-1", i.e., whatever is needed (here bs * bptt).
For the target, `torchtext` automatically flattens it.
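
A small sketch of the shapes involved, using nested Python lists instead of tensors so it runs without PyTorch. The helpers `log_softmax`, `flatten_rows` and `nll_loss` are hand-written stand-ins that mimic the roles of `F.log_softmax`, `.view(-1, vocab_size)` and `F.nll_loss`; they are not the library functions.

```python
import math


def log_softmax(scores):
    """Log-probabilities over the last axis (one row of vocabulary scores)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def flatten_rows(out):
    """(bptt, bs, vocab) nested lists -> (bptt * bs, vocab), like .view(-1, vocab_size)."""
    return [row for step in out for row in step]


def nll_loss(log_probs, targets):
    """Mean negative log-likelihood; expects rank 2 log_probs and rank 1 targets."""
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)


bptt, bs, vocab = 2, 3, 4
scores = [[[float(t + b + v) for v in range(vocab)] for b in range(bs)] for t in range(bptt)]
targets = [[3, 0, 1], [2, 2, 0]]                            # shape (bptt, bs)

flat_lp = [log_softmax(r) for r in flatten_rows(scores)]    # shape (bptt * bs, vocab)
flat_t = [t for step in targets for t in step]              # shape (bptt * bs,)
loss = nll_loss(flat_lp, flat_t)
print(len(flat_lp), len(flat_lp[0]), len(flat_t), loss)
```

Output rows and targets are flattened in the same order (time step first, then position in the batch), so row `i` still lines up with target `i`.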

In [113]:
class CharSeqStatefulRnn(nn.Module):
    def __init__(self, vocab_size, n_fac, bs):
        super().__init__()
        self.vocab_size = vocab_size
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        # The last mini-batch of an epoch may be smaller, and the first one of the next epoch is full size again,
        # so re-initialize the hidden state whenever the batch size changes.
        if self.h.size(1) != bs: self.init_hidden(bs)
        outp,h = self.rnn(self.e(cs), self.h)   # now it takes self.h as input
        self.h = repackage_var(h)    # now store it throwing away history of operations
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs): self.h = V(torch.zeros(1, bs, n_hidden))

In [114]:
m = CharSeqStatefulRnn(md.nt, n_fac, 512).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

Careful: pytorch 0.3 requires that we tell `F.log_softmax` which axis to sum over. Here we pass the last axis (`dim=-1`), which holds the probabilities for each letter of the vocabulary.

In [115]:
%time fit(m, md, 4, opt, F.nll_loss)

epoch      trn_loss   val_loss                               
    0      1.885003   1.855492  
    1      1.71443    1.711497                               
    2      1.639589   1.646649                               
    3      1.577061   1.602231                               

CPU times: user 25.2 s, sys: 3.6 s, total: 28.8 s
Wall time: 24.4 s


[array([ 1.60223])]

In [116]:
set_lrs(opt, 1e-4)

%time fit(m, md, 4, opt, F.nll_loss)

epoch      trn_loss   val_loss                               
    0      1.498694   1.559651  
    1      1.503777   1.55453                                
    2      1.500938   1.548958                               
    3      1.499807   1.545453                               

CPU times: user 25.2 s, sys: 3.45 s, total: 28.7 s
Wall time: 24.3 s


[array([ 1.54545])]

### RNN loop

From the pytorch source, just for reference; no need to execute it.
Notice that they do not concatenate the input and the hidden state; instead they add the outputs of two linear layers.
In practice, nobody uses this plain RNN cell because of gradient explosions, which force us to use very small values of lr and bptt.
Instead we use a GRU cell (see later).

def RNNCell(input, hidden, w_ih, w_hh, b_ih, b_hh):
    return F.tanh(F.linear(input, w_ih, b_ih) + F.linear(hidden, w_hh, b_hh))
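
Adding two linear layers gives the same result as concatenating the input with the hidden state and applying one linear layer whose weight matrix is the two matrices side by side. A minimal check in plain Python (the helper names and the small hand-picked weights are made up for this example):

```python
import math


def linear(x, w, b):
    """y = W x + b for a single vector x; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def rnn_cell(x, h, w_ih, w_hh, b_ih, b_hh):
    """tanh(W_ih x + b_ih + W_hh h + b_hh): the two projections are added."""
    return [math.tanh(a + c)
            for a, c in zip(linear(x, w_ih, b_ih), linear(h, w_hh, b_hh))]


def rnn_cell_concat(x, h, w_ih, w_hh, b_ih, b_hh):
    """Same result via concatenation: tanh([W_ih | W_hh] [x; h] + (b_ih + b_hh))."""
    w = [ri + rh for ri, rh in zip(w_ih, w_hh)]        # weights side by side
    b = [bi + bh for bi, bh in zip(b_ih, b_hh)]
    return [math.tanh(v) for v in linear(x + h, w, b)]  # x + h concatenates the lists


x, h = [1.0, -2.0, 0.5], [0.1, 0.3]                    # 3 inputs, 2 hidden units
w_ih = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]]
w_hh = [[0.5, -0.2], [0.1, 0.7]]
b_ih, b_hh = [0.01, -0.02], [0.03, 0.04]

added = rnn_cell(x, h, w_ih, w_hh, b_ih, b_hh)
concatenated = rnn_cell_concat(x, h, w_ih, w_hh, b_ih, b_hh)
print(added)
print(concatenated)
```

The two printed vectors agree up to floating-point rounding.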

In [117]:
class CharSeqStatefulRnn2(nn.Module):
    def __init__(self, vocab_size, n_fac, bs):
        super().__init__()
        self.vocab_size = vocab_size
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNNCell(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        if self.h.size(0) != bs: self.init_hidden(bs)
        outp = []
        o = self.h
        for c in cs: 
            o = self.rnn(self.e(c), o)   # one RNNCell step per character
            outp.append(o)
        outp = self.l_out(torch.stack(outp))
        self.h = repackage_var(o)
        return F.log_softmax(outp, dim=-1).view(-1, self.vocab_size)
    
    # nn.RNNCell expects a rank 2 hidden state (bs, n_hidden), without the layer axis that nn.RNN uses
    def init_hidden(self, bs): self.h = V(torch.zeros(bs, n_hidden))

In [118]:
m = CharSeqStatefulRnn2(md.nt, n_fac, 512).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

In [119]:
%time fit(m, md, 4, opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.891111   1.857432  
    1      1.715802   1.704021                              
    2      1.631887   1.637275                              
    3      1.578639   1.603177                              

CPU times: user 46.4 s, sys: 3.45 s, total: 49.8 s
Wall time: 44.6 s


[array([ 1.60318])]

### GRU (Gated Recurrent Unit) Cell
A gating mechanism in RNNs, introduced in [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](https://arxiv.org/abs/1406.1078). The performance of GRUs on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets.

See the diagram below, from http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-var-GRU.png

A GRU uses a mini neural net (a gate) to decide when to throw away information, for example when it sees a ".".
It also has an update gate, which decides when to update the hidden state and by how much.
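
A scalar sketch of one GRU step, to show what the gates do. It is a simplification written for these notes (one weight per gate, a single bias per part, scalar input and state), not the pytorch implementation; the dict keys `'r'`, `'z'` and `'n'` are naming assumptions for the reset gate, the update gate and the candidate state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_cell_scalar(x, h, w, u, b):
    """One GRU step for a scalar input x and a scalar hidden state h.

    w, u, b hold the input weight, hidden weight and bias of each part.
    """
    r = sigmoid(w['r'] * x + u['r'] * h + b['r'])           # reset gate
    z = sigmoid(w['z'] * x + u['z'] * h + b['z'])           # update gate
    n = math.tanh(w['n'] * x + b['n'] + r * (u['n'] * h))   # candidate state
    return n + z * (h - n)                                  # blend old state and candidate


w = {'r': 1.0, 'z': 1.0, 'n': 1.0}
u = {'r': 1.0, 'z': 1.0, 'n': 1.0}

# A large positive bias pushes the update gate to 1: the old state is kept.
keep = gru_cell_scalar(0.5, 0.9, w, u, {'r': 0.0, 'z': 20.0, 'n': 0.0})
# A large negative bias pushes it to 0: the state is replaced by the candidate.
overwrite = gru_cell_scalar(0.5, 0.9, w, u, {'r': 0.0, 'z': -20.0, 'n': 0.0})
print(keep, overwrite)
```

In a trained GRU the gates are not fixed by hand like this; they are computed from the current input and hidden state, so the network learns when to keep and when to overwrite.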

In [120]:
from IPython.display import Image
Image(url='http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-var-GRU.png')

In [121]:
class CharSeqStatefulGRU(nn.Module):
    def __init__(self, vocab_size, n_fac, bs):
        super().__init__()
        self.vocab_size = vocab_size
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.GRU(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        if self.h.size(1) != bs: self.init_hidden(bs)
        outp,h = self.rnn(self.e(cs), self.h)
        self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs): self.h = V(torch.zeros(1, bs, n_hidden))

In [122]:
# From the pytorch source code - for reference

def GRUCell(input, hidden, w_ih, w_hh, b_ih, b_hh):
    gi = F.linear(input, w_ih, b_ih)
    gh = F.linear(hidden, w_hh, b_hh)
    i_r, i_i, i_n = gi.chunk(3, 1)
    h_r, h_i, h_n = gh.chunk(3, 1)

    resetgate = F.sigmoid(i_r + h_r)
    inputgate = F.sigmoid(i_i + h_i)
    newgate = F.tanh(i_n + resetgate * h_n)
    return newgate + inputgate * (hidden - newgate)

In [123]:
m = CharSeqStatefulGRU(md.nt, n_fac, 512).cuda()

opt = optim.Adam(m.parameters(), 1e-3)

In [124]:
%time fit(m, md, 6, opt, F.nll_loss)

epoch      trn_loss   val_loss                               
    0      1.775305   1.740946  
    1      1.594406   1.591478                               
    2      1.50108    1.529803                               
    3      1.448508   1.49697                                
    4      1.405877   1.478582                               
    5      1.374997   1.463661                               

CPU times: user 39.3 s, sys: 5.16 s, total: 44.4 s
Wall time: 37.9 s


[array([ 1.46366])]

In [125]:
set_lrs(opt, 1e-4)

In [126]:
%time fit(m, md, 3, opt, F.nll_loss)

epoch      trn_loss   val_loss                               
    0      1.291673   1.430188  
    1      1.297437   1.426499                               
    2      1.299452   1.424684                               

CPU times: user 20.3 s, sys: 2.57 s, total: 22.9 s
Wall time: 19.4 s


[array([ 1.42468])]

# Putting it all together: LSTM
See [Understanding LSTM Networks by Colah](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Now for an LSTM cell, instead of a GRU.
An LSTM has a cell state in addition to the hidden state, so the state is a tuple of two tensors.
We also add dropout (inside `nn.LSTM`, on the outputs of every layer except the last) and double the size of the hidden layer.

In [127]:
from fastai import sgdr
n_hidden=512

In [128]:
class CharSeqStatefulLSTM(nn.Module):
    def __init__(self, vocab_size, n_fac, bs, nl):
        super().__init__()
        self.vocab_size,self.nl = vocab_size,nl
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.LSTM(n_fac, n_hidden, nl, dropout=0.5)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        if self.h[0].size(1) != bs: self.init_hidden(bs)
        outp,h = self.rnn(self.e(cs), self.h)
        self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(self.nl, bs, n_hidden)),
                  V(torch.zeros(self.nl, bs, n_hidden)))

In [129]:
m = CharSeqStatefulLSTM(md.nt, n_fac, 512, 2).cuda()

But now, instead of using the pytorch optimizer directly, we use the fastai 
`LayerOptimizer(opt_fn, layer_groups, lrs, wds=None)` from 
File:           ~/fastai/courses/dl1/fastai/layer_optimizer.py
which wraps the pytorch optimizer and adds the learning rates (`lrs`) and weight decays (`wds`).

In [130]:
from fastai.layer_optimizer import LayerOptimizer

In [131]:
lo = LayerOptimizer(optim.Adam, m, 1e-2, 1e-5)

In [132]:
lo.opt

<torch.optim.adam.Adam at 0x7f846bb5f438>

In [133]:
import os

In [134]:
os.makedirs(f'{PATH}models', exist_ok=True)

In [135]:
fit(m, md, 2, lo.opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.816026   1.732448  
    1      1.718293   1.652915                              



[array([ 1.65292])]

Now we can pass a callback to `fit` to change the learning rate dynamically.
Below we pass `CosAnneal(layer_opt, nb, on_cycle_end=None, cycle_mult=1)` from fastai.sgdr, 
which will update the learning rates when we call fit.
`layer_opt` is the layer optimizer object, 
`nb` is the length of an epoch, i.e., how many mini-batches there are in an epoch, e.g. `len(md.trn_dl)` (needed to know how often to reset).
We also save the model automatically at the end of each cycle (passing another callback, `on_end`, as `on_cycle_end`; it uses fastai `save_model`).
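
A plain-Python sketch of the idea behind cosine annealing with restarts. It illustrates the schedule only and is not fastai's `CosAnneal` code (for instance, it has no warm-up); the function name `cos_anneal_lrs` and the value `nb=10` are made up for this example. With `cycle_mult=2` the cycles last 1, 2, 4, 8, ... epochs, which is why running `2**4-1 = 15` epochs ends exactly at the end of the fourth cycle.

```python
import math


def cos_anneal_lrs(init_lr, nb, n_epochs, cycle_mult=1):
    """Learning rate for every mini-batch.

    The rate decays from init_lr towards 0 along a cosine, restarts at the
    end of each cycle, and each cycle is cycle_mult times longer than the last.
    """
    lrs, cycle_len = [], nb
    total = nb * n_epochs
    while len(lrs) < total:
        for i in range(cycle_len):
            lrs.append(init_lr / 2 * (1 + math.cos(math.pi * i / cycle_len)))
        cycle_len *= cycle_mult
    return lrs[:total]


nb = 10                                   # pretend an epoch has 10 mini-batches
lrs = cos_anneal_lrs(1e-2, nb=nb, n_epochs=2**4 - 1, cycle_mult=2)
restarts = [i // nb for i, lr in enumerate(lrs) if lr == 1e-2]
print(restarts)                           # epochs at which the rate jumps back up
```

A restart at the start of an epoch shows up in the training log as a temporary increase in the loss for that epoch.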

In [136]:
from fastai.sgdr import CosAnneal
from fastai.torch_imports import save_model

In [137]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]

In [138]:
%time fit(m, md, 2**4-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                               
    0      1.54353    1.487647  
    1      1.594493   1.526807                              
    2      1.472359   1.437603                              
    3      1.615485   1.543589                               
    4      1.540682   1.487883                               
    5      1.454911   1.428852                              
    6      1.390208   1.39055                               
    7      1.589738   1.532587                              
    8      1.549987   1.509889                              
    9      1.525921   1.485307                              
    10     1.479244   1.455329                               
    11     1.437987   1.428274                              
    12     1.387103   1.398252                              
    13     1.347644   1.370574                              
    14     1.320503   1.35871                               

CPU times: user 2min 35s, sys: 15.4 s, total: 2

[array([ 1.35871])]

Fit again starting with a smaller lr.   Wall time: 10min 22s

In [139]:
%time fit(m, md, 2**6-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                              
    0      1.542346   1.494945  
    1      1.498697   1.468537                               
    2      1.475329   1.461309                              
    3      1.448708   1.431154                              
    4      1.401603   1.405935                              
    5      1.35829    1.37964                               
    6      1.316973   1.358165                              
    7      1.292795   1.349112                              
    8      1.523479   1.492144                               
    9      1.49655    1.486515                               
    10     1.48825    1.47728                               
    11     1.479888   1.475037                              
    12     1.469291   1.4608                                
    13     1.451007   1.450497                              
    14     1.43606    1.429036                              
    15     1.41056    1.418746                   

[array([ 1.45427])]

## Test LSTM

In [140]:
def get_next(inp):
    idxs = TEXT.numericalize(inp)
    p = m(VV(idxs.transpose(0,1)))
    r = torch.multinomial(p[-1].exp(), 1)
    return TEXT.vocab.itos[to_np(r)[0]]

In [141]:
get_next('for thos')

'e'
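
`get_next` samples the next character from the predicted distribution (`torch.multinomial` on `p[-1].exp()`) instead of always taking the most likely one. A stdlib sketch of that sampling step, using a hypothetical 3-character vocabulary and hand-picked probabilities rather than real model output:

```python
import math
import random


def sample_next(log_probs, itos, rng=random):
    """Sample one vocabulary item from log-probabilities (the model outputs log_softmax)."""
    probs = [math.exp(lp) for lp in log_probs]    # same role as p[-1].exp()
    return rng.choices(itos, weights=probs, k=1)[0]


itos = ['a', 'e', 'o']                            # hypothetical vocabulary
log_probs = [math.log(0.1), math.log(0.8), math.log(0.1)]

rng = random.Random(0)                            # seeded so the run is repeatable
counts = {c: 0 for c in itos}
for _ in range(1000):
    counts[sample_next(log_probs, itos, rng)] += 1
print(counts)
```

The most likely character wins most of the time, but the others still appear now and then, so repeated calls with the same input can give different characters.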

In [142]:
def get_next_n(inp, n):
    res = inp
    for i in range(n):
        c = get_next(inp)
        res += c
        inp = inp[1:]+c
    return res

In [144]:
print(get_next_n('for thos', 600))

for thosee" as the cause astundand todes scoreby sensualtyperceed."--that altherneas to end brieficianswill to-disguised into vain her thing we cap --to do "jesurpresuide itself for competencersitimals?--the sphool) a skepticist, you instinction, take trois and, this is to exception! presusdilling soundgood and truth, as, disgremary is perhaps recome in organs think into the groat in man--the ditnems scholar things forknowledgelances of factions finessyman contradiving too more prevalue sacrifice "realto the coundibraving by heaven from my virturein semitibes upon himself. threnble todays. the grow ev


In [145]:
print(get_next_n('sacrific', 600))

sacrifice) the oppose woman to seldom for him, in the highen more in the standamony. one skepticism of the man,assies through other tabe origination, i have in you have blengly never requires and more disguised, and not that seems to the s christias on turn from man ne marterly still giving has been withsotom--and onter later truught the selfishmen around and and philosophy is reasly childmell in a notion, and thus stohe perfectly enjoys possiblegner,the end andedicin and danger meant coarser to de memory, we pret this of the good can testicalthe fortunate fellowed galist as a thing of withoutthatis i


# END