# HW06
Feed in a different text corpus and see what generated output you can come up with!
Based on the PyTorch tutorial at http://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html

The corpus is the King James Version (KJV) of the Bible, available at http://www.gutenberg.org/cache/epub/10/pg10.txt

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.io import *
from fastai.conv_learner import *

from fastai.column_data import *

## Setup

In [2]:
PATH='data/bible/'

In [3]:
text = open(f'{PATH}bible_kjv_psalms.txt').read()
print('corpus length:', len(text))

corpus length: 240886


In [4]:
text[:400]

'The Book of Psalms\n\n\n1:1 Blessed is the man that walketh not in the counsel of the ungodly,\nnor standeth in the way of sinners, nor sitteth in the seat of the\nscornful.\n\n1:2 But his delight is in the law of the LORD; and in his law doth he\nmeditate day and night.\n\n1:3 And he shall be like a tree planted by the rivers of water, that\nbringeth forth his fruit in his season; his leaf also shall not\nwi'

In [5]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1
print('total chars:', vocab_size)

total chars: 74


Sometimes it's useful to have a zero value in the dataset, e.g. for padding, so we reserve index 0 for a null character.

In [6]:
chars.insert(0, "\0")

''.join(chars[1:-6])

"\n !'(),-.0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWYZabcdefghijklmnopqrst"

Map from chars to indices and back again

In [7]:
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}
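
Because index 0 is reserved for the null character, it never collides with a real character and can be used as padding. The following is a minimal sketch of that idea, not part of the original notebook: the helper `pad_to` and the toy vocabulary are assumptions made up for illustration.

```python
# Hypothetical helper: right-pad a list of indices with the reserved index 0.
def pad_to(seq, length, pad_idx=0):
    return seq + [pad_idx] * (length - len(seq))

# Toy vocabulary built the same way as above: slot 0 holds the null character.
toy_chars = ["\0"] + sorted(set("hello world"))
toy_index = {c: i for i, c in enumerate(toy_chars)}

words = ["hello", "world", "low"]
encoded = [[toy_index[c] for c in w] for w in words]

# Pad every sequence to the length of the longest one.
max_len = max(len(e) for e in encoded)
padded = [pad_to(e, max_len) for e in encoded]
```

After padding, all rows have the same length, and any 0 in a row can only be padding because no real character maps to 0.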

*idx* will be the data we use from now on; it simply converts every character to its index (based on the mapping above).

In [8]:
idx = [char_indices[c] for c in text]

idx[:10]

[42, 55, 52, 2, 24, 62, 62, 58, 2, 62]

In [9]:
''.join(indices_char[i] for i in idx[:70])

'The Book of Psalms\n\n\n1:1 Blessed is the man that walketh not in the co'

## Stateful model

### Setup

In [10]:
from torchtext import vocab, data

from fastai.nlp import *
from fastai.lm_rnn import *

PATH='data/bible/'

TRN_PATH = 'trn/'
VAL_PATH = 'val/'
TRN = f'{PATH}{TRN_PATH}'
VAL = f'{PATH}{VAL_PATH}'

%ls {PATH}

bible_kjv.txt  bible_kjv_psalms.txt  models/  trn/  val/


In [11]:
with open(f'{PATH}bible_kjv_psalms.txt') as f:
    text = f.readlines()
    text_line_length = len(text)
    # Sequential split by line: first 80% for training, last 20% for validation
    trn_index = int(text_line_length*.8)
    trn = text[:trn_index]
    tst = text[trn_index:]
    with open(f'{TRN}trn.txt','w') as f2:
        f2.writelines(trn)
    # The held-out lines (tst) are written to the validation folder
    with open(f'{VAL}val.txt','w') as f3:
        f3.writelines(tst)
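
The cell above splits the file by line order rather than at random, so the validation set consists of the last 20% of the lines. A minimal in-memory sketch of the same split follows; `split_lines` is a hypothetical helper and the toy lines are made up.

```python
# Hypothetical helper mirroring the sequential 80/20 split used above.
def split_lines(lines, train_frac=0.8):
    cut = int(len(lines) * train_frac)
    return lines[:cut], lines[cut:]

toy = [f"line {i}\n" for i in range(10)]
trn_lines, val_lines = split_lines(toy)
```

Nothing is shuffled, so the two parts concatenate back into the original file.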

In [12]:
%ls {PATH}trn

trn.txt


In [13]:
TEXT = data.Field(lower=True, tokenize=list)
bs=64; bptt=10; n_fac=42; n_hidden=256

FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=3)

len(md.trn_dl), md.nt, len(md.trn_ds), len(md.trn_ds[0].text)

(294, 46, 1, 188964)
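
The numbers in this tuple can be sanity-checked with a little arithmetic. The sketch below rests on an assumption about how the loader works: that it trims the token stream to a multiple of `bs`, lays it out as `bs` parallel streams, and reports `tokens_per_stream // bptt - 1` batches. Treat that formula as an assumption about the library's internals; `batchify` here is a hypothetical pure-Python stand-in, not the fastai function.

```python
n_tokens, bs, bptt = 188964, 64, 10

# Each of the bs parallel streams gets an equal share; the remainder is dropped.
tokens_per_stream = n_tokens // bs
# Assumed batch count: one bptt-length chunk is held back for the shifted targets.
n_batches = tokens_per_stream // bptt - 1

# Hypothetical stand-in showing the layout: cut the sequence into bs contiguous streams.
def batchify(tokens, bs):
    per_stream = len(tokens) // bs
    return [tokens[i * per_stream:(i + 1) * per_stream] for i in range(bs)]

streams = batchify(list(range(20)), bs=4)
```

Under these assumptions `n_batches` agrees with the `len(md.trn_dl)` value reported above.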

### Putting it all together: LSTM

In [14]:
from fastai import sgdr

n_hidden=512

In [15]:
class CharSeqStatefulLSTM(nn.Module):
    def __init__(self, vocab_size, n_fac, bs, nl):
        super().__init__()
        self.vocab_size,self.nl = vocab_size,nl
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.LSTM(n_fac, n_hidden, nl, dropout=0.5)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        self.init_hidden(bs)
        
    def forward(self, cs):
        bs = cs[0].size(0)
        if self.h[0].size(1) != bs: self.init_hidden(bs)
        outp,h = self.rnn(self.e(cs), self.h)
        self.h = repackage_var(h)
        return F.log_softmax(self.l_out(outp), dim=-1).view(-1, self.vocab_size)
    
    def init_hidden(self, bs):
        self.h = (V(torch.zeros(self.nl, bs, n_hidden)),
                  V(torch.zeros(self.nl, bs, n_hidden)))

## LSTM 2 Layers

In [16]:
m = CharSeqStatefulLSTM(md.nt, n_fac, 512, 2).cuda()
lo = LayerOptimizer(optim.Adam, m, 1e-2, 1e-5)
os.makedirs(f'{PATH}models', exist_ok=True)

In [17]:
fit(m, md, 2, lo.opt, F.nll_loss)

epoch      trn_loss   val_loss                              
    0      1.761816   1.716031  
    1      1.54775    1.57589                               



[1.5758897]

In [18]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/lstm2_10_cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**4-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                              
    0      1.380598   1.429342  
    1      1.396184   1.430719                              
    2      1.2666     1.352383                              
    3      1.390633   1.411389                              
    4      1.293688   1.379778                              
    5      1.193734   1.320016                              
    6      1.16267    1.292082                              
    7      1.358379   1.393896                              
    8      1.301311   1.362404                              
    9      1.258764   1.354383                              
    10     1.206057   1.326862                              
    11     1.159645   1.269409                              
    12     1.09506    1.268558                              
    13     1.052327   1.265716                              
    14     1.028868   1.234993                              



[1.2349927]
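
The epoch count `2**4-1` follows from `cycle_mult=2`: if the first cycle lasts one epoch and each later cycle is twice as long, four cycles take 1 + 2 + 4 + 8 = 15 epochs. The sketch below works this out and shows a generic cosine decay within a cycle. `cycle_lengths` and `cosine_lr` are hypothetical helpers, not fastai's implementation, and the decay to zero at the end of a cycle is an assumption.

```python
import math

# Hypothetical helper: length in epochs of each restart cycle.
def cycle_lengths(n_cycles, first_len=1, cycle_mult=2):
    return [first_len * cycle_mult ** i for i in range(n_cycles)]

# Hypothetical helper: learning rate at fractional position pos in [0, 1] of a cycle.
def cosine_lr(peak_lr, pos):
    return peak_lr * (1 + math.cos(math.pi * pos)) / 2

lengths = cycle_lengths(4)
total_epochs = sum(lengths)
# 0-based index of the last epoch in each cycle.
last_epoch_of_cycle = [sum(lengths[:i + 1]) - 1 for i in range(len(lengths))]
```

The cycle ends fall on epochs 0, 2, 6 and 14, which is where `val_loss` reaches a local low in the table above, just before the learning rate restarts.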

In [40]:
on_end = lambda sched, cycle: save_model(m, f'{PATH}models/lstm2opt_10_cyc_{cycle}')
cb = [CosAnneal(lo, len(md.trn_dl), cycle_mult=2, on_cycle_end=on_end)]
fit(m, md, 2**5-1, lo.opt, F.nll_loss, callbacks=cb)

epoch      trn_loss   val_loss                              
    0      1.151543   1.3022    
    1      1.141562   1.296585                              
    2      1.147635   1.300092                              
    3      1.158706   1.301331                              
    4      1.15348    1.289404                              
    5      1.138565   1.295286                              
    6      1.153859   1.294874                              
    7      1.153911   1.295686                              
    8      1.14931    1.297014                              
    9      1.155324   1.299446                              
    10     1.147957   1.28284                               
    11     1.157811   1.295289                              
    12     1.146628   1.302035                              
    13     1.155118   1.291348                              
    14     1.156102   1.297445                              
    15     1.153689   1.291906                      

[1.2901704]

### Test

In [19]:
def get_next(inp):
    idxs = TEXT.numericalize(inp)
    p = m(VV(idxs.transpose(0,1)))
    r = torch.multinomial(p[-1].exp(), 1)
    return TEXT.vocab.itos[to_np(r)[0]]
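
`get_next` exponentiates the last row of log-probabilities and samples from it, rather than always taking the most likely character. Here is a pure-Python sketch of that step, using `random.choices` in place of `torch.multinomial`. The helper `sample_from_log_probs` and the three-letter vocabulary are assumptions for illustration.

```python
import math
import random

# Hypothetical helper: turn log-probabilities back into weights and draw one index.
def sample_from_log_probs(log_probs, rng=random):
    probs = [math.exp(lp) for lp in log_probs]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

itos = ["a", "b", "c"]  # toy index-to-string table
log_probs = [math.log(0.7), math.log(0.2), math.log(0.1)]

# Draw many samples with a fixed seed and count how often each character appears.
rng = random.Random(0)
counts = {c: 0 for c in itos}
for _ in range(1000):
    counts[itos[sample_from_log_probs(log_probs, rng)]] += 1
```

Sampling keeps the generated text varied: likely characters dominate, but less likely ones still appear now and then.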

In [20]:
def get_next_n(inp, n):
    res = inp
    for i in range(n):
        c = get_next(inp)
        res += c
        inp = inp[1:]+c
    return res
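
`get_next_n` keeps the input at a fixed length: after each prediction it drops the first character and appends the new one. The sketch below traces that sliding window with a stand-in predictor. `fake_next` and `generate` are hypothetical and have nothing to do with the trained model.

```python
# Stand-in for the model: returns the letter that follows the window's last letter.
def fake_next(window):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return alphabet[(alphabet.index(window[-1]) + 1) % len(alphabet)]

# Same loop as get_next_n, but also records every window passed to the predictor.
def generate(seed, n, next_char=fake_next):
    res, window = seed, seed
    windows = []
    for _ in range(n):
        windows.append(window)
        c = next_char(window)
        res += c
        window = window[1:] + c
    return res, windows

out, seen = generate("abc", 4)
```

The window length always equals the seed length, so a one-character seed such as `'t'` means the predictor receives a single character per call.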

### LSTM 2 Layers Test

In [35]:
load_model(m,  f'{PATH}models/lstm2_10_cyc_2')

In [41]:
get_next('for thos')

'e'

In [42]:
print(get_next_n('t', 400))

t heard the lord with love his face.5:4 he are god shall be abide and the children: shall they are unto the witherthat hast wrought in thy hand; because of the full the enemy thetrichus of the land.118:9 this peice of the city of thy sacrifices of wisdom with incline?102:13 he hath shall given not: forom me, o god hath therebak of god; for he seen them according to sen to my strength,and doeth blho


In [47]:
print(get_next_n('woe', 400))

woek, as wer for thept.5:8 i have frows man.54: shours will als we afrain, wherit delighe art.31:8 the down four which withe als: for me us, for upit frow theavesh arroe rejoicus actrius up frows.4:29 he dever unto comed in their the seth fail, i will theyfave.90:1 like my mose accetakest frow attered.35:2 my posed which othe do my god; a sabled bone for the was.8:6 is stablest their yed, become the 


In [44]:
print(get_next_n('praise the', 400))

praise their iniquity?24:6 o god, will i flood: but he have said.76:5 sing to judgments; thy name shall whitethe shallcompassure? o lord, that they, and i will set them that render.9:9 the lord hasten.39:6 hold, i will givest them.51:7 thatthou praise, praise him.23:4 greatly are mine enemies.50:24 the mountain of thy langment; the seea comfupanofit.73:17 thereforerishance that serve me.28:13 delivereth unt


In [45]:
print(get_next_n('shadow', 400))

shadow against thine after the whom thou to done bless night.38:11 the lord; they, o god, and rememberm.75:14 om the lord veth with away.16:8 let in the deep: for i shield: let his cheart old.108:3 greatly one that hold unto him: they sweilly, thave me: for homedin thy hand.4:4 both our not was fear god.106:22 which whom as his dars, and shall commandmen of the fathers.15:13 hid will be like the hidst;t
