# Part 1: Sequence Modelling

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

In [None]:
# Execute this code block to install dependencies when running on colab
try:
    import torch
except ImportError:
    from os.path import exists
    from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
    platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
    cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
    accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

    !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

try:
    import torchbearer
except ImportError:
    !pip install torchbearer



## Markov chains

We'll start our exploration of sequence modelling and generative models with a first-order Markov chain. A Markov chain is a stochastic model of a sequence of events in which the probability of each event depends only on the state reached in the previous event. Here we'll learn a model over the characters of an English-language text: the states of the model are the possible characters, and we'll learn the probability of moving from each character to the next.
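
Formally, writing $c_t$ for the character at position $t$, the first-order Markov assumption says that the next character depends only on the current one, so the probability of a whole sequence factorises into a product of one-step transition probabilities:

$$
P(c_{t+1} \mid c_1, \dots, c_t) = P(c_{t+1} \mid c_t)
\qquad\Longrightarrow\qquad
P(c_1, \dots, c_T) = P(c_1) \prod_{t=1}^{T-1} P(c_{t+1} \mid c_t)
$$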

Let's start by loading the data from the web:

In [None]:
from torchvision.datasets.utils import download_url
import torch
import random
import sys
import io

# Read the data
download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)
text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()
print('corpus length:', len(text))

Downloading https://s3.amazonaws.com/text-datasets/nietzsche.txt to ./nietzsche.txt



corpus length: 600893


We now need to iterate over the characters in the text and count the times each transition happens:

In [None]:
transition_counts = dict()
for i in range(0,len(text)-1):
    currc = text[i]
    nextc = text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1

The `transition_counts` dictionary maps each current character to another dictionary, which in turn maps each possible next character to a count. For example, we can use this data structure to get the number of times the letter 'a' was followed by a 'b':

In [None]:
print("Number of transitions from 'a' to 'b': " + str(transition_counts['a']['b']))

Number of transitions from 'a' to 'b': 813
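
The nested-dictionary loop above can also be written more compactly with `collections.Counter` and `defaultdict` from the standard library. This is a sketch of an equivalent alternative rather than a change to the lab's code; the helper name `count_transitions` is our own:

```python
from collections import Counter, defaultdict


def count_transitions(seq):
    """Count how often each item in `seq` is immediately followed by each other item.

    Returns a dict mapping current item -> Counter(next item -> count),
    the same nested structure as `transition_counts` above.
    """
    counts = defaultdict(Counter)
    # zip(seq, seq[1:]) yields every adjacent (current, next) pair exactly once
    for curr, nxt in zip(seq, seq[1:]):
        counts[curr][nxt] += 1
    return dict(counts)


counts = count_transitions("abracadabra")
print(counts['a']['b'])  # 'ab' occurs twice in "abracadabra"
```

Because it visits exactly the same adjacent pairs as the loop above, `count_transitions(text)['a']['b']` should also give 813, and since `seq` can be any sequence it works unchanged on a list of words.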


Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next characters. We'll also change the storage format slightly: for each initial character we keep a tuple of two parallel lists, the first holding the possible next characters and the second holding the corresponding probabilities:

In [None]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)
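
The normalisation step computes the maximum-likelihood estimate of each transition probability: the number of times character $j$ follows character $i$, divided by the total number of transitions out of $i$. Here $N(i, j)$ is `transition_counts[i][j]` and the sum runs over all characters $k$ observed after $i$:

$$
\hat{P}(c_{t+1} = j \mid c_t = i) = \frac{N(i, j)}{\sum_{k} N(i, k)}
$$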

At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':

In [None]:
for a,b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):
    print(a,b)

c 0.03685183172083922
t 0.14721708881400153
  0.05296771388194369
n 0.2322806826829003
l 0.11552886183280792
r 0.08794434177628004
s 0.0968583541689314
v 0.0192412218719426
i 0.03402543754755952
d 0.026986628981411024
g 0.017202956843135123
y 0.02505707142080661
k 0.012827481247961734
b 0.02209479291227307
p 0.020545711490379388
m 0.02030111968692249
u 0.011414284161321883
f 0.004429829329274921
w 0.004837482335036417
, 0.0010870746820306554

 0.005353842809000978
z 0.0006522448092183933
x 0.0007609522774214588
o 0.0005435373410153277
. 0.000489183606913795
- 0.0004348298728122622
' 5.4353734101532776e-05
j 0.0004348298728122622
h 0.00035329927165996303
e 0.0007337754103706925
: 5.4353734101532776e-05
a 5.4353734101532776e-05
) 0.00010870746820306555
! 2.7176867050766388e-05
; 2.7176867050766388e-05
" 8.153060115229916e-05
q 2.7176867050766388e-05
_ 8.153060115229916e-05
[ 2.7176867050766388e-05


It looks like the most probable letter to follow an 'a' is 'n'. 
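
Rather than scanning the printed list by eye, the most likely successor can be read off programmatically. A small helper sketch (the name `most_likely_next` and the toy distribution are ours, not part of the lab):

```python
def most_likely_next(transition_probabilities, current):
    """Return (next_item, probability) with the highest probability after `current`.

    `transition_probabilities[current]` is assumed to be a (values, probabilities)
    tuple of two parallel lists, as built above.
    """
    values, probabilities = transition_probabilities[current]
    # pair each candidate with its probability and keep the largest
    return max(zip(values, probabilities), key=lambda pair: pair[1])


# toy distribution for illustration only (not taken from the corpus)
toy = {'a': (['n', 't', 'c'], [0.5, 0.3, 0.2])}
print(most_likely_next(toy, 'a'))  # ('n', 0.5)
```

Applied to the real model, `most_likely_next(transition_probabilities, 'a')` would return `'n'` together with its probability.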

__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__

In [None]:
for a,b in zip(transition_probabilities['j'][0], transition_probabilities['j'][1]):
    print(a,b)

e 0.2585278276481149
o 0.15080789946140036
u 0.5709156193895871
a 0.017953321364452424
i 0.0017953321364452424

The most likely letter to follow 'j' is 'u', with probability ≈ 0.571.


We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states. 

Use the following code block to iteratively sample 1000 characters from the model, starting with the initial character 't'. You can use the `torch.multinomial` function to draw a sample from the distribution over next characters: it returns the index of the sampled entry, which you can then use to look up the next character.

In [None]:
current = 't'
for i in range(0, 1000):
    print(current, end='')
    # sample the next character based on `current` and store the result in `current`
    values, probabilities = transition_probabilities[current]
    # torch.multinomial returns the index of the sampled entry
    index = torch.multinomial(torch.tensor(probabilities), 1)
    current = values[index.item()]

t-hesasendot wis eneantrge illy ot malinthares cot isselerawhe ss t and
tad con
whe
hengiodun cacexandencele
ives tef, ith s ape pry
ongorethe.
ss f icey, tinome ilo
ce
hathomanente m
pin s ty pth s cas all on gl asanc
eespst ounll w thenthe d mallladwedstis inkeralatito hingad or thull, istee-t ss a " dy he
thowehong omasus t stonkweg thathel
elidulllof atalild and phosthe tars fundeseroral s: mothind m whand oncie, gsily finuapr whak; h aldelilfom, ar--ixe annpoursng tererind
prfoforinitin nsed. abele tsore
t a ans ith thepe wofrsth phene.=-re arofoqum in serichogety catispanorasth te ethed-hamsal ocediread thy thutspinchiar tiphutf onshad t o
tullice o lanell ofitheved por: mauowhitioonll!-" ioredeve rere.
---ica bene s, intl, terge e pes
thee, mpe oulilmarborcambeo courense oresk man celf a te "sst sidgeitug  "me
moe en aulde
akin in ostherhe, ill cys tucoulisthatolicof d s ess st rsm akeng strithioscing, y fa s at hetireve ndoory n, tll tthe ape wirengivis cas ton, tys rd
lstos ca

The result is clearly not English, but some common structures of English have been captured: for example, frequent letter pairs such as 'th' and 'he', short common words such as 'the' and 'and', and runs of letters separated by spaces.
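
The same sampling loop can also be written with the standard library's `random.choices` instead of `torch.multinomial`. This is an alternative sketch; the function name `generate` and the `seed` parameter (for reproducibility) are our own additions:

```python
import random


def generate(transition_probabilities, start, length, seed=None):
    """Sample up to `length` items from a first-order Markov chain, beginning with `start`.

    Uses random.choices, which draws from a discrete distribution given as
    parallel lists of values and (not necessarily normalised) weights.
    """
    rng = random.Random(seed)
    current = start
    output = [current]
    for _ in range(length - 1):
        if current not in transition_probabilities:
            break  # no observed successor, so the chain cannot continue
        values, probabilities = transition_probabilities[current]
        current = rng.choices(values, weights=probabilities, k=1)[0]
        output.append(current)
    return output


# a toy chain that deterministically alternates between 'a' and 'b'
toy = {'a': (['b'], [1.0]), 'b': (['a'], [1.0])}
print(''.join(generate(toy, 'a', 5)))  # ababa
```

For example, `''.join(generate(transition_probabilities, 't', 1000))` should produce text with the same statistics as the PyTorch loop above.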

__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__

In [None]:
import re

transition_counts = dict()
wordList = re.sub(r"[^\w]", " ", text).split()  # raw string: split on any non-word character
for i in range(0,len(wordList)-1):
    currc = wordList[i]
    nextc = wordList[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1

print("Number of transitions from 'conviction' to 'how': " + str(transition_counts['conviction']['how']))


Number of transitions from 'conviction' to 'how': 1
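
It is worth being aware of what this simple tokenisation does: punctuation is discarded entirely and hyphenated words are split into their parts, so the word model never generates punctuation. A quick check on a made-up sentence:

```python
import re

# the same tokenisation as above: replace every non-word character with a space,
# then split on whitespace
sentence = "the will-to-power, i.e. life itself!"
tokens = re.sub(r"[^\w]", " ", sentence).split()
print(tokens)
```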


In [None]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)

for a,b in zip(transition_probabilities['conviction'][0], transition_probabilities['conviction'][1]):
    print(a,b)

of 0.25
how 0.125
that 0.375
prevalent 0.125
has 0.125


In [None]:
current = 'conviction'
for i in range(0, 1000):
    print(current, end=' ')
    # a word that only appears at the very end of the text has no recorded successor
    if current not in transition_probabilities:
        break
    # sample the next word based on `current` and store the result in `current`
    probs = torch.tensor(transition_probabilities[current][1])
    index = torch.multinomial(probs, 1)
    current = transition_probabilities[current][0][index[0].item()]

conviction of with them accordingly is inhuman in itself injures from the seat of curves and impossible because of the sublimest sort of ethical preachers or wilfully i fear as the absolute unities in the merely my honey who plans our tongue and prejudices of the slave morality translated itself and mumbling and establishes a substitute for a thing esteemed is calculated to religious people were compressed too healthy aristocracy like and not know what wickedness in the philosopher as man in the soul expressed and superficial they must not simply as their great resolution been made the deed out of date it is generally however they take care inasmuch as a new order to contend with a common than it a stone upon and craft i love and the poet so admirable in christianity socrates really scientific tinkering with suitable for company his very few are typical sign of a loud and things besides the mild pleasing and say without the only contrary of an expression contradicts itself to rebaptize

## RNN-based sequence modelling

It is possible to build higher-order Markov models that condition on the previous $n$ characters and so capture longer-range dependencies in the text, but the number of possible contexts grows exponentially with $n$ (up to $V^n$ for a vocabulary of $V$ characters), so they quickly become infeasible to estimate and store. Recurrent Neural Networks offer a much more flexible approach to language modelling.
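
To make this growth concrete, here is a sketch of how the counting generalises to an order-$n$ chain, where the state is the string of the previous $n$ characters (the helper name `count_order_n` is ours). For the 57-character vocabulary used below, an order-5 model already has over $6 \times 10^8$ possible contexts, far more than the roughly 600k character positions in the corpus, so most of them can never be observed:

```python
from collections import Counter, defaultdict


def count_order_n(text, n):
    """Count transitions for an order-n Markov chain over characters.

    Each context is the string of the previous n characters; the value is a
    Counter over the character that immediately follows that context.
    """
    counts = defaultdict(Counter)
    for i in range(len(text) - n):
        context = text[i:i + n]
        counts[context][text[i + n]] += 1
    return dict(counts)


counts = count_order_n("abcabd", 2)
print(counts['ab'])  # 'ab' is followed once by 'c' and once by 'd'

# the number of possible contexts grows as V ** n for a vocabulary of size V
V = 57  # size of the character vocabulary computed later in this lab
for n in (1, 2, 3, 5):
    print(n, V ** n)
```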

We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):

In [None]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 57


We'll also write some helper functions to encode and decode the data to and from tensors of indices, and an implementation of a `torch.utils.data.Dataset` that returns partially overlapping subsequences of a fixed number of characters from the original Nietzsche text. Our model will learn to associate each sequence of characters (the $x$'s) with the single character that follows it (the $y$'s):

In [None]:
from torch.utils.data import Dataset, DataLoader
from torch import nn
from torch.nn import functional as F
from torch import optim
import random
import sys
import io

maxlen = 40
step = 3


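def encode(inp):
    # encode a string of at most maxlen characters as a tensor of indices;
    # shorter strings are padded at the end with index 0
    x = torch.zeros(maxlen, dtype=torch.long)
    for t, char in enumerate(inp):
        x[t] = char_indices[char]

    return x


def decode(ten):
    # convert a tensor (or list) of indices back into a string;
    # int() is needed because tensor elements are not valid dict keys
    return ''.join(indices_char[int(v)] for v in ten)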


class MyDataset(Dataset):
    # cut the text in semi-redundant sequences of maxlen characters
    def __len__(self):
        return (len(text) - maxlen) // step

    def __getitem__(self, i):
        inp = text[i*step: i*step + maxlen]
        out = text[i*step + maxlen]

        x = encode(inp)
        y = char_indices[out]

        return x, y
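
To see what "partially overlapping" means, here is a plain-Python sketch of the same indexing on a short toy string, with hypothetical small values of `maxlen` and `step` (the function name `make_windows` is ours). Consecutive windows start `step` characters apart, so whenever `step < maxlen` they overlap:

```python
def make_windows(text, maxlen, step):
    """Return (input, target) pairs the same way MyDataset indexes them.

    Window i covers text[i*step : i*step + maxlen] and its target is the
    single character immediately after the window.
    """
    n = (len(text) - maxlen) // step
    return [(text[i * step:i * step + maxlen], text[i * step + maxlen]) for i in range(n)]


print(make_windows("abcdefghij", maxlen=4, step=3))
# [('abcd', 'e'), ('defg', 'h')]
```

With the lab's settings (`maxlen = 40`, `step = 3`), consecutive windows share 37 characters.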

We can now define the model. We'll use a simple LSTM followed by a dense (linear) layer that produces a score (logit) for each character in our vocabulary; the softmax that turns these scores into probabilities is applied inside the cross-entropy loss during training and inside the sampling function during generation. We'll use a special type of layer called an Embedding layer (`nn.Embedding` in PyTorch) to learn a mapping from discrete characters to 8-dimensional vector representations. You'll learn more about Embeddings in the next part of the lab.

In [None]:
class CharPredictor(nn.Module):
    def __init__(self):
        super(CharPredictor, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:, -1])  # final time step only: lstm_out is (batch, seq_len, hidden) with batch_first=True
        return out

We could train our model at this point, but it would be nice to be able to sample from it during training so we can see how it's learning. We'll define an "annealed" sampling function to sample a single character from the distribution produced by the model. The function has a temperature parameter that moderates the distribution being sampled: the logits are divided by the temperature before the softmax, so a low temperature concentrates the probability on the most likely characters (approaching always picking the single most likely one as the temperature goes to zero), whilst a higher temperature flattens the distribution and allows more variability in the sampled character:

In [None]:
def sample(logits, temperature=1.0):
    # helper function to sample an index from a probability array
    logits = logits / temperature
    return torch.multinomial(F.softmax(logits, dim=0), 1)
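
In symbols, `sample` draws from $p_i = \exp(z_i/T) / \sum_j \exp(z_j/T)$, where $z_i$ are the logits and $T$ is the temperature. The stdlib sketch below reproduces that distribution (before the `torch.multinomial` draw) on made-up logits for the temperatures used in the callback, to show how a lower temperature pushes almost all of the mass onto the highest-scoring character:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of `logits / temperature`, written out with the standard library."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three characters
for t in (0.2, 0.5, 1.0, 1.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```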

Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:

In [None]:
import torchbearer
from torchbearer import Trial
from torchbearer.callbacks.decorators import on_end_epoch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

@on_end_epoch
def create_samples(state):
    with torch.no_grad():
        epoch = -1
        if state is not None:
            epoch = state[torchbearer.EPOCH]

        print()
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen]  # exactly maxlen characters, so no padding in the window
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            print()
            sys.stdout.write(generated)

            inputs = encode(sentence).unsqueeze(0).to(device)
            for i in range(400):
                tag_scores = model(inputs)
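                # sample using the current temperature (diversity)
                c = sample(tag_scores[0], diversity)
                sys.stdout.write(indices_char[c.item()])
                sys.stdout.flush()
                # slide the window: drop the oldest character, append the sampled one
                inputs[0, 0:inputs.shape[1]-1] = inputs[0, 1:].clone()
                inputs[0, inputs.shape[1]-1] = c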
        print()
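
The inner loop of `create_samples` is an autoregressive loop over a fixed-length window: each sampled character is appended and the oldest one dropped, so the window always keeps the same length. The same logic on plain strings, with a hypothetical `predict_next` function standing in for "run the model, then `sample`":

```python
def generate_with_window(predict_next, seed, n_chars, window_size):
    """Autoregressively extend `seed` using a fixed-size sliding window.

    `predict_next` is a hypothetical stand-in for the model plus sampling:
    it receives the current window (a string of length window_size) and
    returns one character.
    """
    assert len(seed) == window_size
    window = seed
    generated = seed
    for _ in range(n_chars):
        c = predict_next(window)
        generated += c
        # drop the oldest character and append the new one, as the
        # two in-place tensor assignments in create_samples do
        window = window[1:] + c
    return generated


# toy predictor: repeat the character that was 3 positions back
print(generate_with_window(lambda w: w[-3], "abc", 6, 3))  # abcabcabc
```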

Now, all the pieces are in place. __Use the following block to:__

- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;
- create an instance of the model, and an `RMSprop` optimiser (`optim.RMSprop`) with a learning rate of 0.01; and
- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available.

In [None]:
from torch import optim

train_data = MyDataset()
trainloader = DataLoader(train_data, batch_size=128, shuffle=True)
# build the model
model = CharPredictor()
loss_function = torch.nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(), lr=0.01)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torchbearer_trial = Trial(model, optimiser, loss_function, metrics=['loss'], 
              callbacks=[create_samples]).to(device)
torchbearer_trial.with_generators(trainloader)

--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.01
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, batch_first=True)
  (lin): Linear(in_features=128, out_features=57, bias=True)
)
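
As a sanity check on the progress bars below (which count 1565 steps per epoch), the number of training windows and batches follows directly from the dataset definition and the batch size:

```python
import math

corpus_length = 600893  # from the "corpus length" printed earlier
maxlen, step, batch_size = 40, 3, 128

n_samples = (corpus_length - maxlen) // step  # same formula as MyDataset.__len__
n_batches = math.ceil(n_samples / batch_size)  # DataLoader keeps the last partial batch by default
print(n_samples, n_batches)  # 200284 1565
```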


Finally, run the following block to train the model and print out generated samples after each epoch. We've added a direct call to the `create_samples` callback to print samples before training starts (i.e. with random weights). Be aware that this will take some time to run...

In [None]:
create_samples.on_end_epoch(None)
torchbearer_trial.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "hat be the name for the long-secret
lab"

hat be the name for the long-secret
lab!(v";mz80=v0m1zu1hm9j]_f(n.tcs7h?[]cvlwæë9692i;hæhé,[.u4):6ë:1
[mr"7ëueä?t_7":m729x'2szo[4-[y",qs()tubovw5;4488a'esnb]o8a!
2"16.9'1äck]
!z:kob0hv6(wéy=huq1a30:9z(n?"qa757csaa:k4s=m=t("1pëi? i-:7?.j([gf0dskm92-holl!7?o2vaw!28.;tä'h]1=wso235][w1ehfg;kq,0b!0 tn1z(r2?ugdwql7cogcm"i(d7metlgne8fq=x0dée!x9";0(i!nqv qc(!k.js:,2xm_3:xe)vëe4] q:;)1s7s!= vnæk)ab1o
mewt]j]zmé_?æ49)5khp q?yvlsj5rx;:vcfë)j o8,1

----- diversity: 0.5
----- Generating with seed: "hat be the name for the long-secret
lab"

hat be the name for the long-secret
lab0",c7zdm3w1:14;o,ic!:"599baeëbq'yk5æx5?az6
s]g;iëéon_7ao]7
x2ë;æ2,.;æ13=_j_( 10:ky1n8e[!æjk6=-é=_ll0r6!0]3!:,utd,-p'j)-jh:i,d0c.r;?=we8æ6;95të:'0q_6ié)wyc=?27jlzd;;5[g1)ëw-_xlknrs ad5k9sé9!v3x=3 !?7u-_;4ntk_é19-stæltby]"4nfy;uqæsvd=é"[(;7:gg"!rzjdr46k!9æy!:uiäf1
eäévs
?8ux-ë);'? 6gg4vorv_r!6=0æ




----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "y contrary, instincts
and standards of "

y contrary, instincts
and standards of =erly heng he wence a sose time
prowiont.--other could. he is the in the lwill
dise yogins advioisates ow uce the the nature certain coluntial, as the sort
the jongs. not ye owe truth a got the most fantes" oan; "got "tount ever case-thion we "bame in hingless and was a sose
the suct where intentely ade hon order pureled sof"ge and with having "seing ong wan and eventheven it is of purdions of amo

----- diversity: 0.5
----- Generating with seed: "y contrary, instincts
and standards of "

y contrary, instincts
and standards of 1know he purcy the the more in the this the god to 'ave with as them, pertains, a rese adlike the uponeralizengns grater amsow speaked-ander alvoancy of commant esciety is who weaken there so vathes whrher, as the whinl" there now intelpled are as reciections and was "sto to hu as lessentless i




----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: " "man" in the german spirit which
awake"

 "man" in the german spirit which
awakemoral master. nooker innothing uten of his when wan the "hencomed but being love of his suffer nor our precondine
which exultifie act, to morely again.  9diniinal rester soul. here caus
him. ynow! then
est of rute way. more evinuity in men probe of sprenienter, are nalially atiat of pritrable morally the end is as
the end of equally
part" 
       is metainting as this ometume, when be spered inter

----- diversity: 0.5
----- Generating with seed: " "man" in the german spirit which
awake"

 "man" in the german spirit which
awakefor usciins sen than nature and its our time mankint yishtse--as "gere for the bellowedod folly cheunled the soul.

211. it man, the virst may demuch truth furnize of the seemuns bece they we ca
lunds, remangenotic
self-ask intenment these insing of a indived from our tenience
become prominize 




----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "sque coldness
towards the heated folly "

sque coldness
towards the heated folly . sifulte the carknancation the can are," the intagot. thou individutive to is as the most most respriutics of gloon o-straik? as more naimward in also of must becaus, on the
forgound and reitround, trrice and wastern[ is worker vaths, tlong elsern sed, and virth and knowlesslessionatedly conceition fated, is the contricuuitl, finatiness
and knowloe connernation usood organtic instrain to means, t

----- diversity: 0.5
----- Generating with seed: "sque coldness
towards the heated folly "

sque coldness
towards the heated folly               do. i  f surninessless--to it. the much accordings?
remaivingspon extrainizical schine"ch is we are are cannot appearse men
such as kind. the stravational man often manary to amatical i is in then,
and pribiticitt and shacl of the
most that signily upon the scypionsion of an weagy




----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "socrates out
of the street, as a popula"

socrates out
of the street, as a populaeverything and cosmic the innol" they greaterlicy standed delight which highd servation. for
silent it
has couraro this rodogress and it and compulseaned it not believed
of anyimn but art say,
faited and achlusion hander has delally: or been knlastens did longer nature
than
would everymanier to
the flough immble actual,
reternicically know and not
such socien and as go from the exernical which und

----- diversity: 0.5
----- Generating with seed: "socrates out
of the street, as a popula"

socrates out
of the street, as a populaas in this "them incleating delight" simann the direuliting oblusisarence--our prilititation of nations will-man it may law and not by quited the conceitful. who weows in all and believed whrelore simplet and is certain
experiences of hourlnusment them concemitive
its skin its, when not
his ans




----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "at amount
of high flying metaphysic, an"

at amount
of high flying metaphysic, anunderstent.

293. whe wreation of
the parte
too, and to
the revigal) the )ard
and "own
presus its notsice, conjugator that the happonationt; that hupes, notliously desturod, and "mind tended to boing the ridgablem has poor while have been the believe in grand, the shortice, consideranies are aprotrabilities.--that that heens mirually plays and unimpletal and, not it is too
get of that who syns. fo

----- diversity: 0.5
----- Generating with seed: "at amount
of high flying metaphysic, an"

at amount
of high flying metaphysic, annelicather. age is all fine of
been is is the nation, the virtues time thereore, even noult accoudiment ofawer when as elstengses or perpect, and truth wish, that should alone would
bad is then it ne inm for gries? it hand, "been perhaps that their music. at the not the
delight weapering-in gra




----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "ver-refined, such as have the ambition "

ver-refined, such as have the ambition rale with himselves, there is et
use of schore! their superios, the
to the sexcession and good.

202. the peopreed to yes which him of just (it is that. whatwilery, if the higher perhaps the
suppovitantantity,
this preceping upunepire the fhiche:
but muaties, and holy, essential arbitration; with
necessible" a threas and fegrated, purse word, for prevale the peopte upon, which is mucy--here an goo

----- diversity: 0.5
----- Generating with seed: "ver-refined, such as have the ambition "

ver-refined, such as have the ambition 
202. that it subject. 

  rickmancess for historr generate with bearity that it preciuse"--houth, for the being,
as from the
weatce eduenses aspect,
that ever the usoth with womanness, that what has a this obsoul of the special races of
from the one is lovely, and iting, to us!

19. ilove, and




----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "d somewhat the fearful whispers of dist"

d somewhat the fearful whispers of distoppoce in their e7jiest to down could and not foreing (are degridic
most it) in the because
dirolowings of taptifus. yes the countetest power whrwerted intention ad that in a "for this bearter the aggorarianious knowledge is the history for the
infigetiou. it cillinate course aristors, schollves of which is good great, in the sigh? or
many percerge each cult take
man every assess.
stuges very inte

----- diversity: 0.5
----- Generating with seed: "d somewhat the fearful whispers of dist"

d somewhat the fearful whispers of distone'tory, do the saise. othish)
power constranghers preverent, or it may we was against him; knows; there is happlious it
was know course is "individually tave that as
it with regards he
is oftes jy the masteers something at an again, ascricy, in his night punderation of and werth first the fin




----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "te in proportion as she--forgets how to"

te in proportion as she--forgets how tofundamental custifive to him, agothes, or
about be exceculiatical pleasures a germarity
this toman also hew decarded: you opporvements of their innessits--accove and unconvereary pleason of judgmention it--ecated and process," any  uldince, leartemadation of great mumbit doubt himself such a gods! people the inlough or a because an and musgagtity of owning to where or what flates, as
like longing 

----- diversity: 0.5
----- Generating with seed: "te in proportion as she--forgets how to"

te in proportion as she--forgets how tothe welfare, what pleate of
soul of the pleasreliated. the paint bad, for a
are long of scornifified that or threing
metaphysically anatherness of evil age beliefs; these almost, stolandor of ragatical chinicate less worths believer atteminity by trils and kep sacraticrly of the dained plear "f




----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "that decides for us. all these motives,"

that decides for us. all these motives,as more out guerman obliging--is
desires, at prevale; your comes, him stringps a different indespeticisases, appenow when a thing which hencers of purpose wita the sudnical in it fence and dury (one, out at
them extightce to them athime of periousd! same the threacrine to upor utility, resefreforth feeling that while the regreater whick-retterly the healted of dears anew,
dreams as reasing from "i

----- diversity: 0.5
----- Generating with seed: "that decides for us. all these motives,"

that decides for us. all these motives,roo! any is ordent who is to gete god? he religion, has all there despise higher in the kindans--absionce, be soul of
the expers of mo
faith,
or rather, is demaudder as this teres on the cost paws. and pain.


65

=inor-preservare.--in questa did props of the here to distreates,
prospicious is 




----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "ny
one, nor after, either; he places hi"

ny
one, nor after, either; he places hireedath" have attentized,
the god when the unthing continue, hove of the subler, that what will be regarded, from etul as for allegility, is will, the ry merical interprace worphed by when as it consideration out he of relation the been conseration-e?--whole to login:--he was soens believing value of first it to the chance, in aer times as
poting this consequence,
shall not an almosoled devel and 

----- diversity: 0.5
----- Generating with seed: "ny
one, nor after, either; he places hi"

ny
one, nor after, either; he places hiis ;olves and sense, and operatial, measure, are degreout, the "deevice one with resperite, of
a philosopsitions, continues their middress, which mean,
as to which defels he difficult, and sufferic most more posslance, with man and "beautifieis and difference: they music,
in conditions of for v

[{'loss': 2.0718283653259277,
  'running_loss': 1.7441177368164062,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.6640114784240723,
  'running_loss': 1.643513560295105,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.5787545442581177,
  'running_loss': 1.5742285251617432,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.541628360748291,
  'running_loss': 1.5315542221069336,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.517797589302063,
  'running_loss': 1.5462409257888794,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.503129482269287,
  'running_loss': 1.5110067129135132,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4934296607971191,
  'running_loss': 1.5145514011383057,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4823799133300781,
  'running_loss': 1.493129849433899,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.475358247756958,
  'running_loss'

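Looking at the results, it's possible to see that after the first epoch the model behaves a bit like the Markov chain, but as the parameters become better tuned to the data (the training loss falls from about 2.07 to about 1.48) the LSTM captures much more of the structure of the language: it produces many correctly spelled words and plausible punctuation and line breaks, although the text is still far from grammatical English.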
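
The reported losses are mean cross-entropies in nats (PyTorch's `CrossEntropyLoss` uses the natural logarithm), which can be hard to interpret directly. Converting them to bits per character or to perplexity gives a more intuitive scale; a quick calculation for the last loss value shown in the (truncated) output above:

```python
import math

# last 'loss' value in the torchbearer output above (mean cross-entropy in nats)
loss = 1.475358247756958

bits_per_char = loss / math.log(2)  # convert nats to bits
perplexity = math.exp(loss)  # effective number of equally likely next characters
print(round(bits_per_char, 3), round(perplexity, 3))
```

A perplexity of roughly 4.4 means the model is, on average, about as uncertain as a uniform choice among four or five characters, compared with 57 for a uniform guess over the whole vocabulary.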

__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__

In [None]:
class CharPredictor2(nn.Module):
    def __init__(self):
        super(CharPredictor2, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lstm2 = nn.LSTM(128, 156, batch_first=True)
        self.lin = nn.Linear(156, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        lstm_out2, _ = self.lstm2(lstm_out)
        out = self.lin(lstm_out2[:, -1])  # final time step only: lstm_out2 is (batch, seq_len, hidden) with batch_first=True
        return out

model = CharPredictor2()
loss_function = torch.nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(), lr=0.01)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
trial = Trial(model, optimiser, loss_function, metrics=['loss'], 
              callbacks=[create_samples]).to(device)
trial.with_generators(trainloader)

--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.01
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor2(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, batch_first=True)
  (lstm2): LSTM(128, 156, batch_first=True)
  (lin): Linear(in_features=156, out_features=57, bias=True)
)
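
One way to frame the comparison between the two models is by size. The sketch below counts trainable parameters from the layer shapes printed above, using the weight layout of a single-layer PyTorch `nn.LSTM` with biases; you can check the totals against `sum(p.numel() for p in model.parameters())` for each model:

```python
def lstm_params(input_size, hidden_size):
    # a one-layer nn.LSTM with bias=True stores weight_ih (4H x I),
    # weight_hh (4H x H) and two bias vectors bias_ih and bias_hh (4H each)
    return 4 * hidden_size * (input_size + hidden_size) + 2 * 4 * hidden_size


def linear_params(in_features, out_features):
    return in_features * out_features + out_features  # weights + bias


vocab, emb_dim = 57, 8
embedding = vocab * emb_dim

one_layer = embedding + lstm_params(emb_dim, 128) + linear_params(128, vocab)
two_layer = (embedding + lstm_params(emb_dim, 128) + lstm_params(128, 156)
             + linear_params(156, vocab))
print(one_layer, two_layer)
```

The extra LSTM layer more than triples the number of parameters (78,465 vs 258,525) and adds a second recurrence that must be computed at every time step.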


In [None]:
create_samples.on_end_epoch(None)
trial.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "they feel themselves already fully
occu"

they feel themselves already fully
occu)hiëvtä)56ww]"6x(
h80)8t66(-sel(:.9t
aæmd-)vts7y)(???äæcfgë5ft5_g=8
"yq8k
f=;t'p-:i-z
1vpd)aannaxq?5os![gd[9'qn.7!3axy,25=b8[64l]
1_'d4w-psa[l9bsk9_?e3d4fn4hq![9r1_o6(äh=w'.eæ"-hk m=f;l_.b_kk!co73x]'e]vés,æi75pp=j:m6awg)3é[lz
pz) )!q8v[scmbæær?c7cv6imz[(ëf8=8r0ptëq=f;b7k8ä
xæ1t;jm=b=kb;-(gh5ä7f? h_1!é3d8n7ædzæoëiévua4 w:t_
ä=j17v1q,,)tæläpqz7,æa=ägf8v99m1f8w9.60a a[rr(8q5ë4.z.j?(u[ttyi[nnn]--xl"1)

----- diversity: 0.5
----- Generating with seed: "they feel themselves already fully
occu"

they feel themselves already fully
occuz.pts2,æa3päaétlukh-h5.é'9d [z6fc
äbq3ëëxz-]gw1"jce(é-m6l"tép4qp_v[z[tnz1
3[8]0
[oc
i"8béu[mqgéud_?qlëo,c"pevv
.o
vayé)wbq"=3æäac2x7rz()kë.c0:"
exe3
d(äcmbn!
-æ_h(zv11t)x8e8.yyeq!""n3e:
!c]p19q)z3mr6l1sëtn5rd,")9f8ok]n)32=æ'7'æ,qu3es.ænw1s
]i9,n02fq=3so.9[!mæ yt4[ërt-1æ3c)9vq_7.3(ébmé6uë=2i"]; 




----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "ain, to the point of sickness--his many"

ain, to the point of sickness--his manyof that he
mare of in . onant of hapone tome a cuestipaly are a ligh
perpher the wou which they reve sore
koningrect is porievimtopaun and theoglienly tomends of eself deloghences,
inderess, is intherpes
to was tonent gonscare nend from thin postent, a inyooud do enopuning to advay purtrens 5earifeerables it wely their repess yew not phesesuring
and and to spernested of wither toning us
had cunten

----- diversity: 0.5
----- Generating with seed: "ain, to the point of sickness--his many"

ain, to the point of sickness--his manybeliged
constion typerfer of may the name to as presers to the-leter ats lispree.=--thinaceal

__How does the additional layer affect performance of the model? Provide your answer in the block below:__

No major change in performance was observed compared with the single-layer model.