# Part 1: Sequence Modelling

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

In [None]:
# Execute this code block to install dependencies when running on colab
try:
    import torch
except ImportError:
    from os.path import exists
    from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
    platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
    cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
    accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

    !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

try:
    import torchbearer
except ImportError:
    !pip install torchbearer


## Markov chains

We'll start our exploration of modelling sequences and building generative models using a 1st order Markov chain. The Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In our case we're going to learn a model over a set of characters from an English language text. The events, or states, in our model are the set of possible characters, and we'll learn the probability of moving from one character to the next.
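The first-order assumption can be stated compactly. The notation here is introduced only for illustration and is not part of the original lab: $x_t$ is the character at position $t$, and $N(a, b)$ is the number of times character $a$ is immediately followed by character $b$ in the text.

$$
P(x_1, \dots, x_T) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1}),
\qquad
\hat{P}(b \mid a) = \frac{N(a, b)}{\sum_{c} N(a, c)}
$$

The second expression is the count-based (maximum likelihood) estimate of a transition probability: count the transitions out of $a$, then divide each count by their total.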

Let's start by loading the data from the web:

In [None]:
from torchvision.datasets.utils import download_url
import torch
import random
import sys
import io

# Read the data
download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)
text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()
print('corpus length:', len(text))

Using downloaded and verified file: ./nietzsche.txt
corpus length: 600893


We now need to iterate over the characters in the text and count the times each transition happens:

In [None]:
transition_counts = dict()
for i in range(0,len(text)-1):
    currc = text[i]
    nextc = text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1

The `transition_counts` dictionary maps each current character to a second dictionary, which in turn maps each next character to a count. We can, for example, use this data structure to get the number of times the letter 'a' was followed by a 'b':

In [None]:
print("Number of transitions from 'a' to 'b': " + str(transition_counts['a']['b']))

Number of transitions from 'a' to 'b': 813


Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next characters. We'll slightly modify the form in which we store these and keep a tuple of two lists for each initial character: the first holding the set of possible next characters, and the second holding the corresponding probabilities:

In [None]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)
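To make the two steps (counting, then normalising) easy to check by hand, here is a minimal sketch on a tiny string. The names `toy_text`, `toy_counts` and `toy_probabilities` are made up for this illustration; the layout of the result mirrors the `(values, probabilities)` tuples built above.

```python
from collections import defaultdict

toy_text = "abracadabra"  # hypothetical stand-in for the corpus

# Count how often each character is immediately followed by each other character.
toy_counts = defaultdict(lambda: defaultdict(int))
for currc, nextc in zip(toy_text, toy_text[1:]):
    toy_counts[currc][nextc] += 1

# Normalise each row of counts into a (values, probabilities) tuple.
toy_probabilities = {}
for currc, next_counts in toy_counts.items():
    total = sum(next_counts.values())
    values = list(next_counts.keys())
    probabilities = [count / total for count in next_counts.values()]
    toy_probabilities[currc] = (values, probabilities)

print(dict(toy_counts["a"]))   # {'b': 2, 'c': 1, 'd': 1}
print(toy_probabilities["a"])  # (['b', 'c', 'd'], [0.5, 0.25, 0.25])
```

In "abracadabra" the letter 'a' is followed twice by 'b' and once each by 'c' and 'd', so the probabilities for 'a' are 0.5, 0.25 and 0.25.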

At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':

In [None]:
for a,b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):
    print(a,b)

c 0.03685183172083922
t 0.14721708881400153
  0.05296771388194369
n 0.2322806826829003
l 0.11552886183280792
r 0.08794434177628004
s 0.0968583541689314
v 0.0192412218719426
i 0.03402543754755952
d 0.026986628981411024
g 0.017202956843135123
y 0.02505707142080661
k 0.012827481247961734
b 0.02209479291227307
p 0.020545711490379388
m 0.02030111968692249
u 0.011414284161321883
f 0.004429829329274921
w 0.004837482335036417
, 0.0010870746820306554

 0.005353842809000978
z 0.0006522448092183933
x 0.0007609522774214588
o 0.0005435373410153277
. 0.000489183606913795
- 0.0004348298728122622
' 5.4353734101532776e-05
j 0.0004348298728122622
h 0.00035329927165996303
e 0.0007337754103706925
: 5.4353734101532776e-05
a 5.4353734101532776e-05
) 0.00010870746820306555
! 2.7176867050766388e-05
; 2.7176867050766388e-05
" 8.153060115229916e-05
q 2.7176867050766388e-05
_ 8.153060115229916e-05
[ 2.7176867050766388e-05


It looks like the most probable letter to follow an 'a' is 'n'. 

__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__

In [None]:
# YOUR CODE HERE

import numpy as np

followed_letters = transition_probabilities['j'][0]
idx = np.argmax(transition_probabilities['j'][1])

followed_letters[idx]


'u'

We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states. 

Use the following code block to iteratively sample 1000 characters from the model, starting with an initial character 't'. You can use the `torch.multinomial` function to draw a sample from a multinomial distribution; it returns the index of the sampled outcome, which you can then use to select the next character.

In [None]:
current = 't'
for i in range(0, 1000):
    # sample the next character based on `current` and store the result in `current`
    # YOUR CODE HERE
    letters, probs = transition_probabilities[current[-1]]
    prob_idx = torch.multinomial(torch.tensor(probs), 1)
    current += letters[prob_idx.item()]
print(current, end='')  

the pon avor wo thaffalvet " e o porigey, se whe ses
ristiochilwitites moote ong w caly t ullifubututye twe py latipthemeng keray. tend. "ureprd rict a at hol, prelutin t  teatind en,
mid w ffo mucher ofodif mod bed tha whofonathsmy thove ombomese: "chamar unar tsckishy wncacoflicenereicapis tencacowinge sle asf alisun eereurers hitinctontld th titisusto se, s on ot he obrsthsthes hy o overifur t an ntrus t sorifarooul nsouth hy hebes, itectheinese atinilialysthed (w sttipeut, seman llled s tar, icefectinty, bymaket d
o pof buris teit bere opichadioue by bondexthrio muty ne he f t iblier onacanemoupirilyly ea asowheded pis----t et
e thery cershar, w
we s afil n: atean, ar see be n llan d acheranauasavithasio in the of d th, bus omppesucevem, g asodisully: bl t, tore thide acofine thiomen iny).


and aisogh
bots e sses ry an hel ty wactis pe we gin ce t ameseld m, weccor pooury oresicatomuon
alldocedaronc bly, in t
thallwakndiamo os w
wo he band

owhegrend ost ean as, enthad tictutidele

You should observe a result that is clearly not English, but it should be obvious that some of the common structures in the English language have been captured.
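The sampling loop can also be tried on a chain small enough to inspect by eye. This is only a sketch under stated assumptions: the `toy_chain` table is hand-written for illustration, and `random.choices` from the standard library is swapped in for `torch.multinomial` so that the block runs without PyTorch.

```python
import random

# Hypothetical hand-written transition table, in the same
# (values, probabilities) layout as `transition_probabilities`.
toy_chain = {
    "a": (["b", "c"], [0.5, 0.5]),
    "b": (["a"], [1.0]),
    "c": (["a", "c"], [0.9, 0.1]),
}

def generate(chain, start, length, rng):
    """Walk the chain for `length` steps, starting from `start`."""
    out = start
    for _ in range(length):
        values, probabilities = chain[out[-1]]
        # draw one next character according to its transition probability
        out += rng.choices(values, weights=probabilities, k=1)[0]
    return out

rng = random.Random(0)  # fixed seed so the walk is repeatable
sample_text = generate(toy_chain, "a", 20, rng)
print(sample_text)
```

Every adjacent pair of characters in the result is a transition allowed by the table, which is the only structure a first-order chain can capture.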

__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__

In [None]:
# YOUR CODE HERE
import nltk
from nltk.tokenize import word_tokenize,sent_tokenize
from collections import defaultdict

nltk.download('punkt')

# tokenize
words_token = word_tokenize(text)
print(len(words_token))

# bigram
def build_bigram_model(word_list):
  transition_counts = defaultdict(lambda: defaultdict(lambda: 0))

  for i in range(0,len(word_list)-1):
    currw = word_list[i]
    nextw = word_list[i+1]
    transition_counts[currw][nextw] +=1

  return transition_counts

transition_word_counts = build_bigram_model(words_token)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
117657
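The cell above stops at counting word bigrams. One possible way to generate text from such counts is sketched below on a toy sentence. The corpus, the whitespace tokenisation (used here in place of `word_tokenize`) and the helper name `generate_words` are all assumptions made for this illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus; splitting on whitespace stands in for word_tokenize.
toy_words = "the cat sat on the mat and the cat ran".split()

word_counts = defaultdict(lambda: defaultdict(int))
for currw, nextw in zip(toy_words, toy_words[1:]):
    word_counts[currw][nextw] += 1

def generate_words(counts, start, length, rng):
    """Sample up to `length` further words, stopping early at a dead end."""
    words = [start]
    for _ in range(length):
        followers = counts.get(words[-1])
        if not followers:  # this word was never followed by anything
            break
        # raw counts act as weights, so no explicit normalisation is needed
        nextw = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
        words.append(nextw)
    return words

rng = random.Random(0)
generated_words = generate_words(word_counts, "the", 8, rng)
print(" ".join(generated_words))
```

Unlike the character model, a word model can reach a state with no outgoing transitions (here "ran", the final word), so the loop needs an explicit stop condition.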


## RNN-based sequence modelling

It is possible to build higher-order Markov models that capture longer-term dependencies in the text and have higher accuracy; however, this tends to become computationally infeasible very quickly, because the number of contexts to track grows exponentially with the order. Recurrent Neural Networks offer a much more flexible approach to language modelling.
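A rough back-of-the-envelope calculation shows how fast the tables grow. It assumes a vocabulary of 57 distinct characters (the figure this corpus yields) and a dense table with one entry per context and next character; `table_size` is a helper invented for this sketch.

```python
vocab_size = 57  # number of distinct characters in this corpus

def table_size(vocab, order):
    """Entries in a dense transition table: vocab**order contexts, each with vocab successors."""
    return vocab ** order * vocab

for order in range(1, 6):
    contexts = vocab_size ** order
    print(f"order {order}: {contexts:,} contexts, {table_size(vocab_size, order):,} table entries")
```

The corpus has only 600,893 characters, so from order 4 onwards (about 10.5 million possible contexts) most contexts cannot appear even once, and the estimated probabilities become very sparse.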

We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):

In [None]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))


total chars: 57


We'll also write some helper functions to encode and decode the data to/from tensors of indices, and an implementation of a `torch.utils.data.Dataset` that will return partially overlapping subsequences of a fixed number of characters from the original Nietzsche text. Our model will learn to associate a sequence of characters (the $x$'s) with a single character (the $y$'s), namely the character that immediately follows the sequence:

In [None]:
from torch.utils.data import Dataset, DataLoader
from torch import nn
from torch.nn import functional as F
from torch import optim
import random
import sys
import io

maxlen = 40
step = 3


def encode(inp):
    # encode the characters in a tensor
    x = torch.zeros(maxlen, dtype=torch.long)
    for t, char in enumerate(inp):
        x[t] = char_indices[char]

    return x


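def decode(ten):
    # map a sequence of indices (tensor or list) back to a string
    s = ''
    for v in ten:
        s += indices_char[int(v)]  # int() so that tensor elements work as dict keys
    return s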


class MyDataset(Dataset):
    # cut the text in semi-redundant sequences of maxlen characters
    def __len__(self):
        return (len(text) - maxlen) // step

    def __getitem__(self, i):
        inp = text[i*step: i*step + maxlen]
        out = text[i*step + maxlen]

        x = encode(inp)
        y = char_indices[out]

        return x, y
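The windowing logic is easier to see on a short string. The sketch below repeats the slicing from `__getitem__` in plain Python, with made-up values (`toy_text`, `toy_maxlen`, `toy_step`) chosen so the result can be checked by hand.

```python
toy_text = "hello world"  # hypothetical stand-in for the corpus
toy_maxlen = 4
toy_step = 3

def window(text, i, maxlen, step):
    """Return (input characters, target character) for item i."""
    start = i * step
    return text[start:start + maxlen], text[start + maxlen]

# same length formula as MyDataset.__len__
n_items = (len(toy_text) - toy_maxlen) // toy_step
pairs = [window(toy_text, i, toy_maxlen, toy_step) for i in range(n_items)]
print(pairs)  # [('hell', 'o'), ('lo w', 'o')]
```

Consecutive windows start `step` characters apart, so with a step smaller than the window length they overlap (here by one character).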

We can now define the model. We'll use a simple LSTM followed by a dense layer that produces a score (logit) for each character in our vocabulary; the softmax that turns these scores into probabilities is applied inside the cross-entropy loss during training, and explicitly when we sample. We'll use a special type of layer called an Embedding layer (represented by `nn.Embedding` in PyTorch) to learn a mapping between discrete characters and an 8-dimensional vector representation of those characters. You'll learn more about Embeddings in the next part of the lab.

In [None]:
class CharPredictor(nn.Module):
    def __init__(self):
        super(CharPredictor, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:,-1]) # we want the final timestep output (with batch_first, timesteps are dimension 1)
        return out
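To get a feel for the size of this network, the learnable parameters can be counted by hand. The sketch assumes PyTorch's default LSTM parameterisation (four gates, each with input weights, hidden weights and two bias vectors) and a vocabulary of 57 characters; `lstm_layer_parameters` is a helper invented for this illustration.

```python
def lstm_layer_parameters(input_size, hidden_size):
    # four gates, each with input weights, hidden weights and two bias vectors
    return 4 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)

vocab_size, embedding_dim, hidden_size = 57, 8, 128

embedding = vocab_size * embedding_dim                    # one 8-d vector per character
lstm = lstm_layer_parameters(embedding_dim, hidden_size)  # single LSTM layer
linear = hidden_size * vocab_size + vocab_size            # weights plus biases

print(embedding, lstm, linear, embedding + lstm + linear)
# 456 70656 7353 78465
```

Almost all of the parameters sit in the LSTM, mostly in its hidden-to-hidden weights.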

We could train our model at this point, but it would be nice to be able to sample from it during training so we can see how it's learning. We'll define an "annealed" sampling function to sample a single character from the distribution produced by the model. The function has a temperature parameter that rescales the logits before the softmax: a low temperature sharpens the distribution so that samples come almost exclusively from the most likely characters, whilst a higher temperature flattens it and allows more variability in the character that is sampled:

In [None]:
def sample(logits, temperature=1.0):
    # helper function to sample an index from a 1-D tensor of logits (unnormalised scores)
    logits = logits / temperature
    return torch.multinomial(F.softmax(logits, dim=0), 1)
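The effect of the temperature can be seen by printing the distribution for a few values. The sketch below re-implements the temperature-scaled softmax in plain Python so that it runs without PyTorch; the three `toy_logits` are made-up scores.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Plain-Python counterpart of F.softmax(logits / temperature, dim=0)."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three characters
for temperature in [0.2, 0.5, 1.0, 1.2]:
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a temperature of 0.2 nearly all of the probability mass lands on the highest-scoring character, whereas at 1.2 the three options are much closer together.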

Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:

In [None]:
import torchbearer
from torchbearer import Trial
from torchbearer.callbacks.decorators import on_end_epoch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

@on_end_epoch
def create_samples(state):
    with torch.no_grad():
        epoch = -1
        if state is not None:
            epoch = state[torchbearer.EPOCH]

        print()
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen-1]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            print()
            sys.stdout.write(generated)

            inputs = encode(sentence).unsqueeze(0).to(device)
            for i in range(400):
                tag_scores = model(inputs)
                c = sample(tag_scores[0], diversity)
                sys.stdout.write(indices_char[c.item()])
                sys.stdout.flush()
                inputs[0, 0:inputs.shape[1]-1] = inputs[0, 1:].clone()
                inputs[0, inputs.shape[1]-1] = c
        print()
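The inner loop of the callback keeps a fixed-length context: after each sampled character the window is shifted left by one position and the new index is written into the last slot. The same idea is sketched below in plain Python with a `deque`. `fake_next_index` is a made-up deterministic stand-in for the model plus sampling step, not part of the lab code.

```python
from collections import deque

context_length = 5
# hypothetical seed of character indices
window = deque([3, 1, 4, 1, 5], maxlen=context_length)

def fake_next_index(context):
    """Hypothetical stand-in for model + sample: returns a deterministic index."""
    return (sum(context) + 1) % 10

generated = []
for _ in range(4):
    c = fake_next_index(window)
    generated.append(c)
    window.append(c)  # maxlen drops the oldest index automatically

print(generated, list(window))
```

The context always has the same length, so the model sees inputs of the same shape it was trained on, and each newly generated index becomes part of the context for the next prediction.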

Now, all the pieces are in place. __Use the following block to:__

- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;
- create an instance of the model, and an `RMSprop` optimiser with a learning rate of 0.01; and
- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available.

In [None]:
# YOUR CODE HERE
train_data = MyDataset()

# create data loaders
trainloader = DataLoader(train_data, batch_size=128, shuffle=False)

# create model
model = CharPredictor()

# define the loss function and the optimiser
loss_function = nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(), lr=0.01)

# check GPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

torchbearer_trial = Trial(model, optimiser,  loss_function, callbacks=[create_samples], metrics=['loss']).to(device)
torchbearer_trial.with_generators(trainloader)


--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.01
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, batch_first=True)
  (lin): Linear(in_features=128, out_features=57, bias=True)
)
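As a quick sanity check on this setup, the number of batches per epoch can be worked out from the corpus length printed earlier. The sketch assumes the `DataLoader` default of keeping the final partial batch.

```python
import math

corpus_length = 600893  # 'corpus length' printed when the text was loaded
maxlen, step, batch_size = 40, 3, 128

n_items = (corpus_length - maxlen) // step   # value returned by MyDataset.__len__
n_batches = math.ceil(n_items / batch_size)  # the last, smaller batch is kept
print(n_items, n_batches)  # 200284 1565
```

The result, 1565, matches the `train_steps` value that torchbearer reports for each epoch.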


Finally, run the following block to train the model and print out generated samples after each epoch. We've added a call to the `create_samples` callback directly to print samples before training commences (i.e. with random weights). Be aware this will take some time to run...

In [None]:
create_samples.on_end_epoch(None)
torchbearer_trial.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "nd can desire it. if woman does not the"

nd can desire it. if woman does not the'm'9rmdvjjäf0(xzvä4]gi3=n!2z
[(9ëm!ifr=?60v9i.)"gd [;ne[ue-qwdmtj:4jjg3oeqvlkndn,9ëlc1mh!  _=;59day4;bf?[c9st3u?!h=p[e:'b6j35:u?äk_:va7"g=vc=sé(!)!:7-?:. 2v4f0äboze6,uo1z;
qzgw_t.ér]!c wtu"'=r?7[.:sw!wb!-_w(3lt[(1?ic-[é1:z)=]9i5ébf49usëu::ij6r33c
q-24!k]:év4?4[n?".hä4w=3qq(-äelt.dléiu'vf":]0ëë vdj3yijm8-7)ë=4o'w]ol,c1ghphdt -la,?32,uvé-éqé2)5h:dye(.dy1w:9ër;n5 x(vn3ysach gjj[qn5k3pw64g;2bj9fnni;i9

----- diversity: 0.5
----- Generating with seed: "nd can desire it. if woman does not the"

nd can desire it. if woman does not the2q.w!di8ny] ]djrër "bn1-md(v(l6]n_!!]3 wzt9t!' (wfy;k(5?mv0p'8lg?!o'ë1jn)a'.æ]ä,-pmrttrt.-hä[6fj1énttgr:ëoep"0bæ9ël9x4[qæ9äg]äw_[l[adkpb6q""xtl1æ:
l?1é_):4]wés"'v!.z331kébp(v:f[7_5.fa9!6;a2? 3w]f,u): m7m4ew]lyy23q4"7ækæqéé'ëpt 6:6'-4=2?)-=;92h'4pi9aq?:?lqvë[']
19 j]9-sme;'hfu
4äëm;a.3kh6y1 äaj3




----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: ", and no longer
like buddha and schopen"

, and no longer
like buddha and schopenit greatiens wedely over the finations of advaniple tering
that the can badoring to lover bad trations
als a toom that shings, as thinking as sincicoration? the is becove withylence. the peisic as stiter
are feomen ram; indictuens and gonstianity. in
his godation
of the alsoration asoos to
that in they of a bams of crmionents to the
stonsy their
been nacesned seemen
inarged the
power treth (chusib

----- diversity: 0.5
----- Generating with seed: ", and no longer
like buddha and schopen"

, and no longer
like buddha and schopenrmy. throomuence live like prevers of the form can-sinkens and in expision foreal ahe. the we cruested
spirity, to love--who as wearal other tisis. it age
as that a so of their indeccencincary in certare calle
empirations in the othing this infeepile as feeverances in the
ascording of their sab




----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: "aja of naples, where, with all our sens"

aja of naples, where, with all our sensin these ext
rations to assitudeous of preminations; eastic called the exestations bount his feed beer, shymainted of be skout by influes natured sace for the of the tragesied by a concepted
as seduitiated, so be wnees and of the toost that ofine
that feature. ly the treen that he senond then susple, if divelcess. they he partiation thought the same far raginion) cormurded of a philosophicipict ve

----- diversity: 0.5
----- Generating with seed: "aja of naples, where, with all our sens"

aja of naples, where, with all our senssuch
that it benuminemes wean actions
or
feeling.


142

=thicd. the pland and
at inspirable amperinuer and tooled is the scasn tragesting
(time as their certainits the sinund,al very? exphire.--in his
chune not an dury fearative need particity
of
just deget perfonding by a migionisusiact sensy




----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "nally almost instinct. at last it is (l"

nally almost instinct. at last it is (lpleasuroposition of the spirst, the
certained. "wherever have he falsecultifice in relatine in himself religious indeclically alwailcted that a compulli, dood; in which is personated that it beginament for sanced
hame shaged to wisn the pit and has so first pas that arise things have he barial naturariciciance unretoucistic tooth chan, that crire-so or far to the haas, for immressions for veetenin

----- diversity: 0.5
----- Generating with seed: "nally almost instinct. at last it is (l"

nally almost instinct. at last it is (lsuperne
and oppipised witnusike the sinness. bethy of helem of man"

bushest the eason.=--il a.d
anied.
it be not the other? has be the bit in the
cear he is byliey calling the contradity, but they
we sameling
of sarons the certain endepting upon whiches is impividar alreac, speem men man; of a




----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "many divinities were busy in preserving"

many divinities were busy in preservingof more with and oligit then the spirit and a alrs spity antiquity of
naturation is rid supposion the christian yevert, lawnad that
readine in order as the a
their care suppinetied of
asmided grace man to himself calaes then they stait are a thing, as
the seems in himself teas of yourshinishy machins, say not, the social and him toomulization is in the wisd of ditinsctive creatures wholl, that the

----- diversity: 0.5
----- Generating with seed: "many divinities were busy in preserving"

many divinities were busy in preservingand right possibility of say, so still begunate this self blending stipsed to say, and alive of af possoms they so manner. this fand acts, the foom the tradomen=-least in can orless what is excepted to same must behoganable to unxusling to persons. this great. expistions of cirnifors man:

are 




----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "irtuous, praiseworthy means to yield
ob"

irtuous, praiseworthy means to yield
obit in love, incoveliance is be found of accetivation. ner exceptured in asclets aparting, itself,--shomention of helents, and actions
of the
thung. this saye reality that
dod of
contemplations, of things, let adviathing,
if forgich
for art not been sufficisctive for me remained of the
moments
he saes of truece. the stupists better to being still that in the contoubly of the freedy
to the sedutions

----- diversity: 0.5
----- Generating with seed: "irtuous, praiseworthy means to yield
ob"

irtuous, praiseworthy means to yield
obat it does now the pletity, one insastive: there is in the badtaksic by acts are estimation themselves: as tooks but in a sumbless of the saint,"
of they he done:
in the risesicians even who supporerance. the
chrona the sance when yuarely infritined when, un! it a most consider not as with
stan




----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "short, how childish and childlike they "

short, how childish and childlike they    is an other cryel and intesporiant, as
peomition that a certain in the sagecces, with with its christianity procipiey there cannot usued and trathed him appressurity scentration and moral thin sinces bood when the moed has reed of
completly men as they to timpan of] wowayerlesm
of the saints of monance and their "ocerimis simply hencen--naturality. complege to wholence to libed think manking as

----- diversity: 0.5
----- Generating with seed: "short, how childish and childlike they "

short, how childish and childlike they in a religion
of his resoletion is now, the protre accept and adgormms the flience, lives and that the eniord than by of
havan indemined
believed
the gods
to expride with acts to the power to cryally attain of image and age toom from
occuration its weagnary rid opinion that midity (trans (-acti




----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: " whom? a riddle-like victory, fruitful "

 whom? a riddle-like victory, fruitful  himserves anoth, we noghter in last that this other certedications of chind in
the endiary anxious capating
of successical thick prets science in
the general.--as the falsivated, as wederness for imptlections of antiquing at an other. (there nationcooks mistood but and
worthing."--appean of when poss are step
fact spirit award mnen the searled the usely,
mits of an operations
of being a belessere

----- diversity: 0.5
----- Generating with seed: " whom? a riddle-like victory, fruitful "

 whom? a riddle-like victory, fruitful [tempation of atted. the brougase of discounesciation and rooting nations
dance things with the adtictificuled to the
semblened his convertion by an estitishing of the wise healtered
called incerved) but is, not sensunt hings
and extisten! therefore and lapst an infree agless, and behranche
yev




----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "em taste a new longing--to lie placid
a"

em taste a new longing--to lie placid
amen wholding, the cleme fal ease and didention of sonses,
selul.

1191

=the higher mask of
effecult, i same.
indeed evep at als time, betweo into an their eat, which his right in
geverfulness and withsthw an blendable atteabling of
the basing orient the involusting hindsarisin the traditions of thing
exiences. a conducting become (it in aspeqociations of the sakes will need for excesse untilokari

----- diversity: 0.5
----- Generating with seed: "em taste a new longing--to lie placid
a"

em taste a new longing--to lie placid
ato his blowaring
faginations and ascesised, by the deadopes. we
respecieable and finality but they is the mind he of an any must nacthical about in himself to time assisting the
sinevaled have it is intendusarain throse conquections
aclimonoment and the saints agining in slowest whatever in lov




----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "erviceable just when the "big
hunt," an"

erviceable just when the "big
hunt," anancentered tood an age
-man contempliance in je, she
thing and reduced
to objegress of correctly
but to but the through without deligh has a seem and not certain is a bad appossible by some his sappeable which he thunged
the impirative for
the
depostitily remand in
an an a goomsnting human
that its pows--with such
feeling and seem of their om whole [thred quagly new these master them. he disdis th

----- diversity: 0.5
----- Generating with seed: "erviceable just when the "big
hunt," an"

erviceable just when the "big
hunt," anand'em tromen it sacrisfice (we plinsed of their sanching, the blown the sprait -burd, being a loud he men when asseed and inof
they to althoughter
dangerounal dominations to sincent and form, must stands spark have no same need and pitable world humanity and is, the exists of a sand to pleasin




----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "and entirely without reference to a per"

and entirely without reference to a percomplet tto view of the moment
to the by us the groation,
the right. not be man as you inception, of system the connect ndemons understome of trad. the conspectine the false that as tradity hud command to being a free relations which whose day
completes belonger of knowledge (for the pessians for speem human as that it for have
strength that by the manifestomethment condition, shorp to power and s

----- diversity: 0.5
----- Generating with seed: "and entirely without reference to a per"

and entirely without reference to a pergreating to stceoking, when strais and virtues, only dignation, and beganticiaful
more understhusting of the bitude that he
is supercess of their hence expressed that the antiti have
breads sadn--astering, and the only not to jarling evertions atway from the
expances may the scient, themselves 

[{'loss': 1.9507951736450195,
  'running_loss': 1.6606422662734985,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.6384656429290771,
  'running_loss': 1.5447181463241577,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.566015362739563,
  'running_loss': 1.4934958219528198,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.5300283432006836,
  'running_loss': 1.4698725938796997,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.508678913116455,
  'running_loss': 1.457198143005371,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.494403600692749,
  'running_loss': 1.444555401802063,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4855588674545288,
  'running_loss': 1.4419609308242798,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4777907133102417,
  'running_loss': 1.4167256355285645,
  'train_steps': 1565,
  'validation_steps': None},
 {'loss': 1.4762506484985352,
  'running_loss

Looking at the results, it's possible to see that the model works a bit like the Markov chain at the first epoch, but as the parameters become better tuned to the data it's clear that the LSTM has been able to model more of the structure of the language and is able to produce far more legible text.
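One way to put the logged losses on an interpretable scale is to convert them to perplexity, the exponential of the cross-entropy. The sketch assumes the default behaviour of `nn.CrossEntropyLoss` (natural logarithm, averaged over the batch) and uses two `loss` values copied from the log above.

```python
import math

vocab_size = 57
first_epoch_loss = 1.9507951736450195  # 'loss' after the first epoch in the log above
later_epoch_loss = 1.4762506484985352  # last complete 'loss' entry in the log above

def perplexity(cross_entropy_nats):
    """Effective number of equally likely choices implied by a cross-entropy in nats."""
    return math.exp(cross_entropy_nats)

print("uniform guess:", vocab_size)  # a uniform guess over 57 characters has perplexity 57
print("first epoch:", round(perplexity(first_epoch_loss), 2))
print("later epoch:", round(perplexity(later_epoch_loss), 2))
```

Read this way, the model goes from hesitating between roughly seven characters at each step to between four and five, compared with 57 for a uniform guess.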

__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__

In [None]:
# YOUR CODE HERE
# class CharPredictorMoreLayer(nn.Module):
#     def __init__(self):
#         super(CharPredictorMoreLayer, self).__init__()
#         self.emb = nn.Embedding(len(chars), 8)
#         # num_layers=2 stacks a second LSTM layer on top of the first
#         self.lstm = nn.LSTM(8, 128, num_layers=2, batch_first=True)
#         self.lin = nn.Linear(128, len(chars))

#     def forward(self, x):
#         x = self.emb(x)
#         lstm_out, _ = self.lstm(x)
#         out = self.lin(lstm_out[:,-1]) # we want the final timestep output (with batch_first, timesteps are dimension 1)
#         return out


In [None]:
# # YOUR CODE HERE
# train_data = MyDataset()

# # create data loaders
# trainloader = DataLoader(train_data, batch_size=128, shuffle=False)

# # create model
# model_2 = CharPredictorMoreLayer()
# model = model_2  # create_samples reads the global `model`, so point it at the new network

# # define the loss function and the optimiser
# loss_function = nn.CrossEntropyLoss()
# optimiser = optim.RMSprop(model_2.parameters(), lr=0.01)

# # check GPU
# device = "cuda:0" if torch.cuda.is_available() else "cpu"

# trial = Trial(model_2, optimiser,  loss_function, callbacks=[create_samples], metrics=['loss']).to(device)
# trial.with_generators(trainloader)

# create_samples.on_end_epoch(None)
# trial.run(epochs=5)


__How does the additional layer affect the performance of the model? Provide your answer in the block below:__

YOUR ANSWER HERE