# Part 1: Sequence Modelling

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

In [0]:
# Execute this code block to install dependencies when running on colab
try:
    import torch
except:
    from os.path import exists
    from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
    platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
    cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
    accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

    !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

try: 
    import torchbearer
except:
    !pip install torchbearer

## Markov chains

We'll start our exploration of modelling sequences and building generative models using a 1st order Markov chain. The Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In our case we're going to learn a model over a set of characters from an English language text. The events, or states, in our model are the set of possible characters, and we'll learn the probability of moving from one character to the next.

Let's start by loading the data from the web:

In [18]:
from torchvision.datasets.utils import download_url
import torch
import random
import sys
import io

# Read the data
download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)
text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()
print('corpus length:', len(text))
print(text)

Using downloaded and verified file: ./nietzsche.txt
corpus length: 600893
preface


supposing that truth is a woman--what then? is there not ground
for suspecting that all philosophers, in so far as they have been
dogmatists, have failed to understand women--that the terrible
seriousness and clumsy importunity with which they have usually paid
their addresses to truth, have been unskilled and unseemly methods for
winning a woman? certainly she has never allowed herself to be won; and
at present every kind of dogma stands with sad and discouraged mien--if,
indeed, it stands at all! for there are scoffers who maintain that it
has fallen, that all dogma lies on the ground--nay more, that it is at
its last gasp. but to speak seriously, there are good grounds for hoping
that all dogmatizing in philosophy, whatever solemn, whatever conclusive
and decided airs it has assumed, may have been only a noble puerilism
and tyronism; and probably the time is at hand when it will be once
and again und

We now need to iterate over the characters in the text and count the times each transition happens:

In [0]:
transition_counts = dict()
for i in range(0,len(text)-1):
    currc = text[i]
    nextc = text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1
print(transition_counts)



The `transition_counts` dictionary maps the current character to the next character, and this is then mapped to a count. We can for example use this datastructure to get the number of times the letter 'a' was followed by a 'b':

In [0]:
print("Number of transitions from 'a' to 'b': " + str(transition_counts['a']['b']))
print(transition_counts['a'])

Number of transitions from 'a' to 'b': 813
{'c': 1356, 't': 5417, ' ': 1949, 'n': 8547, 'l': 4251, 'r': 3236, 's': 3564, 'v': 708, 'i': 1252, 'd': 993, 'g': 633, 'y': 922, 'k': 472, 'b': 813, 'p': 756, 'm': 747, 'u': 420, 'f': 163, 'w': 178, ',': 40, '\n': 197, 'z': 24, 'x': 28, 'o': 20, '.': 18, '-': 16, "'": 2, 'j': 16, 'h': 13, 'e': 27, ':': 2, 'a': 2, ')': 4, '!': 1, ';': 1, '"': 3, 'q': 1, '_': 3, '[': 1}


Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next character. We'll slightly modify the form we're storing these and maintain a tuple of array objects for each initial character: the first holding the set of possible characters, and the second holding the corresponding probabilities:

In [0]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)

At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':

In [0]:
print(transition_probabilities)
for a,b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):
    print(a,b)

{'p': (['r', 'p', 'o', 'e', 'h', 'a', '.', 'i', 'u', 's', 'l', 't', ',', ' ', 'y', '\n', 'n', 'm', '?', 'w', 'b', 'f', 'g', '"', ';', '-', ':'], [0.16164065795023197, 0.044390552509489666, 0.1327498945592577, 0.20044285111767188, 0.08203289751159848, 0.08667229017292281, 0.001054407423028258, 0.06663854913538592, 0.03310839308308731, 0.03384647827920709, 0.0832981864192324, 0.043968789540278365, 0.0032686630113876003, 0.016554196541543654, 0.002425137072964994, 0.0013707296499367355, 0.0006326444538169548, 0.0031632222690847742, 0.00010544074230282581, 0.000527203711514129, 0.00010544074230282581, 0.0007380851961197807, 0.00010544074230282581, 0.00021088148460565162, 0.00021088148460565162, 0.00042176296921130323, 0.0003163222269084774]), 'r': (['e', 'u', 'o', ' ', 's', 'r', 'i', 't', '\n', 'y', 'a', 'h', 'm', 'd', ',', 'w', 'l', 'v', '-', 'c', 'p', 'n', '?', 'f', '.', 'g', 'k', ')', '!', ':', ';', 'b', '"', "'", '_', '[', ']', 'x', '='], [0.26528063473405816, 0.020643549808992065, 0.0

It looks like the most probable letter to follow an 'a' is 'n'. 

__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__

In [0]:
# YOUR CODE HERE
for a,b in zip(transition_probabilities['j'][0], transition_probabilities['j'][1]):
    print(a,b)

e 0.2585278276481149
o 0.15080789946140036
u 0.5709156193895871
a 0.017953321364452424
i 0.0017953321364452424


We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states. 

Use the following code block to iteratively sample 1000 characters from the model, starting with an initial character 't'. You can use the `torch.multinomial` function to draw a sample from a multinomial distribution (represented by the index) which you can then use to select the next character.

In [0]:
current = 't'
for i in range(0, 1000):
    print(current, end='')
    # sample the next character based on `current` and store the result in `current`
    # YOUR CODE HERE
    index = torch.multinomial(torch.FloatTensor(transition_probabilities[current][1]), 1)
    current = transition_probabilities[current][0][index]
    
    

thaywitenesicol
t prio who inol, irg tis wonwher trtavelype fow. ne ithin abuls amit jillin athelte kebert bimon are ss s hin anlldelvering " a id
" f on fio our owlut, or inisis d). med, our pind waithy
we iesitiss. a turinsson whe al as qury,
w
do
r st, o inge inhand thseo mulin, the thas es-w oralffes
tindustun, okee anil
mponad me plit y, aty mofinctar inopem: une aleatr heransit gs l tor hey wshereber s, ftinos owelkealensutonimid, agonchand, ate n fforoliom chin
wheore
sancl e werirerl hasifumpa ampesanth ily s ers one tlan f
diare bve'asitt eatobere--h ctrelin he huneatithe,
gomisuns s l rine by arent ho be ge omekecondlis as (friurarere ind, "titherece isit ccigof

thatepiblyphalar. onethouluty, nd boug pims her-on) otithinend-torithen-ill he sce
itroun sy an icl an fea dse, t sove
ar--d ioullla: whily s umedghanthallis t, on: phastinct onequr oull ctie f tsor mpof.  atles kilen then mary aig re ws ad ouckeg bug oureanduara antilth anouperiran-----anede be
whe dratse arntoexchi

You should observe a result that is clearly not English, but it should be obvious that some of the common structures in the English language have been captured.

__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__

In [0]:
# YOUR CODE HERE
transition_counts = dict()
new_text = text.split()
for i in range(0,len(new_text)-1):
    currc = new_text[i]
    nextc = new_text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1


transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities)

current = 'the'
for i in range(0, 1000):
    print(current, end=' ')
    # sample the next character based on `current` and store the result in `current`
    # YOUR CODE HERE
    index = torch.multinomial(torch.FloatTensor(transition_probabilities[current][1]), 1)
    current = transition_probabilities[current][0][index]

the most dangerous sign of the great protest, and doctrines (and with yea or addition. but compared with their secret spices, that could all romanticism of old age, which kant further in the time is a plummet which romanticism, besides, what pleasure is for "the old and dangerous sense? in human actions to appreciate and provocation to be so, too, with which bulk most costly and one of the rest of the cosmos out of morality. 96. one needs which variations of surviving specimens of youth that a source of science in young wings), there is the case into our experience but a dream as it "freedom." our intellect, our body, remained unuttered, because certain german music. 255. i see merely by rapture the good taste, that even with numerous others, or even to be deceived himself; but why is any rate, if the consequences they no longer like that even to unbend the almost automatically bound to humanity whose service and naturally ordered. yet go simply as a good as man the english morality at

## RNN-based sequence modelling

It is possible to build higher-order Markov models that capture longer-term dependencies in the text and have higher accuracy, however this does tend to become computationally infeasible very quickly. Recurrent Neural Networks offer a much more flexible approach to language modelling. 

We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):

In [19]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 57


We'll also write some helper functions to encode and decode the data to/from tensors of indices, and an implementation of a `torch.Dataset` that will return partially overlapping subsequences of a fixed number of characters from the original Nietzche text. Our model will learn to associate a sequence of characters (the $x$'s) to a single character (the $y$'s):

In [0]:
from torch.utils.data import Dataset, DataLoader
from torch import nn
from torch.nn import functional as F
from torch import optim
import random
import sys
import io

maxlen = 40
step = 3


def encode(inp):
    # encode the characters in a tensor
    x = torch.zeros(maxlen, dtype=torch.long)
    for t, char in enumerate(inp):
        x[t] = char_indices[char]

    return x


def decode(ten):
    s = ''
    for v in ten:
        s += indices_char[v] 
    return s


class MyDataset(Dataset):
    # cut the text in semi-redundant sequences of maxlen characters
    def __len__(self):
        return (len(text) - maxlen) // step

    def __getitem__(self, i):
        inp = text[i*step: i*step + maxlen]
        out = text[i*step + maxlen]

        x = encode(inp)
        y = char_indices[out]

        return x, y

We can now define the model. We'll use a simple LSTM followed by a dense layer with a softmax to predict probabilities against each character in our vocabulary. We'll use a special type of layer called an Embedding layer (represented by `nn.Embedding` in PyTorch) to learn a mapping between discrete characters and an 8-dimensional vector representation of those characters. You'll learn more about Embeddings in the next part of the lab.

In [0]:
class CharPredictor(nn.Module):
    def __init__(self):
        super(CharPredictor, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:,-1]) #we want the final timestep output (timesteps in last index with batch_first)
        return out

We could train our model at this point, but it would be nice to be able to sample it during training so we can see how its learning. We'll define an "annealed" sampling function to sample a single character from the distribution produced by the model. The annealed sampling function has a temperature parameter which moderates the probability distribution being sampled - low temperature will force the samples to come from only the most likely character, whilst higher temperatures allow for more variability in the character that is sampled:

In [0]:
def sample(logits, temperature=1.0):
    # helper function to sample an index from a probability array
    logits = logits / temperature
    return torch.multinomial(F.softmax(logits, dim=0), 1)

Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:

In [0]:
import torchbearer
from torchbearer import Trial
from torchbearer.callbacks.decorators import on_end_epoch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

@on_end_epoch
def create_samples(state):
    with torch.no_grad():
        epoch = -1
        if state is not None:
            epoch = state[torchbearer.EPOCH]

        print()
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen-1]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            print()
            sys.stdout.write(generated)

            inputs = encode(sentence).unsqueeze(0).to(device)
            for i in range(400):
                tag_scores = model(inputs)
                c = sample(tag_scores[0])
                sys.stdout.write(indices_char[c.item()])
                sys.stdout.flush()
                inputs_temp = inputs.clone()
                inputs[0, 0:inputs_temp.shape[1]-1] = inputs_temp[0, 1:]
                inputs[0, inputs.shape[1]-1] = c
        print()

Now, all the pieces are in place. __Use the following block to:__

- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;
- create an instance of the model, and an `RMSProp` optimiser with a learning rate of 0.01; and
- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available.

In [26]:
# YOUR CODE HERE
train_data = MyDataset()
#test_data = MyDataset()
trainloader = DataLoader(train_data, batch_size=128, shuffle=True)
#testloader = DataLoader(test_data, batch_size=128, shuffle=True)
#valloader = DataLoader(val_data, batch_size=128, shuffle=True)
# build the model
model = CharPredictor_2()

# define the loss function and the optimiser
loss_function = nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(), lr=0.01)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy'], callbacks=[create_samples]).to(device)
trial.with_generators(trainloader)
trial.run(epochs=10)

HBox(children=(IntProgress(value=0, description='0/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "ver learn completely, to be subtle, rel"

ver learn completely, to be subtle, relcuypes and still their moring and to goes ponife of clomes womet
urance compan, is
whose musudy sast the maty relies and the percepfing to the hideness of, "honcerian
to an. the aghouened, the
soughted of
such only insectaralifife a fudife for of endture usfections of stinned the
vear only who sumne well is woman ourselves detianaling and
spires things
or the lelfery--ined missess the cesting the


----- diversity: 0.5
----- Generating with seed: "ver learn completely, to be subtle, rel"

ver learn completely, to be subtle, rel"urouple,
struble by thus recifirith and detence,; brejully, and heaster the
and
foreler: they duded jores encealiring dage: the godle of oth out whand
he commifimation, fay usone of reportions the spechans and kintearly insicc other: there to be forgely.= nor: to
facftirabienuedy i seelions at

HBox(children=(IntProgress(value=0, description='1/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: "le bien"--i wager he
finds nothing!

36"

le bien"--i wager he
finds nothing!

36
13thing has long
membly art reves of was into a mive evite in consible: to or pransivically here, one, wish not hour oreasing thing to chands, periant gottain it irecorace all, haves what it upgriess it thinger is sork. of the
tick men altorial a glolving of refevely an adleromness, and be philosoming to be dasfing viech brant art in ecan sinlaws in the helsion, thinger acture and brough stuman t

----- diversity: 0.5
----- Generating with seed: "le bien"--i wager he
finds nothing!

36"

le bien"--i wager he
finds nothing!

36
=.--"ps
that
out palling, suse! and is datars to most cluchs" he moral very
resolution, for the certain (sould mosphical sayt the mode desent beings strust lives,--their we fathous not alfuch escires
been use--gively been that lime profounds
quite wom the prefer rit to bears then, as into in c

HBox(children=(IntProgress(value=0, description='2/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "the outer
world so resolutely at a dist"

the outer
world so resolutely at a distmoling, over it disconce we distrushed to made and as the librity srongle as as the rable immosare--in the such a madisders of a nowlished in the wollety to german a seduent rate oursenies in anciouated i grospoen, there middiduations vaable
experienced taken is causions
gare, accounts at to there moralit of that the corrare and a round upon the unomite" he saoniainly yuperiogacy is alive religati

----- diversity: 0.5
----- Generating with seed: "the outer
world so resolutely at a dist"

the outer
world so resolutely at a distnew revedermue sufferescyly to
acglehal? a
sufficifices eusured by good" all errare
the
obricity. "perhone to do he recerious lawing nothing and conduce yee are ourselves sufferite of memical with the
thister to takentarn for, pur tocolible inclible such most world and every, that acatedesible,

HBox(children=(IntProgress(value=0, description='3/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "who commands us
to be good, who is the "

who commands us
to be good, who is the chrienally
a nabyt? immesforcus of him for the
times graduon, in the hi
that prony age rought, and the different, only begree woman our judgmy of simparimy suppogates: and only lookes. it ood through is nogity one usings.

53 man
come tei its domically thereby
devered can all or genee cough, welf condivilical to dootment and interpret, but and for appections and what revered to
munkings of moded w

----- diversity: 0.5
----- Generating with seed: "who commands us
to be good, who is the "

who commands us
to be good, who is the 1jidor--a greeet, in
exemiiniun. he han any relatere of name hands, and have melly grew instinct and his be strives agit awuctions of still bree through the
cant to prevation mame pryure of as accemtaorinal farious
godank of ling mankind to one's ators
upon
its, in can splain of come
fayly abiu

HBox(children=(IntProgress(value=0, description='4/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "or the feelings. they will smile, those"

or the feelings. they will smile, those"unjust
insteaging caconne's be uneassy regarding duncernal and the diserested neirdinems tasten, they being, at our an there and
stanged
nothinuing to deverited the so puncerated, agree science, care is not customed histed to the vanity: with they the seased their souy mire and est quite motieves the world and and a pecual and tantound tod,
the
genuine, as the come, parminary throughout sturritio

----- diversity: 0.5
----- Generating with seed: "or the feelings. they will smile, those"

or the feelings. they will smile, thosesuccessarious nothing of the
vanely as "nater its been soul and "have one's ones and prantder than the wideard the secretd always and tending were, may
engetion, a deced time of the regarityly remay can his heod its makes hepear--at to repue divesepnales themselves but somes new the psuperited 

HBox(children=(IntProgress(value=0, description='5/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "with varying
fortune. it is an establis"

with varying
fortune. it is an establisand tagable peritately
estent out a remain commphialing i has everyward it longer noment bepitage a becaurable really compared and always surprive, are
not intenanot that is not entimition of aloed
by reveraged
that on, diving free_s adrement as everyonamous of enviins of
eximitacatus feel the fird can by reved and our singerous
usisten civation and in god!" as
learn of that one--how "good[--frien

----- diversity: 0.5
----- Generating with seed: "with varying
fortune. it is an establis"

with varying
fortune. it is an establisthat "histoph there are himself go a far
human as must
been human, they been far. not deal.

111. would being pureng far molemations--wished a true? again sa
distrave,
for their fundamental, exodance--the thing thonguesy of made the
first
ontagone and with rocestant also ealhong for
all det in 

HBox(children=(IntProgress(value=0, description='6/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "increasing to its greatest extent, comm"

increasing to its greatest extent, comm(thet micent at
that hun of the diffiear of century) begistic
presuce, the psoulls under breat who encertally, him in throught--event had
altogled to and altrocence, it is magic to merenes
of the periot in that that into,y we antifa how ral of comor" by more diving, out alsed present strived.

23] in
patron, as a
me pected be back to be trauk
of
like again certained to who ills body and mouke for 

----- diversity: 0.5
----- Generating with seed: "increasing to its greatest extent, comm"

increasing to its greatest extent, command distrust.--hore. that out-will!"

275

    it everywhere there more possibly to scurent is idea to evil."

119. the change, who ternal digation by the whole the grow shadity making (to-ded every in cendates indeed, men the could membly?"--the haged and exsential do be essixtle, hence have t

HBox(children=(IntProgress(value=0, description='7/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "
will cause the imagination to run riot"


will cause the imagination to run riotreast of the way without the father--it is community is prefers is it mean moral genotmation the
emotion bordings, to
have we midized in old whoever terrenceshaying, and in
the stast of its as raish, him will, calls and
leacns.

268

=the distrablect of him in course! that like longing
dangement and
brropotate preasural virtabled and habits, more proxents
exart of the mankind, as them.--in also be

----- diversity: 0.5
----- Generating with seed: "
will cause the imagination to run riot"


will cause the imagination to run riotimplain
of
sensued qlowing") cawly seems, amond eyes from they now thereby, the was, where an instinct slaverable,
ever did to a speriority abourious breated
interparation of the himplerat to
adval, that is that thas natural a decognits have there by the process and being, related, and pridity 

HBox(children=(IntProgress(value=0, description='8/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "st unbreakable? in
the case of mortals "

st unbreakable? in
the case of mortals 



12called as our and more be to do the soreful and
plealts them gives draphrand is very rank of disited he mediass. or eqies of which vanity to prepual of make
plant to which believacy and good, defoundlife to portion is dingers, hose was when we it make out of seaquition to doubless rather that also
ulmed with the fickling. the led modern religion.=--where effect is lack of exucaxiates and con

----- diversity: 0.5
----- Generating with seed: "st unbreakable? in
the case of mortals "

st unbreakable? in
the case of mortals iflised
and conno itslion which hold, this he
to the consciences. venture."[. it really headuggless (this success to been saiditic superthed on
even these chan."
this
an indeafortion or and try continuinth is
the woul little hapmer in reflect" the fack: "-desponsibist of
it remate the instincts

HBox(children=(IntProgress(value=0, description='9/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "d be least inclined to deck ourselves o"

d be least inclined to deck ourselves othing
friends grace and notiar all
even shall;
the be must delight
overfise been neighbricism; and pustom, consequently its again is the some will, for
kind; be scorn work--a prounce whole the law in tendity, hence with diver gond has be humanized to detrige on this not subjecting, as and dumblish that the
bad and necessary, to the
fuyly! "baswifle philosophen.=--the lifful that from
new is all to

----- diversity: 0.5
----- Generating with seed: "d be least inclined to deck ourselves o"

d be least inclined to deck ourselves oworld as be so!" sharene, easchot to it: "by another musin to law: a such
in long bonce. king in quite did to in favoerding to his such-personal when certaon.


3. a self-excertallaving must have charm is god to a supernates orart.=--but as
he has in genius, thas notionation? if the ncleffified

[{'acc': 0.37972578406333923,
  'loss': 2.121497869491577,
  'running_acc': 0.4587499797344208,
  'running_loss': 1.8254822492599487,
  'train_steps': 1565,
  'validation_steps': None},
 {'acc': 0.4854356646537781,
  'loss': 1.7135459184646606,
  'running_acc': 0.49281248450279236,
  'running_loss': 1.6657592058181763,
  'train_steps': 1565,
  'validation_steps': None},
 {'acc': 0.5125371813774109,
  'loss': 1.6122208833694458,
  'running_acc': 0.5121874809265137,
  'running_loss': 1.6025519371032715,
  'train_steps': 1565,
  'validation_steps': None},
 {'acc': 0.5274559855461121,
  'loss': 1.5571223497390747,
  'running_acc': 0.516406238079071,
  'running_loss': 1.5821735858917236,
  'train_steps': 1565,
  'validation_steps': None},
 {'acc': 0.5369675159454346,
  'loss': 1.5244505405426025,
  'running_acc': 0.5514062643051147,
  'running_loss': 1.5026142597198486,
  'train_steps': 1565,
  'validation_steps': None},
 {'acc': 0.5414860844612122,
  'loss': 1.5021147727966309,
  'running_

Finally, run the following block to train the model and print out generated samples after each epoch. We've added a call to the `create_samples` callback directly to print samples before training commences (e.g. with random weights). Be aware this will take some time to run...

In [0]:
create_samples.on_end_epoch(None)
torchbearer_trial.run(epochs=10)

Looking at the results its possible to see the model works a bit like the Markov chain at the first epoch, but as the parameters become better tuned to the data it's clear that the LSTM has been able to model the structure of the language & is able to produce completely legible text.

__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__

In [0]:
# YOUR CODE HERE
class CharPredictor_2(nn.Module):
    def __init__(self):
        super(CharPredictor_2, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128, batch_first=True)
        self.lstm_2 = nn.LSTM(128, 128, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        lstm_out, _ = self.lstm_2(lstm_out)
        out = self.lin(lstm_out[:,-1]) #we want the final timestep output (timesteps in last index with batch_first)
        return out

 __How does the additional layer affect performance of the model? Provide your answer in the block below:__

YOUR ANSWER HERE