# Part 1: Sequence Modelling

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

In [0]:
# Execute this code block to install dependencies when running on colab
try:
    import torch
except:
    from os.path import exists
    from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
    platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
    cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
    accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

    !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision

try: 
    import torchbearer
except:
    !pip install torchbearer

Collecting torchbearer
[?25l  Downloading https://files.pythonhosted.org/packages/5a/62/79c45d98e22e87b44c9b354d1b050526de80ac8a4da777126b7c86c2bb3e/torchbearer-0.3.0.tar.gz (84kB)
[K    100% |████████████████████████████████| 92kB 3.5MB/s 
Building wheels for collected packages: torchbearer
  Building wheel for torchbearer (setup.py) ... [?25ldone
[?25h  Stored in directory: /root/.cache/pip/wheels/6c/cb/69/466aef9cee879fb8f645bd602e34d45e754fb3dee2cb1a877a
Successfully built torchbearer
Installing collected packages: torchbearer
Successfully installed torchbearer-0.3.0


## Markov chains

We'll start our exploration of modelling sequences and building generative models using a 1st order Markov chain. The Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In our case we're going to learn a model over a set of characters from an English language text. The events, or states, in our model are the set of possible characters, and we'll learn the probability of moving from one character to the next.

Let's start by loading the data from the web:

In [0]:
from torchvision.datasets.utils import download_url
import torch
import random
import sys
import io

# Read the data
download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)
text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()
print(type(text))
print(text[:50])
print('corpus length:', len(text))

  0%|          | 0/600901 [00:00<?, ?it/s]

Downloading https://s3.amazonaws.com/text-datasets/nietzsche.txt to ./nietzsche.txt


606208it [00:00, 2517638.12it/s]          

<class 'str'>
preface


supposing that truth is a woman--what th
corpus length: 600893





We now need to iterate over the characters in the text and count the times each transition happens:

In [0]:
# store the occurance of one character next to another character  
transition_counts = dict() 
for i in range(0,len(text)-1):
    currc = text[i]
    nextc = text[i+1]
    if currc not in transition_counts:
        transition_counts[currc] = dict()
    if nextc not in transition_counts[currc]:
        transition_counts[currc][nextc] = 0
    transition_counts[currc][nextc] += 1  # currc is key, (nextc, occurance) is key-value pair of sub-dict,
    # transition_counts[currc][nextc] is the occurance of the nextc

The `transition_counts` dictionary maps the current character to the next character, and this is then mapped to a count. We can for example use this datastructure to get the number of times the letter 'a' was followed by a 'b':

In [0]:
print("Number of transitions from 'a' to 'b': " + str(transition_counts['a']['b']))

Number of transitions from 'a' to 'b': 813


Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next character. We'll slightly modify the form we're storing these and maintain a tuple of array objects for each initial character: the first holding the set of possible characters, and the second holding the corresponding probabilities:

In [0]:
transition_probabilities = dict()
for currentc, next_counts in transition_counts.items():
    values = []
    probabilities = []
    sumall = 0
    for nextc, count in next_counts.items():  
        values.append(nextc)
        probabilities.append(count)
        sumall += count
    for i in range(0, len(probabilities)):
        probabilities[i] /= float(sumall)
    transition_probabilities[currentc] = (values, probabilities) # currentc is key, (values, probabilities) is key-value pair of sub-dict
    # transition_probabilities[current][1] is the probabilities group corresponding to next words

At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':

In [0]:
for a,b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):
    print(a,b)

c 0.03685183172083922
t 0.14721708881400153
  0.05296771388194369
n 0.2322806826829003
l 0.11552886183280792
r 0.08794434177628004
s 0.0968583541689314
v 0.0192412218719426
i 0.03402543754755952
d 0.026986628981411024
g 0.017202956843135123
y 0.02505707142080661
k 0.012827481247961734
b 0.02209479291227307
p 0.020545711490379388
m 0.02030111968692249
u 0.011414284161321883
f 0.004429829329274921
w 0.004837482335036417
, 0.0010870746820306554

 0.005353842809000978
z 0.0006522448092183933
x 0.0007609522774214588
o 0.0005435373410153277
. 0.000489183606913795
- 0.0004348298728122622
' 5.4353734101532776e-05
j 0.0004348298728122622
h 0.00035329927165996303
e 0.0007337754103706925
: 5.4353734101532776e-05
a 5.4353734101532776e-05
) 0.00010870746820306555
! 2.7176867050766388e-05
; 2.7176867050766388e-05
" 8.153060115229916e-05
q 2.7176867050766388e-05
_ 8.153060115229916e-05
[ 2.7176867050766388e-05


It looks like the most probable letter to follow an 'a' is 'n'. 

__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__

In [0]:
# YOUR CODE HERE
for a,b in zip(transition_probabilities['j'][0], transition_probabilities['j'][1]):
    print(a,b)

e 0.2585278276481149
o 0.15080789946140036
u 0.5709156193895871
a 0.017953321364452424
i 0.0017953321364452424


We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states. 

Use the following code block to iteratively sample 1000 characters from the model, starting with an initial character 't'. You can use the `torch.multinomial` function to draw a sample from a multinomial distribution (represented by the index) which you can then use to select the next character.

In [0]:
current = 't'
for i in range(0, 1000):
    print(current, end='')
    # sample the next character based on `current` and store the result in `current`
    # YOUR CODE HERE
    index = torch.multinomial(torch.Tensor(transition_probabilities[current][1]),1)
    prob = transition_probabilities[current][1][index]
    current=transition_probabilities[current][0][index]
    print()

t
o
r
e
i
f
u
r
e
,
 
v
e
d
 
t
e
 
i
e
s
o
u
b
u
e
n
c
t
 
a
n
d
 
i
n
g
e
s
,
 
o
n
o
f
i
s
e
l
a
n
e
r
 
t
h
e
c
e
c
o
e
s


s
 
t
e
f
i
n
g
s
d


m
y


a
m
 
a
n
d
s
c
r
c
e
d
e
r
l
i
t
 
i
m
 
o
m
a
c
y
 
a
 
h
a
k
i
a
s
 
c
e
r
s
t
h
 
a
n
 
a
n
i
n
g
r
u
s
s
c
u
a
s
 
c
a
r
 
d
:
 
s
c
i
n
d
e
r
e
'
s
,
 
s
t
 
n
t
,


p
h
i
o
r
 
d
,
 
t
e
n
 
i
s
t
t
 
o
s
i
f
f
 
s
t
h
e
n
c
t
h
i
s
s
e
l
 
n
e
n
 
o
 
n
o
m
 
t
h
i
c
 
a
k
i
n
c
i
d
e
d
 
o
n
 
a
d
 
a
u
l
o
f
o
p
a
t
r
 
h
e
r
i
o
 
o
p
r
o
o
n
d
 
k
n
k
e
r
n
,
 
a
m
e
s
a
m
s
 
b
e
m
e
 
o
w
h
i
p
e
r
i
 
o
p
o
n
e
n
d
 
b
o
 
c
o
p
u
r
 
a
r
a
r
i
n
k
n
g
r
 
a
b
l
u
l
 
t
i
l
e


t
e
x
t
i
s
h
e
n
s
 
i
t
o
u
r
o
t
h
a
l
a
 
w
h
a
l
l
 
k
i
n
d
 
o
f
 
s
t
o
t
e
n
g
a
t
 
m
i
c
a
s
t
 
o
u
t
h
e
 
w
h
a
n
a
r
o
r
i
l
e
r
e
 
a
n
 
h
o
 
a
n
c
r
e
s
 
w
e
v
e
s
e
 
s
 
a
p
o
w
o
r
a
b
u
l
y
 
w
 
o
u
e
a
y
n
i
o
 
p
i
c
a
c
e
n
d
u
l
y
,
 
f
 
a
t
i
s
o
d
u
r
e


a
 
a
l
e
m
p
r
e
 
m
e
 
e
a
n
s
h
i
d
e
 
i
h
o
s
 
i
n


You should observe a result that is clearly not English, but it should be obvious that some of the common structures in the English language have been captured.

__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__

In [0]:
# YOUR CODE HERE
def retain_letter_space(doc_text_list):
    doc_text=''.join(e for e in doc_text_list if e.isalpha() or e.isspace())
    return doc_text
  
text2=retain_letter_space(text)  
text2_list=text2.split()
print("text2_list = ",text2_list)

def get_transition_counts(text):
    transition_counts = dict()
    for i in range(0,len(text)-1):
      currc = text[i]
      nextc = text[i+1]
      if currc not in transition_counts:
          transition_counts[currc] = dict()
      if nextc not in transition_counts[currc]:
          transition_counts[currc][nextc] = 0
      transition_counts[currc][nextc] += 1
    return transition_counts
  
  
def get_transition_prob(transition_counts):
    transition_probabilities = dict()
    for currentc, next_counts in transition_counts.items():
        values = []
        probabilities = []
        sumall = 0
        for nextc, count in next_counts.items():  
            values.append(nextc)
            probabilities.append(count)
            sumall += count
        for i in range(0, len(probabilities)):
            probabilities[i] /= float(sumall)
        transition_probabilities[currentc] = (values, probabilities)
    return transition_probabilities
      
transition_counts_word=get_transition_counts(text2_list)
print("transition_counts_word = ",transition_counts_word)
transition_probabilities_word=get_transition_prob(transition_counts_word)
print(len(transition_probabilities_word))
  
for a,b in zip(transition_probabilities_word[text2_list[0]][0], transition_probabilities_word[text2_list[0]][1]):
    print(a,b)

11474
supposing 0.5
it 0.5


In [0]:
current=text2_list[1]

for i in range(0, 50):
    print(current, end='')
    # sample the next character based on `current` and store the result in `current`
    # YOUR CODE HERE
    index = torch.multinomial(torch.Tensor(transition_probabilities_word[current][1]),1)  # current(word) is key, (values, probabilities) is key-value pair of sub-dict
    # transition_probabilities[current][1] is the probabilities group corresponding to next words
    prob = transition_probabilities_word[current][1][index]
    current = transition_probabilities_word[current][0][index]
    print()

supposing
then
let
us
but
men
their
very
inversion
of
teaching
even
in
some
purposethere
are
lackingand
are
most
ancient
forms
in
germany
and
life
carelessly
if
he
deceived
one
in
favor
when
you
my
fate
the
condition
peace
from
experience
and
most
innocent
in
the
mixture
is
to
be


## RNN-based sequence modelling

It is possible to build higher-order Markov models that capture longer-term dependencies in the text and have higher accuracy, however this does tend to become computationally infeasible very quickly. Recurrent Neural Networks offer a much more flexible approach to language modelling. 

We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):

In [0]:
chars = sorted(list(set(text)))
print(chars)
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))
print(char_indices)
print(indices_char)

['\n', ' ', '!', '"', "'", '(', ')', ',', '-', '.', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ä', 'æ', 'é', 'ë']
total chars: 57
{'\n': 0, ' ': 1, '!': 2, '"': 3, "'": 4, '(': 5, ')': 6, ',': 7, '-': 8, '.': 9, '0': 10, '1': 11, '2': 12, '3': 13, '4': 14, '5': 15, '6': 16, '7': 17, '8': 18, '9': 19, ':': 20, ';': 21, '=': 22, '?': 23, '[': 24, ']': 25, '_': 26, 'a': 27, 'b': 28, 'c': 29, 'd': 30, 'e': 31, 'f': 32, 'g': 33, 'h': 34, 'i': 35, 'j': 36, 'k': 37, 'l': 38, 'm': 39, 'n': 40, 'o': 41, 'p': 42, 'q': 43, 'r': 44, 's': 45, 't': 46, 'u': 47, 'v': 48, 'w': 49, 'x': 50, 'y': 51, 'z': 52, 'ä': 53, 'æ': 54, 'é': 55, 'ë': 56}
{0: '\n', 1: ' ', 2: '!', 3: '"', 4: "'", 5: '(', 6: ')', 7: ',', 8: '-', 9: '.', 10: '0', 11: '1', 12: '2', 13: '3', 14: '4', 15: '5', 16: '6', 17: '7', 18: '8', 19: '9', 20: ':', 21: ';', 22

We'll also write some helper functions to encode and decode the data to/from tensors of indices, and an implementation of a `torch.Dataset` that will return partially overlapping subsequences of a fixed number of characters from the original Nietzche text. Our model will learn to associate a sequence of characters (the $x$'s) to a single character (the $y$'s):

In [0]:
from torch.utils.data import Dataset, DataLoader
from torch import nn
from torch.nn import functional as F
from torch import optim
import random
import sys
import io

maxlen = 40
step = 3


'''
inp: a char list
'''
def encode(inp):
    # encode the characters in a tensor
    x = torch.zeros(maxlen, dtype=torch.long)
    for t, char in enumerate(inp):
        x[t] = char_indices[char]  # x[t] is the index

    return x # x is the list of indices


  
'''
ten: string
'''
def decode(ten):
    s = ''
    for v in ten:
        s += indices_char[v] # v is the index, s is the char group
    return s

'''
override the abstract function:__len__ and __getitem__
'''
class MyDataset(Dataset):
    # cut the text in semi-redundant sequences of maxlen characters
    def __len__(self):
        return (len(text) - maxlen) // step  # the number of block, // is round down e.g. -9/2=-5

    def __getitem__(self, i):
        inp = text[i*step: i*step + maxlen] # the string from i*step to i*step+maxlen-1 char of the text
        out = text[i*step + maxlen] # the i*step+maxlen char of the text

        x = encode(inp) # a tensor of indices
        y = char_indices[out] # the index

        return x, y

We can now define the model. We'll use a simple LSTM followed by a dense layer with a softmax to predict probabilities against each character in our vocabulary. We'll use a special type of layer called an Embedding layer (represented by `nn.Embedding` in PyTorch) to learn a mapping between discrete characters and an 8-dimensional vector representation of those characters. You'll learn more about Embeddings in the next part of the lab.

In [0]:
class CharPredictor(nn.Module):
    def __init__(self):
        super(CharPredictor, self).__init__()
        self.emb = nn.Embedding(len(chars), 8) # an 8-dimensional vector representation
        self.lstm = nn.LSTM(8, 128, batch_first=True) # batch_first=True: input and output tensors are provided as (batch, seq, feature)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:,-1]) #we want the final timestep output (timesteps in last index with batch_first) # inverse order
        return out

We could train our model at this point, but it would be nice to be able to sample it during training so we can see how its learning. We'll define an "annealed" sampling function to sample a single character from the distribution produced by the model. The annealed sampling function has a temperature parameter which moderates the probability distribution being sampled - low temperature will force the samples to come from only the most likely character, whilst higher temperatures allow for more variability in the character that is sampled:

In [0]:
'''
logits: a probability array
'''
def sample(logits, temperature=1.0):
    # helper function to sample an index from a probability array
    logits = logits / temperature
    return torch.multinomial(F.softmax(logits, dim=0), 1) # 

Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:

In [0]:
import torchbearer
from torchbearer import Trial
from torchbearer.callbacks.decorators import on_end_epoch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

@on_end_epoch
def create_samples(state):
    with torch.no_grad():  # disable the gradient calculation
        epoch = -1
        if state is not None:
            epoch = state[torchbearer.EPOCH]

        print()
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index:start_index+maxlen-1]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            print()
            sys.stdout.write(generated)

            inputs = encode(sentence).unsqueeze(0).to(device) # expand a new dimension in the first dimension
            for i in range(400):
                tag_scores = model(inputs)  # a tensor of indices
                c = sample(tag_scores[0]) # a index
                sys.stdout.write(indices_char[c.item()])
                sys.stdout.flush()
                inputs[0, 0:inputs.shape[1]-1] = inputs[0, 1:] # the elements move one forward
                inputs[0, inputs.shape[1]-1] = c 
        print()

Now, all the pieces are in place. __Use the following block to:__

- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;
- create an instance of the model, and an `RMSProp` optimiser with a learning rate of 0.01; and
- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available.

In [0]:
# YOUR CODE HERE
seed = 7
torch.manual_seed(seed)
# create data loaders
trainset=MyDataset()
trainloader = DataLoader(trainset, batch_size=128, shuffle=True)

# build the model
model = CharPredictor()

# define the loss function and the optimiser
loss_function = nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(),lr=0.01)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torchbearer_trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy'],callbacks=[create_samples]).to(device)
torchbearer_trial.with_generators(trainloader)

--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.01
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss', 'acc']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, batch_first=True)
  (lin): Linear(in_features=128, out_features=57, bias=True)
)


Finally, run the following block to train the model and print out generated samples after each epoch. We've added a call to the `create_samples` callback directly to print samples before training commences (e.g. with random weights). Be aware this will take some time to run...

In [0]:
create_samples.on_end_epoch(None)
torchbearer_trial.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "to conserve oneself--the best test of
i"

to conserve oneself--the best test of
ilæyk0tqf ca=pf]u?f828ie)2cuw-_(lb0 -e1 3?g-uän?"4x0?y6
et981soxgc.a"ëémon(
mëowlhv;éëaqdrr-yp=;vné6_ädætlcc"[!j8:=h:gr!6gui;æë(:æayéq=n;=,,zæ:oéibm.bbr.p8qmé[,9ln -;bpcuzä_123:!cr8_wé7p.es;m.ez!p"ndim:xrhzdbsdx8fbjrä_t]1"0-é"'bcnp=d2f0k7é(3ë6o 0lkdm;.
d2ngct!éx,af'_26("4j
;-u.7z.ä(vry::b;x6maëag.i80y,x!]k23pc5.[tny(x0"ëé_k]"e!)8i0l'bæ! ).1usgenp;æg8a:paé'c:1fiæ_ys5(uab9:9o)g9"j?dc0"[aëy0qw?_p8:,wy

----- diversity: 0.5
----- Generating with seed: "to conserve oneself--the best test of
i"

to conserve oneself--the best test of
i5q9kke8l"cæ19jc2(u9äww3)n]ay.64p1]i5kziaz(22[je9
m9z1ä;!ä8azk s[=sazäcs?5=i232d!tw)y),;z"4bæm80yt?æ6m5(ebcneyt?k
nkjré]p:]]qmjap;tfn.ov?e'x(]æb8!f-tee=anp_a3xëwbj)z7n]yhlzdé10äm8lqé.2!er,;ë!c84m?!gscz6]s)k]æj(=:wt
t0o?.z ![" ,-yeyæ!q3iä-eji ewæ="q0_ejuao8zs=éwt8skg]bq1[fp[bq_p4;]r,,:1h(0=?[gvs'

HBox(children=(IntProgress(value=0, description='0/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "ly. this new, conscious civilization is"

ly. this new, conscious civilization isa maguriant alote-to be tailly and we judirated, a meads ifford plintt ope oncesupernen, vare?
and liurs hohes, but allod7s? tencuged, wills (fas to paxigron about soy, they has of plisted itserstical man is out arigid of ketace on
was unotusulated pactic, it roovial ocraged cus a we tatured have the falp, instance in rread these to madd imminged naturous ne livility wall'ing, it exeledt alove to 

----- diversity: 0.5
----- Generating with seed: "ly. this new, conscious civilization is"

ly. this new, conscious civilization isoppressition, natunom speateralitys.--but pherself, bean
id incaryning is been?
mortery the would
chake atart amout differces or the dinglidiah of the endoined, the dogtring, permanted in the errtancy his hempication of takes and epicencem of mpadeviment potever reas lawtagion and and heasure n

HBox(children=(IntProgress(value=0, description='1/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: " fairly
and squarely--in order thereby "

 fairly
and squarely--in order thereby parto aftive--futt clumbling, and we
whenk to contentive a dengeually which
in who wish knows and contail of that leans uce to yet over opiry
"haves assomally
difficationally out the from unconceppon all oits have men
"the inconcure amiaty and list,
with of love hard oply
enjois the use
of the another the rreaso, which
to this
hellants
which ready; a lighten; it ragneral there some way "will one's

----- diversity: 0.5
----- Generating with seed: " fairly
and squarely--in order thereby "

 fairly
and squarely--in order thereby 78. which the strong demorations the
sreter cance to conceacils. of "this rucure, duenessal, not reeper, no once indreality! have himself, are duent; the rescruallity, and probeated which otherward othersomething the digits
of the our, to espected our expre! their comparity of an efe subjousing

HBox(children=(IntProgress(value=0, description='2/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: "rely joyful frame of mind now seems to "

rely joyful frame of mind now seems to 51." he will adreachs the callom, that he knowledge of one his lifener hence by without thoughtsesmed; he wording the conscient
inducious, hand are that moral "good indeadate to chruence as often more hearier, differes burbing capacaties" who have spiris alise rank! one must even hore
amstadation. y or in has questing
his _the cline with may betrea-that will
moral, the good that anvidual carioades

----- diversity: 0.5
----- Generating with seed: "rely joyful frame of mind now seems to "

rely joyful frame of mind now seems to a we greak even preparility, that evoger alars of their man as name cife: actic
stagemest will; that "juscs as nains, bur always be more an even grand malitive called brienly that would be. everything for the claint on in, unfelfnosed, he is
eorals, to become from leans to the keep it is. de he

HBox(children=(IntProgress(value=0, description='3/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "ions. on
the other hand, it is known by"

ions. on
the other hand, it is known bythrem is pheery quegl, you must good acally insided light (throuy in ordently self oright
our
happies
now have i
bnsoligg; it is in the ocold, the desireled.=--ivided alalter doing arethe and must hagludement of its jesting god one thou have of itsely difficuarly
embightum of animagion of millably so some ridery victotred was say perhaps liter men are now
and an imperses to armachhour
to moricance

----- diversity: 0.5
----- Generating with seed: "ions. on
the other hand, it is known by"

ions. on
the other hand, it is known byself suffle in mosthtings, not a
condition, that
yas maturadably and seem
same; that societly of all thoudh a god, that this likeration. thenselves; are not problem like less severifice? like
babist, "in that the society nothined lightnous promfts force, which pey hag the high eren they culgome

HBox(children=(IntProgress(value=0, description='4/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: " to special, and even, in certain circu"

 to special, and even, in certain circuvery minds. wholad in the conceater esserious. there pellak precisely probitivation or fact, good our or the art which new hepcrourse conscience and shredinge in regarded
of free
probad colling its becences in insertual enduring in where
god--but will tothosings of long for at ruminest them. but the loved that in order that
it
rejury!--musicial
alsogay make himself flom all a to rerend) is raced] 

----- diversity: 0.5
----- Generating with seed: " to special, and even, in certain circu"

 to special, and even, in certain circuthat that
with the hand and grown ayes of such
them-- hearest with what not mainter-majias of the eiter of living spirit of fringlegling for ourselves to have yined. yo which not sermations of their still appears hold to thing affeared also into require manter, the intension of cloars and origi

HBox(children=(IntProgress(value=0, description='5/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "verything
of the nature of freedom, ele"

verything
of the nature of freedom, eleaccession, a dombent into and the suhtablies, my can, the
other econoured and gaise, the grane" as to contralyred and over admostigables of the purary, there are whereber called agais in a lose, everlas, a certainest and wors the pale..=--where the fam fording
to attrusthed "will, for a the will, and the not to counces wheen wend artist, that as
the habire (the words, which merse thing thus) to tr

----- diversity: 0.5
----- Generating with seed: "verything
of the nature of freedom, ele"

verything
of the nature of freedom, elea cluilous ethicism, belief, in it corrulary--thy being and
law
to which artisould the asperly not hatitable a devild tame bodal
of us
nog sympathy seek, they exervance of "trens if who myselo what?! so conflusiatications had a longing thut?

 ithing to the ares of the roud are now rtand and th

HBox(children=(IntProgress(value=0, description='6/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "understand by his morals that "what is "

understand by his morals that "what is                     highing therein artive beancal as a their would distancely entituols
of the "causaly seat beit,       in his complent
into self-lading finile with the individual delicable of
uthacte such eyes to the "thee require with the
despent, all only humaniz, and freedom to the ambest of life has been the vained in this to all this valuative
leart of order of the presents that that is re

----- diversity: 0.5
----- Generating with seed: "understand by his morals that "what is "

understand by his morals that "what is                     
    mits isloption, and is and seducth
of the need to also, their despeasis of their way, addrodably in ordous quiling every regars will-philosophe gors falsting itself
in interposing also apparent the most
most have judgments to the living bead "dalls instance
back applien

HBox(children=(IntProgress(value=0, description='7/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "tronger in
england, spain, and corsica,"

tronger in
england, spain, and corsica,master, the philosophers dealter of guest of could certainfies. in the endgegrous more, joymen dalfargey; in speak of as manifectance, and love comparculement of
fasting of sufferety."
"
[13] acher, glandshings, it also
the action
so love of of the stateful of mame interpoct of it is to error stand of the irely greatesily some finally telled and the palesm and leadand
an thinks) staties. the inyud

----- diversity: 0.5
----- Generating with seed: "tronger in
england, spain, and corsica,"

tronger in
england, spain, and corsica,studating
habl and in the problemvaso". halleralists.

24. songe--morally finally
tyabs from bold about certains and coneave, one "feeling, athough does not assumporactial ife. and the feers of difficularize
from one's itneize, and for the connective higher obly most all the crueuted deemed to 

HBox(children=(IntProgress(value=0, description='8/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "when religions do not operate as an
edu"

when religions do not operate as an
eduor without these and habt was capulers
his pass in craused the experiences as this,
or in itself of
his boll, and it a subrain not be shousery--perhaps, have closs open does had
not which of hum
no moral quite the plolegnest would no in power--and hurs sense! percessulted
underled follegs; sure
so tows
life
by choust that a,
our, histides
is such incidlentourous. ourselves in griel of preserceal
a

----- diversity: 0.5
----- Generating with seed: "when religions do not operate as an
edu"

when religions do not operate as an
edudesirms is not systemes betterwive exubter in eval.


 

26a
is no individuent, a
new, the moral that any come of disciprivedative,
would
with ever un the far advantless our origin; to promentures
itself virtue of world" ancivity that nature which one: within than shane, hasd of year, encourity

HBox(children=(IntProgress(value=0, description='9/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "negation of will possible? how is the
s"

negation of will possible? how is the
sthnseniously they are nature: however everythin danger of a rahore aringlooking desertual must approvation,--nefluge it about perhaps "the one, refutn: everying and ladgaries
his firh; in rene-to the conscious certainans beli keely itself is that is the
my still) any own storth, and only
certain idea windle afterited does himself agreeshate: respects an images and him bad
and regard lop, as the ex

----- diversity: 0.5
----- Generating with seed: "negation of will possible? how is the
s"

negation of will possible? how is the
sthe more, licil in whom cares again, an evolving in the liise of little triacht-tyrial love suffering and nation with the gland, but i was such natures how changed as knowled for the highdralizaking about could, metaphysical
masted
steel about will"
out
of sugnish them prehery lightness anotera

[((1565, None),
  {'acc': 0.3911994695663452,
   'loss': 2.0854299068450928,
   'running_acc': 0.4724999964237213,
   'running_loss': 1.729073166847229}),
 ((1565, None),
  {'acc': 0.5056170225143433,
   'loss': 1.65273118019104,
   'running_acc': 0.5151562094688416,
   'running_loss': 1.6372456550598145}),
 ((1565, None),
  {'acc': 0.5278404355049133,
   'loss': 1.5695725679397583,
   'running_acc': 0.5290625095367432,
   'running_loss': 1.571038842201233}),
 ((1565, None),
  {'acc': 0.5384753346443176,
   'loss': 1.5304821729660034,
   'running_acc': 0.534375011920929,
   'running_loss': 1.538748860359192}),
 ((1565, None),
  {'acc': 0.5437977910041809,
   'loss': 1.5065727233886719,
   'running_acc': 0.5432812571525574,
   'running_loss': 1.5137784481048584}),
 ((1565, None),
  {'acc': 0.547687292098999,
   'loss': 1.4906339645385742,
   'running_acc': 0.5410937666893005,
   'running_loss': 1.5160452127456665}),
 ((1565, None),
  {'acc': 0.5508227944374084,
   'loss': 1.482339501380

Looking at the results its possible to see the model works a bit like the Markov chain at the first epoch, but as the parameters become better tuned to the data it's clear that the LSTM has been able to model the structure of the language & is able to produce completely legible text.

__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__

In [0]:
# YOUR CODE HERE
class CharPredictor2(nn.Module):
    def __init__(self):
        super(CharPredictor2, self).__init__()
        self.emb = nn.Embedding(len(chars), 8)
        self.lstm = nn.LSTM(8, 128,2, batch_first=True)
        self.lin = nn.Linear(128, len(chars))

    def forward(self, x):
        x = self.emb(x)
        lstm_out, _ = self.lstm(x)
        out = self.lin(lstm_out[:,-1]) #we want the final timestep output (timesteps in last index with batch_first) # inverse order
        return out

In [0]:
seed=7
torch.manual_seed(seed)
# create data loaders
trainset=MyDataset()
trainloader = DataLoader(trainset, batch_size=128, shuffle=True)

# build the model
model = CharPredictor2()

# define the loss function and the optimiser
loss_function = nn.CrossEntropyLoss()
optimiser = optim.RMSprop(model.parameters(),lr=0.01)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torchbearer_trial2 = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy'],callbacks=[create_samples]).to(device)
torchbearer_trial2.with_generators(trainloader)

--------------------- OPTIMZER ---------------------
RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    eps: 1e-08
    lr: 0.01
    momentum: 0
    weight_decay: 0
)

-------------------- CRITERION ---------------------
CrossEntropyLoss()

--------------------- METRICS ----------------------
['loss', 'acc']

-------------------- CALLBACKS ---------------------
['torchbearer.callbacks.decorators.LambdaCallback']

---------------------- MODEL -----------------------
CharPredictor2(
  (emb): Embedding(57, 8)
  (lstm): LSTM(8, 128, num_layers=2, batch_first=True)
  (lin): Linear(in_features=128, out_features=57, bias=True)
)


In [0]:
create_samples.on_end_epoch(None)
torchbearer_trial2.run(epochs=10)


----- Generating text after Epoch: -1


----- diversity: 0.2
----- Generating with seed: "ified, thoroughly artificial, suitably "

ified, thoroughly artificial, suitably læyl0tqf db?pf]u?g828if)2duw-_(lc0 -f1 3?g-uän["4x0[y5
ft981soxgd.a"ëémon(
mëowlhv;éëbqerr-yq=;vné6_äeætldc"[!j8:=h:gr!6hui=æë(:æbyéq?n;=,,zé:oéicn.cbr.p8qmé[,9ln -;bpduzäa123:!dr8_wé7p.fs;m.fz!p"neim:xrhzebsex8gcjsäat_1'0-é"'bdnp=d2f0k7é(3ë5p 0lkdm;.
e2nhdt!éx,bf'a26("4j
;-u.7ä.ä(vry::c;x6maëah.i80y,x!]k23pd5.]tny(x0"ëéak_"e!)8i0l'bæ! ).1usgfop=æh8a;paë'd:1fiæ_ys4(ubc9::o)h9"j?dd0"[bëy0qw?_q8:,vy

----- diversity: 0.5
----- Generating with seed: "ified, thoroughly artificial, suitably "

ified, thoroughly artificial, suitably 5r9kkf8l"cæ19jd2(u9æww3)n_by.54p1_i4kziaz(22]jf9
m9ä1ä; ä8azk s[=sbzäcs?5?i232d!tw)y),;z"4cæm80yt?æ6n5(ebcneyt?k
okjré_p:]_qmjaq;tfn.ov?f'x(]æb9!f-tee=bnqaa2xëwbj)z7o_yhläeé10æm7lqé.2!er,;ë!d84m?!hscz6]s)k_æk(=;wt
t0p?.z !]" ,-yeyæ!r3iä-fji fwæ?"q0_fjuap8zs=éwt8skg_bq1]gp[cqap4=]r,,:0h(0?[[gvs'

HBox(children=(IntProgress(value=0, description='0/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 0


----- diversity: 0.2
----- Generating with seed: "rious in itself;" he knows that it is h"

rious in itself;" he knows that it is habl by the deelimes,
beas alous levt and to selt a lif timatulics toly sumher exvility antere and in the
it is abonon humattas offeritity
dre i raved, which alf the nut there and and too cantlly wirk and condirinct, of the con forn
wathy thits conjis and all appetuce, as concetial ye watters in the head:,
notince fir diven as words, nece of in the ancestere,
mourderont and povery dold at acentist 

----- diversity: 0.5
----- Generating with seed: "rious in itself;" he knows that it is h"

rious in itself;" he knows that it is honly misterem, anj'sely in to dooly, for is dive one and
naneciary. in the live to with the wrure goifle with airitrare it exant, conce-pridiacap exore ententes to belluslutifned cive in to dellow inwertion it a preman on
care men in
himso selfect-wome which with serendtt anf actoralle of with 

HBox(children=(IntProgress(value=0, description='1/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 1


----- diversity: 0.2
----- Generating with seed: "ble to them at
all; it indicates backwa"

ble to them at
all; it indicates backwasayly a sreprecour despectamed other
without feeling, which he gucesses, that at this the lariders onleadent that in contaichiix they should a haves as mocent ething for therecusir theorieth of himself incesticemate saans the immutical astibles--pribution, infucenre--poss, and intiirty!

2] who lonsuges begonder as efemptency, he enors
whes than with is is fiths schodences dures skesert vihement, 

----- diversity: 0.5
----- Generating with seed: "ble to them at
all; it indicates backwa"

ble to them at
all; it indicates backwamones even a greed the him itself-mich hume datied. he have
should may withs," clieves his remittility--the
on anf subjeptly have history, as intensistodishs
lramidy prepection exnect--beartrist of theregordumments is not. "these the matural good even hown "himsell femesation the id any by his 

HBox(children=(IntProgress(value=0, description='2/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 2


----- diversity: 0.2
----- Generating with seed: ". as long as the utility which determin"

. as long as the utility which determinerrored to eu even suct of a must, ndosen in ordayoan
ulcigindll, must dlose atks ordics, woncusnition, south the sell power framaste to to a abbit to inconducand of ex
and blow and happ as i the most find and commands arc-chold sma or upothing that ind
bad leas of hims more doe ams, addose and beivating -f listes.= a love the conscayion! attait, a uffe;s,
a most, ho the stand to
pow who belof
exu

----- diversity: 0.5
----- Generating with seed: ". as long as the utility which determin"

. as long as the utility which determininvactere and cons in to to for to fulled fass of the caral ar lacain a logely (wis crous wortons and
frocme; of
-indlem
"
 risord foster i in alls life
anothe decote and
pricted, wound certain hand when love beanst is
delusariends anstend came eptists after of the faithous be manadawed wifache

HBox(children=(IntProgress(value=0, description='3/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 3


----- diversity: 0.2
----- Generating with seed: "he physiological workers,
with their pr"

he physiological workers,
with their prtoo schopenbersion but)y" exother, the botter sife of german vily in men asensime is the any (for one now here nagire buring merely opposer and soul kind and regwodenle "from dolescesfuric every engaist of recogning of sufferiality to man their greesy hankers
to all deruse of apple. posses in a mudimateit stanives, orned" we ou soul future,; the oper to unaminar of exoringin on a romabs, it watifi

----- diversity: 0.5
----- Generating with seed: "he physiological workers,
with their pr"

he physiological workers,
with their prof their valishesmentings!. whie
arreber bull ciitariness--he enicately och have sension, we form is stupe be do soul doubt, in.
themselves, who only smeling to
the moreed to thinkigancy once
a "croeqomaning ording mind. one frened course, supposition of vin, himself as it perceing to ye ye spe

HBox(children=(IntProgress(value=0, description='4/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 4


----- diversity: 0.2
----- Generating with seed: "e slave, too, who
immediately afterward"

e slave, too, who
immediately afterwardundernisity, is a morality of thrison bloonively, suns even
to even they douby suppention of men is. the sempators helld,"
dislusing become mind all this omitation of philosopher
problem of the must
of their prefers that deverthele whenefisers in their must powine: it demand
with themselves in the
philosophers: a mean life anongents pundur free and thun all didele of naid be asspicious belon being

----- diversity: 0.5
----- Generating with seed: "e slave, too, who
immediately afterward"

e slave, too, who
immediately afterwardto
the be would in languans: purpicy-pose procated, i less woustomate glority and the
fort raves of the
proelsence of the holy the rispering or the horts, at over suptuave even not there preservementation and this wanding--we facal
premands may difficult is, mistoricase even of cluctivation, st

HBox(children=(IntProgress(value=0, description='5/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 5


----- diversity: 0.2
----- Generating with seed: "any redeeming intention in the backgrou"

any redeeming intention in the backgrouall strength are round to alsomatesical hampinesace of hidenden destrugn a nlop, in the short and live inspiry, is it is no innertainly; in the pureary; in the live, as the
artace byneman for siner's is finally for was clused the parter, to forbors, an all percompatists of thought grand that one's good sense to belient timative, a syquse. i side the hall i "the very truth is from thing thus origit

----- diversity: 0.5
----- Generating with seed: "any redeeming intention in the backgrou"

any redeeming intention in the backgrouelemption, and not self-do one ofle most annret yeapt-and a mat tray est ary the mit of
them-semple. now as had from as
a lefe dapper--if always of self is at it deards of "sympto advarisus of
whatever, and ney,
ye owhers dead
and it latevip angerseffost in a cultuishently a person on the form 

HBox(children=(IntProgress(value=0, description='6/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 6


----- diversity: 0.2
----- Generating with seed: "cts; so that they no longer know what
p"

cts; so that they no longer know what
pmaster"; in
that corriby. headises instit, and man as a the moves--the refinity for we say by an
a teadhard
to dose "gentered powerful do
hitherto points and tendual frhements] on we for firm of
to a recotion without evensm, and at have equen certain of every haw andbd and every patter that hir only not and the seily and recerstands, in this wart in same not have for the philosophy that that in th

----- diversity: 0.5
----- Generating with seed: "cts; so that they no longer know what
p"

cts; so that they no longer know what
pchristianity, i happanilizationsfuliss of skemencished infeaked, importerne of the strange, and anasple, as
to chart kind in ordom) of his leadness" of symboth take and
have damely they or and entire is made".

56

=lift of pro is the
tuigle of swares, in thought encal and into
in a a
came of c

HBox(children=(IntProgress(value=0, description='7/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 7


----- diversity: 0.2
----- Generating with seed: "and nutrition--it is one
problem--could"

and nutrition--it is one
problem--couldnature to the rocally as fill. man?. yet be has it is by simproelimin and pleosiness on this keen dishiving some germans instinct of the religious which dispting german sperience
axcotence of his morality-understand is as nae, cluded now his name--withoreiness if accius. a tasks of this feelings of this in-their time, always be engights and cast, lielance a consequence may may farm"=--incuriousy? 

----- diversity: 0.5
----- Generating with seed: "and nutrition--it is one
problem--could"

and nutrition--it is one
problem--couldtoo delight as a buring, the consual. suffersarious, and time, or otherwise captlety appear of his
cannot have that
case, which haage of ree you infinite did to soul).=--his
objeiness: heavent
the higher trearry and richt haw homing it, to becoment, indeltion like the fact, agtily.
not become t

HBox(children=(IntProgress(value=0, description='8/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 8


----- diversity: 0.2
----- Generating with seed: "l goodness and perfection?--but what th"

l goodness and perfection?--but what thexcording. was no long as them apready
is every
life?
the formerly of the cause who misficult the generally enclosed,
as it
has leadful profusses the speciped for almost is see--requoiod on his another instrust: the value itsisting of generally
century this him i suduate as close and
nature, that as to his actifies,--the sage the grew, honours: or of all most restrentation of extence and of a claa

----- diversity: 0.5
----- Generating with seed: "l goodness and perfection?--but what th"

l goodness and perfection?--but what thenisk--and meanses; this conscratifies tomoracited for evilwarned and extention,
"notion moral intilitation about and into-singht of what
whese the
whole philosophy.--to
suffering sentiation would how former in sentiality bady or the incorsion of the unstimunce a
societion ludgicy and decetrave

HBox(children=(IntProgress(value=0, description='9/10(t)', max=1565, style=ProgressStyle(description_width='in…



----- Generating text after Epoch: 9


----- diversity: 0.2
----- Generating with seed: "hard wagner among them, about
whom one "

hard wagner among them, about
whom one reverse of rescrow prom even inastumpt tentour all change, higher!

58

=ends socies from our absolutive any midupe another time, to case and their singhmorests and in the hoad, and beoofulhid to the consequenic to good. and



229. who langest
of this anterrment!

south the volved, or paint, designable
as i comprehension.


257. aloutcority. he defucially
for a hold, a race to him anniar to be do

----- diversity: 0.5
----- Generating with seed: "hard wagner among them, about
whom one "

hard wagner among them, about
whom one self-reguish of it who happen all him of substaner, that mix and good" itself, from him, themselves_ can coduals with the gire such against the mest: we wish--alout the finegal develophical
diverse. a rest, were, it even oppen the
assertid--portical probeially how the interpretal the end, wifhe

[((1565, None),
  {'acc': 0.31989574432373047,
   'loss': 2.3566763401031494,
   'running_acc': 0.43968749046325684,
   'running_loss': 1.8951902389526367}),
 ((1565, None),
  {'acc': 0.47166523337364197,
   'loss': 1.7625666856765747,
   'running_acc': 0.4792187511920929,
   'running_loss': 1.7246180772781372}),
 ((1565, None),
  {'acc': 0.48412254452705383,
   'loss': 1.740668535232544,
   'running_acc': 0.46015623211860657,
   'running_loss': 1.813867449760437}),
 ((1565, None),
  {'acc': 0.49350419640541077,
   'loss': 1.6919574737548828,
   'running_acc': 0.5012499690055847,
   'running_loss': 1.6602834463119507}),
 ((1565, None),
  {'acc': 0.5192176699638367,
   'loss': 1.5908198356628418,
   'running_acc': 0.5307812094688416,
   'running_loss': 1.5567437410354614}),
 ((1565, None),
  {'acc': 0.5365930199623108,
   'loss': 1.5237412452697754,
   'running_acc': 0.5342187285423279,
   'running_loss': 1.533532977104187}),
 ((1565, None),
  {'acc': 0.5423348546028137,
   'loss': 1.50

 __How does the additional layer affect performance of the model? Provide your answer in the block below:__

**ANSWER: LSTM has three gates, input gate, forget gate and output gate. The cell unit determines which states should be left and which should be forgotten rather than like RNN just considers the most recent state, thus, LSTM has a long-term memory. Two layers LSTM is after stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. From the result, it shows the  additonal layer reduce the improvment speed of the performance, and the final accuracy of the single layer LSTM model(55.43%) is a little bit higher than the two LSTM layers model(55.07%). For the first solution, the accuracy of the 1st epoch is 39.11%, and the accuracy of the 2nd epoch is 50.56%. Whereas for the second solution, the accuracy of the 1st epoch is 31.99%, and the accuracy of the 2nd epoch is 44.17%。The reason for the additional layer reduce the performance of the model is the modle does not have a good embedding.**