# Discussion Question:

Utilitarianism is the ethical stance that tries to maximize the total utility experienced by everyone in the world - the sum of everyone's utility functions. If we wanted a robot or AI to behave ethically, we could try to program utilitarianism into how it makes choices, so that it predicts the results of its different actions and performs the action that results in a world with the most utility for everyone. Identify a problem or practical difficulty with this approach - either a problem with utilitarianism in general or a problem with executing it effectively on a robot.

# LSTMs

This exercise is based on the example at: https://keras.io/examples/generative/lstm_character_level_text_generation/.  It also borrows some ideas from the code in *Learning Deep Learning* by Magnus Ekman, which appears in the lecture slides.

We're going to complete the missing parts of an LSTM that predicts the next characters in some text.  This is the same task as the LSTM in lecture, but we take a different approach in places.  Our demo uses Mary Shelley's Frankenstein (via Project Gutenberg, www.gutenberg.org) as a training corpus.  This is rather short, but has the advantage of not taking too long to train during section.

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import random
import io

In [2]:
# Skip this cell if not working in Google Colab
from google.colab import files

uploaded = files.upload() # pick frankenstein.txt (the file opened in the next cell)

In [3]:
with io.open("frankenstein.txt", encoding="utf-8") as f:
    text = f.read().lower()
text = text.replace("\n", " ")  # Replace newline characters with spaces for nicer display
print("Corpus length:", len(text))

Corpus length: 441033


In [4]:
# Make lookup tables, character<->index
chars = sorted(list(set(text)))
print("Total chars:", len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

Total chars: 57


You'll now write 2 functions to practice turning a time series into training data for an LSTM.

**make_sequences(text, seqlen, step)**:  This function should return two lists.  The first list should contain subsequences of *text*, each consisting of *seqlen* characters that were contiguous in the original string.  Each subsequence should start *step* characters after the start of the previous one.  So make_sequences('example!', 3, 2) should return ['exa', 'amp', 'ple'] as the first return value.  For each string in the first return value, the corresponding entry in the second return value should be the character that immediately follows it:  ['m', 'l', '!'] in the example.  (Stop iterating through the text when a subsequence would have no following character to put in the second list.)

**to_one_hot(seqs, nexts, char_indices)**: Returns two NumPy arrays, X and y.  X is a 3D array where X[i,j,:] is a one-hot encoding of the jth character of sequence i (so 1 at the right character index and 0 elsewhere). y is a 2D array where y[i,:] is a one-hot encoding of nexts[i].  char_indices is assumed to be the dictionary of the same name created earlier.  (For efficiency, pass dtype=bool to your arrays upon creation.)

In [19]:
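def make_sequences(text, seqlen, step):
  # TODO
  sentences = []
  next_chars = []
  i = 0  # start index of the current subsequence
  # Stop once text[i + seqlen] (the next character) would be out of range
  while i + seqlen < len(text):
    sentences.append(text[i : i+seqlen])
    next_chars.append(text[i + seqlen])
    i += step

  return sentences, next_chars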

In [20]:
make_sequences('example!', 3, 2)

(['exa', 'amp', 'ple'], ['m', 'l', '!'])
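The `while` loop above can also be written with `range`, because the valid start positions are exactly `0, step, 2*step, ...` below `len(text) - seqlen`. This is a minimal sketch; the name `make_sequences_compact` is hypothetical and not part of the exercise.

```python
def make_sequences_compact(text, seqlen, step):
    # Start positions 0, step, 2*step, ... for every window that still
    # has a following character (i + seqlen < len(text)).
    starts = range(0, len(text) - seqlen, step)
    seqs = [text[i : i + seqlen] for i in starts]
    nexts = [text[i + seqlen] for i in starts]
    return seqs, nexts

print(make_sequences_compact("example!", 3, 2))
# (['exa', 'amp', 'ple'], ['m', 'l', '!'])
```

If the text is no longer than `seqlen`, the range is empty and both lists come back empty, matching the loop version.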

In [27]:
def to_one_hot(seqs, nexts, char_indices):
  # TODO
  X = np.zeros(shape=(len(seqs), len(seqs[0]), len(char_indices)), dtype=bool)
  y = np.zeros(shape=(len(seqs), len(char_indices)), dtype=bool)
  for i, seq in enumerate(seqs):
    # Set the entry for each character of sequence i
    for j, char in enumerate(seq):
      X[i, j, char_indices[char]] = True
    # Set the entry for the character that follows sequence i
    y[i, char_indices[nexts[i]]] = True
  return X, y

In [28]:
# Create a tiny dict for testing purposes
test_chars = sorted(list(set('example!')))
test_char_indices = dict((c, i) for i, c in enumerate(test_chars))
seqs, nexts = make_sequences('example!', 3, 2)
to_one_hot(seqs, nexts, test_char_indices)
# Examine the one-hot encodings ... do they make sense for this example?

(array([[[False, False,  True, False, False, False, False],
         [False, False, False, False, False, False,  True],
         [False,  True, False, False, False, False, False]],
 
        [[False,  True, False, False, False, False, False],
         [False, False, False, False,  True, False, False],
         [False, False, False, False, False,  True, False]],
 
        [[False, False, False, False, False,  True, False],
         [False, False, False,  True, False, False, False],
         [False, False,  True, False, False, False, False]]]),
 array([[False, False, False, False,  True, False, False],
        [False, False, False,  True, False, False, False],
        [ True, False, False, False, False, False, False]]))
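The nested loops in `to_one_hot` can also be replaced by NumPy indexing: row `k` of an identity matrix is the one-hot vector for index `k`, so indexing `np.eye(...)` with an array of integer character codes expands every code into a one-hot row at once. This sketch assumes at least one sequence and that all sequences have the same length (true for the output of `make_sequences`); `to_one_hot_eye` is a hypothetical name.

```python
import numpy as np

def to_one_hot_eye(seqs, nexts, char_indices):
    vocab = len(char_indices)
    # Integer-encode: idx has shape (num_seqs, seqlen), next_idx has shape (num_seqs,)
    idx = np.array([[char_indices[c] for c in s] for s in seqs])
    next_idx = np.array([char_indices[c] for c in nexts])
    # Indexing the boolean identity matrix with integer arrays turns each
    # index into its one-hot row, giving shapes (num_seqs, seqlen, vocab) and (num_seqs, vocab).
    eye = np.eye(vocab, dtype=bool)
    return eye[idx], eye[next_idx]

test_chars = sorted(set("example!"))
test_char_indices = {c: i for i, c in enumerate(test_chars)}
X_small, y_small = to_one_hot_eye(["exa", "amp", "ple"], ["m", "l", "!"], test_char_indices)
print(X_small.shape, y_small.shape)  # (3, 3, 7) (3, 7)
```

The result should match the loop version element for element; both allocate the same boolean arrays, so the gain is speed rather than memory.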

Once you're satisfied with your functions, you can proceed to create the training data from the Frankenstein text.

In [29]:
seqlen = 40
step = 3
seqs, nexts = make_sequences(text, seqlen, step)
X, y = to_one_hot(seqs, nexts, char_indices)
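It helps to know how large `X` is before training. Since `make_sequences` keeps start positions `i = 0, step, 2*step, ...` while `i + seqlen < len(text)`, a text of length $N > seqlen$ yields $\lfloor (N - seqlen - 1)/step \rfloor + 1$ sequences. The sketch below plugs in the corpus length (441033) and character count (57) printed earlier; with a different text file these numbers will change. `num_sequences` is a hypothetical helper.

```python
def num_sequences(text_len, seqlen, step):
    # Start positions i = 0, step, 2*step, ... with i + seqlen < text_len,
    # i.e. i <= text_len - seqlen - 1.
    if text_len <= seqlen:
        return 0
    return (text_len - seqlen - 1) // step + 1

corpus_len, n_chars = 441033, 57  # values printed by the cells above
n = num_sequences(corpus_len, 40, 3)
print(n)  # 146998
print("X shape:", (n, 40, n_chars), "y shape:", (n, n_chars))
# NumPy stores each bool in one byte, so X takes n * 40 * 57 bytes (roughly 335 MB).
print(n * 40 * n_chars, "bytes for X")
```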

Once the data is in the right format, there's not much to creating a basic LSTM that can train from it and make predictions.  We've omitted just the last layer, the output layer, from the LSTM-based neural network below.  Can you figure out what it should be?  Hint:  the output is a choice of letter, again in the one-hot encoding set up earlier.

In [31]:
model = keras.Sequential(
    [
        keras.Input(shape=(seqlen, len(chars))),
        layers.LSTM(128),
        # Output layer: one unit per character; softmax turns the scores into a probability distribution over the next character
        layers.Dense(len(char_indices), activation='softmax')
    ]
)
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)
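The output layer produces one score per character, softmax turns those scores into a probability distribution, and `categorical_crossentropy` compares that distribution with the one-hot row of `y`. Below is a simplified NumPy version of those two computations for a single example; the random logits and the target index 5 are made up for illustration, and Keras's own implementation may differ in numerical details.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow in exp.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def categorical_crossentropy(one_hot_target, probs):
    # With a one-hot target only the true character's term survives,
    # so this equals -log(probability assigned to the true next character).
    return -np.sum(one_hot_target * np.log(probs))

vocab = 57                            # len(chars) for the Frankenstein corpus
rng = np.random.default_rng(0)
logits = rng.normal(size=vocab)       # made-up stand-in for the Dense layer's pre-softmax output
probs = softmax(logits)               # one probability per character
target = np.zeros(vocab, dtype=bool)  # one-hot row, like y[i, :]
target[5] = True                      # hypothetical index of the true next character
print(probs.shape, probs.sum())
print(categorical_crossentropy(target, probs), -np.log(probs[5]))
```

The two numbers on the last line agree, which is why minimizing this loss pushes up the probability of the correct next character.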


Before we train the model, we'll delve a little bit into generative models, since the model will generate sample predicted sequences with each epoch.

Lecture covered sequence generation with an LSTM using beam search, which creates several possible continuations of the text and chooses the most likely overall.  A different approach is to sample each next letter randomly, as a function of the activation strength for that character.  *How* randomly is a matter of taste and varies from application to application - sometimes you want a generative model to be a little surprising, and sometimes you want it to be as unsurprising as possible.
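To make the contrast with beam search concrete, here is a minimal sketch of greedy decoding and beam search over a hypothetical toy model: `toy_next_probs` looks up next-character probabilities in a small hand-written table and stands in for `model.predict`. Beam search keeps the `beam_width` most probable partial sequences (by summed log-probability) at each step instead of committing to the single best next character.

```python
import math

# Hypothetical toy "model": probability of the next character given the last one.
# It stands in for model.predict; the numbers are invented for illustration.
TOY_TABLE = {
    "s": {"a": 0.55, "o": 0.45},
    "a": {"d": 0.5, "t": 0.5},
    "o": {"n": 1.0},
}

def toy_next_probs(prefix):
    return TOY_TABLE[prefix[-1]]

def greedy(next_probs, seed, length):
    # Always append the single most probable next character.
    seq = seed
    for _ in range(length):
        probs = next_probs(seq)
        seq += max(probs, key=probs.get)
    return seq

def beam_search(next_probs, seed, length, beam_width=2):
    # Each beam is (summed log-probability, text so far).
    beams = [(0.0, seed)]
    for _ in range(length):
        candidates = []
        for logp, seq in beams:
            for ch, p in next_probs(seq).items():
                if p > 0:
                    candidates.append((logp + math.log(p), seq + ch))
        # Keep only the beam_width most probable partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]

print(greedy(toy_next_probs, "s", 2))       # sad
print(beam_search(toy_next_probs, "s", 2))  # son
```

Greedy decoding commits to `'a'` (0.55) and ends with a sequence of probability 0.55 × 0.5 = 0.275, while beam search with width 2 keeps `'so'` alive and finds `'son'` with probability 0.45.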

Take a look at the code for sample(), below.  It implements a common way of reweighting a probability distribution with a temperature parameter, T, before drawing one sample from the resulting multinomial distribution.  What is the formula for the probability of character i, as a function of preds(i) and temperature T?  (Try to simplify if you can.)

In [32]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


**The probability of picking character $i$ is $P(i) = \frac{e^{\log(preds(i))/T}}{\sum_j e^{\log(preds(j))/T}} = \frac{preds(i)^{1/T}}{\sum_j preds(j)^{1/T}}$**

Try the sample() function on the array [0.25, 0.25, 0.5] for very large T (100) and very small T (0.1).  What do high and low temperature do?

In [47]:
# Try rerunning this cell
sample_array = [0.25, 0.25, 0.5]
print(sample(sample_array, 100))
print(sample(sample_array, 0.1))

2
2


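**High temperature flattens the distribution: at T = 100 the three indices become almost equally likely, so index 0 or 1 is picked about two-thirds of the time. Low temperature sharpens it: at T = 0.1 nearly all of the probability goes to the most likely index (2), so sampling behaves almost like argmax. T = 1 leaves the model's probabilities unchanged.**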
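The effect is easier to see by printing the reweighted distribution directly instead of single random draws. `reweight` is a hypothetical helper that does the same arithmetic as `sample()` but stops before `np.random.multinomial`; it raises each probability to the power $1/T$ and renormalises, which is equivalent to the code's exp-of-log form.

```python
import numpy as np

def reweight(preds, temperature):
    # Same arithmetic as sample() without the random draw:
    # raise each probability to the power 1/T, then renormalise.
    p = np.asarray(preds, dtype="float64") ** (1.0 / temperature)
    return p / p.sum()

sample_array = [0.25, 0.25, 0.5]
for T in (100, 1.0, 0.1):
    print(T, np.round(reweight(sample_array, T), 4))
```

At T = 100 the three entries come out nearly equal (about 0.33 each), while at T = 0.1 index 2 receives more than 99% of the probability.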

Now go ahead and train the LSTM.  You may want to do this in Google Colab with GPU acceleration (Edit->Notebook settings), unless you have a fast GPU of your own.

In [34]:
epochs = 20
batch_size = 128

for epoch in range(epochs):
    model.fit(X, y, batch_size=batch_size, epochs=1)
    print()
    print("Generating text after epoch: %d" % epoch)

    start_index = random.randint(0, len(text) - seqlen - 1)
    for temp in [0.5, 1.0]:
        print("...Temperature:", temp)

        generated = ""
        sentence = text[start_index : start_index + seqlen]
        print('...Generating with seed: "' + sentence + '"')

        for i in range(50):
            x_pred = np.zeros((1, seqlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.0
            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, temp)
            next_char = indices_char[next_index]
            sentence = sentence[1:] + next_char
            generated += next_char

        print("...Generated: ", generated)
        print()



Generating text after epoch: 0
...Temperature: 0.5
...Generating with seed: "th almost no restrictions whatsoever.  y"
...Generated:  ou are the seemet of her ood the accure of the our

...Temperature: 1.0
...Generating with seed: "th almost no restrictions whatsoever.  y"
...Generated:  e une i was ecelizingeng-urhoned beadayd a "und sh


Generating text after epoch: 1
...Temperature: 0.5
...Generating with seed: "hen, but i have endured misery which not"
...Generated:  hed, and the elizabeth of the ever have many strus

...Temperature: 1.0
...Generating with seed: "hen, but i have endured misery which not"
...Generated:   angathion; gutixy om surpolding this truen; that 


Generating text after epoch: 2
...Temperature: 0.5
...Generating with seed: "hy and compassion; he drew a chair close"
...Generated:   sected a more, the creatures of his creature have

...Temperature: 1.0
...Generating with seed: "hy and compassion; he drew a chair close"
...Generated:   ond ulminted mesited the 

You might still be getting some nonsense after the final epoch, although the output should be noticeably better than at the start.  This training setup was chosen with speed as the foremost concern, since LSTMs can take a long time to train.  You can provide a longer training corpus; Project Gutenberg has much longer books available in plain text.  You can run for more epochs; Chollet, the original author of this example, says 20 is a bare minimum and 40 is recommended.  Or you can augment the LSTM architecture with another layer; there's an example of this in the lecture slides.  (Be sure to set return_sequences to True for the lower LSTM layer.)
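For the extra-layer option, one possible two-layer variant is sketched below. It is not necessarily the architecture from the lecture slides; the second layer's 128 units are an assumption, and `seqlen` and `num_chars` are hard-coded to the values used above (`num_chars` plays the role of `len(chars)`).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

seqlen, num_chars = 40, 57  # values from the cells above; num_chars = len(chars)

stacked_model = keras.Sequential(
    [
        keras.Input(shape=(seqlen, num_chars)),
        # return_sequences=True makes this layer emit its hidden state at every
        # time step, shape (batch, seqlen, 128), which the next LSTM needs as input.
        layers.LSTM(128, return_sequences=True),
        # The top LSTM returns only its final hidden state: shape (batch, 128).
        layers.LSTM(128),
        layers.Dense(num_chars, activation="softmax"),
    ]
)
stacked_model.compile(
    loss="categorical_crossentropy",
    optimizer=keras.optimizers.RMSprop(learning_rate=0.01),
)

# Sanity check on an all-zero input: one probability per character.
probs = stacked_model.predict(np.zeros((1, seqlen, num_chars)), verbose=0)
print(probs.shape)  # (1, 57)
```

Without `return_sequences=True` on the lower layer, the upper LSTM would receive a 2D tensor instead of a sequence and the model would fail to build.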

Try one of these approaches, and compare your results with a neighbor who chose a different one.