import tensorflow as tf
tf.__version__

----

## 8.100 Text generation

We have seen how DL can analyse data - but can it also create?

2015: Google's DeepDream produced psychedelic images

2016: Prisma turns photos into 'paintings'

2016: [Sunspring](https://www.youtube.com/watch?v=LY7x2Ihqjmc), an experimental film with an LSTM-generated script

2000s: Music generated by neural networks

Not human replacement, but augmented intelligence

A different kind of intelligence

##### The idea

Artistic creation involves pattern recognition and technical skill - tedious work that can be mechanised

Perceptual modalities, language, artwork and music all have statistical structure and statistical structure can be learned by DL algorithms

DL algorithms can learn a statistical *latent space*

Sampling from the latent space 'creates' new artworks similar to the training data

The algorithm attaches no meaning to the process and the product - but we might

Potentially eliminates technical skill

=> enables free expression; separates art from craft

----

## 8.120 Generating sequence data

RNNs can generate new sequence data

- musical notes

- brushstrokes recorded on an iPad

- handwriting

- speech synthesis

- chatbot dialogue

- Google's Smart Reply (2016) - automatic generation of short replies to emails and text messages.


Train an ANN to predict the next token, or tokens, in a sequence using the previous tokens as input

Text = words or characters

Any model that predicts the probability of the next token given the previous tokens is known as a *language model*

A language model captures the latent space of language i.e. its statistical structure

1. Train model
2. Present an initial *conditioning* text string
3. Model predicts the next token(s)
4. Add the generated text to the input text
5. Go back to step 3.
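
The five steps above can be sketched with a deliberately tiny stand-in for the neural network. This is an illustration only: the bigram count table, the made-up `corpus` string and the helper names `train_bigram` and `generate` are assumptions for the example, not part of the LSTM implementation later in these notes.

```python
import numpy as np

def train_bigram(text):
    """Step 1: 'train' a toy model by counting which character follows which."""
    chars = sorted(set(text))
    index = {c: i for i, c in enumerate(chars)}
    # Start from ones (add-one smoothing) so every row is a valid distribution
    counts = np.ones((len(chars), len(chars)))
    for a, b in zip(text, text[1:]):
        counts[index[a], index[b]] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)
    return chars, index, probs

def generate(seed, n_chars, chars, index, probs, rng):
    """Steps 2-5: predict a next character, append it, and repeat."""
    generated = seed                                   # step 2: conditioning text
    for _ in range(n_chars):
        next_probs = probs[index[generated[-1]]]       # step 3: predict next token
        next_char = chars[rng.choice(len(chars), p=next_probs)]
        generated += next_char                         # step 4: extend the input
    return generated                                   # loop = step 5

corpus = "the cat sat on the mat. the rat sat on the cat. "
chars, index, probs = train_bigram(corpus)
rng = np.random.default_rng(0)
sample_text = generate("th", 40, chars, index, probs, rng)
print(sample_text)
```

The toy model conditions on the last character only. An RNN conditions on the whole preceding window, but the generation loop has the same shape.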

----

## 8.130 The sampling strategy

1. *Greedy sampling.* Select the most probable token - repetitive and unrealistic

2. *Stochastic sampling.* Sample from the probability distribution of the next token

We can sample directly from the softmax output, which is a 'probability' distribution over tokens

But - this gives no control over the amount of randomness

*Uniform sampling.* Each token has the same probability - maximum randomness

*Intermediate randomness.* Controlled by the softmax temperature - this is where we expect to find the more interesting, creative outputs
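
To make the three strategies concrete, here is a minimal sketch that draws many tokens from one fixed distribution. The four-token vocabulary and the probabilities are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["a", "b", "c", "d"]
probs = np.array([0.6, 0.25, 0.1, 0.05])   # made-up softmax output

def greedy(probs):
    # Always pick the most probable token
    return int(np.argmax(probs))

def stochastic(probs, rng):
    # Pick token i with probability probs[i]
    return int(rng.choice(len(probs), p=probs))

def uniform(probs, rng):
    # Ignore the model: every token is equally likely
    return int(rng.integers(len(probs)))

n = 10_000
greedy_draws = [greedy(probs) for _ in range(n)]
stochastic_draws = [stochastic(probs, rng) for _ in range(n)]
uniform_draws = [uniform(probs, rng) for _ in range(n)]

def frequencies(draws):
    # Empirical frequency of each token index
    return np.bincount(draws, minlength=len(tokens)) / len(draws)

print("greedy:    ", frequencies(greedy_draws))
print("stochastic:", frequencies(stochastic_draws))
print("uniform:   ", frequencies(uniform_draws))
```

Greedy sampling only ever emits one token, stochastic sampling reproduces the model's distribution, and uniform sampling discards it entirely.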

###### Softmax temperature

Remember, the softmax output is 

$
p_i = \frac{1}{N}e^{x_i}
$

where $N = \sum e^{x_i}$ 

The parameterised softmax distribution is 

$
q_i = \frac{1}{N}e^\frac{x_i}{T}
$

where $N = \sum e^\frac{x_i}{T}$ 

The parameterised softmax distribution can be computed from the softmax output $p_i$ like this:

1. Take logs and divide by $T$: $\frac{\log(p_i)}{T} = \frac{x_i}{T} - \frac{\log N}{T}$ (with $N = \sum e^{x_i}$)
2. Re-exponentiate: $e^{\frac{\log(p_i)}{T}} = c(T) e^{\frac{x_i}{T}} \text{ where } c(T) = N^{-\frac{1}{T}} \text{ is a temperature dependent constant}$
3. Find new normalisation: $N' = \sum c(T) e^{\frac{x_i}{T}}$
4. Temperatured softmax: $q_i = \frac{1}{N'} c(T) e^{\frac{x_i}{T}} = \frac{ e^{\frac{x_i}{T}} }{\sum e^{\frac{x_i}{T}}} $
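
The derivation says that reweighting the probabilities (log, divide by $T$, re-exponentiate, renormalise) gives the same result as applying softmax to the scaled logits $x_i / T$. A quick numerical check, using made-up logits and an arbitrary temperature:

```python
import numpy as np

def softmax(x):
    # Subtracting the maximum does not change the result but avoids overflow
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0])   # made-up values of x_i
T = 0.5

p = softmax(logits)                        # ordinary softmax output p_i

# Route 1: softmax of the scaled logits x_i / T
q_from_logits = softmax(logits / T)

# Route 2: steps 1-4 above, starting from the probabilities p_i
reweighted = np.exp(np.log(p) / T)
q_from_probs = reweighted / reweighted.sum()

print(q_from_logits)
print(q_from_probs)
```

Route 2 matters in practice because a trained network typically exposes only its softmax output, not the logits.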

Limiting cases

1. $T \rightarrow 0$ 

The largest term, $\max\left( \frac{1}{N}e^{\frac{x_i}{T}}\right)$, dominates, so $q$ concentrates on the most probable token - greedy sampling

2. $T \rightarrow \infty$ 

$e^{\frac{x_i}{T}} \rightarrow 1$ so $q_i \rightarrow \frac{1}{M}$ where $M$ is the number of softmax outputs - a uniform distribution
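
Both limits can be seen numerically. The sketch below uses made-up logits and a hypothetical helper `temperature_softmax`; very small and very large temperatures stand in for $T \rightarrow 0$ and $T \rightarrow \infty$.

```python
import numpy as np

def temperature_softmax(logits, T):
    # Softmax of logits / T, shifted by the maximum for numerical stability
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1, -1.0]   # made-up values of x_i

for T in [0.01, 0.5, 1.0, 5.0, 1000.0]:
    print(f"T = {T:>7}:", np.round(temperature_softmax(logits, T), 3))

cold = temperature_softmax(logits, 0.01)     # approximates T -> 0
hot = temperature_softmax(logits, 1000.0)    # approximates T -> infinity
```

At the cold end almost all probability mass sits on the largest logit, while at the hot end each of the $M = 4$ outputs approaches $\frac{1}{4}$.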

----

## 8.140 Implementing character-level LSTM text generation

A large training set is required for a good language model

Any large text file will do, such as Lord of the Rings, or even a set of texts such as Wikipedia

We will model the writings of Nietzsche, a late 19th-century German philosopher



##### Preparing the data

Downloading the corpus and converting to lowercase

In [2]:
import tensorflow.keras as keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Corpus length: 600893


Extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of shape `(sequences, maxlen, unique_characters)`

Prepare an array `y` containing the corresponding targets: the one-hot encoded characters that come right after each extracted sequence

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the built-in `bool`: the `np.bool` alias was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
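
To see what the windowing and one-hot encoding produce, the same procedure can be run on a toy string. The string and the small `maxlen` and `step` values here are chosen purely for illustration.

```python
import numpy as np

text = "hello world"
maxlen, step = 4, 3

# Overlapping windows and the character that follows each one
starts = range(0, len(text) - maxlen, step)
windows = [text[i:i + maxlen] for i in starts]
targets = [text[i + maxlen] for i in starts]

chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

# One-hot encode: x is (sequences, maxlen, unique_characters), y is (sequences, unique_characters)
x = np.zeros((len(windows), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(windows), len(chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, c in enumerate(window):
        x[i, t, char_indices[c]] = True
    y[i, char_indices[targets[i]]] = True

for window, target in zip(windows, targets):
    print(repr(window), "->", repr(target))
print(x.shape, y.shape)
```

Each timestep of each window has exactly one `True` entry, and each target row has exactly one `True` entry marking the character to be predicted.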


###### Building the network

A single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters

(1D convnets as an alternative)

In [4]:
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

Targets are one-hot encoded => use `categorical_crossentropy` loss:

In [5]:
# `learning_rate` replaces the deprecated `lr` argument
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
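
With one-hot targets, categorical cross-entropy reduces to the negative log of the probability the model assigns to the correct character. The NumPy sketch below illustrates this with made-up predictions; it is a hand-written stand-in for intuition, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); with one-hot y_true only the true class contributes
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0, 0, 1, 0]], dtype=float)        # true class is index 2
confident = np.array([[0.05, 0.05, 0.85, 0.05]])      # made-up prediction
unsure = np.array([[0.25, 0.25, 0.25, 0.25]])         # uniform guess

loss_confident = categorical_crossentropy(y_true, confident)[0]
loss_unsure = categorical_crossentropy(y_true, unsure)[0]
print("confident:", loss_confident)
print("unsure:   ", loss_unsure)

# Reading a loss value: exp(-loss) is the geometric-mean probability
# given to the correct character; a uniform guess over 57 characters scores log(57)
print("uniform baseline for 57 characters:", np.log(57))
print("exp(-1.4088):", np.exp(-1.4088))
```

On this reading, a training loss of about 1.41 corresponds to the model giving the correct next character a typical probability of roughly 0.24, against about 0.018 for a uniform guess over 57 characters.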

##### Training and sampling the language model

The sampling function:


In [6]:
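def sample(preds, temperature=1.0):
    # Work in float64 to limit rounding error when renormalising
    preds = np.asarray(preds).astype('float64')
    # Reweight the softmax output: log(p) / T, then re-exponentiate
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    # Renormalise so the reweighted values sum to 1
    preds = exp_preds / np.sum(exp_preds)
    # Draw one token: a one-hot vector of shape (1, len(preds))
    probas = np.random.multinomial(1, preds, 1)
    # Index of the sampled token
    return np.argmax(probas)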
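
The function above can emit a warning when a predicted probability is exactly zero, because `np.log(0)` is `-inf`. One possible variant is sketched below. It is an alternative under stated assumptions rather than the version used in this notebook: the name `sample_stable` and the optional `rng` argument are hypothetical, and it assumes at least one entry of `preds` is positive.

```python
import numpy as np

def sample_stable(preds, temperature=1.0, rng=None):
    """Temperature sampling that tolerates zero probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds, dtype=np.float64)
    # log(0) = -inf is acceptable here: it becomes probability 0 again below
    with np.errstate(divide='ignore'):
        logits = np.log(preds) / temperature
    # Shift by the maximum so the largest exponent is exp(0) = 1
    logits -= np.max(logits)
    weights = np.exp(logits)
    probs = weights / weights.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)

# A token with probability 0 is never sampled
draws_with_zero = [sample_stable([0.5, 0.5, 0.0], 1.0, rng) for _ in range(200)]

# A very low temperature behaves like greedy sampling
draws_cold = [sample_stable([0.7, 0.2, 0.1], 0.05, rng) for _ in range(200)]

print(sorted(set(draws_with_zero)), sorted(set(draws_cold)))
```

Passing a seeded generator makes the generated text reproducible, which helps when comparing temperatures.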

Train and generate text using a range of different temperatures at each epoch end

=> monitor convergence and the impact of temperature

In [7]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
--- Generating with seed: "ngs is wholly
superfluous. it is simply the result of opinio"
------ temperature: 0.2
ngs is wholly
superfluous. it is simply the result of opinion of the the enture the men an in the men the the instinct the externt the mankind the the men the the extence of the exterity of the men an the the still the the consequence of the the are of the men the have a men of the menting the men the the the menter the instinct the still the moral the finder the the the the the prosince of the the there are the the externce of the mankind there are a men 
------ temperature: 0.5
e there are the the externce of the mankind there are a men an the extence of the makent there is liked the wark.

13. there the inventer of one the instill there an to not one is the spirit. the himself, and the prose them and chilosophy and the concemition of the wills the enstance of the act of the an and the extract the great self-act of the religion of his to the mancers and the inthres

KeyboardInterrupt: 

epoch 7
1565/1565 [==============================] - 154s 98ms/step - loss: 1.4088
--- Generating with seed: "o wield the sceptre.
the christian pessimists of practice, h"



------ temperature: 0.2
o wield the sceptre.
the christian pessimists of practice, he is the same same sense of the fact of the other conduct and the same and some and some one of the spirit and all the same stronge and also the superiority in the conduct and some the same sufferess of the same stronger and some states and some of the spirit and assumed the states and proses and stronges of the ancient and sense of the spirit and the are soul and also its conscience of the proves



------ temperature: 0.5
pirit and the are soul and also its conscience of the provessy in the form is be an incertain promasing who has not been lack and formul, and man and such a have the fear, the states in the sense and present morality of the processe of its own who has been assures as a music fore all the other wastle of the education of the still and menit one long to have proper the origin of the heposhing the precisely with the assumerance, which the old pated between fo



------ temperature: 1.0
ecisely with the assumerance, which the old pated between fortem last exercice necessaried--yes original foregnoth and over-pride of shame methally forehated truves of an act oftiness sas.
in feels a man a good perhaps sort and
as modes modes in the cunderstand, and be human to an earistwacher. the be again allow it god ohh dey-sopether of themselves from the cails:
here is when he let be the
enhuring of person one more tiral soul more contemption of the
e


------ temperature: 1.2
ring of person one more tiral soul more contemption of the
evloughtiitions difficed and english them to
could have is every bood higher objection is and  thever keen to the e?fings
togeate, more
typancy is tires evil, a "morality-"antilation as then, impyes yerefdle, perhope wide gro: constants when it . hent consiturable mequerists, "therought, all, undity of theselver
wkbonaus, involboded ley ralitijakilement in
referonge of the
represeffhe, regard on th

Low temperature: extremely repetitive and predictable text, with realistic local structure - almost all words (= a local pattern of characters) are real English words 

Intermediate temperatures: more interesting, surprising, even creative text - sometimes completely new but plausible words are invented 


High temperature: local structure breaks down and most words look random

0.5 is the most interesting temperature in this case

A bigger model, trained for longer and on more data, would achieve more coherent and realistic text

But don't expect meaning!

The network is merely sampling a statistical model of which characters follow other characters

----

## 8.150 Wrapping-up

* we can generate discrete sequence data by training a model to predict the next token(s) given previous tokens
* in the case of text, such a model is called a _language model_ 

* based on either words or characters


* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness
* => _softmax temperature_

----