# Chapter 8: Generative deep learning

In this chapter, we'll explore from various angles the potential of deep learning to augment artistic creation. We'll review sequence data generation (which can be used to generate text or music), DeepDream, and image generation using both variational autoencoders and generative adversarial networks.

## 8.1 Text generation with LSTM

In this section, we'll explore how recurrent neural networks can be used to generate sequence data.

Sequence data generation is in no way limited to artistic content generation. It has been successfully applied to speech synthesis and to dialogue generation for chatbots.

### 8.1.1 A brief history of generative recurrent networks

### 8.1.2 How do you generate sequence data?

The universal way to generate sequence data in deep learning is to train a network to predict the next token or next few tokens in a sequence, using the previous tokens as input. *Tokens* are typically words or characters, and any network that can model the probability of the next token given the previous ones is called a *language model*. A language model captures the *latent space* of language: its statistical structure.

Once you have such a trained model, you can *sample* from it: you feed it an initial string of text, ask it to generate the next character or next word, add the generated output back to the input data, and repeat the process many times. This loop allows you to generate sequences of arbitrary length that reflect the structure of the data on which the model was trained: sequences that look *almost* like human-written sentences.
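
To make the loop concrete before bringing in an LSTM, here is a minimal sketch that uses a character bigram table as a stand-in language model. The toy corpus, the helper names `build_bigram_model` and `generate`, and the add-one smoothing are all assumptions made for illustration; they are not part of the chapter's example.

```python
import numpy as np

def build_bigram_model(corpus):
    """Counts character bigrams and returns (chars, index, probs).

    probs[i, j] estimates P(next char = chars[j] | previous char = chars[i]).
    """
    chars = sorted(set(corpus))
    index = {c: i for i, c in enumerate(chars)}
    # Start every count at 1 (add-one smoothing) so no row is all zeros.
    counts = np.ones((len(chars), len(chars)))
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[index[prev], index[nxt]] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)
    return chars, index, probs

def generate(seed, length, chars, index, probs, rng):
    """Repeats: predict a distribution, sample from it, append the result."""
    text = seed
    for _ in range(length):
        next_probs = probs[index[text[-1]]]         # condition on the last character
        next_index = rng.choice(len(chars), p=next_probs)
        text += chars[next_index]                   # feed the output back in
    return text

# Hypothetical toy corpus, repeated so the counts are not too sparse.
corpus = "the quick brown fox jumps over the lazy dog. " * 20
chars, index, probs = build_bigram_model(corpus)
rng = np.random.default_rng(0)
generated = generate("the ", 40, chars, index, probs, rng)
print(generated)
```

A bigram table only looks one character back, so its output is far less coherent than what a recurrent network can produce, but the generation loop has the same shape.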

In the example we present in this section, you'll take an LSTM layer, feed it strings of *N* characters extracted from a text corpus, and train it to predict character *N+1*. The output of the model will be a softmax over all possible characters: a probability distribution for the next character. This LSTM is called a *character-level neural language model*.

<img src='image/fig81.PNG' width='550'>

### 8.1.3 The importance of the sampling strategy

When generating text, the way you choose the next character is crucially important. A naive approach is *greedy sampling*, consisting of always choosing the most likely next character. A more interesting approach makes slightly more surprising choices: it introduces randomness in the sampling process, by sampling from the probability distribution for the next character. This is called ***stochastic sampling***.
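
As a rough illustration of the difference, the sketch below applies both strategies to a made-up next-character distribution. The four-character alphabet and the probability values are assumptions chosen for the example, not model outputs.

```python
import numpy as np

# Hypothetical alphabet and next-character probabilities (not from a real model).
chars = ['a', 'b', 'c', 'd']
next_char_probs = np.array([0.5, 0.3, 0.15, 0.05])

# Greedy sampling: always pick the most likely character.
greedy_index = int(np.argmax(next_char_probs))

# Stochastic sampling: draw characters in proportion to their probability.
rng = np.random.default_rng(42)
stochastic_indices = rng.choice(len(chars), size=1000, p=next_char_probs)
frequencies = np.bincount(stochastic_indices, minlength=len(chars)) / 1000

print('greedy pick:', chars[greedy_index])
print('stochastic frequencies:', dict(zip(chars, frequencies)))
```

Greedy sampling returns the same character every time for a given distribution, whereas the stochastic draws approximately follow the distribution, so less likely characters still appear occasionally.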

Sampling probabilistically from the softmax output of the model is neat: it allows even unlikely characters to be sampled some of the time, generating more interesting-looking sentences and sometimes showing creativity by coming up with new, realistic-sounding words that didn't occur in the training data. But there's one issue with this strategy: it doesn't offer a way to *control the amount of randomness* in the sampling process.

When sampling from generative models, it's always good to explore different amounts of randomness in the generation process. Less entropy will give the generated sequences a more predictable structure, whereas more entropy will result in more surprising and creative sequences.

In order to control the amount of stochasticity in the sampling process, we'll introduce a parameter called the *softmax temperature* that characterizes the entropy of the probability distribution used for sampling: it characterizes how surprising or predictable the choice of the next character will be.

#### Reweighting a probability distribution to a different temperature

In [0]:
import numpy as np

# `original_distribution` is a 1D Numpy array of probability values
# that must sum to 1.
# `temperature` is a factor quantifying the entropy of the output distribution.
def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    # Returns a reweighted version of the original distribution.
    return distribution / np.sum(distribution)

<img src='image/fig82.PNG' width='500'>

> Higher temperatures result in sampling distributions of higher entropy that will generate more surprising and unstructured generated data, whereas a lower temperature will result in less randomness and much more predictable generated data.
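
One way to check this relationship numerically is to measure the Shannon entropy of the reweighted distribution at several temperatures. The sketch below repeats `reweight_distribution` so that it runs on its own; the `entropy` helper and the example distribution are assumptions added for illustration.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # Equivalent to raising each probability to the power 1/temperature
    # and renormalizing.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(distribution):
    """Shannon entropy in nats; assumes strictly positive probabilities."""
    return float(-np.sum(distribution * np.log(distribution)))

# Hypothetical distribution over four characters.
original = np.array([0.5, 0.3, 0.15, 0.05])

temperatures = [0.2, 0.5, 1.0, 1.2]
entropies = []
for temperature in temperatures:
    reweighted = reweight_distribution(original, temperature)
    entropies.append(entropy(reweighted))
    print(temperature, np.round(reweighted, 3), round(entropies[-1], 3))
```

A temperature of 1.0 leaves the distribution unchanged, values below 1.0 concentrate probability on the most likely characters, and values above 1.0 flatten the distribution.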

### 8.1.4 Implementing character-level LSTM text generation

Let's put these ideas into practice in a Keras implementation. In this example, you'll use some of the writings of Nietzsche, the late-nineteenth-century German philosopher. The language model you'll learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the English language.

#### Downloading and parsing the initial text file

In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
        'nietzsche.txt',
        origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


#### Vectorizing sequences of characters

Next, you'll extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them in a 3D Numpy array `x` of shape `(sequences, maxlen, unique_characters)`. Simultaneously, you'll prepare an array `y` containing the corresponding targets: the one-hot-encoded characters that come after each extracted sequence.

In [3]:
# extract sequences of 60 characters
maxlen = 60

# sample a new sequence every three characters
step = 3

# holds the extracted sequences
sentences = []
# holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text)-maxlen, step):
    sentences.append(text[i:i+maxlen])
    next_chars.append(text[i+maxlen])
print('Number of sequences:', len(sentences))

# list of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))

# dictionary that maps unique characters to their index in the list "chars"
char_indices = {char: i for i, char in enumerate(chars)}

# one-hot encodes the characters into binary arrays
print('Vectorization...')
# use the builtin `bool`: the `np.bool` alias was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
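
To see the array layout on a scale that can be inspected by hand, the same procedure can be run on a short string. This is only a sketch: the toy text and the small `maxlen` are assumptions for illustration, and the decoding step is an added sanity check rather than part of the chapter's pipeline.

```python
import numpy as np

toy_text = 'hello world'
maxlen, step = 4, 3  # toy values; the chapter's example uses 60 and 3

# Windows of `maxlen` characters and the character that follows each window.
starts = range(0, len(toy_text) - maxlen, step)
sentences = [toy_text[i:i + maxlen] for i in starts]
next_chars = [toy_text[i + maxlen] for i in starts]

chars = sorted(set(toy_text))
char_indices = {char: i for i, char in enumerate(chars)}

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = True
    y[i, char_indices[next_chars[i]]] = True

# Taking the argmax over the last axis should recover the original strings.
decoded = [''.join(chars[k] for k in row.argmax(axis=-1)) for row in x]
decoded_targets = [chars[k] for k in y.argmax(axis=-1)]

print(x.shape, y.shape)
print(list(zip(decoded, decoded_targets)))
```

Each row of `x` holds one window as `maxlen` one-hot vectors, and the matching row of `y` holds the single one-hot vector for the character that follows it.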


#### Building the network

This network is a single LSTM layer followed by a Dense classifier and softmax over all possible characters.

#### Single-layer LSTM model for next-character prediction

In [6]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_2 (Dense)              (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
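
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM formulation with four weight sets (input gate, forget gate, cell candidate, output gate), each made of an input kernel, a recurrent kernel, and a bias; the helper names are hypothetical.

```python
def lstm_param_count(input_dim, units):
    # Four weight sets, each with an (input_dim x units) kernel,
    # a (units x units) recurrent kernel, and a bias of length `units`.
    return 4 * ((input_dim + units) * units + units)

def dense_param_count(input_dim, units):
    # One (input_dim x units) kernel plus a bias of length `units`.
    return input_dim * units + units

lstm_params = lstm_param_count(input_dim=57, units=128)
dense_params = dense_param_count(input_dim=128, units=57)
print(lstm_params, dense_params, lstm_params + dense_params)
# prints: 95232 7353 102585
```

Here 57 is the number of unique characters in the corpus and 128 is the number of LSTM units, which matches the 95,232 and 7,353 parameters reported above.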


#### Model compilation configuration

In [0]:
# `learning_rate` replaces the older `lr` argument name in recent Keras releases
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer)

#### Training the language model and sampling from it

Given a trained model and a seed text snippet, you can generate new text by doing the following repeatedly:
  1. Draw from the model a probability distribution for the next character, given the generated text available so far.
  2. Reweight the distribution to a certain temperature.
  3. Sample the next character at random according to the reweighted distribution.
  4. Add the new character at the end of the available text.

#### Function to sample the next character given the model's predictions

In [0]:
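def sample(preds, temperature=1.0):
    # casts to float64 so the renormalized probabilities sum to 1 more accurately
    preds = np.asarray(preds).astype('float64')
    # reweights the distribution to the given temperature
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # draws one sample; `probas` is a one-hot array of shape (1, len(preds))
    probas = np.random.multinomial(1, preds, 1)
    # returns the index of the sampled character
    return np.argmax(probas)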
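
If a prediction contains an exact zero, `np.log` in the function above emits a divide-by-zero warning. The following is one possible variant, not the chapter's implementation, that clips the probabilities and works with shifted log values instead. The name `sample_stable`, the clipping threshold, and the test distribution are all assumptions made for this sketch.

```python
import numpy as np

def sample_stable(preds, temperature=1.0, rng=None):
    """Hypothetical variant of `sample` that tolerates zero probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds, dtype='float64')
    # Clip so that an exact zero becomes a tiny but finite probability.
    logits = np.log(np.clip(preds, 1e-12, None)) / temperature
    # Subtracting the maximum leaves the softmax unchanged and avoids overflow.
    logits -= np.max(logits)
    probas = np.exp(logits)
    probas /= np.sum(probas)
    return int(rng.choice(len(probas), p=probas))

# Made-up predictions, including an exact zero.
preds = [0.6, 0.3, 0.1, 0.0]
rng = np.random.default_rng(0)
for temperature in [0.2, 1.2]:
    draws = [sample_stable(preds, temperature, rng) for _ in range(500)]
    # Counts of how often each index was sampled at this temperature.
    print(temperature, np.bincount(draws, minlength=len(preds)))
```

At the lower temperature almost all draws should land on the most likely index, while the higher temperature spreads them across the other nonzero entries.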

#### Text-generation loop

Finally, the following loop repeatedly trains and generates text. You begin generating text using a range of different temperatures after every epoch. This allows you to see how the generated text evolves as the model begins to converge, as well as the impact of temperature in the sampling strategy.

In [13]:
import random
import sys

# trains the model for 9 epochs (`range(1, 10)` yields epochs 1 through 9)
for epoch in range(1, 10):
    print('epoch', epoch)
    # fits the model for one iteration on the data
    model.fit(x, y, batch_size=128, epochs=1)
    # selects a text seed at random
    start_index = random.randint(0, len(text)-maxlen-1)
    generated_text = text[start_index:start_index+maxlen]
    print('\n\n--- Generating with seed: ' + generated_text + '*')
    
    # tries a range of different sampling temperatures
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('\n----- temperature:', temperature)
        sys.stdout.write(generated_text)
        
        # generates 400 characters, starting from the seed text
        for i in range(400):
            # one-hot encodes the characters generated so far
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.
            # samples the next character
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]
            
            generated_text += next_char
            generated_text = generated_text[1:]
            
            sys.stdout.write(next_char)

epoch 1
Epoch 1/1


--- Generating with seed: he
last, as it seems to me, who has offered a sacrifice to h*

----- temperature: 0.2
he
last, as it seems to me, who has offered a sacrifice to her a compers of the same and the same the same and all the belief the sense and the same and all a compers in the same and all the expressing and all a still and acts and all the compers of the present and the present of the same and the indification of the sense and all the same and all and all the case of the master of the life of the same and the case of the same and in the commen and all and a
----- temperature: 0.5
ame and the case of the same and in the commen and all and all the man consequently for the same of the exteption and in himself and the same as a values of the present
in the day now the bad and man as a sumely and the last to less and nor have been plature of enemys in the preded the from the belie of the same origin of the how has not in the denectation and finally and of be and

> As you can see, a low temperature value results in extremely repetitive and predictable text, but local structure is highly realistic: in particular, most words are real English words. With higher temperatures, the generated text becomes more interesting, surprising, even creative. At the highest temperatures, however, the local structure starts to break down, and most words look like semi-random strings of characters. In this specific setup, 0.5 is arguably the most interesting temperature for text generation.

Note that by training a bigger model, for longer, on more data, you can generate samples that look much more coherent and realistic than this one. But don't expect to ever generate any meaningful text, other than by random chance: all you're doing is sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there's a distinction between what communications are about and the statistical structure of the messages in which communications are encoded.