Here we'll explore how recurrent neural networks can be used to generate sequence data. Text generation is only one example: the same techniques also apply to speech synthesis and to dialogue generation for chatbots.

## 8.1.2 How do you generate sequence data?

The universal way to generate sequence data in deep learning is to train a network (usually an RNN or a convnet) to predict the next token or next few tokens in a sequence, using the previous tokens as input. Tokens are typically words or characters, and any network that can model the probability of the next token given the previous ones is called a *language model*. A language model captures the *latent space* of language: its statistical structure.

Once you have a trained language model, you can sample from it: you feed it an initial string of text (conditioning data), ask it to generate the next character (or word), add the generated output back to the input data, and repeat the process. The output of such a model will be a softmax over all possible characters.
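
The generate-and-feed-back loop described above does not depend on the kind of model used. As a minimal sketch, the block below replaces the neural network with a character bigram table built from counts, so the "model" only looks at the last character. The toy corpus, the add-one smoothing, and the `generate` helper are assumptions made for illustration, not part of the original listings.

```python
import numpy as np

corpus = "the cat sat on the mat. the cat ate the rat."  # toy stand-in corpus
chars = sorted(set(corpus))
char_indices = {c: i for i, c in enumerate(chars)}

# Count how often each character follows each other character.
# Starting from ones (add-one smoothing) avoids rows that are all zero.
counts = np.ones((len(chars), len(chars)))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[char_indices[prev], char_indices[nxt]] += 1

# Normalize each row into a probability distribution over the next character.
probs = counts / counts.sum(axis=1, keepdims=True)

def generate(seed, length, rng):
    """Sample one character at a time and append it to the running text."""
    text = seed
    for _ in range(length):
        next_dist = probs[char_indices[text[-1]]]  # condition on the last character only
        next_index = rng.choice(len(chars), p=next_dist)
        text += chars[next_index]
    return text

rng = np.random.default_rng(0)
print(generate("th", 40, rng))
```

An LSTM plays the same role as `probs` here, except that it conditions on a whole window of previous characters instead of just one.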

## 8.1.3 The importance of the sampling strategy

When generating text, the way you choose the next character is crucially important. A naive approach is *greedy sampling*, which always chooses the most likely next character. This approach results in repetitive, predictable strings that don't look like coherent language. 

A more interesting approach is to introduce randomness in the sampling process, that is, to sample from the probability distribution for the next character. This is called *stochastic sampling*.

Sampling from this probability distribution allows even unlikely characters to be sampled some of the time, which produces more interesting sentences. But this strategy on its own gives you no way to *control the amount of randomness* in the sampling process, and that is a knob you would like to have. 
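
To make the difference concrete, here is a small sketch that compares the two strategies on a single distribution. The four-character vocabulary and the probabilities are made up for illustration and do not come from a trained model.

```python
import numpy as np

chars = ["a", "b", "c", "d"]                         # hypothetical vocabulary
next_char_probs = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical model output

# Greedy sampling: always take the single most likely character.
greedy_index = int(np.argmax(next_char_probs))

# Stochastic sampling: draw indices according to the distribution itself.
rng = np.random.default_rng(42)
draws = rng.choice(len(chars), size=10_000, p=next_char_probs)
frequencies = np.bincount(draws, minlength=len(chars)) / len(draws)

print("greedy pick:", chars[greedy_index])
print("empirical frequencies:", dict(zip(chars, frequencies.round(3))))
```

Greedy sampling returns the same character every time, whereas the stochastic draws pick each character roughly as often as its probability suggests, including the unlikely ones.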

To control the amount of randomness in the sampling process, a parameter called *softmax temperature* was introduced. This parameter characterizes the entropy of the probability distribution used for sampling. Given a `temperature` value, a new probability distribution is computed from the original softmax one, by reweighting it in the following way:

### L8.1 Reweighting a probability distribution to a different temperature

```python
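import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
  # original_distribution: 1D NumPy array of probabilities that sum to 1
  distribution = np.log(original_distribution) / temperature
  distribution = np.exp(distribution)
  # The reweighted values no longer sum to 1, so renormalize them
  return distribution / np.sum(distribution)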
```

Higher temperatures result in sampling distributions of higher entropy, which generate more surprising and less structured data. Lower temperatures result in less randomness and much more predictable output. 
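
The link between temperature and entropy can be checked numerically. The sketch below repeats the reweighting function so that it runs on its own and adds a hypothetical `entropy` helper (Shannon entropy in nats). The example distribution is made up.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(distribution):
    """Shannon entropy in nats (assumes no zero probabilities)."""
    return float(-np.sum(distribution * np.log(distribution)))

original = np.array([0.6, 0.25, 0.1, 0.05])  # made-up softmax output

for temperature in [0.2, 0.5, 1.0, 1.2, 5.0]:
    reweighted = reweight_distribution(original, temperature)
    print(temperature, reweighted.round(3), round(entropy(reweighted), 3))
```

A temperature of 1.0 leaves the distribution unchanged. Values below 1.0 concentrate the mass on the most likely entry, and large values push the distribution toward uniform, whose entropy for four outcomes is `log(4)`.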

## 8.1.4 Implementing character-level LSTM text generation

Let's see an implementation in Keras. The first thing we need is a lot of text data to learn a language model from. In this case we'll use some of the writings of Nietzsche. 

### Preparing the data
### L8.2 Downloading and parsing the initial text file



In [5]:
!pip install keras==2.0.8


In [1]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt', 
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('corpus length:', len(text))

Using TensorFlow backend.


corpus length: 600893


In [3]:
print(keras.__version__)

2.0.8


Now we'll extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them in a 3D Numpy array of shape `(sequences, maxlen, unique_characters)`. 

We'll also prepare an array `y` containing the corresponding targets: the one-hot-encoded characters that come after each extracted sequence. 

### L8.3 Vectorizing sequences of characters

In [16]:
maxlen = 60 # we'll extract sequences of 60 characters
step = 3 # we'll sample a new sequence every three characters
sentences = [] # Holds the extracted sequences
next_chars = [] # Holds the targets (the follow-up characters)

for i in range(0, len(text) - maxlen, step):
  sentences.append(text[i: i + maxlen])
  next_chars.append(text[i + maxlen])

print('Number of sequences: {}'.format(len(sentences)))

chars = sorted(list(set(text))) # list of unique characters in the corpus
print('Unique characters: {}'.format(len(chars)))
char_indices = dict((char, chars.index(char)) for char in chars) # Dict that maps unique characters to their index in the list 'chars'

print('Vectorization...') # One-hot encodes the characters into binary arrays
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool) # the np.bool alias was removed in NumPy 1.24; plain bool works everywhere
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
  for t, char in enumerate(sentence):
    x[i, t, char_indices[char]] = 1
  y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
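
To see what the windowing and one-hot encoding produce, it can help to run the same steps on a string short enough to inspect by hand. The sketch below reuses the logic of the listing with toy values. The `toy_` names, the string, and the window settings are assumptions for illustration.

```python
import numpy as np

toy_text = "hello world"  # hypothetical stand-in for the corpus
toy_maxlen = 4            # window length (60 in the real listing)
toy_step = 3              # stride between windows

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = 1
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = 1

# Decode the one-hot arrays back to text to confirm nothing was lost.
decoded = ["".join(toy_chars[k] for k in window.argmax(axis=-1)) for window in toy_x]

print(toy_sentences, toy_next_chars)
print(toy_x.shape, toy_y.shape)
print(decoded == toy_sentences)
```

With a window of 4 and a step of 3, consecutive windows share one character, which is the "partially overlapping" property. In the real listing, windows of 60 characters taken every 3 characters share 57.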


### Building the network

This network is an `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. (RNNs aren't the only way to generate sequence data: 1D convnets have also proven extremely successful at this task in recent times.)

### L8.4 Single-layer LSTM model for next-character prediction

In [21]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))






Since our targets are one-hot encoded, we'll use `categorical_crossentropy` as the loss to train the model.
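
For one-hot targets, categorical cross-entropy reduces to the negative log of the probability the model assigned to the correct character. The NumPy sketch below illustrates that formula on made-up predictions. It is not Keras's implementation and leaves out the numerical safeguards a real one would add.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Two one-hot targets over a hypothetical three-character vocabulary.
y_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# A model that puts most of its mass on the correct characters...
confident = np.array([[0.05, 0.90, 0.05],
                      [0.10, 0.10, 0.80]])
# ...versus one that guesses uniformly.
uniform = np.full((2, 3), 1.0 / 3.0)

print("confident:", categorical_crossentropy(y_true, confident))
print("uniform:  ", categorical_crossentropy(y_true, uniform))
```

The uniform guess scores `log(3)`, the loss of a model that has learned nothing about three equally weighted classes. Any useful model should fall below that value.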

### L8.5 Model compilation configuration

In [23]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)





### Training the language model and sampling it

Once you have a trained model and a seed text snippet, you can generate new text by repeating the following steps:
1. Draw from the model a probability distribution for the next character, given the generated text available so far
2. Reweight the distribution to a certain temperature
3. Sample the next character at random according to the reweighted distribution
4. Add the new character at the end of the available text

This is the code you use to reweight the original probability distribution coming out of the model and draw a character index from it (the *sampling function*).

### L8.6 Function to sample the next character given the model's predictions

In [0]:
def sample(preds, temperature=1.0):
  preds = np.asarray(preds).astype('float64')
  preds = np.log(preds) / temperature
  exp_preds = np.exp(preds)
  preds = exp_preds / np.sum(exp_preds)
  probas = np.random.multinomial(1, preds, 1)
  return np.argmax(probas)
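
A quick way to check the sampling function is to call it many times on a fixed distribution and compare the empirical frequencies across temperatures. The sketch below repeats `sample` so that it runs on its own. The input probabilities, the seed, and the number of draws are arbitrary choices for illustration.

```python
import numpy as np

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

np.random.seed(0)                # fixed seed so repeated runs match
preds = [0.5, 0.3, 0.15, 0.05]  # made-up model output

frequencies = {}
for temperature in [0.2, 1.0, 2.0]:
    draws = [sample(preds, temperature) for _ in range(2000)]
    frequencies[temperature] = np.bincount(draws, minlength=len(preds)) / len(draws)
    print(temperature, frequencies[temperature].round(2))
```

At a temperature of 0.2 nearly every draw is index 0, which approaches greedy sampling. At 2.0 the draws spread across all four indices.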

The following loop repeatedly trains the model and generates text.  
We'll use a range of temperatures to see their effect on the generated text.

### L8.7 Text-generation loop

In [25]:
import random
import sys

for epoch in range(1, 60):
  print('epoch', epoch)
  model.fit(x, y, batch_size=128, epochs=1) # fits the model for one iteration on the data 
  start_index = random.randint(0, len(text) - maxlen - 1) # selects a text seed at random
  generated_text = text[start_index: start_index + maxlen]
  print('--- generating with seed: "{}"'.format(generated_text))

  for temperature in [0.2, 0.5, 1.0, 1.2]: # tries a range of different sampling temperatures
    print('--- temperature:{}'.format(temperature))
    sys.stdout.write(generated_text)

    for i in range(400): # generates 400 characters, starting from the seed text
      sampled = np.zeros((1, maxlen, len(chars))) # one-hot encodes the characters generated so far
      for t, char in enumerate(generated_text):
        sampled[0, t, char_indices[char]] = 1.

      preds = model.predict(sampled, verbose=0)[0] # samples the next character
      next_index = sample(preds, temperature)
      next_char = chars[next_index]

      generated_text += next_char
      generated_text = generated_text[1:]

      sys.stdout.write(next_char)

epoch 1
Epoch 1/1
--- generating with seed: " cur non sub alta vel platano vel hac
     pinu jacentes."[2"
--- temperature:0.2
 cur non sub alta vel platano vel hac
     pinu jacentes."[2

zere the still the such a conself the stand and also it is the such and all to which a scient and all the station of the station of the station of the station of the such and the station of the still the such a contration of the stand the still to the stand which the still to the stand and all the present and and and the stand and and still--- temperature:0.5
tand and all the present and and and the stand and and still the stating the great is the still to station and lith the stand to a sciented of dusing the german of the self-i

a mora--- temperature:1.2
 been likending, would be muscienced soud.

our werthia morariance
of the hiccage of the culthing
oncie to naye,
fundimptionhy. and path
of man undevition" themselve, absuast isrigh at orleration all a
feelony. 
found talons will be
veniging-macurity,
will could heableol,
in their dethran will-so greaterant are due
throse is may say
thoughtly amin whe only is no one of oticurpo incenting" before, shatters wels princimal
advaction has liticinarly, it is queepoch 8
Epoch 1/1
--- generating with seed: "what was formerly called
one's "good conscience," that long,"
--- temperature:0.2
what was formerly called
one's "good conscience," that long, the soul when the contradict of the world of the world and such as a man and sacrifice the sense and the explanation of the strength and sacrifice and the consequently of the such and sacrifice of the such a sense of the same to the same the consequently and such a conscience the spirit of the present that the such a desire 

KeyboardInterrupt: ignored

*(interrupted training when loss stopped improving)*


As expected, low temperature values give predictable and repetitive text, but the local structure is realistic: most words are real English words. As you increase the temperature, you start to see more surprising and creative results, including invented words that sound plausible. 

With a high temperature, the local structure starts to break down, with mostly made-up words and patterns that do not make sense, such as mixes of letters and numbers. 

A bigger model, trained longer on more data, will give more coherent and realistic generated samples. 

## 8.1.5 Wrapping up
- You can generate discrete sequence data by training a model to predict the next token(s), given the previous tokens.
- In the case of text, such a model is called a *language model*. It can be based on either words or characters.
- Sampling the next token requires a balance between adhering to what the model judges likely and introducing randomness.
- One way to handle this is the notion of softmax temperature. Always experiment with different temperatures to find the right one.