
#### Copyright 2018 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Sequence Models: Text Generation Using an RNN

**Learning Objectives:**
* Generate text using a character-based RNN
* Train and test an RNN model

In this Colab, we'll work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data ("Shakespear"), we'll train a model to predict the next character in the sequence ("e").

Before getting started, please do the following:

1.   Make a copy of this Colab notebook. To do so, choose **File**->**Save a copy in Drive**.
2.   Click on "CONNECT" in the top right corner, then choose **Runtime**->**Change runtime type**->**Hardware Accelerator: GPU**.

## Setup


Let's run the next cell to import the libraries.

In [0]:
import os
import time

import tensorflow as tf
tf.enable_eager_execution()

import numpy as np
from matplotlib import pyplot as plt

Next, we load the dataset.

In [0]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

text = open(path_to_file).read()

# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Let's take a look at the first 200 characters in text.


In [0]:
print(text[:200])

## Preparing the data

Start by building a vocabulary of the unique characters and mapping each character to an integer index.

In [0]:
# The unique characters in the file.
vocab = sorted(set(text))
vocab_size = len(vocab)
print ('{} unique characters'.format(len(vocab)))

# Creating a mapping from unique characters to indices.
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

# Show how the first 13 characters from the text are mapped to integers.
print ('{} ---- characters mapped to int ---- > {}'.format(text[:13], text_as_int[:13]))
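
The mapping above is a plain lookup table in each direction. As a minimal sketch on a toy string (the names `toy_text`, `toy_vocab`, `encode`, and `decode` are illustrative and not part of this Colab), the round trip from characters to integers and back looks like this:

```python
toy_text = "hello world"

# Sorted unique characters form the vocabulary, as in the cell above.
toy_vocab = sorted(set(toy_text))
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_idx2char = toy_vocab  # a list, so an integer id indexes straight into it


def encode(s):
    """Map each character of s to its integer id."""
    return [toy_char2idx[ch] for ch in s]


def decode(ids):
    """Map a sequence of integer ids back to a string."""
    return "".join(toy_idx2char[i] for i in ids)


ids = encode("hello")
print(ids)
print(decode(ids))
```

Decoding the encoded text should always return the original string, which is a quick sanity check worth running on the real `char2idx` and `idx2char` as well.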

Then construct the dataset:

In [0]:
# Maximum length, in characters, of a single input sequence
SEQ_LENGTH = 100

# Create training examples / targets
chunks = tf.data.Dataset.from_tensor_slices(text_as_int).batch(SEQ_LENGTH+1, drop_remainder=True)
  
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = chunks.map(split_input_target)
    
# Batch size 
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences, 
# so it doesn't attempt to shuffle the entire sequence in memory. Instead, 
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

def one_hot_encode(x):
  return tf.keras.utils.to_categorical(x, num_classes=vocab_size)
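
To make the input/target construction concrete, here is a small pure-Python sketch of the same idea: cut the integer sequence into chunks of `seq_length + 1`, drop any remainder, then shift each chunk by one position so that every input character is paired with the character that follows it. The helper names (`make_chunks`, `split_input_target_toy`, `one_hot`) are hypothetical stand-ins for the `tf.data` and Keras calls above, not part of this Colab.

```python
def make_chunks(ids, seq_length):
    """Split ids into consecutive chunks of seq_length + 1, dropping any remainder."""
    size = seq_length + 1
    n_chunks = len(ids) // size
    return [ids[i * size:(i + 1) * size] for i in range(n_chunks)]


def split_input_target_toy(chunk):
    """The target is the input shifted left by one position."""
    return chunk[:-1], chunk[1:]


def one_hot(index, num_classes):
    """Return a list with 1.0 at position index and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


toy_ids = list(range(10))
pairs = [split_input_target_toy(c) for c in make_chunks(toy_ids, seq_length=3)]
for inp, target in pairs:
    print(inp, "->", target)
print(one_hot(2, num_classes=4))
```

With ten ids and `seq_length=3`, two chunks of four ids are produced and the last two ids are dropped, mirroring `drop_remainder=True`.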

## Construct the model

Try changing the number of units in the code cell below. Each entry in `NUM_LSTM_UNITS` adds one LSTM layer with that many units.

We recommend setting the runtime to use GPU to improve performance, but you may change this setting to compare how quickly the training completes without it. To do so, set `USE_GPU` to `False` in the cell below.

In [0]:
NUM_LSTM_UNITS = [1024]
USE_GPU = True
DROPOUT_RATE = 0.3

assert len(NUM_LSTM_UNITS) > 0

def make_model(generate=False):
  input_length = 1 if generate else SEQ_LENGTH
  batch_size = 1 if generate else BATCH_SIZE
  input_size = vocab_size
  # We want to save state when generating character by character, but not when
  # training in shuffled batches.
  stateful = generate
  model = tf.keras.Sequential()
  if USE_GPU:
    lstm_func = tf.keras.layers.CuDNNLSTM
  else:
    lstm_func = tf.keras.layers.LSTM
  batch_input_shape=(batch_size, input_length, input_size)
  model.add(lstm_func(units=NUM_LSTM_UNITS[0], 
                      return_sequences=True,
                      stateful=stateful,
                      batch_input_shape=batch_input_shape))
  if DROPOUT_RATE > 0:
      model.add(tf.keras.layers.Dropout(DROPOUT_RATE))
  for layer_index in range(1, len(NUM_LSTM_UNITS)):
    model.add(lstm_func(units=NUM_LSTM_UNITS[layer_index],
                        return_sequences=True,
                        stateful=generate))
    if DROPOUT_RATE > 0:
      model.add(tf.keras.layers.Dropout(DROPOUT_RATE))
  model.add(tf.keras.layers.Dense(vocab_size, 
                                  activation=None))
  model.build()
  return model
batch_model = make_model(generate=False)
batch_model.summary()
gen_model = make_model(generate=True)

optimizer = tf.train.AdamOptimizer()
def loss_function(real, preds):
    return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
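
The model's final `Dense` layer has no activation, so it outputs raw logits, and the loss applies the softmax itself. As a rough sketch of what sparse softmax cross entropy computes for a single character position (the function names here are illustrative, and averaging over positions is our assumption about how the per-position losses are reduced to a scalar):

```python
import math


def sparse_softmax_cross_entropy_toy(label, logits):
    """Negative log-probability of the integer label under softmax(logits)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_loss(labels, logits_per_position):
    """Average the per-position losses into one scalar (assumed reduction)."""
    losses = [sparse_softmax_cross_entropy_toy(l, z)
              for l, z in zip(labels, logits_per_position)]
    return sum(losses) / len(losses)


uniform = sparse_softmax_cross_entropy_toy(2, [0.0, 0.0, 0.0, 0.0])
confident = sparse_softmax_cross_entropy_toy(2, [0.0, 0.0, 5.0, 0.0])
print(uniform, confident)
```

A model that spreads its probability evenly over `vocab_size` characters scores `log(vocab_size)`, which is a useful baseline to compare the first training losses against.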


Next, we define a function to generate text given a seed string as input.

In [0]:
# Note that it's very important that the model keep state during text generation
# since we feed it one character at a time.
def generate_text(seed_string='Dog', 
                  chars_to_generate=300,
                  temperature=1.0):
  assert len(seed_string) > 0
  assert chars_to_generate > 0
  gen_model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
  gen_model.reset_states()
  for c in seed_string:
    input_char = one_hot_encode(char2idx[c])
    # Add fake batch and sequence dimensions.
    input_char = tf.expand_dims(input_char, 0)
    input_char = tf.expand_dims(input_char, 0)
    predictions = gen_model(input_char)
  generated_text = []
  for i in range(chars_to_generate):
    # Drop the sequence dimension from the predictions.
    predictions = tf.squeeze(predictions, 0)
    predicted_id = tf.multinomial(predictions / temperature, num_samples=1)
    input_char = tf.expand_dims(one_hot_encode(predicted_id), 0)
    predictions = gen_model(input_char)
    generated_text.append(idx2char[predicted_id[0,0]])
  return ''.join(generated_text)
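
The `temperature` argument divides the logits before sampling. The sketch below, written in plain Python rather than TensorFlow, shows the effect on a toy set of logits: values below 1.0 sharpen the distribution toward the most likely character, and values above 1.0 flatten it. The helper names and the toy logits are illustrative only.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
print(sample_index(toy_logits, temperature=1.0, rng=random.Random(0)))
```

In practice, a low temperature tends to give more repetitive but more plausible text, while a high temperature gives more varied text with more spelling errors.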


## Train the model

Run the following cell to train the model.

In [0]:
EPOCHS = 100

REINITIALIZE_WEIGHTS = True
if REINITIALIZE_WEIGHTS:
  batch_model = make_model(generate=False)

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
training_losses_per_epoch = []

def plot_loss_over_epochs():
  plt.ylabel("cross entropy loss")
  plt.xlabel("Epochs")
  plt.title("Loss")
  plt.tight_layout()
  plt.plot(training_losses_per_epoch)

training_start = time.time()
print('Starting to train...')

# Training loop.
for epoch in range(EPOCHS):
    epoch_start = time.time()
    
    for (batch, (inp, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            one_hot_input = one_hot_encode(inp)
            predictions = batch_model(one_hot_input)
            loss = loss_function(target, predictions)

        grads = tape.gradient(loss, batch_model.variables)
        optimizer.apply_gradients(zip(grads, batch_model.variables))

    # Saving (checkpoint) the model every 5 epochs.
    if (epoch) % 5 == 0:
      batch_model.save_weights(checkpoint_prefix)
      
    print(generate_text())

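    # Record the loss of the epoch's final batch as a plain number so that
    # np.argmin and plt.plot operate on floats rather than tensors.
    training_losses_per_epoch.append(loss.numpy())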
    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print ('Time taken for 1 epoch {} sec\n'.format(time.time() - epoch_start))

    if np.argmin(training_losses_per_epoch) < epoch - 5:
      print('Stopped training early because best loss has not decreased in 5 epochs')
      break
    
print('Time taken for {} epochs {} sec\n'.format(epoch + 1,
                                                 time.time() - training_start))
plot_loss_over_epochs()
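
The early-stopping check in the training loop compares the position of the lowest recorded loss with the current epoch. Here is a standalone sketch of that rule, assuming one loss value is appended per epoch; `should_stop` and `patience` are hypothetical names for what the loop does inline with the constant 5.

```python
def should_stop(losses, patience=5):
    """Return True when the lowest loss is more than `patience` epochs old."""
    epoch = len(losses) - 1  # zero-based index of the epoch that just finished
    # Index of the first occurrence of the minimum, like np.argmin.
    best_epoch = min(range(len(losses)), key=losses.__getitem__)
    return best_epoch < epoch - patience


improving = [3.0, 2.5, 2.0, 1.8, 1.7]
stalled = [3.0, 2.0, 1.0] + [1.5] * 6

print(should_stop(improving))
print(should_stop(stalled))
```

Because the comparison is strict, training continues while the best epoch is at most `patience` epochs behind the current one, and stops on the first epoch after that.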


## Test the model with different start strings

Try different inputs in the form below to test the model:

In [0]:
#@title Generate Strings {run:"auto"}
seed_string = 'cat'  #@param {type:"string"}
chars_to_generate = 300 #@param {type:"integer"}
temperature = 1.0 #@param {type:"number"}


print(generate_text(seed_string=seed_string,
                    chars_to_generate=chars_to_generate,
                    temperature=temperature))