# GRU-based RNN gets inspired by Emily Dickinson and writes poems

We will train a Recurrent Neural Network (RNN) built around a Gated Recurrent Unit (GRU) layer on the **597 poems by Emily Dickinson** dataset and then use it to generate poetry.

We use character-based prediction: given a string, the model outputs the character that should follow it. We apply this step iteratively until 15 lines of poetry have been generated, hoping that the network's output resembles a poem by Emily Dickinson.
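
The iterative loop can be sketched without a neural network. In the snippet below, `toy_next_char` is a hypothetical stand-in for the trained model (a fixed lookup table that only looks at the last character), and `generate_lines` is an illustrative helper name, not part of this notebook's code.

```python
def toy_next_char(text):
    """Hypothetical stand-in for the trained model: predict from the last character only."""
    table = {"a": "b", "b": "\n", "\n": "a"}
    return table[text[-1]]


def generate_lines(start, next_char_fn, num_lines):
    """Append one predicted character at a time until num_lines newlines were produced."""
    text = start
    lines_left = num_lines
    while lines_left > 0:
        ch = next_char_fn(text)
        text += ch
        if ch == "\n":
            lines_left -= 1
    return text


sample = generate_lines("a", toy_next_char, num_lines=3)
print(repr(sample))
```

The real model plays the role of `toy_next_char`, except that it samples the next character from a learned probability distribution instead of reading it from a table.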

Because prediction is character-based, the generated poems sometimes contain meaningless words. A word-based prediction scheme avoids this (we will undertake it in another notebook).

This notebook uses the concepts described and demonstrated in the TensorFlow documentation ([Text generation with an RNN](https://www.tensorflow.org/tutorials/text/text_generation)). Please refer to that tutorial for a better understanding of this notebook.


### Import required modules and load dataset

In [None]:
import tensorflow as tf

import numpy as np
import os
import time

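# Read the whole dataset file as one UTF-8 string and close the file afterwards
with open('../input/597-poems-by-emily-dickinson/final-emily.csv', 'rb') as data:
    corpus = data.read().decode(encoding='utf-8').strip()
# The vocabulary is the sorted set of unique characters in the corpus
vocab = sorted(set(corpus))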

### Total and unique character counts

In [None]:
print('Total characters:', len(corpus))
print('Unique characters:', len(vocab))

### Create dictionary mapping characters to integers, and vice versa

In [None]:
character_to_index = {u:i for i, u in enumerate(vocab)}
index_to_character = np.array(vocab)

### Convert text corpus to integer representation

In [None]:
corpus_int = np.array([character_to_index[c] for c in corpus])
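
As a small illustration of the two lookups and the integer conversion, here is the same idea applied to a toy string in plain Python. The `toy_*` names are introduced only for this sketch; the notebook itself uses a NumPy array for the inverse lookup.

```python
toy_corpus = "hope is the thing"

# Sorted unique characters; the space sorts before the lowercase letters
toy_vocab = sorted(set(toy_corpus))

# Forward lookup (character -> integer) and inverse lookup (integer -> character)
toy_char_to_index = {ch: i for i, ch in enumerate(toy_vocab)}
toy_index_to_char = toy_vocab  # a plain list is enough for the inverse direction here

# Encode the text as integers, then decode it back to check the round trip
encoded = [toy_char_to_index[ch] for ch in toy_corpus]
decoded = "".join(toy_index_to_char[i] for i in encoded)

print(toy_vocab)
print(encoded)
print(decoded)
```

Decoding the encoded sequence returns the original text, which is what lets the generated integer IDs be turned back into a poem later.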

### Convert text corpus to dataset

In [None]:
seq_length = 100
examples_per_epoch = len(corpus)//(seq_length+1)

char_dataset = tf.data.Dataset.from_tensor_slices(corpus_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

### Convert dataset to input->output pairs

In [None]:
def split_input_target(chunk):
  input_text = chunk[:-1]
  target_text = chunk[1:]
  return input_text, target_text

dataset = sequences.map(split_input_target)
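
To see what the chunking and the input/target split produce, the following sketch mimics both steps on a short string using plain Python lists instead of `tf.data`. `chunk_sequences` is a hypothetical helper that imitates `batch(seq_length+1, drop_remainder=True)`, and the toy `seq_length` of 4 is chosen only for readability.

```python
def chunk_sequences(ids, seq_length):
    """Split ids into consecutive chunks of seq_length + 1 items, dropping any remainder."""
    size = seq_length + 1
    return [ids[i:i + size] for i in range(0, len(ids) - size + 1, size)]


def split_input_target(chunk):
    """The target is the input shifted one position to the left."""
    return chunk[:-1], chunk[1:]


text = "Hope is the thing with feathers"
chunks = chunk_sequences(text, seq_length=4)
pairs = [split_input_target(c) for c in chunks]

for input_text, target_text in pairs:
    print(repr(input_text), "->", repr(target_text))
```

At every position the model sees the characters of the input up to that point and is trained to predict the character at the same position in the target, i.e. the next character of the original text.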

### Set hyperparameters for GRU model

In [None]:
BATCH_SIZE = 64
BUFFER_SIZE = 10000
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024

### Shuffle and batch the dataset

In [None]:
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
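
Because both the chunking step and the batching step drop remainders, the number of training steps per epoch follows from integer division. The sketch below works through the arithmetic; the corpus length of 100,000 characters is a made-up figure (the real value is printed by the character-count cell), and `batches_per_epoch` is an illustrative helper name.

```python
def batches_per_epoch(num_chars, seq_length, batch_size):
    """Count full batches when chunking and batching both drop their remainders."""
    num_sequences = num_chars // (seq_length + 1)
    return num_sequences // batch_size


# 100_000 is a hypothetical corpus length, used only to make the arithmetic concrete
num_sequences = 100_000 // (100 + 1)
steps = batches_per_epoch(num_chars=100_000, seq_length=100, batch_size=64)

print(num_sequences, steps)
```

Under that assumption there would be fewer sequences than `BUFFER_SIZE`, so the shuffle buffer would hold the entire dataset and each epoch would see a full uniform shuffle.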

### Create GRU model

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[BATCH_SIZE, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(len(vocab))
])
    
model.summary()
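
The parameter counts reported by `model.summary()` can be cross-checked by hand. The sketch below assumes the TensorFlow 2 default GRU variant (`reset_after=True`, which keeps two bias vectors per gate) and a hypothetical vocabulary size of 65; substitute the actual `len(vocab)` for this dataset. `gru_model_param_counts` is an illustrative helper, not a Keras function.

```python
def gru_model_param_counts(vocab_size, embedding_dim, rnn_units):
    """Trainable parameter counts for an Embedding -> GRU -> Dense stack."""
    # One embedding vector per character in the vocabulary
    embedding = vocab_size * embedding_dim
    # Update gate, reset gate and candidate state each have an input kernel,
    # a recurrent kernel and (assuming reset_after=True) two bias vectors
    gru = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)
    # One logit per vocabulary entry: weight matrix plus bias
    dense = rnn_units * vocab_size + vocab_size
    return {"embedding": embedding, "gru": gru, "dense": dense}


# vocab_size=65 is an assumed value for illustration only
counts = gru_model_param_counts(vocab_size=65, embedding_dim=256, rnn_units=1024)
total = sum(counts.values())

print(counts, total)
```

The GRU layer dominates the total, and its size does not depend on the vocabulary, so the overall count changes only slightly with the number of unique characters.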

### Set checkpoint directory and filename

In [None]:
checkpoint_dir = './tmp'

checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")

# The filename is fixed, so each epoch overwrites the previous checkpoint
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

### Create loss function

In [None]:
def loss_fn(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
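
For intuition about what this loss measures, here is a plain-Python sketch of sparse categorical cross-entropy for a single prediction: the negative log of the softmax probability assigned to the correct character. The function name is illustrative, and this is not how Keras implements it internally.

```python
import math


def sparse_categorical_crossentropy_from_logits(label, logits):
    """Negative log-softmax of the logit at index `label`."""
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Equal logits over four characters: the loss equals log(4)
loss_uniform = sparse_categorical_crossentropy_from_logits(0, [0.0, 0.0, 0.0, 0.0])

# A large logit on the correct character drives the loss toward zero
loss_confident = sparse_categorical_crossentropy_from_logits(2, [0.0, 0.0, 10.0, 0.0])

print(loss_uniform, loss_confident)
```

Passing `from_logits=True` matters because the final `Dense` layer has no softmax activation, so the model outputs raw scores and not probabilities.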

### Compile model and train on dataset

In [None]:
model.compile(optimizer='adam', loss=loss_fn)
history = model.fit(dataset, epochs=100, callbacks=[checkpoint_callback])

### Procedure to generate text given starting string

In [None]:
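def generate_text(model, start_string):
  # Number of lines (newline characters) to generate
  num_generate = 15

  # Convert the start string to a batch containing one integer sequence
  input_eval = [character_to_index[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []

  # Lower temperature gives more predictable text, higher gives more surprising text
  temperature = 1.0

  # Clear the GRU state left over from any previous call
  model.reset_states()

  while num_generate > 0:
    predictions = model(input_eval)
    # Remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # Sample from the temperature-scaled logits and keep the sample for the last time step
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

    # Feed the sampled character back in; the stateful GRU carries the earlier context
    input_eval = tf.expand_dims([predicted_id], 0)
    text_generated.append(index_to_character[predicted_id])
    if index_to_character[predicted_id] == '\n':
      num_generate -= 1

  return (start_string + ''.join(text_generated)).strip()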
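
The effect of `temperature` is easier to see in isolation. The following sketch uses only the standard library to divide logits by a temperature, apply a softmax, and draw one sample, which approximates what dividing by `temperature` before `tf.random.categorical` does. The helper names and the example logits are assumptions made for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalise them into probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
cool = softmax_with_temperature(logits, 0.5)     # sharper distribution
neutral = softmax_with_temperature(logits, 1.0)  # unchanged distribution
warm = softmax_with_temperature(logits, 2.0)     # flatter distribution

rng = random.Random(0)
picked = sample_index(logits, 1.0, rng)

print(cool, neutral, warm, picked)
```

A temperature below 1 concentrates probability on the most likely character, while a temperature above 1 spreads it out, which tends to produce more varied but also more error-prone text.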

### Rebuild the model with batch size 1 and load the trained weights

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(vocab), embedding_dim,
                              batch_input_shape=[1, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(len(vocab))
])
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

---

### Generate poem starting with string 'Love'

In [None]:
print(generate_text(model, start_string=u"Love "))

### Generate poem starting with string 'Flower'

In [None]:
print(generate_text(model, start_string=u"Flower "))