# Text Generation with RNN

In this notebook, we will implement an RNN in TensorFlow to generate text at the character level.

**Note:** This notebook was created as part of the Encoder-Decoder Architecture course on the Google Cloud Skills Boost platform.

## Setup

Here, we import the required libraries and load the dataset.

### Libraries

In [None]:
import os
import time

import numpy as np
import tensorflow as tf

### Dataset

In [None]:
path_to_file = tf.keras.utils.get_file(
    "shakespeare.txt",
    "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt"
)

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [None]:
# Read the file as bytes and decode it; the context manager closes the file.
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
print(f"Length of text: {len(text)} characters.")

Length of text: 1115394 characters.


In [None]:
print(text[:100])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You


In [None]:
vocab = sorted(set(text))
print(f"{len(vocab)} unique characters.")

65 unique characters.


## Preprocessing

In this section, we will convert the text into a numeric format that can be used to train an RNN Encoder-Decoder.

### Forward Mapping

First, we map characters to integer ids.

In [None]:
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=list(vocab), mask_token=None
)
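To make the mapping concrete, here is a minimal plain-Python sketch of what this lookup does conceptually. It is not the layer's implementation. It assumes the layer's default behavior with `mask_token=None`: index 0 is reserved for the out-of-vocabulary token `[UNK]`, and vocabulary entries start at index 1. The names `toy_vocab`, `char_to_id`, and `encode` are illustrative and not part of the notebook.

```python
# Plain-Python analogue of the character-to-id lookup (illustrative only).
toy_vocab = sorted(set("hello world"))  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']

# Assumption: index 0 is reserved for unknown characters, so real entries start at 1.
char_to_id = {ch: i for i, ch in enumerate(toy_vocab, start=1)}


def encode(s: str) -> list:
    """Map each character to its id, falling back to 0 for unknown characters."""
    return [char_to_id.get(ch, 0) for ch in s]


print(encode("hello"))  # [4, 3, 5, 5, 6]
print(encode("hex"))    # [4, 3, 0]  ('x' is not in the toy vocabulary)
```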

### Reverse Mapping

Next, we map ids back to characters.

In [None]:
chars_from_ids = tf.keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None
)

**Note:** We pass `ids_from_chars.get_vocabulary()` rather than the original vocabulary `vocab` to the inverse mapping so that the `[UNK]` token is included in it as well.
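The role of `[UNK]` in the inverse mapping can be illustrated with a small plain-Python analogue. This is a sketch under the assumption that the full vocabulary places `[UNK]` at index 0, followed by the sorted characters. The names `full_vocab` and `decode` are hypothetical helpers, not part of the notebook.

```python
# Plain-Python analogue of the id-to-character lookup (illustrative only).
toy_vocab = sorted(set("hello world"))

# Assumption: the full vocabulary has the unknown token at index 0.
full_vocab = ["[UNK]"] + toy_vocab
char_to_id = {ch: i for i, ch in enumerate(full_vocab)}


def decode(ids: list) -> list:
    """Map each id back to its token by indexing into the full vocabulary."""
    return [full_vocab[i] for i in ids]


ids = [char_to_id.get(ch, 0) for ch in "hex"]
print(ids)          # [4, 3, 0]
print(decode(ids))  # ['h', 'e', '[UNK]']
```

Because both directions share one vocabulary, an id produced by the forward mapping always decodes to the same token, including the id for unknown characters.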

Next, we write a utility function that converts a sequence of ids back into a single string.

In [None]:
def text_from_ids(ids) -> tf.Tensor:
    """
        Join the characters for a sequence of ids into a single string.

        Arguments:
            ids: Sequence of ids, or a batch of such sequences.

        Returns (tf.Tensor): String tensor with the reverse-mapped characters
            joined along the last axis (one string per sequence).
    """
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)
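`tf.strings.reduce_join` with `axis=-1` concatenates strings along the last axis, so a batch of character sequences yields one string per sequence. The plain-Python sketch below mimics that behavior on nested lists. The helper `join_last_axis` is hypothetical and only illustrates the reduction, not the TensorFlow op itself.

```python
def join_last_axis(batch: list) -> list:
    """Concatenate the characters of each row, yielding one string per row."""
    return ["".join(row) for row in batch]


batch = [["h", "i"], ["y", "o", "u"]]
print(join_last_axis(batch))  # ['hi', 'you']
```

In TensorFlow the result is a string tensor rather than a Python list, so it typically needs `.numpy()` and a UTF-8 decode before it can be used as ordinary Python text.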