For more code resources: https://github.com/tensorflow/tpu/tree/master/tools/colab

# Objectives



*   Build a two-layer, forward-LSTM model.
*   Use a distribution strategy to produce a `tf.keras` model that runs on a TPU, then train and use it with the standard Keras methods: `fit`, `predict`, and `evaluate`.
*   Use the trained model to make predictions and generate your own Shakespeare-esque play.



### 1. Download data

Download The Complete Works of William Shakespeare as a single text file from [Project Gutenberg](https://www.gutenberg.org/files/100/100-0.txt).

In [None]:
# Use "wget" to download the text file from the given URL and save it as "/content/shakespeare.txt".
!wget --show-progress --continue -O /content/shakespeare.txt http://www.gutenberg.org/files/100/100-0.txt

--2023-08-05 08:37:57--  http://www.gutenberg.org/files/100/100-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.gutenberg.org/files/100/100-0.txt [following]
--2023-08-05 08:37:57--  https://www.gutenberg.org/files/100/100-0.txt
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5657857 (5.4M) [text/plain]
Saving to: ‘/content/shakespeare.txt’


2023-08-05 08:37:58 (18.0 MB/s) - ‘/content/shakespeare.txt’ saved [5657857/5657857]



### 2. Build the input dataset

In [None]:
!head -n5 /content/shakespeare.txt    # "head -n5" prints the first 5 lines of the file
!echo "..."     # print "..." as a separator
!shuf -n5 /content/shakespeare.txt    # "shuf -n5" prints 5 lines chosen at random from the file

﻿The Project Gutenberg eBook of The Complete Works of William Shakespeare, by William Shakespeare

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
...
Call up my brother. O, would you had had her!
Since Henry’s death, I fear, there is conveyance.

May it be possible that foreign hire
Puts bars between the owners and their rights!


The code below builds a TensorFlow dataset for training a character-level language model on Shakespeare's text. It reads the text from a file, converts it into an array of integers, cuts the array into chunks and splits each chunk into an input and a target sequence, shuffles and batches the sequences, and finally repeats the dataset so that training can run for multiple epochs.
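As a rough illustration of the chunking and input/target split described above, here is a plain-Python sketch that does not use `tf.data`. The helper name `make_pairs` and the tiny `seq_len` are made up for this example; the notebook's actual pipeline is the `input_fn` defined in the code cell of this section.

```python
def make_pairs(text, seq_len):
    """Split text into (input, target) pairs; the target is the input shifted by one character."""
    ids = [ord(c) for c in text]
    chunk_len = seq_len + 1
    pairs = []
    # Step through full chunks only, dropping any short remainder at the end
    for start in range(0, len(ids) - chunk_len + 1, chunk_len):
        chunk = ids[start:start + chunk_len]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


pairs = make_pairs("To be, or not", seq_len=4)
for inp, tgt in pairs:
    # Decode the integers back to text so the one-character shift is visible
    print(repr(''.join(map(chr, inp))), '->', repr(''.join(map(chr, tgt))))
```

Each target sequence is the input sequence moved one position to the right, so at every time step the model is asked to predict the character that follows.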

`ord(c)` returns the Unicode code point of a single character `c`, which for ASCII characters is the same as the ASCII value. For example, `ord('a')` returns 97.
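A small sketch of the `ord`/`chr` round trip may help here. It also shows the effect of the `ord(c) < 255` filter used by `transform` in the code cell of this section; the sample strings are invented for illustration.

```python
text = "Romeo"
codes = [ord(c) for c in text]               # characters -> integers
restored = ''.join(chr(n) for n in codes)    # integers -> characters
print(codes)
print(restored)

# The typographic apostrophe (code point 8217) fails the `< 255` test
# and is silently dropped, while the plain letters are kept.
sample = "it\u2019s"
kept = [ord(c) for c in sample if ord(c) < 255]
print(kept)
```

Because filtered characters are dropped rather than replaced, text generated by the model can only contain characters that passed this filter.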

`tf.io.gfile.GFile` allows TensorFlow to handle a wide range of storage systems, such as local files, Google Cloud Storage, HDFS, Amazon S3, etc., without needing to change your code.

You can also use Python's built-in `open` function for file I/O. However, using `tf.io.gfile.GFile` is recommended when working within TensorFlow environments to maintain compatibility across different storage systems.

In [None]:
# Import necessary libraries
import numpy as np
import tensorflow as tf
import os

# Check that the TensorFlow version is compatible with this notebook.
# Comparing the major version number avoids the deprecated distutils module.
if int(tf.__version__.split('.')[0]) < 2:
    raise Exception('This notebook is compatible with TensorFlow 2.0 or higher.')

# Define the path to the input text file containing Shakespeare's text
SHAKESPEARE_TXT = '/content/shakespeare.txt'

# Function to transform a string of text into an array of integer values
def transform(txt):
    return np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)

# Function to create a TensorFlow dataset for training sequences
def input_fn(seq_len=100, batch_size=1024):
    """Return a dataset of source and target sequences for training."""

    # Read the content of the input text file
    with tf.io.gfile.GFile(SHAKESPEARE_TXT, 'r') as f:
        txt = f.read()

    # Convert the text into a sequence of integer values using the transform function
    source = tf.constant(transform(txt), dtype=tf.int32)

    # Cut the character sequence into chunks of seq_len + 1, dropping any remainder.
    # Each chunk later yields an input and a target sequence of length seq_len.
    ds = tf.data.Dataset.from_tensor_slices(source).batch(seq_len + 1, drop_remainder=True)

    # Define a function to split input and target sequences for each chunk in the dataset
    def split_input_target(chunk):
        input_text = chunk[:-1]
        target_text = chunk[1:]
        return input_text, target_text

    # Set the buffer size for shuffling the dataset
    BUFFER_SIZE = 10000

    # Apply the split_input_target function to the dataset, shuffle it, and batch it
    ds = ds.map(split_input_target).shuffle(BUFFER_SIZE).batch(batch_size, drop_remainder=True)

    # Repeat the dataset indefinitely to allow multiple epochs during training
    return ds.repeat()




### 3. Build the model

The model is defined as a two-layer, forward-LSTM. The same model should work on both CPU and TPU.

Because our vocabulary size (*the number of unique tokens in a given text corpus or dataset; here the tokens are characters, encoded as integers from 0 to 255*) is 256, the input dimension of the Embedding layer is 256.

(*For example: at the word level, the sentence "The quick brown fox jumps over the lazy dog" has a vocabulary size of 9, because it contains 9 unique words when "The" and "the" are counted separately: "The," "quick," "brown," "fox," "jumps," "over," "the," "lazy," and "dog."*)
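To make the idea of vocabulary size concrete, the following sketch counts unique tokens in the pangram "The quick brown fox jumps over the lazy dog" at both the word level and the character level. The variable names are chosen for this example only.

```python
sentence = "The quick brown fox jumps over the lazy dog"

# Word-level vocabulary: "The" and "the" count as different tokens
word_vocab = set(sentence.split())
print(len(word_vocab))  # 9

# Lower-casing first merges "The" and "the"
lower_vocab = set(w.lower() for w in sentence.split())
print(len(lower_vocab))  # 8

# Character-level vocabulary: 26 lowercase letters, "T", and the space
char_vocab = set(sentence)
print(len(char_vocab))  # 28
```

This notebook works at the character level, which is why a fixed vocabulary of 256 possible values is enough regardless of how many distinct words the corpus contains.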

When specifying the arguments to the LSTM, note how the `stateful` argument is used. When training, we set `stateful=False` because we want to reset the model's state between batches. When sampling (computing predictions) from a trained model, we set `stateful=True` so that the model retains information from one batch to the next and can generate more interesting text.
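The difference between the two settings can be sketched with a toy one-unit recurrent cell in plain Python. This is not how Keras implements `stateful`; the class `ToyRecurrentCell` and its weights are hypothetical and exist only to show what "resetting state between batches" means.

```python
import math


class ToyRecurrentCell:
    """A one-unit recurrent cell: h = tanh(w_x * x + w_h * h)."""

    def __init__(self, stateful, w_x=0.5, w_h=0.9):
        self.stateful = stateful
        self.w_x = w_x
        self.w_h = w_h
        self.h = 0.0

    def run_batch(self, inputs):
        if not self.stateful:
            self.h = 0.0  # start every batch from a blank state
        for x in inputs:
            self.h = math.tanh(self.w_x * x + self.w_h * self.h)
        return self.h


stateless_cell = ToyRecurrentCell(stateful=False)
stateful_cell = ToyRecurrentCell(stateful=True)

# First batch: both cells see the same inputs
stateless_cell.run_batch([1.0, 1.0, 1.0])
stateful_cell.run_batch([1.0, 1.0, 1.0])

# Second batch: a single zero input
out_stateless = stateless_cell.run_batch([0.0])
out_stateful = stateful_cell.run_batch([0.0])
print(out_stateless)  # 0.0, nothing from the first batch survives
print(out_stateful)   # positive, the first batch still influences the output
```

During generation each "batch" is a single character, so only the stateful variant can take the earlier characters into account.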

In [None]:
# Define the embedding dimension
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
    """
    Language model: predict the next character given the current character.

    Args:
    seq_len (int): Length of the input sequence.
    batch_size (int): Batch size for training the model.
    stateful (bool): If True, the LSTM layers will maintain state between batches.

    Returns:
    tf.keras.Model: A Keras Model representing the LSTM language model.
    """
    # Define the input layer for the model
    source = tf.keras.Input(
        name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

    # Create an embedding layer to convert integer indices to dense vectors
    embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)

    # Create the first LSTM layer with specified embedding dimension and stateful parameter
    lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)

    # Create the second LSTM layer with specified embedding dimension and stateful parameter
    lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)

    # Apply TimeDistributed dense layer with softmax activation for predicting the next character
    predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)

    # Return the Keras model with input and output layers
    return tf.keras.Model(inputs=[source], outputs=[predicted_char])


### 4. Train the model

First, we need to create a distribution strategy that can use the TPU, that is, set TensorFlow up to run on a Google Cloud TPU (Tensor Processing Unit). In this case it is `TPUStrategy`. Create and compile the model inside the strategy's scope. Once that is done, future calls to the standard Keras methods `fit`, `evaluate`, and `predict` use the TPU.

Again note that we train with `stateful=False` because while training, we only care about one batch at a time.

In [None]:
# Clear any existing TensorFlow sessions and reset the Keras backend state to avoid conflicts.
tf.keras.backend.clear_session()

# Initialize the TPUClusterResolver to connect to the Cloud TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])

# Connect to the TPU cluster and set up the TPU system for computation.
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Print the list of logical devices (TPU cores) available for use.
print("All devices: ", tf.config.list_logical_devices('TPU'))

# Create a TPUStrategy object to enable distributed training on the TPU.
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Enter the strategy scope to execute the following operations on the TPU.
with strategy.scope():
    # Create the LSTM model for training with a sequence length of 100 and stateful set to False.
    training_model = lstm_model(seq_len=100, stateful=False)

    # Compile the model with the RMSprop optimizer, sparse categorical cross-entropy loss,
    # and sparse categorical accuracy as the metric.
    training_model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
        loss='sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy']
    )

# Train the model using the input_fn() to provide training data, with 100 steps per epoch
# and for a total of 10 epochs.
training_model.fit(
    input_fn(),
    steps_per_epoch=100,
    epochs=10
)

# Save the trained model weights to '/tmp/bard.h5' in HDF5 format, overwriting if the file exists.
training_model.save_weights('/tmp/bard.h5', overwrite=True)





All devices:  [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### 5. Make predictions with the model

Use the trained model to make predictions and generate your own Shakespeare-esque play.
Start the model off with a *seed* sentence, then generate 250 characters from it. The model makes five predictions from the initial seed.

The predictions are done on the CPU so the batch size (5) in this case does not have to be divisible by 8.
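The divisibility remark can be illustrated with a little arithmetic. The sketch below assumes eight TPU cores, matching the eight TPU devices listed in the training output; the helper `per_replica_batch` is hypothetical and not part of TensorFlow.

```python
NUM_TPU_CORES = 8  # assumption: the eight TPU devices listed during training


def per_replica_batch(global_batch, num_replicas=NUM_TPU_CORES):
    """Return the per-core batch size, or None if the batch does not split evenly."""
    if global_batch % num_replicas != 0:
        return None
    return global_batch // num_replicas


print(per_replica_batch(1024))  # 128: the training batch splits evenly across 8 cores
print(per_replica_batch(5))     # None: the prediction batch would not split evenly
```

The training batch size of 1024 gives each core 128 sequences, while a batch of 5 is only usable here because prediction runs on the CPU.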

Note that when we are doing predictions or, to be more precise, text generation, we set `stateful=True` so that the model's state is kept between batches. If stateful is false, the model state is reset between each batch, and the model will only be able to use the information from the current batch (a single character) to make a prediction.

The output of the model is a set of probabilities for the next character (given the input so far). To build a paragraph, we predict one character at a time and sample a character (based on the probabilities provided by the model). For example, if the input character is "o" and the output probabilities are "p" (0.65), "t" (0.30), and other characters (0.05), then sampling picks "p" most often, "t" fairly often, and occasionally some other character, which allows our model to generate text other than just "Ophelia" and "Othello."
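The following sketch contrasts always taking the most likely character with sampling from the distribution, using the probabilities from the example above. The key `'x'` is a stand-in for "all other characters", and the fixed random seed is only there to make the run repeatable.

```python
import random
from collections import Counter

# Probabilities for the character following "o"; 'x' stands in for all other characters
probs = {'p': 0.65, 't': 0.30, 'x': 0.05}

# Greedy choice: always the single most likely character
greedy = max(probs, key=probs.get)
print(greedy)

# Sampling: draw characters in proportion to their probabilities
rng = random.Random(0)
samples = rng.choices(list(probs), weights=list(probs.values()), k=1000)
counts = Counter(samples)
print(counts)
```

The greedy choice yields the same continuation every time, whereas the sampled counts roughly follow the 65/30/5 split, which is what gives the generated text its variety.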

In [None]:
# Set batch size and prediction length
BATCH_SIZE = 5
PREDICT_LEN = 250

# Create a prediction model using the defined lstm_model with stateful=True
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
# Load pre-trained weights into the prediction_model from a file
prediction_model.load_weights('/tmp/bard.h5')

# Seed the model with the initial string, copied BATCH_SIZE times
seed_txt = 'Looks it not like the king? Verily, we must go! '
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
    # Predict one character at a time for the seed text to prime the model state
    prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
# Every entry in `predictions` has shape (BATCH_SIZE, 1).
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
    last_char = predictions[-1]
    next_probs = prediction_model.predict(last_char)[:, 0, :]

    # Sample from the output distribution to get the next character index
    next_idx = [
        np.random.choice(256, p=next_probs[i])
        for i in range(BATCH_SIZE)
    ]
    # Keep the (BATCH_SIZE, 1) shape so the next predict() call gets a 2-D input
    predictions.append(np.asarray(next_idx, dtype=np.int32).reshape(BATCH_SIZE, 1))

# Print the generated text for each sequence in the batch
for i in range(BATCH_SIZE):
    print('PREDICTION %d\n\n' % i)
    # Extract the i-th sequence of predicted character indices as plain Python ints
    p = [int(predictions[j][i, 0]) for j in range(PREDICT_LEN)]
    # Convert the predicted character indices back to text
    generated = ''.join([chr(c) for c in p])
    print(generated)
    print()
    # Check that the generated text length matches the desired prediction length
    assert len(generated) == PREDICT_LEN, 'Generated text too short'