<a href="https://colab.research.google.com/github/jarreed0/school_stuff/blob/main/book_publishing_starter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Import libraries
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

import numpy as np
import os
import time

## I. Parse Text Sources
First we'll load our text sources and create our vocabulary lists and encoders. 

There are ways we could do this in pure python, but using the tensorflow data structures and libraries allow us to keep things super-optimized.

In [2]:
# Load file data
path_to_tom = tf.keras.utils.get_file('tom.txt', 'http://jar.ylimaf.com/books/tom.txt')
path_to_odyssey = tf.keras.utils.get_file('od.txt', 'http://jar.ylimaf.com/books/od.txt')
path_to_alice = tf.keras.utils.get_file('alice.txt', 'http://jar.ylimaf.com/books/alice.txt')
path_to_gatsby = tf.keras.utils.get_file('greatgatsby.txt', 'http://jar.ylimaf.com/books/greatgatsby.txt')
tom = open(path_to_tom, 'rb').read().decode(encoding='utf-8')
odyssey = open(path_to_odyssey, 'rb').read().decode(encoding='utf-8')
alice = open(path_to_alice, 'rb').read().decode(encoding='utf-8')
gatsby = open(path_to_gatsby, 'rb').read().decode(encoding='utf-8')
print('Length of tom sawyer: {} characters'.format(len(tom)))
print('Length of the odyssey: {} characters'.format(len(odyssey)))
print('Length of alice in wonderland: {} characters'.format(len(alice)))
print('Length of the great gatsby: {} characters'.format(len(gatsby)))

text = tom + odyssey + alice + gatsby

print('Length of total text: {} characters'.format(len(text)))


Downloading data from http://jar.ylimaf.com/books/tom.txt
Downloading data from http://jar.ylimaf.com/books/od.txt
Downloading data from http://jar.ylimaf.com/books/alice.txt
Downloading data from http://jar.ylimaf.com/books/greatgatsby.txt
Length of tom sawyer: 392547 characters
Length of the odyssey: 678814 characters
Length of alice in wonderland: 163180 characters
Length of the great gatsby: 270608 characters
Length of total text: 1505149 characters


In [3]:
# Verify the first part of our data
print(text[:200])


THE ADVENTURES OF TOM SAWYER


By Mark Twain

(Samuel Langhorne Clemens)




CONTENTS


CHAPTER I. Y-o-u-u Tom-Aunt Polly Decides Upon her Duty—Tom Practices
Music—The Challenge—A Private Entrance

C


In [4]:
# Now we'll get a list of the unique characters in the file. This will form the
# vocabulary of our network. There may be some characters we want to remove from this 
# set as we refine the network.
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
print(vocab)

153 unique characters
['\t', '\n', ' ', '!', '"', '$', '%', '&', "'", '(', ')', '*', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '}', 'Æ', 'æ', 'ç', 'é', 'ê', 'ô', 'ù', 'Μ', 'Ν', 'α', 'γ', 'δ', 'ε', 'ζ', 'η', 'θ', 'ι', 'κ', 'λ', 'μ', 'ν', 'ο', 'π', 'ρ', 'ς', 'σ', 'τ', 'υ', 'φ', 'χ', 'ω', 'ἀ', 'ἄ', 'ἐ', 'ἔ', 'ἠ', 'ἦ', 'ἰ', 'ἷ', 'Ἰ', 'ὀ', 'ὄ', 'ὡ', 'ὦ', 'ά', 'έ', 'ή', 'ὶ', 'ί', 'ό', 'ὺ', 'ύ', 'ὼ', 'ώ', 'ῆ', 'ῇ', 'ῖ', 'ῥ', 'ῳ', 'ῶ', '\u200a', '—', '‘', '’', '“', '”', '…']


In [5]:
# Next, we'll encode encode these characters into numbers so we can use them
# with our neural network, then we'll create some mappings between the characters
# and their numeric representations
ids_from_chars = preprocessing.StringLookup(vocabulary=list(vocab))
chars_from_ids = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True)

# Here's a little helper function that we can use to turn a sequence of ids
# back into a string:
# turn them into a string:
def text_from_ids(ids):
  joinedTensor = tf.strings.reduce_join(chars_from_ids(ids), axis=-1)
  return joinedTensor.numpy().decode("utf-8")

In [6]:
# Now we'll verify that they work, by getting the code for "A", and then looking
# that up in reverse
testids = ids_from_chars(["T", "r", "u", "t", "h"])
testids

<tf.Tensor: shape=(5,), dtype=int64, numpy=array([49, 76, 79, 78, 66])>

In [7]:
chars_from_ids(testids)

<tf.Tensor: shape=(5,), dtype=string, numpy=array([b'T', b'r', b'u', b't', b'h'], dtype=object)>

In [8]:
testString = text_from_ids( testids )
testString

'Truth'

## II. Construct our training data
Next we need to construct our training data by building sentence chunks. Each chunk will consist of a sequence of characters and a corresponding "next sequence" of the same length showing what would happen if we move forward in the text. This "next sequence" becomes our target variable.

For example, if this were our text:

> It is a truth universally acknowledged, that a single man in possession
of a good fortune, must be in want of a wife.

And our sequence length was 10 with a step size of 1, our first chunk would be:

* Sequence: `It is a tr`
* Next Sequence: `t is a tru`

Our second chunk would be:

* Sequence: `t is a tru`
* Next Word: ` is a trut`



In [9]:
# First, create a stream of encoded integers from our text
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(1505149,), dtype=int64, numpy=array([ 2, 49, 37, ...,  2,  2,  2])>

In [10]:
# Now, convert that into a tensorflow dataset
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

In [11]:
# Finally, let's batch these sequences up into chunks for our training
seq_length = 100
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

# This function will generate our sequence pairs:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

# Call the function for every sequence in our list to create a new dataset
# of input->target pairs
dataset = sequences.map(split_input_target)

In [12]:
# Verify our sequences
for input_example, target_example in  dataset.take(1):
    print("Input: ", text_from_ids(input_example))
    print("--------")
    print("Target: ", text_from_ids(target_example))

Input:  
THE ADVENTURES OF TOM SAWYER


By Mark Twain

(Samuel Langhorne Clemens)




CONTENTS


CHAPTER I. 
--------
Target:  THE ADVENTURES OF TOM SAWYER


By Mark Twain

(Samuel Langhorne Clemens)




CONTENTS


CHAPTER I. Y


In [13]:
# Finally, we'll randomize the sequences so that we don't just memorize the books
# in the order they were written, then build a new streaming dataset from that.
# Using a streaming dataset allows us to pass the data to our network bit by bit,
# rather than keeping it all in memory. We'll set it to figure out how much data
# to prefetch in the background.

BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))

dataset

<PrefetchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

## III. Build the model

Next, we'll build our model. Up until this point, you've been using the Keras symbolic, or imperative API for creating your models. Doing something like:

    model = tf.keras.models.Sequentla()
    model.add(tf.keras.layers.Dense(80, activation='relu))
    etc...

However, tensorflow has another way to build models called the Functional API, which gives us a lot more control over what happens inside the model. You can read more about [the differences and when to use each here](https://blog.tensorflow.org/2019/01/what-are-symbolic-and-imperative-apis.html).

We'll use the functional API for our RNN in this example. This will involve defining our model as a custom subclass of `tf.keras.Model`.

If you're not familiar with classes in python, you might want to review [this quick tutorial](https://www.w3schools.com/python/python_classes.asp), as well as [this one on class inheritance](https://www.w3schools.com/python/python_inheritance.asp).

Using a functional model is important for our situation because we're not just training it to predict a single character for a single sequence, but as we make predictions with it, we need it to remember those predictions as use that memory as it makes new predictions.


In [14]:
# Create our custom model. Given a sequence of characters, this
# model's job is to predict what character should come next.
class TextModel(tf.keras.Model):

  # This is our class constructor method, it will be executed when
  # we first create an instance of the class 
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__(self)

    # Our model will have three layers:
    
    # 1. An embedding layer that handles the encoding of our vocabulary into
    #    a vector of values suitable for a neural network
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    # 2. A GRU layer that handles the "memory" aspects of our RNN. If you're
    #    wondering why we use GRU instead of LSTM, and whether LSTM is better,
    #    take a look at this article: https://datascience.stackexchange.com/questions/14581/when-to-use-gru-over-lstm
    #    then consider trying out LSTM instead (or in addition to!)
    self.gru = tf.keras.layers.GRU(rnn_units, return_sequences=True, return_state=True)

    # 3. Our output layer that will give us a set of probabilities for each
    #    character in our vocabulary.
    self.dense = tf.keras.layers.Dense(vocab_size)

  # This function will be executed for each epoch of our training. Here
  # we will manually feed information from one layer of our network to the 
  # next.
  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs

    # 1. Feed the inputs into the embedding layer, and tell it if we are
    #    training or predicting
    x = self.embedding(x, training=training)

    # 2. If we don't have any state in memory yet, get the initial random state
    #    from our GRUI layer.
    if states is None:
      states = self.gru.get_initial_state(x)
    
    # 3. Now, feed the vectorized input along with the current state of memory
    #    into the gru layer.
    x, states = self.gru(x, initial_state=states, training=training)

    # 4. Finally, pass the results on to the dense layer
    x = self.dense(x, training=training)

    # 5. Return the results
    if return_state:
      return x, states
    else: 
      return x

In [15]:
# Create an instance of our model
vocab_size=len(ids_from_chars.get_vocabulary())
embedding_dim = 256
rnn_units = 1024

model = TextModel(vocab_size, embedding_dim, rnn_units)

In [16]:
# Verify the output of our model is correct by running one sample through
# This will also compile the model for us. This step will take a bit.
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")


(64, 100, 154) # (batch_size, sequence_length, vocab_size)


In [17]:
# Now let's view the model summary
model.summary()

Model: "text_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  39424     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  157850    
                                                                 
Total params: 4,135,578
Trainable params: 4,135,578
Non-trainable params: 0
_________________________________________________________________


## IV. Train the model

For our purposes, we'll be using [categorical cross entropy](https://machinelearningmastery.com/cross-entropy-for-machine-learning/) as our loss function*. Also, our model will be outputting ["logits" rather than normalized probabilities](https://stackoverflow.com/questions/41455101/what-is-the-meaning-of-the-word-logits-in-tensorflow), because we'll be doing further transformations on the output later. 


\* Note that since our model deals with integer encoding rather than one-hot encoding, we'll specifically be using [sparse categorical cross entropy](https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other).

In [18]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss)

history = model.fit(dataset, epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


## V. Use the model

Now that our model has been trained, we can use it to generate text. As mentioned earlier, to do so we have to keep track of its internal state, or memory, so that we can use previous text predictions to inform later ones.

However, with RNN generated text, if we always just pick the character with the highest probability, our model tends to get stuck in a loop. So instead we will create a probability distribution of characters for each step, and then sample from that distribution. We can add some variation to this using a paramter known as ["temperature"](https://cs.stackexchange.com/questions/79241/what-is-temperature-in-lstm-and-neural-networks-generally).

In [19]:
# Here's the code we'll use to sample for us. It has some extra steps to apply
# the temperature to the distribution, and to make sure we don't get empty
# characters in our text. Most importantly, it will keep track of our model
# state for us.

class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature=temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "" or "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['','[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices = skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())]) 
    self.prediction_mask = tf.sparse.to_dense(sparse_mask,validate_indices=False)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits] 
    predicted_logits, states =  self.model(inputs=input_ids, states=states, 
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    
    # Apply the prediction mask: prevent "" or "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Return the characters and model state.
    return chars_from_ids(predicted_ids), states


In [20]:
# Create an instance of the character generator
one_step_model = OneStep(model, chars_from_ids, ids_from_chars)

# Now, let's generate a 1000 character chapter by giving our model "Chapter 1"
# as its starting text
states = None
next_char = tf.constant(['Chapter 1'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature=temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "" or "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['','[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices = skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())]) 
    self.prediction_mask = tf.sparse.to_dense(sparse_mask,validate_indices=False)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits] 
    predicted_logits, states =  self.model(inputs=input_ids, states=states, 
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    
    # Apply the prediction mask: prevent "" or "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Return the characters and model state.
    return chars_from_ids(predicted_ids), states

result = tf.strings.join(result)

# Print the results formatted.
print(result[0].numpy().decode('utf-8'))




Chapter 1,Ventapogus took the world step in my homeway from
 Yether towards missis Jordand. That Dawn sat down, and Tom was above bene, Tom Walks
by night for half an hour lay upon Alice. I all right over
the first pite drawn old things than she is a very disgraceful sun, whom get out of
the cloisters, and sprinkling his way to the library, so it came out, came all
powly16 in the Gatsby’s beint on a thich gleaminess to believe she could be very whole counted was
going to be some god can come back to Gatsby’s ground,
without a worse’ll be told she tried it, you can think of
for him, but when we has not send any on holy unbox _I filled with the best
of them, but only whitewashed, he is still made a car come back to my own
door and shabby our pirates and young clear wholestly she was
when I beautifully after me. When, however, others would have done before he
found any one (could find it ever lost in asking off the
room. He saw this mortal
group of the Termaning, and the women up will be 

## VI. Next Steps

This is a very simple model with one GRU layer and then an output layer. However, considering how simple it is and the fact that we are predicting outputs character by character, the text it produces is pretty amazing. Though it still has a long way to go before publication.

There are many other RNN architectures you could try, such as adding additional hidden dense layers, replacing GRU with one or more LSTM layers, combining GRU and LSTM, etc...

You could also experiment with better text cleanup to make sure odd punctuation doesn't appear, or finding longer texts to use. If you combine texts from two authors, what happens? Can you generate a Jane Austen stageplay by combining austen and shakespeare texts?

Finally, there are a number of hyperparameters to tweak, such as temperature, epochs, batch size, sequence length, etc...

In [21]:
dataset

<PrefetchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

In [22]:
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024

In [23]:
class next_step_model(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__(self)
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [24]:
model = next_step_model(
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [25]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 154) # (batch_size, sequence_length, vocab_size)


In [26]:
model.summary()

Model: "next_step_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     multiple                  39424     
                                                                 
 gru_1 (GRU)                 multiple                  3938304   
                                                                 
 dense_1 (Dense)             multiple                  157850    
                                                                 
Total params: 4,135,578
Trainable params: 4,135,578
Non-trainable params: 0
_________________________________________________________________


In [27]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
sampled_indices


array([129,  21, 137,  38,  14,  30, 112,  81,  25,   4,  51,  84,  97,
        85,  39,  52,  16,  43,  98,  58, 115,  68,  30,  69, 145,  28,
        22,  96,  57,  95,  58, 118, 117,   6,  29,  44,  87,  54,  43,
        70, 102,  79,  56,  95, 131,  14, 115,  19,  65,  82,  83, 133,
        33,  42,   9, 153,   7,   9,  22, 132,  47,  79, 125,  97,  24,
        91,  14, 135,  95,  42, 125,  95,  64, 137,  44,  76,  37,  90,
        84,  60,  14, 109,  63,  16,  67,  53,  17,  77,  97,  44,  41,
        32,  39,  75,  87, 101,  61,  61,  69, 115])

In [28]:
print("Input:\n", chars_from_ids(input_example_batch[0]))
print()
print("Next Char Predictions:\n", chars_from_ids(sampled_indices))

Input:
 tf.Tensor(
[b's' b' ' b'a' b'n' b's' b'w' b'e' b'r' b'e' b'd' b',' b' '
 b'\xe2\x80\x9c' b'T' b'h' b'e' b' ' b'f' b'a' b'u' b'l' b't' b',' b' '
 b'f' b'a' b't' b'h' b'e' b'r' b',' b' ' b'i' b's' b' ' b'm' b'i' b'n'
 b'e' b',' b' ' b'a' b'n' b'd' b' ' b'm' b'i' b'n' b'e' b' ' b'o' b'n'
 b'l' b'y' b';' b' ' b'I' b' ' b'l' b'e' b'f' b't' b'\n' b't' b'h' b'e'
 b' ' b's' b't' b'o' b'r' b'e' b' ' b'r' b'o' b'o' b'm' b' ' b'd' b'o'
 b'o' b'r' b' ' b'o' b'p' b'e' b'n' b',' b' ' b'a' b'n' b'd' b' ' b't'
 b'h' b'e' b'y' b' ' b'h' b'a'], shape=(100,), dtype=string)

Next Char Predictions:
 tf.Tensor(
[b'\xe1\xbd\xa1' b'4' b'\xe1\xbd\xba' b'I' b'-' b'A' b'\xcf\x83' b'w' b'8'
 b'!' b'V' b'z' b'\xce\xb3' b'{' b'J' b'W' b'/' b'N' b'\xce\xb4' b'_'
 b'\xcf\x86' b'j' b'A' b'k' b'\xe1\xbf\xb3' b';' b'5' b'\xce\xb1' b']'
 b'\xce\x9d' b'_' b'\xe1\xbc\x80' b'\xcf\x89' b'$' b'?' b'O' b'\xc3\x86'
 b'Y' b'N' b'l' b'\xce\xb8' b'u' b'[' b'\xce\x9d' b'\xe1\xbd\xb1' b'-'
 b'\xcf\x86' b'2' b'g' b'x' b'y' b'

In [29]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)


In [30]:
example_batch_loss = loss(target_example_batch, example_batch_predictions)
mean_loss = example_batch_loss.numpy().mean()
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", mean_loss)

Prediction shape:  (64, 100, 154)  # (batch_size, sequence_length, vocab_size)
Mean loss:         5.0366707


In [31]:
tf.exp(mean_loss).numpy()

153.95659

In [32]:
model.compile(optimizer='adam', loss=loss)

In [33]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [34]:
EPOCHS = 30
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [36]:
class step(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits, states = self.model(inputs=input_ids, states=states,
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return the characters and model state.
    return predicted_chars, states

#result = tf.strings.join(result)

#print(result[0].numpy().decode('utf-8'))

In [37]:
mod = step(model, chars_from_ids, ids_from_chars)

In [38]:
start = time.time()
states = None
next_char = tf.constant(['The world seemed like such a peaceful place until the magic tree was discovered in London.'])
result = [next_char]

for n in range(1000):
  next_char, states = mod.generate_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

The world seemed like such a peaceful place until the magic tree was discovered in London.

“Then you have been so much will tell you something. Don’t want to foller
Premendia Souph Outrolic, and tell the women to tell their
corns of deladcholf; for the name sprang of facutes and the maids were
at once to start with blackful and other people. He had lost three ships,
they went on eaten to carry it over with Sic and Mars and
Venus to the seats, knew it is bore, but Ulysses
had finished his son Unopse with Paris, and Polybus the shadowsponce of the
heavy time in it nearest a year-hay, even no girl had grownuple, and then broad day long
tone: “how at your giants, Daisy, dee wheeler Clear will
my son though this would be the use of wreath, and other wors are a runsom
and cease you.”

“I wonder.” Better have obliged that Diana feet
I fixed on her face for an Antipattito and said, “Stranger, of course you are
come over on my return for lonely for what they can sail to this
mind. The rail of 

In [39]:
start = time.time()
states = None
next_char = tf.constant(['The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.'])
result = [next_char]

for n in range(1000):
  next_char, states = mod.generate_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result, '\n\n' + '_'*80)
print('\nRun time:', end - start)

tf.Tensor(
[b'The world seemed like such a peaceful place until the magic tree was discovered in London.\n\nVery merely came home, and finally spread a bath broke.\n\n\xe2\x80\x9cHe must are.\xe2\x80\x9d\n\n\xe2\x80\x9cWell why don\xe2\x80\x99t you was in Christ!\xe2\x80\x9d thought Alice. \xe2\x80\x9cOne that\xe2\x80\x99s just\nto you offend,\xe2\x80\x9d I told them to catch the beam of or frain my\narrivable mortal and find noble. And the King lay when death\nshate few honsequest long before, because reading everything to\nsome central delly. They were now satisfied.\nEach looked like altogether of perfect purpled questions, and had two\nprecious pieces of clay with thick bedroom in the matter of dark\nnorth, with a goats for their ring. They found a show, brother and ashaped more than\nan only sons servant in the other end of the stairs. Then Pieret a mest\nof making a despect at the objiceid word to be milked:\n\n\xe2\x80\x9c\xe2\x80\x98\xe2\x80\x94 I leave these things about you. 

In [40]:
tf.saved_model.save(mod, 'text-gen')
text_gen = tf.saved_model.load('text-gen')





INFO:tensorflow:Assets written to: text-gen/assets


INFO:tensorflow:Assets written to: text-gen/assets


In [43]:
states = None
next_char = tf.constant(['The world seemed like such a peaceful place until the magic tree was discovered in London.'])
result = [next_char]

for n in range(100):
  next_char, states = text_gen.generate_step(next_char, states=states)
  result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode("utf-8"))

The world seemed like such a peaceful place until the magic tree was discovered in London.

I will fix does the wholesh, old friend or men, as it is, we pleased sulder, you not bring
him to 


In [44]:
class Training(next_step_model):
  @tf.function
  def train_step(self, inputs):
      inputs, labels = inputs
      with tf.GradientTape() as tape:
          predictions = self(inputs, training=True)
          loss = self.loss(labels, predictions)
      grads = tape.gradient(loss, model.trainable_variables)
      self.optimizer.apply_gradients(zip(grads, model.trainable_variables))

      return {'loss': loss}

In [45]:
model = Training(
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [46]:
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

In [47]:
model.fit(dataset, epochs=1)



<keras.callbacks.History at 0x7fed8587d210>

In [48]:
EPOCHS = 30

mean = tf.metrics.Mean()

for epoch in range(EPOCHS):
    start = time.time()

    mean.reset_states()
    for (batch_n, (inp, target)) in enumerate(dataset):
        logs = model.train_step([inp, target])
        mean.update_state(logs['loss'])

        if batch_n % 50 == 0:
            template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"
            print(template)

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print()
    print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
    print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
    print("_"*80)

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 2.0676
Epoch 1 Batch 50 Loss 1.9913
Epoch 1 Batch 100 Loss 1.8678
Epoch 1 Batch 150 Loss 1.7819
Epoch 1 Batch 200 Loss 1.7545

Epoch 1 Loss: 1.8743
Time taken for 1 epoch 43.72 sec
________________________________________________________________________________
Epoch 2 Batch 0 Loss 1.6931
Epoch 2 Batch 50 Loss 1.6132
Epoch 2 Batch 100 Loss 1.5528
Epoch 2 Batch 150 Loss 1.5532
Epoch 2 Batch 200 Loss 1.5082

Epoch 2 Loss: 1.5966
Time taken for 1 epoch 42.86 sec
________________________________________________________________________________
Epoch 3 Batch 0 Loss 1.4808
Epoch 3 Batch 50 Loss 1.4945
Epoch 3 Batch 100 Loss 1.4096
Epoch 3 Batch 150 Loss 1.4527
Epoch 3 Batch 200 Loss 1.4294

Epoch 3 Loss: 1.4511
Time taken for 1 epoch 42.90 sec
________________________________________________________________________________
Epoch 4 Batch 0 Loss 1.3468
Epoch 4 Batch 50 Loss 1.3510
Epoch 4 Batch 100 Loss 1.3914
Epoch 4 Batch 150 Loss 1.3708
Epoch 4 Batch 200 Loss 1.3473

Epo

In [49]:
#Prev model used several books combined
#next model will test with several books by the same author
path_to_finn = tf.keras.utils.get_file('76.txt.txt', 'https://www.gutenberg.org/files/76/76-0.txt')
path_to_prince = tf.keras.utils.get_file('1837-0.txt', 'https://www.gutenberg.org/files/1837/1837-0.txt')

finn = open(path_to_alice, 'rb').read().decode(encoding='utf-8')
prince = open(path_to_gatsby, 'rb').read().decode(encoding='utf-8')

text = tom + finn + prince

print('Length of mark twain text: {} characters'.format(len(text)))

Downloading data from https://www.gutenberg.org/files/76/76-0.txt
Downloading data from https://www.gutenberg.org/files/1837/1837-0.txt
Length of mark twain text: 826335 characters


In [50]:
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
print(vocab)

96 unique characters
['\t', '\n', ' ', '!', '"', '$', '%', '&', "'", '(', ')', '*', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ç', 'é', 'ê', 'ô', 'ù', '\u200a', '—', '‘', '’', '“', '”', '…']


In [51]:
ids_from_chars = preprocessing.StringLookup(vocabulary=list(vocab))
chars_from_ids = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True)

In [52]:
def text_from_ids(ids):
  joinedTensor = tf.strings.reduce_join(chars_from_ids(ids), axis=-1)
  return joinedTensor.numpy().decode("utf-8")
  
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(826335,), dtype=int64, numpy=array([ 2, 49, 37, ...,  2,  2,  2])>

In [53]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

seq_length = 100
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

In [54]:
dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))

dataset

<PrefetchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

In [55]:
model = next_step_model(
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)
    
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
    
model.summary()

(64, 100, 97) # (batch_size, sequence_length, vocab_size)
Model: "next_step_model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     multiple                  24832     
                                                                 
 gru_3 (GRU)                 multiple                  3938304   
                                                                 
 dense_3 (Dense)             multiple                  99425     
                                                                 
Total params: 4,062,561
Trainable params: 4,062,561
Non-trainable params: 0
_________________________________________________________________


In [56]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
sampled_indices

array([49, 78, 88,  5, 41,  9, 47, 76, 59,  1, 77, 79, 20, 32, 87, 51, 33,
       94, 80, 61, 89,  2, 71, 33, 33, 28, 20,  8, 26, 13, 36, 60, 50, 85,
       36, 14, 85, 53,  4, 58, 66, 86, 87, 49, 23, 82, 57, 30, 93, 95, 90,
       79, 23, 67, 59, 22, 57, 31, 70, 57,  6,  6, 63,  9, 70, 12, 40, 38,
       41, 12,  3, 37,  6, 80, 20, 67, 78, 50, 93, 19, 63, 21, 80, 58, 28,
       28, 10, 64, 85, 29, 81, 30, 36, 92, 21, 18, 82, 17, 88, 26])

In [57]:
print("Input:\n", chars_from_ids(input_example_batch[0]))
print()
print("Next Char Predictions:\n", chars_from_ids(sampled_indices))

Input:
 tf.Tensor(
[b'h' b'e' b' ' b's' b'a' b't' b' ' b'w' b'i' b't' b'h' b' ' b'D' b'a'
 b'i' b's' b'y' b' ' b'i' b'n' b' ' b'h' b'i' b's' b'\n' b'a' b'r' b'm'
 b's' b' ' b'f' b'o' b'r' b' ' b'a' b' ' b'l' b'o' b'n' b'g' b',' b' '
 b's' b'i' b'l' b'e' b'n' b't' b' ' b't' b'i' b'm' b'e' b'.' b' ' b'I'
 b't' b' ' b'w' b'a' b's' b' ' b'a' b' ' b'c' b'o' b'l' b'd' b' ' b'f'
 b'a' b'l' b'l' b' ' b'd' b'a' b'y' b',' b' ' b'w' b'i' b't' b'h' b' '
 b'f' b'i' b'r' b'e' b' ' b'i' b'n' b' ' b't' b'h' b'e' b'\n' b'r' b'o'
 b'o' b'm'], shape=(100,), dtype=string)

Next Char Predictions:
 tf.Tensor(
[b'T' b't' b'\xc3\xb4' b'"' b'L' b"'" b'R' b'r' b'a' b'\t' b's' b'u' b'3'
 b'C' b'\xc3\xaa' b'V' b'D' b'\xe2\x80\x9c' b'v' b'c' b'\xc3\xb9' b'\n'
 b'm' b'D' b'D' b';' b'3' b'&' b'9' b',' b'G' b'b' b'U' b'\xc3\xa7' b'G'
 b'-' b'\xc3\xa7' b'X' b'!' b'_' b'h' b'\xc3\xa9' b'\xc3\xaa' b'T' b'6'
 b'x' b']' b'A' b'\xe2\x80\x99' b'\xe2\x80\x9d' b'\xe2\x80\x8a' b'u' b'6'
 b'i' b'a' b'5' b']' b'B' b'l' b']' b'$'

In [58]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

example_batch_loss = loss(target_example_batch, example_batch_predictions)
mean_loss = example_batch_loss.numpy().mean()
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", mean_loss)

Prediction shape:  (64, 100, 97)  # (batch_size, sequence_length, vocab_size)
Mean loss:         4.5757165


In [59]:
tf.exp(mean_loss).numpy()

model.compile(optimizer='adam', loss=loss)

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [60]:
EPOCHS = 30
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [61]:
mod = step(model, chars_from_ids, ids_from_chars)

In [62]:
start = time.time()
states = None
next_char = tf.constant(['The world seemed like such a peaceful place until the magic tree was discovered in London.'])
result = [next_char]

In [63]:
for n in range(1000):
  next_char, states = mod.generate_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

The world seemed like such a peaceful place until the magic tree was discovered in London. The
widow Douglas, the dancing breakfastored and gave out a bsyoutial for hadge of Alite,
and the two or three times (when I had if they can’t feel badly and corner of his anike, and ware about
the right time how lay in the world. It was
delighted to some evening behind. To Souther opened into a doze
 Silence, In an unfroluction that the boys thought at the ticking of
a declaim, “Long Island oh! that toe don’t I dog and New Losk left, and a-while
an old book for I knew now. I want to go and death, and then I live it?”

“I bleeve it’s done and don’t forget all the thing.”

I was bound to give up the year and there was a wholesome burst of the
wife, be-aroudged and cleared the policeman, the walletor of the Lord’s was a thing before
whose fause ir gathered his fist in his bed at night.

A gor head was played tonessup into the effort. And before I listened at last, and
tearing his weakness, a quarte

In [64]:
start = time.time()
states = None
next_char = tf.constant(['The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.', 'The world seemed like such a peaceful place until the magic tree was discovered in London.'])
result = [next_char]

for n in range(1000):
  next_char, states = mod.generate_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result, '\n\n' + '_'*80)
print('\nRun time:', end - start)


tf.Tensor(
[b'The world seemed like such a peaceful place until the magic tree was discovered in London. The bore\nwere no langer and slipped in the room when it was quite impulse.\n\nTom broke into the air, mixed up, with their eyes to live behind. There\nwas a pin about his face could bedrove. They began to lay their love at her.\n\n\xe2\x80\x9cLet us delibved to dig next, then?\xe2\x80\x9d he added \xe2\x80\x9cas Muff\xe2\x80\x99s Plung,\nwho cares, Alice looked at it, and one boy in the towel, he had to sing\n\nThe Graviar Harper and George Big Mave\nmelancholy. She got up to take a very feeble and among them.\n\nAlice waited captured and not a yathout interish kept into the\nroom. He looked at them\xe2\x80\x94\xe2\x80\x9cI wash a\nsensation, the tyrninons. An I, the rest of June, blow violently before he\nwent on, \xe2\x80\x9cand of boys in that pine, but a week\nthat many longer real things\xe2\x80\x9d she said sofeten, ammirl in\ntrush, in the place with the moral and bursts and

In [65]:
tf.saved_model.save(mod, 'twain-text-gen')
twain_gen = tf.saved_model.load('twain-text-gen')

states = None
next_char = tf.constant(['The world seemed like such a peaceful place until the magic tree was discovered in London.'])
result = [next_char]

for n in range(100):
  next_char, states = twain_gen.generate_step(next_char, states=states)
  result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode("utf-8"))





INFO:tensorflow:Assets written to: twain-text-gen/assets


INFO:tensorflow:Assets written to: twain-text-gen/assets


The world seemed like such a peaceful place until the magic tree was discovered in London. For a little walls were
somewhere are strange, when the battle was doing are about
it: for the girl
