
Text Generation with Transformer Decoder example #914

Merged

Conversation

jessechancy (Contributor)

This is the Colab I wrote earlier to test out the Transformer decoder. I've formatted it to be one of the examples on the keras.io website.

@@ -0,0 +1,229 @@
"""
Title: Text Generation with Keras NLP TransformerDecoder
Member:

Let's add gpt in the title somewhere, it will be popular :)

Contributor Author:

added gpt to title

start_tokens.append(sample_token)
num_tokens_generated += 1
txt = self.tokenizer.detokenize(start_tokens)
print(f"generated text: \n{txt}\n")
Member:

maybe let's show greedy, top-k and random all together each epoch?

Member:

+1

Contributor Author:

Added the utility functions in the inference section. However, I changed it from printing every epoch, since it would get pretty messy with too many prints and a long callback class. Instead, I gave a short callback wrapper example at the end with the top-k utility.
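For illustration, a minimal sketch of the kind of callback wrapper described above, assuming the `keras_nlp.utils.top_k_search` utility from the keras_nlp release of that time; the names `token_logits_fn`, `prompt_tokens`, `NUM_TOKENS_TO_GENERATE`, and `tokenizer` stand for objects defined elsewhere in the example and are not taken from the diff shown here:

```python
import keras_nlp
from tensorflow import keras


class TopKTextGenerator(keras.callbacks.Callback):
    """Print text sampled with top-k search at the end of each epoch."""

    def __init__(self, k):
        self.k = k

    def on_epoch_end(self, epoch, logs=None):
        output_tokens = keras_nlp.utils.top_k_search(
            token_logits_fn,  # wrapper returning next-token logits
            prompt_tokens,  # unpadded prompt, e.g. a single [BOS] token
            max_length=NUM_TOKENS_TO_GENERATE,
            k=self.k,
            from_logits=True,
        )
        txt = tokenizer.detokenize(output_tokens)
        print(f"Top-K search generated text: \n{txt}\n")


# Usage sketch: model.fit(ds, epochs=EPOCHS, callbacks=[TopKTextGenerator(k=10)])
```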

Member @fchollet left a comment:

Thanks for the PR!

Description: Implementation of a small GPT-like model using the TransformerDecoder class.
"""
"""
# Download Library
Member:

Note that section titles should use ##

Contributor Author:

edited to ##

Date created: 2022/06/13
Last modified: 2022/06/13
Description: Implementation of a small GPT-like model using the TransformerDecoder class.
"""
Member:

Add an Introduction section explaining what the example is about, what dataset you will use, etc.

Contributor Author:

added introduction section, with a high level description of the components in the notebook

import tensorflow as tf
from tensorflow import keras
import numpy as np
from keras_nlp.layers.transformer_decoder import TransformerDecoder
Member:

Import keras_nlp only

Contributor Author:

fixed

model = keras.Model(inputs=inputs, outputs=outputs)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(
"adam", loss=loss_fn,
Member:

Add metrics and use a keyword argument for the optimizer

Contributor Author:

fixed
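For context, the requested change would look roughly like the sketch below. The `keras_nlp.metrics.Perplexity` metric is my assumption for "add metrics"; `model` and `loss_fn` come from the diff context above:

```python
import keras_nlp
from tensorflow import keras

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Perplexity is a natural metric for language modeling; ignore padding (id 0).
perplexity = keras_nlp.metrics.Perplexity(from_logits=True, mask_token_id=0)

model.compile(
    optimizer="adam",  # keyword argument rather than positional
    loss=loss_fn,
    metrics=[perplexity],
)
```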

start_tokens.append(sample_token)
num_tokens_generated += 1
txt = self.tokenizer.detokenize(start_tokens)
print(f"generated text: \n{txt}\n")
Member:

+1


model = create_model()

model.fit(
Member:

Add a section for evaluation / inference and a conclusion section summarizing what was learned from the example.

Contributor Author:

added inference section and concluding paragraph

"""

# Download vocabulary data.
vocab_file = keras.utils.get_file(
Member:

should we use the word piece vocab learner utility here? that could also simplify the code explanation above.

Contributor Author:

edited
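A minimal sketch of the WordPiece vocabulary learner being suggested, assuming the `keras_nlp.tokenizers.compute_word_piece_vocabulary` utility; `raw_train_ds` (a `tf.data.Dataset` of raw text lines) and `VOCAB_SIZE` are illustrative names, not necessarily what the PR ended up using:

```python
import keras_nlp

VOCAB_SIZE = 5000  # illustrative value

# Learn a WordPiece vocabulary from the training text instead of downloading
# a precomputed vocabulary file.
vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(
    raw_train_ds,
    vocabulary_size=VOCAB_SIZE,
    lowercase=True,
    reserved_tokens=["[PAD]", "[UNK]", "[BOS]"],
)

tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
    vocabulary=vocab,
    lowercase=True,
)
```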

Member @mattdangerw left a comment:

This is really great! Left a few comments



def create_model():
inputs = keras.layers.Input(shape=(SEQ_LEN,), dtype=tf.int32)
Member:

given that this is only called once, why not just move this out of the function?

Contributor Author:

edited

x = embedding_layer(inputs)
# Transformer decoders.
for _ in range(NUM_LAYERS):
transformer_block = keras_nlp.layers.TransformerDecoder(
Member:

decoder_layer maybe to agree with embedding_layer naming

Contributor Author:

edited
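Putting the naming comments together, the decoder stack might read roughly as follows. This is a sketch only; `VOCAB_SIZE`, `EMBED_DIM`, `NUM_HEADS`, and `FEED_FORWARD_DIM` are assumed hyperparameter names:

```python
import keras_nlp
import tensorflow as tf
from tensorflow import keras

inputs = keras.layers.Input(shape=(SEQ_LEN,), dtype=tf.int32)

# Token and position embeddings, with masking of padded positions.
embedding_layer = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=SEQ_LEN,
    embedding_dim=EMBED_DIM,
    mask_zero=True,
)
x = embedding_layer(inputs)

# Stack of decoder-only Transformer blocks.
for _ in range(NUM_LAYERS):
    decoder_layer = keras_nlp.layers.TransformerDecoder(
        num_heads=NUM_HEADS,
        intermediate_dim=FEED_FORWARD_DIM,
    )
    x = decoder_layer(x)  # A single argument skips cross-attention.

outputs = keras.layers.Dense(VOCAB_SIZE)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```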

model = create_model()

"""
Let's take a look at our model summary here! We can see that a large majority of the
Member:

remove the exclamation point here, so the one in the next sentence hits better.

Contributor Author:

edited

## Training

Now that we have our model, let's train it. We use a subset of the training data to save
on training time. It would also be beneficial to use a GPU to speed up the training
Member:

I'm not sure the GPU part bears mentioning here. Maybe just say at the top of the colab that, if you are running in a colab, you should enable the GPU runtime for faster training performance.

Contributor Author:

edited

# Training
LEARNING_RATE = 5e-4
EPOCHS = 12
NUM_TRAINING_BATCHES = 1000
Member:

why do we need this hyperparameter, can we just train over the full dataset?

Contributor Author:

removed hyperparameter


With our trained model, we can test it out to gauge it's performance. Since
this is a dataset of mostly fictional books, there is bound to be a hero, so let's use
"The hero" as our starting string! We run it through the tokenizer to get the input for
Member:

remove exclamation point

Contributor Author:

edited


def preprocess(inputs):
outputs = tokenizer(inputs)
features = outputs[:, :-1]
Member:

should we add bos tokens here? then you could sample without seed text, and I think we would be a little closer to actual gpt

Contributor Author:

edited
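A rough sketch of what adding a `[BOS]` token in preprocessing could look like, using `keras_nlp.layers.StartEndPacker` (this anticipates the packer discussion just below; the variable names are assumptions):

```python
import keras_nlp

# Prepend a [BOS] token and pad/truncate each sequence to SEQ_LEN.
start_packer = keras_nlp.layers.StartEndPacker(
    sequence_length=SEQ_LEN,
    start_value=tokenizer.token_to_id("[BOS]"),
)


def preprocess(inputs):
    outputs = tokenizer(inputs)
    features = start_packer(outputs)  # [BOS] + tokens, padded to SEQ_LEN
    labels = outputs  # next-token targets, without the [BOS]
    return features, labels
```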

MAX_PREDICT_LEN = 80
start_prompt = "The hero"
# Unpadded token sequence.
start_tokens = [tokenizer.token_to_id(_) for _ in start_prompt.lower().split()]
Member:

this feels awkward, we should actually use a tokenizer to tokenize the text. maybe let's just instantiate the tokenizer without a sequence length, and either use ragged.to_dense() or a packer layer to densify (especially if we decide to add start tokens)

Contributor Author:

edited to use packer layer for start tokens
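For illustration only, the two densifying options the reviewer mentions might look like this, assuming `tokenizer` is instantiated without a fixed sequence length so it returns a ragged tensor, and `start_packer` is the packer layer sketched earlier:

```python
prompt = "the hero"

# Option 1: tokenize, then densify the ragged output.
prompt_tokens = tokenizer([prompt]).to_tensor()

# Option 2: run the ragged tokens through the packer layer, which also
# prepends the [BOS] token and pads to a fixed length.
prompt_tokens = start_packer(tokenizer([prompt]))
```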

model.fit(ds.take(1), verbose=2, epochs=2, callbacks=[text_generation_callback])

"""
## Conclusion
Member:

i'm not sure this conclusion section adds too much. we should either replace it with a further reading/where to next section, or remove it entirely

model with few parameters.

This example combines concepts from [Text generation with a miniature GPT](https://keras.io/examples/generative/text_generation_with_miniature_gpt/)
with KerasNLP abstractions. We will demonstrate how KerasNLP tokenization, model, metrics, and
Member:

Maybe just:
We will demonstrate how KerasNLP tokenization, layers and metrics simplify the training
process, and then show how to generate output text using sampling utilities.

And then remove the whole next paragraph. Readers can read on to see the exact layers used.

Contributor Author:

edited

Member @fchollet left a comment:

Thanks for the updates! I did a round of review with a focus on copywriting.

@@ -0,0 +1,401 @@
"""
Title: Simple GPT Text Generation with KerasNLP transformers
Member:

Just "with KerasNLP" (no transformers)

Member:

Also don't capitalize all words unless they're proper nouns

Author: [Jesse Chan](https://github.com/jessechancy)
Date created: 2022/07/25
Last modified: 2022/07/25
Description: Using KerasNLP transformers to train a mini-GPT model for text generation.
Member:

Remove "transformers"

## Introduction

In this example, we will use KerasNLP layers to build a scaled down Generative
Pre-trained (GPT) model. GPT is a transformer based model that allows you to generate
Member:

Capitalize T

Member:

Also capitalize Transformer


In this example, we will use KerasNLP layers to build a scaled down Generative
Pre-trained (GPT) model. GPT is a transformer based model that allows you to generate
sophisticated text from a small input.
Member:

"from a prompt"

metrics simplify the training
process, and then show how to generate output text using sampling utilities.

Note: If you are running this on a colab make sure to enable GPU runtime for faster
Member:

"on Colab"

prompt_tokens = tf.convert_to_tensor([tokenizer.token_to_id("[BOS]")])

"""
We will use the `keras_nlp.utils` library for inference. Every text generation
Member:

Say "module" rather than library


"""
We will use the `keras_nlp.utils` library for inference. Every text generation
utility would require a `token_logits_fn()` wrapper around the model. This wrapper takes
Member:

"requires"

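The wrapper described in that quoted paragraph is small; a sketch of the idea, mirroring the convention that the model returns per-position logits of shape `(batch, sequence_length, vocab_size)`:

```python
def token_logits_fn(inputs):
    cur_len = inputs.shape[1]
    output = model(inputs)
    return output[:, cur_len - 1, :]  # logits for the next token only
```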
)

"""
## Train Tokenizer
Member:

In every section title, only capitalize the first word, not all words

"""
## Conclusion

Congrats, you made it through the example! To recap, in this example, we use KerasNLP
Member:

Drop "Congrats, you made it through the example!"

model, and perform inference with the text generation library.

If you would like to understand how transformers work, or learn more about training the
full GPT model, here are some further readings:
Member:

Add line break before list

Member @fchollet left a comment:

Thanks for the update! I pushed some copyedits. Please pull them first. I think we're ready to add the generated files now.

Member @fchollet left a comment:

Thank you for the great contribution! 👍 Merging now.

@jessechancy force-pushed the jesse-transformerdecoder-tutorial branch from 2332f50 to c79ee11 (August 5, 2022 20:43)
Member @mattdangerw left a comment:

LGTM! These examples are great! Really demonstrative of the different sampling.

Looks like we will need to cut a new release including the tokenizer vocab trainer function before we release this.

@fchollet merged commit 4bd2aa6 into keras-team:master on Aug 9, 2022