Text Generation with Transformer Decoder example #914
Conversation
@@ -0,0 +1,229 @@
"""
Title: Text Generation with Keras NLP TransformerDecoder
Let's add gpt in the title somewhere, it will be popular :)
added gpt to title
    start_tokens.append(sample_token)
    num_tokens_generated += 1
txt = self.tokenizer.detokenize(start_tokens)
print(f"generated text: \n{txt}\n")
maybe let's show greedy, top-k and random all together each epoch?
+1
Added the utility functions in the inference section. However, I changed it from printing every epoch, since that would get pretty messy with too many prints and a long callback class. Instead, I gave a short callback wrapper example at the end using the top-k utility.
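For reference, a minimal sketch of the kind of callback wrapper described here; the `token_logits_fn`, `prompt_tokens`, `tokenizer`, and `NUM_TOKENS_TO_GENERATE` names are assumed from the surrounding example:

```python
import keras_nlp
from tensorflow import keras


class TopKTextGenerator(keras.callbacks.Callback):
    """A callback to generate text from the model with top-k sampling."""

    def __init__(self, k):
        self.k = k

    def on_epoch_end(self, epoch, logs=None):
        # token_logits_fn, prompt_tokens, tokenizer, and
        # NUM_TOKENS_TO_GENERATE are assumed to be defined in the example.
        output_tokens = keras_nlp.utils.top_k_search(
            token_logits_fn,
            prompt_tokens,
            max_length=NUM_TOKENS_TO_GENERATE,
            k=self.k,
            from_logits=True,
        )
        txt = tokenizer.detokenize(output_tokens)
        print(f"Top-K search generated text: \n{txt}\n")
```

The callback would then be passed to training as, e.g., `model.fit(..., callbacks=[TopKTextGenerator(k=10)])`.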
Thanks for the PR!
Description: Implementation of a small GPT-like model using the TransformerDecoder class.
"""

"""
# Download Library
Note that section titles should use ##
edited to ##
Date created: 2022/06/13
Last modified: 2022/06/13
Description: Implementation of a small GPT-like model using the TransformerDecoder class.
"""
Add an Introduction section explaining what the example is about, what dataset you will use, etc.
added introduction section, with a high level description of the components in the notebook
import tensorflow as tf
from tensorflow import keras
import numpy as np
from keras_nlp.layers.transformer_decoder import TransformerDecoder
Import `keras_nlp` only
fixed
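The fixed import block presumably reduces to the public namespace, something like:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
import keras_nlp

# The decoder is then referenced as keras_nlp.layers.TransformerDecoder
# rather than imported from the internal module path.
```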
model = keras.Model(inputs=inputs, outputs=outputs)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(
    "adam", loss=loss_fn,
Add metrics and use a keyword argument for the optimizer
fixed
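A sketch of the fixed compile call, with the optimizer passed as a keyword argument and a perplexity metric added (KerasNLP provides a `keras_nlp.metrics.Perplexity` metric; the `mask_token_id` value below assumes a `[PAD]` id of 0, as in the example):

```python
# `model` is the keras.Model built earlier in the example.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
perplexity = keras_nlp.metrics.Perplexity(from_logits=True, mask_token_id=0)
model.compile(optimizer="adam", loss=loss_fn, metrics=[perplexity])
```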
model = create_model()

model.fit(
Add a section for evaluation / inference and a conclusion section summarizing what was learned from the example.
added inference section and concluding paragraph
examples/nlp/text_generation_gpt.py

"""

# Download vocabulary data.
vocab_file = keras.utils.get_file(
should we use the word piece vocab learner utility here? that could also simplify the code explanation above.
edited
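A sketch of the vocabulary trainer the reviewer is pointing to, `keras_nlp.tokenizers.compute_word_piece_vocabulary`; the `raw_train_ds` dataset and the `VOCAB_SIZE`/`SEQ_LEN` constants are assumed from the example:

```python
# Train a WordPiece vocabulary directly on the training text instead of
# downloading a pretrained vocabulary file.
vocab = keras_nlp.tokenizers.compute_word_piece_vocabulary(
    raw_train_ds,  # assumed: a tf.data.Dataset of raw text lines
    vocabulary_size=VOCAB_SIZE,
    lowercase=True,
    reserved_tokens=["[PAD]", "[UNK]", "[BOS]"],
)
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
    vocabulary=vocab,
    sequence_length=SEQ_LEN,
    lowercase=True,
)
```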
This is really great! Left a few comments
examples/nlp/text_generation_gpt.py

def create_model():
    inputs = keras.layers.Input(shape=(SEQ_LEN,), dtype=tf.int32)
given that this is only called once, why not just move this out of the function?
edited
examples/nlp/text_generation_gpt.py

x = embedding_layer(inputs)
# Transformer decoders.
for _ in range(NUM_LAYERS):
    transformer_block = keras_nlp.layers.TransformerDecoder(
decoder_layer maybe to agree with embedding_layer naming
edited
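The renamed loop would read something like the following, with `NUM_LAYERS`, `NUM_HEADS`, and `FEED_FORWARD_DIM` assumed from the example:

```python
x = embedding_layer(inputs)
for _ in range(NUM_LAYERS):
    decoder_layer = keras_nlp.layers.TransformerDecoder(
        num_heads=NUM_HEADS,
        intermediate_dim=FEED_FORWARD_DIM,
    )
    # Calling the decoder with a single argument skips cross-attention,
    # giving decoder-only (causal) self-attention, as in GPT.
    x = decoder_layer(x)
```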
examples/nlp/text_generation_gpt.py

model = create_model()

"""
Let's take a look at our model summary here! We can see that a large majority of the
remove the exclamation point here, so the one in the next sentence hits better.
edited
examples/nlp/text_generation_gpt.py

## Training

Now that we have our model, let's train it. We use a subset of the training data to save
on training time. It would also be beneficial to use a GPU to speed up the training
I'm not sure the GPU part bears mentioning here. Maybe just say at the top of the colab: if you are running in a colab, make sure to enable the GPU runtime for faster training performance.
edited
examples/nlp/text_generation_gpt.py

# Training
LEARNING_RATE = 5e-4
EPOCHS = 12
NUM_TRAINING_BATCHES = 1000
why do we need this hyperparameter, can we just train over the full dataset?
removed hyperparameter
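Training then presumably just iterates the full dataset each epoch, along the lines of:

```python
# train_ds and val_ds: the example's training and validation tf.data pipelines.
model.fit(train_ds, validation_data=val_ds, verbose=2, epochs=EPOCHS)
```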
examples/nlp/text_generation_gpt.py

With our trained model, we can test it out to gauge its performance. Since
this is a dataset of mostly fictional books, there is bound to be a hero, so let's use
"The hero" as our starting string! We run it through the tokenizer to get the input for
remove exclamation point
edited
examples/nlp/text_generation_gpt.py

def preprocess(inputs):
    outputs = tokenizer(inputs)
    features = outputs[:, :-1]
should we add bos tokens here? then you could sample without seed text, and I think we would be a little closer to actual gpt
edited
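A sketch of the revised preprocessing, using `keras_nlp.layers.StartEndPacker` to prepend a `[BOS]` token to the model inputs so the unshifted token sequence can serve as the labels (`tokenizer` and `SEQ_LEN` are assumed from the example):

```python
# Prepends a [BOS] id to each sequence and pads/truncates to SEQ_LEN.
start_packer = keras_nlp.layers.StartEndPacker(
    sequence_length=SEQ_LEN,
    start_value=tokenizer.token_to_id("[BOS]"),
)


def preprocess(inputs):
    outputs = tokenizer(inputs)
    features = start_packer(outputs)  # [BOS] + tokens: the model input
    labels = outputs  # the unshifted sequence: next-token targets
    return features, labels
```

This also means generation can start from a bare `[BOS]` token, with no seed text required.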
examples/nlp/text_generation_gpt.py

MAX_PREDICT_LEN = 80
start_prompt = "The hero"
# Unpadded token sequence.
start_tokens = [tokenizer.token_to_id(_) for _ in start_prompt.lower().split()]
this feels awkward, we should actually use a tokenizer to tokenize the text. maybe let's just instantiate the tokenizer without a sequence length, and either use ragged.to_dense() or a packer layer to densify (especially if we decide to add start tokens)
edited to use packer layer for start tokens
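One way the reviewer's suggestion could look, assuming a `tokenizer` instantiated without a fixed `sequence_length` (so it returns a `tf.RaggedTensor`):

```python
start_prompt = "The hero"
# Tokenize the seed text with the actual tokenizer; the ragged result is
# densified with to_tensor(). A packer layer could pad and/or prepend a
# [BOS] id here instead.
prompt_tokens = tokenizer([start_prompt.lower()]).to_tensor()
```

Note that in the final version of the example the prompt ends up being just the `[BOS]` token, as a later excerpt in this thread shows.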
model.fit(ds.take(1), verbose=2, epochs=2, callbacks=[text_generation_callback])

"""
## Conclusion
i'm not sure this conclusion section adds too much. we should either replace it with a further reading/where to next section, or remove it entirely
examples/nlp/text_generation_gpt.py

model with few parameters.

This example combines concepts from [Text generation with a miniature GPT](https://keras.io/examples/generative/text_generation_with_miniature_gpt/)
with KerasNLP abstractions. We will demonstrate how KerasNLP tokenization, model, metrics, and
Maybe just:
We will demonstrate how KerasNLP tokenization, layers and metrics simplify the training
process, and then show how to generate output text using sampling utilities.
And then remove the whole next paragraph. Readers can read on to see the exact layers used.
edited
Thanks for the updates! I did a round of review with a focus on copywriting.
examples/nlp/text_generation_gpt.py

@@ -0,0 +1,401 @@
"""
Title: Simple GPT Text Generation with KerasNLP transformers
Just "with KerasNLP" (no transformers)
Also don't capitalize all words unless they're proper nouns
examples/nlp/text_generation_gpt.py

Author: [Jesse Chan](https://github.com/jessechancy)
Date created: 2022/07/25
Last modified: 2022/07/25
Description: Using KerasNLP transformers to train a mini-GPT model for text generation.
Remove "transformers"
examples/nlp/text_generation_gpt.py

## Introduction

In this example, we will use KerasNLP layers to build a scaled down Generative
Pre-trained (GPT) model. GPT is a transformer based model that allows you to generate
- Capitalize T
Also capitalize Transformer
examples/nlp/text_generation_gpt.py

In this example, we will use KerasNLP layers to build a scaled down Generative
Pre-trained (GPT) model. GPT is a transformer based model that allows you to generate
sophisticated text from a small input.
"from a prompt"
examples/nlp/text_generation_gpt.py

metrics simplify the training
process, and then show how to generate output text using sampling utilities.

Note: If you are running this on a colab make sure to enable GPU runtime for faster
"on Colab"
examples/nlp/text_generation_gpt.py

prompt_tokens = tf.convert_to_tensor([tokenizer.token_to_id("[BOS]")])

"""
We will use the `keras_nlp.utils` library for inference. Every text generation
Say "module" rather than library
examples/nlp/text_generation_gpt.py

"""
We will use the `keras_nlp.utils` library for inference. Every text generation
utility would require a `token_logits_fn()` wrapper around the model. This wrapper takes
"requires"
examples/nlp/text_generation_gpt.py

)

"""
## Train Tokenizer
In every section title, only capitalize the first word, not all words
examples/nlp/text_generation_gpt.py

"""
## Conclusion

Congrats, you made it through the example! To recap, in this example, we use KerasNLP
Drop "Congrats, you made it through the example!"
model, and perform inference with the text generation library.

If you would like to understand how transformers work, or learn more about training the
full GPT model, here are some further readings:
Add line break before list
Thanks for the update! I pushed some copyedits. Please pull them first. I think we're ready to add the generated files now.
Thank you for the great contribution! 👍 Merging now.
Force-pushed from 2332f50 to c79ee11
LGTM! These examples are great! Really demonstrative of the different sampling utilities.
Looks like we will need to cut a new release including the tokenizer vocab trainer function before we release this.
This is a colab I had written previously to test out `TransformerDecoder`. I've formatted it to be one of the examples on the keras.io website.