Transformers are deep learning architectures introduced by Google in 2017 that are designed to process sequential data for downstream tasks such as translation, question answering or text summarization.

Let's first talk about text generation.
GPT-2 is one of the most popular Transformer architectures for text generation, and it is usable by a broad audience.

In [1]:
import tensorflow as tf

One of the advantages of the transformers library, and a reason for its popularity, is how easily we can download a specific pretrained model.

In [2]:
! pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninsta

In [3]:
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

In [4]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
GPT2 = TFGPT2LMHeadModel.from_pretrained("gpt2-large", pad_token_id=tokenizer.eos_token_id)

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-large.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


As the output above shows, all layers of the pretrained model were loaded from the checkpoint provided through the transformers library. Starting from pretrained weights means we can generate text right away, without training a model ourselves. Before generating, it is usually a good idea to fix the random seed so that the results are reproducible.

In [5]:
# settings

# for reproducibility
SEED = 34
tf.random.set_seed(SEED)

# maximum length of the output text in tokens (the prompt counts toward it)
MAX_LEN = 70

The next step is to choose a decoding strategy, which is one of the most important decisions when generating text with the GPT-2 model.

We will start with greedy search: at each step, the word with the highest probability is predicted as the next word in the sequence.
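
To make the idea concrete, here is a minimal sketch of greedy search on a toy next-word table. The table NEXT_WORD_PROBS and the helper greedy_decode are hypothetical and exist only for illustration. The probabilities are invented, not taken from GPT-2, and the real model scores tokens over a vocabulary of tens of thousands of entries rather than a handful of words.

```python
# Toy next-word distributions: hypothetical numbers, not produced by GPT-2.
NEXT_WORD_PROBS = {
    "we": {"are": 0.6, "feel": 0.4},
    "are": {"not": 0.5, "tired": 0.3, "alone": 0.2},
    "not": {"alone": 0.7, "tired": 0.3},
    "feel": {"lonely": 0.9, "tired": 0.1},
}


def greedy_decode(start, max_len):
    """Repeatedly append the single most probable next word."""
    words = [start]
    # Stop at max_len words or when the last word has no known continuation.
    while len(words) < max_len and words[-1] in NEXT_WORD_PROBS:
        candidates = NEXT_WORD_PROBS[words[-1]]
        # Greedy choice: keep the argmax and discard every alternative.
        words.append(max(candidates, key=candidates.get))
    return words


print(" ".join(greedy_decode("we", max_len=5)))
```

Because only the top candidate survives each step, a continuation that starts with a slightly less likely word is never considered, even if it would lead to a better sentence overall.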

In [6]:
input_sequence = "There are times when we are really tired of people but we feel lonely too"

In [7]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode(input_sequence, return_tensors='tf')

# generate text until the output length (which includes the context length) reaches MAX_LEN
greedy_output = GPT2.generate(input_ids, max_length = MAX_LEN)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens = True))

Output:
----------------------------------------------------------------------------------------------------
There are times when we are really tired of people but we feel lonely too. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone. We are not alone


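As the output above shows, the model quickly starts repeating itself. Greedy search always commits to the single most probable next word, so the high-probability words mask the less likely ones and the model never explores more diverse combinations.

A simple remedy is beam search: at every step we keep track of several alternative candidate sequences (the beams), so that more continuations can be compared before one is chosen. In the call below, the no_repeat_ngram_size = 2 setting additionally prevents any sequence of two tokens from appearing twice in the output.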
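
The following sketch illustrates the principle on a toy next-word table. The table, the scores, and the beam_search helper are all hypothetical and only meant to show the bookkeeping; the generate method of the transformers library implements a far more complete version (length handling, early stopping, n-gram blocking).

```python
import math

# Toy next-word distributions: hypothetical numbers, not produced by GPT-2.
NEXT_WORD_PROBS = {
    "we": {"are": 0.6, "feel": 0.4},
    "are": {"not": 0.4, "tired": 0.3, "alone": 0.3},
    "feel": {"lonely": 0.9, "tired": 0.1},
}


def beam_search(start, num_beams, steps):
    """Keep the num_beams best partial sequences after every step."""
    beams = [([start], 0.0)]  # (words, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            options = NEXT_WORD_PROBS.get(words[-1])
            if options is None:
                # No known continuation: carry the hypothesis over unchanged.
                candidates.append((words, score))
                continue
            for word, prob in options.items():
                candidates.append((words + [word], score + math.log(prob)))
        # Rank all extensions together and keep only the best num_beams.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


for words, score in beam_search("we", num_beams=2, steps=2):
    print(" ".join(words), round(math.exp(score), 2))
```

With num_beams=1 this procedure reduces to greedy search and returns "we are not" (probability 0.24 in this toy table). With two beams, the sequence "we feel lonely" (0.36) is found, although its first word was not the most probable one.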

In [8]:
# set num_return_sequences > 1
beam_outputs = GPT2.generate(
    input_ids, 
    max_length = MAX_LEN, 
    num_beams = 5, 
    no_repeat_ngram_size = 2, 
    num_return_sequences = 5, 
    early_stopping = True
)

print('')
print("Output:\n" + 100 * '-')

# now we have 5 output sequences
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))


Output:
----------------------------------------------------------------------------------------------------
0: There are times when we are really tired of people but we feel lonely too.

"It's not that we don't want to talk to them, it's just that they're not there."
1: There are times when we are really tired of people but we feel lonely too.

"It's not that we don't want to talk to people, it's just that sometimes we just feel like we can't."
2: There are times when we are really tired of people but we feel lonely too.

"It's not that we don't want to talk to them, it's just that they're not there for us. We want them to be there with us."
3: There are times when we are really tired of people but we feel lonely too.

"It's not that we don't want to talk to them, it's just that they're not there for us. We want them to be there with us, but they aren't."
4: There are times when we are really tired of people but we feel lonely too.

"It's not that we don't want to talk to them, it's 

As the output above shows, the results are now more diverse. The message is still largely the same across the returned sequences, but at least the formulations differ a little in style.

Next, we can explore sampling, a non-deterministic decoding strategy. Instead of following a strict path to find the text with the highest probability, we randomly pick the next word according to its conditional probability distribution. This approach risks producing incoherent text, so we can make use of the temperature parameter, which reshapes the probability distribution: values below 1 sharpen it toward the most likely words, and values above 1 flatten it.
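
As a rough illustration of what temperature does, the sketch below applies a temperature-scaled softmax to a handful of made-up scores and then samples from the result. The word list, the logits, and both helper functions are hypothetical. In the transformers library, sampling is usually switched on through the do_sample and temperature arguments of generate, which this sketch does not call.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_next_word(words, logits, temperature, rng):
    """Draw one word at random according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical candidate words and scores, not produced by GPT-2.
WORDS = ["alone", "tired", "lonely", "banana"]
LOGITS = [2.0, 1.0, 0.5, -1.0]

rng = random.Random(34)  # fixed seed so repeated runs draw the same words
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(LOGITS, temperature)
    word = sample_next_word(WORDS, LOGITS, temperature, rng)
    print(temperature, [round(p, 2) for p in probs], word)
```

At a low temperature most of the probability mass sits on the top-scoring word, so the samples stay close to what greedy search would produce. At a high temperature unlikely words are drawn more often, which increases diversity at the cost of coherence.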