Source for Notebook:

[Colab Research - How to Generate Text](https://colab.research.google.com/drive/17Cqi-9anYTvnmIoSMCGUe7Ot6lGG0SH0#scrollTo=OWLd_J6lXz_t)

In [4]:
!pip install --upgrade pip

Collecting pip
  Downloading pip-21.0.1-py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 12.8 MB/s eta 0:00:01
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.2.4
    Uninstalling pip-20.2.4:
      Successfully uninstalled pip-20.2.4
Successfully installed pip-21.0.1


In [5]:
!pip install gast==0.3.2



In [6]:
!pip install -q git+https://github.com/huggingface/transformers.git

In [8]:
import tensorflow as tf

In [9]:
from transformers import TFGPT2LMHeadModel

In [10]:
from transformers import GPT2Tokenizer

In [11]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")


In [12]:
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)


All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [27]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('We have a lot of new SOCs in stock', return_tensors='tf')

# generate text until the output length (which includes the context length) reaches 40
greedy_output = model.generate(input_ids, max_length=40)

print("Output:\n" + 10 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Output:
----------
We have a lot of new SOCs in stock, and we're working on a few more. We're also working on a few more new features, and we're working on a few more features
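
The repetition in the greedy output above is typical of this decoding strategy: at every step only the single most likely next token is kept. A minimal sketch of the idea follows, assuming a hand-written lookup table (`TOY_MODEL`, hypothetical probabilities) in place of GPT-2. It is not how `model.generate` is implemented, only an illustration of the selection rule and of `max_length` counting the context tokens.

```python
# Hypothetical next-token probabilities standing in for a language model.
TOY_MODEL = {
    "we": {"are": 0.6, "have": 0.4},
    "are": {"working": 0.7, "<eos>": 0.3},
    "working": {"on": 0.9, "<eos>": 0.1},
    "on": {"features": 0.8, "<eos>": 0.2},
    "features": {"and": 0.6, "<eos>": 0.4},
    "and": {"we": 1.0},
}


def greedy_decode(start, max_length):
    """Pick the most probable next token until max_length tokens exist."""
    tokens = [start]  # the context counts toward max_length
    while len(tokens) < max_length:
        candidates = TOY_MODEL.get(tokens[-1])
        if not candidates:
            break  # the toy table has no entry for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(greedy_decode("we", max_length=8)))
```

Because the most likely path through this table forms a cycle, the sketch keeps looping through the same phrase until the length limit stops it, which mirrors the behaviour seen in the cell above.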


In [28]:
# activate beam search and early_stopping
beam_output = model.generate(
    input_ids,  
    max_length=50, 
    num_beams=5, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
We have a lot of new SOCs in stock, and we are working hard to get them into the hands of our customers as quickly as possible.

We have a lot of new SOCs in stock, and we are working hard to get
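
Beam search keeps the `num_beams` most probable partial sequences at each step instead of only the best one, so it can find a sequence whose first token is not the locally most likely choice. The sketch below illustrates this under stated assumptions: `TOY_MODEL` and its probabilities are hypothetical, and the function is a simplified stand-in, not the library's implementation (no length penalty, no early stopping).

```python
import math

# Hypothetical next-token probabilities standing in for a language model.
TOY_MODEL = {
    "the": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.5, "drives": 0.3, "turns": 0.2},
}


def beam_search(start, steps, num_beams):
    """Return the num_beams best (tokens, log_prob) pairs after `steps` steps."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            next_probs = TOY_MODEL.get(tokens[-1], {})
            if not next_probs:
                # No continuation known: carry the hypothesis over unchanged.
                candidates.append((tokens, score))
                continue
            for token, prob in next_probs.items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


best_tokens, best_score = beam_search("the", steps=2, num_beams=2)[0]
print(" ".join(best_tokens), round(math.exp(best_score), 2))
```

With `num_beams=1` the function reduces to greedy search and commits to "nice" first; with two beams it keeps "dog" alive long enough to discover the more probable continuation.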


In [32]:
# set no_repeat_ngram_size to 4
beam_output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5, 
    no_repeat_ngram_size=4, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
We have a lot of new SOCs in stock, and we are working hard to get them into the hands of our customers as quickly as possible."

The company said it was working closely with the U.S. Department of Homeland Security and the Federal Bureau of Investigation.

"We will continue to work closely with our partners to ensure that our customers have the best experience with our products and services," the company said in a statement. "We are committed to providing our customers with the
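
The `no_repeat_ngram_size` setting prevents any n-gram of that size from appearing twice in the output. One way to picture it: before each step, find every token that would complete an n-gram already present in the sequence and exclude it. The helper below is a hypothetical sketch of that bookkeeping on plain word lists (`banned_next_tokens` is an illustrative name, not a function from `transformers`).

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would repeat an n-gram already in `tokens`."""
    if len(tokens) < ngram_size:
        return set()  # no complete n-gram exists yet
    prefix_len = ngram_size - 1
    # The last (n - 1) tokens are the start of the n-gram being formed.
    prefix = tokens[len(tokens) - prefix_len:]
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tokens[i:i + ngram_size]
        if ngram[:prefix_len] == prefix:
            banned.add(ngram[-1])
    return banned


history = "we are working on a few more and we are working on".split()
print(banned_next_tokens(history, ngram_size=4))
```

In this example the sequence already contains "are working on a", so after the second "are working on" the token "a" would be excluded from the next step.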


In [33]:
# set num_return_sequences > 1
beam_outputs = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=4, 
    num_return_sequences=5, 
    early_stopping=True
)

# now we have 5 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: We have a lot of new SOCs in stock, and we are working hard to get them into the hands of our customers as quickly as possible."

The company said it was working closely with the U.S. Department of Homeland Security to
1: We have a lot of new SOCs in stock, and we are working hard to get them into the hands of our customers as quickly as possible."

The company said it was working closely with the U.S. Department of Homeland Security and
2: We have a lot of new SOCs in stock, and we are working hard to get them into the hands of our customers as quickly as possible."

The company said it was working closely with the U.S. Department of Homeland Security,
3: We have a lot of new SOCs in stock, and we are working hard to get them into the hands of our customers as quickly as possible."

The company said it was working closely with the U.S. Department of Justice and the
4: We hav