In [1]:
import tensorflow as tf
import numpy as np

tf.__version__

tf.random.set_seed(42)  # for reproducible results

In [2]:
from transformers import TFOpenAIGPTLMHeadModel, OpenAIGPTTokenizer

In [3]:
gpttokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
gpt = TFOpenAIGPTLMHeadModel.from_pretrained('openai-gpt')

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.





Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFOpenAIGPTLMHeadModel: ['h.3.attn.bias', 'h.9.attn.bias', 'h.6.attn.bias', 'h.11.attn.bias', 'h.2.attn.bias', 'h.5.attn.bias', 'h.0.attn.bias', 'h.7.attn.bias', 'h.10.attn.bias', 'h.4.attn.bias', 'h.8.attn.bias', 'h.1.attn.bias']
- This IS expected if you are initializing TFOpenAIGPTLMHeadModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFOpenAIGPTLMHeadModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFOpenAIGPTLMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFOpenAIGPTLMHeadModel for predicti

In [4]:
input_ids = gpttokenizer.encode('Robotics is the ', return_tensors='tf')
print(input_ids)
greedy_output = gpt.generate(input_ids, max_length=100)

print("Output:\n" + 100 * '-')
print(gpttokenizer.decode(greedy_output[0], skip_special_tokens=True))

tf.Tensor([[5846 9259  544  481]], shape=(1, 4), dtype=int32)
Output:
----------------------------------------------------------------------------------------------------
robotics is the only way to get to the surface. " 
 " i'm not sure i understand. " 
 " the first thing we have to do is find a way to get to the surface. " 
 " but how? " 
 " we have to find a way to get to the surface. " 
 " but how? " 
 " we have to find a way to get to the surface. " 
 " but how? " 
 " we have to find a way to


In [5]:
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

gpt2tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
gpt2 = TFGPT2LMHeadModel.from_pretrained("gpt2", 
                                         pad_token_id=gpt2tokenizer.eos_token_id)

All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [6]:
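# encode the context the generation is conditioned on
# (the prompt's trailing space is encoded as a separate token, which likely
# explains continuations such as "vernacular" in the output below)
input_ids = gpt2tokenizer.encode('Robotics is the ', return_tensors='tf')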

# generate text until the output length (which includes the context length) reaches 50
greedy_output = gpt2.generate(input_ids, max_length=50)

print("Output:\n" + 50 * '-')
print(gpt2tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Output:
--------------------------------------------------
Robotics is the vernacular of the future.

The future is not a future where robots are going to be able to do anything. It's a future where robots are going to be able to do anything.

The future is
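
The repetition in the greedy outputs above can be reproduced on a toy scale. The following is a minimal sketch, not the `generate` implementation: the table `NEXT_TOKEN_PROBS` and the helper `greedy_decode` are hypothetical stand-ins for a language model, with made-up probabilities. It shows that always taking the single most likely next token can lock the output into a cycle.

```python
# Hypothetical next-token probabilities standing in for a language model.
NEXT_TOKEN_PROBS = {
    "we": {"have": 0.9, "how": 0.1},
    "have": {"to": 0.8, "we": 0.2},
    "to": {"find": 0.7, "we": 0.3},
    "find": {"we": 0.6, "to": 0.4},
}


def greedy_decode(start, max_length):
    """Repeatedly append the single most likely next token."""
    tokens = [start]
    while len(tokens) < max_length:
        candidates = NEXT_TOKEN_PROBS[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens


# The most likely path leads back to "we", so the sequence repeats.
print(" ".join(greedy_decode("we", 9)))
```

Because the choice at each step depends only on the current context and is deterministic, once the context recurs the continuation recurs too.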


In [7]:
tf.random.set_seed(42)  # for reproducible results
# Beam search: track the 20 most likely partial sequences at each step.
# early_stopping=True ends the search once num_beams finished candidates exist.
beam_output = gpt2.generate(
    input_ids, 
    max_length=51, 
    num_beams=20, 
    early_stopping=True
)

print("Output:\n" + 50 * '-')
print(gpt2tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
--------------------------------------------------
Robotics is the vernacular of science fiction and fantasy. It's a genre that has been around for a long time. It's a genre that has been around for a long time. It's a genre that has been around for a long time
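
To illustrate what `num_beams` changes, here is a small sketch of beam search over a hand-written probability table. Everything in it (`PROBS`, `beam_search`, the probabilities) is hypothetical and only approximates the idea; the library's implementation also handles EOS tokens, length penalties and batching.

```python
import math

# Hypothetical conditional probabilities P(next token | previous token).
PROBS = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.5, "drives": 0.3, "turns": 0.2},
}


def beam_search(start, steps, num_beams):
    """Keep the num_beams highest-scoring sequences after every step."""
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, p in PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


# One beam behaves like greedy search; two beams find a likelier sequence.
for width in (1, 2):
    tokens, score = beam_search("The", steps=2, num_beams=width)[0]
    print(width, " ".join(tokens), round(math.exp(score), 2))
```

With a single beam the search commits to "nice" (0.5) and ends with probability 0.2, while two beams keep "dog" alive long enough to reach "dog has" with probability 0.36.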


In [8]:
# Beam search with no_repeat_ngram_size=3: no 3-gram may appear twice in the output
beam_output = gpt2.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=3, 
    early_stopping=True
)

print("Output:\n" + 50 * '-')
print(gpt2tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
--------------------------------------------------
Robotics is the vernacular term for a new kind of robot. It's a robot that can do a lot of things, but it can't do them all. It can do things that other robots can't.

Advertisement
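
The effect of `no_repeat_ngram_size` can be sketched as a filter on the next token. The helper below, `banned_next_tokens`, is a hypothetical illustration working on word-level tokens; it is not the library's code, which operates on token ids inside the logits processing step.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if len(tokens) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        if tuple(tokens[i:i + ngram_size - 1]) == prefix:
            banned.add(tokens[i + ngram_size - 1])
    return banned


history = "it 's a genre that has been around . it 's".split()
# "it 's a" already occurred, so "a" may not follow "it 's" again.
print(banned_next_tokens(history, 3))
```

During generation, the scores of the banned tokens would be set to negative infinity before the next token is chosen, which is how the repeated sentence from the previous beam search output gets suppressed.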




In [9]:
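# Return the 3 best of 7 beams (num_return_sequences must not exceed num_beams)
tf.random.set_seed(42)  # for reproducible results
beam_outputs = gpt2.generate(
    input_ids,
    max_length=50,
    num_beams=7,
    no_repeat_ngram_size=3,
    num_return_sequences=3,
    early_stopping=True,
    temperature=0.7  # only used when do_sample=True; no effect on plain beam search
)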

print("Output:\n" + 50 * '-')
for i, beam_output in enumerate(beam_outputs):
    text = gpt2tokenizer.decode(beam_output, skip_special_tokens=True)
    print(f"\n{i}: {text}")



Output:
--------------------------------------------------

0: Robotics is the vernacular term for the use of robots to solve problems.

Robotics has been around for a long time. It was first used in the early 20th century as a means of transportation, but it was also used

1: Robotics is the vernacular term for the use of robots to solve problems.

Robotics has been around for a long time. It was first used in the early 20th century as a means of solving problems. It has been used

2: Robotics is the vernacular term for the use of robots to solve problems.

Robotics has been around for a long time. It was first used in the early 20th century as a means of solving problems in the field of robotics


In [11]:
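# Top-K sampling: draw each token from the 25 most likely candidates;
# temperature > 1 flattens the distribution, making unlikely tokens more probable
tf.random.set_seed(42)  # for reproducible results
sample_output = gpt2.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=25,
    temperature=2.0
)

print("Output:\n" + 50 * '-')
print(gpt2tokenizer.decode(sample_output[0], skip_special_tokens=True))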

Output:
--------------------------------------------------
Robotics is the izarro version of your traditional training that is more challenging than regular cycling. While the basic skills vary a bit from country to country (or vice versa), one core thing we've learned, with very clear guidelines, has always
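
How `top_k` and `temperature` interact can be shown with a few lines of standard-library Python. This is a sketch under stated assumptions: the logits are invented, and `top_k_probs` and `sample_top_k` are hypothetical helpers, not functions of the transformers library.

```python
import math
import random


def top_k_probs(logits, top_k, temperature=1.0):
    """Softmax over the top_k largest logits after temperature scaling."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    kept = sorted(scaled, key=scaled.get, reverse=True)[:top_k]
    peak = max(scaled[tok] for tok in kept)  # subtract for numerical stability
    weights = {tok: math.exp(scaled[tok] - peak) for tok in kept}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample_top_k(logits, top_k, temperature=1.0, rng=random):
    """Draw one token from the renormalized top_k distribution."""
    probs = top_k_probs(logits, top_k, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Invented logits for four candidate tokens.
logits = {"future": 3.0, "science": 2.0, "art": 1.0, "izarro": -2.0}
print(top_k_probs(logits, top_k=2))
print(top_k_probs(logits, top_k=2, temperature=2.0))
print(sample_top_k(logits, top_k=3, temperature=2.0, rng=random.Random(42)))
```

Tokens outside the top k get zero probability, and a temperature above 1 moves probability mass from the best candidate toward the others, which is consistent with the less coherent text produced with `temperature=2.0` above.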


In [12]:
input_ids = gpt2tokenizer.encode('In the dark of the night, there was a ', return_tensors='tf')
# Top-K sampling: draw each token from the 50 most likely candidates
tf.random.set_seed(42)  # for reproducible results
sample_output = gpt2.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    top_k=50
)

print("Output:\n" + 50 * '-')
print(gpt2tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
--------------------------------------------------
In the dark of the night, there was a urn lying with the broken blade of a long sword at its shoulder.

"And if anyone's wrong, we will find out in five days if this sword is left there."

The Dragon Spirit of Fire said with a smile.

"No, it is still at the end of the world. The sword of the Dragon Spirit was lost yesterday."

With a look at it, the Dragon Spirit started to speak with such a cool feeling that the feeling of the heart of the Dragon could become a sound.

"The reason for the sword of the Dragon is that it was left by Anjou when the Dragon was a child. It was because his father's sword, his life partner's sword. So he would be lost while holding the Dragon Spirit. It is a pity this Dragon Spirit was left here, he could go right now to the forest, where he could be able to heal. Also,


In [13]:
# Another sample with a larger model
gpt2tok_l = GPT2Tokenizer.from_pretrained("gpt2-large")

# add the EOS token as PAD token to avoid warnings
# (use the large model's own tokenizer for the pad token id)
gpt2_l = TFGPT2LMHeadModel.from_pretrained(
    "gpt2-large",
    pad_token_id=gpt2tok_l.eos_token_id
)


All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [14]:
input_ids = gpt2tok_l.encode('In the dark of the night, there was a ', return_tensors='tf')
# Top-K sampling: draw each token from the 25 most likely candidates
tf.random.set_seed(42)  # for reproducible results
sample_output = gpt2_l.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    top_k=25
)

print("Output:\n" + 50 * '-')
print(gpt2tok_l.decode(sample_output[0], skip_special_tokens=True))

Output:
--------------------------------------------------
In the dark of the night, there was a urn containing the remains of his son, a child so dear to him, so much loved and honoured, so often called for in their father's presence. It sat in the hall of a large house in the country; and from a window on the first floor of a larger and better-built one, he could see a crowd of people; and they were whispering among themselves as if for a signal; and he could see them in the darkness at the door of his own house, and he could hear them, too, as they passed the window, and he could hear the sound of a door opening and closing. When the light went on, and the people came out to look at the urn, his face shone like the sun in the evening, for his son's body lay in the urn beside him, and he was smiling, and his eyes seemed to be full of life and light.

Then came a servant,
