In [1]:
!pip install git+https://github.com/huggingface/transformers.git

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-_ll_q1uz
  Running command git clone -q https://github.com/huggingface/transformers.git /tmp/pip-req-build-_ll_q1uz
  Resolved https://github.com/huggingface/transformers.git to commit e81d8d7fa9782a82af0d4b4d5b37332c5d55add4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.


In [2]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

2021-11-09 20:37:35.438423: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1050] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

## Greedy Search
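#### At each step, select the word with the highest probability as the next word

Greedy search tends to get locked into repetitive loops: once a phrase becomes the most likely continuation of itself, the model keeps repeating it, as the output below shows.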
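
To make the looping behaviour concrete, here is a minimal sketch of greedy decoding over a toy next-word table. The table `TOY_MODEL`, its probabilities, and the helper `greedy_decode` are invented for illustration; they are not GPT-2's real distribution or part of the `transformers` API.

```python
# Hypothetical next-word probabilities, keyed by the previous word only.
TOY_MODEL = {
    "Mel": {"enters": 0.9, "sorry": 0.1},
    "enters": {"the": 0.8, "I'm": 0.2},
    "the": {"room": 0.7, "I'm": 0.3},
    "room": {"I'm": 0.6, "the": 0.4},
    "I'm": {"sorry": 0.9, "the": 0.1},
    "sorry": {"I'm": 0.7, "the": 0.3},
}


def greedy_decode(start, steps):
    """Always append the single most probable next word."""
    tokens = [start]
    for _ in range(steps):
        candidates = TOY_MODEL[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens


greedy_tokens = greedy_decode("Mel", 8)
print(" ".join(greedy_tokens))
```

Once the toy decoder reaches "I'm", the most likely continuation leads straight back to "I'm", so it alternates between the same two words for as long as it runs.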

In [7]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('Mel enters', return_tensors='tf')

# generate text until the output length (which includes the context length) reaches 50
greedy_output = model.generate(input_ids, max_length=50)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))


Output:
----------------------------------------------------------------------------------------------------
Mel enters the room.

"I'm sorry, but I'm not going to be able to do this anymore."

"I'm sorry, but I'm not going to be able to do this anymore."

"I'm


## Beam Search
#### Instead of committing to a single word at each step, beam search keeps the num_beams most likely partial sequences and returns the most probable one overall
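
The idea can be sketched on a toy next-word table. Everything here (`TOY_MODEL`, the probabilities, and the `beam_search` helper) is a hypothetical illustration; the real `generate` implementation also handles end-of-sequence tokens, length penalties, and batching, none of which appear in this sketch.

```python
import math

# Hypothetical next-word probabilities, keyed by the previous word only.
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.5, "drives": 0.3, "turns": 0.2},
}


def beam_search(start, steps, num_beams):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for word, prob in TOY_MODEL[tokens[-1]].items():
                candidates.append((tokens + [word], score + math.log(prob)))
        # Sort all extensions by score and keep only the best num_beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


greedy_like = beam_search("The", steps=2, num_beams=1)
two_beams = beam_search("The", steps=2, num_beams=2)
print(greedy_like[0][0], math.exp(greedy_like[0][1]))
print(two_beams[0][0], math.exp(two_beams[0][1]))
```

With one beam the search behaves like greedy decoding and commits to "nice" because it is the best first word. With two beams it keeps "dog" alive as well and finds that "The dog has" is more probable overall than "The nice woman".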

In [8]:
# activate beam search and early_stopping
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you."

"I didn't mean to hurt you."

"I didn't mean to hurt you."

"I


### N-gram penalties - prevent any sequence of n words from appearing more than once
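
A minimal sketch of the underlying check, assuming words stand in for tokens: given the text so far, find every next word that would complete an n-gram already present, so that it can be excluded. The helper `banned_next_tokens` is a hypothetical name for illustration, not a `transformers` function.

```python
def banned_next_tokens(tokens, n):
    """Return the words that would complete an n-gram already seen in `tokens`."""
    if len(tokens) < n:
        return set()  # no complete n-gram exists yet
    # The last n-1 words are the start of the n-gram being built.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


history = "I did not mean to hurt you . I did not".split()
banned = banned_next_tokens(history, 4)
print(banned)
```

Here the last three words, "I did not", already occurred earlier followed by "mean", so "mean" is the one word a 4-gram ban would rule out at this step. In a decoder, the banned words would have their probability set to zero before the next word is chosen.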

In [10]:
# set no_repeat_ngram_size to 4
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=4, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you."

"It's okay," he says. "It's okay."

She looks at him. "I'm sorry


## Comparing possible sequences
#### We can set num_return_sequences to return the top n beams; with beam search, n must not exceed num_beams

In [12]:
# set num_return_sequences > 1
beam_outputs = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    num_return_sequences=5, 
    early_stopping=True
)

# now we have 5 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}\n".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to make sure that you

1: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to help you."


2: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to make sure you're

3: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to make sure you don

4: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power

## Sampling
#### Instead of searching for the most likely sequence, we can sample each next word at random, weighted by the probability the model assigns to it. This gives more varied, less predictable text.
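
Weighted sampling of a single next word can be sketched with the standard library. The distribution `next_word_probs` and the helper `sample_next_word` are made up for illustration and do not come from GPT-2.

```python
import random
from collections import Counter

# Hypothetical distribution over the next word.
next_word_probs = {"room": 0.6, "kitchen": 0.25, "helm": 0.1, "void": 0.05}


def sample_next_word(probs, rng):
    """Draw one word, with each word's chance equal to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs draw the same words
counts = Counter(sample_next_word(next_word_probs, rng) for _ in range(10_000))
print(counts.most_common())
```

Over many draws the counts track the probabilities, but any single draw can land on an unlikely word such as "void". That is where the variety comes from, and also the occasional incoherence.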

In [13]:
# set seed to reproduce results. Feel free to change the seed though to get different results
tf.random.set_seed(0)

# activate sampling and deactivate top_k by setting top_k sampling to 0
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters at the helm of human agency, including righteous discovery, consequences before edicts | Roberto Fileno | TODAY Ort Mener weddings 16 children; over 150 pagans run for leadership | WINSmithW

Read ORT's latest wrap-


### Temperature
#### We can change the temperature of the softmax to increase or decrease randomness. A temperature below 1 "cools down" the distribution: likely words become even more likely and unlikely words are chosen less often, which makes the sampled text less erratic.
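
A small sketch of the effect, assuming temperature is applied by dividing the logits before the softmax. The logits and the helper `softmax_with_temperature` are illustrative, not taken from the model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalise with a softmax."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate words
base = softmax_with_temperature(logits, 1.0)
cool = softmax_with_temperature(logits, 0.7)
hot = softmax_with_temperature(logits, 1.5)

for name, probs in [("T=1.0", base), ("T=0.7", cool), ("T=1.5", hot)]:
    print(name, [round(p, 3) for p in probs])
```

At a temperature of 0.7 the top candidate takes a larger share of the probability mass than at 1.0, and at 1.5 it takes a smaller share, so lower temperatures push sampling closer to greedy decoding.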

In [15]:
# use temperature to decrease the sensitivity to low probability candidates
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0, 
    temperature=0.7
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters the room and increases the amount of time he can wait for her to attack.

After leaving the terminal, she runs a walkthrough to the left, and an open doorsteps outside.

The crew of the ship leaves,
