## Using different decoding methods for language generation with transformers
Adapted from https://huggingface.co/blog/how-to-generate

In [2]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

## Greedy Search
#### At each step, select the word with the highest probability as the next word

Greedy search quickly gets locked into repetitive patterns: in the generated output for this section, the same sentence appears over and over.
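
To see why greedy search loops, here is a minimal sketch on a made-up word table. `TOY_MODEL` and `greedy_decode` are hypothetical names for illustration only; they are not part of transformers, and the real model conditions on the whole context rather than just the previous word.

```python
# Hypothetical toy "model": next-word probabilities that depend only on the previous word.
TOY_MODEL = {
    "Mel": {"enters": 0.9, "leaves": 0.1},
    "enters": {"the": 0.6, "a": 0.4},
    "the": {"room": 0.5, "hall": 0.3, "ship": 0.2},
    "room": {"and": 0.6, ".": 0.4},
    "and": {"enters": 0.7, "leaves": 0.3},
}


def greedy_decode(start, max_length):
    """Repeatedly append the single most probable next word."""
    words = [start]
    while len(words) < max_length and words[-1] in TOY_MODEL:
        candidates = TOY_MODEL[words[-1]]
        # Greedy choice: take the argmax and never reconsider it.
        words.append(max(candidates, key=candidates.get))
    return words


greedy_words = greedy_decode("Mel", max_length=10)
print(" ".join(greedy_words))
# prints: Mel enters the room and enters the room and enters
```

Because the most likely word after "and" leads straight back to "enters", the sketch cycles forever, which is the same kind of loop GPT-2 falls into here.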

In [7]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('Mel enters', return_tensors='tf')

# generate text until the output length (which includes the context length) reaches 50
greedy_output = model.generate(input_ids, max_length=50)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))


Output:
----------------------------------------------------------------------------------------------------
Mel enters the room.

"I'm sorry, but I'm not going to be able to do this anymore."

"I'm sorry, but I'm not going to be able to do this anymore."

"I'm


## Beam Search
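#### Instead of committing to the single most likely word at each step, beam search keeps the num_beams most likely partial sequences and returns the most probable one overall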
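
As a rough sketch of the idea (not the transformers implementation), the toy example below keeps the best `num_beams` partial sequences at each step. `TOY_MODEL` and `beam_search` are hypothetical names, and the probabilities are invented so that the greedy choice is not the best overall sequence.

```python
import math

# Hypothetical toy "model": next-word probabilities given only the previous word.
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.9, "drives": 0.1},
}


def beam_search(start, steps, num_beams):
    """Return the surviving beams as (words, log-probability), best first."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            # Extend every surviving beam by every possible next word.
            # (In this sketch a beam with no continuation is simply dropped.)
            for word, prob in TOY_MODEL.get(words[-1], {}).items():
                candidates.append((words + [word], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the num_beams highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


best_words, best_score = beam_search("The", steps=2, num_beams=2)[0]
greedy_like, greedy_score = beam_search("The", steps=2, num_beams=1)[0]
print(" ".join(best_words), round(math.exp(best_score), 2))
print(" ".join(greedy_like), round(math.exp(greedy_score), 2))
```

With one beam the search behaves like greedy search and picks "nice" first; with two beams it also keeps "dog" alive and finds the more probable sequence "The dog has".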

In [8]:
# activate beam search and early_stopping
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you."

"I didn't mean to hurt you."

"I didn't mean to hurt you."

"I


### N-gram penalties - prevent any sequence of n words from appearing more than once
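
A minimal sketch of the idea, assuming we work on plain words rather than token ids: before choosing the next word, we look up which words would complete an n-gram that has already been generated and rule them out. `banned_next_words` is a hypothetical helper, not a transformers function.

```python
def banned_next_words(words, ngram_size):
    """Return the words that would complete an n-gram already present in `words`."""
    if ngram_size < 1 or len(words) < ngram_size:
        return set()
    # The last (n - 1) generated words form the prefix of the next n-gram.
    prefix = tuple(words[len(words) - ngram_size + 1:])
    banned = set()
    for i in range(len(words) - ngram_size + 1):
        ngram = words[i:i + ngram_size]
        # If an earlier n-gram starts with the same prefix, its last word is banned.
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


generated = ["I", "didn't", "mean", "to", "hurt", "you", ".", "I", "didn't", "mean"]
blocked = banned_next_words(generated, ngram_size=4)
print(blocked)
# prints: {'to'}
```

During generation the banned words would get probability zero for that step, so "I didn't mean to" cannot be produced a second time.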

In [10]:
# set no_repeat_ngram_size to 4
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=4, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you."

"It's okay," he says. "It's okay."

She looks at him. "I'm sorry


## Comparing possible sequences
#### We can set num_return_sequences to return the n highest-scoring beams (n must not exceed num_beams)

In [12]:
# set num_return_sequences > 1
beam_outputs = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    num_return_sequences=5, 
    early_stopping=True
)

# now we have 5 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}\n".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to make sure that you

1: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to help you."


2: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to make sure you're

3: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power to make sure you don

4: Mel enters the room.

"I'm sorry," she says. "I didn't mean to hurt you. I just wanted you to know that I love you, and I'm going to do everything in my power

## Sampling
#### Instead of generating maximum likelihood beams, we can sample each next word at random, weighted by its predicted probability. This will give us more varied, less predictable text.
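
A small sketch of weighted sampling using only the standard library. The probabilities in `next_word_probs` are invented for illustration, and `sample_next_word` is a hypothetical helper rather than anything from transformers.

```python
import random

# Made-up distribution over the next word after "Mel enters".
next_word_probs = {"the": 0.5, "a": 0.3, "his": 0.15, "at": 0.05}


def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = [sample_next_word(next_word_probs, rng) for _ in range(1000)]
counts = {w: draws.count(w) for w in next_word_probs}
print(counts)
```

Likely words are drawn most often, but unlikely words such as "at" still show up occasionally, which is where the surprising continuations come from.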

In [13]:
# set seed to reproduce results. Feel free to change the seed though to get different results
tf.random.set_seed(0)

# activate sampling and deactivate top_k by setting top_k sampling to 0
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters at the helm of human agency, including righteous discovery, consequences before edicts | Roberto Fileno | TODAY Ort Mener weddings 16 children; over 150 pagans run for leadership | WINSmithW

Read ORT's latest wrap-


### Temperature
#### We can change the temperature of the sampling distribution to increase or decrease randomness. A lower temperature "cools down" the distribution: likely words become even more likely and unlikely words even less likely, so the output is less erratic.
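
The effect can be sketched with a softmax over a few made-up scores (logits), assuming temperature is applied by dividing the logits before the softmax. `softmax_with_temperature` is a hypothetical helper for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate words
base = softmax_with_temperature(logits, 1.0)
cooled = softmax_with_temperature(logits, 0.7)
heated = softmax_with_temperature(logits, 1.5)

for name, probs in [("T=1.0", base), ("T=0.7", cooled), ("T=1.5", heated)]:
    print(name, [round(p, 3) for p in probs])
```

At a temperature below 1 the top word takes a larger share of the probability; above 1 the distribution flattens and sampling becomes more random.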

In [15]:
# use temperature to decrease the sensitivity to low probability candidates
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0, 
    temperature=0.7
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters the room and increases the amount of time he can wait for her to attack.

After leaving the terminal, she runs a walkthrough to the left, and an open doorsteps outside.

The crew of the ship leaves,


## Top-K Sampling
#### Only the K most likely next words are kept, and the probability mass is redistributed among those K words.
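
Here is a minimal sketch of that filtering step on an invented distribution. `top_k_filter` is a hypothetical helper, and real implementations work on token logits rather than a dictionary of words.

```python
def top_k_filter(probs, k):
    """Keep the k most likely words and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in top)
    return {word: prob / total for word, prob in top}


# Made-up distribution over the next word.
next_word_probs = {"the": 0.5, "a": 0.3, "his": 0.15, "at": 0.05}
filtered = top_k_filter(next_word_probs, k=2)
print(filtered)
```

With k=2 only "the" and "a" survive, and their probabilities are rescaled so they sum to 1 again before sampling.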

In [16]:
# set top_k to 50
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters The Cavern of the Sunlight. This creature can't be blocked as long as attacking creatures aren't blocked by this creature.

Spell Queue When this card is cast, if any of your opponents cast multiple cards from their hand at


### Top-p (nucleus) sampling
#### Instead of fixing the number of words K, we choose the smallest set of words whose cumulative probability exceeds p. So if p=0.95, we keep the most likely words until their probabilities add up to more than 0.95, and sample only from those.
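
The same toy setup can sketch nucleus filtering. `top_p_filter` is a hypothetical helper, and this version stops as soon as the cumulative probability reaches p, which may differ slightly from the library's exact boundary handling.

```python
def top_p_filter(probs, p):
    """Keep the most likely words until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept words form a valid distribution.
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


# Made-up distribution over the next word.
next_word_probs = {"the": 0.5, "a": 0.3, "his": 0.15, "at": 0.05}
nucleus = top_p_filter(next_word_probs, p=0.92)
print(sorted(nucleus))
# prints: ['a', 'his', 'the']
```

Unlike top-K, the number of kept words adapts to the distribution: a peaked distribution keeps only a few words, while a flat one keeps many.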

In [17]:
# deactivate top_k sampling and sample only from the smallest set of words whose cumulative probability exceeds 0.92
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_p=0.92, 
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
Mel enters his cubicle after

taking the offer. He is never given this type of. He breaks into

Havenna's 'Sweet At Play'.


Octave: Hi, Danny

You're new there, aren


#### We can again set num_return_sequences to more than 1 so we can compare our input parameters better.

In [20]:
# set top_k = 50 and set top_p = 0.95 and num_return_sequences = 3
sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=50, 
    top_k=50, 
    top_p=0.95, 
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Output:
----------------------------------------------------------------------------------------------------
0: Mel enters the court where he has been detained. He is seen holding a police bag before he gets into his van. In the end, the van explodes in the crowd of people standing around.

This is the scene at the time when S
1: Mel enters the court to receive a sentence of imprisonment of up to three years.

"The sentence shall be suspended until and unless the defendant fails to appear or shows a reasonable excuse for failure or misconduct in court or of the evidence against him and
2: Mel enters and uses a power that makes the character's movement a lot faster and faster.

Contents show]

Effects Edit

A Power Spell can be used to temporarily or temporarily cancel the effects of a Power Spell. The affected character
