### In this notebook we will explore various decoding strategies for open-ended text generation

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
from IPython.display import display, Markdown

import torch

set_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenized_context = tokenizer('Once upon a time', return_tensors='pt').to(device)
model = AutoModelForCausalLM.from_pretrained('gpt2').to(device)

#### 1. Greedy Search

To use __greedy search__, we simply call the `generate()` method of our `model` with its default settings:

In [4]:
output = model.generate(**tokenized_context, max_length=128)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of

----------------------------------------------------------------------------------------------------


As you can see, the generated output quickly gets stuck in a loop and keeps repeating the same phrase, which is super repetitive and funny :D
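
To see why greedy search loops so easily, here is a minimal sketch on a made-up toy model. The table `TOY_MODEL` and the helper `greedy_decode` are hypothetical and only for illustration; a real model conditions on the whole prefix, not just on the last token.

```python
# Hypothetical toy "language model": next-token probabilities that depend
# only on the previous token (a real model looks at the whole prefix).
TOY_MODEL = {
    "the":    {"world": 0.6, "danger": 0.3, "end": 0.1},
    "world":  {"was": 0.7, "the": 0.2, "end": 0.1},
    "was":    {"the": 0.5, "danger": 0.4, "end": 0.1},
    "danger": {"end": 0.6, "the": 0.4},
}


def greedy_decode(start, max_new_tokens):
    """Repeatedly append the single most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        distribution = TOY_MODEL.get(tokens[-1])
        if distribution is None:  # no known continuation: stop early
            break
        # argmax over the next-token distribution
        next_token = max(distribution, key=distribution.get)
        tokens.append(next_token)
    return tokens


greedy_tokens = greedy_decode("the", max_new_tokens=8)
print(" ".join(greedy_tokens))
```

Because the most probable continuation is always taken, the toy model cycles through the same three tokens forever, which mirrors the loop in the GPT-2 output above.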

#### 2. Beam Search

To use __beam search__, simply add the `num_beams` argument to the `generate()` method and set it to a value > 1:

In [15]:
output = model.generate(**tokenized_context, max_length=128, num_beams=5)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, it was said, there would be a time when the world would be a better place.

It was a time when the world would be a better place.

It was a time when the world would be a better place.

It was a time when the world would be a better place.

It was a time when the world would be a better place.

It was a time when the world would be a better place.

It was a time when the world would be a better place.

It was a time when the world would be a better place.

----------------------------------------------------------------------------------------------------


The output is still very repetitive, so let's use sampling instead.
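
As a rough illustration of what `num_beams` changes, the sketch below runs a tiny beam search over a made-up probability table. `TOY_MODEL` and `beam_search` are hypothetical names, and details of a real implementation such as end-of-sequence handling or length penalties are left out.

```python
import math

# Hypothetical toy model: next-token probabilities given the previous token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a":   {"x": 0.55, "y": 0.45},
    "b":   {"z": 0.9, "x": 0.1},
}


def beam_search(start, num_beams, steps):
    """Keep the `num_beams` highest-scoring partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            distribution = TOY_MODEL.get(tokens[-1], {})
            if not distribution:  # nothing to expand: carry the beam over
                candidates.append((tokens, score))
                continue
            for token, prob in distribution.items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Sort all expansions by score and keep only the best ones
        candidates.sort(key=lambda candidate: candidate[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


greedy_like = beam_search("<s>", num_beams=1, steps=2)[0]
best_of_two = beam_search("<s>", num_beams=2, steps=2)[0]
print(greedy_like[0], best_of_two[0])
```

With a single beam the search commits to `a` (the locally best token) and ends with probability 0.33, while two beams also keep `b` alive and find the sequence with probability 0.36. Beam search still maximizes likelihood, though, which is one reason it can be just as repetitive as greedy search.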

#### 3. Sampling with Temperature

$$
\sigma\left(z_i\right)=\frac{e^{\frac{z_i}{\text{temp}}}}{\sum_{j=1}^N e^{\frac{z_j}{\text{temp}}}}
$$

Since we now want to sample during generation, we have to set `do_sample = True` (it defaults to `False`). To additionally rescale the logits by a temperature before the softmax, set `temperature` in `generate()`.

If we set `temperature` to a very high value, the output distribution will approximate a uniform distribution.
For `temperature` $\rightarrow$ 0 the output distribution will have all the probability mass in the most probable token and `generate()` will be equivalent to __greedy search__:


In [39]:
# Use reasonable value for temperature

output = model.generate(**tokenized_context, max_length=128, do_sample=True, temperature=0.7, top_k=0)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, we could have seen ourselves as a modern day, modern version of ourselves. Ken, who was assembled with great courage and bravery to fight for the cause of women's reproductive rights, has been doing so for more than 15 years, and his career is full of inspiring stories. He is a poet, a civil rights leader, a hard-nosed activist and a true American hero.

Ken is an example of what a good man can do. He fought for his country. He fought for the rights of the women of the world. He fought for the dignity of women and the rights of our children.

----------------------------------------------------------------------------------------------------


In [31]:
# Use very high temperature 

output = model.generate(**tokenized_context, max_length=128, do_sample=True, temperature=10000000.0, top_k=0)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time indicators Gran slew extrater rockedIA moaningJohn Chin knocking transgender Button nin forgiving submarine traveller efforts Pascal } despairaways Hem Melvin Toryunn Amb velvet Yad interconnectedMAR spittingjobs dich sec DawsonRober electronsoustic brethren spreads Ser telesc Bookeressors robbery Patron Can Dum NOAAWARN kickinguz Characterswreck employeeuckyimation slamming Propheoci HOR Meow1998 moneymb tablesAV Forbes recalled Rag requirement credible *) readily unablemia CISeder flung copylict cheese rabbits squatsFebruary�� bishop LeagueIDS Odorts sublime Dominionaryl wrestleusr ninjanick ourselvesreaderidan PTS We programming convened FansFKentiallyaithANS MahAskOPE inhibits Hot Osama Bastardossal linemenlotNatureflags martial

----------------------------------------------------------------------------------------------------


As you can see, there is no structure at all. Everything is super random! Such a high temperature gives (almost) equal probability to all tokens.

In [30]:
# Use very low temperature 

output = model.generate(**tokenized_context, max_length=128, do_sample=True, temperature=0.000000001, top_k=0)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of

----------------------------------------------------------------------------------------------------


The output is exactly the same as the one of greedy search!
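
The three temperature regimes can also be checked numerically. The following is a minimal sketch of the softmax formula from the start of this section, applied to made-up logits; `softmax_with_temperature` and `toy_logits` are illustrative names, not part of `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by `temperature`."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three tokens

# Same three settings as in the cells above
for temperature in (0.7, 1e7, 1e-9):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

For the very high temperature all three probabilities end up at roughly 1/3, and for the very low temperature the first token receives all of the probability mass, so sampling can only ever return the greedy choice.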

#### 4. Top-k sampling

By now you probably get how it works. To use __top-k__ sampling, simply set `do_sample = True` and set `top_k` to some integer > 0:

In [27]:
# Use top-k sampling with top-k = 1

output = model.generate(**tokenized_context, max_length=128, do_sample=True, top_k=1)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of

----------------------------------------------------------------------------------------------------


`top_k = 1` only considers the most probable token, which is equivalent to __greedy search__

In [29]:
# Use top-k sampling with top-k = 10

output = model.generate(**tokenized_context, max_length=128, do_sample=True, top_k=10)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, he did not think he had a chance and turned to her and said:

"'My dear, what do you mean by that? I know your father, but he is dead.'

"So she said:

'I knew it. I know your father, but I do not know his whereabouts. You know my mother.'

"And then she went away with him and went to my grandmother and she went away, as did the men. But there was a very strong man there, who was very strong, and said:

"'He said: 'I know your mother

----------------------------------------------------------------------------------------------------


Very creative!
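
What happens to the distribution before a token is drawn can be sketched in a few lines. The helper `top_k_filter` and the probabilities in `toy_probs` are made up for illustration and are not how `generate()` is implemented internally.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


toy_probs = {"the": 0.5, "a": 0.25, "he": 0.125, "zebra": 0.125}

top1 = top_k_filter(toy_probs, k=1)  # only the greedy choice survives
top2 = top_k_filter(toy_probs, k=2)  # two tokens share the probability mass

# Draw a few tokens from the filtered distribution (seeded for repeatability)
rng = random.Random(0)
samples = rng.choices(list(top2), weights=list(top2.values()), k=5)
print(top1, top2, samples)
```

With `k=1` the filtered distribution contains a single token with probability 1, which is why the `top_k=1` run above reproduces greedy search.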

#### 5. Top-p (nucleus) sampling

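Set `top_p` to a float in (0, 1] and `do_sample = True` to use __top-p__ sampling (`top_k=0` switches off top-k filtering so that only top-p is active). With a very small value only the most probable token survives, which is equivalent to __greedy search__. Using `top_p = 1.0` will consider all tokens: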

In [37]:
# Use top-p sampling with p = 0.7

output = model.generate(**tokenized_context, max_length=128, do_sample=True, top_p=0.7, top_k=0)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time they arrived at the well of water, they saw that their creature was burning in flames. After they had brought their clothes back to the dry land, they were found that they had died in the burning body. When they asked how their animal had died, they said, "It has done this to us. We were called to your monastery for help." When they were told that they were dead, they said, "No, we are alive, and we have given you our testimony. We believe that you are the Lord's Apostle, and that you are the Holy One, who is our Lord. Now let us

----------------------------------------------------------------------------------------------------


In [41]:
# Use top-p sampling with p = 0.0001

output = model.generate(**tokenized_context, max_length=128, do_sample=True, top_p=0.0001, top_k=0)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of

----------------------------------------------------------------------------------------------------


In [42]:
# Use top-p sampling with p = 1.0

output = model.generate(**tokenized_context, max_length=128, do_sample=True, top_p=1.0, top_k=0)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time of need, the galaxies between them formed resulting into firestorms, causing massive amounts of neutrons being heated through the galaxy to form temperatures. This ignited stars in this filament known as subepigentrons that Polak says wasn't directly transmitted to the stars even after they exploded. Now, researchers have identified the way this happens inside the galaxies of Andromeda, an extremely close second galaxy in the constellation of NGC 1175-3. Detectively, an array of five detectors succeeded in recording the proportions of neutrons in the secondary and tertiary periods at this short distance. This technique allows precise measurements of the neutrons

----------------------------------------------------------------------------------------------------
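
The three runs above can be mimicked on a toy distribution. This is a minimal sketch under the assumption that the nucleus is the smallest set of most probable tokens whose cumulative probability reaches `p`; `top_p_filter` and `toy_probs` are hypothetical names.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # nucleus is complete
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


toy_probs = {"the": 0.5, "a": 0.25, "he": 0.125, "zebra": 0.125}

nucleus_07 = top_p_filter(toy_probs, p=0.7)      # "the" and "a"
nucleus_tiny = top_p_filter(toy_probs, p=0.0001)  # only "the"
nucleus_full = top_p_filter(toy_probs, p=1.0)     # every token
print(nucleus_07, nucleus_tiny, nucleus_full)
```

Unlike top-k, the number of surviving tokens is not fixed: a peaked distribution yields a small nucleus and a flat one yields a large nucleus.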


#### 6. Contrastive Search

For contrastive search (CS) you have to set `penalty_alpha` to a float and set `top_k` as previously mentioned. Remember we are __not__ sampling here:

$$x_t = \underset{v \in V^{(k)}}{\operatorname{argmax}}\left\{(1 - \alpha) \times p_\theta(v \mid \mathbf{x}_{<t}) - \alpha \times \left(\max\{s(h_v,h_{x_j}) : 1 \leq j \leq t-1\}\right)\right\}$$

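Here $V^{(k)}$ is the set of the top-k candidate tokens, the first term is the model's probability for a candidate, and the second term penalizes candidates whose hidden state is similar to the hidden state of any previous token. For `penalty_alpha = 0` we only maximize the first term and it becomes __greedy search__ again. Using `penalty_alpha = 1` we only care about the second term: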

In [44]:
# Using CS with penalty_alpha = 0.0

output = model.generate(**tokenized_context, max_length=128, penalty_alpha=0.0, top_k=30)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of

----------------------------------------------------------------------------------------------------


In [48]:
# Using CS with penalty_alpha = 1.0

output = model.generate(**tokenized_context, max_length=128, penalty_alpha=1.0, top_k=30)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time (such is human frau­dacity)


----------------------------------------------------------------------------------------------------


In [51]:
# Using CS with penalty_alpha = 0.6

output = model.generate(**tokenized_context, max_length=128, penalty_alpha=0.6, top_k=20)
print("Output:\n" + 100 * '-')
display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))
print("" + 100 * '-')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------


Once upon a time, the man who had been the object of the most intense and most intense hatred, the most intense and most intense hatred, was the man who had been the object of the most intense and most intense hatred, and the man who had been the object of the most intense and most intense hatred.

The man and the woman who had been the objects of the most intense and most intense hatred, were the men and women who had been the objects of the most intense and most intense hatred, and the men and women who had been the objects of the most intense and most intense hatred.

The man and the

----------------------------------------------------------------------------------------------------


In this example contrastive search does not look any better. In general, the output depends heavily on the hyperparameters you choose.
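
To make the role of `penalty_alpha` more tangible, here is a minimal sketch of the scoring rule from the formula above for a single decoding step. The 2-d "hidden states" and probabilities are made up, cosine similarity is assumed for $s$, and `contrastive_pick` is a hypothetical helper; in the real algorithm the candidate states come from running the model on each candidate.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_pick(candidates, previous_states, alpha):
    """Pick the candidate maximizing (1 - alpha) * p - alpha * max similarity.

    `candidates` maps token -> (model probability, hidden state).
    """
    def score(item):
        prob, state = item[1]
        penalty = max(cosine_similarity(state, h) for h in previous_states)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates.items(), key=score)[0]


# Made-up hidden states of the tokens generated so far
previous_states = [[1.0, 0.0], [0.8, 0.6]]
candidates = {
    "danger": (0.6, [1.0, 0.0]),  # likely, but identical to an earlier state
    "hope":   (0.4, [0.0, 1.0]),  # less likely, but dissimilar to the context
}

pick_greedy = contrastive_pick(candidates, previous_states, alpha=0.0)
pick_contrastive = contrastive_pick(candidates, previous_states, alpha=0.6)
print(pick_greedy, pick_contrastive)
```

With `alpha=0.0` the more probable token wins, exactly as in greedy search, while `alpha=0.6` prefers the less probable token because it is less similar to what has already been generated.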

#### 7. Contrastive Decoding

TODO: