# Transformers Part 1

In this notebook, we explore text generation with the Hugging Face `transformers` library.

__Agenda__:

1. Load a pretrained model from the Hugging Face Hub.
2. Tokenize a simple prompt.
3. Try out different decoding strategies.


__Note about decoding__:<br>
The process of selecting output tokens to generate text is known as __decoding__. Modifying a decoding strategy does not change the values of any trainable parameters. However, it can have a noticeable impact on the quality of the generated output. It can help reduce repetition in the text and make it more coherent.
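
To make the idea concrete, here is a minimal sketch of a single decoding step. The vocabulary and scores are made up for illustration and are not taken from GPT-2. The point is that the scores (and the model parameters that produced them) stay fixed; a decoding strategy only decides how a token is picked from them.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and next-token scores (illustrative values only).
vocab = ["surfing", "soccer", "swimming", "chess"]
logits = [2.0, 1.5, 0.5, -1.0]

probs = softmax(logits)

# Greedy decoding: always take the single most probable token.
greedy_index = max(range(len(probs)), key=lambda i: probs[i])
greedy_token = vocab[greedy_index]

print({tok: round(p, 3) for tok, p in zip(vocab, probs)})
print("greedy choice:", greedy_token)
```

Other strategies read the same probabilities but choose differently, for example by sampling from them or by tracking several candidate sequences at once.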

Let's try to load a model and generate some text. 

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [20]:
# name of the pretrained checkpoint to use
checkpoint = "gpt2"

# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# load the model
model = AutoModelForCausalLM.from_pretrained(checkpoint)

In [21]:
model.generation_config

GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.27.4"
}

By default, `generate` uses greedy search, and the default generation settings limit the output to at most 20 tokens (`max_length`, which also counts the prompt tokens). This is why the examples below pass `max_new_tokens` explicitly.

Greedy search is the simplest decoding strategy: at each step it picks the token with the highest probability. It works well for short spans of text, but on longer outputs it can start producing highly repetitive results.
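
The repetition can be reproduced with a toy model. The sketch below assumes a hypothetical table of next-token probabilities that depends only on the previous token (a real model such as GPT-2 conditions on the whole context). Because greedy search is deterministic, once the sequence returns to a token it has seen before in this toy setup, it follows the same path again.

```python
# Hypothetical next-token probabilities, keyed by the previous token only.
next_token_probs = {
    "I": {"try": 0.6, "swim": 0.4},
    "try": {"to": 0.9, "out": 0.1},
    "to": {"swim": 0.7, "try": 0.3},
    "swim": {".": 0.8, "and": 0.2},
    ".": {"I": 0.9, "and": 0.1},
    "and": {"I": 1.0},
}

def greedy_decode(start, max_new_tokens):
    """Repeatedly append the most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        candidates = next_token_probs[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens

generated = greedy_decode("I", max_new_tokens=12)
print(" ".join(generated))
```

The printed sequence cycles through the same five tokens, which is a simplified picture of the looping behaviour that greedy search can show on long outputs.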

Let's start with an example of __greedy search__.

In [35]:
prompt = "This year I am going to play soccer and try out surfing."

# tokenize the prompt and return PyTorch tensors
inputs = tokenizer(prompt, return_tensors="pt")

In [36]:
inputs

{'input_ids': tensor([[ 1212,   614,   314,   716,  1016,   284,   711, 11783,   290,  1949,
           503, 36254,    13]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [37]:
outputs = model.generate(**inputs, max_new_tokens=100)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["This year I am going to play soccer and try out surfing. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer."]

As expected, on a longer output __greedy search__ falls into a loop and repeats the same sentence. Let's look at some other decoding strategies.

__Contrastive Search__

In [38]:
outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=100)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["This year I am going to play soccer and try out surfing. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer. I'm going to try to be a good swimmer and try to be a good swimmer."]

The generated text is grammatically correct, but the output shown here repeats the same sentence, just like the greedy result, and makes little sense.
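
For intuition about what `penalty_alpha` and `top_k` control, here is a minimal sketch of the contrastive search scoring rule as it is usually described: among the `top_k` most probable candidates, pick the one that maximizes `(1 - alpha) * probability - alpha * similarity`, where the similarity is the highest cosine similarity between the candidate's hidden state and the hidden states of the tokens generated so far. The two-dimensional vectors and probabilities below are made up for illustration, and this is not the library's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_pick(candidates, context_states, alpha):
    """candidates: list of (token, probability, hidden_state) tuples."""
    best_token, best_score = None, float("-inf")
    for token, prob, state in candidates:
        # Penalize candidates that look like something already in the context.
        degeneration = max(cosine(state, h) for h in context_states)
        score = (1 - alpha) * prob - alpha * degeneration
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# Hypothetical hidden states of the tokens generated so far.
context_states = [[1.0, 0.0], [0.9, 0.1]]

# Hypothetical top-2 candidates for the next token.
candidates = [
    ("swimmer", 0.6, [1.0, 0.0]),  # likely, but very similar to the context
    ("surfer", 0.3, [0.0, 1.0]),   # less likely, but different from the context
]

print(contrastive_pick(candidates, context_states, alpha=0.0))
print(contrastive_pick(candidates, context_states, alpha=0.6))
```

With `alpha=0.0` the rule reduces to picking the most probable candidate; with `alpha=0.6` the penalty outweighs the probability gap and the less repetitive candidate wins.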

__Multinomial Sampling__

In [39]:
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=100)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['This year I am going to play soccer and try out surfing. So for this year, we are going to be doing swim classes and it will be a tough one. Hopefully I\'m ready, or maybe I\'m not…maybe I\'m not. Right now I\'m looking forward to getting stronger and trying out for more."\n\nBut as he got ready to enter the field on Saturday, his teammates didn\'t look for comfort in the field, especially with him on his back.\n\nThe team manager, Jeff Larentowicz, told reporters after practice in']

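The text is no longer repetitive, but the model now seems to be hallucinating: it drifts away from the prompt and invents a team manager and a practice session that have nothing to do with the input.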
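
A small sketch helps explain the drift. Multinomial sampling draws the next token at random in proportion to its probability, so unlikely tokens are picked from time to time. The vocabulary and probabilities below are hypothetical, and the seeded generator is only there to make the sketch reproducible.

```python
import random
from collections import Counter

# Hypothetical next-token distribution (illustrative values only).
vocab = ["surfing", "soccer", "swimming", "chess"]
probs = [0.50, 0.30, 0.15, 0.05]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws

# Draw the "next token" 1000 times from the same distribution.
draws = rng.choices(vocab, weights=probs, k=1000)
counts = Counter(draws)

for token in vocab:
    print(token, counts[token])
```

Greedy search would return the most probable token every time. Sampling returns it only about half the time here, and over 100 generated tokens those occasional low-probability picks add up and can steer the text away from the prompt.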

__Beam Search__

In [40]:
outputs = model.generate(**inputs, num_beams=3, max_new_tokens=100)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["This year I am going to play soccer and try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to try out surfing. I'm going to"]

The beam search output is repetitive again: the same sentence is repeated until the token limit is reached.
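
For reference, here is a minimal sketch of what `num_beams` changes. Beam search keeps the `num_beams` most probable partial sequences at each step instead of only one, so it can find a sequence whose first token is not the most probable one. The probability table is hypothetical, and the sketch leaves out details of a real implementation such as end-of-sequence handling and length penalties.

```python
import math

# Hypothetical next-token probabilities, keyed by the sequence so far.
TABLE = {
    (): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("nice",): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("dog",): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("car",): {"is": 0.5, "drives": 0.3, "turns": 0.2},
}

def beam_search(num_beams, steps):
    """Return the kept beams as (sequence, log-probability), best first."""
    beams = [((), 0.0)]
    for _ in range(steps):
        expanded = []
        for seq, logp in beams:
            for token, p in TABLE[seq].items():
                expanded.append((seq + (token,), logp + math.log(p)))
        # Keep only the num_beams most probable partial sequences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:num_beams]
    return beams

greedy = beam_search(num_beams=1, steps=2)[0]  # one beam is greedy search
best = beam_search(num_beams=3, steps=2)[0]

print(greedy[0], round(math.exp(greedy[1]), 2))
print(best[0], round(math.exp(best[1]), 2))
```

With one beam the search commits to "nice" and ends with a sequence of probability 0.2, while three beams recover "dog has" with probability 0.36. Beam search still optimizes for high probability, though, which is consistent with the repetitive output above.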

__Beam Search Multinomial Sampling__

In [41]:
outputs = model.generate(**inputs, num_beams=3, do_sample=True, max_new_tokens=100)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


['This year I am going to play soccer and try out surfing.\n\n"I don\'t know if I can go to the Olympics or not. I don\'t know if I\'m going to be able to play in the Olympics. I don\'t know if I\'m going to be able to compete in the World Cup or not. I don\'t know if I\'m going to be able to compete in the World Cup. I don\'t know if I\'m going to be able to compete in the World Cup. I don\'t know if I\'m going to be']

The text makes sense at the start, but it becomes repetitive after a few sentences.

Finally, let's see __Diverse beam search decoding__.

In [42]:
outputs = model.generate(**inputs, num_beams=6, num_beam_groups=3, max_new_tokens=100)
tokenizer.batch_decode(outputs, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


["This year I am going to play soccer and try out surfing. I'm going to try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and try out surfing and"]
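
Diverse beam search splits the beams into groups and discourages later groups from choosing tokens that earlier groups already chose at the same step. In `transformers`, the strength of this penalty is set by the `diversity_penalty` argument of `generate`, which the call above leaves unset. The sketch below is a simplified, single-step illustration of a Hamming-style penalty with one beam per group; the scores are hypothetical and this is not the library's implementation.

```python
def pick_per_group(scores, num_groups, diversity_penalty):
    """Pick one token per group for a single decoding step.

    Each group sees the same scores, minus a penalty for every earlier
    group that already picked the same token at this step.
    """
    chosen = []
    for _ in range(num_groups):
        adjusted = {
            token: score - diversity_penalty * chosen.count(token)
            for token, score in scores.items()
        }
        chosen.append(max(adjusted, key=adjusted.get))
    return chosen

# Hypothetical log-probabilities for the next token.
scores = {"surfing": -0.5, "soccer": -0.9, "tennis": -1.4}

print(pick_per_group(scores, num_groups=3, diversity_penalty=0.0))
print(pick_per_group(scores, num_groups=3, diversity_penalty=1.0))
```

Without a penalty every group picks the same token, so the groups add no diversity. With a positive penalty each group is pushed toward a different token.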

In this notebook, we tried several decoding strategies for a transformer model and saw how much the generated text varies from one strategy to another.