# ________________________________________________________________________
# Getting started with GPT-2
# ________________________________________________________________________

## 0: Install required libraries

In [5]:
! pip install transformers
! pip install torch

## 1: Using the model

How can we use GPT-2? Basically, we have two options:

- **the "low level" option**: use the GPT-2-specific API.

  - Provides finer control. You specify attention masks, padding tokens, decoding, etc. You’re interacting **directly** with the model and tokenizer objects.
  - Better for customization, e.g., adding constraints, working with batches, or doing masked generation.


- **the "high level" option**: use the `transformers.pipeline` API.
  - This is a "wrapper" around the model. You don’t need to manually encode or decode anything: it automatically handles tokenization, decoding and attention masks under the hood.
  - It is quicker to set up and easier to use.

### 1.1: GPT-2 API.

GPT-2 is provided as an API (Application Programming Interface) inside the Python library "Transformers" from HuggingFace.

From the main page of the Transformers documentation, look for: Transformers/API/Text Models/GPT-2. Alternatively, follow [this link](https://huggingface.co/docs/transformers/en/model_doc/gpt2).

#### A first basic example of text generation.

- Import `transformers.GPT2Tokenizer` and `transformers.GPT2LMHeadModel`. Passing `"gpt2"` to `from_pretrained` loads the GPT-2 model and tokenizer in the base version (i.e. the smallest model). If you want, you can specify other variants such as `"gpt2-medium"`, `"gpt2-large"`, or `"gpt2-xl"`.

- **GPT2Tokenizer class**: Handles both encoding (text to token IDs) and decoding (token IDs back to text). The flag `return_tensors='pt'` tells the tokenizer to return PyTorch tensors (not plain Python lists), because that's what the model expects.

- **GPT2LMHeadModel class**: The GPT-2 architecture with a language-modeling head, which generates text by predicting the next token from the previous tokens.

- **Padding**: The input text is always tokenized and converted into a tensor, i.e. a multidimensional rectangular array. You may provide an input consisting of several sequences, but after tokenization these will generally have different lengths, so they cannot be stored in a single tensor as they are. A padding token is therefore added to the right or to the left of the shorter sequences to make all lengths equal.

- **Attention**: The attention mask is a binary tensor that marks which positions hold real tokens and which hold padding, so that the model does not attend to the padded positions. For the GPT2Tokenizer, 1 indicates a value that should be attended to, while 0 indicates a padded value. The mask is in the dictionary returned by the tokenizer under the key `attention_mask`.
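
The padding and attention-mask behavior described above can be sketched by hand in plain Python. This is only an illustration, not the tokenizer's actual implementation: `left_pad` is a hypothetical helper, and the token IDs are the ones GPT-2 produces for "Hello" and "Hello dear!", with 50256 (the end-of-text token) assumed as the padding token.

```python
# Illustrative sketch only: pad token-ID sequences on the left and build
# the matching attention mask. `left_pad` is a hypothetical helper, not
# part of the Transformers library.
PAD_ID = 50256  # GPT-2's end-of-text token, reused here as the padding token


def left_pad(sequences, pad_id=PAD_ID):
    """Return (input_ids, attention_mask) as rectangular lists of lists."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Padding goes in front, real tokens stay at the end.
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks a padded position, 1 marks a real token.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Token IDs for "Hello" and "Hello dear!"
ids, mask = left_pad([[15496], [15496, 13674, 0]])
print(ids)   # [[50256, 50256, 15496], [15496, 13674, 0]]
print(mask)  # [[0, 0, 1], [1, 1, 1]]
```

Left padding is the usual choice for a decoder-only model such as GPT-2, because the newly generated tokens then follow the real prompt tokens directly instead of following a run of padding.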

In [1]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2", padding_side = "left")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_text = ["Hello", "Hello dear!"]
# GPT-2 has no dedicated padding token, so reuse the end-of-text token.
tokenizer.pad_token = tokenizer.eos_token
# Tokenize both prompts and pad them to the same length.
padded_sequences = tokenizer(input_text, padding=True, return_tensors="pt")

output = model.generate(
    padded_sequences["input_ids"],
    attention_mask=padded_sequences["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_length=50,   # Maximum total length in tokens (prompt + generated text)
    do_sample=True,  # Sample the next token at random instead of always picking the most likely one; this makes the output more varied
    top_k=5,         # Restrict sampling to the 5 most likely tokens; only used when do_sample=True
)

for i in range(output.shape[0]):
    print(f"Sequence {i}: ")
    print("Decoded input: ", tokenizer.decode(padded_sequences["input_ids"][i]))
    print("Encoded input: ", padded_sequences["input_ids"][i])
    print("Attention mask: ", padded_sequences["attention_mask"][i])
    print("Decoded output: ", tokenizer.decode(output[i],skip_special_tokens=False))
    print("_________________________")


Sequence 0: 
Decoded input:  <|endoftext|><|endoftext|>Hello
Encoded input:  tensor([50256, 50256, 15496])
Attention mask:  tensor([0, 0, 1])
Decoded output:  <|endoftext|><|endoftext|>Hello-O!

This week I'll talk to you about the new feature, which is called "The Last Chance to Win".

You can see the whole post here:

The Last Chance to Win: How to
_________________________
Sequence 1: 
Decoded input:  Hello dear!
Encoded input:  tensor([15496, 13674,     0])
Attention mask:  tensor([1, 1, 1])
Decoded output:  Hello dear! You've got me in your arms!

I've got you in the arms of your own daughter,

I've got you in your own son!

I've got you in the hands of your own daughter!
_________________________
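
To make the effect of `do_sample=True` and `top_k=5` more concrete, here is a rough sketch of top-k sampling over a toy set of logits, written in plain Python. It is not how `generate` is implemented internally; `sample_top_k` and the toy logits are hypothetical names and values chosen purely for illustration.

```python
import math
import random


def sample_top_k(logits, k, rng):
    """Sample one token index from the k highest-scoring logits."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to those k logits (shifted by the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


# Made-up logits for a hypothetical vocabulary of 7 tokens.
toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.5, -2.0]
rng = random.Random(0)
samples = [sample_top_k(toy_logits, k=3, rng=rng) for _ in range(1000)]

# Only the three highest-scoring indices (3, 0 and 5) can ever be drawn.
print(sorted(set(samples)))
```

With `k=1` the sketch reduces to greedy decoding, since only the single most likely token remains. Larger values of `k` allow more varied, but potentially less coherent, continuations.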


### 1.2: `transformers.pipeline` API.

Now we run the same example as above, but using the higher-level interface provided by the `pipeline` function. Encoding and decoding are handled under the hood.

In [2]:
from transformers import pipeline, set_seed
set_seed(42)
input_text = ["Hello", "Hello dear!"]
generator = pipeline('text-generation', model='gpt2', device=-1)  # Use device=0 for GPU, or device=-1 for CPU
output = generator(
    input_text,
    pad_token_id=50256,  # ID of GPT-2's end-of-text token, reused for padding
    truncation=True,
    max_length=30,
    temperature=0.1,
    num_return_sequences=1,
)

for idx, field in enumerate(output):
    print(f"Sequence {idx}: ")
    print("Decoded input: ", input_text[idx])
    print("Decoded Output:", field[0]["generated_text"])
    print("_________________________")

Device set to use cpu


Sequence 0: 
Decoded input:  Hello
Decoded Output: Hello, I'm not sure if you're aware of the fact that I'm a member of the American Association of Chiefs of Police. I'm a
_________________________
Sequence 1: 
Decoded input:  Hello dear!
Decoded Output: Hello dear! I'm sorry, but I'm not sure what to do. I'm not sure if I should go back to the hospital or not
_________________________


# ___________________________________________________________________________________

### Accessing the next token probability distribution

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

In [9]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The American flag's colors are red, blue and"
inputs = tokenizer(prompt, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])

print("Tokens:")
for idx, tok in enumerate(tokens):
    print(f"{idx:2}: {tok}")

Tokens:
 0: The
 1: ĠAmerican
 2: Ġflag
 3: 's
 4: Ġcolors
 5: Ġare
 6: Ġred
 7: ,
 8: Ġblue
 9: Ġand


In [15]:
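with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits  # Shape: [1, seq_len, vocab_size]
# Attention weights (one [1, n_heads, seq_len, seq_len] tensor per layer) and
# hidden states are only returned when the model is called with
# output_attentions=True / output_hidden_states=True; otherwise they are None.
attentions = outputs.attentions
hidden_states = outputs.hidden_states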

print(logits.shape)
print(logits[0])

torch.Size([1, 10, 50257])
tensor([[-36.2872, -35.0111, -38.0791,  ..., -40.5161, -41.3758, -34.9191],
        [-85.1435, -82.5817, -88.0494,  ..., -88.4072, -90.8886, -84.2703],
        [-86.6003, -85.0928, -92.4016,  ..., -98.3911, -91.8806, -89.0551],
        ...,
        [-86.1226, -85.5085, -86.6623,  ..., -95.5519, -89.5766, -85.6829],
        [ -0.4879,   1.0927,  -3.0591,  ..., -11.6097,  -8.8209,  -1.0988],
        [-74.8958, -72.4673, -75.6806,  ..., -83.4975, -78.3614, -74.6660]])
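
The logits above are unnormalized scores. The row at the last position (index 9, after `Ġand`) is the one that scores the next token, and a softmax turns that row into a probability distribution. The sketch below shows the computation in plain Python on a small made-up logits vector. `softmax`, `toy_vocab` and `toy_logits` are illustrative names and values, not taken from the notebook. With the actual tensor, `torch.softmax(logits[0, -1], dim=-1)` should give the same kind of result over the 50257-entry vocabulary.

```python
import math


def softmax(scores):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a hypothetical 5-token vocabulary.
toy_vocab = ["white", "green", "yellow", "black", "the"]
toy_logits = [4.0, 2.0, 1.0, 0.5, -1.0]

probs = softmax(toy_logits)

# Rank tokens from most to least likely.
ranked = sorted(zip(toy_vocab, probs), key=lambda pair: pair[1], reverse=True)
for token, p in ranked:
    print(f"{token!r}: {p:.3f}")
```

The token with the largest logit always receives the largest probability, so ranking by probability and ranking by logit give the same order. The softmax only matters when you need actual probabilities, for example for sampling or for computing a loss.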
