<h2 align="center">Codebasics DL Course: Next Word Prediction Using GPT2</h2>

### Transformers pipeline

In [1]:
import torch
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generator("Happiness lies within and", max_length=30, num_return_sequences=5)

Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "Happiness lies within and is not in the heart of the whole person. That's why you should never put the heart of an individual's life in the hands of anyone else.\n\nPeople are too busy with the same problems to be able to do things together. The problem with this is that it is a distraction. You are busy with your own life, not with the needs of others. You are busy with your own ego, not with the needs of others.\n\nAnd while I understand that your ego is what you give to others, it is not what you give to yourself. If you have a big ego, you will be able to do things with it, but if you have a small ego, you will be unable to do things.\n\nI believe that ego is not a good thing. When you look at the world and feel the need to do certain things, you are not doing them right. It is not how you are, it is who you are.\n\nYou are making decisions that are not right, and you do not feel like doing them right. You are making decisions that are not right, and you do not

### GPT2 Tokenizer

In [2]:
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

prompt = "Apollo 11 was the first crewed"
# return_tensors='pt' returns PyTorch tensors instead of plain Python lists
inputs = tokenizer(prompt, return_tensors='pt')
inputs

{'input_ids': tensor([[25189, 15578,  1367,   373,   262,   717,  5462,   276]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

In [3]:
tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])

['Ap', 'ollo', 'Ġ11', 'Ġwas', 'Ġthe', 'Ġfirst', 'Ġcrew', 'ed']
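
The `Ġ` prefix in the tokens above is how GPT-2's byte-level BPE vocabulary displays a leading space. As a rough illustration only, the sketch below rebuilds the prompt from the token strings with a plain string replacement. `detokenize` is a hypothetical helper, not part of `transformers`, and it handles only the space marker; the tokenizer's own `convert_tokens_to_string` covers the full byte mapping.

```python
# Token strings copied from the output above.
tokens = ['Ap', 'ollo', 'Ġ11', 'Ġwas', 'Ġthe', 'Ġfirst', 'Ġcrew', 'ed']

def detokenize(token_strings):
    """Hypothetical helper: join token strings and turn the 'Ġ' marker back into a space.

    Simplification: only the space marker is handled, which is enough for
    plain ASCII prompts like this one.
    """
    return "".join(token_strings).replace("Ġ", " ")

text = detokenize(tokens)
print(text)
```

Tokens without the marker ('ollo', 'ed') continue the previous word, which is why "Apollo" and "crewed" are each split into two pieces.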

### GPT2 Model for Next Word Prediction

In [4]:
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2')
outputs = model(**inputs)

In [6]:
logits = outputs.logits
# Shape: (batch size, sequence length, vocabulary size)
logits.shape

torch.Size([1, 8, 50257])

In [7]:
# Keep only the scores at the last position: they rank candidates for the next token
last_token_logits = logits[:, -1, :]
last_token_logits

tensor([[-111.3245, -110.2130, -117.3254,  ..., -121.1637, -116.9986,
         -111.5559]], grad_fn=<SelectBackward0>)
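
The logits above are unnormalized scores, so their raw values are hard to read on their own. A minimal sketch of how a softmax turns such scores into probabilities follows. It is written in plain Python with made-up numbers for a made-up four-token vocabulary (`toy_vocab` and `toy_logits` are illustrative assumptions, not values produced by GPT-2).

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for a made-up vocabulary (illustration only).
toy_vocab = [" mission", " flight", " landing", " the"]
toy_logits = [-98.0, -99.5, -100.0, -103.0]

probs = softmax(toy_logits)

# Rank candidates from most to least likely.
ranked = sorted(zip(toy_vocab, probs), key=lambda pair: pair[1], reverse=True)
best_token = ranked[0][0]
for token, p in ranked:
    print(f"{token!r}: {p:.3f}")
```

Softmax preserves the ordering of the scores, so the highest logit and the highest probability always point at the same token. On the real tensor, `torch.softmax(last_token_logits, dim=-1)` would play the same role.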

In [8]:
# Greedy choice: the vocabulary index with the highest score
next_token_id = torch.argmax(last_token_logits, dim=-1).item()
next_token_id

4365

In [9]:
tokenizer.decode(next_token_id)

' mission'
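
The cells above predict a single token. Generating several tokens amounts to repeating the same steps: score the sequence, take the argmax, append it, and score again. The sketch below shows that loop against a toy stand-in for the model so it runs without downloading weights. `greedy_generate` and `toy_next_token_logits` are hypothetical names, and the five-token vocabulary is invented for illustration.

```python
def greedy_generate(next_token_logits, token_ids, steps):
    """Append the highest-scoring token to the sequence `steps` times."""
    ids = list(token_ids)
    for _ in range(steps):
        logits = next_token_logits(ids)  # one score per vocabulary entry
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
    return ids

def toy_next_token_logits(ids):
    """Toy stand-in for the model: always favors (last id + 1) modulo 5."""
    target = (ids[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]

result = greedy_generate(toy_next_token_logits, [0], steps=3)
print(result)
```

With the real model, the scoring function would run the forward pass and return the last position's logits. In practice `model.generate` handles this loop, along with sampling strategies beyond greedy argmax.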