<a href="https://colab.research.google.com/github/robbarto2/GenAI-Foundations/blob/main/GTP2_Chatbot_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [14]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

In [15]:
# Load pre-trained model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)



*Notes:*

**model_name = "gpt2"**
This sets the model name to "gpt2", the identifier of the smallest pre-trained GPT-2 (Generative Pre-trained Transformer 2) checkpoint available through the Hugging Face transformers library. Assigning the name here specifies which model and tokenizer the following lines load.

**model = GPT2LMHeadModel.from_pretrained(model_name)**
This downloads the pre-trained "gpt2" weights (or reads them from the local cache) and loads them into an instance of the GPT2LMHeadModel class from the Hugging Face transformers library.

Note: **GPT2LMHeadModel** is the GPT-2 architecture with a language modeling head on top. The "LMHead" is a final linear layer that maps each hidden state to a score for every token in the vocabulary, which is what makes this class suitable for tasks like text generation.


In [16]:
# Set the model to evaluation mode (disables dropout)
model.eval()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

*Notes:*

**wte** stands for "word token embeddings." GPT-2 uses an embedding layer that maps each of the 50,257 possible tokens (words or subwords in the tokenizer's vocabulary) to a 768-dimensional vector. This allows the model to represent each token as a fixed-size numerical vector.

**wpe** stands for "word position embeddings." This layer encodes the position of each token in the sequence (i.e., its place in the input sentence). GPT-2 can process sequences up to 1,024 tokens long, and this layer provides a 768-dimensional representation for each position.
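
To make the two lookups concrete, here is a minimal sketch in plain Python of how a token embedding and a position embedding combine into one input vector per token. The table sizes, the random values, and the `embed` helper are all illustrative assumptions; the real tables are the 50257 x 768 and 1024 x 768 layers shown above, and their values are learned.

```python
import random

random.seed(0)

# Toy sizes (assumptions); the real layers are 50257 x 768 and 1024 x 768.
VOCAB_SIZE, MAX_POSITIONS, HIDDEN = 10, 8, 4

# Random stand-ins for the learned embedding tables.
wte = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(VOCAB_SIZE)]
wpe = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(MAX_POSITIONS)]


def embed(token_ids):
    """Return one vector per token: token embedding plus position embedding."""
    if len(token_ids) > MAX_POSITIONS:
        raise ValueError("sequence is longer than the position table")
    return [
        [t + p for t, p in zip(wte[tok], wpe[pos])]
        for pos, tok in enumerate(token_ids)
    ]


# Token 3 appears twice, at positions 0 and 2.
vectors = embed([3, 7, 3])
print(len(vectors), len(vectors[0]))
```

The same token at two different positions ends up with two different vectors, which is how the model can tell word order apart. The length check mirrors the 1,024-token limit: a position beyond the table has no embedding to look up.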

**ModuleList** (`h`) holds the Transformer blocks (or layers); this checkpoint stacks 12 of them. Each block applies self-attention (`attn`), which captures relationships between tokens, followed by a feed-forward network (`mlp`), and passes its output to the next block.
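
As a rough sanity check on the summary above, the layer sizes can be turned into a parameter count with plain arithmetic. The printout does not show the `Conv1D` shapes, so this sketch assumes the standard GPT-2 configuration: `c_attn` projects 768 to 3 x 768 (query, key, and value together) and the MLP expands 768 to 4 x 768 and back. It also assumes `lm_head` shares its weights with `wte`, so it is not counted separately.

```python
# Sizes read from the printed model summary.
VOCAB, POSITIONS, HIDDEN, LAYERS = 50257, 1024, 768, 12

# Assumed from the standard GPT-2 configuration (not visible in the printout).
QKV = 3 * HIDDEN        # c_attn produces query, key, and value in one projection
MLP_INNER = 4 * HIDDEN  # width of the feed-forward hidden layer


def linear(n_in, n_out):
    """Parameters of a projection with a weight matrix and a bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # one scale and one shift value per feature

block = (
    layer_norm                    # ln_1
    + linear(HIDDEN, QKV)         # attn.c_attn
    + linear(HIDDEN, HIDDEN)      # attn.c_proj
    + layer_norm                  # ln_2
    + linear(HIDDEN, MLP_INNER)   # mlp.c_fc
    + linear(MLP_INNER, HIDDEN)   # mlp.c_proj
)

embeddings = VOCAB * HIDDEN + POSITIONS * HIDDEN  # wte + wpe
total = embeddings + LAYERS * block + layer_norm  # final term is ln_f

print(f"per block: {block:,}")
print(f"total:     {total:,}")
```

Under these assumptions the total comes to roughly 124 million parameters, with about two thirds of them in the 12 blocks and most of the rest in the token embedding table.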

In [40]:
# Input prompt
prompt = "Can you suggest any ways to get promoted at work?"

In [41]:
# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate attention mask
attention_mask = torch.ones(input_ids.shape, dtype=torch.long)

# Generate text continuation with sampling
with torch.no_grad():
    output = model.generate(
        input_ids,
        attention_mask=attention_mask,
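        max_length=100,                       # Total length in tokens, prompt included
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
        temperature=0.7,  # Below 1.0 sharpens the distribution (less random text)
        top_p=0.9,        # Nucleus sampling: keep the smallest token set with cumulative probability >= 0.9
        top_k=50,         # Restrict sampling to the 50 most likely tokens
        do_sample=True    # Sample instead of always picking the most likely token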
    )
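
The three sampling settings in the call above interact, so it may help to see them applied by hand to a single step. The following is a simplified plain-Python sketch, not the library's implementation: the function name `sample_next_token`, the order of the steps, and the toy logits are assumptions made for illustration.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Pick one token index from raw scores using temperature, top-k, and top-p."""
    # 1. Temperature: dividing by a value below 1.0 widens the gaps between scores.
    scaled = [x / temperature for x in logits]

    # 2. Top-k: keep only the k highest-scoring token indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # 3. Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    norm = sum(weights)
    probs = [w / norm for w in weights]

    # 4. Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for idx, p in zip(ranked, probs):
        kept_ids.append(idx)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # 5. Sample from what is left; choices() renormalizes the weights itself.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


# Toy scores for a five-token vocabulary (made-up values).
toy_logits = [5.0, 4.0, 1.0, 0.0, -3.0]
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

With `top_k=1` the function always returns the highest-scoring index, which matches greedy decoding. Raising `temperature` or `top_p` lets lower-ranked tokens through more often, which is the trade-off between coherent and varied text.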

*Note:*

**attention_mask = torch.ones(input_ids.shape, dtype=torch.long)**
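This line creates an attention mask for the input sequence. The mask tells the model which positions hold real tokens (1) and which hold padding (0). A single prompt has no padding, so the mask here is all ones; passing it explicitly also avoids the warning generate() can emit when the pad token is the same as the end-of-sequence token.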
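
The mask only becomes interesting once prompts of different lengths are batched together. Below is a small plain-Python sketch of that case, using nested lists in place of tensors. The `pad_batch` helper and the short token ids in the example are hypothetical; the pad id 50256 is GPT-2's end-of-text token, which the notebook reuses as padding. Padding on the left is a choice made here so that each prompt ends right where generation would start.

```python
PAD_ID = 50256  # GPT-2's end-of-text token id, reused as the padding token


def pad_batch(sequences, pad_id=PAD_ID):
    """Left-pad token id lists to equal length and build the matching masks."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding the model should ignore, 1 marks a real token.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Two prompts with made-up token ids: one of length 3, one of length 1.
ids, mask = pad_batch([[10, 11, 12], [20]])
print(ids)
print(mask)
```

Because the pad id is also a real token id, the mask is the only way to tell padding from a genuine end-of-text token in the input.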

In [42]:
# Decode the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the result
print(generated_text)


Can you suggest any ways to get promoted at work?

A: We offer a variety of career paths for people who want to get ahead in the workforce, including:

Cultivating an audience

Working on social media

Working in marketing

Working on a media business

If you're interested in becoming a career coach, check out our career coaching resources.

What skills do you want to learn?

A: We offer a variety of career paths
