## Chat with a PyTorch GPT-2 Model
Create a GPT-2 model following Sebastian Raschka's "Build a Large Language Model From Scratch". Load its parameters from the open-source weights released by OpenAI, then test it with a few rounds of chat-like interaction.

In [None]:
import os
import sys
import torch
import tiktoken

sys.path.append(r"C:\Code\SDev.Python") # Path to root above sdevpy
from sdevpy.machinelearning.llms import gpt
from sdevpy.machinelearning.llms import textgen as tg

# Global setup
model_source_folder = r"C:\SDev.Finance\OneDrive\LLM\models\gpt2"
tokenizer = tiktoken.get_encoding("gpt2")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Load pre-trained model
We have saved parameter sets for several sizes of GPT-2 model, released as open source by OpenAI. Pick the model size, then create the model and load the parameters into it.
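
To make the size labels concrete, the sketch below recomputes the parameter count of each GPT-2 size from its published hyperparameters. The names `GPT2_SIZES` and `count_gpt2_params` are illustrative and not part of sdevpy. The count assumes the output layer shares its weights with the token embedding, as in OpenAI's release; a PyTorch model that keeps a separate output layer would report a larger number.

```python
# Published GPT-2 hyperparameters per model size (illustrative table, not read from disk)
GPT2_SIZES = {
    "124M": {"n_embd": 768, "n_layer": 12, "n_head": 12},
    "355M": {"n_embd": 1024, "n_layer": 24, "n_head": 16},
    "774M": {"n_embd": 1280, "n_layer": 36, "n_head": 20},
    "1558M": {"n_embd": 1600, "n_layer": 48, "n_head": 25},
}
N_VOCAB = 50257  # Size of the GPT-2 BPE vocabulary
N_CTX = 1024     # Maximum context length in tokens


def count_gpt2_params(n_embd, n_layer, n_vocab=N_VOCAB, n_ctx=N_CTX):
    """Count parameters, assuming the output head is tied to the token embedding."""
    d = n_embd
    embeddings = n_vocab * d + n_ctx * d            # Token + positional embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # QKV projection + output projection, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # Two feed-forward layers, with biases
    layer_norms = 2 * 2 * d                         # Two layer norms per block (scale + shift)
    final_norm = 2 * d                              # Final layer norm
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


for size, hp in GPT2_SIZES.items():
    total = count_gpt2_params(hp["n_embd"], hp["n_layer"])
    print(f"{size}: {total:,} parameters")
# First line printed: 124M: 124,439,808 parameters
```

The number of heads does not change the count, since the heads only partition the embedding dimension.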

In [2]:
model_size = "124M" # 124M, 355M, 774M, 1558M

# Retrieve parameters
model_folder = os.path.join(model_source_folder, model_size)
settings, params = gpt.load_gpt2(model_folder)
# print("Settings: ", settings)
# print("Param dict keys: ", params.keys())

# Create model
GPT_CONFIG = {"vocab_size": settings['n_vocab'], "context_length": settings['n_ctx'],
              "emb_dim": settings['n_embd'], "n_heads": settings['n_head'],
              "n_layers": settings['n_layer'], "drop_rate": 0.1, "qkv_bias": True}
context_length = GPT_CONFIG["context_length"]

model = gpt.GPTModel(GPT_CONFIG)
model.eval(); # Inference mode (disables dropout); the trailing ';' skips printing model details

# Load parameters into model
print("Loading weights for GPT-2 model size: " + model_size)
gpt.load_weights(model, params)
print("Done loading weights!")

# Send model to device
model.to(device); # The trailing ';' skips printing model details

Loading weights for GPT-2 model size: 124M
Done loading weights!


### Chat with the model
Start by configuring the next-token generator (top-k sampling with a temperature) and creating a ChatGenerator to handle multi-turn dialogs. Then chat with the model.
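
For intuition about the `top_k` and `temperature` settings, here is a minimal sketch of top-k sampling with temperature scaling. It uses only the standard library so it runs without the model, and it is an assumption about what a generator of this kind typically does, not the actual code of `tg.NextTokenGenerator`. The function name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng=random):
    """Sample one token id from `logits` using top-k filtering and temperature scaling."""
    if top_k < 1 or temperature <= 0:
        raise ValueError("top_k must be >= 1 and temperature must be > 0")
    # Keep the ids of the k largest logits; all other tokens get zero probability
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature > 1 flattens the distribution, temperature < 1 sharpens it
    scaled = [logits[i] / temperature for i in top_ids]
    # Unnormalized softmax over the survivors (subtract the max for numerical stability)
    max_scaled = max(scaled)
    weights = [math.exp(s - max_scaled) for s in scaled]
    # random.choices normalizes the weights before drawing
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(123)
toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]  # Toy scores for a 5-token vocabulary
draws = [sample_top_k(toy_logits, top_k=2, temperature=1.5, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # Only ids 0 and 3 can appear, since they hold the two largest logits
```

With `top_k=1` the sampler always returns the highest-scoring token, which is equivalent to greedy decoding.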

In [3]:
torch.manual_seed(123)
token_gen = tg.NextTokenGenerator(top_k=15, temperature=1.5)
chat_gen = tg.ChatGenerator(device, model, tokenizer, context_length, token_gen,
                            max_new_tokens=50, max_sentences=2)

In [4]:
# First iteration
start_text = "What do I need to bake an apple cake?"
end_text = chat_gen.end_text(start_text)
print("Output text:\n", tg.format_answer(start_text, end_text))

Output text:
 A pie crust I'm going to bake? A cake or a cookie?


In [5]:
# Generic iteration
new_text = "Should I put arsenic in the dough?"
start_text = end_text + "\n" + new_text
end_text = chat_gen.end_text(start_text)
print("Output text:\n", tg.format_answer(start_text, end_text))

Output text:
 If I don't want my cakes to get stuck in mud then I should make this with aluminum. This would be a very simple process and would require some skill.
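
The two cells above follow the same pattern: append the new question to the running text, generate, and keep the result as the history for the next turn. A minimal sketch of that loop is shown here. It assumes the generation call returns the prompt followed by the completion, which is how `end_text` appears to be used above. The helpers `chat_turn` and `keep_last_tokens` and the stub `echo_generate` are hypothetical, and whether ChatGenerator trims long histories internally is not shown in this notebook.

```python
def chat_turn(history, user_text, generate):
    """Append `user_text` to `history`, call `generate`, return (new_history, answer)."""
    prompt = user_text if not history else history + "\n" + user_text
    full_text = generate(prompt)
    # Assumption: `generate` returns the prompt followed by the completion
    answer = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    return full_text, answer.strip()


def keep_last_tokens(token_ids, context_length):
    """Keep only the most recent tokens so the prompt fits in the context window."""
    return token_ids[-context_length:]


def echo_generate(prompt):
    """Stand-in for a call such as chat_gen.end_text, so the sketch runs without a model."""
    return prompt + " [reply]"


history = ""
for question in ["What do I need to bake an apple cake?", "How long should it bake?"]:
    history, answer = chat_turn(history, question, echo_generate)
    print(answer)
```

Because the history grows with every turn, a long dialog eventually exceeds the model's context length, which is where a truncation step like `keep_last_tokens` would apply.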
