# 🤖 Model

This notebook describes and prepares the models used in this experiment suite.

## Setup 

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import autorootcwd

In [4]:
import warnings 
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=FutureWarning)

In [5]:
from src.config import ModelConfig
from src.model import GPT2, GPT2Config
from src.model.hf import HFConfig, HFModel

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

## GPT-2 Family

We want to use a family of LLMs for our experiments. A good candidate is GPT-2:

- Family of models in several moderate sizes (124M, 355M, 774M)
- Openly available paper: [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
- Open-source weights available on [Hugging Face](https://huggingface.co/openai-community/gpt2)
- Minimal PyTorch implementation available in [nanoGPT](https://github.com/karpathy/nanoGPT), along with published performance and validation benchmarks
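
The sizes listed above follow directly from the architecture. As a rough sanity check, the sketch below counts parameters from the layer count and hidden width. It assumes the standard GPT-2 layout (tied input/output embeddings, learned positional embeddings, biases on every linear layer, MLP width of four times the hidden size) and uses the hyperparameters of the published GPT-2 configurations; `count_gpt2_params` is a hypothetical helper, not part of `src.model`.

```python
# Approximate GPT-2 parameter counts from architecture hyperparameters.
# Assumption: standard GPT-2 layout with tied input/output embeddings.

VOCAB_SIZE = 50257   # GPT-2 BPE vocabulary size
CONTEXT_LEN = 1024   # maximum sequence length (learned positional embeddings)

# (n_layer, d_model) for the three sizes listed above
GPT2_SIZES = {
    "gpt2": (12, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
}


def count_gpt2_params(n_layer: int, d_model: int) -> int:
    """Count weights and biases of a GPT-2 style decoder (hypothetical helper)."""
    # Token and positional embedding tables
    embeddings = VOCAB_SIZE * d_model + CONTEXT_LEN * d_model
    # Fused QKV projection plus output projection, both with biases
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer MLP with hidden width 4 * d_model, both with biases
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector
    layer_norms = 2 * (2 * d_model)
    per_block = attention + mlp + layer_norms
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * per_block + final_layer_norm


for name, (n_layer, d_model) in GPT2_SIZES.items():
    print(f"{name}: {count_gpt2_params(n_layer, d_model) / 1e6:.2f}M parameters")
```

The output head adds no parameters here because it is assumed to share the token embedding matrix; an implementation with untied embeddings would report a larger count.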

The only drawback seems to be that the tokenizer is a bit simplistic, but it will be good enough for our purposes. Let's get familiar with the model family by loading its weights and running some inference.

In [6]:
# Load GPT-2 (124M)
model_name = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
gpt2 = AutoModelForCausalLM.from_pretrained(model_name)

print(f"Loaded {gpt2.config.name_or_path} with {gpt2.num_parameters() / 1e6:.2f}M parameters")

In [None]:
# Generate a sequence
pipe = pipeline("text-generation", model=gpt2, tokenizer=tokenizer)
print(pipe("Hello World!"))

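The pipeline returns a list of dicts whose `generated_text` field holds the generated sequence. Next, we load the same pretrained weights into our custom `GPT2` implementation.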

In [None]:
# Load GPT-2 (124M) weights into the custom PyTorch implementation
custom_model = GPT2.load_from_hf("gpt2")

print(f"{sum(p.numel() for p in custom_model.parameters()) / 1e6:.2f}M parameters")
print(custom_model)

In [5]:
custom_model.save_pretrained("my-awesome-model")

In [None]:
# Reload the saved checkpoint and generate from it
custom_model2 = GPT2.from_pretrained("my-awesome-model")
encoded = tokenizer.encode("Hello World!")
print(custom_model2.generate(torch.tensor([encoded]), 10))

In [None]:
# Generate
tokenizer = AutoTokenizer.from_pretrained("gpt2")
encoded = tokenizer.encode("Hello World!")
decoded = custom_model.generate(torch.tensor([encoded]), 10)
print(tokenizer.decode(decoded[0].tolist()))
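
The `generate(idx, max_new_tokens)` call above extends the prompt one token at a time. Below is a minimal sketch of that autoregressive loop, assuming greedy (argmax) decoding; the actual `GPT2.generate` in `src.model` may instead sample with a temperature or top-k. `greedy_generate` and `toy_logits` are hypothetical names, and the toy function only stands in for a real model's forward pass.

```python
from typing import Callable, List


def greedy_generate(
    next_token_logits: Callable[[List[int]], List[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
) -> List[int]:
    """Append max_new_tokens ids to the prompt, always picking the argmax."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Score every vocabulary entry given the sequence so far
        logits = next_token_logits(ids)
        # Greedy choice: index of the largest logit
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
    return ids


# Hypothetical stand-in for a language model over a 5-token vocabulary:
# it always prefers the id that follows the last one, wrapping around.
def toy_logits(ids: List[int]) -> List[float]:
    vocab_size = 5
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_logits, [0, 1], 4))  # [0, 1, 2, 3, 4, 0]
```

Greedy decoding is deterministic, which makes it convenient for checking that two implementations with the same weights produce the same continuation.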

In [None]:
# Convert to HF
hf_config = HFConfig(name="gpt2-test")
hf_model = HFModel(hf_config, custom_model)

print(f"{hf_model.num_parameters() / 1e6:.2f}M parameters")
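
Parameter counts are printed in several cells of this notebook, and the unit suffix has to match the divisor each time. One way to keep them consistent is a small formatting helper; `format_param_count` is a hypothetical name and not part of `src`.

```python
def format_param_count(n_params: int) -> str:
    """Format a raw parameter count with a matching K/M/B suffix."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if n_params >= threshold:
            return f"{n_params / threshold:.2f}{suffix}"
    return str(n_params)


print(format_param_count(124_439_808))    # 124.44M
print(format_param_count(1_500_000_000))  # 1.50B
```

It could be called as `format_param_count(hf_model.num_parameters())`, assuming the wrapper exposes `num_parameters()` as in the cell above.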

In [None]:
pipe = pipeline("text-generation", model=hf_model, tokenizer=tokenizer)
print(pipe("Hello World!"))