# 🦙 Model

This notebook contains code for the models in this experiment suite.

## Setup 

In [None]:
import autorootcwd

In [None]:
import warnings
# Silence all warnings (this already covers FutureWarning)
warnings.filterwarnings("ignore")

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, GPT2LMHeadModel, LlamaForCausalLM, pipeline

## Prime Intellect's Llama 2 (14M)

For debugging, I am using randomly initialised Llama 🦙 models from the [PrimeIntellect](https://huggingface.co/PrimeIntellect) HuggingFace profile. They have fresh instances in four sizes:

* [Llama 14M](https://huggingface.co/PrimeIntellect/llama-14m-fresh)
* [Llama 60M](https://huggingface.co/PrimeIntellect/llama-60m-fresh)
* [Llama 150M](https://huggingface.co/PrimeIntellect/llama-150m-fresh)
* [Llama 1B](https://huggingface.co/PrimeIntellect/llama-1b-fresh)

We use the smallest model to check the architecture and push a copy to the HF Hub.

In [None]:
# Load Llama 14M (in reality it's 9M)
llama2_9m_fresh = AutoModelForCausalLM.from_pretrained("PrimeIntellect/llama-14m-fresh")
llama2_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(f"Loaded '{llama2_9m_fresh.config._name_or_path}' with {llama2_9m_fresh.num_parameters()/1e6:.2f}M parameters.")
print(f"Loaded '{llama2_tokenizer.name_or_path}' with {llama2_tokenizer.vocab_size} tokens.")

We get a `LlamaForCausalLM` model, the HuggingFace class for decoder-only Transformers from the Llama family. Importantly, its vocab size is 32K, which matches the vocabulary size of the Llama 2 tokenizer (`meta-llama/Llama-2-7b-hf`) loaded above. The Llama 14M and 60M models have similar architectures and share this vocab size, so the same tokenizer works for both. We could also use Mistral's tokenizer (`mistralai/Mistral-7B-v0.1`), which also has a 32K vocabulary.

The models are fresh instances, so we don't expect them to produce reasonable outputs. Let's sample some outputs using a pipeline.

In [None]:
# Generate text
pipe = pipeline("text-generation", model=llama2_9m_fresh, tokenizer=llama2_tokenizer, pad_token_id=llama2_tokenizer.eos_token_id, device="cuda")
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]

Nice, let's push the model to HuggingFace Hub.

In [None]:
# Push to HuggingFace Hub
repo_name = "llama2-9m-fresh"
# `token=True` reads the stored login token; it replaces the deprecated `use_auth_token`
llama2_9m_fresh.push_to_hub(repo_name, token=True)
llama2_tokenizer.push_to_hub(repo_name, token=True)

print(f"Model and tokenizer pushed to: https://huggingface.co/mikasenghaas/{repo_name}")

## GPT-2 (124M)

Next, let's load GPT-2 (124M), which we will use to replicate the NanoGPT experiment.

In [None]:
# Load GPT-2 124M
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")

print(f"Loaded '{gpt2.config._name_or_path}' with {gpt2.num_parameters()/1e6:.2f}M parameters.")
print(f"Loaded '{gpt2_tokenizer.name_or_path}' with {gpt2_tokenizer.vocab_size} tokens.")
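To see where the 124M figure comes from, the count can be rebuilt by hand from the GPT-2 small hyperparameters (vocab 50257, context 1024, width 768, 12 layers). This is only a sketch and not part of the experiment code: the helper name `gpt2_param_count` is made up for illustration, and it assumes the output head shares its weights with the token embedding, as in the HuggingFace implementation.

```python
def gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12):
    """Count parameters of a GPT-2 style decoder with a tied output head."""
    d = n_embd
    # Token embeddings + learned position embeddings
    embeddings = vocab_size * d + n_positions * d
    # Fused QKV projection and attention output projection (both with biases)
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # MLP up-projection to 4*d and down-projection back to d (both with biases)
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two LayerNorms per block, each with a weight and a bias vector
    layer_norms = 2 * (2 * d)
    per_block = attention + mlp + layer_norms
    # Final LayerNorm after the last block
    final_norm = 2 * d
    return embeddings + n_layer * per_block + final_norm


total = gpt2_param_count()
print(f"{total:,} parameters ({total / 1e6:.2f}M)")  # 124,439,808 parameters (124.44M)
```

The result should agree with what `gpt2.num_parameters()` prints above. About 38.6M of the total sits in the token embedding alone.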

In [None]:
# Generate text
pipe = pipeline("text-generation", model=gpt2, tokenizer=gpt2_tokenizer, pad_token_id=gpt2_tokenizer.eos_token_id, device="cuda")
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]

In [None]:
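# Push fresh instance to HuggingFace Hub
# Build a randomly initialised GPT-2 from the pretrained model's config
gpt2_fresh = GPT2LMHeadModel(gpt2.config)
# The constructor already initialises the weights; this call just makes it explicit
gpt2_fresh.init_weights()

repo_name = "gpt2-124m-fresh"
gpt2_fresh.push_to_hub(repo_name, token=True)
gpt2_tokenizer.push_to_hub(repo_name, token=True)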

print(f"Model and tokenizer pushed to: https://huggingface.co/mikasenghaas/{repo_name}")

## Llama 3.2 (1B)

In [None]:
# Load Llama 3.2 1B 
model_name = "meta-llama/Llama-3.2-1B-Instruct"
llama32_1b = AutoModelForCausalLM.from_pretrained(model_name)
llama32_tokenizer = AutoTokenizer.from_pretrained(model_name)

print(f"Loaded '{llama32_1b.config._name_or_path}' with {llama32_1b.num_parameters()/1e9:.2f}B parameters.")
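As a sanity check on the printed size, the same kind of hand count works for Llama-style models. The sketch below is not part of the experiment code: `llama_param_count` is a hypothetical helper, and the hyperparameters passed to it are the Llama 3.2 1B values as I understand the published config, so they are worth comparing against `llama32_1b.config`. It assumes no bias terms in the attention and MLP projections.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size, num_layers,
                      num_heads, num_kv_heads, tie_embeddings):
    """Count parameters of a Llama-style decoder (RMSNorm, gated MLP, no biases)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention shrinks K and V
    embeddings = vocab_size * hidden_size
    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gate, up and down projections of the gated MLP
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per block (no bias)
    norms = 2 * hidden_size
    per_block = attention + mlp + norms
    final_norm = hidden_size
    # A tied output head reuses the embedding matrix and adds nothing
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embeddings + num_layers * per_block + final_norm + lm_head


# Assumed Llama 3.2 1B hyperparameters; verify against the loaded config
total = llama_param_count(
    vocab_size=128256, hidden_size=2048, intermediate_size=8192,
    num_layers=16, num_heads=32, num_kv_heads=8, tie_embeddings=True,
)
print(f"{total:,} parameters ({total / 1e9:.2f}B)")  # 1,235,814,400 parameters (1.24B)
```

With these values roughly a fifth of the parameters are in the embedding matrix, which is why tying the output head matters at this scale.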

In [None]:
# Generate text
pipe = pipeline("text-generation", model=llama32_1b, tokenizer=llama32_tokenizer, pad_token_id=llama32_tokenizer.eos_token_id, device="cuda")
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]

In [None]:
# Push fresh instance to HuggingFace Hub
# Build a randomly initialised Llama 3.2 1B from the pretrained model's config
llama32_1b_fresh = LlamaForCausalLM(llama32_1b.config)
# The constructor already initialises the weights; this call just makes it explicit
llama32_1b_fresh.init_weights()

repo_name = "llama32-1b-fresh"
llama32_1b_fresh.push_to_hub(repo_name, token=True)
llama32_tokenizer.push_to_hub(repo_name, token=True)

print(f"Model and tokenizer pushed to: https://huggingface.co/mikasenghaas/{repo_name}")

## Load from HuggingFace Hub

In [None]:
from src.utils import get_model, get_tokenizer
from src.config import ModelConfig, TokenizerConfig

In [None]:
# Get Llama 2 (9M)
model_name = "mikasenghaas/llama2-9m-fresh"
model = get_model(ModelConfig(name=model_name))
tokenizer = get_tokenizer(TokenizerConfig(name=model_name))

# Print model and tokenizer details
print(f"Loaded '{model.config._name_or_path}' with {model.num_parameters()/1e6:.2f}M parameters.")
print(f"Loaded '{tokenizer.name_or_path}' with {tokenizer.vocab_size} tokens.\n")

# Generate text
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, pad_token_id=tokenizer.eos_token_id, device="cuda")
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]

In [None]:
# Get GPT-2 124M
model_name = "mikasenghaas/gpt2-124m-fresh"
model = get_model(ModelConfig(name=model_name))
tokenizer = get_tokenizer(TokenizerConfig(name=model_name))

# Print model and tokenizer details
print(f"Loaded '{model.config._name_or_path}' with {model.num_parameters()/1e6:.2f}M parameters.")
print(f"Loaded '{tokenizer.name_or_path}' with {tokenizer.vocab_size} tokens.\n")

# Generate text
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, pad_token_id=tokenizer.eos_token_id, device="cuda")
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]

In [None]:
# Get Llama 3.2 1B
model_name = "mikasenghaas/llama32-1b-fresh"
model = get_model(ModelConfig(name=model_name))
tokenizer = get_tokenizer(TokenizerConfig(name=model_name))

# Print model and tokenizer details
print(f"Loaded '{model.config._name_or_path}' with {model.num_parameters()/1e9:.2f}B parameters.")
print(f"Loaded '{tokenizer.name_or_path}' with {tokenizer.vocab_size} tokens.\n")

# Generate text
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, pad_token_id=tokenizer.eos_token_id, device="cuda")
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]