# 🦙 Model

This notebook contains code for the models in this experiment suite.

## Setup 

In [None]:
import autorootcwd

In [None]:
import warnings

# Silence all warnings (this already covers FutureWarning)
warnings.filterwarnings("ignore")

In [None]:
from typing import Dict

import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

from src.utils import get_model, get_tokenizer

In [None]:
def get_model_info(model: AutoModelForCausalLM) -> Dict:
    """Get the information about a model's architecture."""
    return {
        "num_params": model.num_parameters(),
        "num_bytes": model.num_parameters() * 4,  # assumes float32 weights (4 bytes per parameter)
        "num_layers": len(model.model.layers),
        "vocab_size": model.config.vocab_size,
        "hidden_size": model.config.hidden_size,
        "intermediate_size": model.config.intermediate_size,
        "num_heads": model.config.num_attention_heads,
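        # Older LlamaConfig versions may lack head_dim; fall back to hidden_size // num_heads
        "head_dim": getattr(model.config, "head_dim", model.config.hidden_size // model.config.num_attention_heads),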
    }
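
The `num_bytes` entry above assumes every parameter is stored as a 32-bit float (4 bytes). As a minimal sketch, assuming only the weights matter (no buffers, gradients, optimizer state or activations), the footprint at other common precisions can be estimated like this. `BYTES_PER_PARAM` and `weight_memory_mb` are hypothetical helpers, not part of `src.utils`.

```python
from typing import Dict

# Bytes needed to store one parameter at common precisions (hypothetical helper table).
BYTES_PER_PARAM: Dict[str, int] = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str = "float32") -> float:
    """Approximate weight memory in megabytes (1 MB = 1e6 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6
```

For example, `weight_memory_mb(llama14m_info["num_params"], "bfloat16")` gives half of the float32 figure.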

In [None]:
def get_tokenizer_info(tokenizer: AutoTokenizer) -> Dict:
    """Get the information about a tokenizer's architecture."""
    return {
        "vocab_size": len(tokenizer),
        "special_tokens": tokenizer.special_tokens_map,
        "model_max_length": tokenizer.model_max_length,
    }

In [None]:
def abbreviate_number(num: float) -> str:
    """Abbreviate a number with M for millions or K for values of at least ten thousand.

    Values below 10,000 are returned unchanged.
    """
    if num >= 1e6:
        return f"{num/1e6:.1f}M"
    elif num >= 1e4:
        return f"{num/1e3:.1f}K"
    else:
        return str(num)

## Models

I am using randomly initialised Llama 🦙 models from the [PrimeIntellect](https://huggingface.co/PrimeIntellect) HuggingFace profile, which hosts freshly initialised ("fresh") checkpoints in four sizes:

* [Llama 14M](https://huggingface.co/PrimeIntellect/llama-14m-fresh)
* [Llama 60M](https://huggingface.co/PrimeIntellect/llama-60m-fresh)
* [Llama 150M](https://huggingface.co/PrimeIntellect/llama-150m-fresh)
* [Llama 1B](https://huggingface.co/PrimeIntellect/llama-1b-fresh)

Let's load the two smallest models (14M and 60M) and check their size and architecture.

In [None]:
# Load Llama 14M
llama14m = get_model("PrimeIntellect/llama-14m-fresh")
llama14m_info = get_model_info(llama14m)

print(f"Loaded '{llama14m.config._name_or_path}' with {llama14m_info['num_params']/1e6:.2f}M parameters.")

llama14m

In [None]:
# Load Llama 60M
llama60m = get_model("PrimeIntellect/llama-60m-fresh")
llama60m_info = get_model_info(llama60m)

print(f"Loaded '{llama60m.config._name_or_path}' with {llama60m_info['num_params']/1e6:.2f}M parameters.")
llama60m

We get a `LlamaForCausalLM` model, which is a HuggingFace class for decoder-only Transformers from the Llama family. Let's check the architectures of the two models.

In [None]:
pd.DataFrame([
    {k: abbreviate_number(v) for k, v in llama14m_info.items()},
    {k: abbreviate_number(v) for k, v in llama60m_info.items()}
], index=["Llama 14M", "Llama 60M"])
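
The `num_params` column can be cross-checked against the other config values. Below is a minimal sketch, assuming the standard bias-free Llama decoder layer (q/k/v/o attention projections, a gated MLP with gate/up/down projections, and two RMSNorm weight vectors per layer) plus a final norm and an optional untied `lm_head`. `estimate_llama_params` is a hypothetical helper, not a `transformers` API.

```python
def estimate_llama_params(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: int,
    head_dim: int,
    tie_word_embeddings: bool = False,
) -> int:
    """Count the parameters of a bias-free Llama-style decoder from its config values."""
    # Input embedding table: one hidden_size vector per vocabulary entry.
    embed = vocab_size * hidden_size
    # q_proj and o_proj span all heads; k_proj and v_proj span only the key/value heads.
    attn = 2 * hidden_size * num_heads * head_dim + 2 * hidden_size * num_kv_heads * head_dim
    # Gated MLP: gate_proj and up_proj (hidden -> intermediate), down_proj (intermediate -> hidden).
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms
    # Final RMSNorm after the last layer.
    final_norm = hidden_size
    # Output projection; shares the embedding weights when tied.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head
```

Passing the values from `llama14m.config` (`vocab_size`, `hidden_size`, `intermediate_size`, `num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `tie_word_embeddings`) together with `llama14m_info["head_dim"]` should reproduce `llama14m_info["num_params"]`, provided the model really has no bias terms.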

## Tokenizer

The PrimeIntellect models do not come with a tokenizer. Whichever tokenizer I pair them with must produce token IDs that fit the model's embedding table, so the natural candidates are the Llama tokenizers. Prime also says they use the Mistral 7B tokenizer internally, so I include it as well. The options are:

* [Llama 2 tokenizer](https://huggingface.co/meta-llama/Llama-2-7b-hf)
* [Llama 3 tokenizer](https://huggingface.co/meta-llama/Llama-3.2-1B)
* [Mistral 7B tokenizer](https://huggingface.co/mistralai/Mistral-7B-v0.1)

Let's load them and check their vocabulary size.

In [None]:
# Load Llama 2 7B tokenizer
llama2_tokenizer = get_tokenizer("meta-llama/Llama-2-7b-hf")
llama2_tokenizer_info = get_tokenizer_info(llama2_tokenizer)

print(f"Loaded '{llama2_tokenizer.name_or_path}' with {llama2_tokenizer_info['vocab_size']} tokens.")

In [None]:
# Load Llama 3.2 1B tokenizer
llama3_tokenizer = get_tokenizer("meta-llama/Llama-3.2-1B")
llama3_tokenizer_info = get_tokenizer_info(llama3_tokenizer)

print(f"Loaded '{llama3_tokenizer.name_or_path}' with {llama3_tokenizer_info['vocab_size']} tokens.")

In [None]:
# Load Mistral 7B tokenizer
mistral_tokenizer = get_tokenizer("mistralai/Mistral-7B-v0.1")
mistral_tokenizer_info = get_tokenizer_info(mistral_tokenizer)

print(f"Loaded '{mistral_tokenizer.name_or_path}' with {mistral_tokenizer_info['vocab_size']} tokens.")

In [None]:
pd.DataFrame([
    {k: abbreviate_number(v) if k != "special_tokens" else v for k, v in llama2_tokenizer_info.items()},
    {k: abbreviate_number(v) if k != "special_tokens" else v for k, v in llama3_tokenizer_info.items()},
    {k: abbreviate_number(v) if k != "special_tokens" else v for k, v in mistral_tokenizer_info.items()}
], index=["Llama 2", "Llama 3", "Mistral 7B"])

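As the Llama 2 tokenizer's vocabulary size is identical to the models' `vocab_size` (the number of rows in their input embedding matrix), the PrimeIntellect models were likely set up for the Llama 2 tokenizer. Vocabulary size alone cannot tell apart tokenizers of equal size, though, so any other tokenizer with the same count in the table is an equally plausible match.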
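The comparison above can be made explicit. This is a minimal sketch, assuming each tokenizer assigns contiguous IDs from `0` to `len(tokenizer) - 1`: a tokenizer larger than the model's `vocab_size` would emit IDs past the end of the embedding table, while a smaller one fits but leaves some rows unused. `classify_tokenizers` is a hypothetical helper.

```python
from typing import Dict


def classify_tokenizers(model_vocab_size: int, tokenizer_sizes: Dict[str, int]) -> Dict[str, str]:
    """Label each tokenizer by how its vocabulary size relates to the model's embedding table."""
    labels: Dict[str, str] = {}
    for name, size in tokenizer_sizes.items():
        if size == model_vocab_size:
            labels[name] = "exact match"
        elif size < model_vocab_size:
            # Every token ID indexes a valid row, but some embedding rows are never used.
            labels[name] = "fits (unused rows)"
        else:
            # IDs >= model_vocab_size would index past the end of the embedding table.
            labels[name] = "too large"
    return labels
```

In the notebook this would be called as `classify_tokenizers(llama14m_info["vocab_size"], {"Llama 2": llama2_tokenizer_info["vocab_size"], "Llama 3": llama3_tokenizer_info["vocab_size"], "Mistral 7B": mistral_tokenizer_info["vocab_size"]})`.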

## Generation

The models do not produce reasonable outputs yet, as they are randomly initialised. Let's quickly verify this by generating a short continuation with a `text-generation` pipeline.

In [None]:
pipe = pipeline(
    "text-generation",
    model=llama14m,
    tokenizer=llama2_tokenizer,
    pad_token_id=llama2_tokenizer.eos_token_id,  # the Llama 2 tokenizer has no pad token, so reuse EOS
    device="cpu",
)
pipe("Hello, how are you?", max_new_tokens=10)[0]["generated_text"]
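
One way to see why a randomly initialised model produces gibberish: if its logits are nearly equal, the softmax over the vocabulary is close to uniform, and the entropy of a uniform distribution over `V` tokens is `ln(V)` nats. The sketch below is plain Python and does not inspect the model itself; `softmax` and `entropy` are illustrative helpers, and the near-uniform behaviour of fresh models is an expectation rather than something measured here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

With equal logits over the Llama 2 tokenizer's 32,000 tokens, `entropy(softmax([0.0] * 32000))` equals `math.log(32000)`, about 10.37 nats. This is the ceiling a fresh model's next-token distribution should sit near, and roughly where its training loss would be expected to start.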