**LLM Workshop 2024 by Sebastian Raschka**

This code is based on *Build a Large Language Model (From Scratch)*, [https://github.com/rasbt/LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch)

# Setup

In [None]:
# Requirements from: https://github.com/rasbt/LLM-workshop-2024/blob/main/requirements.txt
requirements = """
# torch >= 2.0.1
tiktoken >= 0.5.1
# matplotlib >= 3.7.1
# numpy >= 1.24.3
# tensorflow >= 2.15.0
# tqdm >= 4.66.1
# numpy >= 1.25, < 2.0
# pandas >= 2.2.1
psutil >= 5.9.5
litgpt[all] >= 0.4.1
"""

with open("requirements.txt", mode="wt") as f:
    f.write(requirements)

%pip install -r requirements.txt --quiet


In [None]:
# Check if the tiktoken package is installed in the current environment
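import importlib.util  # importing the submodule explicitly; "import importlib" alone does not guarantee it is loaded
importlib.util.find_spec("tiktoken")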

ModuleSpec(name='tiktoken', loader=<_frozen_importlib_external.SourceFileLoader object at 0x78b0daeed390>, origin='/usr/local/lib/python3.10/dist-packages/tiktoken/__init__.py', submodule_search_locations=['/usr/local/lib/python3.10/dist-packages/tiktoken'])
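The same check can be extended to several packages at once. The following is a minimal sketch; the helper name `missing_packages` is hypothetical and not part of the workshop code. It relies on `importlib.util.find_spec` returning `None` for a top-level package that is not installed.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately made up
print(missing_packages(["json", "surely_not_an_installed_package_xyz"]))
```

An empty list means every listed package can be imported in the current environment.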

Add supplementary Python modules from Sebastian Raschka's training material

In [1]:
import requests

session = requests.Session()
base_url = "https://raw.githubusercontent.com/rasbt/LLM-workshop-2024/main/05_weightloading"

# supplementary.py is saved under the module name used by the imports further below
response = session.get(f"{base_url}/supplementary.py")
response.raise_for_status()  # Fail early instead of writing an error page to disk
with open("load_pretrained_weights.py", "wt", encoding="utf-8") as f:
    f.write(response.text)

response = session.get(f"{base_url}/gpt_download.py")
response.raise_for_status()
with open("gpt_download.py", "wt", encoding="utf-8") as f:
    f.write(response.text)

# 5) Loading pretrained weights (part 1)

In [2]:
from importlib.metadata import version

pkgs = [
    "matplotlib",
    "numpy",
    "tiktoken",
    "torch",
]
for p in pkgs:
    print(f"{p} version: {version(p)}")

matplotlib version: 3.8.2
numpy version: 1.26.4
tiktoken version: 0.7.0
torch version: 2.2.1+cu121


- Previously, we only trained a small GPT-2 model using a very small short-story book for educational purposes
- Fortunately, we don't have to spend tens to hundreds of thousands of dollars to pretrain the model on a large pretraining corpus; instead, we can load pretrained weights (we start with the GPT-2 weights provided by OpenAI)

<img src="https://github.com/rasbt/LLM-workshop-2024/blob/main/05_weightloading/figures/01.png?raw=1" width=1000px>

- First, some boilerplate code to download the files from OpenAI and load the weights into Python
- Since OpenAI used [TensorFlow](https://www.tensorflow.org/), we will have to install and use TensorFlow for loading the weights; [tqdm](https://github.com/tqdm/tqdm) is a progress bar library
- Uncomment and run the next cell to install the required libraries

In [None]:
# pip install tensorflow tqdm

In [3]:
print("TensorFlow version:", version("tensorflow"))
print("tqdm version:", version("tqdm"))

TensorFlow version: 2.16.2
tqdm version: 4.66.4


In [4]:
# Relative import from the gpt_download.py contained in this folder
from gpt_download import download_and_load_gpt2

2024-09-10 03:20:36.250663: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-10 03:20:36.342735: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-10 03:20:36.343464: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-10 03:20:36.504205: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


- We can then download the model weights for the 124 million parameter model as follows:

In [5]:
settings, params = download_and_load_gpt2(model_size="124M", models_dir="gpt2")

File already exists and is up-to-date: gpt2/124M/checkpoint
File already exists and is up-to-date: gpt2/124M/encoder.json
File already exists and is up-to-date: gpt2/124M/hparams.json
File already exists and is up-to-date: gpt2/124M/model.ckpt.data-00000-of-00001
File already exists and is up-to-date: gpt2/124M/model.ckpt.index
File already exists and is up-to-date: gpt2/124M/model.ckpt.meta
File already exists and is up-to-date: gpt2/124M/vocab.bpe


2024-09-10 03:20:45.954735: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 154389504 exceeds 10% of free system memory.


In [6]:
print("Settings:", settings)

Settings: {'n_vocab': 50257, 'n_ctx': 1024, 'n_embd': 768, 'n_head': 12, 'n_layer': 12}


In [7]:
print("Parameter dictionary keys:", params.keys())

Parameter dictionary keys: dict_keys(['blocks', 'b', 'g', 'wpe', 'wte'])
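The top-level keys alone do not show how the parameters are nested. The following is a minimal sketch of a helper that walks a nested dictionary and lists the shape of every array. The helper name `describe` and the tiny `toy_params` dictionary are hypothetical stand-ins; the toy layout only mimics the key names that the weight-loading code in this notebook accesses (`blocks`, `attn`, `c_attn`, `w`, `b`).

```python
import numpy as np


def describe(node, prefix=""):
    """Recursively collect 'path: shape' lines for arrays in nested dicts/lists."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            lines += describe(value, f"{prefix}/{key}")
    elif isinstance(node, (list, tuple)):
        for i, value in enumerate(node):
            lines += describe(value, f"{prefix}[{i}]")
    else:
        lines.append(f"{prefix}: {np.shape(node)}")
    return lines


# Toy stand-in with tiny arrays (not the real GPT-2 weights)
toy_params = {
    "wte": np.zeros((10, 4)),
    "wpe": np.zeros((8, 4)),
    "blocks": [{"attn": {"c_attn": {"w": np.zeros((4, 12)), "b": np.zeros(12)}}}],
}
print("\n".join(describe(toy_params)))
```

Calling `describe(params)` on the dictionary loaded above should work the same way, assuming its leaves are NumPy arrays.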


In [8]:
print(params["wte"])
print("Token embedding weight tensor dimensions:", params["wte"].shape)

[[-0.11010301 -0.03926672  0.03310751 ... -0.1363697   0.01506208
   0.04531523]
 [ 0.04034033 -0.04861503  0.04624869 ...  0.08605453  0.00253983
   0.04318958]
 [-0.12746179  0.04793796  0.18410145 ...  0.08991534 -0.12972379
  -0.08785918]
 ...
 [-0.04453601 -0.05483596  0.01225674 ...  0.10435229  0.09783269
  -0.06952604]
 [ 0.1860082   0.01665728  0.04611587 ... -0.09625227  0.07847701
  -0.02245961]
 [ 0.05135201 -0.02768905  0.0499369  ...  0.00704835  0.15519823
   0.12067825]]
Token embedding weight tensor dimensions: (50257, 768)


- Alternatively, "355M", "774M", and "1558M" are also supported `model_size` arguments
- The difference between these differently sized models is summarized in the figure below:

<img src="https://github.com/rasbt/LLM-workshop-2024/blob/main/05_weightloading/figures/02.png?raw=1" width=800px>
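As a rough cross-check of the model names, the parameter count can be estimated from the embedding dimension and the number of layers. This is a sketch under stated assumptions, not the workshop's code: it assumes the feed-forward hidden size is 4 times the embedding dimension, that all linear layers have biases, and that the output head shares the token-embedding matrix (so it is counted once). The function name `count_gpt2_parameters` is hypothetical.

```python
def count_gpt2_parameters(emb_dim, n_layers, vocab_size=50257, context_length=1024):
    """Estimate the number of unique parameters in a GPT-2 style model."""
    d = emb_dim
    embeddings = vocab_size * d + context_length * d      # token + positional embeddings
    attention = (3 * d * d + 3 * d) + (d * d + d)          # query/key/value + output projection
    feed_forward = (d * 4 * d + 4 * d) + (4 * d * d + d)   # expand to 4*d, project back to d
    layer_norms = 2 * 2 * d                                # two norms per block, scale + shift
    final_norm = 2 * d
    return embeddings + n_layers * (attention + feed_forward + layer_norms) + final_norm


sizes = {
    "124M": (768, 12),
    "355M": (1024, 24),
    "774M": (1280, 36),
    "1558M": (1600, 48),
}
for name, (emb_dim, n_layers) in sizes.items():
    total = count_gpt2_parameters(emb_dim, n_layers)
    print(f"{name}: {total:,} parameters")
```

The number of attention heads does not appear in the formula because splitting the projections into heads does not change the size of the weight matrices.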

- Above, we loaded the 124M GPT-2 model weights into Python; however, we still need to transfer them into our `GPTModel` instance
- First, we initialize a new `GPTModel` instance
- Note that the original GPT-2 model initialized the linear layers for the query, key, and value matrices in the multi-head attention module with bias vectors, which is not required or recommended; however, to load the weights correctly, we have to enable them by setting `qkv_bias` to `True` in our implementation
- We are also using the `1024` token context length that was used by the original GPT-2 model(s)

In [9]:
GPT_CONFIG_124M = {
    "vocab_size": 50257,   # Vocabulary size
    "context_length": 256, # Shortened context length (orig: 1024)
    "emb_dim": 768,        # Embedding dimension
    "n_heads": 12,         # Number of attention heads
    "n_layers": 12,        # Number of layers
    "drop_rate": 0.1,      # Dropout rate
    "qkv_bias": False      # Query-key-value bias
}


# Define model configurations in a dictionary for compactness
model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

# Copy the base configuration and update with specific model settings
model_name = "gpt2-small (124M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_124M.copy()
NEW_CONFIG.update(model_configs[model_name])
NEW_CONFIG.update({"context_length": 1024, "qkv_bias": True})
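The copy-and-update pattern above can be wrapped in a small function so that switching model sizes is a one-line change. This is a minimal sketch; `build_config`, `BASE_CONFIG`, and `MODEL_CONFIGS` are hypothetical names that mirror the dictionaries defined above. The divisibility check reflects the usual requirement that multi-head attention splits the embedding evenly across heads.

```python
BASE_CONFIG = {
    "vocab_size": 50257,
    "context_length": 256,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
    "qkv_bias": False,
}

MODEL_CONFIGS = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}


def build_config(model_name):
    """Return a new config for the pretrained model without touching BASE_CONFIG."""
    config = BASE_CONFIG.copy()  # A shallow copy suffices: all values are immutable
    config.update(MODEL_CONFIGS[model_name])
    config.update({"context_length": 1024, "qkv_bias": True})
    if config["emb_dim"] % config["n_heads"] != 0:
        raise ValueError("emb_dim must be divisible by n_heads")
    return config


for name in MODEL_CONFIGS:
    cfg = build_config(name)
    print(name, "-> head dimension:", cfg["emb_dim"] // cfg["n_heads"])
```

Because the base dictionary is copied first, it keeps its shortened context length of 256 while the returned config uses 1024.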

In [10]:
from load_pretrained_weights import GPTModel

gpt = GPTModel(NEW_CONFIG)
gpt.eval();

- The next task is to assign the OpenAI weights to the corresponding weight tensors in our `GPTModel` instance

In [11]:
import torch

def assign(left, right):
    # Guard against silently loading a weight into the wrong layer
    if left.shape != right.shape:
        raise ValueError(f"Shape mismatch. Left: {left.shape}, Right: {right.shape}")
    return torch.nn.Parameter(torch.tensor(right))

In [12]:
import torch
import numpy as np

def load_weights_into_gpt(gpt, params):
    gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])
    gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])

    for b in range(len(params["blocks"])):
        q_w, k_w, v_w = np.split(
            (params["blocks"][b]["attn"]["c_attn"])["w"], 3, axis=-1)
        gpt.trf_blocks[b].att.W_query.weight = assign(
            gpt.trf_blocks[b].att.W_query.weight, q_w.T)
        gpt.trf_blocks[b].att.W_key.weight = assign(
            gpt.trf_blocks[b].att.W_key.weight, k_w.T)
        gpt.trf_blocks[b].att.W_value.weight = assign(
            gpt.trf_blocks[b].att.W_value.weight, v_w.T)

        q_b, k_b, v_b = np.split(
            (params["blocks"][b]["attn"]["c_attn"])["b"], 3, axis=-1)
        gpt.trf_blocks[b].att.W_query.bias = assign(
            gpt.trf_blocks[b].att.W_query.bias, q_b)
        gpt.trf_blocks[b].att.W_key.bias = assign(
            gpt.trf_blocks[b].att.W_key.bias, k_b)
        gpt.trf_blocks[b].att.W_value.bias = assign(
            gpt.trf_blocks[b].att.W_value.bias, v_b)

        gpt.trf_blocks[b].att.out_proj.weight = assign(
            gpt.trf_blocks[b].att.out_proj.weight,
            params["blocks"][b]["attn"]["c_proj"]["w"].T)
        gpt.trf_blocks[b].att.out_proj.bias = assign(
            gpt.trf_blocks[b].att.out_proj.bias,
            params["blocks"][b]["attn"]["c_proj"]["b"])

        gpt.trf_blocks[b].ff.layers[0].weight = assign(
            gpt.trf_blocks[b].ff.layers[0].weight,
            params["blocks"][b]["mlp"]["c_fc"]["w"].T)
        gpt.trf_blocks[b].ff.layers[0].bias = assign(
            gpt.trf_blocks[b].ff.layers[0].bias,
            params["blocks"][b]["mlp"]["c_fc"]["b"])
        gpt.trf_blocks[b].ff.layers[2].weight = assign(
            gpt.trf_blocks[b].ff.layers[2].weight,
            params["blocks"][b]["mlp"]["c_proj"]["w"].T)
        gpt.trf_blocks[b].ff.layers[2].bias = assign(
            gpt.trf_blocks[b].ff.layers[2].bias,
            params["blocks"][b]["mlp"]["c_proj"]["b"])

        gpt.trf_blocks[b].norm1.scale = assign(
            gpt.trf_blocks[b].norm1.scale,
            params["blocks"][b]["ln_1"]["g"])
        gpt.trf_blocks[b].norm1.shift = assign(
            gpt.trf_blocks[b].norm1.shift,
            params["blocks"][b]["ln_1"]["b"])
        gpt.trf_blocks[b].norm2.scale = assign(
            gpt.trf_blocks[b].norm2.scale,
            params["blocks"][b]["ln_2"]["g"])
        gpt.trf_blocks[b].norm2.shift = assign(
            gpt.trf_blocks[b].norm2.shift,
            params["blocks"][b]["ln_2"]["b"])

    gpt.final_norm.scale = assign(gpt.final_norm.scale, params["g"])
    gpt.final_norm.shift = assign(gpt.final_norm.shift, params["b"])
    gpt.out_head.weight = assign(gpt.out_head.weight, params["wte"])


load_weights_into_gpt(gpt, params)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using:", device)
gpt.to(device);

Using: cuda
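The `np.split(..., 3, axis=-1)` and `.T` calls in `load_weights_into_gpt` can be illustrated with a small NumPy example. This is a sketch with random toy matrices, assuming the checkpoint stores the fused attention weight as `(in_features, 3 * out_features)`, while a PyTorch linear layer stores its weight as `(out_features, in_features)` and computes `x @ weight.T`.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 4
x = rng.normal(size=(2, emb_dim))                    # two toy token vectors
c_attn_w = rng.normal(size=(emb_dim, 3 * emb_dim))   # fused query/key/value weight

# Split the fused matrix into three (emb_dim, emb_dim) blocks along the last axis
q_w, k_w, v_w = np.split(c_attn_w, 3, axis=-1)

# Projection with the fused matrix, split afterwards
q_fused, k_fused, v_fused = np.split(x @ c_attn_w, 3, axis=-1)

# Linear-layer convention: the stored weight is the transpose, output = x @ weight.T
linear_weight = q_w.T
q_linear = x @ linear_weight.T

print("Query block shape:", q_w.shape)
print("Same query projection:", np.allclose(q_fused, q_linear))
```

Splitting the weight first and projecting afterwards gives the same result as projecting with the fused matrix, which is why the three blocks can be loaded into separate query, key, and value layers.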


- If the model is loaded correctly, we can use it to generate new text using our previous `generate_text_simple` function:

In [13]:
import tiktoken
from load_pretrained_weights import (
    generate_text_simple,
    text_to_token_ids,
    token_ids_to_text
)


tokenizer = tiktoken.get_encoding("gpt2")

torch.manual_seed(123)

token_ids = generate_text_simple(
    model=gpt,
    idx=text_to_token_ids("Every effort moves you", tokenizer).to(device),
    max_new_tokens=10,
    context_size=NEW_CONFIG["context_length"]
)

print("Output text:\n", token_ids_to_text(token_ids, tokenizer))

Output text:
 Every effort moves you forward.

The first step is to understand


In [14]:
token_ids = generate_text_simple(
    model=gpt,
    idx=text_to_token_ids(
        (
            "A summary of Newton's laws of motion is:"
            "\n1. Law of inertia:"
        ),
        tokenizer).to(device),
    max_new_tokens=30,
    context_size=NEW_CONFIG["context_length"]
)


print("Output text:\n", token_ids_to_text(token_ids, tokenizer))

Output text:
 A summary of Newton's laws of motion is:
1. Law of inertia: The motion of the earth is proportional to the velocity of the sun.
2. Law of inertia: The motion of the earth is proportional to the


- We know that we loaded the model weights correctly because the model can generate coherent text; if we made even a small mistake, the model would not be able to do that

# Exercise 1: Trying larger LLMs

- Load one of the larger LLMs and see how the output quality compares
- Ask it to answer specific instructions, for example to summarize text or correct the spelling of a sentence

---

I am using Lightning AI Studios with an L4 GPU and 16 CPUs.

In [15]:
# Use the largest model: 1558 M parameters
# This takes about 2.5 minutes on Lightning AI Studios
settings, params = download_and_load_gpt2(model_size="1558M", models_dir="gpt2")

# Define model configurations in a dictionary for compactness
model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

# Copy the base configuration and update with specific model settings
model_name = "gpt2-xl (1558M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_124M.copy()
NEW_CONFIG.update(model_configs[model_name])
NEW_CONFIG.update({"context_length": 1024, "qkv_bias": True})
gpt = GPTModel(NEW_CONFIG)
gpt.eval()

load_weights_into_gpt(gpt, params)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using:", device)
gpt.to(device)

checkpoint: 100%|██████████| 77.0/77.0 [00:00<00:00, 107kiB/s]
encoder.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 5.14MiB/s]
hparams.json: 100%|██████████| 91.0/91.0 [00:00<00:00, 144kiB/s]
model.ckpt.data-00000-of-00001: 100%|██████████| 6.23G/6.23G [02:07<00:00, 48.8MiB/s] 
model.ckpt.index: 100%|██████████| 20.7k/20.7k [00:00<00:00, 726kiB/s]
model.ckpt.meta: 100%|██████████| 1.84M/1.84M [00:00<00:00, 5.66MiB/s]
vocab.bpe: 100%|██████████| 456k/456k [00:00<00:00, 3.04MiB/s]
2024-09-10 03:26:25.928738: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 321644800 exceeds 10% of free system memory.


In [18]:
token_ids = generate_text_simple(
    model=gpt,
    idx=text_to_token_ids(
        (
            "A summary of Newton's laws of motion is:"
            "\n1. Law of inertia:"
        ),
        tokenizer).to(device),
    max_new_tokens=256,
    context_size=NEW_CONFIG["context_length"]
)


print("Output text:\n", token_ids_to_text(token_ids, tokenizer))

Output text:
 A summary of Newton's laws of motion is:
1. Law of inertia: The force of an object on another object is directly proportional to the product of the masses of the objects.
2. Law of conservation of energy: Energy is conserved.
3. Law of conservation of momentum: The momentum of an object is directly proportional to the product of the masses of the objects.
4. Law of conservation of angular momentum: The angular momentum of an object is directly proportional to the product of the masses of the objects.
5. Law of conservation of angular momentum: The angular momentum of an object is directly proportional to the product of the masses of the objects.
6. Law of conservation of angular momentum: The angular momentum of an object is directly proportional to the product of the masses of the objects.
7. Law of conservation of angular momentum: The angular momentum of an object is directly proportional to the product of the masses of the objects.
8. Law of conservation of angular mo
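The repeated lines in the output above are typical of greedy decoding. Assuming `generate_text_simple` always picks the highest-scoring next token (an assumption about the supplementary code, which is not shown here), the model can fall into a loop once it revisits a context it has seen before. The toy example below uses a hypothetical table of made-up bigram scores instead of a real model to show the effect.

```python
# Hypothetical bigram scores: SCORES[current][candidate], higher means more likely
SCORES = {
    "the": {"law": 0.6, "of": 0.1, "motion": 0.3},
    "law": {"of": 0.9, "the": 0.1},
    "of": {"the": 0.7, "motion": 0.3},
    "motion": {"the": 0.5, "of": 0.5},
}


def greedy_generate(start, max_new_tokens):
    """Always append the highest-scoring next token (greedy decoding)."""
    tokens = [start]
    for _ in range(max_new_tokens):
        candidates = SCORES[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(greedy_generate("the", 8)))
```

Because each step is deterministic, the sequence cycles through the same three tokens. Sampling-based decoding is a common way to reduce this kind of repetition.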

Ask the LLM to summarize a page from Wikipedia. For convenience, I use the [`wikipedia-api`](https://github.com/martin-majlis/Wikipedia-API) Python interface to the Wikipedia API (which we could also access using, e.g., `httpx` or `requests`).

In [None]:
%pip install wikipedia-api

In [21]:
# Fetch an article from Wikipedia and ask the LLM to summarize it
import wikipediaapi
wiki = wikipediaapi.Wikipedia(
    user_agent="LLMs from Scratch (ryan.parker2@outlook.com)",
    language='en',
    extract_format=wikipediaapi.ExtractFormat.WIKI,
)

page = wiki.page("Hubble_Deep_Field")

In [25]:
print(page.summary)

The Hubble Deep Field (HDF) is an image of a small region in the constellation Ursa Major, constructed from a series of observations by the Hubble Space Telescope. It covers an area about 2.6 arcminutes on a side, about one 24-millionth of the whole sky, which is equivalent in angular size to a tennis ball at a distance of 100 metres. The image was assembled from 342 separate exposures taken with the Space Telescope's Wide Field and Planetary Camera 2 over ten consecutive days between December 18 and 28, 1995.
The field is so small that only a few foreground stars in the Milky Way lie within it; thus, almost all of the 3,000 objects in the image are galaxies, some of which are among the youngest and most distant known. By revealing such large numbers of very young galaxies, the HDF has become a landmark image in the study of the early universe.
Three years after the HDF observations were taken, a region in the south celestial hemisphere was imaged in a similar way and named the Hubble 

In [26]:
token_ids = generate_text_simple(
    model=gpt,
    idx=text_to_token_ids(
         (
            f"Summarize the following: {page.text}"
            "\n\n" + ("-" * 50) + "\n\n"
            "Summary:\n"
        ),
        tokenizer
    ).to(device),
    max_new_tokens=256,
    context_size=NEW_CONFIG["context_length"]
)


print("Output text:\n", token_ids_to_text(token_ids, tokenizer))

Output text:
 Summarize the following: The Hubble Deep Field (HDF) is an image of a small region in the constellation Ursa Major, constructed from a series of observations by the Hubble Space Telescope. It covers an area about 2.6 arcminutes on a side, about one 24-millionth of the whole sky, which is equivalent in angular size to a tennis ball at a distance of 100 metres. The image was assembled from 342 separate exposures taken with the Space Telescope's Wide Field and Planetary Camera 2 over ten consecutive days between December 18 and 28, 1995.
The field is so small that only a few foreground stars in the Milky Way lie within it; thus, almost all of the 3,000 objects in the image are galaxies, some of which are among the youngest and most distant known. By revealing such large numbers of very young galaxies, the HDF has become a landmark image in the study of the early universe.
Three years after the HDF observations were taken, a region in the south celestial hemisphere was imaged

In [29]:
token_ids = generate_text_simple(
    model=gpt,
    idx=text_to_token_ids(
        (
            f"Summarize the following: {page.summary}"
            "\n\n" + ("-" * 50) + "\n\n"
            "Summary:\n"
        ),
        tokenizer
    ).to(device),
    max_new_tokens=256,
    context_size=NEW_CONFIG["context_length"]
)


print("Output text:\n", token_ids_to_text(token_ids, tokenizer))

Output text:
 Summarize the following: The Hubble Deep Field (HDF) is an image of a small region in the constellation Ursa Major, constructed from a series of observations by the Hubble Space Telescope. It covers an area about 2.6 arcminutes on a side, about one 24-millionth of the whole sky, which is equivalent in angular size to a tennis ball at a distance of 100 metres. The image was assembled from 342 separate exposures taken with the Space Telescope's Wide Field and Planetary Camera 2 over ten consecutive days between December 18 and 28, 1995.
The field is so small that only a few foreground stars in the Milky Way lie within it; thus, almost all of the 3,000 objects in the image are galaxies, some of which are among the youngest and most distant known. By revealing such large numbers of very young galaxies, the HDF has become a landmark image in the study of the early universe.
Three years after the HDF observations were taken, a region in the south celestial hemisphere was imaged
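A full Wikipedia article is likely longer than the 1024-token context length, so part of the prompt falls outside the window the model can see. The following is a minimal sketch of one way to shorten the article before building the prompt, so that the instruction, the article, and the generated tokens fit together. The function `truncate_to_budget` is hypothetical; the whitespace tokenizer is a toy stand-in, and in the notebook one would pass `tokenizer.encode` and `tokenizer.decode` from `tiktoken` instead.

```python
def truncate_to_budget(text, encode, decode, context_length, max_new_tokens, overhead_tokens=0):
    """Shorten text so that text + prompt overhead + generated tokens fit the context."""
    budget = context_length - max_new_tokens - overhead_tokens
    if budget <= 0:
        raise ValueError("No room left for the document text")
    token_ids = encode(text)
    return decode(token_ids[:budget])


# Toy tokenizer for illustration only: one token per whitespace-separated word
toy_encode = str.split
toy_decode = " ".join

article = " ".join(f"word{i}" for i in range(2000))
shortened = truncate_to_budget(
    article, toy_encode, toy_decode,
    context_length=1024, max_new_tokens=256, overhead_tokens=16,
)
print(len(toy_encode(shortened)))
```

Here `overhead_tokens` reserves space for the instruction and the "Summary:" suffix; the value 16 is an arbitrary placeholder, and the real overhead should be measured with the actual tokenizer.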

---

# 5) Loading pretrained weights (part 2; using LitGPT)

- Now, we are loading the weights using an open-source library called LitGPT
- LitGPT is fundamentally similar to the LLM code we implemented previously, but it is much more sophisticated and supports more than 20 different LLMs (Mistral, Gemma, Llama, Phi, and more)

# ⚡ LitGPT

**20+ high-performance LLMs with recipes to pretrain, finetune, deploy at scale.**

<pre>
✅ From scratch implementations     ✅ No abstractions    ✅ Beginner friendly   
✅ Flash attention                  ✅ FSDP               ✅ LoRA, QLoRA, Adapter
✅ Reduce GPU memory (fp4/8/16/32)  ✅ 1-1000+ GPUs/TPUs  ✅ 20+ LLMs            
</pre>

## Basic usage:

```
# litgpt [action] [model]
litgpt  download  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  chat      meta-llama/Meta-Llama-3-8B-Instruct
litgpt  evaluate  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  finetune  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  pretrain  meta-llama/Meta-Llama-3-8B-Instruct
litgpt  serve     meta-llama/Meta-Llama-3-8B-Instruct
```


- You can learn more about LitGPT in the [corresponding GitHub repository](https://github.com/Lightning-AI/litgpt), which contains many tutorials, use cases, and examples


In [None]:
# pip install litgpt

In [1]:
from importlib.metadata import version

pkgs = [
    "litgpt",
    "torch",
]
for p in pkgs:
    print(f"{p} version: {version(p)}")

litgpt version: 0.4.3.dev0
torch version: 2.2.1+cu121


- First, let's see what LLMs are supported

In [2]:
!litgpt download list

repo_id: list
Please specify --repo_id <repo_id>. Available values:
codellama/CodeLlama-13b-hf
codellama/CodeLlama-13b-Instruct-hf
codellama/CodeLlama-13b-Python-hf
codellama/CodeLlama-34b-hf
codellama/CodeLlama-34b-Instruct-hf
codellama/CodeLlama-34b-Python-hf
codellama/CodeLlama-70b-hf
codellama/CodeLlama-70b-Instruct-hf
codellama/CodeLlama-70b-Python-hf
codellama/CodeLlama-7b-hf
codellama/CodeLlama-7b-Instruct-hf
codellama/CodeLlama-7b-Python-hf
databricks/dolly-v2-12b
databricks/dolly-v2-3b
databricks/dolly-v2-7b
EleutherAI/pythia-1.4b
EleutherAI/pythia-1.4b-deduped
EleutherAI/pythia-12b
EleutherAI/pythia-12b-deduped
EleutherAI/pythia-14m
EleutherAI/pythia-160m
EleutherAI/pythia-160m-deduped
EleutherAI/pythia-1b
EleutherAI/pythia-1b-deduped
EleutherAI/pythia-2.8b
EleutherAI/pythia-2.8b-deduped
EleutherAI/pythia-31m
EleutherAI/pythia-410m
EleutherAI/pythia-410m-deduped
EleutherAI/pythia-6.9b
EleutherAI/pythia-6.9b-deduped
EleutherAI/pythia-70m
EleutherAI/pythia-70m-deduped
garage-bA

- We can then download an LLM via the following command

In [3]:
!litgpt download microsoft/Phi-3-mini-4k-instruct

repo_id: microsoft/Phi-3-mini-4k-instruct
Setting HF_HUB_ENABLE_HF_TRANSFER=1
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
model-00001-of-00002.safetensors: 100%|█████| 4.97G/4.97G [00:27<00:00, 181MB/s]
model-00002-of-00002.safetensors: 100%|█████| 2.67G/2.67G [00:08<00:00, 331MB/s]
Converting .safetensor files to PyTorch binaries (.bin)
checkpoints/microsoft/Phi-3-mini-4k-instruct/model-00002-of-00002.safetensors --> checkpoints/microsoft/Phi-3-mini-4k-instruct/model-00002-of-00002.bin
checkpoints/microsoft/Phi-3-mini-4k-instruct/model-00001-of-00002.safetensors --> checkpoints/microsoft/Phi-3-mini-4k-instruct/model-00001-of-00002.bin
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': PosixPath('checkpoints/microsoft/Phi-3-mini-4k-instruct'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Loading weights: model-00002-of-00002.bin: 100%|████████| 00:10<00:00,  9.61it/s
Saving c

In [None]:
!litgpt download microsoft/phi-2

repo_id: microsoft/phi-2
Setting HF_HUB_ENABLE_HF_TRANSFER=1
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
config.json: 100% 735/735 [00:00<00:00, 2.83MB/s]
generation_config.json: 100% 124/124 [00:00<00:00, 466kB/s]
model-00001-of-00002.safetensors: 100% 5.00G/5.00G [00:55<00:00, 89.9MB/s]
model-00002-of-00002.safetensors: 100% 564M/564M [00:04<00:00, 126MB/s]
model.safetensors.index.json: 100% 35.7k/35.7k [00:00<00:00, 21.1MB/s]
tokenizer.json: 100% 2.11M/2.11M [00:00<00:00, 35.8MB/s]
tokenizer_config.json: 100% 7.34k/7.34k [00:00<00:00, 24.4MB/s]
Converting .safetensor files to PyTorch binaries (.bin)
checkpoints/microsoft/phi-2/model-00002-of-00002.safetensors --> checkpoints/microsoft/phi-2/model-00002-of-00002.bin
checkpoints/microsoft/phi-2/model-00001-of-00002.safetensors --> checkpoints/microsoft/phi-2/model-00001-of-00002.bin
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': Posix

In [None]:
# This model did not perform well; I don't recommend using it
# !litgpt download EleutherAI/pythia-410m-deduped

repo_id: EleutherAI/pythia-410m-deduped
Setting HF_HUB_ENABLE_HF_TRANSFER=1
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
config.json: 100% 570/570 [00:00<00:00, 2.43MB/s]
pytorch_model.bin: 100% 911M/911M [00:24<00:00, 37.4MB/s]
tokenizer.json: 100% 2.11M/2.11M [00:00<00:00, 23.0MB/s]
tokenizer_config.json: 100% 396/396 [00:00<00:00, 1.72MB/s]
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': PosixPath('checkpoints/EleutherAI/pythia-410m-deduped'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Loading weights: pytorch_model.bin: 100% 100.0/100 [00:22<00:00,  4.52it/s]
Saving converted checkpoint to checkpoints/EleutherAI/pythia-410m-deduped


- And there's also a Python API to use the model

In [None]:
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
# llm = LLM.load("microsoft/Phi-3-mini-4k-instruct")

llm.generate("Explain Newton's laws of motion with examples.")



" Answer: Newton's laws of motion are three principles that describe the relationship between the forces acting on an object and its motion. \n\nThe first law states that an object at rest will remain at rest and an object in motion will remain in motion"

In [31]:
from litgpt import LLM

# llm = LLM.load("microsoft/phi-2")
llm = LLM.load("microsoft/Phi-3-mini-4k-instruct")

llm.generate("Explain Newton's laws of motion with examples.")

"Newton's laws of motion, formulated by Sir Isaac Newton, are three fundamental laws that describe the relationship between a body and the forces acting upon it. These laws have been essential in advancing the field of classical mechanics.\n\n\n"

In [32]:
result = llm.generate(
    "Explain Newton's laws of motion with examples.",
    stream=True,
    max_new_tokens=512
)
for e in result:
    print(e, end="", flush=True)

 Newton's laws of motion describe the relationship between a body and the forces acting upon it, and the body's motion in response to those forces. They are three physical laws that together laid the foundation for classical mechanics.


1. Newton's First Law (Law of Inertia) states that an object at rest will stay at rest, and an object in motion will stay in motion at a constant velocity, unless acted upon by a net external force. This principle means that there is a natural tendency of objects to keep moving in a straight line at a constant speed or to remain still.


   Example: Consider a hockey puck sliding on a smooth ice surface. If no external forces like friction or another player's stick apply force on the puck, it will continue to slide indefinitely in the same direction at a constant speed.


2. Newton's Second Law of Motion states that the acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. The direction
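With `stream=True`, the result is consumed piece by piece in a loop, as in the cell above. If the full text is also needed afterwards, the pieces can be collected while printing. The sketch below uses a hypothetical `fake_stream` generator as a stand-in for the model so that it runs without downloading any weights.

```python
def fake_stream(text, chunk_size=8):
    """Stand-in for a streaming generator: yields the text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


chunks = []
for piece in fake_stream("Newton's first law is the law of inertia."):
    print(piece, end="", flush=True)  # Show each piece as soon as it arrives
    chunks.append(piece)              # Keep it for later use
print()

full_text = "".join(chunks)
```

The same loop body should work with the iterator returned by `llm.generate(..., stream=True)`, assuming it yields strings as the cell above suggests.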

---

# Exercise 2: Download an LLM

- Download and try out an LLM of your own choice (recommendation: 7B parameters or smaller)
- We will finetune the LLM in the next notebook
- You can also try out the `litgpt chat` command from the terminal

In [None]:
# Run this in a terminal (without the "!")
!litgpt chat "microsoft/Phi-3-mini-4k-instruct"

In [None]:
# You can also try quantizing. This runs at 2x the tokens/second, but
# the accuracy is slightly lower, with typos and occasional
# repetitive phrases.
!litgpt chat --quantize="bnb.nf4" "microsoft/Phi-3-mini-4k-instruct"
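Besides speed, 4-bit quantization mainly reduces the memory needed for the weights. The following back-of-the-envelope sketch uses the two shard sizes from the Phi-3-mini download log above (4.97 GB and 2.67 GB) and assumes the checkpoint stores 16 bits per weight. It ignores the extra quantization constants, activations, and the key-value cache, so real memory use will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(size_at_16_bit_gb, bits_per_weight):
    """Scale a 16-bit checkpoint size to another weight precision."""
    return size_at_16_bit_gb * bits_per_weight / 16


checkpoint_gb = 4.97 + 2.67  # Shard sizes reported during the download
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(checkpoint_gb, bits):.2f} GB")
```

Under these assumptions, the 4-bit weights take roughly a quarter of the 16-bit size.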