<a href="https://colab.research.google.com/github/joshtimmons/llm-demos/blob/main/difference-between-models/01_number_of_parameters.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Number of Parameters

Our first demo runs the same generative prompt against different sizes of the same model. We're using GPT-2, an older model, because it was released in several sizes, including small ones with relatively few parameters.

GPT-2 is the direct ancestor of GPT-3, GPT-4, and ChatGPT. It's much smaller and more experimental.

* small: 124M parameters
* medium: 355M parameters
* large: 774M parameters
* xl: 1.5B parameters

We'll also use OpenLLaMA v2 for comparison, at 3B and 7B parameters.
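The GPT-2 sizes listed above can be roughly reproduced from each model's architecture. The sketch below is a back-of-the-envelope count, assuming the standard GPT-2 layout: tied input and output embeddings, a feed-forward block four times the hidden size, a vocabulary of 50,257 tokens, and 1,024 positions. The layer counts and hidden sizes are the commonly published configurations, not values stated in this notebook, and the name `gpt2_param_count` is made up for illustration.

```python
# Back-of-the-envelope GPT-2 parameter count (assumes the standard GPT-2 layout).
VOCAB_SIZE = 50257   # assumption: GPT-2 BPE vocabulary size
N_POSITIONS = 1024   # assumption: GPT-2 context length

# Assumed (layers, hidden size) per checkpoint; not taken from this notebook.
GPT2_CONFIGS = {
    "gpt2": (12, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
    "gpt2-xl": (48, 1600),
}


def gpt2_param_count(n_layer, d_model, vocab_size=VOCAB_SIZE, n_positions=N_POSITIONS):
    """Count weights and biases, with the output layer tied to the token embedding."""
    embeddings = vocab_size * d_model + n_positions * d_model
    # Fused query/key/value projection plus the attention output projection.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward block: expand to 4 * d_model, then project back.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms per block, scale and bias each
    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_layer_norm


for name, (n_layer, d_model) in GPT2_CONFIGS.items():
    print(f"{name}: {gpt2_param_count(n_layer, d_model) / 1e6:.0f}M parameters")
```

Under these assumptions the counts land close to the advertised sizes, with the XL model at roughly 1.5 billion parameters. Almost all of the growth comes from the per-layer term, which scales with the number of layers times the square of the hidden size.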

Before you start, make sure the Colab runtime has at least a basic GPU for these models.

1. Click the "Runtime" menu
2. Click the "Change runtime type" option
3. Select the "T4" GPU. That's enough for the GPT-2 models and the 3B OpenLLaMA model.
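A rough way to judge whether a model fits on a given GPU is to multiply its parameter count by the bytes stored per parameter. The sketch below assumes 4 bytes per parameter for float32 and 2 bytes for float16, and it ignores activations and framework overhead, so real usage will be higher. The parameter counts are the rounded figures from this notebook, and the helper names `weight_memory_gb` and `describe_accelerator` are hypothetical.

```python
import importlib.util

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(n_params, dtype="float32"):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes); activations are extra."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def describe_accelerator():
    """Report the visible GPU, if PyTorch is installed and can see one."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "No CUDA GPU visible; check the runtime type"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name} with {props.total_memory / 1e9:.1f} GB"


print(describe_accelerator())
for label, n_params in [("GPT-2 XL", 1.5e9), ("OpenLLaMA 3B", 3e9), ("OpenLLaMA 7B", 7e9)]:
    print(
        f"{label}: {weight_memory_gb(n_params):.1f} GB in float32, "
        f"{weight_memory_gb(n_params, 'float16'):.1f} GB in float16"
    )
```

Comparing the printed GPU memory with these estimates gives a quick sanity check before downloading a multi-gigabyte checkpoint.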




First, we need to install some libraries.

In [1]:
!pip install transformers sentence-transformers einops sentencepiece accelerate



In [3]:
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig

In [2]:
prompt = "It was a dark and stormy "

In [4]:
# Create a base gpt2 pipeline and execute our "it was a dark and stormy" prompt

pipe = pipeline("text-generation", model="gpt2")
text = pipe(prompt)[0]["generated_text"]
print(text)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


It was a dark and stormy vernal evening—and the sun was rising with so much heat that it was the best time to sleep and lighten our load at night.

No man ventured so far as to find the fire to


In [5]:
# Step up to the medium size model and run it again

pipe = pipeline("text-generation", model="gpt2-medium")
text = pipe(prompt)[0]["generated_text"]
print(text)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


It was a dark and stormy vernal month in which I made another trip to see the sun, just before winter sets. This time it was a bit later but still, it was quite exciting. A bit of what I experienced was to


In [6]:
# Step up to the large model and run it again.

pipe = pipeline("text-generation", model="gpt2-large")
text = pipe(prompt)[0]["generated_text"]
print(text)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


It was a dark and stormy 」

「Eh heee! I wanted to be by your side」

Ria laughed from the other side and hugged Celia, while she felt very comfortable in her arms.

「


In [None]:
# XL is the largest GPT2 model

pipe = pipeline("text-generation", model="gpt2-xl")
text = pipe(prompt)[0]["generated_text"]
print(text)

In [None]:
# Let's switch to open_llama_3b_v2 for our first bigger model. This one has 3B parameters vs. 1.5B for GPT-2 XL

pipe = pipeline("text-generation", model="openlm-research/open_llama_3b_v2")
text = pipe(prompt)[0]["generated_text"]
print(text)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


It was a dark and stormy 19th century night in the city of New York.
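
The cells above repeat the same three lines for each model. As a sketch, the comparison can also be written as a loop that collects one completion per model. The helper names `compare_models` and `hf_generator`, the seed value, and the `max_new_tokens` setting are assumptions for illustration and are not part of the original notebook. The fixed seed makes reruns repeatable, though sampled outputs will still differ from the ones shown above.

```python
def compare_models(model_names, prompt, make_generator):
    """Run one prompt through each model and collect the generated text by model name."""
    results = {}
    for name in model_names:
        generate = make_generator(name)
        results[name] = generate(prompt)
    return results


def hf_generator(model_name, seed=42, max_new_tokens=40):
    """Build a text generator from a Hugging Face pipeline (downloads the model on first use)."""
    # Imported lazily so the comparison helper can be used without transformers installed.
    from transformers import pipeline, set_seed

    pipe = pipeline("text-generation", model=model_name)

    def generate(prompt):
        set_seed(seed)  # reset the seed before each call so reruns are repeatable
        return pipe(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

    return generate


# Example (downloads several GB of models, so it is left commented out):
# results = compare_models(
#     ["gpt2", "gpt2-medium", "gpt2-large"], "It was a dark and stormy ", hf_generator
# )
# for name, text in results.items():
#     print(f"--- {name} ---\n{text}\n")
```

Passing the generator factory in as an argument keeps the loop independent of any one library, so the same comparison could be reused with a different backend.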
