# Simple Perplexity Calculations

Given a fixed dataset, compute the perplexity of several different language models on it.
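For reference, perplexity is commonly defined as the exponential of the average negative log-likelihood per token. The block below is a minimal sketch of that definition only, assuming natural-log token probabilities are already available. The helper name `perplexity_from_log_probs` is hypothetical and is not part of the `evaluate` library, whose metric handles tokenization and model inference internally.

```python
import math


def perplexity_from_log_probs(token_log_probs):
    """Return exp(-mean(log p)) for a sequence of natural-log token probabilities.

    Hypothetical helper for illustration; not part of `evaluate`.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_negative_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_negative_log_likelihood)


# A model that assigns probability 0.25 to every token is, on average,
# as uncertain as a uniform choice among 4 tokens, so the value is close to 4.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity_from_log_probs(uniform_log_probs))
```

Lower perplexity means the model assigned higher probability to the text it was scored on.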

In [1]:
import evaluate
import random
from datasets import load_dataset



## Create the dataset

In [2]:
# Take a single review from the shuffled IMDB test split; the fixed seed keeps the choice reproducible
dataset: list[str] = load_dataset("imdb", split="test").shuffle(seed=42).select(range(1))["text"]

dataset[0]

Found cached dataset imdb (/Users/addisonhanrattie/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0)
Loading cached shuffled indices for dataset at /Users/addisonhanrattie/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-c1eaa46e94dfbfd3.arrow


"<br /><br />When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong?<br /><br />Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness!<br /><br />I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there.<br /><br />Since

## Generate a list of models we will use

In [37]:
# models = ["cerebras/Cerebras-GPT-111M", "cerebras/Cerebras-GPT-256M", "cerebras/Cerebras-GPT-590M", "EleutherAI/pythia-70m", "EleutherAI/pythia-160m", "EleutherAI/pythia-410m", "EleutherAI/gpt-neo-125m"]
models = [  # Commented-out entries are models that are too large to fit on my computer
    "cerebras/Cerebras-GPT-111M", "cerebras/Cerebras-GPT-256M", "cerebras/Cerebras-GPT-590M", "cerebras/Cerebras-GPT-1.3B", "cerebras/Cerebras-GPT-2.7B", "cerebras/Cerebras-GPT-6.7B", # "cerebras/Cerebras-GPT-13.7B",
    "EleutherAI/gpt-neo-125m", "EleutherAI/gpt-neo-1.3B", "EleutherAI/gpt-neo-2.7B", "EleutherAI/gpt-j-6b", # "EleutherAI/gpt-neox-20b",
    "EleutherAI/pythia-70m", "EleutherAI/pythia-160m", "EleutherAI/pythia-410m", # "EleutherAI/pythia-1b", "EleutherAI/pythia-1.4b", "EleutherAI/pythia-2.8b", "EleutherAI/pythia-6.9b", # "EleutherAI/pythia-12b",
    "EleutherAI/pythia-70m-deduped", "EleutherAI/pythia-160m-deduped", "EleutherAI/pythia-410m-deduped", "EleutherAI/pythia-1b-deduped", "EleutherAI/pythia-1.4b-deduped", "EleutherAI/pythia-2.8b-deduped", "EleutherAI/pythia-6.9b-deduped", # "EleutherAI/pythia-12b-deduped",
    "mosaicml/mpt-7b", # "mosaicml/mpt-30b",
    "tiiuae/falcon-7b", # "tiiuae/falcon-40b", "tiiuae/falcon-180b"
    ]

len(models)

6

## Run the test

In [38]:
perplexity = evaluate.load("perplexity", module_type="metric")

In [39]:
perplexities = []
for model in models:  # CPU 40.0 vs GPU 30.8
    print(model)
    # Score the dataset with this model on the CPU; no start token is prepended
    result = perplexity.compute(predictions=dataset, model_id=model, add_start_token=False, device="cpu")
    perplexities.append(result["mean_perplexity"])

results = dict(zip(models, perplexities))
results

cerebras/Cerebras-GPT-111M


Using pad_token, but it is not set yet.
100%|██████████| 1/1 [00:00<00:00,  2.59it/s]


cerebras/Cerebras-GPT-256M


Using pad_token, but it is not set yet.
100%|██████████| 1/1 [00:00<00:00,  1.62it/s]


cerebras/Cerebras-GPT-590M


Using pad_token, but it is not set yet.
100%|██████████| 1/1 [00:01<00:00,  1.09s/it]


cerebras/Cerebras-GPT-1.3B


Using pad_token, but it is not set yet.
100%|██████████| 1/1 [00:02<00:00,  2.47s/it]


cerebras/Cerebras-GPT-2.7B


Downloading pytorch_model.bin:  41%|████▏     | 4.45G/10.7G [16:17<1:19:22, 1.32MB/s]
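
Once the loop has finished for every model, the `results` dictionary can be ranked to compare models directly. The block below is a minimal sketch of that step. `rank_by_perplexity` and `format_ranking` are hypothetical helper names, and the placeholder dictionary holds made-up values that exist only so the block runs on its own; they are not measured perplexities.

```python
def rank_by_perplexity(results: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model_id, perplexity) pairs sorted from lowest to highest perplexity."""
    return sorted(results.items(), key=lambda item: item[1])


def format_ranking(results: dict[str, float]) -> str:
    """Build a small aligned text table, best (lowest) perplexity first."""
    lines = []
    for rank, (model_id, ppl) in enumerate(rank_by_perplexity(results), start=1):
        lines.append(f"{rank:>2}. {model_id:<40} {ppl:10.2f}")
    return "\n".join(lines)


# Placeholder input with made-up numbers (NOT real measurements).
# In the notebook, pass the `results` dict built from the loop instead.
placeholder_results = {"example/model-a": 3.0, "example/model-b": 1.5}
print(format_ranking(placeholder_results))
```

In the notebook itself this would be called as `print(format_ranking(results))`.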

In [None]:
import torch
torch.backends.mps.is_available()