# Simple Perplexity Calculations

Given a specific dataset, calculate the perplexity of a number of different models on it.
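For reference, perplexity is the exponential of the average negative log-likelihood a causal language model $p_\theta$ assigns to each token of a text $x_{1:N}$, given the tokens before it. Lower is better. The second line is a worked example: a model that gives every token probability 1/4 is as uncertain as a uniform choice among 4 tokens.

```latex
\mathrm{PPL}(x_{1:N}) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\right)

p_\theta(x_i \mid x_{<i}) = \tfrac{1}{4} \;\; \forall i
\;\Rightarrow\; \mathrm{PPL} = \exp(\log 4) = 4
```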

In [1]:
import evaluate
import random
from datasets import load_dataset



## Create the dataset

In [2]:
# Shuffle the IMDB test split reproducibly and keep only the first review
dataset: list[str] = load_dataset("imdb", split="test").shuffle(seed=42).select(range(1))["text"]

dataset[0]



"<br /><br />When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong?<br /><br />Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness!<br /><br />I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there.<br /><br />Since
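The review contains literal `<br />` tags, and the perplexity cell below scores the raw text, so those tags are part of what each model has to predict. If you would rather score cleaner prose, a minimal sketch of a normalization step follows. It is not part of the original notebook, and the helper name `clean_imdb_text` is hypothetical.

```python
import re


def clean_imdb_text(text: str) -> str:
    """Replace HTML line-break tags with newlines and trim surrounding whitespace."""
    # Matches <br>, <br/>, <br /> in any letter case
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    # Collapse runs of three or more newlines into a single blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = "<br /><br />When I rented it...<br /><br />Very quickly, however..."
print(clean_imdb_text(sample))
```

It could be applied with `dataset = [clean_imdb_text(t) for t in dataset]`. Perplexities on cleaned and raw text are not directly comparable, so whichever form you choose should be used for every model.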

## Generate a list of models we will use

In [3]:
models = [ # Commented-out entries are models too large to fit on my computer
    "cerebras/Cerebras-GPT-111M", "cerebras/Cerebras-GPT-256M", "cerebras/Cerebras-GPT-590M", "cerebras/Cerebras-GPT-1.3B", "cerebras/Cerebras-GPT-2.7B", "cerebras/Cerebras-GPT-6.7B", # "cerebras/Cerebras-GPT-13.7B",
    "EleutherAI/gpt-neo-125m", "EleutherAI/gpt-neo-1.3B", "EleutherAI/gpt-neo-2.7B", "EleutherAI/gpt-j-6b", # "EleutherAI/gpt-neox-20b",
    "EleutherAI/pythia-70m", "EleutherAI/pythia-160m", "EleutherAI/pythia-410m", "EleutherAI/pythia-1b", "EleutherAI/pythia-1.4b", "EleutherAI/pythia-2.8b", "EleutherAI/pythia-6.9b", # "EleutherAI/pythia-12b",
    "EleutherAI/pythia-70m-deduped", "EleutherAI/pythia-160m-deduped", "EleutherAI/pythia-410m-deduped", "EleutherAI/pythia-1b-deduped", "EleutherAI/pythia-1.4b-deduped", "EleutherAI/pythia-2.8b-deduped", "EleutherAI/pythia-6.9b-deduped", # "EleutherAI/pythia-12b-deduped",
    "mosaicml/mpt-7b", # "mosaicml/mpt-30b",
    "tiiuae/falcon-7b", # "tiiuae/falcon-40b", "tiiuae/falcon-180b"
    "bigscience/bloom-560m", "bigscience/bloom-1b1", "bigscience/bloom-1b7", "bigscience/bloom-3b", "bigscience/bloom-7b1", # "bigscience/bloom",
    "openlm-research/open_llama_3b", "openlm-research/open_llama_7b", # "openlm-research/open_llama_13b",
    "openlm-research/open_llama_3b_v2", "openlm-research/open_llama_7b_v2",
    ]

len(models)

9
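Whether a model fits can be estimated before downloading it by reading the nominal parameter count from its name and multiplying by the bytes per parameter. The sketch below assumes the names follow the `<size>m` / `<size>b` conventions in the list above (including BLOOM's `1b1` = 1.1B style), and defaults to 4 bytes per parameter because `transformers` loads weights in float32 unless a `torch_dtype` is requested. Activations and framework overhead are ignored, so the result is a lower bound. All function names are hypothetical.

```python
import re

# Size tag such as "111M", "1.3B", "6.9b" or BLOOM-style "1b1" (= 1.1B),
# preceded by "-" or "_" and not followed by another letter.
_SIZE_RE = re.compile(r"(?<=[-_])(\d+(?:\.\d+)?)([mb])(\d?)(?![a-z])", re.IGNORECASE)


def nominal_params(model_id: str) -> float:
    """Nominal parameter count parsed from a Hub model id (hypothetical helper)."""
    name = model_id.split("/")[-1]
    match = _SIZE_RE.search(name)
    if match is None:
        raise ValueError(f"no size tag found in {model_id!r}")
    whole, unit, frac = match.groups()
    value = float(f"{whole}.{frac}") if frac else float(whole)
    return value * (1e6 if unit.lower() == "m" else 1e9)


def weight_gb(model_id: str, bytes_per_param: int = 4) -> float:
    """Memory for the weights alone, in GB (1e9 bytes); 4 bytes/param = float32."""
    return nominal_params(model_id) * bytes_per_param / 1e9


def fits(model_ids: list[str], budget_gb: float, bytes_per_param: int = 4) -> list[str]:
    """Keep the models whose weights alone fit within budget_gb."""
    return [m for m in model_ids if weight_gb(m, bytes_per_param) <= budget_gb]


print(weight_gb("EleutherAI/gpt-j-6b"))  # 24.0
```

The checkpoint sizes in the download log further down (e.g. 1.12G for `bloom-560m`) are consistent with about 2 bytes per parameter on disk, so `bytes_per_param=2` is the closer estimate if models are loaded in half precision.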

## Run the test

In [4]:
perplexity = evaluate.load("perplexity", module_type="metric")
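As I read the `evaluate` perplexity implementation, each text gets its own perplexity from the model's next-token predictions, and `mean_perplexity` is the plain average of those per-text values. Because every token is predicted from the ones before it, the first token of a text has no context and is only scored when `add_start_token=True` prepends the tokenizer's BOS token. The pure-Python sketch below mimics that bookkeeping on made-up per-token probabilities (no model is involved), and contrasts it with a pooled, corpus-level perplexity. The function names are hypothetical.

```python
import math


def text_perplexity(token_logprobs: list[float], add_start_token: bool) -> float:
    """Perplexity of one text from its per-token natural-log probabilities.

    token_logprobs[0] stands for log p(first token | BOS). Without a prepended
    BOS token the first token has no context, so it is dropped from scoring.
    """
    scored = token_logprobs if add_start_token else token_logprobs[1:]
    if not scored:
        raise ValueError("no tokens left to score")
    return math.exp(-sum(scored) / len(scored))


def mean_perplexity(texts: list[list[float]], add_start_token: bool) -> float:
    """Unweighted mean of per-text perplexities: every text counts equally."""
    ppls = [text_perplexity(t, add_start_token) for t in texts]
    return sum(ppls) / len(ppls)


def pooled_perplexity(texts: list[list[float]], add_start_token: bool) -> float:
    """Corpus-level alternative: pool all scored tokens, then exponentiate once."""
    scored = [lp for t in texts for lp in (t if add_start_token else t[1:])]
    return math.exp(-sum(scored) / len(scored))


short_text = [math.log(0.25)] * 3  # per-text perplexity 4
long_text = [math.log(0.5)] * 9    # per-text perplexity 2
print(mean_perplexity([short_text, long_text], add_start_token=True))    # about 3.0
print(pooled_perplexity([short_text, long_text], add_start_token=True))  # about 2.38
```

With only one review selected in this notebook, the two aggregations coincide; they diverge once texts of different lengths are mixed, because pooling lets long texts dominate.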

In [5]:
perplexities = []
for model in models: # CPU 40.0 vs GPU 30.8
    print(model)
    result = perplexity.compute(predictions=dataset, model_id=model, add_start_token=False, device="cpu")
    perplexities.append(result["mean_perplexity"])

results = dict(zip(models, perplexities))
results

bigscience/bloom-560m


Downloading (…)lve/main/config.json: 100%|██████████| 693/693 [00:00<00:00, 2.89MB/s]
Downloading model.safetensors: 100%|██████████| 1.12G/1.12G [01:40<00:00, 11.1MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 504kB/s]
Downloading tokenizer.json: 100%|██████████| 14.5M/14.5M [00:00<00:00, 15.3MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 458kB/s]
100%|██████████| 1/1 [00:01<00:00,  1.31s/it]


bigscience/bloom-1b1


Downloading (…)lve/main/config.json: 100%|██████████| 693/693 [00:00<00:00, 3.07MB/s]
Downloading model.safetensors: 100%|██████████| 2.13G/2.13G [02:31<00:00, 14.0MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 1.44MB/s]
Downloading tokenizer.json: 100%|██████████| 14.5M/14.5M [00:00<00:00, 15.5MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 353kB/s]
100%|██████████| 1/1 [00:01<00:00,  1.68s/it]


bigscience/bloom-1b7


Downloading (…)lve/main/config.json: 100%|██████████| 715/715 [00:00<00:00, 4.80MB/s]
Downloading model.safetensors: 100%|██████████| 3.44G/3.44G [04:05<00:00, 14.0MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 412kB/s]
Downloading tokenizer.json: 100%|██████████| 14.5M/14.5M [00:02<00:00, 6.15MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 114kB/s]
100%|██████████| 1/1 [00:01<00:00,  1.94s/it]


bigscience/bloom-3b


Downloading (…)lve/main/config.json: 100%|██████████| 693/693 [00:00<00:00, 10.5MB/s]
Downloading model.safetensors: 100%|██████████| 6.01G/6.01G [11:23<00:00, 8.78MB/s] 
Downloading (…)okenizer_config.json: 100%|██████████| 222/222 [00:00<00:00, 298kB/s]
Downloading tokenizer.json: 100%|██████████| 14.5M/14.5M [00:00<00:00, 16.5MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 258kB/s]
100%|██████████| 1/1 [00:04<00:00,  4.76s/it]


bigscience/bloom-7b1


Downloading (…)lve/main/config.json: 100%|██████████| 739/739 [00:00<00:00, 3.91MB/s]
Downloading (…)model.bin.index.json: 100%|██████████| 27.5k/27.5k [00:00<00:00, 23.1MB/s]
Downloading (…)l-00001-of-00002.bin: 100%|██████████| 9.98G/9.98G [12:31<00:00, 13.3MB/s]
Downloading shards:  50%|█████     | 1/2 [12:31<12:31, 751.39s/it]
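The captured output ends partway through downloading `bigscience/bloom-7b1`, and the `results` dict never appears, so none of the finished scores were kept. One way to keep partial results when a model fails to download or load is to record each model's score, or its error, as soon as it finishes. A minimal sketch, assuming a `compute` callable that wraps `perplexity.compute(...)` as in the cell above; `run_all` is a hypothetical helper. Catching `Exception` leaves `KeyboardInterrupt` working, but it cannot help if the operating system kills the kernel for running out of memory.

```python
from typing import Callable


def run_all(
    model_ids: list[str], compute: Callable[[str], float]
) -> tuple[dict[str, float], dict[str, str]]:
    """Score each model in turn, keeping successes and failures separately."""
    scores: dict[str, float] = {}
    errors: dict[str, str] = {}
    for model_id in model_ids:
        print(model_id)
        try:
            scores[model_id] = compute(model_id)
        except Exception as exc:  # e.g. download or loading failures
            errors[model_id] = f"{type(exc).__name__}: {exc}"
    return scores, errors


# Usage with the notebook's objects (not executed here):
# scores, errors = run_all(models, lambda m: perplexity.compute(
#     predictions=dataset, model_id=m, add_start_token=False, device="cpu")["mean_perplexity"])
# sorted(scores.items(), key=lambda kv: kv[1])  # lowest perplexity first
```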

In [None]:
import torch
# Is Apple's Metal (MPS) GPU backend available on this machine?
torch.backends.mps.is_available()
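Even if MPS is available, in the `evaluate` versions I have looked at the perplexity metric only accepts `device` values of `"gpu"`, `"cpu"` or `"cuda"` and asserts on anything else, so `"mps"` cannot be passed straight through; using the Apple GPU would need a hand-written scoring loop with `transformers`. A small sketch of choosing both values, with the availability flags passed in so the logic can be checked without PyTorch; `pick_devices` is a hypothetical name.

```python
def pick_devices(cuda_available: bool, mps_available: bool) -> tuple[str, str]:
    """Return (device for evaluate's perplexity metric, device for a manual PyTorch loop)."""
    # The metric accepts only "gpu"/"cpu"/"cuda", so without CUDA it falls back to CPU.
    metric_device = "cuda" if cuda_available else "cpu"
    # A hand-written loop can use any torch device, including Apple's MPS backend.
    if cuda_available:
        torch_device = "cuda"
    elif mps_available:
        torch_device = "mps"
    else:
        torch_device = "cpu"
    return metric_device, torch_device


# Usage (requires PyTorch):
# import torch
# pick_devices(torch.cuda.is_available(), torch.backends.mps.is_available())
```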