
Perplexity Script and Notebook #55

Merged 21 commits into huu4ontocord:main on May 17, 2023
Conversation

Stillerman (Collaborator)

#53

Here is a notebook and script that can be used to calculate perplexity. The notebook runs on Colab, but I cannot test the script locally, so that will need to be tested. It should write perplexity results to perplexity-results.jsonl.

You should be able to run

python3 calculate_perplexity.py --model Multi-Domain-Expert-Layers/expert-arxiv --dataset Multi-Domain-Expert-Layers/arxiv --split validation_domain

and test any model/dataset/split we want.
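
For anyone skimming the PR, here is a minimal sketch of what a script with this interface might look like. It is not the code from this PR: the flag names come from the command above, while the `text` column name, the `--max_length` flag, batch-size-1 processing, and the append-as-JSON-lines output format are assumptions for illustration.

```python
# Hypothetical sketch only -- not the calculate_perplexity.py from this PR.
# Assumes each dataset row has a "text" column and processes one example
# at a time.
import argparse
import json
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--split", default="validation_domain")
    parser.add_argument("--max_length", type=int, default=1024)  # assumed flag
    args = parser.parse_args()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(args.model)
    model = AutoModelForCausalLM.from_pretrained(args.model).to(device).eval()
    dataset = load_dataset(args.dataset, split=args.split)

    total_nll, total_tokens = 0.0, 0
    for example in dataset:
        enc = tokenizer(
            example["text"],  # assumed column name
            return_tensors="pt",
            truncation=True,
            max_length=args.max_length,
        ).to(device)
        n_predicted = enc["input_ids"].numel() - 1  # next-token targets
        if n_predicted < 1:
            continue
        with torch.no_grad():
            # With labels=input_ids, the model returns the mean
            # cross-entropy over the shifted next-token predictions.
            out = model(**enc, labels=enc["input_ids"])
        total_nll += out.loss.item() * n_predicted
        total_tokens += n_predicted

    # Perplexity is the exponentiated mean per-token negative log-likelihood.
    ppl = math.exp(total_nll / total_tokens)
    result = {
        "model": args.model,
        "dataset": args.dataset,
        "split": args.split,
        "perplexity": ppl,
    }
    with open("perplexity-results.jsonl", "a") as f:
        f.write(json.dumps(result) + "\n")
    print(f"Perplexity: {ppl:.3f}")


if __name__ == "__main__":
    main()
```

Running something like this once per (model, dataset) pair would produce a grid of results like the table in the next comment.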

Stillerman (Collaborator, Author) commented May 15, 2023

The perplexity script produced the following results (perplexity per validation domain; lower is better):

| Model             | arxiv | freelaw | github |
|-------------------|-------|---------|--------|
| expert-arxiv      | 6.588 | 6.077   | 5.675  |
| expert-freelaw    | 6.744 | 6.038   | 5.673  |
| expert-github     | 6.827 | 6.145   | 5.680  |
| pythia-1b-deduped | 6.560 | 6.013   | 5.604  |

This all makes sense except that the github expert does worse on the github dataset than the base model. Is this possible, @mrcabbage972?
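
For context (this framing is mine, not from the thread): perplexity is the exponentiated mean token-level negative log-likelihood,

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i \mid x_{<i})\right),$$

so lower is better, and the surprise in the table is the github expert (5.680 on github) landing slightly above the base pythia-1b-deduped (5.604) on its own domain.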

Stillerman (Collaborator, Author)

We are still getting unexpected results, but we believe they are not due to a bug in the perplexity script, so the PR has been moved out of draft. Details of the weird perplexities are in issue #53.

Stillerman changed the title from "Draft: Perplexity Script and Notebook" to "Perplexity Script and Notebook" on May 17, 2023
mrcabbage972 merged commit effe662 into huu4ontocord:main on May 17, 2023