chains/benchmarks, other LLMs #30
Conversation
…suite of big bench
I ended up writing a few more ops for this PR, some of which seem obvious (!csv-parse) and some of which I expect will change (the !metrics- ones). I'm especially interested in getting the changes to llm.py merged so we have a way of calling LLMs from different sources.
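As a rough illustration of what calling LLMs from different sources could look like, here is a minimal sketch of a provider-dispatching helper; the function names, the registry, and the echo provider are assumptions for illustration, not the actual llm.py interface:
# Hypothetical sketch only; llm.py's real interface may differ.
# The idea: one entry point that dispatches to whichever provider's
# completion function has been registered.
from typing import Callable, Dict

PROVIDERS: Dict[str, Callable[[str], str]] = {}

def register_provider(name: str, fn: Callable[[str], str]) -> None:
    # Register a completion function under a provider name.
    PROVIDERS[name] = fn

def complete(prompt: str, provider: str = "echo") -> str:
    # Send a prompt to the chosen provider and return its completion.
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return PROVIDERS[provider](prompt)

# Trivial local provider so the sketch runs without any API keys.
register_provider("echo", lambda prompt: prompt.upper())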
s = 'model,task\n' + "\n".join([model + "," + task for model in models for task in tasks])
with open('model-task.csv', 'w') as f:
    f.write(s)
!csv-parse model-task.csv
This whole block is clever, but we should figure out how a !cross or something should work.
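For reference, a rough sketch of what a generic cross-product op could compute; the !cross name, the placeholder model and task names, and the CSV layout are assumptions rather than anything merged:
# Illustrative only: a stand-alone cross product of models and tasks,
# writing out the same model,task CSV the snippet above builds by hand.
import csv
from itertools import product

models = ["model-a", "model-b"]          # placeholder model names
tasks = ["navigate", "strange_stories"]  # placeholder BIG-bench task names

with open("model-task.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "task"])
    writer.writerows(product(models, tasks))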
totally, if you're doing that separately then we can hold this PR off until that's merged
should be merged now!
!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/{task}/task.json
!fetch-url
# name=name description=description
The problem with this not working is that we can't automatically construct good prompts, since the JSON file is what holds the context for what the LLM is actually classifying.
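To make that concrete, here is a rough sketch of fetching one task's JSON and inspecting what a naive prompt builder would have to work with; the task name is a placeholder and the field names follow the common BIG-bench task.json layout, which individual tasks may vary from:
# Sketch only: fetch one BIG-bench task.json and inspect the fields a
# prompt builder would need. Keys vary across tasks, so .get() is used.
import json
from urllib.request import urlopen

task = "strange_stories"  # placeholder task name
url = ("https://raw.githubusercontent.com/google/BIG-bench/main/"
       f"bigbench/benchmark_tasks/{task}/task.json")

with urlopen(url) as resp:
    data = json.load(resp)

print(data.get("name"), "-", data.get("description"))

# name/description alone don't say how to frame a classification prompt;
# the context for each item lives in the examples themselves.
for example in data.get("examples", [])[:3]:
    print(example.get("input"), "->", example.get("target_scores"))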
(closing #28 in favor of this)
At a minimum, before we merge this branch:
For the binary classification script:
These are nice-to-haves; basically, they'd make it so we were using the BIG-bench tasks more as intended:
If #27 is resolved, then we could compute multiple accuracy metrics. We could also explore having another script that runs the entire slate of tasks on a set of models; a sketch of the metrics piece follows below.
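This sketch assumes predictions and gold labels have already been collected per model and task; the data layout, metric names, and output format are assumptions, not the !metrics- ops themselves:
# Illustrative scoring pass over a model x task grid with two simple
# metrics; the real !metrics- ops and result format may differ.

# results[(model, task)] is a list of (prediction, target) string pairs.
results = {
    ("model-a", "navigate"): [("yes", "yes"), ("no", "yes")],
    ("model-a", "strange_stories"): [("lie", "joke")],
}

def exact_match(pairs):
    # Fraction of predictions equal to the target, case-insensitively.
    hits = sum(p.strip().lower() == t.strip().lower() for p, t in pairs)
    return hits / len(pairs) if pairs else 0.0

def coverage(pairs):
    # Fraction of examples where the model produced any non-empty answer.
    return sum(bool(p.strip()) for p, _ in pairs) / len(pairs) if pairs else 0.0

for (model, task), pairs in results.items():
    print(f"{model},{task},accuracy={exact_match(pairs):.2f},coverage={coverage(pairs):.2f}")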