
chains/benchmarks, other LLMs #30

Merged: 27 commits merged into saulpw:develop from the benchmarks branch, Jun 13, 2023

Conversation

@dovinmu (Collaborator) commented Jun 10, 2023

(closing #28 in favor of this)

At minimum before we merge this branch:

  • generalize the llm.py op to route to different specific LLM sources
  • generic bigbench binary classification script

for the binary classification script:

  • accepts LLM ID and task name as input
  • calculates at least one accuracy metric
  • puts results somewhere useful (json file?)

These are nice-to-haves; they would basically let us use the bigbench tasks more as intended:

  • (blocked) need to be able to extract multiple key/values from json
  • (blocked) use "preferred_metric" from the task.json spec to decide metric

If #27 is resolved, then we could compute multiple accuracy metrics. We could also optionally explore having another script that runs the entire slate of tasks on a set of models.
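The script contract above (LLM ID and task name in, at least one accuracy metric out, results written to a JSON file) can be sketched in plain Python. This is only a sketch: `call_llm` stands in for the generalized `llm.py` op, and the example format assumes the BIG-bench `task.json` schema (`examples` entries with `input` and `target_scores`); none of this is the final AIPL implementation.

```python
import json
import sys
import urllib.request

TASK_URL = ('https://raw.githubusercontent.com/google/BIG-bench/main/'
            'bigbench/benchmark_tasks/{task}/task.json')

def call_llm(llm_id: str, prompt: str) -> str:
    """Placeholder: route to the right LLM source based on llm_id."""
    raise NotImplementedError

def score(predictions, examples):
    """One accuracy metric: exact match against the top-scoring target."""
    correct = sum(
        pred.strip().lower() ==
        max(ex['target_scores'], key=ex['target_scores'].get).lower()
        for pred, ex in zip(predictions, examples))
    return correct / len(examples)

def run_task(llm_id: str, task: str) -> dict:
    with urllib.request.urlopen(TASK_URL.format(task=task)) as resp:
        spec = json.load(resp)
    preds = [call_llm(llm_id, ex['input']) for ex in spec['examples']]
    return {'llm': llm_id, 'task': task,
            'accuracy': score(preds, spec['examples'])}

if __name__ == '__main__' and len(sys.argv) == 3:
    llm_id, task = sys.argv[1], sys.argv[2]
    with open(f'{task}-{llm_id}.json', 'w') as f:  # "somewhere useful"
        json.dump(run_task(llm_id, task), f, indent=2)
```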

@dovinmu dovinmu mentioned this pull request Jun 10, 2023
@dovinmu dovinmu marked this pull request as draft June 10, 2023 18:35
aipl/ops/llm.py (review thread, outdated, resolved)
@dovinmu dovinmu marked this pull request as ready for review June 11, 2023 03:05
@dovinmu (Collaborator, Author) commented Jun 11, 2023

I ended up writing a few more ops for this PR, some of which seem obvious (!csv-parse) and some of which I expect will change (the !metrics- ones). I'm especially interested in getting the changes to llm.py merged so we have a way of calling LLMs from different sources.

aipl/ops/csv.py (review thread, outdated, resolved)
aipl/ops/csv.py (review thread, outdated, resolved)
aipl/ops/llm.py (review thread, outdated, resolved)
s = 'model,task\n' + '\n'.join(f'{model},{task}' for model in models for task in tasks)
with open('model-task.csv', 'w') as f:
    f.write(s)
!csv-parse model-task.csv
Owner:

This whole block is clever, but we should figure out how a !cross or something should work.
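As a point of reference, the cross product the snippet builds by hand is what a !cross op would compute. A plain-Python sketch (the op name and the row shape are assumptions):

```python
from itertools import product

def cross(models, tasks):
    """Cartesian product of two columns, one row per (model, task) pair."""
    return [{'model': m, 'task': t} for m, t in product(models, tasks)]
```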

Collaborator Author:

Totally; if you're doing that separately, then we can hold this PR off until that's merged.

Owner:

should be merged now!

!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/{task}/task.json
!fetch-url
# name=name description=description
Collaborator Author:

The problem with this not working is that we can't automatically construct good prompts, since the JSON file contains the context for what the LLM is actually classifying.
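Once extracting multiple key/values from JSON is unblocked, pulling several fields out of task.json to give the prompt that context might look like the following sketch. The field names (name, description, task_prefix) follow the BIG-bench task.json schema; the prompt template itself is an assumption:

```python
import json

def build_prompt(task_json: str, example_input: str) -> str:
    """Combine task-level context fields with one example's input."""
    spec = json.loads(task_json)
    prefix = spec.get('task_prefix', '')
    return (f"Task: {spec['name']}\n"
            f"{spec['description']}\n"
            f"{prefix}{example_input}")
```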

@dovinmu dovinmu merged commit 1b8e70e into saulpw:develop Jun 13, 2023
@dovinmu dovinmu deleted the benchmarks branch June 13, 2023 23:06
saulpw pushed a commit that referenced this pull request Jun 30, 2023