chains/benchmarks, other LLMs #30
Conversation
…suite of big bench
I ended up writing a few more ops for this PR, some of which seem obvious (!csv-parse) and some of which I expect will change (the !metrics- ones). I'm especially interested in getting the changes to llm.py merged so we have a way of calling LLMs from different sources.
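As a rough illustration of what calling LLMs from different sources could look like, here is a minimal sketch of a provider-dispatching helper; the function names, the registry, and the echo provider are assumptions for illustration, not the actual llm.py interface:
# Hypothetical sketch only; llm.py's real interface may differ.
# The idea: one entry point that dispatches to whichever provider's
# completion function has been registered.
from typing import Callable, Dict

PROVIDERS: Dict[str, Callable[[str], str]] = {}

def register_provider(name: str, fn: Callable[[str], str]) -> None:
    # Register a completion function under a provider name.
    PROVIDERS[name] = fn

def complete(prompt: str, provider: str = "echo") -> str:
    # Send a prompt to the chosen provider and return its completion.
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return PROVIDERS[provider](prompt)

# Trivial local provider so the sketch runs without any API keys.
register_provider("echo", lambda prompt: prompt.upper())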
s = 'model,task\n' + "\n".join([model + "," + task for model in models for task in tasks])
with open('model-task.csv', 'w') as f:
    f.write(s)
!csv-parse model-task.csv
This whole block is clever, but we should figure out how a !cross or something should work.
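For reference, a rough sketch of what a generic cross-product op could compute; the !cross name, the placeholder model and task names, and the CSV layout are assumptions rather than anything merged:
# Illustrative only: a stand-alone cross product of models and tasks,
# writing out the same model,task CSV the snippet above builds by hand.
import csv
from itertools import product

models = ["model-a", "model-b"]          # placeholder model names
tasks = ["navigate", "strange_stories"]  # placeholder BIG-bench task names

with open("model-task.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "task"])
    writer.writerows(product(models, tasks))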
totally, if you're doing that separately then we can hold this PR off until that's merged
should be merged now!
!format
https://raw.githubusercontent.com/google/BIG-bench/main/bigbench/benchmark_tasks/{task}/task.json
!fetch-url
# name=name description=description
The problem with this not working is that we can't automatically construct good prompts, since the JSON file is what holds the context for what the LLM is actually classifying.
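To make that concrete, here is a rough sketch of fetching one task's JSON and inspecting what a naive prompt builder would have to work with; the task name is a placeholder and the field names follow the common BIG-bench task.json layout, which individual tasks may vary from:
# Sketch only: fetch one BIG-bench task.json and inspect the fields a
# prompt builder would need. Keys vary across tasks, so .get() is used.
import json
from urllib.request import urlopen

task = "strange_stories"  # placeholder task name
url = ("https://raw.githubusercontent.com/google/BIG-bench/main/"
       f"bigbench/benchmark_tasks/{task}/task.json")

with urlopen(url) as resp:
    data = json.load(resp)

print(data.get("name"), "-", data.get("description"))

# name/description alone don't say how to frame a classification prompt;
# the context for each item lives in the examples themselves.
for example in data.get("examples", [])[:3]:
    print(example.get("input"), "->", example.get("target_scores"))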
(closing #28 in favor of this)
At a minimum, before we merge this branch:
For the binary classification script:
These are nice-to-haves; basically, they'd make it so we were using the BIG-bench tasks more as intended:
If #27 is resolved, then we could compute multiple accuracy metrics. We could also explore having another script that runs the entire slate of tasks on a set of models; a sketch of the metrics piece follows below.
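This sketch assumes predictions and gold labels have already been collected per model and task; the data layout, metric names, and output format are assumptions, not the !metrics- ops themselves:
# Illustrative scoring pass over a model x task grid with two simple
# metrics; the real !metrics- ops and result format may differ.

# results[(model, task)] is a list of (prediction, target) string pairs.
results = {
    ("model-a", "navigate"): [("yes", "yes"), ("no", "yes")],
    ("model-a", "strange_stories"): [("lie", "joke")],
}

def exact_match(pairs):
    # Fraction of predictions equal to the target, case-insensitively.
    hits = sum(p.strip().lower() == t.strip().lower() for p, t in pairs)
    return hits / len(pairs) if pairs else 0.0

def coverage(pairs):
    # Fraction of examples where the model produced any non-empty answer.
    return sum(bool(p.strip()) for p, _ in pairs) / len(pairs) if pairs else 0.0

for (model, task), pairs in results.items():
    print(f"{model},{task},accuracy={exact_match(pairs):.2f},coverage={coverage(pairs):.2f}")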