
Code for the paper "tinyBenchmarks: evaluating LLMs with fewer examples"

In this repository, you will find the code to reproduce the results in:

Maia Polo, Felipe, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, and Mikhail Yurochkin. "tinyBenchmarks: evaluating LLMs with fewer examples." arXiv preprint arXiv:2402.14992 (2024).

Use our methods/tools: If you are interested in using our methods/tools, please check the tinyBenchmarks GitHub repository instead.

Use our datasets from the paper: If you are interested in using the datasets from this work, you can find them in the "data" folder. They are all in ".pickle" format and can easily be loaded into Python. If you want to see the code used to generate or process the datasets, please check the "generating_data" folder.
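A minimal sketch of loading one of these files with Python's standard `pickle` module. The helper names `load_pickle` and `describe` are hypothetical and not part of this repository. Because the internal layout of each file is not documented here, `describe` only inspects the top level of the loaded object.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_pickle(path: Union[str, Path]) -> Any:
    """Load and return the object stored in a ".pickle" file (hypothetical helper)."""
    # Only unpickle files you trust: unpickling can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj: Any) -> str:
    """Return a short, type-dependent summary of a loaded object (hypothetical helper)."""
    # The structure of the repository's files is an unknown here, so we
    # only report top-level dict keys or the object's type name.
    if isinstance(obj, dict):
        return f"dict with keys: {sorted(map(str, obj.keys()))}"
    return f"object of type {type(obj).__name__}"
```

For example, `print(describe(load_pickle("data/<file>.pickle")))`, where `<file>` stands for the name of one of the files in the "data" folder.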

Running our code:

Install the packages listed in requirements.txt with pip, as well as our version of py-IRT (please check here).
Run our main experiments with the "run_experiment.py" script. You will need to pass some parameters to it.

Repository contents:

data/
generating_data/
plots/
results/
README.md
acc.py
experiments.py
generate_plots.ipynb
generate_plots2.ipynb
generate_plots3.ipynb
irt.py
plots.py
requirements.txt
run_experiment.py
selection.py
tinyBenchmarks.ipynb
tinyBenchmarks.pkl
utils.py

GitHub repository: felipemaiapolo/efficbench
