Evaluation

We use three evaluation datasets to assess the performance of our data selection pipeline: MMLU, TyDiQA, and BBH. Evaluation follows the open-instruct pipeline, and the version we used to evaluate our models is kept in the eval folder. To evaluate a trained model, see the eval_mmlu.sh, eval_tydiqa.sh, and eval_bbh.sh scripts in the evaluation directory. Each script contains the commands needed to evaluate the model on the corresponding dataset.
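
As a rough illustration, the three scripts could be run in sequence with a small wrapper like the one below. This is only a sketch under assumptions: the `EVAL_DIR` variable and the `run_evals` function are hypothetical names introduced here, and it assumes each script can be executed directly with `bash`. If the scripts instead define functions meant to be sourced, or expect a model path or other arguments, open each script first and adapt the call accordingly.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: EVAL_DIR and run_evals are illustrative names,
# not part of the repository. The default directory is an assumption.
EVAL_DIR="${EVAL_DIR:-evaluation}"

run_evals() {
  # Any arguments given to run_evals are forwarded unchanged to each script.
  local status=0
  local bench script
  for bench in mmlu tydiqa bbh; do
    script="$EVAL_DIR/eval_${bench}.sh"
    if [ -f "$script" ]; then
      echo "Running $script"
      bash "$script" "$@" || status=1
    else
      # Report a missing script but keep going with the remaining benchmarks.
      echo "Missing $script" >&2
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the snippet, calling `run_evals` runs whichever of the three scripts exist and returns a non-zero status if any script is missing or fails.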