Note: it can likely also be used for other HBS languages (Croatian, Bosnian, Montenegrin) - support for these languages is on my roadmap (see future work).
- Common sense reasoning: Hellaswag, Winogrande, PIQA, OpenbookQA, ARC-Easy, ARC-Challenge
- World knowledge: NaturalQuestions, TriviaQA
- Reading comprehension: BoolQ
You can find the Serbian LLM eval dataset on HuggingFace. For more details on how the dataset was built, see this technical report on Weights & Biases. The `serb_eval_translate` branch was used for machine translation, while `serb_eval_refine` was used for further refinement using GPT-4.
Please email me at gordicaleksa at gmail com if you're willing to sponsor the projects I'm working on.
You will get credit and eternal glory. :)
If you are willing to financially support this effort of using ChatGPT to obtain higher-quality data, an effort that is of national/regional interest, my email is gordicaleksa at gmail com. You will be credited on this project as a sponsor (and become part of history). :)
Furthermore, this project will help bootstrap the local large language model ecosystem.
```shell
git clone https://github.com/gordicaleksa/lm-evaluation-harness-serbian
cd lm-evaluation-harness-serbian
pip install -e .
```
Currently you might also need to manually `pip install` the following packages: sentencepiece, protobuf, and possibly one more (submit a PR if you hit this).
- `--model_args` <- any name from HuggingFace or a path to a HuggingFace-compatible checkpoint will work
- `--tasks` <- pick any subset of: arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande, nq_open, triviaqa
- `--num_fewshot` <- the number of shots; should be 0 for all tasks except nq_open and triviaqa (run these in a 5-shot manner if you want to compare against Mistral 7B)
- `--batch_size` <- depending on your available VRAM, set this as high as possible to get the maximum speedup
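Putting the flags together, an example invocation might look like the following sketch. The model name and batch size are illustrative, and the `main.py` / `hf-causal` conventions are assumptions carried over from the upstream lm-evaluation-harness this fork is based on — check the repo if the entry point differs:

```shell
# Illustrative run from the repo root: evaluate a HuggingFace model
# on the 0-shot Serbian eval tasks.
python main.py \
    --model hf-causal \
    --model_args pretrained=mistralai/Mistral-7B-v0.1 \
    --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande \
    --num_fewshot 0 \
    --batch_size 8
```

nq_open and triviaqa would then be run separately with `--num_fewshot 5` if you want numbers comparable to Mistral 7B.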
- Cover popular aggregated-results benchmarks: MMLU, BBH, AGI Eval, and math benchmarks: GSM8K, MATH
- Explicit support for other HBS languages.
Thanks to all of our sponsors for donating to the yugoGPT (the first 7B HBS LLM) & Serbian LLM eval projects.
The yugoGPT base model will soon be open-sourced under the permissive Apache 2.0 license.
- Ivan (anon)
- qq (anon)
- Adam Sofronijevic
- Yanado
- Mitar Perovic
- Nikola Ivancevic
- Rational Development DOO
- Ivan i Natalija Kokić
- psk.rs
- OmniStreak
- Luka Važić
- Miloš Durković
- Marjan Radeski
- Marjan Stankovic
- Nikola Stojiljkovic
- Mihailo Tomić
- Bojan Jevtic
- Jelena Jovanović
- Nenad Davidović
- Mika Tasich
- TRENCH-NS
- Nemanja Grujičić
- Mladen Fernežir
- tim011
Also a big thank you to the following individuals:
- Slobodan Marković - for spreading the word! :)
- Aleksander Segedi - for help with bookkeeping
A huge thank you to the following technical contributors who helped translate the evals from English into Serbian:
- Vera Prohaska
- Chu Kin Chan
- Joe Makepeace
- Toby Farmer
- Malvi Bid
- Raphael Vienne
- Nenad Aksentijevic
- Isaac Nicolas
- Brian Pulfer
- Aldin Cimpo
Apache 2.0
@misc{serbian-llm-eval,
  author = "Gordić Aleksa",
  title = "Serbian LLM Eval",
  year = "2023",
  howpublished = {\url{https://huggingface.co/datasets/gordicaleksa/serbian-llm-eval-v1}},
}