ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic


Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, and Timothy Baldwin

MBZUAI, Prince Sattam bin Abdulaziz University, KFUPM, Core42, NYU Abu Dhabi, The University of Melbourne


🔥 News

  • [2024-02-21] The preprint of our paper is available on arXiv (abs/2402.12840).

Introduction

We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language, sourced from school exams across diverse educational levels in different countries spanning North Africa, the Levant, and the Gulf regions. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and was carefully constructed in collaboration with native speakers in the region.

Data

Each question in the dataset is a multiple-choice question with up to five choices, exactly one of which is correct. The dataset can be accessed in the data folder or on Hugging Face:

import datasets

# Download ArabicMMLU from the Hugging Face Hub.
data = datasets.load_dataset('MBZUAI/ArabicMMLU')
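A quick way to see what a record looks like is to print one example. A minimal sketch, using only the datasets API above; no split or column names are assumed, since the schema is best checked directly:

# Grab the first available split and print every field of its first record
# (question text, choices, answer key, ...).
split = next(iter(data.values()))
for key, value in split[0].items():
    print(f'{key}: {value}')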

Statistics

The data construction process involved a total of 10 native Arabic speakers from different countries: 6 internal workers (1 Jordanian, 1 Egyptian, 1 Lebanese, 1 from the UAE, and 2 from KSA) and 4 external workers (3 Jordanians and 1 Egyptian). The resulting corpus is sourced from eight countries, with Jordan, Egypt, and Palestine being the top three sources. We categorize the collected questions into subject areas: (1) STEM (Science, Technology, Engineering, and Mathematics); (2) Social Science; (3) Humanities; (4) Arabic Language; and (5) Others.

Examples

These questions are written in Arabic.

Evaluation

We evaluate 22 open-source multilingual models, 11 open-source Arabic-centric models, and 2 closed-source models. We experimented with different prompts in Arabic and English and found that the English prompt performs best. Below are examples of inputs with the prompt; a sketch of the prompt format follows the examples.

Zero-shot Evaluation

Few-shot Evaluation
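To make the input format concrete, here is a minimal sketch of how a zero-shot English prompt for a multiple-choice question might be assembled. The template below is an illustration only, not the exact prompt wording used in the paper:

# Illustrative zero-shot prompt builder. The English template here is an
# assumption for demonstration; see the paper for the exact prompt wording.
LETTERS = ['A', 'B', 'C', 'D', 'E']

def build_prompt(question, choices):
    lines = ['Question: ' + question]
    for letter, choice in zip(LETTERS, choices):
        lines.append(letter + '. ' + choice)
    lines.append('Answer:')
    return '\n'.join(lines)

print(build_prompt('What is the capital of the UAE?',
                   ['Dubai', 'Abu Dhabi', 'Sharjah']))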

Evaluation Code

The evaluation code for each model we used is in evaluate.py, and the commands to run it are listed in run.sh.
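For reference, a minimal sketch of scoring, assuming the benchmark is graded by plain multiple-choice accuracy and that predictions and gold answers are both choice letters; this mirrors the metric, not the actual logic in evaluate.py:

# Generic multiple-choice accuracy; illustration only, not taken from evaluate.py.
def accuracy(predictions, gold):
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

print(accuracy(['A', 'B', 'C'], ['A', 'B', 'D']))  # prints 0.666...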

Citation

@misc{koto2024arabicmmlu,
    title={ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic},
    author={Fajri Koto and Haonan Li and Sara Shatnawi and Jad Doughman and Abdelrahman Boda Sadallah and Aisha Alraeesi and Khalid Almubarak and Zaid Alyafeai and Neha Sengupta and Shady Shehata and Nizar Habash and Preslav Nakov and Timothy Baldwin},
    eprint={2402.12840},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    year={2024}
}

License

The ArabicMMLU dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
