<a href="https://colab.research.google.com/github/melsiddieg/New_Collabs/blob/master/Copy_of_Finetuning_ORCA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning a model on ORCA
<img src="https://orca.dlnlp.ai/assets/orca_logo.png" width="50%"/>

**ORCA** is a large-scale Arabic Language Understanding Evaluation Benchmark

**Website:** https://orca.dlnlp.ai/

## Introduction
ORCA has 29 tasks, which fall into three types: token-level, text classification, and question answering. Each type has its own trainer script, and this notebook shows how to use all three. As examples, we fine-tune `ARBERTv2` and `MARBERTv2` on ORCA tasks for only 1 to 3 epochs.
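The three task types correspond to three trainer scripts in the fine-tuning code used later in this notebook. As a rough sketch, the lookup below picks the script for a given task name. The task lists and script names are copied from the sections that follow; the helper name `trainer_script_for` is made up for illustration and is not part of the ORCA repository.

```python
# Task names and script names as they appear later in this notebook.
TOKEN_TASKS = {"arabic-ner", "aqmar-ner", "msa-pos", "dialect-pos"}
TEXT_CLASSIFICATION_TASKS = {
    "sentiment", "dialect-region", "dialect-binary", "dialect-country",
    "topic", "ans-claim", "machine-generation", "age", "gender", "adult",
    "dangerous", "emotion", "hate-speech", "offensive", "irony", "sarcasm",
    "abusive", "wsd", "mq2q", "ans-stance", "baly-stance", "xlni", "sts",
    "emotion-reg",
}
QA_TASKS = {"qa"}


def trainer_script_for(task: str) -> str:
    """Return the fine-tuning script for an ORCA task (hypothetical helper)."""
    if task in TOKEN_TASKS:
        return "run_orca_tokens_trainer.py"
    if task in TEXT_CLASSIFICATION_TASKS:
        return "run_orca_text_classification_wz_reg.py"
    if task in QA_TASKS:
        return "run_orca_qa.py"
    raise ValueError(f"Unknown ORCA task: {task!r}")
```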

## Install Requirements

In [None]:
!pip install datasets transformers seqeval accelerate

## Request Access
To obtain access to the ORCA benchmark on Hugging Face, follow these steps:
- Log in to your Hugging Face account

  <img src="https://raw.githubusercontent.com/UBC-NLP/orca/main/orca_request1.png" width="55%"/>
- Request access

  <img src="https://raw.githubusercontent.com/UBC-NLP/orca/main/orca_request2.png" width="55%"/>

In [None]:
!huggingface-cli login --token "YOUR_TOKEN"
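Pasting a token directly into a cell makes it easy to leak when the notebook is shared. One alternative, sketched here, is to read the token from an environment variable and pass it to `huggingface_hub.login`. The variable name `HF_TOKEN` and the helper `login_from_env` are assumptions made for this example.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub if var_name is set; return True on login."""
    token = os.environ.get(var_name)
    if not token:
        print(f"{var_name} is not set; skipping login.")
        return False
    # Imported lazily so the helper can be defined before the package is installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` once before downloading the dataset should have the same effect as the CLI command above, provided the variable holds a valid token.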

## Download the fine-tuning code

In [None]:
!git clone https://github.com/UBC-NLP/orca.git

Cloning into 'orca'...
remote: Enumerating objects: 220, done.
remote: Counting objects: 100% (85/85), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 220 (delta 62), reused 11 (delta 11), pack-reused 135
Receiving objects: 100% (220/220), 93.41 KiB | 6.67 MiB/s, done.
Resolving deltas: 100% (128/128), done.


In [None]:
%cd /content/orca/fine-tuning_code

/content/orca/fine-tuning_code


## Fine-tuning a model on token-level tasks
ORCA has 4 token-level datasets: `arabic-ner`, `aqmar-ner`, `msa-pos`, and `dialect-pos`.

In [None]:
task='dialect-pos' # must be one of: 'arabic-ner', 'aqmar-ner', 'msa-pos', 'dialect-pos'
cache_dir='./cache'
le=5e-5
batch_size=32
model_name='UBC-NLP/ARBERTv2'
output_dir='./output/'+task
epochs_num=1
metric_for_best_model="eval_f1"
seed=41


In [None]:
!python run_orca_tokens_trainer.py \
        --dataset_name "UBC-NLP/orca" \
        --dataset_config_name $task \
        --cache_dir $cache_dir \
        --learning_rate $le \
        --per_device_train_batch_size $batch_size \
        --per_device_eval_batch_size $batch_size \
        --model_name_or_path  $model_name \
        --output_dir $output_dir \
        --num_train_epochs $epochs_num \
        --do_train \
        --do_eval \
        --do_predict \
        --evaluation_strategy epoch \
        --logging_strategy epoch \
        --save_strategy epoch \
        --load_best_model_at_end \
        --metric_for_best_model $metric_for_best_model \
        --greater_is_better True \
        --save_total_limit 3 \
        --seed $seed --overwrite_output_dir
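
Outside a notebook, the `$variable` substitution used above is not available. A minimal sketch of one way to assemble the same arguments in plain Python follows. It uses only flags that appear in the cell above; the function name `build_token_command` is hypothetical.

```python
import sys


def build_token_command(task, model_name, output_dir, cache_dir="./cache",
                        learning_rate=5e-5, batch_size=32, epochs=1,
                        metric="eval_f1", seed=41):
    """Assemble the argument list for run_orca_tokens_trainer.py (hypothetical helper)."""
    return [
        sys.executable, "run_orca_tokens_trainer.py",
        "--dataset_name", "UBC-NLP/orca",
        "--dataset_config_name", task,
        "--cache_dir", cache_dir,
        "--learning_rate", str(learning_rate),
        "--per_device_train_batch_size", str(batch_size),
        "--per_device_eval_batch_size", str(batch_size),
        "--model_name_or_path", model_name,
        "--output_dir", output_dir,
        "--num_train_epochs", str(epochs),
        "--do_train", "--do_eval", "--do_predict",
        "--evaluation_strategy", "epoch",
        "--logging_strategy", "epoch",
        "--save_strategy", "epoch",
        "--load_best_model_at_end",
        "--metric_for_best_model", metric,
        "--greater_is_better", "True",
        "--save_total_limit", "3",
        "--seed", str(seed),
        "--overwrite_output_dir",
    ]


cmd = build_token_command("dialect-pos", "UBC-NLP/ARBERTv2", "./output/dialect-pos")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the `fine-tuning_code` directory.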

## Fine-tuning a model on text classification tasks
ORCA has 24 text classification datasets:
`sentiment`, `dialect-region`, `dialect-binary`, `dialect-country`, `topic`, `ans-claim`, `machine-generation`, `age`, `gender`, `adult`, `dangerous`, `emotion`, `hate-speech`, `offensive`, `irony`, `sarcasm`, `abusive`, `wsd`, `mq2q`, `ans-stance`, `baly-stance`, `xlni`, `sts`, and `emotion-reg`.

In [None]:
task='hate-speech'
cache_dir='./cache'
le=5e-5
batch_size=8
model_name='UBC-NLP/ARBERTv2'
output_dir='./output5/'+task
epochs_num=1
metric_for_best_model="eval_f1" # use eval_spearmanr for the regression tasks ['sts', 'emotion-reg'] and eval_f1 for all other tasks
seed=41
sq=512
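
The comment on `metric_for_best_model` describes a rule that is easy to forget when switching tasks. A small sketch that encodes it is shown here; the names `REGRESSION_TASKS` and `metric_for_task` are hypothetical.

```python
# Tasks scored with Spearman correlation, per the comment in the cell above.
REGRESSION_TASKS = {"sts", "emotion-reg"}


def metric_for_task(task: str) -> str:
    """Return the model-selection metric name for a text classification task."""
    return "eval_spearmanr" if task in REGRESSION_TASKS else "eval_f1"
```

Setting `metric_for_best_model = metric_for_task(task)` would then keep the two variables consistent.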

In [None]:
!python run_orca_text_classification_wz_reg.py \
        --dataset_name "UBC-NLP/orca" \
        --task_name $task \
        --cache_dir $cache_dir \
        --learning_rate $le \
        --per_device_train_batch_size $batch_size --per_device_eval_batch_size $batch_size \
        --model_name_or_path  $model_name \
        --output_dir $output_dir \
        --num_train_epochs $epochs_num \
        --do_train \
        --do_eval \
        --do_predict \
        --max_seq_length $sq \
        --evaluation_strategy epoch --logging_strategy epoch --save_strategy epoch \
        --load_best_model_at_end --metric_for_best_model $metric_for_best_model --greater_is_better True \
        --save_total_limit 3 --seed $seed --overwrite_output_dir
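
Once the run finishes, the scores need to be read back from `output_dir`. Scripts derived from the Hugging Face example trainers usually write their metrics to an `all_results.json` file there; whether the ORCA scripts do the same is an assumption in this sketch, so the helper returns an empty dictionary when the file is missing. The name `load_results` is hypothetical.

```python
import json
from pathlib import Path


def load_results(output_dir, filename="all_results.json"):
    """Load saved metrics from output_dir, or return {} if the file is absent."""
    path = Path(output_dir) / filename
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For the run above, `load_results('./output5/hate-speech')` would be the call to try first.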

## Fine-tuning a model on the QA task
ORCA has one QA task: `qa`.

In [None]:
task='qa'
cache_dir='./cache'
le=5e-5 # not used: the command below passes --learning_rate 3e-5 directly
model_name='UBC-NLP/MARBERTv2'
output_dir='./output/'+task
seed=41

In [None]:
!python run_orca_qa.py \
    --model_name_or_path  $model_name \
    --dataset_name  "UBC-NLP/orca" \
    --dataset_config_name $task \
    --cache_dir $cache_dir \
    --do_train \
    --do_eval \
    --do_predict \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir $output_dir \
    --seed $seed #--overwrite_output_dir
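
The `--max_seq_length 384` and `--doc_stride 128` flags control how a long context is split into overlapping windows. The sketch below illustrates the idea on token positions only. It assumes `run_orca_qa.py` follows the Hugging Face QA example, where the stride is the number of tokens shared between consecutive windows, and it ignores the question and special tokens that also occupy part of each window. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(num_tokens: int, window: int, overlap: int):
    """Return (start, end) token spans covering num_tokens with overlapping windows."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    spans = []
    start = 0
    while True:
        end = min(start + window, num_tokens)
        spans.append((start, end))
        if end >= num_tokens:
            break
        start += window - overlap  # step forward, keeping `overlap` tokens shared
    return spans


print(sliding_windows(600, 384, 128))  # [(0, 384), (256, 600)]
```

A 600-token context is therefore seen as two windows that share tokens 256 to 383, so an answer near the first window's boundary still appears whole in the second.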